Spacy pipeline

If gatenlp has been installed with the spacy extra (pip install gatenlp[spacy] or pip install gatenlp[all]), you can run a Spacy pipeline on a document and get the result back as gatenlp annotations.

from gatenlp import Document
from gatenlp.lib_spacy import AnnSpacy
import spacy

print("SpaCy version:", spacy.__version__)
SpaCy version: 3.3.1
# To use the English pipeline with Spacy, the model has to be downloaded first
from spacy.cli import download as spacy_download
spacy_download("en_core_web_sm")

✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
doc = Document.load("https://gatenlp.github.io/python-gatenlp/testdocument2.txt")
doc
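
The download cell above runs pip every time the notebook is executed. A minimal sketch of how the download could be skipped when the model is already present, assuming (as is the case for models installed through pip) that the model is an importable package named like the model itself. The helper names is_model_installed and ensure_spacy_model are hypothetical and not part of gatenlp or Spacy:

```python
import importlib.util


def is_model_installed(name: str) -> bool:
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(name) is not None


def ensure_spacy_model(name: str = "en_core_web_sm") -> None:
    """Download the Spacy model only when it is not installed yet (hypothetical helper)."""
    if not is_model_installed(name):
        # Imported lazily so the check itself works without Spacy installed.
        from spacy.cli import download as spacy_download
        spacy_download(name)
```

Calling ensure_spacy_model() before spacy.load("en_core_web_sm") would then only trigger a download on the first run. Note that this only checks that the package exists, not that its version matches the installed Spacy version, so an outdated model would not be replaced.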

Annotating the document using Spacy

To annotate one or more documents using Spacy, first create an AnnSpacy annotator object and then run the document(s) through this annotator:

spacy_pipeline = spacy.load("en_core_web_sm")
spacy_annotator = AnnSpacy(pipeline=spacy_pipeline)
doc = spacy_annotator(doc)
doc
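
To get a quick overview of what the annotator added, the annotations can be counted by type. The following is a minimal sketch that only assumes each annotation exposes a type attribute, as gatenlp annotations do. The helper name count_annotation_types is hypothetical, and the commented usage assumes that AnnSpacy wrote its annotations to the default annotation set:

```python
from collections import Counter


def count_annotation_types(annotations):
    """Count annotations per type for any iterable of objects with a .type attribute."""
    return Counter(ann.type for ann in annotations)


# Usage with the document annotated above (requires gatenlp and Spacy;
# assumes the annotations are in the default annotation set):
# counts = count_annotation_types(doc.annset())
# for ann_type, n in counts.most_common():
#     print(ann_type, n)
```

Which types appear depends on the components in the loaded Spacy pipeline, so no fixed list of types is assumed here.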

Notebook last updated

import gatenlp
print("NB last updated with gatenlp version", gatenlp.__version__)
NB last updated with gatenlp version 1.0.8a1