Stanza pipeline
If gatenlp has been installed with the stanza extra (pip install gatenlp[stanza] or pip install gatenlp[all]), you can run a Stanford Stanza pipeline on a document and get the result as gatenlp annotations.
from gatenlp import Document
from gatenlp.lib_stanza import AnnStanza
import stanza
# To use the English pipeline with Stanza, the model must be downloaded first
stanza.download('en')
Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/main/resources_1.2.2.json: 0%| …
2021-09-12 20:25:46,231|INFO|stanza|Downloading default packages for language: en (English)...
Downloading http://nlp.stanford.edu/software/stanza/1.2.2/en/default.zip: 0%| | 0.00/412M [00:00<?,…
2021-09-12 20:28:44,342|INFO|stanza|Finished downloading models and saved to /home/johann/stanza_resources.
doc = Document.load("https://gatenlp.github.io/python-gatenlp/testdocument2.txt")
doc
Annotating the document using Stanza
In order to annotate one or more documents using Stanza, first create an AnnStanza annotator object and then run the document(s) through this annotator:
stanza_annotator = AnnStanza(lang="en")
2021-09-12 20:28:44,689|INFO|stanza|Loading these models for language: en (English):
=========================
| Processor | Package |
-------------------------
| tokenize | combined |
| pos | combined |
| lemma | combined |
| depparse | combined |
| sentiment | sstplus |
| ner | ontonotes |
=========================
2021-09-12 20:28:44,691|INFO|stanza|Use device: cpu
2021-09-12 20:28:44,692|INFO|stanza|Loading: tokenize
2021-09-12 20:28:44,697|INFO|stanza|Loading: pos
2021-09-12 20:28:44,954|INFO|stanza|Loading: lemma
2021-09-12 20:28:44,991|INFO|stanza|Loading: depparse
2021-09-12 20:28:45,369|INFO|stanza|Loading: sentiment
2021-09-12 20:28:45,766|INFO|stanza|Loading: ner
2021-09-12 20:28:46,371|INFO|stanza|Done loading processors!
doc = stanza_annotator(doc)
doc
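To check what the annotator produced without relying on the notebook viewer, one option is to count the annotations by type. The following is a minimal sketch, not part of gatenlp: count_annotation_types is a hypothetical helper name, and it only assumes that each annotation exposes a type attribute, as gatenlp annotations do.

```python
from collections import Counter


def count_annotation_types(annotations):
    """Count how many annotations of each type an iterable contains.

    Hypothetical helper (not a gatenlp function). It only assumes that
    every item has a `type` attribute, as gatenlp annotations do.
    """
    return Counter(ann.type for ann in annotations)
```

With the annotated document from above, the call would be count_annotation_types(doc.annset()), assuming the annotator wrote its results to the default annotation set. Which types show up depends on the processors that were loaded and on how the annotator was configured.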