Stanza pipeline

If gatenlp has been installed with the stanza extra (pip install gatenlp[stanza] or pip install gatenlp[all]), you can run a Stanford Stanza pipeline on a document and get the result as gatenlp annotations.

from gatenlp import Document
from gatenlp.lib_stanza import AnnStanza
import stanza

print("Stanza version:", stanza.__version__)
Stanza version: 1.3.0
# The English model must be downloaded before the English Stanza pipeline can be used
stanza.download('en')
2022-11-09 22:01:30,835|INFO|stanza|Downloading default packages for language: en (English)...
2022-11-09 22:01:36,778|INFO|stanza|File exists: /data/johann/stanza_resources/en/default.zip.
2022-11-09 22:01:44,940|INFO|stanza|Finished downloading models and saved to /data/johann/stanza_resources.
doc = Document.load("https://gatenlp.github.io/python-gatenlp/testdocument2.txt")
doc

Annotating the document using Stanza

To annotate one or more documents using Stanza, first create an AnnStanza annotator object and then run the document(s) through this annotator:

stanza_annotator = AnnStanza(lang="en")
2022-11-09 22:01:45,098|INFO|stanza|Loading these models for language: en (English):
============================
| Processor    | Package   |
----------------------------
| tokenize     | combined  |
| pos          | combined  |
| lemma        | combined  |
| depparse     | combined  |
| sentiment    | sstplus   |
| constituency | wsj       |
| ner          | ontonotes |
============================

2022-11-09 22:01:45,121|INFO|stanza|Use device: gpu
2022-11-09 22:01:45,121|INFO|stanza|Loading: tokenize
2022-11-09 22:01:59,883|INFO|stanza|Loading: pos
2022-11-09 22:02:00,295|INFO|stanza|Loading: lemma
2022-11-09 22:02:00,528|INFO|stanza|Loading: depparse
2022-11-09 22:02:03,370|INFO|stanza|Loading: sentiment
2022-11-09 22:02:03,943|INFO|stanza|Loading: constituency
2022-11-09 22:02:04,823|INFO|stanza|Loading: ner
2022-11-09 22:02:08,313|INFO|stanza|Done loading processors!
doc = stanza_annotator(doc)
doc
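
The two steps that usually follow are running several documents through the same annotator and getting an overview of what was created. The sketch below is one minimal way to do both. The helper names annotate_all and count_annotation_types are hypothetical and not part of gatenlp. The sketch assumes only that the annotator is called with a document and returns a document, as in the cell above, and that each annotation exposes a type attribute. The commented usage additionally assumes that the Stanza annotations end up in the default annotation set, doc.annset().

```python
from collections import Counter


def annotate_all(annotator, docs):
    """Run each document through the annotator and return the results as a list."""
    # The annotator is used exactly like stanza_annotator above: call it with a document.
    return [annotator(doc) for doc in docs]


def count_annotation_types(annotations):
    """Count annotations per type for any iterable of objects with a .type attribute."""
    return Counter(ann.type for ann in annotations)


# Hypothetical usage with the objects created above (not executed here):
# docs = annotate_all(stanza_annotator, [doc])
# for anntype, n in count_annotation_types(docs[0].annset()).most_common():
#     print(anntype, n)
```

Reusing one annotator object for all documents avoids reloading the Stanza models, which, judging by the log timestamps above, takes a noticeable amount of time.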

Notebook last updated

import gatenlp
print("NB last updated with gatenlp version", gatenlp.__version__)
NB last updated with gatenlp version 1.0.8a1