GATE Cloud Annotator

The GateCloudAnnotator is an annotator that uses the GATE Cloud web service to annotate documents. For each document that gets processed, the document data is sent to an HTTP endpoint and processed there, and the information sent back is then used to annotate the document.

from gatenlp import Document
from gatenlp.processing.client.gatecloud import GateCloudAnnotator

Let's try annotating a document with the English Named Entity Recognizer on GATE Cloud (https://cloud.gate.ac.uk/shopfront/displayItem/annie-named-entity-recognizer).

The information page for that service shows that the following annotation types can be requested. If no alternative list is specified, the first 5 are requested by default:

  • Address (included by default)
  • Date (included by default)
  • Location (included by default)
  • Organization (included by default)
  • Person (included by default)
  • Money
  • Percent
  • Token
  • SpaceToken
  • Sentence
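In the call shown later in this notebook, the requested types are passed as a single comma-separated string in which every type name is prefixed with a colon. As a minimal sketch, such a string could be built from plain Python lists instead of being typed by hand. The helper name make_ann_types and the two list names are hypothetical and not part of gatenlp; the string format is taken from this notebook's own example.

```python
# Annotation types listed on the service's information page.
DEFAULT_TYPES = ["Address", "Date", "Location", "Organization", "Person"]
OPTIONAL_TYPES = ["Money", "Percent", "Token", "SpaceToken", "Sentence"]


def make_ann_types(type_names):
    """Join type names into a ':Type1,:Type2' string.

    Hypothetical helper, not part of gatenlp: it only mirrors the
    string format used for the ann_types parameter in this notebook.
    """
    return ",".join(f":{name}" for name in type_names)


# Request every supported type, defaults first.
all_types = make_ann_types(DEFAULT_TYPES + OPTIONAL_TYPES)
print(all_types)
```

The resulting string can be passed as the ann_types argument; keeping the names in lists makes it easier to request only a subset, for example make_ann_types(["Person", "Location"]).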

We create a GateCloudAnnotator and specify the full list of all supported annotation types. We also specify the URL of the service endpoint as provided on the info page, and request that the annotations be put into the annotation set "ANNIE". Note that a limited number of documents can be annotated for free and without authentication, so we do not need to specify the api_key and api_password parameters here.

annotator = GateCloudAnnotator(
    url="https://cloud-api.gate.ac.uk/process-document/annie-named-entity-recognizer",
    outset_name="ANNIE",
    ann_types=":Address,:Date,:Location,:Organization,:Person,:Money,:Percent,:Token,:SpaceToken,:Sentence"
)
# an example document to annotate
doc = Document("Barack Obama visited Microsoft in New York last May.")
# Run the annotator and show the annotated document
doc = annotator(doc)
doc
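Beyond the free quota, the api_key and api_password parameters mentioned above would be needed. The following is a sketch of one way to keep such credentials out of the notebook by reading them from environment variables and passing them only when both are present. The function name gatecloud_kwargs and the environment variable names GATECLOUD_API_KEY and GATECLOUD_API_PASSWORD are assumptions made for illustration; only the parameter names come from this notebook.

```python
import os


def gatecloud_kwargs(env=None):
    """Build keyword arguments for GateCloudAnnotator (hypothetical helper).

    Credentials are added only if both environment variables are set;
    otherwise the annotator would be created without authentication.
    """
    env = os.environ if env is None else env
    kwargs = {
        "url": "https://cloud-api.gate.ac.uk/process-document/annie-named-entity-recognizer",
        "outset_name": "ANNIE",
        "ann_types": ":Person,:Location,:Organization",
    }
    # Assumed variable names; choose whatever fits your environment.
    key = env.get("GATECLOUD_API_KEY")
    password = env.get("GATECLOUD_API_PASSWORD")
    if key and password:
        kwargs["api_key"] = key
        kwargs["api_password"] = password
    return kwargs


# Intended use (requires gatenlp and network access):
# annotator = GateCloudAnnotator(**gatecloud_kwargs())
```

Without the two variables set, the helper returns the same kind of unauthenticated configuration used in the example above.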

Notebook last updated

import gatenlp
print("NB last updated with gatenlp version", gatenlp.__version__)
NB last updated with gatenlp version 1.0.8a1