Package gatenlp

The following classes are imported into the gatenlp package by default: Span, Annotation, AnnotationSet, ChangeLog, Document as well as GateNlpPr and interact for the GATE Python plugin.
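As a brief illustration of these default imports, the sketch below builds a document and adds one annotation to it. It assumes gatenlp is installed. The sample text, the set name "demo", the "Token" type and the wrapper function `first_token_demo` are all made up for the example, and the function returns None when the package is missing.

```python
def first_token_demo():
    """Create a document, annotate its first word, return (type, covered text)."""
    try:
        # Document is available directly from the top-level package.
        from gatenlp import Document
    except ImportError:
        # gatenlp is not installed, so there is nothing to demonstrate.
        return None

    doc = Document("This is a small document.")
    # "demo" is an arbitrary annotation set name chosen for this example.
    annset = doc.annset("demo")
    # Annotate offsets 0..4 ("This") with a made-up type and one feature.
    ann = annset.add(0, 4, "Token", {"kind": "word"})
    # Use the annotation offsets to slice the document text.
    return ann.type, doc.text[ann.start:ann.end]


print(first_token_demo())
```

The set returned by `doc.annset(...)` is an AnnotationSet and the object returned by `add` is an Annotation, so neither class needs to be instantiated directly here.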

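Where to find other important classes:

* corpora, document sources, document destinations: in gatenlp.corpora
* GateWorker and GateWorkerAnnotator: in gatenlp.gateworker
* AnnSpacy: in gatenlp.lib_spacy
* AnnStanza: in gatenlp.lib_stanza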

"""
The following classes are imported into the gatenlp package by default: `gatenlp.span.Span`,
`gatenlp.annotation.Annotation`,
`gatenlp.annotation_set.AnnotationSet`, `gatenlp.changelog.ChangeLog`, `gatenlp.document.Document` as well
as `GateNlpPr` and `interact` for the GATE Python plugin.

Where to find other important classes:

* corpora, document sources, document destinations: in `gatenlp.corpora`
* `gatenlp.gateworker.gateworker.GateWorker`, `gatenlp.gateworker.gateworkerannotator.GateWorkerAnnotator`
   in `gatenlp.gateworker`
* `gatenlp.lib_spacy.AnnSpacy` in `gatenlp.lib_spacy`
* `gatenlp.lib_stanza.AnnStanza` in `gatenlp.lib_stanza`
* TODO: include all the others!
"""

# NOTE: do not place a comment at the end of the version assignment
# line since we parse that line in a shell script!
# __version__ = "0.9.9"
from gatenlp.version import __version__

try:
    import sortedcontainers
except Exception:
    import sys
    print(
        "ERROR: required package sortedcontainers cannot be imported!", file=sys.stderr
    )
    print(
        "Please install it, using e.g. 'pip install -U sortedcontainers'",
        file=sys.stderr,
    )
    sys.exit(1)
# TODO: check version of sortedcontainers (we have 2.1.0)

from gatenlp.utils import init_logger

logger = init_logger("gatenlp")
from gatenlp.span import Span
from gatenlp.annotation import Annotation
from gatenlp.annotation_set import AnnotationSet
from gatenlp.changelog import ChangeLog
from gatenlp.document import Document
from gatenlp.gate_interaction import _pr_decorator as GateNlpPr
from gatenlp.gate_interaction import interact

# Importing GateWorker or other classes which depend on any package other than sortedcontainers will
# break the Python plugin!
# from gatenlp.gateworker import GateWorker, GateWorkerAnnotator


def init_notebook():   # pragma: no cover
    """
    Helper method to initialize a Jupyter or similar notebook.
    """
    from gatenlp.serialization.default_htmlannviewer import init_javascript
    from gatenlp.gatenlpconfig import gatenlpconfig

    init_javascript()
    gatenlpconfig.notebook_js_initialized = True
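
The dependency check at the top of the module follows a pattern that can be generalized. The following is a minimal sketch and not part of gatenlp: `check_required` is a hypothetical helper that reports a missing package and returns a flag instead of exiting, which makes the behaviour easier to test.

```python
import importlib
import sys


def check_required(package, hint=None):
    """Return True if `package` can be imported, else print a hint and return False."""
    try:
        importlib.import_module(package)
    except ImportError:
        print(f"ERROR: required package {package} cannot be imported!", file=sys.stderr)
        install_hint = hint or f"pip install -U {package}"
        print(f"Please install it, using e.g. '{install_hint}'", file=sys.stderr)
        return False
    return True


# A standard-library module is always importable.
print(check_required("json"))
# A name chosen so that it should not exist on any system.
print(check_required("no_such_package_for_this_sketch"))
```

A caller that wants the module's original behaviour could call `sys.exit(1)` when the helper returns False.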

Sub-modules

gatenlp.annotation

Module for Annotation class which represents information about a span of text in a document.

gatenlp.annotation_set

Module for AnnotationSet class which represents a named collection of annotations which can arbitrarily overlap.

gatenlp.annotation_utils

Module defining several utility functions for annotating documents in various ways.

gatenlp.changelog

Module for ChangeLog class which represents a log of changes to any of the components of a Document: document features, annotations, annotation features.

gatenlp.changelog_consts

Module that defines the constants used in the changelog module.

gatenlp.chunking

Module for chunking-related methods and annotators.

gatenlp.corpora

Module that defines base and implementation classes for representing document collections …

gatenlp.document

Module that implements the Document class for representing gatenlp documents with features and annotation sets.

gatenlp.features

Module that implements class Feature for representing features.

gatenlp.gate_interaction

Support for interaction between a GATE (Java) process and a gatenlp (Python) process. This is used by the Java GATE Python plugin.

gatenlp.gatenlpconfig

Module that provides the class GatenlpConfig and the instance gatenlpconfig which stores various global configuration options.

gatenlp.gateworker

Module for interacting with a Java GATE process.

gatenlp.impl

This subpackage contains modules for (temporary) implementations of the data structures and algorithms needed. Some of these may get replaced by other …

gatenlp.lang

Subpackage for future language-specific resources and annotators.

gatenlp.lib_spacy

Support for using spaCy: convert from spaCy to gatenlp documents and annotations.

gatenlp.lib_stanza

Support for using Stanford Stanza (see https://stanfordnlp.github.io/stanza/): convert from Stanza output to gatenlp documents and annotations.

gatenlp.offsetmapper

Module that implements the OffsetMapper class for mapping between Java-style and Python-style string offsets. Java strings are represented as UTF-16 …

gatenlp.pam

Subpackage for modules related to pattern matching.

gatenlp.processing

Package for annotators, and other things related to processing documents.

gatenlp.serialization

Subpackage for modules related to serialization.

gatenlp.span

Module for the Span class.

gatenlp.urlfileutils

Module for functions that help reading binary and textual data from either URLs or local files.

gatenlp.utils

Various utilities that could be useful in several modules.

gatenlp.version

gatenlp.visualization
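
The offset difference mentioned for gatenlp.offsetmapper arises because Java counts UTF-16 code units while Python counts code points. The sketch below does not use the OffsetMapper API. `python_to_java_offset` is a hypothetical helper that only illustrates the conversion in one direction with the standard library.

```python
def python_to_java_offset(text, py_offset):
    """Convert a Python (code point) offset into a Java (UTF-16 code unit) offset."""
    if not 0 <= py_offset <= len(text):
        raise ValueError("offset out of range")
    # Each UTF-16 code unit is 2 bytes; characters outside the BMP need two units.
    return len(text[:py_offset].encode("utf-16-le")) // 2


text = "a\U0001F600b"  # 'a', one character outside the BMP, 'b'
print(python_to_java_offset(text, 1))  # offset of the non-BMP character
print(python_to_java_offset(text, 2))  # offset of 'b'
```

For text that contains only characters from the Basic Multilingual Plane, both offset schemes agree, so a mapping is only needed when such characters occur.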

Functions

def init_notebook()

Helper method to initialize a Jupyter or similar notebook.
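
The function performs its imports inside the function body, presumably so that notebook-related modules are only loaded when the function is actually called. A minimal sketch of that deferred-import pattern, using a standard-library module as a stand-in; `to_json_lazy` is a hypothetical name and not part of gatenlp.

```python
def to_json_lazy(data):
    """Serialize `data`, importing the serializer only when the function is called."""
    # Deferred import: the module is loaded on the first call,
    # not when the enclosing module is imported.
    import json

    return json.dumps(data, sort_keys=True)


print(to_json_lazy({"b": 1, "a": 2}))
```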
