Versions and changes

1.0.5 (2021-10-07)

Changes that break backwards compatibility:

AnnotationSet.with_type() previously returned a detached set with all annotations if no types were specified. It now returns a detached set with no annotations, which is more logical.
API changes:
pam.pampac.actions.AddAnn: parameter anntype has been changed to type
The Feature() constructor kw arg logger has been changed to _change_logger and deepcopy has been changed to _deepcopy
Pampac: use the term "matches" instead of "data" for the named information stored for each named pattern that fits the document. A single one of these is often called "match info" and the index for a specific info is now called "matchidx" instead of "dataidx". See issue #89
Parameter spacetoken_type for AnnSpacy and spacy2gatenlp has been changed to space_token_type to conform to the parameter name used for AnnStanza and stanza2gatenlp.
Stanford Stanza support now requires Stanza version 1.3.0 or higher
Changes to lib_spacy: new parameter containing_anns to apply the spacy pipeline only to the part of the document covered by each of the annotations in the annotation set or iterator. New parameters component_cfg to specify a component config for Spacy and retrieve_spans to specify additional span types to retrieve.
Several bugfixes in Pampac.
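The with_type() change above is easy to miss when upgrading, so here is a minimal sketch of the new semantics. It does not use gatenlp: Ann and with_type below are hypothetical stand-ins that model only the behaviour described in the note (no types given means an empty result).

```python
from typing import Iterable, List, NamedTuple


class Ann(NamedTuple):
    """Hypothetical stand-in for an annotation (not the gatenlp class)."""
    start: int
    end: int
    type: str


def with_type(anns: Iterable[Ann], *types: str) -> List[Ann]:
    """Model of the 1.0.5 semantics: keep only annotations whose type is listed.

    With no types given, nothing matches, so the result is empty.
    Before 1.0.5 the same call returned all annotations.
    """
    wanted = set(types)
    return [ann for ann in anns if ann.type in wanted]


anns = [Ann(0, 5, "Token"), Ann(6, 11, "Token"), Ann(0, 11, "Sentence")]
tokens = with_type(anns, "Token")  # the two Token annotations
nothing = with_type(anns)          # empty under the new semantics
```

Code that relied on the old behaviour to get a detached copy of all annotations needs to request that explicitly instead of calling with_type() without arguments.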

Other changes and improvements:

New method AnnotationSet.create_from(anniterable) to create a detached, immutable annotation set from an iterable of annotations
New method Document.anns(annspec) creates a detached set of all annotations that match the specification
New method Document.yield_anns(annspec) yields all annotations which match the specification
Fixed bug in Token Gazetteer: issue #93
Pampac: there is now a PampacAnnotator class to simplify using Pampac in a pipeline.
Pampac: New parameter containing_anns for Pampac.run: if specified, runs the rules on each span of each of the containing annotations
Pampac: a Result is now an Iterable of match infos.
Pampac: the .within(..), .contains(..) etc. constraints now allow using a separate annotation set, e.g. .within("Person", annset=doc.annset("Other")). See issue #57
Pampac: RemoveAnn action has been added
Pampac: UpdateAnnFeatures has been improved
Pampac: AddAnn action supports getter helpers in feature values
Span objects are now immutable. Equality and hashing of Span objects are based on their start and end offsets.
Annotation equality and hashing have been changed back to the Python default: two variables compare equal only if they reference the same object, and hashing is based on object identity. For comparing annotations by content, the methods ann.equal(other) (compare content without annotation id) and ann.same(other) (compare content including annotation id) have been implemented.
Documents can be saved in "tweet-v1" format
Fixed a problem with the HTML viewer: leading and multiple whitespace annotations now show correctly.
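The Span and Annotation equality changes above can be illustrated with a small sketch. SketchSpan and SketchAnnotation are hypothetical stand-ins written for this example, not the gatenlp classes; they only mirror the semantics described in the notes: value-based equality and hashing for immutable spans, identity-based equality and hashing for annotations, plus equal() and same() for content comparison.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class SketchSpan:
    """Hypothetical stand-in for a span: immutable, compared by offsets."""
    start: int
    end: int


class SketchAnnotation:
    """Hypothetical stand-in for an annotation.

    No __eq__ or __hash__ is defined, so Python's identity-based
    defaults apply, as described for annotations in 1.0.5.
    """

    def __init__(self, start, end, anntype, features=None, annid=0):
        self.start = start
        self.end = end
        self.type = anntype
        self.features = dict(features or {})
        self.id = annid

    def equal(self, other):
        """Compare content, ignoring the annotation id."""
        return (self.start, self.end, self.type, self.features) == (
            other.start, other.end, other.type, other.features)

    def same(self, other):
        """Compare content including the annotation id."""
        return self.equal(other) and self.id == other.id


s1, s2 = SketchSpan(0, 5), SketchSpan(0, 5)
spans_equal = s1 == s2 and hash(s1) == hash(s2)  # value-based

try:
    s1.start = 3  # frozen dataclass: assignment is rejected
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True

a1 = SketchAnnotation(0, 5, "Token", {"pos": "NN"}, annid=1)
a2 = SketchAnnotation(0, 5, "Token", {"pos": "NN"}, annid=2)
identity_equal = a1 == a2     # different objects, so not equal
content_equal = a1.equal(a2)  # same content when the id is ignored
fully_same = a1.same(a2)      # ids differ
```

One practical consequence: two annotations with identical content are distinct members of a Python set or distinct dict keys, while two spans with the same offsets collapse into one.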

1.0.4 (2021-04-10)

The GateWorkerAnnotator parameters have been changed: instead of parameters gatehome and port, the parameter gateworker now needs to receive a GateWorker instance. Also, the update_document parameter has been added and now allows both updating and replacing the Python document from the Java GATE document.
Issue #66: make it possible to show annotations over new-lines in the html ann viewer
Issue #65: provide ParagraphTokenizer and SplitPatternTokenizer to easily annotate paragraphs and other spans separated by some split pattern
Issue #73: pickle document with offset index created
Issue #68: rename the main development branch from "master" to "main"
Issue #74: fix a bug in PAMPAC related to matching an annotation after some text
Various improvements, additions and bug fixes in Pampac
Issue #75: GateWorker now shows any Java exception when starting the Java process fails
Issue #76: GateWorker has a new method loadPipelineFromUri(uri)
Issue #77: GateWorkerAnnotator now automatically loads a pipeline from a URL if the string passed to the pipeline parameter looks like a URL or if it is the result of urllib.parse.urlparse. It is always treated like a file if it is a pathlib.Path
added the Actions action for Pampac to recursively wrap several actions into one
allow each Rule to have any number of actions, change signature to Rule(pattern, *actions, priority=0)
The Pampac AddAnn action no longer requires a value for the name parameter; if it is not specified, the full span of the match is used.
New method add_anns(anniterable) to add an iterable of annotations to a set
The document viewer now also works in Google Colab
The GateWorker can now be used as context manager: with GateWorker() as gw:
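The URL-versus-file rule from issue #77 above can be sketched with the standard library alone. The function pipeline_kind is hypothetical and not part of gatenlp, and the note does not say what "looks like a URL" means; this sketch assumes it means a string whose scheme is http, https or file.

```python
from pathlib import Path
from urllib.parse import ParseResult, urlparse

# Assumption: the schemes that make a string "look like a URL".
URL_SCHEMES = ("http", "https", "file")


def pipeline_kind(pipeline):
    """Classify a pipeline argument as "url" or "file" (illustrative only)."""
    if isinstance(pipeline, Path):
        # A pathlib.Path is always treated as a file.
        return "file"
    if isinstance(pipeline, ParseResult):
        # The result of urllib.parse.urlparse is always treated as a URL.
        return "url"
    if isinstance(pipeline, str):
        scheme = urlparse(pipeline).scheme
        return "url" if scheme in URL_SCHEMES else "file"
    raise TypeError(f"unsupported pipeline argument: {type(pipeline).__name__}")


kinds = [
    pipeline_kind("https://example.org/app.xgapp"),
    pipeline_kind("app.xgapp"),
    pipeline_kind(Path("app.xgapp")),
    pipeline_kind(urlparse("https://example.org/app.xgapp")),
]
```

Passing a pathlib.Path is the unambiguous way to force file handling when a local path could be mistaken for a URL.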

1.0.3.1 (2021-03-01)

add training course slides
fix issue #63: could not import html document from a local file

1.0.3 (2021-02-22)

Fix issues with logging and error handling in executor module
Improve/add/change document sources/destination JsonLinesFile
add Span.embed method
Implement multi-word tokens (MWTs) for the Stanza annotator
Add support for space tokens for the Stanza annotator
Support showing annotations over trailing spaces in the html ann viewer
Add the Document.attach(annset) method (mostly for internal use only!)
Add the ConllUFileSource to import CoNLL-U corpora
Fix a problem in the html ann viewer where unnecessary spans were created
Add option to the Document.show() method to style the document text div

1.0.2 (2021-02-09)

Fix issue #56: Rename GateSlave to GateWorker

1.0.1 (2021-02-07)

Initial release
