Versions and changes
(upcoming)
- The GateWorkerAnnotator parameters have been changed: instead of the parameters gatehome and port,
the parameter gateworker now needs to receive a GateWorker instance.
Also, the update_document parameter has been added and now allows both updating and replacing the Python document from the Java GATE document
- Issue #66: make it possible to show annotations over new-lines in the html ann viewer
- Issue #65: provide ParagraphTokenizer and SplitPatternTokenizer to easily annotate paragraphs and other spans separated by some split pattern
- Issue #73: pickle document with offset index created
- Issue #68: rename the main development branch from "master" to "main"
- Issue #74: fix a bug in PAMPAC related to matching an annotation after some text
- Various improvements, additions and bug fixes in Pampac
- Issue #75: GateWorker now shows any Java exception when starting the Java process fails
- Issue #76: GateWorker has a new method loadPipelineFromUri(uri)
- Issue #77: GateWorkerAnnotator now automatically loads a pipeline from a URL if the string
passed to the pipeline parameter looks like a URL or if it is the result of urllib.parse.urlparse. It is always treated like a file if it is a pathlib.Path
- added the Actions action for Pampac to recursively wrap several actions into one
- allow each Rule to have any number of actions, change signature to Rule(pattern, *actions, priority=0)
- The Pampac AddAnn action no longer requires a value for the name parameter: if it is not specified, the full span of the match is used.
- New method add_anns(anniterable) to add an iterable of annotations to a set
- The document viewer now also works in Google Colab
- The GateWorker can now be used as a context manager: with GateWorker() as gw:
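
The URL-versus-file handling described for Issue #77 above can be illustrated with a small standalone sketch. It is not the GateWorkerAnnotator implementation: the helper name classify_pipeline is hypothetical, and the rule used for "looks like a URL" (the string parses with both a scheme and a network location) is an assumption made for illustration only.

```python
from pathlib import Path
from urllib.parse import ParseResult, urlparse


def classify_pipeline(pipeline):
    """Return "url" or "file" for a pipeline argument (hypothetical helper)."""
    # A pathlib.Path is always treated as a local file.
    if isinstance(pipeline, Path):
        return "file"
    # The result of urllib.parse.urlparse is treated as a URL.
    if isinstance(pipeline, ParseResult):
        return "url"
    # Assumption: a string "looks like a URL" if it has both a scheme
    # and a network location once parsed.
    parsed = urlparse(str(pipeline))
    if parsed.scheme and parsed.netloc:
        return "url"
    return "file"


print(classify_pipeline("https://example.com/app.xgapp"))
print(classify_pipeline(Path("pipelines/app.xgapp")))
```

Under this assumed rule a Windows path such as C:\pipelines\app.xgapp is classified as a file, because it has no network location. A file: URI without a host would also be classified as a file, which is a limitation of the sketch rather than a statement about the library's behavior.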
1.0.3.1 (2021-03-01)
- add training course slides
- fix issue #63: could not import html document from a local file
1.0.3 (2021-02-22)
- Fix issues with logging and error handling in executor module
- Improve/add/change document sources/destination JsonLinesFile
- add Span.embed method
- Implement multi-word tokens (MWTs) for the Stanza annotator
- Add support for space tokens for the Stanza annotator
- Support showing annotations over trailing spaces in the html ann viewer
- Add the Document.attach(annset) method (mostly for internal use only!)
- Add the ConllUFileSource to import CoNLL-U corpora
- Fix a problem in the html ann viewer where unnecessary spans were created
- Add option to the Document.show() method to style the document text div
1.0.2 (2021-02-09)
- Fix issue #56: Rename GateSlave to GateWorker
1.0.1 (2021-02-07)
- Initial release