Package opennlp.tools.formats.masc
Class MascDocument
java.lang.Object
opennlp.tools.formats.masc.MascDocument
Constructor Summary
Constructors
Method Summary
Modifier and Type | Method | Description
boolean | hasNamedEntities() | Checks whether there is NER by GATE-5.0 ANNIE.
boolean | hasPennTags() | Checks whether there is Penn tagging produced by GATE-5.0 ANNIE.
static MascDocument | parseDocument(String path, InputStream f_primary, InputStream f_seg, InputStream f_penn, InputStream f_s, InputStream f_ne) | Initializes a MascDocument with all the stand-off annotations translated into the internal structure.
 | read() | Retrieves the next sentence, or null if the end of the document has been reached.
void | reset() | Resets the reading of sentences to the beginning of the document.
Constructor Details
MascDocument
Method Details
parseDocument
public static MascDocument parseDocument(String path, InputStream f_primary, InputStream f_seg, InputStream f_penn, InputStream f_s, InputStream f_ne) throws IOException
Initializes a MascDocument with all the stand-off annotations translated into the internal structure.
Parameters:
path - The path where the document header is located.
f_primary - The file with the raw corpus text.
f_seg - The file with the segmentation into quarks.
f_penn - The file with tokenization and Penn POS tags produced by the GATE-5.0 ANNIE application.
f_s - The file with sentence boundaries.
f_ne - The file with named entities.
Returns:
A document containing the text and its annotations. Immutability is not yet guaranteed.
Throws:
IOException - if the raw data cannot be read or if aligning the raw data with its annotations fails.
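In a stand-off scheme the annotations live in separate files and point back into the raw text, so translating them into an internal structure largely means resolving each anchor against the primary text and rejecting anchors that do not fit. The following is a minimal, self-contained sketch of that alignment step only, assuming each annotation arrives as a simple (start, end, label) triple of character offsets; the class `StandOffAlignmentSketch` and the records `Region` and `Aligned` are hypothetical and do not reproduce how OpenNLP actually parses the MASC files. Mirroring the documented behavior of `parseDocument`, an alignment failure is reported as an `IOException`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class StandOffAlignmentSketch {

    // Hypothetical stand-off annotation: a label anchored to [start, end) in the raw text.
    record Region(int start, int end, String label) {}

    // Hypothetical aligned result: the covered slice of raw text plus its label.
    record Aligned(String text, String label) {}

    // Resolves every region against the raw text; fails if any offset pair does not fit.
    static List<Aligned> align(String rawText, List<Region> regions) throws IOException {
        List<Aligned> result = new ArrayList<>();
        for (Region r : regions) {
            if (r.start() < 0 || r.start() > r.end() || r.end() > rawText.length()) {
                throw new IOException("Cannot align " + r + " with text of length " + rawText.length());
            }
            result.add(new Aligned(rawText.substring(r.start(), r.end()), r.label()));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String raw = "John lives in Boston.";
        // Illustrative Penn-style tags anchored by character offsets into 'raw'.
        List<Region> penn = List.of(
                new Region(0, 4, "NNP"),
                new Region(5, 10, "VBZ"),
                new Region(11, 13, "IN"),
                new Region(14, 20, "NNP"),
                new Region(20, 21, "."));
        for (Aligned a : align(raw, penn)) {
            System.out.println(a.text() + "/" + a.label()); // e.g. "John/NNP"
        }
    }
}
```

Failing fast on the first bad anchor matches the documented contract, where a misalignment surfaces as an exception rather than as a partially annotated document.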
hasPennTags
public boolean hasPennTags()
Checks whether there is Penn tagging produced by GATE-5.0 ANNIE.
Returns:
true if this file has aligned tags/tokens, false otherwise.
hasNamedEntities
public boolean hasNamedEntities()
Checks whether there are named entities recognized (NER) by GATE-5.0 ANNIE.
Returns:
true if this file has named entities, false otherwise.
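Since a document may carry only some annotation layers, a consumer would typically check these two flags before deriving training data from it. The sketch below assumes a hypothetical `AnnotationLayers` interface that mirrors only the two documented boolean methods; `LayerCheckSketch`, the `Task` enum, and `usableFor` are illustrative names and are not part of OpenNLP.

```java
import java.util.EnumSet;
import java.util.Set;

public class LayerCheckSketch {

    // Hypothetical stand-in that mirrors the two documented capability checks.
    interface AnnotationLayers {
        boolean hasPennTags();
        boolean hasNamedEntities();
    }

    // Illustrative downstream tasks, each depending on one annotation layer.
    enum Task { TOKENIZE_AND_TAG, NAME_FIND }

    // Returns the tasks a document can feed, given the layers it actually carries.
    static Set<Task> usableFor(AnnotationLayers doc) {
        Set<Task> tasks = EnumSet.noneOf(Task.class);
        if (doc.hasPennTags()) {
            tasks.add(Task.TOKENIZE_AND_TAG);
        }
        if (doc.hasNamedEntities()) {
            tasks.add(Task.NAME_FIND);
        }
        return tasks;
    }

    public static void main(String[] args) {
        // A document that has Penn tags but no named-entity layer.
        AnnotationLayers tagsOnly = new AnnotationLayers() {
            public boolean hasPennTags() { return true; }
            public boolean hasNamedEntities() { return false; }
        };
        System.out.println(usableFor(tagsOnly)); // prints [TOKENIZE_AND_TAG]
    }
}
```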
read
Returns:
The next sentence, or null if the end of the document has been reached.
reset
public void reset()
Resets the reading of sentences to the beginning of the document.
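Taken together, read() and reset() describe a rewindable, forward-only cursor over the document's sentences: read() hands out one sentence per call and signals the end with null, and reset() moves the cursor back so the document can be traversed again, for example once per training pass. The sketch below reproduces only that contract with a hypothetical `SentenceCursor` class; plain strings stand in for whatever sentence type the real read() returns.

```java
import java.util.ArrayList;
import java.util.List;

public class SentenceCursor {

    private final List<String> sentences;
    private int next = 0; // index of the sentence the next read() will return

    public SentenceCursor(List<String> sentences) {
        this.sentences = List.copyOf(sentences);
    }

    // Returns the next sentence, or null once the end of the document is reached.
    public String read() {
        return next < sentences.size() ? sentences.get(next++) : null;
    }

    // Rewinds so that the following read() returns the first sentence again.
    public void reset() {
        next = 0;
    }

    public static void main(String[] args) {
        SentenceCursor doc = new SentenceCursor(List.of("First sentence.", "Second sentence."));
        for (int pass = 1; pass <= 2; pass++) {
            List<String> seen = new ArrayList<>();
            String s;
            // Standard null-terminated read loop.
            while ((s = doc.read()) != null) {
                seen.add(s);
            }
            System.out.println("pass " + pass + ": " + seen);
            doc.reset(); // without this, the second pass would see no sentences
        }
    }
}
```

Using null as the end marker keeps the read loop to a single condition, at the cost of the caller having to remember the reset() call between passes.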