module__org.bibliome.alvisnlp.modules.treetagger.TreeTaggerReader

Robert Bossy edited this page Jul 27, 2017 · 1 revision

# org.bibliome.alvisnlp.modules.treetagger.TreeTaggerReader

Synopsis

Reads files in tree-tagger output format and creates one document for each file read.

Description

Each document contains a single section named sectionName. Its contents are built by concatenating the first column (the token) of each input line, with tokens separated by a space character.

org.bibliome.alvisnlp.modules.treetagger.TreeTaggerReader keeps the tree-tagger tokenization as annotations added to the layer wordLayerName. Each annotation records the POS tag in the feature named by posFeatureKey and the lemma in the feature named by lemmaFeatureKey.

The document identifier is the path of the corresponding file.
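
The construction described above can be illustrated with a minimal Python sketch. It is not the module's actual implementation (AlvisNLP/ML is written in Java). It assumes the usual tree-tagger layout of one token per line with tab-separated token, POS tag and lemma columns. The `Word` class, the `read_treetagger` function and the default feature names `pos` and `lemma` are hypothetical, chosen only for illustration. Sentence annotations are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    """Hypothetical stand-in for an annotation in the word layer."""
    start: int
    end: int
    features: dict = field(default_factory=dict)


def read_treetagger(text, pos_key="pos", lemma_key="lemma"):
    """Return (section contents, word annotations) for tree-tagger output.

    Assumption: each non-empty line is "token<TAB>POS<TAB>lemma".
    The feature names are illustrative, not documented defaults.
    """
    forms = []
    words = []
    offset = 0
    for line in text.splitlines():
        cols = line.split("\t")
        if len(cols) < 3:
            continue  # assumption: blank or malformed lines are skipped
        form, pos, lemma = cols[0], cols[1], cols[2]
        if forms:
            offset += 1  # account for the space that separates tokens
        words.append(Word(offset, offset + len(form),
                          {pos_key: pos, lemma_key: lemma}))
        forms.append(form)
        offset += len(form)
    # Section contents: first column of every token, joined by a space.
    return " ".join(forms), words


sample = "The\tDT\tthe\ncat\tNN\tcat\nsleeps\tVBZ\tsleep\n.\tSENT\t.\n"
contents, words = read_treetagger(sample)
```

In this sketch each annotation's offsets index into the concatenated contents, so `contents[w.start:w.end]` gives back the original token for every word `w`.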

Parameters

sectionName

Optional

Type: String

Name of the section of each document.

sourcePath

Optional

Type: SourceStream

Path to the source directory or source file.

constantAnnotationFeatures

Optional

Constant features to add to each annotation created by this module.

constantDocumentFeatures

Optional

Constant features to add to each document created by this module.

constantSectionFeatures

Optional

Constant features to add to each section created by this module.

lemmaFeatureKey

Optional

Type: String

Name of the feature in which to store word lemmas.

posFeatureKey

Optional

Type: String

Name of the feature in which to store word POS tags.

charset

Default value: UTF-8

Type: String

Character set of input files.

sentenceLayerName

Default value: sentences

Type: String

Name of the layer in which to store sentence annotations.

wordLayerName

Default value: words

Type: String

Name of the layer in which to store word annotations.
