
module__org.bibliome.alvisnlp.modules.treetagger.TreeTaggerReader

Robert Bossy edited this page Jul 27, 2017 · 1 revision

# org.bibliome.alvisnlp.modules.treetagger.TreeTaggerReader

Synopsis

Reads files in tree-tagger output format and creates one document for each file read.

Description

Each document contains a single section named sectionName. The section contents are built by concatenating the first column of each token line, with a single space character between consecutive tokens.

org.bibliome.alvisnlp.modules.treetagger.TreeTaggerReader keeps the tree-tagger tokenization as annotations added to the layer wordLayerName. The POS tag and the lemma of each token are recorded in the annotation features named by posFeatureKey and lemmaFeatureKey, respectively.

The document identifier is the path of the corresponding file.
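
The construction described above can be illustrated with a short Python sketch. It is not the module's actual implementation, only an approximation under stated assumptions: input lines are tab-separated with the word form, POS tag and lemma in that order, blank lines are skipped, and the names `read_treetagger`, `Word`, and the default feature keys `"pos"` and `"lemma"` are hypothetical placeholders for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    """Hypothetical stand-in for a word annotation: character offsets plus features."""
    start: int
    end: int
    features: dict = field(default_factory=dict)


def read_treetagger(text, pos_key="pos", lemma_key="lemma"):
    """Return (section_contents, words) built from tree-tagger output text.

    Assumes one token per line, columns separated by tabs:
    word form, POS tag, lemma. The key names are illustrative defaults.
    """
    forms = []
    words = []
    offset = 0
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines carry no token
        cols = line.split("\t")
        form = cols[0]
        pos = cols[1] if len(cols) > 1 else ""
        lemma = cols[2] if len(cols) > 2 else ""
        if forms:
            offset += 1  # account for the space that separates tokens
        words.append(Word(offset, offset + len(form), {pos_key: pos, lemma_key: lemma}))
        forms.append(form)
        offset += len(form)
    return " ".join(forms), words


sample = "The\tDT\tthe\ncats\tNNS\tcat\nsleep\tVBP\tsleep\n"
contents, words = read_treetagger(sample)
```

In this sketch each `Word` spans exactly its form inside the concatenated contents, so `contents[w.start:w.end]` gives back the first column of the corresponding line.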

Parameters

Optional

Type: String

Name of the section of each document.

Optional

Type: SourceStream

Path to the source directory or source file.

Optional

Type: Mapping

Constant features to add to each annotation created by this module.

Optional

Type: Mapping

Constant features to add to each document created by this module.

Optional

Type: Mapping

Constant features to add to each section created by this module.

Optional

Type: String

Name of the feature in which to store word lemmas.

Optional

Type: String

Name of the feature in which to store word POS tags.

Default value: UTF-8

Type: String

Character set of input files.

Default value: sentences

Type: String

Name of the layer in which to store sentence annotations.

Default value: words

Type: String

Name of the layer in which to store word annotations.
