
module__TyDIProjector

Robert Bossy edited this page Jul 27, 2017 · 1 revision

org.bibliome.alvisnlp.modules.projectors.TyDIProjector

Synopsis

Projects terms from a TyDI export.

This module is obsolete; it is superseded by org.bibliome.alvisnlp.modules.trie.TyDIExportProjector.

Description

org.bibliome.alvisnlp.modules.projectors.TyDIProjector reads the files of a TyDI text export, resolves all synonymy relations, and projects the terms onto sections.
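
As an illustration of what resolving synonymies can mean, here is a minimal sketch in Python. It assumes that the relations read from the export files have already been merged into a single variant-to-preferred-term mapping; the function name `resolve_canonical`, the mapping layout and the sample terms are hypothetical and do not reflect the actual TyDI file format or the module's implementation.

```python
def resolve_canonical(links):
    """Map every term to a canonical form by following variant -> preferred links."""
    canonical = {}
    for term in links:
        seen = set()
        current = term
        # Follow the chain until a term has no outgoing link; 'seen' guards against cycles.
        while current in links and current not in seen:
            seen.add(current)
            current = links[current]
        canonical[term] = current
    return canonical


# Hypothetical relations, as if merged from the different export files.
links = {
    "B. subtilis": "Bacillus subtilis",        # abbreviated variant
    "bacillus subtilis": "Bacillus subtilis",  # typographic variation
    "Bacillus subtilis": "hay bacillus",       # synonym pointing at a preferred term
}
canonical = resolve_canonical(links)
```

With this sketch, every variant in a chain ends up associated with the same final term, which is the value a projector could then store as the canonical form.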

The parameters lemmaFile, synonymsFile, quasiSynonymsFile, acronymsFile and typographicVariationsFile specify the paths to the corresponding TyDI export files.

The parameters normalizeSpace, ignoreCase, ignoreDiacritics and ignoreWhitespace control how entries are matched against the section text.
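
The effect of these four options can be pictured as a normalization applied to both dictionary entries and text before comparison. The sketch below is an assumption about their semantics, not the module's code: the function `normalize_key` and its keyword arguments are hypothetical names that mirror the parameters, and the exact whitespace handling (for example, trimming) is a guess.

```python
import re
import unicodedata


def normalize_key(text, normalize_space=False, ignore_case=False,
                  ignore_diacritics=False, ignore_whitespace=False):
    """Return a comparison key for text under the selected matching options."""
    if ignore_diacritics:
        # Decompose accented characters, then drop the combining marks.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(c for c in decomposed if not unicodedata.combining(c))
    if ignore_case:
        text = text.casefold()
    if ignore_whitespace:
        # Remove every whitespace character.
        text = re.sub(r"\s+", "", text)
    elif normalize_space:
        # Collapse whitespace runs to one space and trim the ends (assumed behaviour).
        text = re.sub(r"\s+", " ", text).strip()
    return text
```

Two strings then match when their keys are equal, for example `normalize_key("Café", ignore_case=True, ignore_diacritics=True)` and `normalize_key("cafe")`.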

The subject parameter specifies which text of the section should be matched. There are two options:

  • the entries are matched on the contents of the section; in this case subject can also control whether match boundaries must coincide with word delimiters;
  • the entries are matched on the feature values of the annotations of a given layer, joined by whitespace; in this way entries can be searched against word lemmas, for instance.
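
The two subject options can be sketched as follows. This is only an illustration under stated assumptions: the function names, the dictionary-based annotation layout and the definition of a word delimiter (any non-word character in the sense of Python's `\w`) are hypothetical, and the real module may differ.

```python
import re


def match_in_contents(contents, entry, word_boundaries=True):
    """Return (start, end) character spans of entry in the section contents."""
    pattern = re.escape(entry)
    if word_boundaries:
        # Reject matches that start or end in the middle of a word.
        pattern = r"(?<!\w)" + pattern + r"(?!\w)"
    return [m.span() for m in re.finditer(pattern, contents)]


def match_on_feature(annotations, feature, entry):
    """Match entry against feature values joined by single spaces.

    Returns (first, last + 1) index ranges into the annotation list.
    """
    values = [a[feature] for a in annotations]
    target = entry.split(" ")
    n = len(target)
    # Comparing value lists is equivalent to matching the space-joined string
    # with matches aligned on annotation boundaries.
    return [(i, i + n) for i in range(len(values) - n + 1)
            if values[i:i + n] == target]


# Hypothetical word annotations carrying a 'lemma' feature.
words = [{"form": "cells", "lemma": "cell"}, {"form": "walls", "lemma": "wall"}]
```

Matching on lemmas lets the entry "cell wall" be found in the text "cells walls", which matching on the raw contents would miss.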

org.bibliome.alvisnlp.modules.projectors.TyDIProjector creates an annotation for each matched entry and adds these annotations to the layer named targetLayerName. The created annotations will have a feature named canonicalFormFeature containing the canonical form of the matched term. In addition, the created annotations will have the feature keys and values defined in constantAnnotationFeatures.
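
A rough picture of the resulting annotations, written as a Python sketch rather than with the AlvisNLP data model: `make_annotation`, the dictionary layout, the layer name "terms" and the constant feature "source" are hypothetical. The default feature name "lemma" follows the default documented in the parameter list; letting the canonical form take precedence over a constant feature of the same name is an assumption.

```python
def make_annotation(start, end, canonical_form,
                    canonical_form_feature="lemma", constant_features=None):
    """Build one match annotation carrying constant features and the canonical form."""
    features = dict(constant_features or {})
    # Assumption: the canonical form overrides a constant feature with the same key.
    features[canonical_form_feature] = canonical_form
    return {"start": start, "end": end, "features": features}


# Hypothetical target layer and constant feature.
layers = {}
ann = make_annotation(2, 19, "Bacillus subtilis",
                      constant_features={"source": "tydi"})
layers.setdefault("terms", []).append(ann)
```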

Parameters

Optional

Type: SourceStream

Path to the file containing lemmas.

Optional

Type: SourceStream

Path to the merged terms file.

Optional

Type: SourceStream

Path to the quasi-synonyms file.

Optional

Type: SourceStream

Path to the synonyms file.

Optional

Type: String

Name of the layer in which to store match annotations.

Optional

Type: SourceStream

Path to the acronyms file.

Optional

Type: Mapping

Constant features to add to each annotation created by this module.

Optional

Type: TargetStream

Path of the file where to save the dictionary.

Optional

Type: SourceStream

Path to the typographic variations file.

Default value: lemma

Type: String

Feature where to store the term canonical form.

Default value: true

Type: Expression

Only process documents that satisfy this filter.

Default value: false

Type: Boolean

Whether to stop when a duplicate entry is seen.

Default value: false

Type: Boolean

Match ignoring case.

Default value: false

Type: Boolean

Match ignoring diacritics.

Default value: false

Type: Boolean

Match ignoring whitespace characters.

Default value: add

Type: MultipleValueAction

What to do when multiple entries with the same key are seen.

Default value: false

Type: Boolean

Match normalizing whitespace.

Default value: true

Type: Expression

Process only sections that satisfy this filter.

Default value: org.bibliome.alvisnlp.modules.projectors.ContentsSubject

Type: Subject

Subject on which to project the dictionary.
