Software and data accompanying paper Neural Networks for Featureless Named Entity Recognition in Czech

strakova/ner_tsd2016

This repository contains the source code and data used in the paper Neural Networks for Featureless Named Entity Recognition in Czech (Straková, Straka and Hajič, TSD 2016).

The repository contains:

  • training scripts (Perl pipeline and NN implemented in Lua using Torch)
  • all versions of CNEC corpus (CNEC 1.0, CNEC 1.1, CNEC 1.1 Konkol's Extended, CNEC 2.0, CNEC 2.0 Konkol's Extended)
  • (the English NER CoNLL-2003 corpus is not included because of licensing issues; copy it to data/CoNLL2003_English/ yourself)
  • scripts used to generate Czech and English word embeddings
  • the gazetteers for Czech and English
  • various preprocessing tools

To run the pipeline:

  1. Compute the word embeddings using the scripts in the word-embeddings/ directory. In addition to downloading the data, you will need the Czech and English POS tagger and lemmatizer models, czech-morfflex-pdt-131112 and english-morphium-wsj-140407.
  2. Preprocess the NER corpus you wish to use with the utils/make_data.sh script. This script also needs the POS tagger and lemmatizer models named above. Note that it uses hardcoded paths to the models, so adjust them to your setup.
  3. Start the training by running src/train_all.sh. By default, the script trains on all NER corpora in all configurations, so restrict it to the ones you are interested in. Note that the src/precompute_data.sh script uses hardcoded paths to the word embeddings.
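
Because the steps above depend on files that are either not distributed (the English corpus) or referenced through hardcoded paths, a small pre-flight check can save a failed run. The following is only a sketch: the paths are the ones named in this README, while the helper names `check_layout` and `find_hardcoded_paths` and the `embedding` search pattern are assumptions made for illustration and are not part of the repository.

```shell
#!/bin/sh
# Hypothetical helpers; only the paths below are taken from the README.

# Print every expected path that is missing under the repository root ($1).
# Returns 0 if all required paths exist, 1 otherwise.
check_layout() {
    root=${1:-.}
    missing=0
    for path in word-embeddings utils/make_data.sh \
                src/train_all.sh src/precompute_data.sh; do
        if [ ! -e "$root/$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done
    # The English corpus is not shipped with the repository, so its
    # absence is reported as a note rather than an error.
    if [ ! -d "$root/data/CoNLL2003_English" ]; then
        echo "note: data/CoNLL2003_English/ not found (needed only for English)"
    fi
    return "$missing"
}

# List lines in the two scripts that mention the tagger models or the word
# "embedding" (an assumed pattern), as a starting point for locating the
# hardcoded paths that need adjusting.
find_hardcoded_paths() {
    root=${1:-.}
    grep -n -E 'czech-morfflex|english-morphium|embedding' \
        "$root/utils/make_data.sh" "$root/src/precompute_data.sh" 2>/dev/null
}
```

After sourcing the file, running `check_layout .` and then `find_hardcoded_paths .` from the repository root should show what is missing and which script lines are worth editing before starting src/train_all.sh.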

If you find the software useful, please cite the paper:

@Inbook{Strakova2016,
  author="Strakov{\'a}, Jana and Straka, Milan and Haji{\v{c}}, Jan",
  editor="Sojka, Petr and Hor{\'a}k, Ale{\v{s}} and Kope{\v{c}}ek, Ivan and Pala, Karel",
  title="Neural Networks for Featureless Named Entity Recognition in Czech",
  bookTitle="Text, Speech, and Dialogue: 19th International Conference, TSD 2016, Brno, Czech Republic, September 12-16, 2016, Proceedings",
  year="2016",
  publisher="Springer International Publishing",
  address="Cham",
  pages="173--181",
  isbn="978-3-319-45510-5",
  doi="10.1007/978-3-319-45510-5_20",
  url="http://dx.doi.org/10.1007/978-3-319-45510-5_20"
}
