German parliamentary debates, manually annotated for subjective expressions and their opinion roles (ORL)

umanlp/GePaDe-ORL

GePaDe-ORL

This repository contains the data and supplementary materials for our ParlaCLARIN-2024 paper.

Annotation example

The repository contains the GePaDe-ORL corpus, with manual annotations of subjective expressions and their opinion holders and targets.

Data

The data is available in JSON format.

The JSON dictionary contains the annotations for 13,222 sentences/clauses with 3,322 subjective expressions. Each sentence entry provides the list of tokens (word forms), the list of lemmas (automatically predicted using spaCy), and an annotation dictionary. The annotation dictionary encodes whether the sentence includes a subjective expression and, if so, the token position of the subjective expression, its view (Agent, Patient or Speaker view), and a list with one role label per sentence token.

Example:

   "20003_Zusatzpunkt_2_FDP_Brandenburg_ID20306600_18.11.2021-5": {
      "words": [
         "Sie",
         "litten",
         "oftmals",
         "unter",
         "sozialer",
         "Isolation",
         "und",
         "unter",
         "Bewegungsmangel",
         "."
      ],
      "lemmas": [
         "sie",
         "leiden",
         "oftmals",
         "unter",
         "sozial",
         "Isolation",
         "und",
         "unter",
         "Bewegungsmangel",
         "--"
      ],
      "annotations": {
         "1": {
            "predicate": "SE-A",
            "roles": [
               "B-Holder",
               "B-V",
               "_",
               "B-Target",
               "I-Target",
               "I-Target",
               "I-Target",
               "I-Target",
               "I-Target",
               "_"
            ]
         }
      }
   }

The example above encodes a sentence in which "leiden" (to suffer) triggers a subjective expression with Agent view. In Agent view, the agent of the sentence is the opinion holder and the patient encodes the target of the opinion. The key of the "annotations" dictionary points to the token at position "1" (the verb "leiden"; positions are counted from 0), and the role list states for each sentence token whether it fills a role for the respective subjective expression. We use the BIO scheme: "B-" marks the first token of a role, "I-" marks the remaining tokens of a multiword role, and "_" marks tokens that fill no role. "B-V" marks the position of the subjective expression.
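As a minimal sketch of how such a role list can be decoded, the Python snippet below groups the BIO tags of the example into labelled token spans. The function name `decode_roles` is hypothetical and not part of this repository, and the words and roles are copied inline from the example rather than read from the data file.

```python
def decode_roles(words, roles):
    """Group BIO role tags into (label, start, end, text) spans; end is exclusive."""
    spans = []
    label, start = None, None
    for i, tag in enumerate(roles):
        if tag.startswith("I-") and label == tag[2:]:
            continue  # token continues the currently open span
        if label is not None:
            # any other tag closes the open span
            spans.append((label, start, i, " ".join(words[start:i])))
            label, start = None, None
        if tag != "_":
            # "B-X" opens a new span; a stray "I-X" is treated the same way
            label, start = tag[2:], i
    if label is not None:
        spans.append((label, start, len(roles), " ".join(words[start:])))
    return spans


# Words and roles copied from the example entry above.
words = ["Sie", "litten", "oftmals", "unter", "sozialer", "Isolation",
         "und", "unter", "Bewegungsmangel", "."]
roles = ["B-Holder", "B-V", "_", "B-Target", "I-Target", "I-Target",
         "I-Target", "I-Target", "I-Target", "_"]

spans = decode_roles(words, roles)
for span in spans:
    print(span)
```

For the example sentence this yields a Holder span ("Sie"), the subjective expression itself ("litten", label V), and a Target span covering "unter sozialer Isolation und unter Bewegungsmangel".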

The table below shows the distribution of views and labels in the corpus. The annotation guidelines (in German) can be found here. Examples for the additional labels (Effect, Other) are included in the paper.

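|             | Agent | Patient | Speaker | Total |
|-------------|-------|---------|---------|-------|
| SE          | 2,325 | 138     | 859     | 3,322 |
| Roles (all) | 4,594 | 278     | 1,503   | 6,375 |
| Target      | 2,422 | 109     | 752     | 3,283 |
| Holder      | 1,998 | 116     | 12      | 2,126 |
| Other       | 1     | 0       | 643     | 644   |
| PTC         | 142   | 4       | 53      | 199   |
| SVC         | 31    | 5       | 38      | 74    |
| Effect      | 0     | 44      | 5       | 49    |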
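
Counts of this kind can be derived from the JSON dictionary by tallying predicate labels and role-initial tags. The sketch below is illustrative only: the function name `count_labels` is hypothetical, the second entry in the toy corpus is invented, and the code assumes that sentences without a subjective expression carry an empty "annotations" dictionary, which may not match the actual encoding. It is not the script used to produce the table.

```python
from collections import Counter


def count_labels(corpus):
    """Count subjective expressions per predicate label and roles per role label."""
    se_counts = Counter()
    role_counts = Counter()
    for entry in corpus.values():
        annotations = entry.get("annotations") or {}
        if not isinstance(annotations, dict):
            continue  # assumption: anything else means "no subjective expression"
        for ann in annotations.values():
            if not isinstance(ann, dict):
                continue
            se_counts[ann.get("predicate")] += 1
            for tag in ann.get("roles", []):
                # count each role once, at its first token; skip the predicate itself
                if tag.startswith("B-") and tag != "B-V":
                    role_counts[tag[2:]] += 1
    return se_counts, role_counts


# Toy corpus: the first entry is the example from this README (lemmas omitted);
# the second entry is hypothetical and only illustrates a sentence without annotations.
corpus = {
    "20003_Zusatzpunkt_2_FDP_Brandenburg_ID20306600_18.11.2021-5": {
        "words": ["Sie", "litten", "oftmals", "unter", "sozialer", "Isolation",
                  "und", "unter", "Bewegungsmangel", "."],
        "annotations": {
            "1": {
                "predicate": "SE-A",
                "roles": ["B-Holder", "B-V", "_", "B-Target", "I-Target",
                          "I-Target", "I-Target", "I-Target", "I-Target", "_"],
            }
        },
    },
    "hypothetical-sentence-without-subjective-expression": {
        "words": ["Danke", "."],
        "annotations": {},
    },
}

se_counts, role_counts = count_labels(corpus)
print(se_counts)
print(role_counts)
```

To run this on the full corpus, replace the toy dictionary with the result of `json.load` on the data file from this repository.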

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Referencing

If you're using this data, please cite the following paper:

@InProceedings{rehbein-ponzetto-2024-gepade_orl,
  author    = {Rehbein, Ines and Ponzetto, Simone Paolo},
  title     = {A New Resource and Baselines for Opinion Role Labelling in German Parliamentary Debates},
  booktitle = {Proceedings of the ParlaCLARIN IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora},
  month     = {May},
  year      = {2024},
  address   = {Torino, Italia},
  publisher = {Association for Computational Linguistics},
  url       = {http://www.aclweb.org/anthology/}
}
