Text Corpus for Disease Names and Adverse Effects
This page provides the corpus associated with the following publication:
Harsha Gurulingappa, Roman Klinger, Martin Hofmann-Apitius, and Juliane Fluck. An Empirical Evaluation of Resources for the Identification of Diseases and Adverse Effects in Biomedical Literature. In 2nd Workshop on Building and Evaluating Resources for Biomedical Text Mining (7th edition of the Language Resources and Evaluation Conference), Valletta, Malta, May 2010.
Annotated entity classes:
- DISEASE (for diseases)
- ADVERSE (for adverse effects)
Each entry starts with ### followed by its PMID.
The columns:
- Token
- Start Index
- End Index
- Full untokenized Entities
- Class (B-class|I-class|O)
  - B- means: Beginning of an entity
  - I- means: Continuation of an entity
  - O means: None of the defined entities
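
The format described above can be read with a short script. The following is a minimal sketch, not an official loader. It assumes that the five columns are separated by tabs and that the entity column is empty for O tokens; neither detail is stated on this page. The function names parse_corpus and entity_spans, the sample PMID, and the sample rows are all made up for illustration and are not taken from the corpus.

```python
from typing import Dict, Iterable, List, Tuple

# One row: token, start index, end index, full untokenized entity, class tag
Row = Tuple[str, int, int, str, str]


def parse_corpus(lines: Iterable[str], sep: str = "\t") -> Dict[str, List[Row]]:
    """Group token rows by PMID. The tab separator is an assumption."""
    entries: Dict[str, List[Row]] = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("###"):
            # Entry header: ### followed by the PMID
            current = line[3:].strip()
            entries[current] = []
            continue
        if current is None:
            continue  # ignore anything before the first header
        token, start, end, entity, tag = line.split(sep)
        entries[current].append((token, int(start), int(end), entity, tag))
    return entries


def entity_spans(rows: List[Row]) -> List[Tuple[str, int, int, str]]:
    """Collapse B-/I- tags into (class, start, end, text) spans."""
    spans: List[Tuple[str, int, int, str]] = []
    current = None  # [class, start, end, tokens] of the open entity

    def close() -> None:
        nonlocal current
        if current is not None:
            cls, start, end, tokens = current
            # Joining tokens with a space only approximates the original text
            spans.append((cls, start, end, " ".join(tokens)))
            current = None

    for token, start, end, _entity, tag in rows:
        if tag.startswith("B-"):
            close()
            current = [tag[2:], start, end, [token]]
        elif tag.startswith("I-") and current is not None and current[0] == tag[2:]:
            current[2] = end
            current[3].append(token)
        else:
            close()  # an O tag (or a stray I- tag) ends the open entity
    close()
    return spans


# Made-up sample in the assumed layout (not real corpus data)
SAMPLE = (
    "###12345678\n"
    "Severe\t0\t6\t\tO\n"
    "liver\t7\t12\tliver failure\tB-DISEASE\n"
    "failure\t13\t20\tliver failure\tI-DISEASE\n"
    "occurred\t21\t29\t\tO\n"
)

entries = parse_corpus(SAMPLE.splitlines())
for pmid, rows in entries.items():
    print(pmid, entity_spans(rows))
```

If the actual file uses a different separator, pass it through the sep parameter. The start and end indices are carried through unchanged, since this page does not say whether the end index is inclusive.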