Corpora for Chemical Entity Recognition

Download

Corpus in IOB Format, gzipped, original version used in the paper [Kolarik et al., 2008]

Original version, as used for the experiments described in the paper.

Corpus in IOB Format, gzipped [Kolarik et al., 2008]

Reannotated version, with small differences from the version used for the paper.

Corpus in IOB Format, gzipped, Version 3 [Kolarik et al., 2008]

Reannotated relative to the version used in the paper, with offset errors corrected.
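
For readers unfamiliar with IOB data, the following is a minimal sketch of how a gzipped IOB file could be read and its entity spans collected. It rests on assumptions that this page does not confirm: one token per line, whitespace-separated columns with the token first and the tag last, blank lines between sentences, and tags of the form B-LABEL, I-LABEL and O. The function names read_iob and extract_entities, the sample sentence, and the label IUPAC in the sample are illustrative only; check the actual column layout and label set after downloading.

```python
import gzip
import io


def read_iob(stream):
    """Yield sentences as lists of (token, tag) pairs from an IOB text stream.

    Assumes one token per line, token in the first column, tag in the last
    column, and blank lines separating sentences.
    """
    sentence = []
    for line in stream:
        line = line.strip()
        if not line:
            if sentence:
                yield sentence
                sentence = []
            continue
        parts = line.split()
        sentence.append((parts[0], parts[-1]))
    if sentence:
        yield sentence


def extract_entities(sentence):
    """Group B-/I- tagged tokens into (label, text) pairs."""
    entities = []
    label, tokens = None, []
    for token, tag in sentence:
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if starts_new:
            # Close any open entity before starting a new one.
            if tokens:
                entities.append((label, " ".join(tokens)))
            label, tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            tokens.append(token)
        else:
            # Any other tag (e.g. O) ends the current entity.
            if tokens:
                entities.append((label, " ".join(tokens)))
            label, tokens = None, []
    if tokens:
        entities.append((label, " ".join(tokens)))
    return entities


# Illustrative in-memory sample standing in for a downloaded .gz file.
sample = (
    "Treatment O\nwith O\nacetylsalicylic B-IUPAC\nacid I-IUPAC\n. O\n"
    "\n"
    "No O\neffect O\n"
)
compressed = gzip.compress(sample.encode("utf-8"))

with gzip.open(io.BytesIO(compressed), mode="rt", encoding="utf-8") as handle:
    sentences = list(read_iob(handle))

entities = [extract_entities(s) for s in sentences]
```

To read a downloaded file instead, pass its local path to gzip.open in place of the in-memory buffer.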

Training Corpus for IUPAC and IUPAC-like Chemical Names [Klinger et al., 2008]

Training Corpus of 463 abstracts with 3712 IUPAC entities, 321 partial entities and 1039 MODIFIER entities.

Sampled Test Corpus for IUPAC and IUPAC-like Chemical Names [Klinger et al., 2008]

Sampled Test Corpus of 1000 abstracts (each included as full text in the header of its instance). It contains 151 IUPAC entities.

Literature

[Kolarik et al., 2008] Corinna Kolářik, Roman Klinger, Christoph M. Friedrich, Martin Hofmann-Apitius, and Juliane Fluck. Chemical Names: Terminological Resources and Corpora Annotation. In Workshop on Building and Evaluating Resources for Biomedical Text Mining (6th edition of the Language Resources and Evaluation Conference), Marrakech, Morocco, 2008.

[Klinger et al., 2008] Roman Klinger, Corinna Kolářik, Juliane Fluck, Martin Hofmann-Apitius, and Christoph M. Friedrich. Detection of IUPAC and IUPAC-like Chemical Names. Bioinformatics, 24(13):i268-i276, 2008.

Corpora for Named Entity Recognition of Chemical Compounds

Download