NCBI Disease Corpus

Information Extraction | Named Entity Recognition (NER) | English

NCBI Disease Corpus is an English information extraction dataset from Dogan et al., with 6,892 annotated disease mentions in Text format.

About NCBI Disease Corpus

The dataset contains 6,892 disease mentions, which are mapped to 790 unique disease concepts. Of these, 88% link to a MeSH identifier, while the rest link to an OMIM identifier.
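As a rough illustration of what these figures imply, the sketch below turns the stated percentages into approximate counts. It assumes that the 88% refers to the 790 unique concepts (the description does not say whether it refers to concepts or mentions), and because 88% is a rounded figure the resulting counts are estimates, not numbers reported by the dataset authors. All variable names are illustrative.

```python
# Figures taken from the dataset description above.
mentions = 6892       # annotated disease mentions
concepts = 790        # unique disease concepts
mesh_share = 0.88     # share linked to a MeSH identifier (rounded in the source)

# Assumption: the 88% applies to unique concepts, not to mentions.
mesh_concepts = round(concepts * mesh_share)   # estimated MeSH-linked concepts
omim_concepts = concepts - mesh_concepts       # estimated OMIM-linked concepts

# Average number of mentions per unique concept.
mentions_per_concept = mentions / concepts

print(f"MeSH-linked concepts (approx.): {mesh_concepts}")
print(f"OMIM-linked concepts (approx.): {omim_concepts}")
print(f"Mentions per concept (avg.):    {mentions_per_concept:.1f}")
```

Under that assumption, roughly 695 concepts would carry a MeSH identifier and roughly 95 an OMIM identifier, with each concept mentioned between eight and nine times on average.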

Details

Task: Information Extraction, Named Entity Recognition (NER)
Language: English
Format: Text
Rows / instances: 6,892
Creator: Dogan et al.
Year: 2014