Question 1

What is the NCBI Disease Corpus dataset?

Accepted Answer

The NCBI Disease Corpus contains 6,892 disease mentions, which are mapped to 790 unique disease concepts. Of these concepts, 88% link to a MeSH identifier, while the rest link to an OMIM identifier.

Question 2

Is NCBI Disease Corpus a benchmark?

Accepted Answer

NCBI Disease Corpus is a dataset for training or evaluation; it isn't tracked as a standard LLM benchmark in our catalog.

Question 3

Where can I download NCBI Disease Corpus?

Accepted Answer

NCBI Disease Corpus is available at its source: https://www.ncbi.nlm.nih.gov/CBBresearch/Dogan/DISEASE/.
