Token Classification Datasets
There are 5 token classification datasets in our directory. Each links to its source, paper, and download. Browse the full list below or filter by language.
Token classification is the task of assigning a label to each individual token in a sequence; part-of-speech tagging is a typical example.
Updated June 2026
- aps/super_glue: Text Classification, Token Classification, Question Answering (EN)
- Open-Orca/OpenOrca: Text Classification, Token Classification, Table Question Answering, Question Answering, Zero Shot Classification, Summarization, Feature Extraction, Text Generation (EN)
- proj-persona/PersonaHub: Text Generation, Text Classification, Token Classification, Fill Mask, Table Question Answering (EN, ZH)
- ade-benchmark-corpus/ade_corpus_v2: Text Classification, Token Classification (EN)
- Open-Orca/SlimOrca: Text Classification, Token Classification, Table Question Answering, Question Answering, Zero Shot Classification, Summarization, Feature Extraction, Text Generation (EN)
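
To make the task definition above concrete, here is a minimal Python sketch of what a token classification example looks like: one label per token. The tiny tag lexicon, the `tag_tokens` helper, and the sample sentence are all hypothetical and exist only for illustration; they are not taken from any of the datasets listed here, and a real system would use a trained model rather than a lookup table.

```python
# Toy illustration of token classification: every token gets exactly one label.
# TOY_LEXICON is a made-up part-of-speech lookup, not a real tagger.
TOY_LEXICON = {
    "the": "DET",
    "cat": "NOUN",
    "sat": "VERB",
    "on": "ADP",
    "mat": "NOUN",
}


def tag_tokens(tokens, lexicon=TOY_LEXICON, default="X"):
    """Return one (token, label) pair per input token.

    Tokens missing from the lexicon receive the `default` label.
    """
    return [(tok, lexicon.get(tok.lower(), default)) for tok in tokens]


tokens = "The cat sat on the mat".split()
tagged = tag_tokens(tokens)

for token, label in tagged:
    print(f"{token}\t{label}")
```

The point of the sketch is the data shape: the output has the same length as the input, and labels are aligned with tokens position by position.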