
jat-project/jat-dataset-tokenized

General NLP | English

jat-project/jat-dataset-tokenized is a General NLP dataset in English that provides 32,006,524 labeled examples distributed in Parquet format. It falls in the 10M<n<100M size category and has been downloaded 324.4K times.

About jat-project/jat-dataset-tokenized

# Dataset Card for "jat-dataset-tokenized"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: 32,006,524
Size: 10M<n<100M
Creator: jat-project
Year: 2026
Downloads: 324,388
Likes: 15
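
The size category listed above is a bucketed label derived from the row count. As a minimal sketch of how such a label could be computed, the snippet below maps a row count to a bucket. The bucket labels are assumed to follow the Hugging Face Hub `size_categories` tag vocabulary, the handling of exact boundary values (for example exactly 10,000,000 rows) is a guess, and the `size_category` helper and the catch-all `n>=1B` label are hypothetical names introduced only for illustration.

```python
# Upper bounds (exclusive) and labels, assumed to mirror the Hub's
# size_categories tags. Boundary handling is an assumption.
_BUCKETS = [
    (1_000, "n<1K"),
    (10_000, "1K<n<10K"),
    (100_000, "10K<n<100K"),
    (1_000_000, "100K<n<1M"),
    (10_000_000, "1M<n<10M"),
    (100_000_000, "10M<n<100M"),
    (1_000_000_000, "100M<n<1B"),
]


def size_category(n_rows: int) -> str:
    """Return the bucket label for a row count (hypothetical helper)."""
    if n_rows < 0:
        raise ValueError("row count must be non-negative")
    for upper, label in _BUCKETS:
        if n_rows < upper:
            return label
    # Simplified catch-all; larger buckets are not modeled in this sketch.
    return "n>=1B"


if __name__ == "__main__":
    # Row count taken from the Details list.
    print(size_category(32_006_524))
```

With the row count from the Details list, the helper returns `10M<n<100M`, which agrees with the Size entry.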