jat-project/jat-dataset-tokenized
General NLP · English
The jat-project/jat-dataset-tokenized dataset is a General NLP dataset in English that provides 32,006,524 rows (instances) in Parquet format. It falls in the 10M<n<100M size category and has been downloaded 324.4K times.
About jat-project/jat-dataset-tokenized
# Dataset Card for "jat-dataset-tokenized"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: 32,006,524
- Size: 10M<n<100M
- Creator: jat-project
- Year: 2026
- Downloads: 324,388
- Likes: 15
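
The size category is a bucket of the row count. The sketch below uses a hypothetical `size_category` helper to show why 32,006,524 rows lands in 10M<n<100M. It assumes the Hugging Face Hub's `size_categories` bucket labels and treats each lower bound as inclusive, since the labels themselves do not say how exact boundaries are handled.

```python
# Hypothetical helper (not part of any library): maps a row count to a
# Hugging Face-style size-category label.
# Assumption: each bucket covers [lower, upper), so the lower bound is inclusive.
_BUCKETS = [
    (1_000, "n<1K"),
    (10_000, "1K<n<10K"),
    (100_000, "10K<n<100K"),
    (1_000_000, "100K<n<1M"),
    (10_000_000, "1M<n<10M"),
    (100_000_000, "10M<n<100M"),
    (1_000_000_000, "100M<n<1B"),
    (10_000_000_000, "1B<n<10B"),
    (100_000_000_000, "10B<n<100B"),
    (1_000_000_000_000, "100B<n<1T"),
]


def size_category(n_rows: int) -> str:
    """Return the size-category label for a dataset with n_rows rows."""
    if n_rows < 0:
        raise ValueError("row count must be non-negative")
    # Walk the buckets in ascending order and return the first one whose
    # exclusive upper bound exceeds the row count.
    for upper, label in _BUCKETS:
        if n_rows < upper:
            return label
    # Anything at or above one trillion rows falls in the open-ended bucket.
    return "n>1T"


print(size_category(32_006_524))  # 10M<n<100M
```

With 32,006,524 rows, the first upper bound the count falls below is 100,000,000, so the helper returns 10M<n<100M, matching the Size entry above.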