adams-story/datacomp200m
General NLP, English
The adams-story/datacomp200m dataset is an English General NLP resource published by adams-story in 2026. It has 22.3K downloads and 2 likes, and it falls in the 100M<n<1B size category.
About adams-story/datacomp200m
Datacomp200m
This is a smaller version of the datacomp_1b dataset.
Filtering kept every row whose self-similarity (inner product) was above 0.32. This left 213,009,083 (about 213 million) rows.
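The threshold rule above can be illustrated with a minimal sketch. It assumes, hypothetically, that each row carries two embedding vectors (stored here under the made-up keys `emb_a` and `emb_b`) whose inner product is the similarity score. The card does not say which vectors are compared or how they are stored. Only the 0.32 cutoff and the "above" comparison come from the card. If the embeddings are L2-normalized, the inner product equals their cosine similarity.

```python
# Cutoff quoted in the dataset card: only rows scoring above 0.32 are kept.
THRESHOLD = 0.32


def inner_product(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def filter_rows(rows, threshold=THRESHOLD):
    """Keep rows whose similarity score strictly exceeds the threshold.

    Hypothetical row layout: a dict with two embedding vectors under the
    keys 'emb_a' and 'emb_b'. The real dataset's columns may differ.
    """
    return [r for r in rows if inner_product(r["emb_a"], r["emb_b"]) > threshold]


# Toy rows with hand-picked vectors (not data from the dataset).
rows = [
    {"id": 0, "emb_a": [1.0, 0.0], "emb_b": [0.6, 0.8]},    # score 0.6  -> kept
    {"id": 1, "emb_a": [1.0, 0.0], "emb_b": [0.32, 0.95]},  # score 0.32 -> dropped (not above)
    {"id": 2, "emb_a": [0.0, 1.0], "emb_b": [0.8, 0.1]},    # score 0.1  -> dropped
]

kept = filter_rows(rows)
print([r["id"] for r in kept])  # [0]
```

A row scoring exactly 0.32 is dropped because the card says "above 0.32", which is why the comparison is strict (`>`).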
The results of the datacomp p...
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Size: 100M<n<1B
- Creator: adams-story
- Year: 2026
- Downloads: 22,314
- Likes: 2
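The size bucket in the details can be checked against the row count given in the filtering note (213,009,083). The sketch below assumes the bucket string always has the form `low<n<high` with optional K/M/B/T suffixes. That format is inferred from the single value shown here, and the helper names are hypothetical.

```python
# Multipliers for the suffixes assumed to appear in size-bucket strings.
UNITS = {"K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}


def to_int(token):
    """Convert a token such as '100M' or '1B' into an integer."""
    if token[-1] in UNITS:
        return int(token[:-1]) * UNITS[token[-1]]
    return int(token)


def parse_size_category(category):
    """Split a bucket like '100M<n<1B' into (low, high) integer bounds."""
    low, high = category.split("<n<")
    return to_int(low), to_int(high)


def in_category(n, category):
    """True if n lies strictly between the bucket's bounds."""
    low, high = parse_size_category(category)
    return low < n < high


ROW_COUNT = 213_009_083  # row count reported in the card's filtering note
print(in_category(ROW_COUNT, "100M<n<1B"))  # True
```

Since 100,000,000 < 213,009,083 < 1,000,000,000, the reported row count is consistent with the listed 100M<n<1B size.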