airtrain-ai/fineweb-edu-fortified
Text Generation | EN | odc-by
airtrain-ai/fineweb-edu-fortified is an English text generation dataset that provides 322,250,000 labeled examples in Parquet format. It is distributed under the odc-by license, falls in the 100M<n<1B size category, and has been downloaded 107.9K times.
About airtrain-ai/fineweb-edu-fortified
Fineweb-Edu-Fortified
The composition of fineweb-edu-fortified, produced by automatically clustering a 500k row sample in Airtrain
What is it?
Fineweb-Edu-Fortified is a dataset derived from Fineweb-Edu by applyi...
Details
- Task: Text Generation
- Language: EN
- Format: Parquet
- Rows / instances: 322,250,000
- Size: 100M<n<1B
- Creator: airtrain-ai
- Year: 2026
- License: odc-by
- Downloads: 107,896
- Likes: 65