
airtrain-ai/fineweb-edu-fortified

Text Generation | EN | odc-by

airtrain-ai/fineweb-edu-fortified is an English-language (EN) text generation dataset that provides 322,250,000 labeled examples in Parquet format. It is distributed under the odc-by license, falls in the 100M<n<1B size category, and has been downloaded 107.9K times.

About airtrain-ai/fineweb-edu-fortified

Fineweb-Edu-Fortified

[Figure: The composition of fineweb-edu-fortified, produced by automatically clustering a 500k row sample in Airtrain]

What is it?

Fineweb-Edu-Fortified is a dataset derived from Fineweb-Edu by applyi...

Details

Task: Text Generation
Language: EN
Format: Parquet
Rows / instances: 322,250,000
Size: 100M<n<1B
Creator: airtrain-ai
Year: 2026
License: odc-by
Downloads: 107,896
Likes: 65
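
With roughly 322 million rows in Parquet, it is usually more practical to stream a few records than to download the whole dataset first. The following is a minimal sketch, assuming the dataset is loaded from the Hugging Face Hub with the third-party `datasets` library. The helper name `peek_rows` and its `loader` parameter are hypothetical and exist only for illustration. The split name "train" and the need to pass a config name are assumptions not stated on this page, so check the dataset's homepage before relying on them.

```python
from itertools import islice

# Repository identifier as given on this page.
REPO_ID = "airtrain-ai/fineweb-edu-fortified"


def peek_rows(n=3, config=None, split="train", loader=None):
    """Return the first n rows of the dataset as a list of dicts.

    config and split are assumptions: this page does not list the
    dataset's configurations or split names.
    loader is a hypothetical hook so the function can be exercised
    without network access; by default it is datasets.load_dataset.
    """
    if loader is None:
        # Imported lazily so the module loads even without `datasets`.
        from datasets import load_dataset
        loader = load_dataset

    # streaming=True iterates over remote Parquet shards lazily
    # instead of downloading every file up front.
    stream = loader(REPO_ID, name=config, split=split, streaming=True)
    return list(islice(stream, n))


if __name__ == "__main__":
    # Requires network access and `pip install datasets`.
    # If the dataset defines several configs, pass one via config=...
    for row in peek_rows(n=2):
        print(sorted(row.keys()))
```

Printing only the column names keeps the first run cheap and shows which fields are available before any larger processing job is written.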
