French Datasets
We catalog 9 French datasets for NLP and machine learning. Browse the list below or narrow down by task.
French is a major world language with rich NLP and machine-translation resources.
Updated June 2026
- CC100-French: Text Corpora (French)
- CohereLabs/aya_redteaming: General NLP (EN, HI, FR)
- DiaBLa: Machine Translation, Dialogue (French, English)
- FQuAD: Question Answering, Reading Comprehension (French)
- wanng/midjourney-v5-202304-clean: Text To Image, Image To Text (EN, FR)
- nvidia/Nemotron-Personas-France: Text Generation (FR)
- manu/project_gutenberg: Text Generation (FR, EN, ZH)
- ministere-culture/comparia-conversations: General NLP (FR)
- PleIAs/French-PD-Newspapers: Text Generation (FR)
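
For readers who want to narrow the list by task offline, here is a minimal sketch in Python. It transcribes the nine entries above into a small in-memory catalog; the `Entry` class, the `CATALOG` name, and the `filter_by_task` helper are hypothetical names chosen for illustration and are not part of any directory API. Task and language labels are copied as they appear in the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One catalog row: dataset name, task labels, language labels (illustrative)."""
    name: str
    tasks: tuple
    languages: tuple


# Entries transcribed from the list above; labels kept exactly as listed.
CATALOG = [
    Entry("CC100-French", ("Text Corpora",), ("French",)),
    Entry("CohereLabs/aya_redteaming", ("General NLP",), ("EN", "HI", "FR")),
    Entry("DiaBLa", ("Machine Translation", "Dialogue"), ("French", "English")),
    Entry("FQuAD", ("Question Answering", "Reading Comprehension"), ("French",)),
    Entry("wanng/midjourney-v5-202304-clean",
          ("Text To Image", "Image To Text"), ("EN", "FR")),
    Entry("nvidia/Nemotron-Personas-France", ("Text Generation",), ("FR",)),
    Entry("manu/project_gutenberg", ("Text Generation",), ("FR", "EN", "ZH")),
    Entry("ministere-culture/comparia-conversations", ("General NLP",), ("FR",)),
    Entry("PleIAs/French-PD-Newspapers", ("Text Generation",), ("FR",)),
]


def filter_by_task(catalog, task):
    """Return entries whose task labels include `task` (case-insensitive match)."""
    wanted = task.casefold()
    return [e for e in catalog if any(t.casefold() == wanted for t in e.tasks)]


if __name__ == "__main__":
    # Print the names of all entries tagged "Text Generation".
    for entry in filter_by_task(CATALOG, "Text Generation"):
        print(entry.name)
```

The match is on exact task labels rather than substrings, so "Text" alone returns nothing; that choice keeps "Text To Image" and "Text Generation" from being conflated.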