
allenai/MADLAD-400

Text Generation, English, odc-by

The allenai/MADLAD-400 dataset is listed as an English text generation resource, released by allenai in 2023. It has 45.1K downloads and 170 likes, is distributed under the odc-by license, and falls in the n>1T size category.

About allenai/MADLAD-400

Dataset and Introduction

MADLAD-400 (Multilingual Audited Dataset: Low-resource And Document-level) is a document-level multilingual dataset based on Common Crawl, covering 419 languages in total. This uses all snapshots o...

Details

Task: Text Generation
Language: English
Format: Parquet
Rows / instances: N/A
Size: n>1T
Creator: allenai
Year: 2023
License: odc-by
Downloads: 45129
Likes: 170
