allenai/dolma

Text Generation | EN | odc-by

allenai/dolma is an English (EN) text generation dataset from allenai, distributed in Parquet format under the odc-by license. It falls in the n>1T size category and has been downloaded 4.4K times.

About allenai/dolma

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Details

Task: Text Generation
Language: EN
Format: Parquet
Rows / instances: N/A
Size: n>1T
Creator: allenai
Year: 2023
License: odc-by
Downloads: 4449
Likes: 1048
