allenai/olmo-mix-1124

Text Generation · EN · odc-by

allenai/olmo-mix-1124 is an English-language text generation dataset from allenai, distributed in Parquet format under the odc-by license. It falls in the 100M<n<1B size category and has been downloaded 57.2K times.

About allenai/olmo-mix-1124

OLMo 2 (November 2024) Pretraining set

Collection of data used to train OLMo-2-1124 models. The majority of this dataset comes from DCLM-Baseline with no additional filtering, but we provide the explicit breakdowns below.

Name Tokens Byte...

Details

Task: Text Generation
Language: EN
Format: Parquet
Rows / instances: N/A
Size: 100M<n<1B
Creator: allenai
Year: 2024
License: odc-by
Downloads: 57213
Likes: 88