allenai/olmo-mix-1124
Text Generation, EN, odc-by
allenai/olmo-mix-1124 is an English text generation dataset from allenai, distributed in Parquet format under the odc-by license. It falls in the 100M<n<1B size category and has been downloaded 57.2K times.
About allenai/olmo-mix-1124
OLMo 2 (November 2024) Pretraining set
A collection of the data used to train the OLMo-2-1124 models. The majority of this dataset comes from DCLM-Baseline with no additional filtering, but we provide the explicit breakdown below.
Name | Tokens | Byte...
Details
- Task: Text Generation
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Size: 100M<n<1B
- Creator: allenai
- Year: 2024
- License: odc-by
- Downloads: 57213
- Likes: 88