
allenai/dolma3_longmino_mix-100B-1125

General NLP | EN | odc-by

allenai/dolma3_longmino_mix-100B-1125 is an English-language (EN) general NLP dataset published by allenai in Parquet format. It is distributed under the odc-by license and has been downloaded 18.3K times.

About allenai/dolma3_longmino_mix-100B-1125

Dolma 3 Longmino Mix (100B)

The Dolma 3 Longmino Mix (100B) is the mixture of data used for the third stage of training for the Olmo 3 32B model.

Dataset Sources (Source, Type)

LC-s2pdf-REX 32k-64k, Synth PDFs
LC-s2pdf-CWE 32k-...

Details

Task: General NLP
Language: EN
Format: Parquet
Rows / instances: N/A
Creator: allenai
Year: 2025
License: odc-by
Downloads: 18313
Likes: 17
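
Because the listing gives a dataset ID and a Parquet format but no row count, streaming is a reasonable way to inspect the data without downloading it in full. The following is a minimal sketch, assuming the dataset is hosted on the Hugging Face Hub under the ID shown on this page and that the `datasets` library is installed. The helper name `stream_longmino` is hypothetical, and split and column names are not stated here, so the sketch does not assume any.

```python
# Dataset ID as listed on this page.
# Assumption: it is hosted on the Hugging Face Hub under this ID.
REPO_ID = "allenai/dolma3_longmino_mix-100B-1125"


def stream_longmino(repo_id: str = REPO_ID):
    """Open the dataset lazily (hypothetical helper, not part of any library)."""
    # Imported here so this module still loads when `datasets` is not installed.
    from datasets import load_dataset

    # streaming=True reads records from the remote files on demand
    # instead of downloading the whole mixture first.
    # No split is passed, so the result maps split names to iterable datasets.
    return load_dataset(repo_id, streaming=True)
```

Calling `stream_longmino()` and iterating over the returned mapping with `.items()` should reveal which splits exist; taking `next(iter(split))` on one of them returns a single record as a dictionary, whose keys show the available columns.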
