allenai/dolmino-mix-1124

Text Generation | EN | odc-by

The allenai/dolmino-mix-1124 dataset is an English text generation resource released by allenai in 2024. It has 11.1K downloads and 97 likes, is released under the odc-by license, and falls in the 100M<n<1B size category.

About allenai/dolmino-mix-1124

DOLMino dataset mix for OLMo2 stage 2 annealing training: a mixture of high-quality data used for the second stage of OLMo2 training.

Source Sizes (columns: Name, Category, Tokens, Bytes (uncompressed), Documents, License): DCLM HQ Web Page...

Details

Task: Text Generation
Language: EN
Format: Parquet
Rows / instances: N/A
Size: 100M<n<1B
Creator: allenai
Year: 2024
License: odc-by
Downloads: 11067
Likes: 97
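
The size value 100M<n<1B is a bucket label rather than an exact count, which is consistent with the rows / instances field being N/A. As a minimal sketch, assuming the label follows a "<low><n<<high>" pattern with K/M/B/T suffixes, it can be converted to numeric bounds as below. The names parse_size_bucket and in_bucket are hypothetical helpers written for this illustration and are not part of any library.

```python
import re

# Multipliers for the suffixes assumed to appear in size-bucket labels.
_SUFFIX = {"": 1, "K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}

# Matches labels of the form "<low><n<<high>", for example "100M<n<1B".
_BUCKET = re.compile(r"^(\d+)([KMBT]?)<n<(\d+)([KMBT]?)$")


def parse_size_bucket(label: str) -> tuple[int, int]:
    """Return the (lower, upper) exclusive bounds encoded in a bucket label."""
    match = _BUCKET.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized size bucket: {label!r}")
    low_num, low_suffix, high_num, high_suffix = match.groups()
    low = int(low_num) * _SUFFIX[low_suffix]
    high = int(high_num) * _SUFFIX[high_suffix]
    if low >= high:
        raise ValueError(f"lower bound is not below upper bound: {label!r}")
    return low, high


def in_bucket(n: int, label: str) -> bool:
    """Check whether a count n lies strictly inside the bucket."""
    low, high = parse_size_bucket(label)
    return low < n < high
```

For this listing, parse_size_bucket("100M<n<1B") returns (100000000, 1000000000), so any count strictly between 100 million and 1 billion is consistent with the stated size.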