nvidia/Nemotron-Pretraining-SFT-v1

Text Generation, English

nvidia/Nemotron-Pretraining-SFT-v1 is an English-language text generation dataset distributed in Parquet format.

About nvidia/Nemotron-Pretraining-SFT-v1

Nemotron-Pre-Training-Dataset-v1 Release. Data Overview: This pretraining dataset for generative AI model training preserves high-value math and code while enriching it with diverse multilingual Q&A, fueling the next generation of in...

Details

Task: Text Generation
Language: English
Format: Parquet
Rows / instances: N/A
Creator: nvidia
Year: 2025