Congliu/Chinese-DeepSeek-R1-Distill-data-110k
Text Generation | Question Answering | ZH | apache-2.0
Congliu/Chinese-DeepSeek-R1-Distill-data-110k is a Chinese (ZH) text generation dataset in Parquet format, created by Congliu in 2025. It has 800 downloads and 764 likes. It is released under the apache-2.0 license and falls in the 100K<n<1M size category.
About Congliu/Chinese-DeepSeek-R1-Distill-data-110k
Chinese Dataset Distilled from the Full-Strength DeepSeek-R1 (Chinese-Data-Distill-From-R1)
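Note: A version ready for direct SFT use is also available for download. In that version, the reasoning and the answer of each sample are merged into a single output field, so most SFT code frameworks can load it directly for training.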
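The merge described in the note can be sketched roughly as follows. This is a minimal illustration, not the author's actual conversion script: the field names `reasoning_content` and `content`, the helper `merge_to_output`, and the `<think>...</think>` template are all assumptions made for the example, so check the dataset's real schema before relying on them.

```python
from typing import Dict

# Hypothetical field names; verify them against the dataset's actual schema.
REASONING_KEY = "reasoning_content"
ANSWER_KEY = "content"


def merge_to_output(record: Dict[str, str],
                    reasoning_key: str = REASONING_KEY,
                    answer_key: str = ANSWER_KEY) -> Dict[str, str]:
    """Return a copy of `record` with reasoning and answer merged into `output`."""
    reasoning = record.get(reasoning_key, "").strip()
    answer = record.get(answer_key, "").strip()
    # Assumed template: reasoning wrapped in <think> tags, followed by the answer.
    merged = f"<think>\n{reasoning}\n</think>\n\n{answer}"
    # Keep every other field unchanged and drop the two source fields.
    result = {k: v for k, v in record.items()
              if k not in (reasoning_key, answer_key)}
    result["output"] = merged
    return result


# Toy record for illustration only; it is not taken from the dataset.
sample = {
    "input": "1+1等于几?",
    "reasoning_content": "这是一个简单的加法。",
    "content": "1+1等于2。",
}
merged_sample = merge_to_output(sample)
print(merged_sample["output"])
```

Keeping the reasoning inside explicit tags lets a training framework treat `output` as one target string while the reasoning can still be separated from the final answer later.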
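This is an open-source Chinese dataset distilled from the full-strength R1. It contains not only math data but also a large amount of general-purpose data, for a total of 110K samples.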
...
Details
- Task: Text Generation, Question Answering
- Language: ZH
- Format: Parquet
- Rows / instances: N/A
- Size: 100K<n<1M
- Creator: Congliu
- Year: 2025
- License: apache-2.0
- Downloads: 800
- Likes: 764