HuggingFaceTB/smoltalk
General NLP, EN
HuggingFaceTB/smoltalk is a General NLP dataset in English (EN) from HuggingFaceTB, distributed in Parquet format. It falls in the 1M<n<10M size category and has been downloaded 18.1K times.
About HuggingFaceTB/smoltalk
SmolTalk
Dataset description
This is a synthetic dataset designed for supervised fine-tuning (SFT) of LLMs. It was used to build the SmolLM2-Instruct family of models and contains 1M samples. More details are in our paper: https://arxiv.org/a...
Details
- Task: General NLP
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Size: 1M<n<10M
- Creator: HuggingFaceTB
- Year: 2024
- Downloads: 18134
- Likes: 415
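
The size entry above is a bucket label, not an exact row count. As a rough illustration of how such a label relates to a number of rows, here is a minimal sketch. The function name `size_category`, the power-of-ten boundaries, the handling of counts that sit exactly on a boundary, and the simplified top bucket are all assumptions made for this example; they are not taken from this page.

```python
def size_category(n_rows: int) -> str:
    """Map a row count to a bucket label such as '1M<n<10M'.

    Hypothetical helper: the boundaries and the treatment of exact
    boundary values are assumptions for illustration only.
    """
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")

    # (exclusive upper bound, label of that bound), in ascending order.
    bounds = [
        (10**3, "1K"),
        (10**4, "10K"),
        (10**5, "100K"),
        (10**6, "1M"),
        (10**7, "10M"),
        (10**8, "100M"),
        (10**9, "1B"),
    ]

    lower = None  # label of the previous boundary, None for the first bucket
    for upper, label in bounds:
        if n_rows < upper:
            return f"n<{label}" if lower is None else f"{lower}<n<{label}"
        lower = label

    # Simplified catch-all for anything at or above the last boundary.
    return f"n>{lower}"


if __name__ == "__main__":
    # A count slightly above one million lands in the bucket listed above.
    print(size_category(1_100_000))
```

Under these assumptions, a dataset described as containing about 1M samples falls in the 1M<n<10M bucket only if its row count is at least one million; a count just below that would be labeled 100K<n<1M.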