HuggingFaceTB/smol-smoltalk
General NLP | EN | apache-2.0
HuggingFaceTB/smol-smoltalk is an English-language (EN) General NLP dataset from HuggingFaceTB with 484,570 records in Parquet format. It is distributed under the apache-2.0 license, falls in the 100K<n<1M size category, and has been downloaded 9.9K times.
About HuggingFaceTB/smol-smoltalk
Smol-SmolTalk
This is a subset of the SmolTalk dataset adapted for smol models with fewer than 1B parameters. We used it to build SmolLM2-360M-Instruct and SmolLM2-135M-Instruct. We do supervised fine-tuning (SFT) on this dataset and then direct preference optimization (DPO) on UltraFeedback.
Compared to ...
Details
- Task: General NLP
- Language: EN
- Format: Parquet
- Rows / instances: 484570
- Size: 100K<n<1M
- Creator: HuggingFaceTB
- Year: 2024
- License: apache-2.0
- Downloads: 9903
- Likes: 104
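
As a rough illustration of how a dataset with the details listed above might be loaded, here is a minimal sketch using the Hugging Face `datasets` library. The repository name comes from this page; the split name `"train"`, the use of streaming mode, and the helper names `load_smol_smoltalk` and `first_record` are assumptions made for illustration, not something this page confirms. The record layout is not assumed, so the sketch only prints the first record as-is.

```python
DATASET_ID = "HuggingFaceTB/smol-smoltalk"  # repository name as given on this page


def load_smol_smoltalk(split="train", streaming=True):
    """Load one split of the dataset from the Hugging Face Hub.

    Assumptions: a split called "train" exists, and streaming is used so
    that the Parquet files are read lazily instead of downloaded up front.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split, streaming=streaming)


def first_record(dataset):
    """Return the first record of any iterable dataset (streaming or not)."""
    return next(iter(dataset))


if __name__ == "__main__":
    # Requires network access and `pip install datasets`.
    ds = load_smol_smoltalk()
    print(first_record(ds))
```

Streaming is chosen here only to make a first look cheap; for SFT runs it is more common to pass `streaming=False` so the full split is cached locally.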