HuggingFaceTB/cosmopedia
General NLP | EN | apache-2.0
HuggingFaceTB/cosmopedia is an English-language general NLP dataset published by HuggingFaceTB in 2024. It comprises 31,064,744 examples, which places it in the 10M<n<100M size category, and it is released under the apache-2.0 license. It has 19,030 downloads and 721 likes.
About HuggingFaceTB/cosmopedia
Cosmopedia v0.1
Image generated by DALL-E; the prompt was generated by Mixtral-8x7B-Instruct-v0.1.
Note: Cosmopedia v0.2 is available at smollm-corpus
User: What do you think "Cosmopedia" could mean? Hint: in our case it's not relate...
Details
- Task: General NLP
- Language: EN
- Format: Parquet
- Rows / instances: 31,064,744
- Size: 10M<n<100M
- Creator: HuggingFaceTB
- Year: 2024
- License: apache-2.0
- Downloads: 19,030
- Likes: 721
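
The size tag and the Parquet format listed above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not an official recipe: `size_bucket` and `peek` are hypothetical helper names, the bucket boundaries are a simplified reading of the size tags, and the `"stories"` subset name is an assumption that does not appear on this page. `peek` relies on the `datasets` library and network access, so it is defined but not called here.

```python
from itertools import islice

# Size tags as (exclusive upper bound, label) pairs.
# Handling of exact boundary values is an assumption of this sketch.
SIZE_BUCKETS = [
    (10**3, "n<1K"),
    (10**4, "1K<n<10K"),
    (10**5, "10K<n<100K"),
    (10**6, "100K<n<1M"),
    (10**7, "1M<n<10M"),
    (10**8, "10M<n<100M"),
    (10**9, "100M<n<1B"),
]


def size_bucket(n_rows: int) -> str:
    """Map a row count to its size tag (hypothetical helper)."""
    for upper, label in SIZE_BUCKETS:
        if n_rows < upper:
            return label
    return "n>=1B"  # simplified catch-all for larger datasets


def peek(config: str = "stories", k: int = 3) -> list:
    """Stream the first k rows without downloading the full dataset.

    The default subset name is an assumption; check the dataset page
    for the subsets that are actually available.
    """
    from datasets import load_dataset  # imported lazily; needs `pip install datasets`

    ds = load_dataset(
        "HuggingFaceTB/cosmopedia", config, split="train", streaming=True
    )
    return list(islice(ds, k))


print(size_bucket(31_064_744))  # → 10M<n<100M
```

Streaming is used in `peek` because, with roughly 31 million rows, downloading the full dataset just to inspect a few examples would be wasteful.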