AI4Chem/ChemData700K
General NLP | English | mit
Created by AI4Chem in 2024, AI4Chem/ChemData700K is a General NLP dataset in English containing 726,776 records in Parquet format. It has 366 downloads and 33 likes. It is released under the MIT license and falls in the 100K<n<1M size category.
About AI4Chem/ChemData700K
Introduction
ChemData is a large-scale instruction-tuning dataset for building chemistry competency in language models. It covers nine core chemistry tasks and contains about 730K high-quality question-and-answer pairs, sampled at roughly 1/10 from a pool of 7 million pieces of data.
Che...
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: 726,776
- Size: 100K<n<1M
- Creator: AI4Chem
- Year: 2024
- License: mit
- Downloads: 366
- Likes: 33
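
The row count, size category, and sampling figures quoted on this page can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the dataset's tooling: the function name `size_bucket` and its bucket boundaries are assumptions modeled on the `100K<n<1M` label, and the 7 million figure is the approximate pool size given in the introduction.

```python
# Figures quoted on this page.
ROWS = 726_776           # "Rows / instances"
SOURCE_POOL = 7_000_000  # "7 million pieces of data" (approximate)


def size_bucket(n: int) -> str:
    """Return a size-category label in the style of '100K<n<1M'.

    The boundaries below are assumed powers of ten, not taken from the page.
    """
    bounds = [
        (1_000, "n<1K"),
        (10_000, "1K<n<10K"),
        (100_000, "10K<n<100K"),
        (1_000_000, "100K<n<1M"),
        (10_000_000, "1M<n<10M"),
    ]
    for upper, label in bounds:
        if n < upper:
            return label
    return "n>10M"


# Fraction of the source pool that ended up in the released dataset.
sampling_fraction = ROWS / SOURCE_POOL

print(size_bucket(ROWS))
print(f"{sampling_fraction:.3f}")
```

Under these assumptions the row count lands in the `100K<n<1M` bucket, and the sampling fraction comes out slightly above 1/10, which is consistent with the rounded "730K" and "1/10 of 7 million" figures in the introduction.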