AI4Chem/ChemData700K

General NLP, English, mit

Created by AI4Chem in 2024, AI4Chem/ChemData700K is a General NLP dataset in English containing 726,776 records in Parquet format. It has 366 downloads and 33 likes. It is released under the MIT license and falls in the 100K<n<1M size category.

About AI4Chem/ChemData700K

Introduction: ChemData is a large-scale instruction-tuning dataset for chemistry competency in language models. It covers nine core chemistry tasks and contains 730K high-quality question-answer pairs, sampled at a rate of 1/10 from 7 million pieces of data. Che...

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: 726,776
Size: 100K<n<1M
Creator: AI4Chem
Year: 2024
License: mit
Downloads: 366
Likes: 33
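
The Size field is a bucket label, not an exact count. As a rough illustration, the sketch below maps the row count listed above to such a label and checks the 1/10 sampling arithmetic from the introduction. Only the label "100K<n<1M" appears on this page. The other labels, the bucket boundaries, and the names `BUCKETS` and `size_category` are assumptions made for this example.

```python
# Upper bound (exclusive) and label for each size bucket.
# Only "100K<n<1M" comes from this page; the rest are assumed by analogy.
BUCKETS = [
    (1_000, "n<1K"),
    (10_000, "1K<n<10K"),
    (100_000, "10K<n<100K"),
    (1_000_000, "100K<n<1M"),
    (10_000_000, "1M<n<10M"),
]


def size_category(n_rows: int) -> str:
    """Return the first bucket label whose upper bound exceeds n_rows."""
    for upper, label in BUCKETS:
        if n_rows < upper:
            return label
    return "n>10M"  # hypothetical catch-all for larger datasets


rows = 726_776                      # "Rows / instances" value listed above
category = size_category(rows)      # bucket that the row count falls into
sampled_estimate = 7_000_000 // 10  # 1/10 of 7 million pieces of data

print(category)
print(sampled_estimate)
```

The listed row count lands in the same bucket as the Size field, and a 1/10 sample of 7 million gives roughly the 700K suggested by the dataset name.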