Spawning/PD12M
General NLP, EN
Spawning/PD12M is an English-language dataset from Spawning, cataloged under General NLP and distributed in Parquet format.
About Spawning/PD12M
PD12M
Summary
At 12.4 million image-caption pairs, PD12M is the largest public domain image-text dataset to date, with sufficient size to train foundation models while minimizing copyright concerns. Through the Source.Plus platform,...
Details
- Task: General NLP
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Creator: Spawning
- Year: 2024
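Because the dataset ships as Parquet and the summary cites 12.4 million image-caption pairs, it can help to inspect a few records before committing to a full download. The following is a minimal sketch, not an official loading recipe. It assumes the identifier `Spawning/PD12M` resolves on the Hugging Face Hub, that the third-party `datasets` package is installed, and that a split named `train` exists; none of these are confirmed by this page. The helper name `peek_rows` and its `loader` parameter are hypothetical, introduced only for illustration.

```python
from itertools import islice

REPO_ID = "Spawning/PD12M"  # identifier taken from the page title


def peek_rows(n=3, repo_id=REPO_ID, split="train", loader=None):
    """Return the first `n` records as a list of dicts.

    `loader` defaults to `datasets.load_dataset` (an assumed dependency).
    Passing a custom callable with the same keyword arguments makes the
    helper usable without network access, for example in tests.
    """
    if loader is None:
        # Imported lazily so that defining this function needs no extras.
        from datasets import load_dataset as loader

    # streaming=True iterates over records lazily instead of downloading
    # every Parquet shard up front.
    stream = loader(repo_id, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `peek_rows()` and printing `sorted(row.keys())` for each returned record would show which columns the Parquet files actually contain; no column names are assumed here.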