m-a-p/PIN-200M

General NLP | EN, ZH

m-a-p/PIN-200M is a General NLP dataset in English and Chinese (EN, ZH), published by m-a-p in Parquet format.

About m-a-p/PIN-200M

PIN-200M is a mini version of "PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents" (paper: https://arxiv.org/abs/2406.13923). This dataset contains around 200M samples in PIN format and takes up around 312 TB of storage....
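To put the quoted figures in perspective when planning a download, the rough sketch below derives an average per-sample footprint from the two approximate numbers in the description. It assumes decimal terabytes (10^12 bytes) and uniformly sized samples, neither of which the description confirms; the function names are hypothetical and exist only for this illustration.

```python
# Figures quoted in the dataset description (both are approximate).
TOTAL_SAMPLES = 200_000_000   # "around 200M samples"
TOTAL_BYTES = 312 * 10**12    # "around 312 TB"; decimal terabytes is an assumption


def average_sample_bytes(total_bytes: int, total_samples: int) -> float:
    """Return the mean storage footprint per sample, in bytes."""
    return total_bytes / total_samples


def subset_bytes(num_samples: int) -> float:
    """Estimate storage for a subset, assuming uniformly sized samples (an assumption)."""
    return num_samples * average_sample_bytes(TOTAL_BYTES, TOTAL_SAMPLES)


if __name__ == "__main__":
    avg_mb = average_sample_bytes(TOTAL_BYTES, TOTAL_SAMPLES) / 10**6
    print(f"average per sample: {avg_mb:.2f} MB")
    print(f"1M-sample subset:   {subset_bytes(1_000_000) / 10**12:.2f} TB")
```

Under these assumptions the average works out to about 1.56 MB per sample, so a one-million-sample subset would need on the order of 1.56 TB. Real samples will vary in size, so treat this as an order-of-magnitude estimate only.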

Details

Task: General NLP
Language: EN, ZH
Format: Parquet
Rows / instances: N/A
Creator: m-a-p
Year: 2026