IDEA-CCNL/laion2B-multi-chinese-subset
Feature Extraction · ZH · cc-by-4.0
The IDEA-CCNL/laion2B-multi-chinese-subset dataset is a Chinese (ZH) feature-extraction dataset released by IDEA-CCNL in 2022. It has 249 downloads and 42 likes, is released under the cc-by-4.0 license, and is listed in the 10M<n<100M size category.
About IDEA-CCNL/laion2B-multi-chinese-subset
laion2B-multi-chinese-subset
GitHub: Fengshenbang-LM
Docs: Fengshenbang-Docs
Brief Introduction
A subset of Laion2B, a multilingual multimodal dataset, containing only its Chinese-language portion: about 143M image-text pairs in total.
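To make the idea of "keeping only the Chinese portion" of a multilingual image-text collection concrete, the sketch below filters records by the share of CJK ideographs in their captions. This is a swapped-in heuristic for illustration, not the procedure IDEA-CCNL or LAION used to build the subset. The record layout (dicts with hypothetical `url` and `caption` keys), the helper names `cjk_ratio` and `chinese_pairs`, and the 0.5 threshold are all assumptions.

```python
from typing import Iterable, Iterator


def cjk_ratio(text: str) -> float:
    """Fraction of non-whitespace characters in the CJK Unified Ideographs
    block (U+4E00..U+9FFF). Full-width punctuation and Latin letters count
    as non-CJK, so mixed-language captions score lower."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


def chinese_pairs(records: Iterable[dict], min_ratio: float = 0.5) -> Iterator[dict]:
    """Yield records whose (hypothetical) "caption" field is mostly Chinese.

    Assumption: each record is a dict like {"url": ..., "caption": ...};
    these field names are illustrative, not the dataset's actual schema.
    """
    for rec in records:
        if cjk_ratio(rec.get("caption", "")) >= min_ratio:
            yield rec
```

With the default threshold, a caption such as "小狗 dog" (2 of 5 non-space characters are ideographs, ratio 0.4) is dropped, while a fully Chinese caption is kept. A production pipeline would more likely rely on a trained language-identification model or on a language tag already present in the source metadata.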
...
Details
- Task: Feature Extraction
- Language: ZH
- Format: Parquet
- Rows / instances: N/A
- Size: 10M<n<100M
- Creator: IDEA-CCNL
- Year: 2022
- License: cc-by-4.0
- Downloads: 249
- Likes: 42