OpenGVLab/OmniCorpus-CC-210M

Image To Text, Visual Question Answering, EN

OpenGVLab/OmniCorpus-CC-210M is an English image-to-text dataset from OpenGVLab, distributed in Parquet format.

About OpenGVLab/OmniCorpus-CC-210M

🐳 OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

This repository contains 210 million image-text interleaved documents filtered from the OmniCorpus-CC dataset, which was sourced from Common Crawl. Repo...

Details

Task: Image To Text, Visual Question Answering
Language: EN
Format: Parquet
Rows / instances: N/A
Creator: OpenGVLab
Year: 2024