google-research-datasets/conceptual_captions

Image To Text · EN · Benchmark

google-research-datasets/conceptual_captions is an English image-to-text benchmark dataset that provides 8,675,436 labeled examples in Parquet format. Its license is listed as "other", it falls in the 1M<n<10M size category, and it has been downloaded about 15.5K times.
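To make the size category concrete, here is a minimal sketch of how a row count could be mapped to a bucket label such as 1M<n<10M. The function name `size_category` and the handling of exact boundary values (lower bound inclusive) are assumptions made for illustration; they are not taken from this page or from any library.

```python
def size_category(num_rows: int) -> str:
    """Map a row count to a bucket label such as '1M<n<10M'.

    Hypothetical helper: the bucket labels follow the style used on this
    page, but the inclusive lower bound is an assumption.
    """
    if num_rows < 0:
        raise ValueError("num_rows must be non-negative")
    if num_rows < 1_000:
        return "n<1K"

    # Powers of ten paired with the short label used in bucket names.
    bounds = [
        (1_000, "1K"),
        (10_000, "10K"),
        (100_000, "100K"),
        (1_000_000, "1M"),
        (10_000_000, "10M"),
        (100_000_000, "100M"),
        (1_000_000_000, "1B"),
    ]
    # Walk consecutive pairs of bounds and return the first bucket that fits.
    for (low, low_label), (high, high_label) in zip(bounds, bounds[1:]):
        if low <= num_rows < high:
            return f"{low_label}<n<{high_label}"
    return "n>1B"


if __name__ == "__main__":
    # Row count listed for this dataset.
    print(size_category(8_675_436))
```

With the row count listed here, 8,675,436, the sketch returns the same 1M<n<10M bucket that the page reports.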

This dataset is used as an LLM benchmark.

About google-research-datasets/conceptual_captions

Dataset Card for Conceptual Captions

Dataset Summary

Conceptual Captions is a dataset consisting of ~3.3M images annotated with captions. In contrast with the curated style of other image caption annotations, Conceptual Caption image...

Details

Task: Image To Text
Language: EN
Format: Parquet
Rows / instances: 8,675,436
Size: 1M<n<10M
Creator: google-research-datasets
Year: 2022
License: other
Downloads: 15,489
Likes: 107