Image To Text Datasets
Image To Text is the task of generating textual descriptions or captions from images. Our directory catalogs 18 image to text datasets, each linked to its source, paper, and download.
Updated June 2026
- 5CD-AI/Viet-Handwriting-OCR-v2: Image To Text (VI)
- CLIPAMharic/AmharicCLIP-annotation: Image To Text (AM, EN)
- jackyhate/text-to-image-2M: Text To Image, Image To Text, Image Classification (EN)
- kakaobrain/coyo-700m: Text To Image, Image To Text, Zero Shot Classification (EN)
- pixparse/pdfa-eng-wds: Image To Text (EN)
- danielnobbe/mr-rate: Image To Text, Text To Image, Image Classification (EN)
- ProGamerGov/synthetic-dataset-1m-dalle3-high-quality-captions: Text To Image, Image Classification, Image To Text, Image Text To Text, Other (EN)
- tomg-group-umd/pixelprose: Image To Text, Text To Image, Visual Question Answering (EN)
- nvidia/Llama-Nemotron-VLM-Dataset-v1: Visual Question Answering, Image Text To Text, Image To Text (EN)
- poloclub/diffusiondb: Text To Image, Image To Text (EN)
- zzliang/GRIT: Text To Image, Image To Text, Object Detection, Zero Shot Classification (EN)
- MMInstruction/M3IT: Image To Text, Image Classification (EN, ZH)
- wanng/midjourney-v5-202304-clean: Text To Image, Image To Text (EN, FR)
- ShadenA/MathNet: Question Answering, Text Generation, Image To Text (EN, PT, ES)
- ranjaykrishna/visual_genome: Image To Text, Object Detection, Visual Question Answering (EN)
- KBlueLeaf/danbooru2023-metadata-database: Image Classification, Text To Image, Image To Text, Image To Image, Text Retrieval, Text Generation, Text Classification (EN, JA)
- ScienceOne-AI/S1-MMAlign: Image To Text, Visual Question Answering, Feature Extraction (EN)
- bevaya/pubmed-ocr: Image To Text, Image Text To Text (EN)
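To narrow the list offline by language or by additional task tag, each entry can be treated as a record of dataset ID, task tags, and language codes. The following is a minimal sketch, assuming a hypothetical `Entry` record type and a hypothetical `filter_catalog` helper; it holds only a hand-copied subset of the entries above, not the full list of 18, and it does not fetch anything from the dataset hosts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    # Hypothetical record type: dataset ID, task tags, and language codes as listed above.
    dataset_id: str
    tasks: tuple
    languages: tuple


# Hand-copied subset of the catalog (illustrative only, not all 18 entries).
CATALOG = [
    Entry("5CD-AI/Viet-Handwriting-OCR-v2", ("Image To Text",), ("VI",)),
    Entry("CLIPAMharic/AmharicCLIP-annotation", ("Image To Text",), ("AM", "EN")),
    Entry("pixparse/pdfa-eng-wds", ("Image To Text",), ("EN",)),
    Entry("MMInstruction/M3IT", ("Image To Text", "Image Classification"), ("EN", "ZH")),
    Entry("wanng/midjourney-v5-202304-clean", ("Text To Image", "Image To Text"), ("EN", "FR")),
]


def filter_catalog(entries, language: Optional[str] = None, task: Optional[str] = None):
    """Hypothetical helper: return dataset IDs matching a language code and/or task tag.

    A filter left as None is not applied; entries keep their catalog order.
    """
    return [
        e.dataset_id
        for e in entries
        if (language is None or language in e.languages)
        and (task is None or task in e.tasks)
    ]


if __name__ == "__main__":
    # Datasets tagged with Vietnamese.
    print(filter_catalog(CATALOG, language="VI"))
    # Datasets that also carry the Image Classification tag.
    print(filter_catalog(CATALOG, task="Image Classification"))
```

Both filters are optional and combine with AND, so `filter_catalog(CATALOG, language="FR", task="Text To Image")` narrows the subset to the single Midjourney entry.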