Image Text To Text Datasets
Image Text To Text is a machine-learning task covered in our directory, which catalogs 14 datasets for it. Each entry links to its source, paper, and download. Browse the full list below or filter by language.
Updated June 2026
- mvp-lab/LLaVA-OneVision-2-Data | Video Text To Text, Visual Question Answering, Image Text To Text | English
- cua-lite/ScaleCUA | Image Text To Text | English
- ProGamerGov/synthetic-dataset-1m-dalle3-high-quality-captions | Text To Image, Image Classification, Image To Text, Image Text To Text, Other | English
- nvidia/Llama-Nemotron-VLM-Dataset-v1 | Visual Question Answering, Image Text To Text, Image To Text | English
- vyokky/GUI-360 | Image Text To Text | English
- Xkev/LLaVA-CoT-100k | Visual Question Answering, Image Text To Text | English
- nvidia/Nemotron-VLM-Dataset-v2 | Visual Question Answering, Image Text To Text, Video Text To Text | English
- xlangai/AgentNet | Image Text To Text | English
- AudioVisual-Caption/ASID-1M | Image Text To Text | English
- nvidia/Nemotron-Image-Training-v3 | Visual Question Answering, Image Text To Text | English
- VLR-CVC/DocVQA-2026 | Visual Question Answering, Document Question Answering, Image Text To Text, Question Answering | English
- bevaya/pubmed-ocr | Image To Text, Image Text To Text | English
- multimodal-reasoning-lab/Zebra-CoT | Any To Any, Image Text To Text, Visual Question Answering | English
- spatialverse/SAGE-3D_VLN_Data | Robotics, Image Text To Text | English