image-text-to-text Models
Our directory lists 80 models for image-text-to-text, a machine-learning task in which a model takes an image together with a text prompt and generates text. Browse the full list below, or explore models by provider.
Updated June 2026
- bartowski/deepreinforce-ai_Ornith-1.0-35B-GGUF | image-text-to-text | bartowski
- datalab-to/chandra-ocr-2 | image-text-to-text | datalab-to
- dealignai/Gemma-4-31B-JANG_4M-CRACK | image-text-to-text | dealignai
- docling-project/SmolDocling-256M-preview | image-text-to-text | docling-project
- empero-ai/Qwythos-9B-Claude-Mythos-5-1M-GGUF | image-text-to-text | empero-ai
- google/gemma-3-4b-it | image-text-to-text | google
- HauhauCS/Gemma4-12B-QAT-Uncensored-HauhauCS-Balanced | image-text-to-text | HauhauCS
- HauhauCS/Gemma4-31B-QAT-Uncensored-HauhauCS-Balanced-MTP | image-text-to-text | HauhauCS
- Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled | image-text-to-text | Jackrong
- HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive | image-text-to-text | HauhauCS
- Jackrong/Qwopus3.6-27B-v2-MTP-GGUF | image-text-to-text | Jackrong
- unsloth/Qwen3.6-35B-A3B-GGUF | image-text-to-text | unsloth
- meta-llama/Llama-4-Scout-17B-16E-Instruct | image-text-to-text | meta-llama
- google/gemma-4-26B-A4B-it | image-text-to-text | google
- google/gemma-4-31B-it | image-text-to-text | google
- Qwen/Qwen2.5-VL-7B-Instruct | image-text-to-text | Qwen
- Qwen/Qwen3.5-9B | image-text-to-text | Qwen
- Qwen/Qwen3.5-4B | image-text-to-text | Qwen
- Qwen/Qwen3.6-35B-A3B-FP8 | image-text-to-text | Qwen
- Qwen/Qwen3.6-35B-A3B | image-text-to-text | Qwen
- Qwen/Qwen3.6-27B | image-text-to-text | Qwen
- Qwen/Qwen3-VL-32B-Instruct | image-text-to-text | Qwen
- Qwen/Qwen3-VL-8B-Instruct | image-text-to-text | Qwen
- Qwen/Qwen3.6-27B-FP8 | image-text-to-text | Qwen
- cyankiwi/gemma-4-26B-A4B-it-AWQ-4bit | image-text-to-text | cyankiwi
- Qwen/Qwen2-VL-2B-Instruct | image-text-to-text | Qwen
- HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive | image-text-to-text | HauhauCS
- zai-org/GLM-OCR | image-text-to-text | zai-org
- deepseek-ai/DeepSeek-OCR-2 | image-text-to-text | deepseek-ai
- llava-hf/llava-1.5-7b-hf | image-text-to-text | llava-hf
- microsoft/Florence-2-base | image-text-to-text | microsoft
- Qwen/Qwen3.5-27B | image-text-to-text | Qwen
- Qwen/Qwen3.5-0.8B | image-text-to-text | Qwen
- moonshotai/Kimi-K2.6 | image-text-to-text | moonshotai
- deepseek-ai/DeepSeek-OCR | image-text-to-text | deepseek-ai
- microsoft/Florence-2-large | image-text-to-text | microsoft
- DavidAU/Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-NEO-CODE-Di-IMatrix-MAX-GGUF | image-text-to-text | DavidAU
- Qwen/Qwen3.5-397B-A17B | image-text-to-text | Qwen
- datalab-to/surya-ocr-2 | image-text-to-text | datalab-to
- DavidAU/Qwen3.6-27B-Heretic-Uncensored-FINETUNE-NEO-CODE-Di-IMatrix-MAX-GGUF | image-text-to-text | DavidAU
- rednote-hilab/dots.ocr | image-text-to-text | rednote-hilab
- baidu/Unlimited-OCR | image-text-to-text | baidu
- Jackrong/Qwopus3.6-27B-Coder-Compat-MTP-GGUF | image-text-to-text | Jackrong
- meta-llama/Llama-3.2-11B-Vision-Instruct | image-text-to-text | meta-llama
- PaddlePaddle/PaddleOCR-VL-1.6 | image-text-to-text | PaddlePaddle
- empero-ai/Qwable-9B-Claude-Fable-5-GGUF | image-text-to-text | empero-ai
- nanonets/Nanonets-OCR-s | image-text-to-text | nanonets
- datalab-to/lift | image-text-to-text | datalab-to
- infly/Infinity-Parser2-Pro | image-text-to-text | infly
- microsoft/OmniParser | image-text-to-text | microsoft
- Gryphe/Qwen3.6-35B-A3B-StyleTune | image-text-to-text | Gryphe
- ValiantLabs/Qwen3.6-27B-Esper4 | image-text-to-text | ValiantLabs
- google/gemma-3n-E4B-it-litert-preview | image-text-to-text | google
- Qwen/Qwen2.5-VL-3B-Instruct | image-text-to-text | Qwen
- Qwen/Qwen3-VL-4B-Instruct | image-text-to-text | Qwen
- Qwen/Qwen3-VL-2B-Instruct | image-text-to-text | Qwen
- moonshotai/Kimi-K2.5 | image-text-to-text | moonshotai
- nvidia/LocateAnything-3B | image-text-to-text | nvidia
- google/gemma-3-27b-it | image-text-to-text | google
- PaddlePaddle/PaddleOCR-VL | image-text-to-text | PaddlePaddle
- stepfun-ai/GOT-OCR2_0 | image-text-to-text | stepfun-ai
- Qwen/Qwen3.5-35B-A3B | image-text-to-text | Qwen
- openbmb/MiniCPM-Llama3-V-2_5 | image-text-to-text | openbmb
- vikhyatk/moondream2 | image-text-to-text | vikhyatk
- Qwen/Qwen2-VL-7B-Instruct | image-text-to-text | Qwen
- google/diffusiongemma-26B-A4B-it | image-text-to-text | google
- unsloth/Qwen3.6-27B-MTP-GGUF | image-text-to-text | unsloth
- moonshotai/Kimi-K2.7-Code | image-text-to-text | moonshotai
- MiniMaxAI/MiniMax-M3 | image-text-to-text | MiniMaxAI
- sahilchachra/Unlimited-OCR-GGUF | image-text-to-text | sahilchachra
- HauhauCS/Gemma4-26B-A4B-QAT-Uncensored-HauhauCS-Balanced-MTP | image-text-to-text | HauhauCS
- unsloth/gemma-4-12b-it-GGUF | image-text-to-text | unsloth
- unsloth/gemma-4-26B-A4B-it-qat-GGUF | image-text-to-text | unsloth
- unsloth/Qwen3.6-35B-A3B-MTP-GGUF | image-text-to-text | unsloth
- unsloth/gemma-4-26B-A4B-it-GGUF | image-text-to-text | unsloth
- HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive | image-text-to-text | HauhauCS
- unsloth/Qwen3.5-9B-GGUF | image-text-to-text | unsloth
- unsloth/Qwen3.6-27B-GGUF | image-text-to-text | unsloth
- unsloth/diffusiongemma-26B-A4B-it-GGUF | image-text-to-text | unsloth
- unsloth/Qwen3.5-9B-MTP-GGUF | image-text-to-text | unsloth