deepcs233/Visual-CoT

Image Text To Text | EN | apache-2.0

deepcs233/Visual-CoT is an image-text-to-text dataset in English, distributed in Parquet format under the apache-2.0 license. It has been downloaded about 2.5K times.

About deepcs233/Visual-CoT

VisCoT Dataset Card: There is a shortage of multimodal datasets for training multimodal large language models (MLLMs) that need to identify specific regions in an image for additional attention to improve response performance. This type of...

Details

Task: Image Text To Text
Language: EN
Format: Parquet
Rows / instances: N/A
Creator: deepcs233
Year: 2024
License: apache-2.0
Downloads: 2471
Likes: 63
