Visual Storytelling Dataset (VIST)
Multi-Modal Learning, English
Visual Storytelling Dataset (VIST) is an English-language dataset for multi-modal learning that provides 81,743 labeled examples, distributed in JSON format.
About Visual Storytelling Dataset (VIST)
The dataset contains 81,743 unique photos in 20,211 sequences, aligned to both descriptive and story language. VIST was previously known as "SIND", the Sequential Image Narrative Dataset.
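As a rough illustration of what "photos in sequences, aligned to story language" can look like in practice, the sketch below groups JSON records into ordered photo sequences. The field names (`annotations`, `story_id`, `photo_id`, `order`, `text`) and the sample records are assumptions made up for this example; they are not a confirmed description of the VIST JSON schema, so check the actual files before adapting it.

```python
import json
from collections import defaultdict

# Hypothetical sample: these field names and records are assumptions for
# illustration only, not the confirmed VIST JSON schema.
SAMPLE_JSON = """
{
  "annotations": [
    {"story_id": "s1", "photo_id": "p2", "order": 1, "text": "We reached the beach."},
    {"story_id": "s1", "photo_id": "p1", "order": 0, "text": "The trip began early."},
    {"story_id": "s2", "photo_id": "p3", "order": 0, "text": "The party started."}
  ]
}
"""


def group_into_sequences(raw_json):
    """Return {story_id: [(photo_id, text), ...]} sorted by position in the story."""
    data = json.loads(raw_json)

    # Collect every record under the story (sequence) it belongs to.
    grouped = defaultdict(list)
    for entry in data["annotations"]:
        grouped[entry["story_id"]].append(entry)

    # Order the photos inside each sequence and keep only the aligned pair.
    sequences = {}
    for story_id, entries in grouped.items():
        entries.sort(key=lambda e: e["order"])
        sequences[story_id] = [(e["photo_id"], e["text"]) for e in entries]
    return sequences


sequences = group_into_sequences(SAMPLE_JSON)
for story_id, items in sequences.items():
    print(story_id, items)
```

Keying on a sequence identifier and sorting by an explicit position field keeps each photo paired with its sentence even when the records are stored out of order.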
Details
- Task: Multi-Modal Learning
- Language: English
- Format: JSON
- Rows / instances: 81,743
- Creator: Huang et al.
- Year: 2016