VILA1.5-13B
The VILA1.5-13B model is an open-weights chat model from NVIDIA and the Massachusetts Institute of Technology (MIT) with 13,493,916,736 parameters (about 13.5 billion), built with the transformers library. It has 136 downloads and 5 likes. It is distributed under the cc-by-nc-4.0 license.
About VILA1.5-13B
Visual language models (VLMs) have progressed rapidly with the recent success of large language models. There have been growing efforts on visual instruction tuning to extend the LLM with visual inputs, but the field lacks an in-depth study of the visual language p
Details
- Provider: NVIDIA, Massachusetts Institute of Technology (MIT)
- Task: Chat, Visual question answering, Image captioning, Language modeling/generation, Question answering
- Parameters: 13,493,916,736
- Library: transformers
- License: cc-by-nc-4.0
- Released: 2024-05-03
- Open weights: Yes
- Downloads: 136
- Likes: 5
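The parameter count listed above gives a rough sense of how much memory the weights alone would occupy. The following is a minimal sketch, not taken from the model's documentation: the helper `weight_memory_gib` and the table of numeric formats are hypothetical names introduced here, and the estimate assumes weights only, ignoring activations, the KV cache, the vision encoder's runtime buffers, and framework overhead.

```python
# Parameter count as listed on this page.
PARAM_COUNT = 13_493_916_736

# Bytes per parameter for some common numeric formats.
# Assumption: these are generic storage sizes, not formats the
# model is confirmed to ship in.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gib(param_count: int, bytes_per_param: float) -> float:
    """Return the approximate size of the weights in GiB (2**30 bytes)."""
    return param_count * bytes_per_param / 2**30


if __name__ == "__main__":
    for dtype, nbytes in BYTES_PER_PARAM.items():
        size = weight_memory_gib(PARAM_COUNT, nbytes)
        print(f"{dtype:>8}: {size:6.1f} GiB")
```

Under these assumptions, the weights come to roughly 25 GiB in 16-bit precision and about twice that in 32-bit precision; actual requirements at inference time will be higher.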