NVILA 15B

NVIDIA, Massachusetts Institute of Technology (MIT), University of California (UC) Berkeley, University of California San Diego, University of Washington, Tsinghua University | Visual question answering, Video description, Language modeling/generation, Question answering, Character recognition (OCR) | Open weights

Developed in 2024 by NVIDIA, the Massachusetts Institute of Technology (MIT), the University of California (UC) Berkeley, the University of California San Diego, the University of Washington, and Tsinghua University, NVILA 15B is a visual question answering model with 15 billion parameters and openly available weights.

About NVILA 15B

Visual language models (VLMs) have made significant advances in accuracy in recent years. However, their efficiency has received much less attention. This paper introduces NVILA, a family of open VLMs designed to optimize both efficiency and accuracy.

Details

Provider
NVIDIA, Massachusetts Institute of Technology (MIT), University of California (UC) Berkeley, University of California San Diego, University of Washington, Tsinghua University
Task
Visual question answering, Video description, Language modeling/generation, Question answering, Character recognition (OCR)
Parameters
15 billion (15,000,000,000)
Released
2024-12-05
Open weights
Yes
