LLaVA-OV-72B

ByteDance, Nanyang Technological University, Chinese University of Hong Kong (CUHK), Hong Kong University of Science and Technology (HKUST) | Image captioning, Visual question answering, Video description, Object recognition, Action recognition, Language modeling/generation | Open weights

LLaVA-OV-72B is an image captioning model published by ByteDance, Nanyang Technological University, Chinese University of Hong Kong (CUHK), and Hong Kong University of Science and Technology (HKUST) in 2024, featuring 72 billion parameters.

About LLaVA-OV-72B

We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed by consolidating our insights into data, models, and visual representations in the LLaVA-NeXT blog series. Our experimental results demonstrate that LLaVA-OneVision

Details

Provider
ByteDance, Nanyang Technological University, Chinese University of Hong Kong (CUHK), Hong Kong University of Science and Technology (HKUST)
Task
Image captioning, Visual question answering, Video description, Object recognition, Action recognition, Language modeling/generation
Parameters
72,000,000,000 (72B)
Released
2024-08-06
Open weights
Yes