AudioSet

Speech Recognition, Visual, Multi-Lingual

AudioSet is a multilingual, speech recognition-focused dataset distributed in CSV and TFR formats.

About AudioSet

The dataset consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos.
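
As a rough illustration of working with the CSV distribution, the sketch below parses a list of labeled clips. It assumes each row holds a YouTube video ID, a start and end time in seconds, and a quoted, comma-separated list of label IDs, with header lines starting with `#`. That layout, the sample rows, and the `read_segments` helper are assumptions made for this example and are not taken from this page.

```python
import csv
import io

# Hypothetical sample in the assumed layout; these are not real AudioSet rows.
SAMPLE = '''# header comment lines start with "#"
# YTID, start_seconds, end_seconds, positive_labels
vid_0000001, 30.000, 40.000, "/m/aaa,/m/bbb"
vid_0000002, 0.000, 10.000, "/m/ccc"
'''


def read_segments(handle):
    """Parse clip rows into dicts, skipping blank and '#' comment lines."""
    rows = (line for line in handle
            if line.strip() and not line.startswith("#"))
    segments = []
    # skipinitialspace lets the quoted label field follow ", " cleanly.
    for ytid, start, end, labels in csv.reader(rows, skipinitialspace=True):
        segments.append({
            "ytid": ytid,
            "start": float(start),
            "end": float(end),
            "labels": labels.split(","),
        })
    return segments


segments = read_segments(io.StringIO(SAMPLE))
for seg in segments:
    print(seg["ytid"], seg["end"] - seg["start"], seg["labels"])
```

To read a real file, pass an open file handle instead of the in-memory sample; the column layout should be checked against the downloaded data first.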

Details

Task: Speech Recognition, Visual
Language: Multi-Lingual
Format: CSV, TFR
Rows / instances: n/a
Creator: Google
Year: 2017