CLIRMatrix

Information Retrieval, Multi-Lingual

Created by Sun et al. in 2020, CLIRMatrix is a multilingual information retrieval dataset.

About CLIRMatrix

CLIRMatrix is a collection of bilingual and multilingual datasets for Cross-Lingual Information Retrieval, extracted automatically from Wikipedia. It comprises (1) BI-139, a bilingual dataset of queries in one language matched with relevant documents in another language for 19,182 language pairs, and (2) MULTI-8, a multilingual dataset of queries and documents jointly aligned in 8 different languages. In total, 49 million unique queries and 34 billion (query, document, label) triplets were mined.
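
As a rough illustration of the figures above, the sketch below checks where 19,182 comes from and shows one possible shape for a (query, document, label) triplet. It assumes BI-139 covers 139 languages (as its name suggests) and that each ordered pair of distinct languages counts once. The `Triplet` class and its field names are hypothetical and do not reflect the dataset's actual file format.

```python
from dataclasses import dataclass

# Assumption: BI-139 spans 139 languages, as the name suggests.
NUM_LANGUAGES = 139

# Ordered (query language, document language) pairs with distinct languages.
num_pairs = NUM_LANGUAGES * (NUM_LANGUAGES - 1)
print(num_pairs)  # 19182


@dataclass(frozen=True)
class Triplet:
    """Hypothetical container for one mined (query, document, label) record."""
    query: str    # query text in the source language
    doc_id: str   # identifier of a document in the target language
    label: int    # relevance label (integer type assumed for illustration)


# Made-up example values, only to show the structure.
example = Triplet(query="information retrieval", doc_id="doc-0001", label=3)
```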

Details

Task
Information Retrieval
Language
Multi-Lingual
Format
n/a
Rows / instances
n/a
Creator
Sun et al.
Year
2020
