CLIRMatrix
Information Retrieval, Multi-Lingual
Created by Sun et al. in 2020, CLIRMatrix is a multilingual information retrieval dataset.
About CLIRMatrix
CLIRMatrix is a collection of bilingual and multilingual datasets for Cross-Lingual Information Retrieval, extracted automatically from Wikipedia. It comprises (1) BI-139, a bilingual dataset of queries in one language matched with relevant documents in another language for 19,182 language pairs, and (2) MULTI-8, a multilingual dataset of queries and documents jointly aligned in 8 different languages. In total, 49 million unique queries and 34 billion (query, document, label) triplets were mined.
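The count of 19,182 language pairs can be checked with a little arithmetic. The sketch below assumes that BI-139 covers 139 languages (inferred from its name) and that each ordered (query language, document language) combination of two different languages counts as one pair. The function name `count_ordered_pairs` is hypothetical and only used for illustration.

```python
def count_ordered_pairs(num_languages: int) -> int:
    """Count ordered (query_lang, doc_lang) pairs with distinct languages."""
    # Each language can be the query language, paired with every other
    # language as the document language: n * (n - 1).
    return num_languages * (num_languages - 1)


# Assumption: 139 languages, inferred from the name BI-139.
NUM_LANGUAGES = 139

num_pairs = count_ordered_pairs(NUM_LANGUAGES)
print(num_pairs)  # 19182
```

Under these assumptions the result matches the 19,182 pairs stated above, which suggests the pairs are directional: querying in language A for documents in language B is counted separately from the reverse.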
Details
- Task: Information Retrieval
- Language: Multi-Lingual
- Format: n/a
- Rows / instances: n/a
- Creator: Sun et al.
- Year: 2020