CommonCrawl

Text Corpora, Multi-Lingual

CommonCrawl is a multilingual text corpus dataset from the Common Crawl Foundation, with 25B records in WET format.

Details

Task: Text Corpora
Language: Multi-Lingual
Format: WET
Rows / instances: 25B
Creator: Common Crawl Foundation
Year: 2013-2019
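
Since the records are listed as WET, a short sketch of reading that format may help. WET files follow the WARC record layout: a `WARC/1.0` version line, header lines, a blank line, then a plain-text payload of `Content-Length` bytes. The function name `iter_wet_records` and the sample record below are illustrative assumptions, not part of any official tooling.

```python
import io


def iter_wet_records(stream):
    """Yield (headers, text) for each record in an uncompressed WET byte stream.

    Hypothetical helper for illustration; it assumes well-formed records.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.startswith(b"WARC/"):
            continue  # skip the blank separator lines between records
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header block
            name, _, value = line.decode("utf-8", errors="replace").partition(":")
            headers[name.strip()] = value.strip()
        # The payload is exactly Content-Length bytes of extracted text.
        length = int(headers.get("Content-Length", 0))
        body = stream.read(length)
        yield headers, body.decode("utf-8", errors="replace")


# Made-up sample record, only to exercise the parser.
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: conversion\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Length: 11\r\n"
    b"\r\n"
    b"Hello world"
    b"\r\n\r\n"
)

records = list(iter_wet_records(io.BytesIO(sample)))
for headers, text in records:
    print(headers["WARC-Target-URI"], len(text))
```

Published WET files are gzip-compressed, so in practice the stream would likely come from something like `gzip.open(path, "rb")`, where `path` is a placeholder for a locally downloaded file.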
