OpenWebTextCorpus
The OpenWebTextCorpus dataset is an English text corpus released by Gokaslan et al. in 2019, comprising 8,013,769 examples.
About OpenWebTextCorpus
The dataset contains the text of millions of web pages collected from URLs shared on Reddit, totalling 38 GB of text data.
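To get a feel for the scale, the two figures on this card (38 GB of text and 8,013,769 examples) can be combined into a rough average size per example. The sketch below is a back-of-envelope estimate under stated assumptions: it treats "38 GB" as 38 × 10^9 bytes of uncompressed text and assumes each example is one web page. Neither assumption is confirmed by the card, and the helper name `average_bytes_per_example` is hypothetical.

```python
# Back-of-envelope estimate of average example size in OpenWebTextCorpus.
# Assumptions (not stated on the dataset card): "38 GB" means 38 * 10**9 bytes
# of uncompressed text, and each of the 8,013,769 examples is one web page.
TOTAL_BYTES = 38 * 10**9
NUM_EXAMPLES = 8_013_769


def average_bytes_per_example(total_bytes: int, num_examples: int) -> float:
    """Return the mean number of bytes of text per example (hypothetical helper)."""
    if num_examples <= 0:
        raise ValueError("num_examples must be positive")
    return total_bytes / num_examples


avg = average_bytes_per_example(TOTAL_BYTES, NUM_EXAMPLES)
print(f"~{avg:,.0f} bytes (~{avg / 1000:.1f} kB) per example")
# prints "~4,742 bytes (~4.7 kB) per example"
```

This is only a mean; individual pages will vary widely in length, so it says little about the distribution of document sizes.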
Details
- Task: Text Corpora
- Language: English
- Format: n/a
- Rows / instances: 8,013,769
- Creator: Gokaslan et al.
- Year: 2019