Aesthetics Text Corpus
Aesthetics Text Corpus is a text corpus dataset in Hindi from Venugopal et al., with 978 records in text format.
About Aesthetics Text Corpus
The dataset consists of novels and short stories written in Hindi. They were scraped from http://hindisamay.com; http://premchand.co.in, a website dedicated to the stories of the popular novelist Premchand; and the Bhandarkar Oriental Research Institute's Digital Library (http://borilib.com). As a preprocessing step, the text was split into sentences, and special characters, English tokens, and Latin numbers were removed.
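The preprocessing described above can be approximated as follows. This is a minimal sketch under stated assumptions, not the creators' actual pipeline, which the source does not specify. It assumes that sentences end with the Devanagari danda (।), double danda (॥), "?" or "!"; that an "English token" is any whitespace-delimited token containing a Latin letter; that "Latin numbers" are the ASCII digits 0-9 (Devanagari digits are kept); and that a "special character" is anything outside the Devanagari Unicode block other than whitespace, "?" and "!". The function name `preprocess` is hypothetical.

```python
import re

# Assumption: a sentence boundary is whitespace following ।, ॥, ? or !
SENTENCE_END = re.compile(r"(?<=[।॥?!])\s+")
# Assumption: any token containing a Latin letter counts as an "English token"
LATIN_TOKEN = re.compile(r"\S*[A-Za-z]\S*")
# Assumption: "Latin numbers" means ASCII digits; Devanagari digits (०-९) are kept
LATIN_DIGITS = re.compile(r"[0-9]+")
# Assumption: special characters are anything outside the Devanagari block
# (U+0900-U+097F, which includes the dandas) except whitespace, ? and !
SPECIAL = re.compile(r"[^\u0900-\u097F\s?!]")


def preprocess(text: str) -> list[str]:
    """Hypothetical helper: split Hindi text into cleaned sentences."""
    sentences = []
    for raw in SENTENCE_END.split(text.strip()):
        s = LATIN_TOKEN.sub(" ", raw)   # drop English tokens
        s = LATIN_DIGITS.sub(" ", s)    # drop Latin (ASCII) numbers
        s = SPECIAL.sub(" ", s)         # drop remaining special characters
        s = " ".join(s.split())         # collapse leftover whitespace
        if s:                           # skip sentences that became empty
            sentences.append(s)
    return sentences


if __name__ == "__main__":
    sample = "राम ने 3 आम खाए। Hello वह घर गया! \"क्या\" तुम आओगे?"
    for sentence in preprocess(sample):
        print(sentence)
```

With the sample input, the digit "3", the token "Hello" and the quotation marks are removed, leaving three Hindi sentences. Removing characters by replacing them with a space and then collapsing whitespace avoids accidentally fusing neighbouring words.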
Details
- Task: Text Corpora
- Language: Hindi
- Format: Text
- Rows / instances: 978
- Creator: Venugopal et al.
- Year: 2019