
Aesthetics Text Corpus


Aesthetics Text Corpus is a Hindi text corpus from Venugopal et al. with 978 records in text format.

About Aesthetics Text Corpus

The dataset consists of novels and short stories written in Hindi. The texts were scraped from three sources: http://hindisamay.com; http://premchand.co.in, a website dedicated to the stories of the popular novelist Premchand; and the Bhandarkar Oriental Research Institute's Digital Library (http://borilib.com). As a preprocessing step, the text was split into sentences, and special characters, English tokens, and Latin numerals were removed.
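The preprocessing described above can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `preprocess`, the choice of sentence delimiters (danda, double danda, question mark, exclamation mark), and the rule of keeping only characters in the Devanagari Unicode block are all assumptions made for illustration.

```python
import re

# Assumption: a sentence ends at a danda (U+0964), double danda (U+0965),
# question mark, or exclamation mark. The original pipeline may differ.
SENTENCE_END = re.compile("[\u0964\u0965?!]+")

# Assumption: anything outside the Devanagari block (U+0900-U+097F) that is
# not whitespace counts as a special character, English token, or Latin numeral.
NON_DEVANAGARI = re.compile("[^\u0900-\u097F\\s]")


def preprocess(text):
    """Split text into sentences and strip non-Devanagari characters."""
    sentences = []
    for raw in SENTENCE_END.split(text):
        # Replace unwanted characters with a space so neighbouring words stay apart.
        cleaned = NON_DEVANAGARI.sub(" ", raw)
        # Collapse runs of whitespace left behind by the removal.
        cleaned = " ".join(cleaned.split())
        if cleaned:
            sentences.append(cleaned)
    return sentences


if __name__ == "__main__":
    sample = "यह 1 किताब है। This is Hindi, है ना? हाँ!"
    for sentence in preprocess(sample):
        print(sentence)
```

Note that this sketch keeps Devanagari digits, since the description only mentions removing Latin numerals, and that it would also drop zero-width joiners, which a production pipeline might want to preserve.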

Details

Task: Text Corpora
Language: Hindi
Format: Text
Rows / instances: 978
Creator: Venugopal et al.
Year: 2019