ai4bharat/samanantar

Text Generation | Translation | EN, AS, BN | cc-by-nc-4.0

Created by ai4bharat in 2022, ai4bharat/samanantar is a text generation and translation dataset in EN, AS, BN containing 49,774,246 records in Parquet format. It has 2.6K downloads and 41 likes. It is released under the cc-by-nc-4.0 license and falls in the 10M<n<100M size category.

About ai4bharat/samanantar

Dataset Card for Samanantar

Dataset Summary

Samanantar is the largest publicly available parallel corpora collection for Indic languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, Telugu. ...

Details

Task: Text Generation, Translation
Language: EN, AS, BN
Format: Parquet
Rows / instances: 49,774,246
Size: 10M<n<100M
Creator: ai4bharat
Year: 2022
License: cc-by-nc-4.0
Downloads: 2,588
Likes: 41
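
The row count, size label, and abbreviated download figure listed above are related by simple arithmetic. The sketch below shows one way to derive the labels from the raw numbers. The function names `size_category` and `compact_count` are hypothetical, and the power-of-ten bucket boundaries are an assumption inferred from the "10M<n<100M" label format rather than something this page specifies.

```python
# Hypothetical helpers: derive the page's display labels from raw counts.
# Bucket boundaries are an assumption based on the "10M<n<100M" label style.
_BOUNDS = [
    (1_000, "1K"),
    (10_000, "10K"),
    (100_000, "100K"),
    (1_000_000, "1M"),
    (10_000_000, "10M"),
    (100_000_000, "100M"),
    (1_000_000_000, "1B"),
]


def size_category(n_rows: int) -> str:
    """Return a size label such as '10M<n<100M' for a row count."""
    if n_rows < 0:
        raise ValueError("row count must be non-negative")
    lower = None  # label of the largest boundary already passed
    for bound, label in _BOUNDS:
        if n_rows < bound:
            return f"n<{label}" if lower is None else f"{lower}<n<{label}"
        lower = label
    return f"n>{lower}"


def compact_count(n: int) -> str:
    """Abbreviate a count to one decimal place with a K or M suffix."""
    if n < 1_000:
        return str(n)
    if n < 1_000_000:
        return f"{n / 1_000:.1f}K"
    return f"{n / 1_000_000:.1f}M"


if __name__ == "__main__":
    print(size_category(49_774_246))  # rows listed on this page
    print(compact_count(2588))        # downloads listed on this page
    print(compact_count(41))          # likes listed on this page
```

With the figures from this page, 49,774,246 rows falls between ten million and one hundred million, and 2,588 downloads rounds to 2.6K, which matches the values shown in the summary.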