
coral-nlp/german-commons

Text Generation | DE | odc-by

The coral-nlp/german-commons dataset is a German (DE) text generation dataset from coral-nlp, distributed in Parquet format under the odc-by license. It falls in the 10M<n<100M size category and has been downloaded about 2K times.

About coral-nlp/german-commons

German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models

A comprehensive collection of German-language text data under open licenses for training German language models. Datasheet: DATASHEET.md. Paper: arxiv.org/a...

Details

Task: Text Generation
Language: DE
Format: Parquet
Rows / instances: N/A
Size: 10M<n<100M
Creator: coral-nlp
Year: 2025
License: odc-by
Downloads: 2014
Likes: 38
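
The size category listed above, 10M<n<100M, is a range label, not an exact row count: it says the number of rows n lies between 10 million and 100 million. The Python sketch below shows one way to turn such a label into numeric bounds. The function name `parse_size_category` and the suffix table are illustrative assumptions, not part of any dataset tooling, and the sketch only handles two-sided labels of the form shown here.

```python
import re

# Assumed meaning of the magnitude suffixes used in size-category labels.
_SUFFIX = {"": 1, "K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}


def parse_size_category(label):
    """Return (lower, upper) row-count bounds for a label such as '10M<n<100M'.

    Hypothetical helper: only two-sided labels '<low><n<<high>' are supported.
    """
    match = re.fullmatch(r"(\d+)([KMBT]?)<n<(\d+)([KMBT]?)", label)
    if match is None:
        raise ValueError(f"unrecognized size category: {label!r}")
    lo_num, lo_suffix, hi_num, hi_suffix = match.groups()
    return int(lo_num) * _SUFFIX[lo_suffix], int(hi_num) * _SUFFIX[hi_suffix]


lower, upper = parse_size_category("10M<n<100M")
print(lower, upper)  # 10000000 100000000
```

Since the page lists "Rows / instances" as N/A, these bounds are the only row-count information available here.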
