allenai/peS2o

Text Generation, Fill Mask, EN, odc-by

allenai/peS2o is an English (EN) dataset for text generation and fill-mask tasks, published by allenai in Parquet format. It is distributed under the odc-by license, falls in the 10B<n<100B size category, and has been downloaded 11.8K times.

About allenai/peS2o

Pretraining Effectively on S2ORC! The peS2o dataset is a collection of ~40M creative open-access academic papers, cleaned, filtered, and formatted for pre-training of language models. It is derived from the Semantic Scholar Open Research Corpus (L...

Details

Task: Text Generation, Fill Mask
Language: EN
Format: Parquet
Rows / instances: N/A
Size: 10B<n<100B
Creator: allenai
Year: 2023
License: odc-by
Downloads: 11767
Likes: 197