defunct-datasets/the_pile_books3

Text Generation, Fill Mask, EN, mit

defunct-datasets/the_pile_books3 is an English (EN) text generation dataset that provides 196,639 examples in Parquet format. It is distributed under the mit license, falls in the 100K<n<1M size category, and has been downloaded 245 times.
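As a quick check on the size category, the sketch below maps a row count to a bucket label. The helper name `size_category` is hypothetical, and the bucket list is an assumption modeled on the `100K<n<1M` label shown on this page. It is simplified at the top end, so it is not an authoritative list of categories.

```python
def size_category(num_rows: int) -> str:
    """Return a size-bucket label for a row count (hypothetical helper)."""
    # (exclusive upper bound, label) pairs. The labels are an assumption
    # modeled on the "100K<n<1M" category listed for this dataset.
    buckets = [
        (1_000, "n<1K"),
        (10_000, "1K<n<10K"),
        (100_000, "10K<n<100K"),
        (1_000_000, "100K<n<1M"),
        (10_000_000, "1M<n<10M"),
    ]
    for upper, label in buckets:
        if num_rows < upper:
            return label
    # Simplification: everything larger is lumped into one bucket.
    return "n>10M"


print(size_category(196_639))  # prints "100K<n<1M"
```

The 196,639 rows reported here land in the `100K<n<1M` bucket, which matches the listed size category.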

About defunct-datasets/the_pile_books3

This dataset is Shawn Presser's work and is part of EleutherAI's The Pile dataset. It contains all of Bibliotik in plain .txt form, i.e. about 197,000 books processed in exactly the same way as was done for bookcorpusopen (a.k.a. books1). seems to be ...

Details

Task: Text Generation, Fill Mask
Language: EN
Format: Parquet
Rows / instances: 196,639
Size: 100K<n<1M
Creator: defunct-datasets
Year: 2022
License: mit
Downloads: 245
Likes: 152
