
apollo-research/Skylion007-openwebtext-tokenizer-gpt2

General NLP, English, Benchmark

apollo-research/Skylion007-openwebtext-tokenizer-gpt2 is an English general NLP benchmark dataset from apollo-research with 8,824,092 records in Parquet format. It falls in the 1M<n<10M size category and has been downloaded 17.6K times.

📊 This dataset is used as an LLM benchmark.

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: 8,824,092
Size: 1M<n<10M
Creator: apollo-research
Year: 2024
Downloads: 17,588
Likes: 3
