AlgorithmicResearchGroup/s2orc_arxiv

Text Generation, Summarization, Feature Extraction, EN, Benchmark

AlgorithmicResearchGroup/s2orc_arxiv is an English-language (EN) benchmark dataset focused on text generation and distributed in Parquet format. It falls in the 1M<n<10M size category and has been downloaded 20.5K times.

This dataset is used as an LLM benchmark.

About AlgorithmicResearchGroup/s2orc_arxiv

S2ORC ArXiv: a subset of the Semantic Scholar Open Research Corpus (S2ORC) filtered to ArXiv papers. It contains 2.58 million parsed scientific papers with full text, abstracts, structured sections, figures, and citation metadata. D...

Details

Task: Text Generation, Summarization, Feature Extraction
Language: EN
Format: Parquet
Rows / instances: N/A
Size: 1M<n<10M
Creator: AlgorithmicResearchGroup
Year: 2026
Downloads: 20501
Likes: 2
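
The details above list the row count as N/A, while the description gives 2.58 million papers and the size category is 1M<n<10M. The sketch below shows one way to check that the two figures agree and to preview a few rows without downloading all Parquet files. It rests on assumptions the listing does not confirm: that the dataset is hosted on the Hugging Face Hub under the repository name shown, that the `datasets` library is installed, and that a split named `train` exists. The helper names (`parse_size_category`, `in_size_category`, `preview_rows`) are hypothetical and chosen only for illustration.

```python
from itertools import islice

# Repository name as shown in the listing.
REPO_ID = "AlgorithmicResearchGroup/s2orc_arxiv"

# Multipliers for the suffixes used in size-category strings such as "1M<n<10M".
_SUFFIX = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000, "T": 1_000_000_000_000}


def _to_int(token: str) -> int:
    """Convert a token such as '10M' or '500' to an integer."""
    token = token.strip()
    if token and token[-1].upper() in _SUFFIX:
        return int(float(token[:-1]) * _SUFFIX[token[-1].upper()])
    return int(token)


def parse_size_category(category: str) -> tuple[int, int]:
    """Parse a 'LOW<n<HIGH' size category into exclusive (low, high) bounds."""
    low, middle, high = category.split("<")
    if middle.strip() != "n":
        raise ValueError(f"unexpected size category: {category!r}")
    return _to_int(low), _to_int(high)


def in_size_category(n_rows: int, category: str) -> bool:
    """Return True if n_rows lies strictly between the category bounds."""
    low, high = parse_size_category(category)
    return low < n_rows < high


def preview_rows(n: int = 3, split: str = "train") -> list:
    """Stream the first n rows instead of downloading the whole dataset.

    Assumptions: Hugging Face Hub hosting and a split named 'train'.
    """
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset

    stream = load_dataset(REPO_ID, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `in_size_category(2_580_000, "1M<n<10M")` confirms that the 2.58 million papers in the description fit the listed size category. `preview_rows()` needs network access; the column names it returns are not documented in this listing, so inspect the keys of the first row before relying on any field.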

FAQ