Question 1

What is the Semantic Scholar Open Research Corpus (S2ORC) dataset?

Accepted Answer

S2ORC is a dataset of 136M+ paper nodes, of which 12.7M+ are full-text papers, connected by 467M+ citation edges.
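The node-and-edge structure described above can be pictured as a directed graph, where each edge points from a citing paper to the paper it cites. The following is a minimal sketch of that idea only: the paper IDs, the edge list, and the helper name build_citation_graph are hypothetical and do not reflect the actual S2ORC schema or file format.

```python
from collections import defaultdict

# Hypothetical toy data: these IDs and edges are illustrative only,
# not real S2ORC records or field names.
citation_edges = [
    ("paper_a", "paper_b"),  # paper_a cites paper_b
    ("paper_a", "paper_c"),
    ("paper_b", "paper_c"),
]


def build_citation_graph(edges):
    """Build outbound and inbound adjacency maps for a directed citation graph."""
    outbound = defaultdict(list)  # paper -> papers it cites
    inbound = defaultdict(list)   # paper -> papers that cite it
    for citing, cited in edges:
        outbound[citing].append(cited)
        inbound[cited].append(citing)
    return outbound, inbound


outbound, inbound = build_citation_graph(citation_edges)
```

Keeping both directions makes it cheap to ask either "what does this paper cite?" or "who cites this paper?" without rescanning the edge list.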

Question 2

Is The Semantic Scholar Open Research Corpus (S2ORC) a benchmark?

Accepted Answer

Yes. The Semantic Scholar Open Research Corpus (S2ORC) is used as an LLM benchmark. See the model leaderboards in the Benchmarks section.

Question 3

Where can I download The Semantic Scholar Open Research Corpus (S2ORC)?

Accepted Answer

The Semantic Scholar Open Research Corpus (S2ORC) is available at its source: https://github.com/allenai/s2orc/.

The Semantic Scholar Open Research Corpus (S2ORC)

About The Semantic Scholar Open Research Corpus (S2ORC)