neuralwork/arxiver

neuralwork/arxiver is an English-language general NLP dataset distributed in Parquet format.

About neuralwork/arxiver

Arxiver Dataset

Arxiver consists of 63,357 arXiv papers converted to multi-markdown (.mmd) format. Our dataset includes original arXiv article IDs, titles, abstracts, authors, publication dates, URLs and corresponding markdown files published b...
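
As a rough illustration of how the fields listed above might be used, the following Python sketch loads the dataset through the Hugging Face `datasets` library and filters records by title. The repository ID comes from this page. The split name `train` and the column names `id` and `title` are assumptions that should be checked against the actual schema, and `load_arxiver` and `filter_by_title` are hypothetical helper names.

```python
from typing import Iterable, Iterator


def load_arxiver(split: str = "train"):
    """Load neuralwork/arxiver from the Hugging Face Hub (needs network access).

    The split name "train" is an assumption; check the dataset page for the
    splits that actually exist.
    """
    # Imported lazily so filter_by_title works without the `datasets` package.
    from datasets import load_dataset

    return load_dataset("neuralwork/arxiver", split=split)


def filter_by_title(records: Iterable[dict], keyword: str) -> Iterator[dict]:
    """Yield records whose (assumed) `title` field contains `keyword`.

    The match is case-insensitive; records without a title are skipped.
    """
    needle = keyword.lower()
    for record in records:
        if needle in str(record.get("title", "")).lower():
            yield record
```

With network access, something like `next(filter_by_title(load_arxiver(), "transformer"))` would return the first matching record, provided the assumed split and column names hold. Keeping the filter independent of the loader means it can also be tried on a small list of plain dictionaries.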

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Creator: neuralwork
Year: 2024