princeton-nlp/SWE-bench_Verified
General NLP · English · Benchmark
The princeton-nlp/SWE-bench_Verified dataset is an English General NLP resource from princeton-nlp (2026) comprising 500 examples.
This dataset is used as an LLM benchmark.
About princeton-nlp/SWE-bench_Verified
Dataset Summary
SWE-bench Verified is a subset of 500 samples from the SWE-bench test set that have been human-validated for quality. SWE-bench is a dataset that tests systems' ability to solve GitHub issues automatically. See this post for more...
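As a rough illustration of what "solving a GitHub issue" means when scoring this benchmark, the sketch below assumes the usual SWE-bench harness convention. After a system's patch is applied, an instance counts as resolved only if all of its FAIL_TO_PASS tests (which failed before the fix) and all of its PASS_TO_PASS tests (which already passed) now pass. The headline score is the percentage of the 500 instances resolved. The function names `is_resolved` and `resolved_rate` and the dict-of-test-outcomes input are hypothetical; this is not the official evaluation harness.

```python
from typing import Iterable, Mapping

# Size of SWE-bench Verified, as stated on this card.
TOTAL_INSTANCES = 500


def is_resolved(fail_to_pass: Mapping[str, bool], pass_to_pass: Mapping[str, bool]) -> bool:
    """Hypothetical helper: an instance is resolved only if every listed test passes
    after the patch, covering both previously failing tests (FAIL_TO_PASS) and
    previously passing tests that must not regress (PASS_TO_PASS).
    An empty mapping counts as 'all passed'."""
    return all(fail_to_pass.values()) and all(pass_to_pass.values())


def resolved_rate(outcomes: Iterable[bool], total: int = TOTAL_INSTANCES) -> float:
    """Hypothetical helper: percentage of benchmark instances resolved.
    Instances missing from `outcomes` are treated as unresolved."""
    return 100.0 * sum(outcomes) / total


if __name__ == "__main__":
    outcomes = [
        is_resolved({"test_fix": True}, {"test_existing": True}),   # resolved
        is_resolved({"test_fix": False}, {"test_existing": True}),  # not resolved
    ]
    print(f"{resolved_rate(outcomes):.1f}% resolved")
```

Dividing by the fixed total rather than by the number of attempted instances keeps scores comparable when a system skips or crashes on some instances.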
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: 500
- Creator: princeton-nlp
- Year: 2026