
Microsoft Research Paraphrase Corpus (MRPC)

Paraphrasing Identification, English, Benchmark

Microsoft Research Paraphrase Corpus (MRPC) is an English benchmark dataset for paraphrase identification, created by Dolan et al. It contains 5,801 sentence pairs in text format.

This dataset is used as an LLM benchmark.

About Microsoft Research Paraphrase Corpus (MRPC)

The dataset contains pairs of sentences extracted from news sources on the web, along with human annotations indicating whether the two sentences in each pair are paraphrases of each other, that is, semantically equivalent.
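As a rough illustration of the task, the sketch below models one record as a sentence pair with a binary paraphrase label and scores a simple word-overlap baseline with accuracy and F1. The names SentencePair, jaccard, predict, and accuracy_and_f1 are hypothetical, the example sentences are invented rather than taken from the corpus, and the overlap baseline is only a stand-in classifier, not a method described by the dataset's authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    """One record: two sentences plus a human paraphrase judgment."""
    sentence1: str
    sentence2: str
    is_paraphrase: bool


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard overlap between two sentences (0.0 to 1.0)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def predict(pair: SentencePair, threshold: float = 0.5) -> bool:
    """Baseline: call a pair a paraphrase when overlap reaches the threshold."""
    return jaccard(pair.sentence1, pair.sentence2) >= threshold


def accuracy_and_f1(pairs, threshold: float = 0.5):
    """Return (accuracy, F1 on the paraphrase class) for the baseline."""
    tp = fp = fn = tn = 0
    for pair in pairs:
        guess = predict(pair, threshold)
        if guess and pair.is_paraphrase:
            tp += 1
        elif guess and not pair.is_paraphrase:
            fp += 1
        elif not guess and pair.is_paraphrase:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1


# Invented examples for illustration only; these are not corpus records.
EXAMPLE_PAIRS = [
    SentencePair("The company reported higher profits on Tuesday",
                 "On Tuesday the company reported higher profits", True),
    SentencePair("The mayor opened the new bridge",
                 "Shares fell sharply in early trading", False),
]

if __name__ == "__main__":
    acc, f1 = accuracy_and_f1(EXAMPLE_PAIRS)
    print(f"accuracy={acc:.2f} f1={f1:.2f}")
```

Accuracy and F1 are the two scores commonly reported for this benchmark; F1 is included because the labels are not evenly balanced between the two classes.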

Details

Task: Paraphrasing Identification
Language: English
Format: Text
Rows / instances: 5,801
Creator: Dolan et al.
Year: 2005
