ParaBank

Semantic Textual Similarity | English

ParaBank is an English dataset for semantic textual similarity. It provides 79.5M reference sentences with paraphrases, distributed in TSV format.

About ParaBank

The dataset contains 79.5 million reference sentences, with an average of 4 paraphrases per reference.
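
As a rough illustration of working with a TSV paraphrase file, the sketch below reads tab-separated rows and computes the average number of paraphrases per reference. The column layout is an assumption, not something this page confirms: each line is taken to hold one reference in the first column and its paraphrases in the remaining columns. The function name `read_paraphrase_rows` and the sample sentences are hypothetical; check the layout of the downloaded files before relying on it.

```python
import csv
import io


def read_paraphrase_rows(handle):
    """Yield (reference, paraphrases) pairs from a tab-separated stream.

    ASSUMPTION: first column is the reference sentence, remaining
    columns are its paraphrases. The real file layout may differ.
    """
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        reference = row[0]
        paraphrases = [p for p in row[1:] if p]
        yield reference, paraphrases


# Hypothetical sample data standing in for a real ParaBank file.
sample = io.StringIO(
    "The cat sat on the mat.\tA cat was sitting on the mat.\tThe cat rested on the mat.\n"
    "He left early.\tHe departed early.\n"
)

rows = list(read_paraphrase_rows(sample))
average = sum(len(paraphrases) for _, paraphrases in rows) / len(rows)
print(f"{len(rows)} references, {average:.1f} paraphrases per reference")
```

For a real file, replace the `io.StringIO` sample with `open(path, newline="", encoding="utf-8")`. Because the reader yields one row at a time, the full file does not need to fit in memory.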

Details

Task: Semantic Textual Similarity
Language: English
Format: TSV
Rows / instances: 79.5M references
Creator: Hu et al.
Year: 2019