SFU Opinion and Comments Corpus (SOCC)

Text Corpora, Text Classification, English

Created by Kolhatkar et al. in 2018, the SFU Opinion and Comments Corpus (SOCC) is an English text corpus containing 663,173 records in CSV format.

About SFU Opinion and Comments Corpus (SOCC)

The dataset contains 10,339 opinion articles (editorials, columns, and op-eds) together with their 663,173 comments from 303,665 comment threads, drawn from The Globe and Mail, the main Canadian English-language daily, and covering January 2012 to December 2016. It also includes an annotated subset of 1,043 comments, written in response to 10 different articles, that is labeled for toxicity, negation and its scope, and appraisal. These articles cover a variety of subjects: technology, immigration, terrorism, politics, budget, social issues, religion, property, and refugees.
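
As a rough sense of scale, the counts quoted above work out to about 64 comments per article and a little over 2 comments per thread. A minimal Python sketch of that arithmetic follows; it uses only the figures stated in this description, and the constant and function names are illustrative.

```python
# Counts as stated in the dataset description above.
ARTICLES = 10_339
COMMENTS = 663_173
THREADS = 303_665


def average(total: int, groups: int) -> float:
    """Mean number of items per group, rounded to two decimals."""
    return round(total / groups, 2)


comments_per_article = average(COMMENTS, ARTICLES)
comments_per_thread = average(COMMENTS, THREADS)
threads_per_article = average(THREADS, ARTICLES)

print(comments_per_article, comments_per_thread, threads_per_article)
```

These are corpus-wide averages only; the description does not say how comments are distributed across individual articles or threads.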

Details

Task: Text Corpora, Text Classification
Language: English
Format: CSV
Rows / instances: 663,173
Creator: Kolhatkar et al.
Year: 2018
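
Because the corpus is distributed as CSV, it can be read with standard tooling. The sketch below is illustrative only: the column names `thread_id` and `comment_text` and the sample rows are hypothetical placeholders, not the corpus's documented schema, so check the header row of the downloaded files and adjust accordingly.

```python
import csv
import io
from collections import Counter


def comments_per_thread(csv_file, thread_column="thread_id"):
    """Count rows per thread in an open CSV file object.

    `thread_column` is a hypothetical column name; replace it with the
    actual header found in the downloaded corpus files.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[thread_column] for row in reader)


# Tiny in-memory sample standing in for a real corpus file (made-up rows).
SAMPLE = io.StringIO(
    "thread_id,comment_text\n"
    "t1,First comment\n"
    "t1,A reply\n"
    "t2,Another thread\n"
)

counts = comments_per_thread(SAMPLE)
print(counts.most_common())
```

With a real file, the same function could be given a handle from `open(path, newline="")`, where `path` is wherever the downloaded CSV was saved.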