Open-Orca/SlimOrca-Dedup

Text Classification, Question Answering, Text Generation, English

Open-Orca/SlimOrca-Dedup is an English dataset for text classification, question answering, and text generation, distributed in Parquet format.

About Open-Orca/SlimOrca-Dedup

Overview

"SlimOrca Dedup" is a deduplicated, unfiltered subset of the SlimOrca dataset, excluding RLHF instances, resulting in 363k unique examples.

Key Features

Removal of RLHF instances.
Deduplication using minhash and Jaccard si...
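
The description above is cut off before it gives any parameters for the deduplication, so the following is only a minimal sketch of how MinHash signatures can approximate Jaccard similarity for near-duplicate removal. It is not the pipeline Open-Orca actually used. The shingle size, the signature length `NUM_HASHES`, the cutoff `THRESHOLD`, and all function names are assumptions chosen for illustration.

```python
import hashlib
import re

NUM_HASHES = 64   # hypothetical signature length, not from the dataset card
THRESHOLD = 0.8   # hypothetical similarity cutoff, not from the dataset card


def shingles(text, k=3):
    """Return the set of k-word shingles of a lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity: |A intersect B| / |A union B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(seed, item):
    """Seeded 64-bit hash; each seed acts as one hash function."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=NUM_HASHES):
    """For each seeded hash function, keep the minimum hash over the set."""
    return [min(_hash(seed, it) for it in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates Jaccard similarity."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


def deduplicate(texts, threshold=THRESHOLD):
    """Keep a text only if it is not too similar to any text already kept."""
    kept = []  # list of (text, signature) pairs
    for text in texts:
        sig = minhash_signature(shingles(text))
        if all(estimate_jaccard(sig, s) < threshold for _, s in kept):
            kept.append((text, sig))
    return [t for t, _ in kept]


if __name__ == "__main__":
    examples = [
        "The quick brown fox jumps over the lazy dog.",
        "The quick brown fox jumps over the lazy dog.",
        "Completely unrelated sentence about parquet files and datasets.",
    ]
    print(len(deduplicate(examples)))
```

This sketch compares every new text against every kept text, which is quadratic in the number of examples. At the scale of hundreds of thousands of rows, a real pipeline would typically add locality-sensitive hashing so that only candidate pairs are compared.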

Details

Task: Text Classification, Question Answering, Text Generation
Language: English
Format: Parquet
Rows / instances: N/A
Creator: Open-Orca
Year: 2023
