Feature Extraction Datasets
Feature extraction is the task of turning text into dense numerical embeddings for downstream search, clustering, or retrieval.
Our directory catalogs 6 datasets for this task. Each entry links to its source, paper, and download; the full list is below.
Updated June 2026
- OpenGVLab/InternVid: Feature Extraction (EN)
- sentence-transformers/embedding-training-data: Feature Extraction (EN)
- Open-Orca/OpenOrca: Text Classification, Token Classification, Table Question Answering, Question Answering, Zero Shot Classification, Summarization, Feature Extraction, Text Generation (EN)
- agentlans/high-quality-english-sentences: Text Classification, Text Generation, Feature Extraction, Sentence Similarity (EN)
- Open-Orca/SlimOrca: Text Classification, Token Classification, Table Question Answering, Question Answering, Zero Shot Classification, Summarization, Feature Extraction, Text Generation (EN)
- ScienceOne-AI/S1-MMAlign: Image To Text, Visual Question Answering, Feature Extraction (EN)
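
To make the task definition concrete, here is a minimal sketch of what "turning text into a numerical vector" can look like. It is a toy stand-in, not how any dataset above was built or how a trained embedding model works: it uses a hashed bag-of-words vector (the hashing trick) in place of learned dense embeddings, and the names `embed`, `cosine`, and the `dim` parameter are hypothetical, chosen only for illustration.

```python
import hashlib
import math


def embed(text, dim=64):
    """Toy feature extractor (assumption: hashed bag-of-words, not a learned model).

    Each whitespace token is hashed to one of `dim` buckets, bucket counts are
    accumulated, and the result is L2-normalised so dot products are cosines.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # An empty string has no tokens, so its vector stays all zeros.
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity of two vectors that are already L2-normalised."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    query = embed("dense embeddings for retrieval")
    for sentence in ["embeddings for search and retrieval", "a recipe for bread"]:
        print(sentence, round(cosine(query, embed(sentence)), 3))
```

In practice a trained model replaces `embed`, but the downstream pattern is the same: vectors are compared by similarity for search, clustering, or retrieval.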