arcinstitute/Stack-CellxGene45M
General NLP · English · Benchmark · cc-by-4.0
arcinstitute/Stack-CellxGene45M is a dataset published by arcinstitute in Parquet format and cataloged as a General NLP benchmark in English. It is distributed under the cc-by-4.0 license and has been downloaded 15.8K times.
This dataset is used as an LLM benchmark.
About arcinstitute/Stack-CellxGene45M
CellxGene 45M Collection
A curated subset of CellxGene (~45M cells), used to align the Stack model after it has been pretrained on the full human scBaseCount data.
Selection Criteria
≥ 50,000 cells per dataset
≥ 5 donors per dataset
Cell Type A...
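The two thresholds that are fully visible above (cells per dataset and donors per dataset) can be read as a simple per-dataset filter. The sketch below is only an illustration of that reading, not the curators' actual pipeline: the `DatasetSummary` record, its field names, and the example entries are hypothetical, and the third, truncated criterion is deliberately left out because its content is not shown here.

```python
from dataclasses import dataclass

# Thresholds taken from the selection criteria listed above.
MIN_CELLS = 50_000
MIN_DONORS = 5


@dataclass(frozen=True)
class DatasetSummary:
    """Hypothetical per-dataset summary; field names are assumptions."""
    dataset_id: str
    n_cells: int
    n_donors: int


def passes_selection(summary: DatasetSummary) -> bool:
    """Return True if the dataset meets both visible thresholds (inclusive)."""
    return summary.n_cells >= MIN_CELLS and summary.n_donors >= MIN_DONORS


# Made-up examples, chosen to exercise each boundary.
candidates = [
    DatasetSummary("example_a", n_cells=120_000, n_donors=8),
    DatasetSummary("example_b", n_cells=49_999, n_donors=12),  # too few cells
    DatasetSummary("example_c", n_cells=75_000, n_donors=4),   # too few donors
    DatasetSummary("example_d", n_cells=50_000, n_donors=5),   # exactly at both limits
]

selected = [c.dataset_id for c in candidates if passes_selection(c)]
print(selected)  # ['example_a', 'example_d']
```

Both comparisons are inclusive because the criteria are written with "≥", so a dataset sitting exactly at 50,000 cells and 5 donors is kept.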
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Creator: arcinstitute
- Year: 2026
- License: cc-by-4.0
- Downloads: 15814
- Likes: 6