bigcode/starcoderdata
Text Generation, CODE, Benchmark
The bigcode/starcoderdata dataset is a code text-generation resource published by bigcode in 2023. It has 22K downloads and 523 likes, is released under a license listed as "other", and falls in the 100M<n<1B size category.
This dataset is used as an LLM benchmark.
About bigcode/starcoderdata
StarCoder Training Dataset
Dataset description
This is the dataset used for training StarCoder and StarCoderBase. It contains 783GB of code in 86 programming languages, and includes 54GB of GitHub Issues and 13GB of Jupyter notebooks in scri...
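As a rough illustration of how the training data described above might be inspected, the sketch below streams a few records from one language subdirectory using the Hugging Face `datasets` library. The `data_dir="python"` layout, the function name `preview_subset`, and the injectable `loader` parameter are assumptions made for illustration, not details stated on this page; access may also require accepting the dataset's terms and authenticating.

```python
from itertools import islice


def preview_subset(repo_id="bigcode/starcoderdata", data_dir="python", n=3, loader=None):
    """Return the first n records of one subdirectory, read in streaming mode.

    repo_id comes from this page; data_dir="python" is an assumed layout.
    loader is a hypothetical hook so the function can be exercised offline.
    """
    if loader is None:
        # Imported lazily so this module loads even without the library installed.
        from datasets import load_dataset as loader
    # streaming=True iterates over records instead of downloading everything first.
    stream = loader(repo_id, data_dir=data_dir, split="train", streaming=True)
    return list(islice(stream, n))
```

With network access and the library installed, `preview_subset(n=3)` should return up to three records without first downloading the full 783GB.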
Details
- Task: Text Generation
- Language: CODE
- Format: Parquet
- Rows / instances: N/A
- Size: 100M<n<1B
- Creator: bigcode
- Year: 2023
- License: other
- Downloads: 22045
- Likes: 523
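The size label in the details above is a range rather than an exact count. The following minimal sketch turns a two-sided label such as `100M<n<1B` into numeric bounds. It assumes `n` denotes the number of rows, which this page does not state, and the helper name `parse_size_category` is hypothetical.

```python
import re

# Multipliers for the suffixes used in labels like "100M<n<1B".
_UNITS = {"K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}


def parse_size_category(label):
    """Parse a two-sided label such as '100M<n<1B' into (lower, upper) bounds."""
    match = re.fullmatch(r"(\d+)([KMBT])<n<(\d+)([KMBT])", label)
    if match is None:
        # One-sided labels are deliberately not handled in this sketch.
        raise ValueError(f"unsupported size label: {label!r}")
    lower = int(match.group(1)) * _UNITS[match.group(2)]
    upper = int(match.group(3)) * _UNITS[match.group(4)]
    return lower, upper


print(parse_size_category("100M<n<1B"))  # (100000000, 1000000000)
```

Under that assumption, the label places the dataset between 100 million and 1 billion rows.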