
codeparrot/github-code

Text Generation | CODE

The codeparrot/github-code dataset is a text generation resource for source code (language tag: CODE), published by codeparrot in 2022. It has 633.1K downloads and 367 likes. Its license is listed as "other".

About codeparrot/github-code

The GitHub Code dataset consists of 115M code files from GitHub in 32 programming languages with 60 extensions, totalling 1 TB of text data. The dataset was created from the GitHub dataset on BigQuery.
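Because the dataset is about 1 TB, reading it as a stream is usually more practical than downloading it in full. The following is a minimal sketch, not an official recipe. It assumes the dataset can be loaded by its id through the Hugging Face `datasets` library with `streaming=True`, and that each record has a `language` field; neither detail is stated on this page. `take_by_language` and `stream_github_code` are hypothetical helper names introduced for illustration.

```python
from itertools import islice


def take_by_language(records, languages, limit):
    """Return up to `limit` records whose 'language' value is in `languages`.

    `records` can be any iterable of dict-like rows, including a streamed
    dataset. The 'language' field name is an assumption.
    """
    wanted = set(languages)
    matches = (row for row in records if row.get("language") in wanted)
    # islice stops pulling rows as soon as `limit` matches have been found.
    return list(islice(matches, limit))


def stream_github_code():
    """Open the dataset as a stream (assumed usage; requires network access)."""
    # Imported lazily so take_by_language works without the package installed.
    from datasets import load_dataset

    # Depending on the installed `datasets` version, extra arguments may be
    # needed to load this dataset.
    return load_dataset("codeparrot/github-code", split="train", streaming=True)
```

With network access, `take_by_language(stream_github_code(), ["Python"], 3)` would fetch the first three matching records without downloading the remaining files.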

Details

Task: Text Generation
Language: CODE
Format: Parquet
Rows / instances: N/A
Creator: codeparrot
Year: 2022
License: other
Downloads: 633092
Likes: 367