bevaya/pubmed-ocr
Image To Text, Image Text To Text, EN
Created by bevaya in 2026, bevaya/pubmed-ocr is an English (EN) image-to-text dataset distributed in Parquet format. It has 2.2K downloads and 71 likes. Its license is listed as "other", and it falls in the 1M<n<10M size category.
About bevaya/pubmed-ocr
PubMed-OCR: PMC Open Access OCR Annotations
PubMed-OCR is an OCR-centric corpus of scientific articles derived from PubMed Central Open Access PDFs. Each page is rendered to an image and annotated with Google Cloud Vision OCR, released in a com...
Details
- Task: Image To Text, Image Text To Text
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Size: 1M<n<10M
- Creator: bevaya
- Year: 2026
- License: other
- Downloads: 2241
- Likes: 71
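Since the exact row count is listed as N/A, the size label is the only indication of scale. The following is a minimal sketch of how a label such as 1M<n<10M could be turned into numeric bounds. It assumes the label follows the usual `<low><n<<high>` bucket notation with K/M/B/T suffixes; the function name `parse_size_category` is hypothetical and not part of any library.

```python
import re

# Multipliers for the suffixes assumed to appear in size labels.
_SUFFIX = {"": 1, "K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}


def parse_size_category(label: str) -> tuple[int, int]:
    """Return the (lower, upper) exclusive bounds for a label like '1M<n<10M'."""
    match = re.fullmatch(r"(\d+)([KMBT]?)<n<(\d+)([KMBT]?)", label.strip())
    if match is None:
        raise ValueError(f"unrecognized size category: {label!r}")
    lo_num, lo_suffix, hi_num, hi_suffix = match.groups()
    return int(lo_num) * _SUFFIX[lo_suffix], int(hi_num) * _SUFFIX[hi_suffix]


lower, upper = parse_size_category("1M<n<10M")
print(f"{lower:,} < n < {upper:,}")
```

Under that assumption, the label places this dataset between one million and ten million instances; it does not say what an instance is (page, article, or annotation).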