The Penn Treebank Project is an English dataset focused on part-of-speech (POS) tagging. It provides approximately 1M words of labeled text, distributed in plain-text format.
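
As a rough illustration of working with POS-labeled plain text, the sketch below parses one line of tagged tokens into (word, tag) pairs. It assumes a whitespace-separated `word/TAG` layout, which the description above does not specify. The function name `parse_tagged_line` and the sample sentence are hypothetical and are not taken from the dataset.

```python
def parse_tagged_line(line):
    """Split a line of whitespace-separated word/TAG tokens into (word, tag) pairs.

    Assumption: each token has the form word/TAG. The actual file layout
    of the distribution may differ.
    """
    pairs = []
    for token in line.split():
        # rpartition splits on the LAST slash, so a word that itself
        # contains "/" keeps its slashes and only the final part is the tag.
        word, sep, tag = token.rpartition("/")
        if not sep:
            # No slash at all: not a tagged token, so skip it.
            continue
        pairs.append((word, tag))
    return pairs


# Hypothetical sample line, made up for illustration.
sample = "The/DT cat/NN sat/VBD ./."
print(parse_tagged_line(sample))
```

Splitting on the last slash rather than the first is a deliberate choice here: punctuation tokens such as `./.` and words containing slashes would otherwise be split in the wrong place.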