Question 1

What are Fill Mask datasets used for?

Accepted Answer

Fill Mask datasets are collections of labelled or raw data used to train, fine-tune, and evaluate models on the fill mask task, in which a model predicts a masked-out token from its surrounding context. This page lists 9 such datasets, each linking to its source and paper.
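
As a rough illustration of how raw text can become a fill mask training example, here is a minimal Python sketch. It assumes whitespace tokenization and a "[MASK]" placeholder; the helper name make_fill_mask_example is hypothetical and is not taken from any of the listed datasets, which may use different tokenizers and mask tokens.

```python
from typing import Tuple


def make_fill_mask_example(
    sentence: str, mask_index: int, mask_token: str = "[MASK]"
) -> Tuple[str, str]:
    """Hide one whitespace-separated token and return (masked_text, label).

    Hypothetical helper for illustration only: real datasets usually rely on
    a model-specific tokenizer and mask token rather than str.split().
    """
    tokens = sentence.split()
    if not 0 <= mask_index < len(tokens):
        raise IndexError("mask_index is outside the token range")
    label = tokens[mask_index]       # the token the model should recover
    masked = tokens.copy()
    masked[mask_index] = mask_token  # replace it with the mask placeholder
    return " ".join(masked), label


masked_text, label = make_fill_mask_example("The capital of France is Paris", 5)
print(masked_text)  # The capital of France is [MASK]
print(label)        # Paris
```

A model is then trained or evaluated on how well it recovers the label from the masked text.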

Question 2

Which Fill Mask dataset is best for benchmarking?

Accepted Answer

None of the listed Fill Mask datasets are currently tracked as standard LLM benchmarks, but many are widely used for evaluation.

Question 3

How many Fill Mask datasets are there?

Accepted Answer

We catalog 9 Fill Mask datasets in one searchable directory.

Fill Mask Datasets

What languages do fill mask datasets cover?
