Add a question to generate a benchmark-ready item.

Check             | Result | Detail
Duplicate options | Pass   | All answer choices are unique.
Answer leakage    | Pass   | Distractors avoid obvious answer words.
Length balance    | Pass   | Choices have comparable length.
Question stem     | Pass   | Stem is specific and phrased as a question.
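
The four checks listed above could be approximated with simple heuristics. The sketch below is illustrative only: the function names, the thresholds (`max_length_ratio`, `min_stem_words`), and the exact rule behind each check are assumptions, not the Space's actual implementation.

```python
import re

def content_words(text):
    """Lowercased alphabetic tokens longer than three characters."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}

def audit_item(question, choices, answer_index,
               max_length_ratio=2.0, min_stem_words=5):
    """Run four heuristic checks; return a list of (check, passed) pairs."""
    cleaned = [c.strip() for c in choices]
    answer = cleaned[answer_index]
    distractors = [c for i, c in enumerate(cleaned) if i != answer_index]

    # Duplicate options: compare case-insensitively after trimming whitespace.
    unique = len({c.lower() for c in cleaned}) == len(cleaned)

    # Answer leakage (assumed rule): no distractor repeats a content word
    # that appears in the correct answer.
    answer_words = content_words(answer)
    no_leak = all(not (content_words(d) & answer_words) for d in distractors)

    # Length balance (assumed rule): the longest choice is at most
    # max_length_ratio times the length of the shortest.
    lengths = [len(c) for c in cleaned]
    balanced = max(lengths) <= max_length_ratio * min(lengths)

    # Question stem (assumed rule): ends with "?" and has enough words
    # to count as specific.
    stem = question.strip()
    good_stem = stem.endswith("?") and len(stem.split()) >= min_stem_words

    return [
        ("Duplicate options", unique),
        ("Answer leakage", no_leak),
        ("Length balance", balanced),
        ("Question stem", good_stem),
    ]

results = audit_item(
    "Which optimizer adapts the learning rate per parameter using moment estimates?",
    ["Adam", "SGD", "L-BFGS", "Newton"],
    answer_index=0,
)
for check, passed in results:
    print(f"{check}: {'Pass' if passed else 'Fail'}")
```

Word overlap is a crude proxy for leakage; a stricter audit might also compare each choice against the stem or use a semantic similarity measure.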

Why This Is Useful

Evaluation sets fail quietly when distractors are weak, duplicated, or reveal the answer. This Space teaches a better workflow: generate candidates, audit them, keep rationales, and publish the result as a versioned Hugging Face Dataset.
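
As a rough illustration of the "keep rationales" and "versioned" parts of that workflow, one item could be stored as a JSON Lines record that carries its rationale, audit results, and a version tag. The field names and the `build_record` helper are hypothetical, and the sketch stops short of uploading anything to the Hugging Face Hub.

```python
import json

def build_record(question, choices, answer_index, rationale, audit, version):
    """Bundle one benchmark item with its rationale and audit results."""
    return {
        "question": question,
        "choices": choices,
        "answer_index": answer_index,
        "rationale": rationale,   # why the keyed answer is correct
        "audit": audit,           # check name -> "Pass" / "Fail"
        "version": version,       # hypothetical dataset version tag
    }

record = build_record(
    question="Which optimizer adapts the learning rate per parameter using moment estimates?",
    choices=["Adam", "SGD", "L-BFGS", "Newton"],
    answer_index=0,
    rationale="Adam scales each parameter's step using running moment estimates of the gradient.",
    audit={
        "Duplicate options": "Pass",
        "Answer leakage": "Pass",
        "Length balance": "Pass",
        "Question stem": "Pass",
    },
    version="0.1.0",
)

# One JSON object per line is the JSON Lines convention.
line = json.dumps(record, ensure_ascii=False)
print(line)
```

Keeping the audit results alongside each item makes it possible to filter out failing items later without rerunning the checks.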

Benchmark Builder · Built as a practical AI/ML learning Space for the Hugging Face community.