Overview
The A11y LLM Eval report summarizes accessibility evaluation results across models, instruction sets, and skills. It shows how different approaches affect accessibility outcomes and highlights areas for improvement. All content is generated using the GitHub Copilot SDK, and results are based on automated checks and curated test cases.
Run scope: 8 models | 32 prompt cases | 1280 control samples | 2 instruction sets | 1 skill
| Highlight | Result | Detail |
|---|---|---|
| Control baseline | 12% | Overall control pass rate*; best model GPT-5.4 Mini at 25% |
| Hardest case | Shopping Home Page \| React \| Dark | 0% pass rate*, 15.55 avg WCAG failures |
| Best instruction lift | 1. Basic | Best delta +48.5pp vs control |
| Best skill lift | Building Accessible UI | Best final-turn delta +88.1pp vs control; +5.6pp vs turn 1 |
* Pass rate reflects only this harness's automated checks (a curated set of axe-core WCAG rules plus hand-written assertions per test case). Automated testing can detect only a subset of accessibility issues: 100% here means the sample passed every check that was run, not that the page is WCAG conformant or fully accessible.
Control snapshot
Control results show how well models produce accessible code with no instructions or prompts to specifically create accessible code.
| Model | Rank | Pass rate* | Avg Total WCAG Failures |
|---|---|---|---|
| GPT-5.4 Mini | 1 | 25% | 1.46 |
| GPT-5.5 | 2 | 18% | 1.47 |
| GPT-5.4 | 3 | 15% | 1.59 |
| Claude Opus 4.7 | 4 | 14% | 4.51 |
| Gemini 3.1 Pro Preview | 5 | 8% | 3.08 |
Instruction-set snapshot
Instruction-set results show how well models produce accessible code when given specific guidance at the system/instruction level. Instructions guide the agent's behavior throughout the generation session and can improve accessibility outcomes, but they also consume context, especially when they are lengthy or combined with other instructions.
| Instruction set | Rank | Variant pass rate* | Delta vs control |
|---|---|---|---|
| 1. Basic | 1 | 60% | +48.5pp |
| 0. Minimal | 2 | 37% | +24.8pp |
Skill snapshot
Skills are reusable, task-specific packages that can include guidance, examples, supporting files, scripts, and tool-use workflows, while instruction sets are always-on guidance added to the agent's context for a run. Use instructions for broad behavior you want applied consistently across tasks; use a skill when the guidance is specialized, larger, procedural, or depends on files, scripts, or a focused sequence of tool-assisted steps. Skills keep general instructions lighter and can guide the model through a process, such as generating an answer and then reviewing it against a checklist.
| Skill | Avg final-turn pass rate* | Avg delta vs control | Best model(s) |
|---|---|---|---|
| Building Accessible UI | 86% | +74.6pp | Gemini 3.1 Pro Preview |
Variant token + pass-rate snapshot
This table compares control, instruction sets, and skill turns using per-sample averages from the evaluated run. API calls are counted from the underlying Copilot session transcript for each sample; tokens per API call are computed as average total tokens divided by average call count for that sample. The guidance token percentage reflects the share of input tokens that came from guidance files (instruction markdown or skill directory files).
| Variant | Avg pass rate* | Avg API calls | Avg tokens / API call | Avg tokens in | Avg tokens out | % of possible guidance tokens |
|---|---|---|---|---|---|---|
| Control | 12% | 4.34 | 16,896 | 69,869 | 4,608 | n/a |
| 1. Basic | 60% | 5.45 | 19,689 | 104,678 | 5,904 | 100.0% |
| 0. Minimal | 37% | 4.78 | 17,048 | 77,703 | 4,891 | 100.0% |
| Building Accessible UI - Generate (Turn 1) | 82% | 8.26 | 22,459 | 186,584 | 6,879 | 14.4% |
| Building Accessible UI - Review (Turn 2) | 86% | 11.57 | 32,731 | 383,001 | 6,346 | 18.5% |
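The "Avg tokens / API call" column cannot be recovered by dividing the averaged token columns by the averaged call count: for Control, (69,869 + 4,608) / 4.34 is roughly 17,160, not 16,896. That is consistent with the ratio being taken per sample and then averaged. The sketch below illustrates the difference under that reading; the record layout and the sample values are hypothetical, not harness data.

```python
from statistics import mean

# Hypothetical per-sample records: (input tokens, output tokens, API calls).
# These values are illustrative only and do not come from the evaluated run.
samples = [
    (40_000, 4_000, 2),
    (90_000, 5_000, 5),
    (80_000, 6_000, 8),
]


def tokens_per_call(tokens_in: int, tokens_out: int, calls: int) -> float:
    """Total tokens for one sample divided by that sample's API call count."""
    return (tokens_in + tokens_out) / calls


# Assumed reading of the report: compute the ratio per sample, then average.
mean_of_ratios = mean(tokens_per_call(*s) for s in samples)

# What a reader gets by dividing the averaged columns instead.
ratio_of_means = mean(i + o for i, o, _ in samples) / mean(c for _, _, c in samples)

print(mean_of_ratios, ratio_of_means)
```

Samples with few API calls pull the per-sample average up, so the two figures only agree when every sample makes the same number of calls.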
Control summary
Control results show how well models produce accessible code with no instructions or prompts to specifically create accessible code. Models are ranked by WCAG pass rate across 32 test cases with 5 samples per test case (160 samples per model). The checks cover only a subset of the most common issues, not all WCAG requirements, so WCAG failures may still exist in passing samples.
| Model | Rank | Pass rate* | Avg Total WCAG Failures | Avg Axe WCAG Failures | Avg Assertion WCAG Failures | Avg Best Practice Failures |
|---|---|---|---|---|---|---|
| GPT-5.4 Mini | 1 | 25% | 1.46 | 0.26 | 1.20 | 1.06 |
| GPT-5.5 | 2 | 18% | 1.47 | 0.36 | 1.11 | 0.34 |
| GPT-5.4 | 3 | 15% | 1.59 | 0.24 | 1.34 | 0.70 |
| Claude Opus 4.7 | 4 | 14% | 4.51 | 3.56 | 0.96 | 3.98 |
| Gemini 3.1 Pro Preview | 5 | 8% | 3.08 | 1.30 | 1.78 | 4.46 |
| Claude Sonnet 4.6 | 6 | 6% | 10.02 | 8.81 | 1.21 | 8.59 |
| Gemini 3 Flash Preview | 7 | 6% | 3.38 | 1.32 | 2.05 | 4.71 |
| Claude Haiku 4.5 | 8 | 3% | 5.91 | 3.64 | 2.27 | 9.43 |
* Pass rate reflects only this harness's automated checks (a curated set of axe-core WCAG rules plus hand-written assertions per test case). Automated testing can detect only a subset of accessibility issues: 100% here means the sample passed every check that was run, not that the page is WCAG conformant or fully accessible.
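As a worked example of the pass-rate arithmetic, the sketch below recomputes three of the pass rates above from pass counts summed over the Pass@k tables in this report (40, 28, and 5 passes out of 160 samples). The function and variable names are illustrative, and ranking purely by pass rate is an assumption; the harness may break ties differently.

```python
MODELS = 8
TEST_CASES = 32
SAMPLES_PER_CASE = 5

samples_per_model = TEST_CASES * SAMPLES_PER_CASE  # 160 samples per model
control_samples = MODELS * samples_per_model       # 1280 control samples

# Pass counts obtained by summing the "Passes" column of the Pass@k tables.
passes = {"GPT-5.4 Mini": 40, "GPT-5.5": 28, "Claude Haiku 4.5": 5}


def pass_rate(pass_count: int, sample_count: int) -> float:
    """Fraction of samples that passed every automated check."""
    return pass_count / sample_count


rates = {model: pass_rate(count, samples_per_model) for model, count in passes.items()}

# Assumed ranking rule: highest pass rate first.
ranking = sorted(rates, key=rates.get, reverse=True)
print(ranking)
```

The resulting fractions (0.25, 0.175, and 0.03125) correspond to the 25%, 18%, and 3% shown in the table after rounding to whole percentages.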
Pass@k aggregates
Pass@k estimates the probability that at least one of k randomly selected samples passes. It is computed from control samples only. Each table below covers one test case, with 5 samples per model.
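A minimal sketch of how such an estimate can be computed, assuming the commonly used unbiased estimator 1 - C(n - c, k) / C(n, k) for n samples with c passes. Clamping k to n when k exceeds the sample count is also an assumption; it is consistent with pass@10 equalling pass@5 in the tables, where only 5 samples exist per model and test case. The function name is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k drawn samples passes.

    n: number of samples, c: number of passing samples, k: draw size.
    """
    k = min(k, n)  # assumption: a draw larger than the sample pool is clamped
    if n - c < k:
        # Fewer failing samples than draws, so every draw includes a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


for k in (1, 5, 10):
    print(k, pass_at_k(5, 4, k))
```

With 5 samples, any model that passes at least once reaches 100% at k = 5, which is why pass@5 and pass@10 only take the values 0% and 100% in these tables.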
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 Mini | 5 | 0 | 0% | 0% | 0% |
| GPT-5.5 | 5 | 4 | 80% | 100% | 100% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 Mini | 5 | 0 | 0% | 0% | 0% |
| GPT-5.5 | 5 | 0 | 0% | 0% | 0% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 Mini | 5 | 0 | 0% | 0% | 0% |
| GPT-5.5 | 5 | 0 | 0% | 0% | 0% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 Mini | 5 | 0 | 0% | 0% | 0% |
| GPT-5.5 | 5 | 2 | 40% | 100% | 100% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 1 | 20% | 100% | 100% |
| Claude Opus 4.7 | 5 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 1 | 20% | 100% | 100% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 2 | 40% | 100% | 100% |
| GPT-5.4 Mini | 5 | 5 | 100% | 100% | 100% |
| GPT-5.5 | 5 | 0 | 0% | 0% | 0% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 3 | 60% | 100% | 100% |
| Claude Opus 4.7 | 5 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 Mini | 5 | 2 | 40% | 100% | 100% |
| GPT-5.5 | 5 | 1 | 20% | 100% | 100% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 1 | 20% | 100% | 100% |
| Claude Opus 4.7 | 5 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.6 | 5 | 2 | 40% | 100% | 100% |
| Gemini 3 Flash Preview | 5 | 5 | 100% | 100% | 100% |
| Gemini 3.1 Pro Preview | 5 | 5 | 100% | 100% | 100% |
| GPT-5.4 | 5 | 2 | 40% | 100% | 100% |
| GPT-5.4 Mini | 5 | 4 | 80% | 100% | 100% |
| GPT-5.5 | 5 | 1 | 20% | 100% | 100% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.6 | 5 | 1 | 20% | 100% | 100% |
| Gemini 3 Flash Preview | 5 | 3 | 60% | 100% | 100% |
| Gemini 3.1 Pro Preview | 5 | 3 | 60% | 100% | 100% |
| GPT-5.4 | 5 | 1 | 20% | 100% | 100% |
| GPT-5.4 Mini | 5 | 4 | 80% | 100% | 100% |
| GPT-5.5 | 5 | 0 | 0% | 0% | 0% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 1 | 20% | 100% | 100% |
| GPT-5.4 Mini | 5 | 0 | 0% | 0% | 0% |
| GPT-5.5 | 5 | 1 | 20% | 100% | 100% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 1 | 20% | 100% | 100% |
| GPT-5.4 Mini | 5 | 4 | 80% | 100% | 100% |
| GPT-5.5 | 5 | 0 | 0% | 0% | 0% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 4 | 80% | 100% | 100% |
| GPT-5.4 Mini | 5 | 3 | 60% | 100% | 100% |
| GPT-5.5 | 5 | 0 | 0% | 0% | 0% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 1 | 20% | 100% | 100% |
| GPT-5.4 | 5 | 2 | 40% | 100% | 100% |
| GPT-5.4 Mini | 5 | 4 | 80% | 100% | 100% |
| GPT-5.5 | 5 | 2 | 40% | 100% | 100% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 3 | 60% | 100% | 100% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 1 | 20% | 100% | 100% |
| GPT-5.4 | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 Mini | 5 | 1 | 20% | 100% | 100% |
| GPT-5.5 | 5 | 0 | 0% | 0% | 0% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 2 | 40% | 100% | 100% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 Mini | 5 | 1 | 20% | 100% | 100% |
| GPT-5.5 | 5 | 0 | 0% | 0% | 0% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.6 | 5 | 1 | 20% | 100% | 100% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 Mini | 5 | 1 | 20% | 100% | 100% |
| GPT-5.5 | 5 | 0 | 0% | 0% | 0% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 Mini | 5 | 0 | 0% | 0% | 0% |
| GPT-5.5 | 5 | 0 | 0% | 0% | 0% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 Mini | 5 | 0 | 0% | 0% | 0% |
| GPT-5.5 | 5 | 0 | 0% | 0% | 0% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 Mini | 5 | 0 | 0% | 0% | 0% |
| GPT-5.5 | 5 | 0 | 0% | 0% | 0% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 Mini | 5 | 0 | 0% | 0% | 0% |
| GPT-5.5 | 5 | 0 | 0% | 0% | 0% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 Mini | 5 | 0 | 0% | 0% | 0% |
| GPT-5.5 | 5 | 0 | 0% | 0% | 0% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 Mini | 5 | 1 | 20% | 100% | 100% |
| GPT-5.5 | 5 | 4 | 80% | 100% | 100% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 2 | 40% | 100% | 100% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 2 | 40% | 100% | 100% |
| GPT-5.4 Mini | 5 | 2 | 40% | 100% | 100% |
| GPT-5.5 | 5 | 2 | 40% | 100% | 100% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 2 | 40% | 100% | 100% |
| Claude Sonnet 4.6 | 5 | 1 | 20% | 100% | 100% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 1 | 20% | 100% | 100% |
| GPT-5.4 Mini | 5 | 2 | 40% | 100% | 100% |
| GPT-5.5 | 5 | 4 | 80% | 100% | 100% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.6 | 5 | 1 | 20% | 100% | 100% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 Mini | 5 | 2 | 40% | 100% | 100% |
| GPT-5.5 | 5 | 2 | 40% | 100% | 100% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 5 | 100% | 100% | 100% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 Mini | 5 | 2 | 40% | 100% | 100% |
| GPT-5.5 | 5 | 2 | 40% | 100% | 100% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 1 | 20% | 100% | 100% |
| GPT-5.4 Mini | 5 | 0 | 0% | 0% | 0% |
| GPT-5.5 | 5 | 1 | 20% | 100% | 100% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 4 | 80% | 100% | 100% |
| Claude Sonnet 4.6 | 5 | 1 | 20% | 100% | 100% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 4 | 80% | 100% | 100% |
| GPT-5.4 Mini | 5 | 0 | 0% | 0% | 0% |
| GPT-5.5 | 5 | 0 | 0% | 0% | 0% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.6 | 5 | 1 | 20% | 100% | 100% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 Mini | 5 | 2 | 40% | 100% | 100% |
| GPT-5.5 | 5 | 0 | 0% | 0% | 0% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 1 | 20% | 100% | 100% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 2 | 40% | 100% | 100% |
| GPT-5.4 | 5 | 2 | 40% | 100% | 100% |
| GPT-5.4 Mini | 5 | 0 | 0% | 0% | 0% |
| GPT-5.5 | 5 | 1 | 20% | 100% | 100% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 1 | 20% | 100% | 100% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 1 | 20% | 100% | 100% |
| GPT-5.4 | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 Mini | 5 | 0 | 0% | 0% | 0% |
| GPT-5.5 | 5 | 1 | 20% | 100% | 100% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 2 | 40% | 100% | 100% |
| Claude Sonnet 4.6 | 5 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 Mini | 5 | 0 | 0% | 0% | 0% |
| GPT-5.5 | 5 | 0 | 0% | 0% | 0% |
| Model | Samples | Passes | pass@1* | pass@5* | pass@10* |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | 5 | 0 | 0% | 0% | 0% |
| Claude Opus 4.7 | 5 | 1 | 20% | 100% | 100% |
| Claude Sonnet 4.6 | 5 | 2 | 40% | 100% | 100% |
| Gemini 3 Flash Preview | 5 | 0 | 0% | 0% | 0% |
| Gemini 3.1 Pro Preview | 5 | 0 | 0% | 0% | 0% |
| GPT-5.4 | 5 | 1 | 20% | 100% | 100% |
| GPT-5.4 Mini | 5 | 0 | 0% | 0% | 0% |
| GPT-5.5 | 5 | 0 | 0% | 0% | 0% |
* Pass rate reflects only this harness's automated checks (a curated set of axe-core WCAG rules plus hand-written assertions per test case). Automated testing can detect only a subset of accessibility issues: 100% here means the sample passed every check that was run, not that the page is WCAG conformant or fully accessible.
Control analysis
This section summarizes where models perform well, where they struggle, and the most frequent types of accessibility issues observed across all samples.
Most common axe WCAG failures
| Rule | Impact | Failures | % of failures | Seen in models | Seen in test cases | Description |
|---|---|---|---|---|---|---|
| color-contrast | serious | 532 | 89.1% | 8 | 32 | Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds |
| aria-required-children | critical | 14 | 2.3% | 4 | 5 | Ensure elements with an ARIA role that require child roles contain them |
| button-name | critical | 14 | 2.3% | 3 | 5 | Ensure buttons have discernible text |
| link-in-text-block | serious | 8 | 1.3% | 3 | 5 | Ensure links are distinguished from surrounding text in a way that does not rely on color |
| link-name | serious | 8 | 1.3% | 2 | 4 | Ensure links have discernible text |
| aria-allowed-attr | critical | 5 | 0.8% | 3 | 4 | Ensure an element's role supports its ARIA attributes |
| listitem | serious | 5 | 0.8% | 3 | 4 | Ensure `<li>` elements are used semantically |
| aria-prohibited-attr | serious | 3 | 0.5% | 2 | 2 | Ensure ARIA attributes are not prohibited for an element's role |
| image-alt | critical | 2 | 0.3% | 1 | 1 | Ensure `<img>` elements have alternative text or a role of none or presentation |
| label | critical | 2 | 0.3% | 2 | 2 | Ensure every form element has a label |
Most common axe best-practice failures
| Rule | Impact | Failures | % of failures | Seen in models | Seen in test cases | Description |
|---|---|---|---|---|---|---|
| region | moderate | 610 | 44.4% | 8 | 32 | Ensure all page content is contained by landmarks |
| landmark-one-main | moderate | 508 | 37.0% | 8 | 29 | Ensure the document has a main landmark |
| heading-order | moderate | 103 | 7.5% | 7 | 11 | Ensure the order of headings is semantically correct |
| page-has-heading-one | moderate | 71 | 5.2% | 5 | 21 | Ensure that the page, or at least one of its frames contains a level-one heading |
| landmark-complementary-is-top-level | moderate | 35 | 2.5% | 3 | 8 | Ensure the complementary landmark or aside is at top level |
| landmark-unique | moderate | 24 | 1.7% | 3 | 6 | Ensure landmarks are unique |
| aria-dialog-name | serious | 12 | 0.9% | 3 | 3 | Ensure every ARIA dialog and alertdialog node has an accessible name |
| aria-allowed-role | minor | 7 | 0.5% | 1 | 6 | Ensure role attribute has an appropriate value for the element |
| image-redundant-alt | minor | 1 | 0.1% | 1 | 1 | Ensure image alternative is not repeated as text |
| label-title-only | serious | 1 | 0.1% | 1 | 1 | Ensure that every form element has a visible label and is not solely labeled using hidden labels, or the title or aria-describedby attributes |
Assertion-level patterns (per test case)
Checkbox Group | React | Dark
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Helper text is programmatically associated | R | 90% | 36 / 40 | 0 |
| Each checkbox group has a valid role | R | 70% | 28 / 40 | 0 |
| Each checkbox group has an accessible label | R | 70% | 28 / 40 | 0 |
| Space toggles checkbox state of each checkbox | R | 10% | 4 / 40 | 0 |
| Visible label is included in accessible name | R | 8% | 3 / 39 | 1 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
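The failure rates in these assertion tables can be reproduced from the two count columns: samples where an assertion is not applicable are excluded from the denominator. The sketch below assumes 40 control samples per test case (8 models with 5 samples each) and rounding to a whole percentage; the function name is illustrative and not taken from the harness.

```python
SAMPLES_PER_TEST_CASE = 8 * 5  # assumption: 8 models x 5 samples per test case


def failure_rate(failures: int, not_applicable: int,
                 total: int = SAMPLES_PER_TEST_CASE) -> int:
    """Return the failure rate as a whole percentage of applicable samples."""
    applicable = total - not_applicable
    if applicable <= 0:
        raise ValueError("assertion was not applicable to any sample")
    return round(100 * failures / applicable)


# "Visible label is included in accessible name": 3 failures, 1 not applicable.
print(failure_rate(3, 1))
# "Helper text is programmatically associated": 36 failures, 0 not applicable.
print(failure_rate(36, 0))
```

Because the denominator shrinks with each not-applicable sample, a rate of 100% can rest on far fewer than 40 samples, so the "Failures / applicable" column is worth reading alongside the percentage.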
Checkbox Group | React | Modern
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Helper text is programmatically associated | R | 98% | 39 / 40 | 0 |
| Each checkbox group has an accessible label | R | 70% | 28 / 40 | 0 |
| Each checkbox group has a valid role | R | 68% | 27 / 40 | 0 |
| Each checkbox has an accessible name | R | 5% | 2 / 40 | 0 |
| Visible label is included in accessible name | R | 5% | 2 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Checkbox Group | Vanilla JS | Dark
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Helper text is programmatically associated | R | 100% | 40 / 40 | 0 |
| Each checkbox group has an accessible label | R | 72% | 29 / 40 | 0 |
| Each checkbox group has a valid role | R | 70% | 28 / 40 | 0 |
| Each checkbox has an accessible name | R | 2% | 1 / 40 | 0 |
| Each checkbox is in the tab order | R | 2% | 1 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Checkbox Group | Vanilla JS | Modern
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Helper text is programmatically associated | R | 90% | 36 / 40 | 0 |
| Each checkbox group has an accessible label | R | 52% | 21 / 40 | 0 |
| Each checkbox group has a valid role | R | 50% | 20 / 40 | 0 |
| Visible label is included in accessible name | R | 8% | 3 / 40 | 0 |
| Each checkbox has an accessible name | R | 5% | 2 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Disclosure Widget | React | Dark
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Collapsed content is hidden from everyone | R | 77% | 24 / 31 | 9 |
| All examples have a valid semantics | R | 18% | 7 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Disclosure Widget | React | Modern
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Collapsed content is hidden from everyone | R | 84% | 32 / 38 | 2 |
| All examples have a valid semantics | R | 2% | 1 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Disclosure Widget | Vanilla JS | Dark
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Collapsed content is hidden from everyone | R | 44% | 16 / 36 | 4 |
| All examples have a valid semantics | R | 10% | 4 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Disclosure Widget | Vanilla JS | Modern
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Collapsed content is hidden from everyone | R | 57% | 20 / 35 | 5 |
| All examples have a valid semantics | R | 12% | 5 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Modal Dialog | React | Dark
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Each modal dialog hides content behind it while open | R | 92% | 37 / 40 | 0 |
| Each modal dialog takes focus when opened | R | 65% | 26 / 40 | 0 |
| Focus is not lost when each dialog closes | R | 60% | 24 / 40 | 0 |
| Each modal dialog traps keyboard focus | R | 52% | 21 / 40 | 0 |
| Each dialog can be closed by escape key | BP | 40% | 16 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Modal Dialog | React | Modern
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Each modal dialog hides content behind it while open | R | 85% | 34 / 40 | 0 |
| Each modal dialog takes focus when opened | R | 48% | 19 / 40 | 0 |
| Focus is not lost when each dialog closes | R | 42% | 17 / 40 | 0 |
| Each modal dialog traps keyboard focus | R | 32% | 13 / 40 | 0 |
| Each dialog can be closed by escape key | BP | 25% | 10 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Modal Dialog | Vanilla JS | Dark
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Each modal dialog hides content behind it while open | R | 80% | 32 / 40 | 0 |
| Each modal dialog takes focus when opened | R | 30% | 12 / 40 | 0 |
| Each dialog can be closed by escape key | BP | 22% | 9 / 40 | 0 |
| Each dialog has a dialog role | R | 22% | 9 / 40 | 0 |
| Focus is not lost when each dialog closes | R | 22% | 9 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Modal Dialog | Vanilla JS | Modern
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Each modal dialog hides content behind it while open | R | 68% | 27 / 40 | 0 |
| Each modal dialog takes focus when opened | R | 35% | 14 / 40 | 0 |
| Focus is not lost when each dialog closes | R | 20% | 8 / 40 | 0 |
| Each modal dialog traps keyboard focus | R | 18% | 7 / 40 | 0 |
| Each dialog can be closed by escape key | BP | 15% | 6 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Radio Button Group | React | Dark
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Required fields are indicated visually | R | 100% | 20 / 20 | 20 |
| Helper text is programmatically associated | R | 100% | 10 / 10 | 30 |
| Each radio group has an accessible label | R | 45% | 18 / 40 | 0 |
| Arrow keys change the selected radio within each group | R | 5% | 2 / 40 | 0 |
| Each radio group is keyboard reachable | R | 5% | 2 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Radio Button Group | React | Modern
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Required fields are indicated visually | R | 100% | 21 / 21 | 19 |
| Helper text is programmatically associated | R | 100% | 18 / 18 | 22 |
| Each radio group has an accessible label | R | 42% | 17 / 40 | 0 |
| Arrow keys change the selected radio within each group | R | 5% | 2 / 40 | 0 |
| Each radio group is keyboard reachable | R | 5% | 2 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Radio Button Group | Vanilla JS | Dark
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Required fields are indicated visually | R | 100% | 33 / 33 | 7 |
| Helper text is programmatically associated | R | 100% | 18 / 18 | 22 |
| Each radio group has an accessible label | R | 50% | 20 / 40 | 0 |
| Arrow keys change the selected radio within each group | R | 2% | 1 / 40 | 0 |
| Each radio group is keyboard reachable | R | 2% | 1 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Radio Button Group | Vanilla JS | Modern
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Required fields are indicated visually | R | 100% | 35 / 35 | 5 |
| Helper text is programmatically associated | R | 100% | 14 / 14 | 26 |
| Each radio group has an accessible label | R | 25% | 10 / 40 | 0 |
| Arrow keys change the selected radio within each group | R | 2% | 1 / 40 | 0 |
| Each radio group is keyboard reachable | R | 2% | 1 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Shopping Home Page | React | Dark
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Has a skip navigation link | R | 100% | 40 / 40 | 0 |
| Has a single maincontent | R | 52% | 21 / 40 | 0 |
| Has a single banner | R | 25% | 10 / 40 | 0 |
| Has single h1 | BP | 5% | 2 / 40 | 0 |
| Has an h1 | R | 2% | 1 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Shopping Home Page | React | Modern
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Has a skip navigation link | R | 100% | 40 / 40 | 0 |
| Has a single maincontent | R | 40% | 16 / 40 | 0 |
| Has a single banner | R | 22% | 9 / 40 | 0 |
| Has a single footer | R | 10% | 4 / 40 | 0 |
| Has single h1 | BP | 5% | 2 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Shopping Home Page | Vanilla JS | Dark
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Has a skip navigation link | R | 100% | 40 / 40 | 0 |
| Has a single maincontent | R | 32% | 13 / 40 | 0 |
| Has a single banner | R | 10% | 4 / 40 | 0 |
| Has a single footer | R | 3% | 1 / 38 | 2 |
| Has at least one navigation | R | 2% | 1 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Shopping Home Page | Vanilla JS | Modern
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Has a skip navigation link | R | 100% | 40 / 40 | 0 |
| Has a single maincontent | R | 35% | 14 / 40 | 0 |
| Has a single banner | R | 10% | 4 / 40 | 0 |
| Has a single footer | R | 3% | 1 / 33 | 7 |
| Has at least one navigation | R | 2% | 1 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Simple Contact Form | React | Dark | Error Message Present
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 60% | 24 / 40 | 0 |
| Required fields are indicated programmatically | R | 30% | 11 / 37 | 3 |
| Required fields are indicated visually | R | 16% | 6 / 37 | 3 |
| Helper text is programmatically associated | R | 15% | 6 / 40 | 0 |
| Placeholder text is programmatically defined as a property | R | 0% | 0 / 38 | 2 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Simple Contact Form | React | Dark | No Error Message
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 52% | 21 / 40 | 0 |
| Required fields are indicated visually | R | 36% | 14 / 39 | 1 |
| Helper text is programmatically associated | R | 18% | 7 / 40 | 0 |
| Required fields are indicated programmatically | R | 8% | 3 / 39 | 1 |
| Visible label is included in accessible name | R | 2% | 1 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Simple Contact Form | React | Modern | Error Message Present
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 52% | 21 / 40 | 0 |
| Required fields are indicated programmatically | R | 23% | 8 / 35 | 5 |
| Required fields are indicated visually | R | 23% | 8 / 35 | 5 |
| Each text input has an accessible name | R | 0% | 0 / 40 | 0 |
| Each text input has textbox role | R | 0% | 0 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Simple Contact Form | React | Modern | No Error Message
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 55% | 22 / 40 | 0 |
| Required fields are indicated visually | R | 44% | 17 / 39 | 1 |
| Helper text is programmatically associated | R | 12% | 5 / 40 | 0 |
| Required fields are indicated programmatically | R | 8% | 3 / 39 | 1 |
| Visible label is included in accessible name | R | 2% | 1 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Simple Contact Form | Vanilla JS | Dark | Error Message Present
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Required fields are indicated visually | R | 46% | 18 / 39 | 1 |
| Inputs use appropriate autocomplete for purpose | R | 40% | 16 / 40 | 0 |
| Helper text is programmatically associated | R | 12% | 5 / 40 | 0 |
| Required fields are indicated programmatically | R | 8% | 3 / 39 | 1 |
| Visible label is included in accessible name | R | 8% | 3 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Simple Contact Form | Vanilla JS | Dark | No Error Message
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Required fields are indicated visually | R | 70% | 28 / 40 | 0 |
| Inputs use appropriate autocomplete for purpose | R | 40% | 16 / 40 | 0 |
| Helper text is programmatically associated | R | 15% | 6 / 40 | 0 |
| Visible label is included in accessible name | R | 8% | 3 / 40 | 0 |
| Required fields are indicated programmatically | R | 5% | 2 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Simple Contact Form | Vanilla JS | Modern | Error Message Present
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Required fields are indicated visually | R | 45% | 18 / 40 | 0 |
| Inputs use appropriate autocomplete for purpose | R | 38% | 15 / 40 | 0 |
| Visible label is included in accessible name | R | 12% | 5 / 40 | 0 |
| Helper text is programmatically associated | R | 8% | 3 / 40 | 0 |
| Required fields are indicated programmatically | R | 2% | 1 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Simple Contact Form | Vanilla JS | Modern | No Error Message
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Required fields are indicated visually | R | 72% | 29 / 40 | 0 |
| Inputs use appropriate autocomplete for purpose | R | 38% | 15 / 40 | 0 |
| Helper text is programmatically associated | R | 15% | 6 / 40 | 0 |
| Visible label is included in accessible name | R | 12% | 5 / 40 | 0 |
| Visual labels are defined and persistent | R | 5% | 2 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Single Checkbox | React | Dark
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Helper text is programmatically associated | R | 74% | 29 / 39 | 1 |
| Required fields are indicated visually | R | 15% | 6 / 40 | 0 |
| ARIA attributes match native checkbox attributes if used | R | 2% | 1 / 40 | 0 |
| Checked state is programmatically exposed | R | 2% | 1 / 40 | 0 |
| Each checkbox has a valid role | R | 2% | 1 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Single Checkbox | React | Modern
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Helper text is programmatically associated | R | 62% | 25 / 40 | 0 |
| Required fields are indicated visually | R | 18% | 7 / 40 | 0 |
| Visual labels are defined and persistent | R | 5% | 2 / 40 | 0 |
| Each checkbox has an accessible name | R | 2% | 1 / 40 | 0 |
| Space toggles checkbox state | R | 2% | 1 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Single Checkbox | Vanilla JS | Dark
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Helper text is programmatically associated | R | 80% | 32 / 40 | 0 |
| Required fields are indicated visually | R | 5% | 2 / 40 | 0 |
| ARIA attributes match native checkbox attributes if used | R | 0% | 0 / 40 | 0 |
| Checked state is programmatically exposed | R | 0% | 0 / 40 | 0 |
| Each checkbox has a valid role | R | 0% | 0 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Single Checkbox | Vanilla JS | Modern
| Assertion | Type | Failure rate | Failures / applicable | Not applicable |
|---|---|---|---|---|
| Helper text is programmatically associated | R | 74% | 29 / 39 | 1 |
| Required fields are indicated visually | R | 20% | 8 / 40 | 0 |
| Each checkbox has an accessible name | R | 2% | 1 / 40 | 0 |
| Each checkbox is keyboard reachable | R | 2% | 1 / 40 | 0 |
| Space toggles checkbox state | R | 2% | 1 / 40 | 0 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Methodology
- This report is not used for model training, and the testing is not comprehensive. Results should be interpreted as a targeted accessibility evaluation of the included prompt cases, not a complete assessment of model quality or accessibility behavior.
- Each test uses a prompt to generate HTML. The generated HTML is then tested for accessibility.
- This report intentionally omits direct links to the generated HTML samples. Screenshots and evaluation artifacts remain embedded here, and the generated content is available upon request from mfairchild@microsoft.com.
- The prompts intentionally do not include specific accessibility instructions. The goal is to see if the LLMs produce accessible HTML by default.
- All generations are agentic: Every sample (control, variant, and skill) is generated by the GitHub Copilot SDK as an agentic session running inside a Docker sandbox. The agent can call built-in tools (e.g. file writes, shell commands) and iteratively refine its output before submitting a final result.
- Control: The model receives the test prompt with no custom accessibility instructions. This measures baseline accessibility out of the box.
- Instruction-set variants: The same test prompt is used, but the agent session includes custom instructions (delivered via `.github/copilot-instructions.md` in the sandbox working directory). This measures how much custom guidance improves accessibility.
- Skills: The agent runs a multi-turn conversation with explicit turn prompts (e.g. generate, then self-review). A skill directory containing a `SKILL.md` is mounted so the agent can reference it.
- The resulting HTML is rendered in a browser via Playwright (Chromium). This allows the HTML's JavaScript and CSS to execute, which can impact accessibility.
- The rendered HTML is evaluated using axe-core to identify common accessibility issues.
- A custom test script (JavaScript) is executed against the rendered page to check for accessibility requirements that are specific to the test case and not covered by axe-core. These tests look for WCAG 2.2 failures and best practices. Best practices do not impact pass/fail results.
- Each test case is run multiple times (samples) to evaluate the consistency and reliability of the LLM's output.
- By default, the harness does not explicitly set a temperature, so each provider/model uses its own default sampling behavior.
Based on 32 prompt cases and 5 samples per case (160 samples per model), this run is roughly powered to detect model WCAG pass-rate differences of about 15.7 percentage points or larger in a two-model comparison (approximate 95% confidence, 80% power; assumes independent samples).
This is a planning heuristic, not a confidence interval reported by the harness. Repeated samples within the same prompt case are correlated, so the true detectable difference may be somewhat larger.
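The report does not state which formula produced the 15.7-point figure. As a rough sketch, the standard normal-approximation formula for comparing two proportions with equal sample sizes reproduces it, assuming a worst-case pass rate of 50% (that assumption, and the function name below, are illustrative and not taken from the harness):

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_difference(n_per_model: int,
                              alpha: float = 0.05,
                              power: float = 0.80,
                              p: float = 0.5) -> float:
    """Approximate smallest pass-rate gap (as a proportion) detectable between two models.

    Normal approximation for a two-sided, two-proportion comparison with equal
    sample sizes. p = 0.5 is an assumed worst-case (largest-variance) pass rate.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return (z_alpha + z_beta) * sqrt(2 * p * (1 - p) / n_per_model)


samples_per_model = 32 * 5  # 32 prompt cases x 5 samples per case
mdd_pp = 100 * min_detectable_difference(samples_per_model)
print(f"{mdd_pp:.1f} percentage points")  # 15.7 percentage points
```

Because the formula treats all 160 samples as independent, it is optimistic for this design, where the 5 samples of a prompt case share the same prompt.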
Output format instructions
Save your answer to `index.html`. Feel free to use separate CSS and JS files in the same directory.
All tests are automated and deterministic (no human intervention). Only a fraction of WCAG accessibility requirements can be covered this way; many requirements still need a human to evaluate them. As such, these tests are not comprehensive. Even if a sample passes every test, it may still fail WCAG and contain serious accessibility issues.
Please leave feedback, review the source code, and contribute test cases, assertions, and other improvements at the GitHub Project.
Contributors
This report and evaluation harness are maintained by Michael Fairchild. For questions about methodology, generated samples, or report interpretation, contact mfairchild@microsoft.com. Contributors include Scott O'Hara, Aaron Gustafson, Shawn Lauriat, Dylan Isaac, and Cameron Cundiff. This project would not be possible without the support of the Microsoft Accessibility team and the GitHub Copilot SDK team.
Glossary
Column Definitions
- Rank: The position of the model when sorted by WCAG Pass Rate from highest to lowest (a lower rank number is better).
- WCAG Pass Rate: The percentage of samples that passed all WCAG tests, including both axe-core WCAG checks and custom WCAG assertions. This does not include best practices.
- Not applicable assertion: An assertion result that indicates a check did not apply to that sample. It is tracked at the assertion level and does not change the sample-level pass-rate denominator.
- Avg Total WCAG Failures: The average number of total WCAG failures (axe-core + assertions) per sample for the model. This does not include best practices.
- Avg Axe WCAG Failures: The average number of axe-core detected WCAG failures per sample for the model. This does not include best practices.
- Avg Assertion WCAG Failures: The average number of custom WCAG assertion failures per sample for the model. This does not include best practices.
- Avg Best Practice Failures: The average number of best practice accessibility issues (informational only) per sample for the model. This includes axe-core best practices and best practice assertions.
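The column definitions above can be sketched as a small calculation. The `Sample` record and its field names are hypothetical, chosen for illustration; the harness's real data model is not shown in this report:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    # Hypothetical per-sample record; field names are illustrative only.
    axe_wcag_failures: int
    assertion_wcag_failures: int
    best_practice_failures: int  # informational; never affects pass/fail


def passed(sample: Sample) -> bool:
    # A sample passes only if it has no WCAG failures from either source.
    return sample.axe_wcag_failures == 0 and sample.assertion_wcag_failures == 0


def wcag_pass_rate(samples: List[Sample]) -> float:
    # Percentage of samples that passed all WCAG checks.
    return 100 * sum(passed(s) for s in samples) / len(samples)


def avg_total_wcag_failures(samples: List[Sample]) -> float:
    # Mean of axe-core plus assertion WCAG failures; best practices excluded.
    total = sum(s.axe_wcag_failures + s.assertion_wcag_failures for s in samples)
    return total / len(samples)


samples = [Sample(0, 0, 2), Sample(1, 0, 0), Sample(0, 3, 1), Sample(0, 0, 0)]
print(wcag_pass_rate(samples))           # 50.0
print(avg_total_wcag_failures(samples))  # 1.0
```

The first sample still passes despite two best-practice failures, which matches the definition that best practices are informational only.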
Other Glossary Terms
- Assertion: A specific accessibility check defined in the test script. Each assertion checks for a particular accessibility requirement or best practice for the specific test case which is not already tested by axe.
- Axe-core: An open-source accessibility testing engine developed by Deque Systems. It is widely used for automated accessibility testing of web applications.
- Pass@k: A metric that estimates the likelihood of at least one sample passing a test when k samples are randomly selected.
- WCAG: Web Content Accessibility Guidelines, a set of guidelines for making web content more accessible to people with disabilities.
- Test Case: A specific scenario designed to evaluate the accessibility of generated HTML content. Each test case includes a prompt, expected accessibility requirements, and a test script.
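The Pass@k entry above does not say how the harness computes the metric. One common way to estimate it, shown here as an assumption and not as the harness's method, is the combinatorial estimator that draws k samples without replacement from n generated samples of which c passed:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Chance that at least one of k samples drawn from n is a pass, given c passes.

    Computed as 1 - C(n - c, k) / C(n, k): one minus the probability that
    every one of the k drawn samples is a failure.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # comb(n - c, k) is 0 when fewer than k failing samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 5 samples for a test case, 2 of them passed, pick 3 at random.
print(f"{pass_at_k(n=5, c=2, k=3):.2f}")  # 0.90
```

With k = 1 the estimator reduces to the plain pass fraction c / n.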
Change Log
5/2026 Update
- Runtime: Migrated the harness to the GitHub Copilot SDK. All generations now run as agentic Copilot sessions inside the project-owned Docker sandbox, with Copilot session logs captured per run.
- Artifacts & Evaluation: Added per-sample working directories and multi-file output support, and now evaluate generated artifacts by serving each sample over localhost HTTP so relative CSS, JavaScript, and other assets render under real browser conditions. Empty or invalid HTML is surfaced earlier as a generation failure.
- Skills & Report: Added multi-turn skills benchmarking, expanded the HTML report with richer agent conversation inspection and skill-specific summaries, and improved report detail visibility for generated samples.
- Providers & Portability: Improved BYOK provider support with dynamic credential commands such as api_key_cmd, refreshed auth and model documentation, and replaced bash-only helper scripts with Python equivalents for better cross-platform support.
- Test Coverage: Expanded and tightened accessibility assertions across grouped controls, modal dialogs, disclosure widgets, helper text, required-state detection, skip links, and assistive-technology visibility checks.
2/2026 Update
- Test Cases: Added a test case for a simple contact form with assertions for simple form controls. Also fixed some minor bugs in other test cases.
- Instruction Sets: Added instruction set evaluation.
- Report: Updated report layout and added new sections for instruction sets and analysis. Also allow filtering by instruction set and specific assertions within test cases.
Instruction Benchmarks (vs Control)
These results show how well each instruction set performs vs the control configuration (averaged across models). Instruction sets contain specific guidance intended to improve accessibility and are appended to the system prompt.
Several instruction sets are used in this benchmark to help identify which instructions are most effective at improving accessibility. Instruction sets are ranked by their average WCAG pass rate across all models and test cases.
Summary (ranked by avg WCAG pass rate)
| Rank | Instruction Set | Avg Control Pass Rate* | Avg Instruction Set Pass Rate* | Δ Avg Pass Rate* |
|---|---|---|---|---|
| 1 | 1. Basic | 12% | 60% | +48.5pp |
| 2 | 0. Minimal | 12% | 37% | +24.8pp |
* Pass rate reflects only this harness's automated checks (a curated set of axe-core WCAG rules plus hand-written assertions per test case). Automated testing can detect only a subset of accessibility issues: 100% here means the sample passed every check that was run, not that the page is WCAG conformant or fully accessible.
Instruction benchmark details
This section includes per-model benchmark results and the full text of each instruction set.
Instruction sets
0. Minimal
Minimal reminder that all output must be accessible.
Variant samples per (test, model): 5
Generation mode: copilot_agent
All output MUST be accessible.
1. Basic
Basic reminder that all output must be accessible (includes slightly more instructions than minimal).
Variant samples per (test, model): 5
Generation mode: copilot_agent
---
description: "Accessibility coding rules: WCAG 2.2 AA conformance, semantic structure, keyboard support, focus management. Apply when writing or modifying UI code."
applyTo: "**"
---
# Accessibility instructions (standard)
Conform to [WCAG 2.2 Level AA](https://www.w3.org/TR/WCAG22/). Do not claim output is "fully accessible" — state what was addressed and any known limitations.
## Implementation priority
Use the first option that fits:
1. Existing accessible component in the project / design system.
2. A component library already in use.
3. Native platform semantics (`<button>`, `<a href>`, `<input>`, `<label>`, `<fieldset>`/`<legend>`, `<dialog>`, `<nav>`, `<main>`, `<h1>`–`<h6>`).
4. Native element + minimum necessary ARIA.
5. Fully custom ARIA widget — only when nothing above fits, and only with complete APG keyboard, focus, and state behavior.
No ARIA is better than bad ARIA. Don't duplicate native semantics (no `role="button"` on `<button>`). Don't use `role="menu"` for site navigation.
## Structure
- Use landmark elements (`<header>`, `<nav>`, `<main>`, `<footer>`). Exactly one `<main>`. Give duplicated landmarks unique accessible names.
- One `<h1>` per view, typically first heading in `<main>`. Don't skip heading levels.
- Set a descriptive `<title>`.
- **Web pages only:** Provide a "Skip to main content" link as the first focusable element.
## Name, role, value
- Every interactive element exposes an accurate accessible name. Role matches purpose. Dynamic states (pressed, expanded, selected, checked, disabled, invalid) stay in sync with visuals.
- The accessible name MUST contain the visible label text. When multiple controls share a label (e.g. many "Remove" buttons), add context: `aria-label="Remove item: Socks"`.
## Keyboard and focus
- All functionality can be achieved by both mouse an keyboard; tab order matches reading/visual order.
- Focus is always visible — do not remove focus outlines without an equal-or-better replacement.
- Avoid keyboard traps.
- Escape should close overlays.
- Static content MUST NOT be sequentially focusable. Use `tabindex="-1"` only for programmatic focus targets.
- Content hidden from AT (`aria-hidden="true"`) MUST NOT be focusable.
- Dialogs move focus in and restore it on close.
- Composite widgets (tabs, listbox, menu, grid): one tab stop total; arrow keys move focus internally via roving `tabindex` or `aria-activedescendant`.
## Forms
- Every form field has a visual and programmatic label (`<label for>` or wrapping `<label>`). Never rely on placeholder alone.
- Associate help/error text via `aria-describedby`.
- Group related options (checkboxes, radios) with `<fieldset>` + `<legend>`.
- Required fields: visible indicator (e.g., `*`) AND `required` / `aria-required="true"`. Never color alone. This is a MUST when the form contains both required and optional fields.
- Invalid fields: `aria-invalid="true"`; remove when corrected. Error messages explain how to fix.
- On submit with invalid input, focus the first invalid control. Don't disable submit solely to prevent submission.
## Contrast and color
- Text contrast ≥ 4.5:1 (≥ 3:1 for large text: 24px regular or 18.66px bold).
- Focus indicators and key control boundaries ≥ 3:1 vs adjacent colors.
- Never use color as the only cue for meaning (error, success, required, selected).
- Use design tokens / CSS custom properties. Avoid `opacity`, `rgba`, `hsla` on text and essential affordances — contrast becomes background-dependent.
- Ensure contrast in all states: default, hover, active, focus, visited, disabled.
## Forced colors / OS settings
- Never override OS high-contrast, reduced-motion, or color-scheme preferences without good reason.
- Do not use `forced-color-adjust: none` without good reason (e.g., data-viz where color needs to remain the same).
- In `@media (forced-colors: active)`, use system color keywords (`ButtonText`, `ButtonBorder`, `CanvasText`, `Canvas`) — never fixed hex/RGB.
- Use `currentColor` for SVG `fill`/`stroke` so icons inherit the foreground.
- If relying on `box-shadow` for focus, add a transparent `outline` so focus renders in forced colors.
## Reflow (SC 1.4.10)
- Content MUST be able to 320 CSS pixels wide without two-dimensional scrolling for multi-line text.
- For multi-column layouts that are not necessary to convey meaning or important to the UX of the interface, content stacks; text wraps; controls remain operable.
- Use fluid `flex`/`grid`. Set `max-width: 100%` on media, `min-width: 0` on flex/grid children, `overflow-wrap: anywhere` for long strings.
- Exception: inherently 2D components (large tables, maps, charts, media, interfaces with toolbars or interfaces that require 2D layout) may scroll horizontally at component level; the surrounding view still reflows.
## Graphics
- Informative `<img>` → meaningful `alt`. Decorative `<img>` → `alt=""`.
- Informative `<svg>` → `role="img"` with `aria-label` / `aria-labelledby`. Decorative SVG/graphics → `aria-hidden="true"`.
## Navigation
- Use `<nav>` with lists and links — not `role="menu"` / `role="menubar"`.
- Expandable navigation: toggle uses `button[aria-expanded]`. Escape MAY close sub-navigations.
## Tables and grids
- Static tabular data: `<table>` with `<th>` for column/row headers.
- Use `role="grid"` only for genuinely interactive tabular UIs, with proper row/cell nesting and arrow-key navigation.
## Status messages
- Announce dynamic updates (loading, success, failure, error, validation summaries) via `aria-live="polite"` or `aria-live="assertive"`.
## Final verification
Before finalizing, verify: landmarks + one `<h1>`; keyboard operability with visible focus and no traps; visible labels included in accessible names; form labels + required + error association + focus-first-invalid; contrast thresholds; forced-colors adaptation; reflow at 320px; image alternatives; table header associations.
Results
| Model | Instruction Set | Control Pass Rate* | Instruction Set Pass Rate* | Δ Pass Rate* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0. Minimal | 3% | 11% | +8.1pp |
| Claude Haiku 4.5 | 1. Basic | 3% | 19% | +16.2pp |
| Claude Opus 4.7 | 0. Minimal | 14% | 42% | +27.5pp |
| Claude Opus 4.7 | 1. Basic | 14% | 64% | +49.4pp |
| Claude Sonnet 4.6 | 0. Minimal | 6% | 9% | +2.5pp |
| Claude Sonnet 4.6 | 1. Basic | 6% | 21% | +15.0pp |
| GPT-5.4 | 0. Minimal | 15% | 38% | +22.5pp |
| GPT-5.4 | 1. Basic | 15% | 84% | +69.4pp |
| GPT-5.4 Mini | 0. Minimal | 25% | 46% | +20.6pp |
| GPT-5.4 Mini | 1. Basic | 25% | 81% | +55.6pp |
| GPT-5.5 | 0. Minimal | 18% | 64% | +46.2pp |
| GPT-5.5 | 1. Basic | 18% | 89% | +71.9pp |
| Gemini 3 Flash Preview | 0. Minimal | 6% | 31% | +25.6pp |
| Gemini 3 Flash Preview | 1. Basic | 6% | 49% | +43.8pp |
| Gemini 3.1 Pro Preview | 0. Minimal | 8% | 53% | +45.0pp |
| Gemini 3.1 Pro Preview | 1. Basic | 8% | 75% | +66.9pp |
* Pass rate reflects only this harness's automated checks (a curated set of axe-core WCAG rules plus hand-written assertions per test case). Automated testing can detect only a subset of accessibility issues: 100% here means the sample passed every check that was run, not that the page is WCAG conformant or fully accessible.
Instruction set analysis vs control
This section highlights where each instruction set helped (or hurt) compared to the control, aggregated across all samples for that instruction set.
* Pass rate reflects only this harness's automated checks (a curated set of axe-core WCAG rules plus hand-written assertions per test case). Automated testing can detect only a subset of accessibility issues: 100% here means the sample passed every check that was run, not that the page is WCAG conformant or fully accessible.
0. Minimal — overall Δ pass rate +24.8pp
Overall: Control 12% (n=1280) → Variant 37% (n=1280). Avg WCAG failures/sample: 3.93 → 2.63 (Δ -1.30).
Most improved test cases
| Test case | Control pass rate* | Variant pass rate* | Δ pass rate* | Δ avg WCAG failures |
|---|---|---|---|---|
| Single Checkbox | Vanilla JS | Modern | 10% | 68% | +57.5pp | -0.92 |
| Single Checkbox | Vanilla JS | Dark | 5% | 60% | +55.0pp | -0.85 |
| Disclosure Widget | React | Dark | 22% | 72% | +50.0pp | -0.52 |
| Single Checkbox | React | Modern | 8% | 55% | +47.5pp | -0.60 |
| Checkbox Group | React | Modern | 0% | 42% | +42.5pp | -2.20 |
Most regressed test cases
| Test case | Control pass rate* | Variant pass rate* | Δ pass rate* | Δ avg WCAG failures |
|---|---|---|---|---|
| Simple Contact Form | Vanilla JS | Dark | Error Message Present | 22% | 18% | -5.0pp | -0.25 |
Most reduced axe WCAG rules
| Rule | Control rate | Variant rate | Δ rate | Description |
|---|---|---|---|---|
| color-contrast | 41.6% | 35.0% | -6.6pp | Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds |
| button-name | 1.1% | 0.0% | -1.1pp | Ensure buttons have discernible text |
| link-name | 0.6% | 0.1% | -0.5pp | Ensure links have discernible text |
| link-in-text-block | 0.6% | 0.3% | -0.3pp | Ensure links are distinguished from surrounding text in a way that does not rely on color |
| image-alt | 0.2% | 0.0% | -0.2pp | Ensure <img> elements have alternative text or a role of none or presentation |
Most increased axe WCAG rules
| Rule | Control rate | Variant rate | Δ rate | Description |
|---|---|---|---|---|
| aria-allowed-attr | 0.4% | 1.2% | +0.9pp | Ensure an element's role supports its ARIA attributes |
| aria-hidden-focus | 0.0% | 0.4% | +0.4pp | Ensure aria-hidden elements are not focusable nor contain focusable elements |
| aria-prohibited-attr | 0.2% | 0.5% | +0.3pp | Ensure ARIA attributes are not prohibited for an element's role |
| listitem | 0.4% | 0.6% | +0.2pp | Ensure <li> elements are used semantically |
| aria-conditional-attr | 0.0% | 0.1% | +0.1pp | Ensure ARIA attributes are used as described in the specification of the element's role |
Assertion analysis (vs control)
Failure rates are computed per assertion (within each test case) and compared between the variant and control.
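As a minimal sketch of that comparison (function names are illustrative, not the harness's), each failure rate is taken over the samples where the assertion applied, and the delta is the variant rate minus the control rate in percentage points. A rate is undefined when no sample was applicable, which the tables show as `-`:

```python
from typing import Optional, Tuple


def failure_rate(failures: int, applicable: int) -> Optional[float]:
    """Failure rate in percent over applicable samples; None if nothing applied."""
    if applicable == 0:
        return None
    return 100 * failures / applicable


def delta_pp(control: Tuple[int, int], variant: Tuple[int, int]) -> Optional[float]:
    """Variant minus control failure rate, in percentage points (one decimal)."""
    control_rate = failure_rate(*control)
    variant_rate = failure_rate(*variant)
    if control_rate is None or variant_rate is None:
        return None  # shown as "-" in the tables
    return round(variant_rate - control_rate, 1)


# (failures, applicable) pairs taken from the Checkbox Group | React | Dark table.
print(delta_pp((36, 40), (17, 39)))  # -46.4  helper text association
print(delta_pp((28, 40), (8, 40)))   # -50.0  group has an accessible label
print(delta_pp((0, 0), (0, 1)))      # None   no applicable control samples
```

A negative delta means the instruction set reduced the failure rate for that assertion; because the denominators can differ between control and variant, the delta is computed from the rates, not from the raw failure counts.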
Most improved assertions
| Test case | Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|---|
| Single Checkbox | Vanilla JS | Dark | Helper text is programmatically associated | R | 80% | 10% | -70.0pp | 32 / 40 | 4 / 40 |
| Shopping Home Page | Vanilla JS | Dark | Has a skip navigation link | R | 100% | 30% | -70.0pp | 40 / 40 | 12 / 40 |
| Shopping Home Page | React | Dark | Has a skip navigation link | R | 100% | 32% | -67.5pp | 40 / 40 | 13 / 40 |
| Shopping Home Page | Vanilla JS | Modern | Has a skip navigation link | R | 100% | 35% | -65.0pp | 40 / 40 | 14 / 40 |
| Checkbox Group | Vanilla JS | Dark | Each checkbox group has an accessible label | R | 72% | 10% | -62.5pp | 29 / 40 | 4 / 40 |
| Shopping Home Page | React | Modern | Has a skip navigation link | R | 100% | 38% | -62.5pp | 40 / 40 | 15 / 40 |
| Single Checkbox | Vanilla JS | Modern | Helper text is programmatically associated | R | 74% | 12% | -61.9pp | 29 / 39 | 5 / 40 |
| Checkbox Group | Vanilla JS | Dark | Each checkbox group has a valid role | R | 70% | 10% | -60.0pp | 28 / 40 | 4 / 40 |
| Disclosure Widget | React | Modern | Collapsed content is hidden from everyone | R | 84% | 29% | -55.3pp | 32 / 38 | 11 / 38 |
| Disclosure Widget | React | Dark | Collapsed content is hidden from everyone | R | 77% | 24% | -53.1pp | 24 / 31 | 9 / 37 |
Most regressed assertions
| Test case | Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|---|
| Single Checkbox | Vanilla JS | Dark | Required fields are indicated visually | R | 5% | 15% | +10.0pp | 2 / 40 | 6 / 40 |
| Shopping Home Page | React | Dark | Has a single footer | R | 0% | 5% | +5.0pp | 0 / 39 | 2 / 40 |
| Single Checkbox | React | Dark | Visible label is included in accessible name | R | 0% | 5% | +5.0pp | 0 / 39 | 2 / 40 |
| Single Checkbox | Vanilla JS | Modern | Visible label is included in accessible name | R | 2% | 8% | +5.0pp | 1 / 40 | 3 / 40 |
| Single Checkbox | React | Modern | Visible label is included in accessible name | R | 0% | 3% | +2.6pp | 0 / 38 | 1 / 39 |
| Modal Dialog | Vanilla JS | Dark | Each modal dialog takes focus when opened | R | 30% | 32% | +2.5pp | 12 / 40 | 13 / 40 |
| Checkbox Group | React | Dark | ARIA attributes match native checkbox attributes if used | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
| Checkbox Group | React | Dark | Each checkbox has a valid role | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
| Checkbox Group | React | Modern | Visual labels are defined and persistent | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
| Checkbox Group | Vanilla JS | Dark | Visual labels are defined and persistent | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
All assertion deltas (per test case)
Checkbox Group | React | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Helper text is programmatically associated | R | 90% | 44% | -46.4pp | 36 / 40 | 17 / 39 |
| Each checkbox group has an accessible label | R | 70% | 20% | -50.0pp | 28 / 40 | 8 / 40 |
| Each checkbox group has a valid role | R | 70% | 18% | -52.5pp | 28 / 40 | 7 / 40 |
| ARIA attributes match native checkbox attributes if used | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
| Each checkbox has a valid role | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
| Checked state is programmatically exposed | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
| Each checkbox has an accessible name | R | 5% | 2% | -2.5pp | 2 / 40 | 1 / 40 |
| Each checkbox is in the tab order | R | 5% | 2% | -2.5pp | 2 / 40 | 1 / 40 |
| Space toggles checkbox state of each checkbox | R | 10% | 2% | -7.5pp | 4 / 40 | 1 / 40 |
| Visual labels are defined and persistent | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 8% | 0% | -7.7pp | 3 / 39 | 0 / 40 |
| Required fields are indicated programmatically | R | - | 0% | - | 0 / 0 | 0 / 1 |
| Required fields are indicated visually | R | - | 0% | - | 0 / 0 | 0 / 1 |
Checkbox Group | React | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Helper text is programmatically associated | R | 98% | 53% | -44.9pp | 39 / 40 | 20 / 38 |
| Each checkbox group has an accessible label | R | 70% | 18% | -52.5pp | 28 / 40 | 7 / 40 |
| Each checkbox group has a valid role | R | 68% | 15% | -52.5pp | 27 / 40 | 6 / 40 |
| Visible label is included in accessible name | R | 5% | 3% | -2.4pp | 2 / 40 | 1 / 39 |
| Visual labels are defined and persistent | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
| ARIA attributes match native checkbox attributes if used | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Checked state is programmatically exposed | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox has a valid role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox is in the tab order | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Space toggles checkbox state of each checkbox | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Each checkbox has an accessible name | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
| Required fields are indicated programmatically | R | - | - | - | 0 / 0 | 0 / 0 |
| Required fields are indicated visually | R | - | - | - | 0 / 0 | 0 / 0 |
Checkbox Group | Vanilla JS | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Helper text is programmatically associated | R | 100% | 60% | -40.0pp | 40 / 40 | 24 / 40 |
| Each checkbox group has a valid role | R | 70% | 10% | -60.0pp | 28 / 40 | 4 / 40 |
| Each checkbox group has an accessible label | R | 72% | 10% | -62.5pp | 29 / 40 | 4 / 40 |
| Visual labels are defined and persistent | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
| Each checkbox has an accessible name | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
| Each checkbox is in the tab order | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
| Space toggles checkbox state of each checkbox | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
| Visible label is included in accessible name | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
| ARIA attributes match native checkbox attributes if used | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Checked state is programmatically exposed | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox has a valid role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Required fields are indicated programmatically | R | - | - | - | 0 / 0 | 0 / 0 |
| Required fields are indicated visually | R | - | - | - | 0 / 0 | 0 / 0 |
Checkbox Group | Vanilla JS | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Helper text is programmatically associated | R | 90% | 57% | -32.5pp | 36 / 40 | 23 / 40 |
| Each checkbox group has an accessible label | R | 52% | 25% | -27.5pp | 21 / 40 | 10 / 40 |
| Each checkbox group has a valid role | R | 50% | 22% | -27.5pp | 20 / 40 | 9 / 40 |
| Each checkbox has an accessible name | R | 5% | 2% | -2.5pp | 2 / 40 | 1 / 40 |
| Each checkbox is in the tab order | R | 5% | 2% | -2.5pp | 2 / 40 | 1 / 40 |
| Space toggles checkbox state of each checkbox | R | 5% | 2% | -2.5pp | 2 / 40 | 1 / 40 |
| Visible label is included in accessible name | R | 8% | 2% | -5.0pp | 3 / 40 | 1 / 40 |
| ARIA attributes match native checkbox attributes if used | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Checked state is programmatically exposed | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox has a valid role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Required fields are indicated programmatically | R | - | - | - | 0 / 0 | 0 / 0 |
| Required fields are indicated visually | R | - | - | - | 0 / 0 | 0 / 0 |
Disclosure Widget | React | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Collapsed content is hidden from everyone | R | 77% | 24% | -53.1pp | 24 / 31 | 9 / 37 |
| All examples have a valid semantics | R | 18% | 0% | -17.5pp | 7 / 40 | 0 / 40 |
Disclosure Widget | React | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Collapsed content is hidden from everyone | R | 84% | 29% | -55.3pp | 32 / 38 | 11 / 38 |
| All examples have a valid semantics | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
Disclosure Widget | Vanilla JS | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Collapsed content is hidden from everyone | R | 44% | 22% | -21.9pp | 16 / 36 | 9 / 40 |
| All examples have a valid semantics | R | 10% | 0% | -10.0pp | 4 / 40 | 0 / 40 |
Disclosure Widget | Vanilla JS | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Collapsed content is hidden from everyone | R | 57% | 25% | -32.1pp | 20 / 35 | 10 / 40 |
| All examples have a valid semantics | R | 12% | 0% | -12.5pp | 5 / 40 | 0 / 40 |
Modal Dialog | React | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Each modal dialog hides content behind it while open | R | 92% | 72% | -20.0pp | 37 / 40 | 29 / 40 |
| Each modal dialog takes focus when opened | R | 65% | 15% | -50.0pp | 26 / 40 | 6 / 40 |
| Focus is not lost when each dialog closes | R | 60% | 12% | -47.5pp | 24 / 40 | 5 / 40 |
| Each modal dialog traps keyboard focus | R | 52% | 8% | -45.0pp | 21 / 40 | 3 / 40 |
| Each dialog has a dialog role | R | 35% | 5% | -30.0pp | 14 / 40 | 2 / 40 |
| Each dialog can be closed by escape key | BP | 40% | 5% | -35.0pp | 16 / 40 | 2 / 40 |
| Closed dialogs are not exposed to assistive technology | R | 5% | 2% | -2.5pp | 2 / 40 | 1 / 40 |
Modal Dialog | React | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Each modal dialog hides content behind it while open | R | 85% | 78% | -7.5pp | 34 / 40 | 31 / 40 |
| Each modal dialog takes focus when opened | R | 48% | 22% | -25.0pp | 19 / 40 | 9 / 40 |
| Focus is not lost when each dialog closes | R | 42% | 20% | -22.5pp | 17 / 40 | 8 / 40 |
| Closed dialogs are not exposed to assistive technology | R | 8% | 8% | +0.0pp | 3 / 40 | 3 / 40 |
| Each dialog has a dialog role | R | 20% | 8% | -12.5pp | 8 / 40 | 3 / 40 |
| Each dialog can be closed by escape key | BP | 25% | 8% | -17.5pp | 10 / 40 | 3 / 40 |
| Each modal dialog traps keyboard focus | R | 32% | 8% | -25.0pp | 13 / 40 | 3 / 40 |
Modal Dialog | Vanilla JS | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Each modal dialog hides content behind it while open | R | 80% | 60% | -20.0pp | 32 / 40 | 24 / 40 |
| Each modal dialog takes focus when opened | R | 30% | 32% | +2.5pp | 12 / 40 | 13 / 40 |
| Each modal dialog traps keyboard focus | R | 18% | 8% | -10.0pp | 7 / 40 | 3 / 40 |
| Each dialog can be closed by escape key | BP | 22% | 5% | -17.5pp | 9 / 40 | 2 / 40 |
| Closed dialogs are not exposed to assistive technology | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
| Each dialog has a dialog role | R | 22% | 2% | -20.0pp | 9 / 40 | 1 / 40 |
| Focus is not lost when each dialog closes | R | 22% | 2% | -20.0pp | 9 / 40 | 1 / 40 |
Modal Dialog | Vanilla JS | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Each modal dialog hides content behind it while open | R | 68% | 52% | -15.0pp | 27 / 40 | 21 / 40 |
| Each modal dialog takes focus when opened | R | 35% | 30% | -5.0pp | 14 / 40 | 12 / 40 |
| Focus is not lost when each dialog closes | R | 20% | 20% | +0.0pp | 8 / 40 | 8 / 40 |
| Each dialog can be closed by escape key | BP | 15% | 15% | +0.0pp | 6 / 40 | 6 / 40 |
| Each dialog has a dialog role | R | 15% | 15% | +0.0pp | 6 / 40 | 6 / 40 |
| Each modal dialog traps keyboard focus | R | 18% | 10% | -7.5pp | 7 / 40 | 4 / 40 |
| Closed dialogs are not exposed to assistive technology | R | 2% | 5% | +2.5pp | 1 / 40 | 2 / 40 |
Radio Button Group | React | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Helper text is programmatically associated | R | 100% | 100% | +0.0pp | 10 / 10 | 4 / 4 |
| Required fields are indicated visually | R | 100% | 100% | +0.0pp | 20 / 20 | 30 / 30 |
| Each radio group has an accessible label | R | 45% | 12% | -32.5pp | 18 / 40 | 5 / 40 |
| ARIA attributes match native radio attributes if used | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Checked state is programmatically exposed | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Required fields are indicated programmatically | R | 0% | 0% | +0.0pp | 0 / 20 | 0 / 30 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Arrow keys change the selected radio within each group | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
| Each radio group is keyboard reachable | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
| Each radio has an accessible name | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
Radio Button Group | React | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Required fields are indicated visually | R | 100% | 100% | +0.0pp | 21 / 21 | 25 / 25 |
| Helper text is programmatically associated | R | 100% | 89% | -11.1pp | 18 / 18 | 8 / 9 |
| Each radio group has an accessible label | R | 42% | 5% | -37.5pp | 17 / 40 | 2 / 40 |
| ARIA attributes match native radio attributes if used | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Checked state is programmatically exposed | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Required fields are indicated programmatically | R | 0% | 0% | +0.0pp | 0 / 21 | 0 / 25 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Arrow keys change the selected radio within each group | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
| Each radio group is keyboard reachable | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
| Each radio has an accessible name | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
Radio Button Group | Vanilla JS | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Helper text is programmatically associated | R | 100% | 100% | +0.0pp | 18 / 18 | 1 / 1 |
| Required fields are indicated visually | R | 100% | 97% | -2.8pp | 33 / 33 | 35 / 36 |
| Each radio group has an accessible label | R | 50% | 8% | -42.5pp | 20 / 40 | 3 / 40 |
| ARIA attributes match native radio attributes if used | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Checked state is programmatically exposed | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Required fields are indicated programmatically | R | 0% | 0% | +0.0pp | 0 / 33 | 0 / 36 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Arrow keys change the selected radio within each group | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Each radio group is keyboard reachable | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Each radio has an accessible name | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
Radio Button Group | Vanilla JS | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Helper text is programmatically associated | R | 100% | 100% | +0.0pp | 14 / 14 | 7 / 7 |
| Required fields are indicated visually | R | 100% | 90% | -9.7pp | 35 / 35 | 28 / 31 |
| Each radio group has an accessible label | R | 25% | 5% | -20.0pp | 10 / 40 | 2 / 40 |
| Arrow keys change the selected radio within each group | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
| Each radio group is keyboard reachable | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
| Each radio has an accessible name | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
| Visible label is included in accessible name | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
| ARIA attributes match native radio attributes if used | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Checked state is programmatically exposed | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Required fields are indicated programmatically | R | 0% | 0% | +0.0pp | 0 / 35 | 0 / 31 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
Shopping Home Page | React | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Has a skip navigation link | R | 100% | 32% | -67.5pp | 40 / 40 | 13 / 40 |
| Has a single banner | R | 25% | 15% | -10.0pp | 10 / 40 | 6 / 40 |
| Has a single maincontent | R | 52% | 12% | -40.0pp | 21 / 40 | 5 / 40 |
| Has a single footer | R | 0% | 5% | +5.0pp | 0 / 39 | 2 / 40 |
| Has an h1 | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Has at least one h2 | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Has at least one navigation | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Has single h1 | BP | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
Shopping Home Page | React | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Has a skip navigation link | R | 100% | 38% | -62.5pp | 40 / 40 | 15 / 40 |
| Has a single banner | R | 22% | 18% | -5.0pp | 9 / 40 | 7 / 40 |
| Has a single maincontent | R | 40% | 15% | -25.0pp | 16 / 40 | 6 / 40 |
| Has a single footer | R | 10% | 5% | -5.0pp | 4 / 40 | 2 / 40 |
| Has an h1 | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Has at least one h2 | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Has at least one navigation | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Has single h1 | BP | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
Shopping Home Page | Vanilla JS | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Has a skip navigation link | R | 100% | 30% | -70.0pp | 40 / 40 | 12 / 40 |
| Has a single maincontent | R | 32% | 12% | -20.0pp | 13 / 40 | 5 / 40 |
| Has a single banner | R | 10% | 10% | +0.0pp | 4 / 40 | 4 / 40 |
| Has an h1 | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Has at least one h2 | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Has single h1 | BP | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Has at least one navigation | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Has a single footer | R | 3% | 0% | -2.6pp | 1 / 38 | 0 / 37 |
Shopping Home Page | Vanilla JS | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Has a skip navigation link | R | 100% | 35% | -65.0pp | 40 / 40 | 14 / 40 |
| Has a single banner | R | 10% | 12% | +2.5pp | 4 / 40 | 5 / 40 |
| Has a single maincontent | R | 35% | 12% | -22.5pp | 14 / 40 | 5 / 40 |
| Has an h1 | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Has at least one h2 | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Has single h1 | BP | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Has at least one navigation | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Has a single footer | R | 3% | 0% | -3.0pp | 1 / 33 | 0 / 39 |
Simple Contact Form | React | Dark | Error Message Present
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 60% | 45% | -15.0pp | 24 / 40 | 18 / 40 |
| Required fields are indicated visually | R | 16% | 10% | -6.0pp | 6 / 37 | 4 / 39 |
| Each text input has an accessible name | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each text input has textbox role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Placeholder text is programmatically defined as a property | R | 0% | 0% | +0.0pp | 0 / 38 | 0 / 33 |
| Text inputs are keyboard focusable | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Helper text is programmatically associated | R | 15% | 0% | -15.0pp | 6 / 40 | 0 / 40 |
| Required fields are indicated programmatically | R | 30% | 0% | -29.7pp | 11 / 37 | 0 / 39 |
Simple Contact Form | React | Dark | No Error Message
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 52% | 48% | -5.0pp | 21 / 40 | 19 / 40 |
| Required fields are indicated visually | R | 36% | 25% | -10.9pp | 14 / 39 | 10 / 40 |
| Required fields are indicated programmatically | R | 8% | 2% | -5.2pp | 3 / 39 | 1 / 40 |
| Each text input has an accessible name | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each text input has textbox role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Placeholder text is programmatically defined as a property | R | 0% | 0% | +0.0pp | 0 / 34 | 0 / 30 |
| Text inputs are keyboard focusable | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Helper text is programmatically associated | R | 18% | 0% | -17.5pp | 7 / 40 | 0 / 40 |
Simple Contact Form | React | Modern | Error Message Present
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 52% | 48% | -5.0pp | 21 / 40 | 19 / 40 |
| Required fields are indicated visually | R | 23% | 15% | -7.5pp | 8 / 35 | 6 / 39 |
| Required fields are indicated programmatically | R | 23% | 5% | -17.7pp | 8 / 35 | 2 / 39 |
| Helper text is programmatically associated | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
| Visible label is included in accessible name | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
| Visual labels are defined and persistent | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
| Each text input has an accessible name | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each text input has textbox role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Placeholder text is programmatically defined as a property | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 33 |
| Text inputs are keyboard focusable | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
Simple Contact Form | React | Modern | No Error Message
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 55% | 35% | -20.0pp | 22 / 40 | 14 / 40 |
| Required fields are indicated visually | R | 44% | 32% | -11.1pp | 17 / 39 | 13 / 40 |
| Each text input has textbox role | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
| Each text input has an accessible name | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Placeholder text is programmatically defined as a property | R | 0% | 0% | +0.0pp | 0 / 36 | 0 / 32 |
| Text inputs are keyboard focusable | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Required fields are indicated programmatically | R | 8% | 0% | -7.7pp | 3 / 39 | 0 / 40 |
| Helper text is programmatically associated | R | 12% | 0% | -12.5pp | 5 / 40 | 0 / 40 |
Simple Contact Form | Vanilla JS | Dark | Error Message Present
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 40% | 35% | -5.0pp | 16 / 40 | 14 / 40 |
| Required fields are indicated visually | R | 46% | 35% | -11.2pp | 18 / 39 | 14 / 40 |
| Required fields are indicated programmatically | R | 8% | 8% | -0.2pp | 3 / 39 | 3 / 40 |
| Visible label is included in accessible name | R | 8% | 2% | -5.0pp | 3 / 40 | 1 / 40 |
| Each text input has an accessible name | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each text input has textbox role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Placeholder text is programmatically defined as a property | R | 0% | 0% | +0.0pp | 0 / 35 | 0 / 28 |
| Text inputs are keyboard focusable | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Helper text is programmatically associated | R | 12% | 0% | -12.5pp | 5 / 40 | 0 / 40 |
Simple Contact Form | Vanilla JS | Dark | No Error Message
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Required fields are indicated visually | R | 70% | 35% | -35.0pp | 28 / 40 | 14 / 40 |
| Inputs use appropriate autocomplete for purpose | R | 40% | 32% | -7.5pp | 16 / 40 | 13 / 40 |
| Visible label is included in accessible name | R | 8% | 5% | -2.5pp | 3 / 40 | 2 / 40 |
| Helper text is programmatically associated | R | 15% | 2% | -12.5pp | 6 / 40 | 1 / 40 |
| Each text input has an accessible name | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each text input has textbox role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Placeholder text is programmatically defined as a property | R | 0% | 0% | +0.0pp | 0 / 32 | 0 / 23 |
| Text inputs are keyboard focusable | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Required fields are indicated programmatically | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
Simple Contact Form | Vanilla JS | Modern | Error Message Present
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Required fields are indicated visually | R | 45% | 38% | -7.5pp | 18 / 40 | 15 / 40 |
| Inputs use appropriate autocomplete for purpose | R | 38% | 35% | -2.5pp | 15 / 40 | 14 / 40 |
| Each text input has an accessible name | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each text input has textbox role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Placeholder text is programmatically defined as a property | R | 0% | 0% | +0.0pp | 0 / 36 | 0 / 28 |
| Text inputs are keyboard focusable | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Required fields are indicated programmatically | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Helper text is programmatically associated | R | 8% | 0% | -7.5pp | 3 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 12% | 0% | -12.5pp | 5 / 40 | 0 / 40 |
Simple Contact Form | Vanilla JS | Modern | No Error Message
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Required fields are indicated visually | R | 72% | 48% | -25.0pp | 29 / 40 | 19 / 40 |
| Inputs use appropriate autocomplete for purpose | R | 38% | 22% | -15.0pp | 15 / 40 | 9 / 40 |
| Helper text is programmatically associated | R | 15% | 2% | -12.5pp | 6 / 40 | 1 / 40 |
| Each text input has an accessible name | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each text input has textbox role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Placeholder text is programmatically defined as a property | R | 0% | 0% | +0.0pp | 0 / 32 | 0 / 22 |
| Text inputs are keyboard focusable | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Required fields are indicated programmatically | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 12% | 0% | -12.5pp | 5 / 40 | 0 / 40 |
Single Checkbox | React | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Helper text is programmatically associated | R | 74% | 25% | -49.4pp | 29 / 39 | 10 / 40 |
| Required fields are indicated visually | R | 15% | 12% | -2.5pp | 6 / 40 | 5 / 40 |
| Visible label is included in accessible name | R | 0% | 5% | +5.0pp | 0 / 39 | 2 / 40 |
| Required fields are indicated programmatically | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
| ARIA attributes match native checkbox attributes if used | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
| Checked state is programmatically exposed | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Each checkbox has a valid role | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Each checkbox has an accessible name | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Each checkbox is keyboard reachable | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Space toggles checkbox state | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
Single Checkbox | React | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Required fields are indicated visually | R | 18% | 15% | -2.5pp | 7 / 40 | 6 / 40 |
| Helper text is programmatically associated | R | 62% | 12% | -50.0pp | 25 / 40 | 5 / 40 |
| Visible label is included in accessible name | R | 0% | 3% | +2.6pp | 0 / 38 | 1 / 39 |
| ARIA attributes match native checkbox attributes if used | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
| Required fields are indicated programmatically | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
| Visual labels are defined and persistent | R | 5% | 2% | -2.5pp | 2 / 40 | 1 / 40 |
| Checked state is programmatically exposed | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox has a valid role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox is keyboard reachable | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox has an accessible name | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Space toggles checkbox state | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
Single Checkbox | Vanilla JS | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Required fields are indicated visually | R | 5% | 15% | +10.0pp | 2 / 40 | 6 / 40 |
| Helper text is programmatically associated | R | 80% | 10% | -70.0pp | 32 / 40 | 4 / 40 |
| ARIA attributes match native checkbox attributes if used | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Checked state is programmatically exposed | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox has a valid role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox has an accessible name | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox is keyboard reachable | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Required fields are indicated programmatically | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Space toggles checkbox state | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
Single Checkbox | Vanilla JS | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Helper text is programmatically associated | R | 74% | 12% | -61.9pp | 29 / 39 | 5 / 40 |
| Required fields are indicated visually | R | 20% | 10% | -10.0pp | 8 / 40 | 4 / 40 |
| Visible label is included in accessible name | R | 2% | 8% | +5.0pp | 1 / 40 | 3 / 40 |
| Each checkbox has an accessible name | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
| Each checkbox is keyboard reachable | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
| Space toggles checkbox state | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
| ARIA attributes match native checkbox attributes if used | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Checked state is programmatically exposed | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox has a valid role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Required fields are indicated programmatically | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
1. Basic — overall Δ pass rate +48.5pp
Overall: Control 12% (n=1280) → Variant 60% (n=1280). Avg WCAG failures/sample: 3.93 → 1.63 (Δ -2.30).
Most improved test cases
| Test case | Control pass rate* | Variant pass rate* | Δ pass rate* | Δ avg WCAG failures |
|---|---|---|---|---|
| Single Checkbox \| Vanilla JS \| Modern | 10% | 80% | +70.0pp | -1.32 |
| Single Checkbox \| React \| Modern | 8% | 78% | +70.0pp | -1.03 |
| Single Checkbox \| Vanilla JS \| Dark | 5% | 75% | +70.0pp | -0.88 |
| Checkbox Group \| Vanilla JS \| Dark | 0% | 68% | +67.5pp | -2.85 |
| Checkbox Group \| React \| Modern | 0% | 62% | +62.5pp | -2.20 |
Most reduced axe WCAG rules
| Rule | Control rate | Variant rate | Δ rate | Description |
|---|---|---|---|---|
| color-contrast | 41.6% | 24.0% | -17.6pp | Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds |
| button-name | 1.1% | 0.0% | -1.1pp | Ensure buttons have discernible text |
| aria-required-children | 1.1% | 0.2% | -0.9pp | Ensure elements with an ARIA role that require child roles contain them |
| link-name | 0.6% | 0.0% | -0.6pp | Ensure links have discernible text |
| link-in-text-block | 0.6% | 0.2% | -0.4pp | Ensure links are distinguished from surrounding text in a way that does not rely on color |
Most increased axe WCAG rules
| Rule | Control rate | Variant rate | Δ rate | Description |
|---|---|---|---|---|
| aria-prohibited-attr | 0.2% | 1.0% | +0.8pp | Ensure ARIA attributes are not prohibited for an element's role |
| aria-allowed-attr | 0.4% | 1.0% | +0.6pp | Ensure an element's role supports its ARIA attributes |
| aria-hidden-focus | 0.0% | 0.2% | +0.2pp | Ensure aria-hidden elements are not focusable nor contain focusable elements |
| definition-list | 0.0% | 0.1% | +0.1pp | Ensure `<dl>` elements are structured correctly |
| nested-interactive | 0.1% | 0.2% | +0.1pp | Ensure interactive controls are not nested as they are not always announced by screen readers or can cause focus problems for assistive technologies |
Assertion analysis (vs control)
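Failure rates are computed per assertion within each test case (failures divided by the number of times the assertion was evaluated, as shown in the failures/total columns) and compared between the variant and control. Δ fail rate is the variant rate minus the control rate in percentage points, so a negative value means the variant failed less often.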
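The Δ fail rate column can be reproduced from the failures/total counts. The following is a minimal Python sketch, not the harness's actual code: the function names `fail_rate`, `delta_pp`, and `format_delta` are hypothetical, and rendering an assertion that was never evaluated (a total of 0) as `-` is inferred from the rows that show `0 / 0`.

```python
from typing import Optional


def fail_rate(failures: int, total: int) -> Optional[float]:
    """Failure rate in percent, or None when the assertion was never evaluated."""
    if total == 0:
        return None
    return 100.0 * failures / total


def delta_pp(control: Optional[float], variant: Optional[float]) -> Optional[float]:
    """Variant minus control, in percentage points; None if either side is undefined."""
    if control is None or variant is None:
        return None
    return variant - control


def format_delta(delta: Optional[float]) -> str:
    """Render a delta with a sign and one decimal, or '-' when it is undefined."""
    return "-" if delta is None else f"{delta:+.1f}pp"


# Counts taken from the "Checkbox Group | React | Dark" table:
# "Helper text is programmatically associated", control 36 / 40, variant 17 / 39.
print(format_delta(delta_pp(fail_rate(36, 40), fail_rate(17, 39))))  # prints -46.4pp
```

Because the control and variant totals can differ (for example 40 versus 39), the delta is taken between the two rates rather than between the raw failure counts.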
Most improved assertions
| Test case | Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|---|
| Radio Button Group \| Vanilla JS \| Dark | Helper text is programmatically associated | R | 100% | 0% | -100.0pp | 18 / 18 | 0 / 9 |
| Shopping Home Page \| Vanilla JS \| Dark | Has a skip navigation link | R | 100% | 0% | -100.0pp | 40 / 40 | 0 / 40 |
| Shopping Home Page \| Vanilla JS \| Modern | Has a skip navigation link | R | 100% | 5% | -95.0pp | 40 / 40 | 2 / 40 |
| Shopping Home Page \| React \| Dark | Has a skip navigation link | R | 100% | 8% | -92.5pp | 40 / 40 | 3 / 40 |
| Shopping Home Page \| React \| Modern | Has a skip navigation link | R | 100% | 8% | -92.5pp | 40 / 40 | 3 / 40 |
| Radio Button Group \| React \| Dark | Helper text is programmatically associated | R | 100% | 16% | -84.2pp | 10 / 10 | 3 / 19 |
| Single Checkbox \| Vanilla JS \| Dark | Helper text is programmatically associated | R | 80% | 2% | -77.5pp | 32 / 40 | 1 / 40 |
| Checkbox Group \| Vanilla JS \| Dark | Helper text is programmatically associated | R | 100% | 23% | -76.9pp | 40 / 40 | 9 / 39 |
| Disclosure Widget \| React \| Modern | Collapsed content is hidden from everyone | R | 84% | 8% | -75.9pp | 32 / 38 | 3 / 36 |
| Checkbox Group \| React \| Dark | Helper text is programmatically associated | R | 90% | 15% | -74.6pp | 36 / 40 | 6 / 39 |
Most regressed assertions
| Test case | Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|---|
| Simple Contact Form \| Vanilla JS \| Dark \| No Error Message | Required fields are indicated programmatically | R | 5% | 15% | +10.0pp | 2 / 40 | 6 / 40 |
| Single Checkbox \| React \| Modern | Visible label is included in accessible name | R | 0% | 8% | +7.5pp | 0 / 38 | 3 / 40 |
| Single Checkbox \| Vanilla JS \| Dark | Visible label is included in accessible name | R | 0% | 5% | +5.0pp | 0 / 40 | 2 / 40 |
| Radio Button Group \| React \| Modern | Required fields are indicated programmatically | R | 0% | 3% | +3.1pp | 0 / 21 | 1 / 32 |
| Single Checkbox \| Vanilla JS \| Modern | Visible label is included in accessible name | R | 2% | 5% | +2.6pp | 1 / 40 | 2 / 39 |
| Checkbox Group \| React \| Modern | ARIA attributes match native checkbox attributes if used | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
| Radio Button Group \| React \| Dark | ARIA attributes match native radio attributes if used | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
| Radio Button Group \| React \| Dark | Checked state is programmatically exposed | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
| Simple Contact Form \| React \| Dark \| Error Message Present | Visible label is included in accessible name | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
| Simple Contact Form \| Vanilla JS \| Modern \| Error Message Present | Required fields are indicated programmatically | R | 2% | 5% | +2.5pp | 1 / 40 | 2 / 40 |
All assertion deltas (per test case)
Checkbox Group | React | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Required fields are indicated programmatically | R | - | 100% | - | 0 / 0 | 1 / 1 |
| Helper text is programmatically associated | R | 90% | 15% | -74.6pp | 36 / 40 | 6 / 39 |
| Each checkbox group has a valid role | R | 70% | 2% | -67.5pp | 28 / 40 | 1 / 40 |
| Each checkbox group has an accessible label | R | 70% | 2% | -67.5pp | 28 / 40 | 1 / 40 |
| ARIA attributes match native checkbox attributes if used | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox has a valid role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Checked state is programmatically exposed | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Each checkbox has an accessible name | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
| Each checkbox is in the tab order | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 8% | 0% | -7.7pp | 3 / 39 | 0 / 40 |
| Space toggles checkbox state of each checkbox | R | 10% | 0% | -10.0pp | 4 / 40 | 0 / 40 |
| Required fields are indicated visually | R | - | 0% | - | 0 / 0 | 0 / 1 |
Checkbox Group | React | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Required fields are indicated programmatically | R | - | 100% | - | 0 / 0 | 1 / 1 |
| Helper text is programmatically associated | R | 98% | 25% | -72.5pp | 39 / 40 | 10 / 40 |
| Each checkbox group has an accessible label | R | 70% | 12% | -57.5pp | 28 / 40 | 5 / 40 |
| Each checkbox group has a valid role | R | 68% | 8% | -60.0pp | 27 / 40 | 3 / 40 |
| ARIA attributes match native checkbox attributes if used | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
| Each checkbox is in the tab order | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
| Space toggles checkbox state of each checkbox | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
| Each checkbox has an accessible name | R | 5% | 2% | -2.5pp | 2 / 40 | 1 / 40 |
| Visible label is included in accessible name | R | 5% | 2% | -2.5pp | 2 / 40 | 1 / 40 |
| Checked state is programmatically exposed | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox has a valid role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Required fields are indicated visually | R | - | 0% | - | 0 / 0 | 0 / 1 |
Checkbox Group | Vanilla JS | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Helper text is programmatically associated | R | 100% | 23% | -76.9pp | 40 / 40 | 9 / 39 |
| Each checkbox group has an accessible label | R | 72% | 2% | -70.0pp | 29 / 40 | 1 / 40 |
| ARIA attributes match native checkbox attributes if used | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Checked state is programmatically exposed | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox has a valid role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox has an accessible name | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Each checkbox is in the tab order | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Space toggles checkbox state of each checkbox | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Each checkbox group has a valid role | R | 70% | 0% | -70.0pp | 28 / 40 | 0 / 40 |
| Required fields are indicated programmatically | R | - | 0% | - | 0 / 0 | 0 / 1 |
| Required fields are indicated visually | R | - | 0% | - | 0 / 0 | 0 / 1 |
Checkbox Group | Vanilla JS | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Helper text is programmatically associated | R | 90% | 35% | -55.0pp | 36 / 40 | 14 / 40 |
| Each checkbox group has an accessible label | R | 52% | 5% | -47.5pp | 21 / 40 | 2 / 40 |
| ARIA attributes match native checkbox attributes if used | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Checked state is programmatically exposed | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox has a valid role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox has an accessible name | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
| Each checkbox is in the tab order | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
| Space toggles checkbox state of each checkbox | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 8% | 0% | -7.5pp | 3 / 40 | 0 / 40 |
| Each checkbox group has a valid role | R | 50% | 0% | -50.0pp | 20 / 40 | 0 / 40 |
| Required fields are indicated programmatically | R | - | - | - | 0 / 0 | 0 / 0 |
| Required fields are indicated visually | R | - | - | - | 0 / 0 | 0 / 0 |
Disclosure Widget | React | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Collapsed content is hidden from everyone | R | 77% | 20% | -57.4pp | 24 / 31 | 8 / 40 |
| All examples have a valid semantics | R | 18% | 0% | -17.5pp | 7 / 40 | 0 / 40 |
Disclosure Widget | React | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Collapsed content is hidden from everyone | R | 84% | 8% | -75.9pp | 32 / 38 | 3 / 36 |
| All examples have a valid semantics | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
Disclosure Widget | Vanilla JS | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Collapsed content is hidden from everyone | R | 44% | 11% | -33.9pp | 16 / 36 | 4 / 38 |
| All examples have a valid semantics | R | 10% | 0% | -10.0pp | 4 / 40 | 0 / 40 |
Disclosure Widget | Vanilla JS | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Collapsed content is hidden from everyone | R | 57% | 2% | -54.6pp | 20 / 35 | 1 / 40 |
| All examples have a valid semantics | R | 12% | 0% | -12.5pp | 5 / 40 | 0 / 40 |
Modal Dialog | React | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Each modal dialog hides content behind it while open | R | 92% | 40% | -52.5pp | 37 / 40 | 16 / 40 |
| Each dialog can be closed by escape key | BP | 40% | 8% | -32.5pp | 16 / 40 | 3 / 40 |
| Focus is not lost when each dialog closes | R | 60% | 8% | -52.5pp | 24 / 40 | 3 / 40 |
| Each modal dialog takes focus when opened | R | 65% | 5% | -60.0pp | 26 / 40 | 2 / 40 |
| Each dialog has a dialog role | R | 35% | 2% | -32.5pp | 14 / 40 | 1 / 40 |
| Each modal dialog traps keyboard focus | R | 52% | 2% | -50.0pp | 21 / 40 | 1 / 40 |
| Closed dialogs are not exposed to assistive technology | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
Modal Dialog | React | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Each modal dialog hides content behind it while open | R | 85% | 35% | -50.0pp | 34 / 40 | 14 / 40 |
| Each dialog can be closed by escape key | BP | 25% | 5% | -20.0pp | 10 / 40 | 2 / 40 |
| Focus is not lost when each dialog closes | R | 42% | 5% | -37.5pp | 17 / 40 | 2 / 40 |
| Each modal dialog takes focus when opened | R | 48% | 5% | -42.5pp | 19 / 40 | 2 / 40 |
| Each dialog has a dialog role | R | 20% | 2% | -17.5pp | 8 / 40 | 1 / 40 |
| Each modal dialog traps keyboard focus | R | 32% | 2% | -30.0pp | 13 / 40 | 1 / 40 |
| Closed dialogs are not exposed to assistive technology | R | 8% | 0% | -7.5pp | 3 / 40 | 0 / 40 |
Modal Dialog | Vanilla JS | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Each modal dialog hides content behind it while open | R | 80% | 10% | -70.0pp | 32 / 40 | 4 / 40 |
| Each modal dialog takes focus when opened | R | 30% | 5% | -25.0pp | 12 / 40 | 2 / 40 |
| Closed dialogs are not exposed to assistive technology | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Each modal dialog traps keyboard focus | R | 18% | 0% | -17.5pp | 7 / 40 | 0 / 40 |
| Each dialog can be closed by escape key | BP | 22% | 0% | -22.5pp | 9 / 40 | 0 / 40 |
| Each dialog has a dialog role | R | 22% | 0% | -22.5pp | 9 / 40 | 0 / 40 |
| Focus is not lost when each dialog closes | R | 22% | 0% | -22.5pp | 9 / 40 | 0 / 40 |
Modal Dialog | Vanilla JS | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Each modal dialog hides content behind it while open | R | 68% | 15% | -52.5pp | 27 / 40 | 6 / 40 |
| Each dialog can be closed by escape key | BP | 15% | 5% | -10.0pp | 6 / 40 | 2 / 40 |
| Each dialog has a dialog role | R | 15% | 5% | -10.0pp | 6 / 40 | 2 / 40 |
| Each modal dialog traps keyboard focus | R | 18% | 5% | -12.5pp | 7 / 40 | 2 / 40 |
| Focus is not lost when each dialog closes | R | 20% | 5% | -15.0pp | 8 / 40 | 2 / 40 |
| Each modal dialog takes focus when opened | R | 35% | 5% | -30.0pp | 14 / 40 | 2 / 40 |
| Closed dialogs are not exposed to assistive technology | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
Radio Button Group | React | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Required fields are indicated visually | R | 100% | 58% | -42.4pp | 20 / 20 | 19 / 33 |
| Helper text is programmatically associated | R | 100% | 16% | -84.2pp | 10 / 10 | 3 / 19 |
| Each radio group has an accessible label | R | 45% | 8% | -37.5pp | 18 / 40 | 3 / 40 |
| ARIA attributes match native radio attributes if used | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
| Checked state is programmatically exposed | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
| Arrow keys change the selected radio within each group | R | 5% | 2% | -2.5pp | 2 / 40 | 1 / 40 |
| Each radio group is keyboard reachable | R | 5% | 2% | -2.5pp | 2 / 40 | 1 / 40 |
| Each radio has an accessible name | R | 5% | 2% | -2.5pp | 2 / 40 | 1 / 40 |
| Required fields are indicated programmatically | R | 0% | 0% | +0.0pp | 0 / 20 | 0 / 32 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
Radio Button Group | React | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Required fields are indicated visually | R | 100% | 52% | -48.5pp | 21 / 21 | 17 / 33 |
| Helper text is programmatically associated | R | 100% | 26% | -73.9pp | 18 / 18 | 6 / 23 |
| Required fields are indicated programmatically | R | 0% | 3% | +3.1pp | 0 / 21 | 1 / 32 |
| Each radio group has an accessible label | R | 42% | 2% | -40.0pp | 17 / 40 | 1 / 40 |
| ARIA attributes match native radio attributes if used | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Checked state is programmatically exposed | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Arrow keys change the selected radio within each group | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
| Each radio group is keyboard reachable | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
| Each radio has an accessible name | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
Radio Button Group | Vanilla JS | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Required fields are indicated visually | R | 100% | 56% | -44.4pp | 33 / 33 | 20 / 36 |
| Each radio group has an accessible label | R | 50% | 10% | -40.0pp | 20 / 40 | 4 / 40 |
| Arrow keys change the selected radio within each group | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
| ARIA attributes match native radio attributes if used | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Checked state is programmatically exposed | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Required fields are indicated programmatically | R | 0% | 0% | +0.0pp | 0 / 33 | 0 / 36 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each radio group is keyboard reachable | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Each radio has an accessible name | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Helper text is programmatically associated | R | 100% | 0% | -100.0pp | 18 / 18 | 0 / 9 |
Radio Button Group | Vanilla JS | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Required fields are indicated visually | R | 100% | 65% | -35.1pp | 35 / 35 | 24 / 37 |
| Helper text is programmatically associated | R | 100% | 27% | -72.7pp | 14 / 14 | 3 / 11 |
| Each radio group has an accessible label | R | 25% | 10% | -15.0pp | 10 / 40 | 4 / 40 |
| ARIA attributes match native radio attributes if used | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Checked state is programmatically exposed | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Required fields are indicated programmatically | R | 0% | 0% | +0.0pp | 0 / 35 | 0 / 37 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Arrow keys change the selected radio within each group | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Each radio group is keyboard reachable | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Each radio has an accessible name | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
Shopping Home Page | React | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Has a single banner | R | 25% | 8% | -17.5pp | 10 / 40 | 3 / 40 |
| Has a skip navigation link | R | 100% | 8% | -92.5pp | 40 / 40 | 3 / 40 |
| Has a single maincontent | R | 52% | 5% | -47.5pp | 21 / 40 | 2 / 40 |
| Has an h1 | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
| Has at least one h2 | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
| Has at least one navigation | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
| Has single h1 | BP | 5% | 2% | -2.5pp | 2 / 40 | 1 / 40 |
| Has a single footer | R | 0% | 0% | +0.0pp | 0 / 39 | 0 / 39 |
Shopping Home Page | React | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Has a skip navigation link | R | 100% | 8% | -92.5pp | 40 / 40 | 3 / 40 |
| Has a single banner | R | 22% | 5% | -17.5pp | 9 / 40 | 2 / 40 |
| Has an h1 | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Has at least one h2 | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Has at least one navigation | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Has single h1 | BP | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
| Has a single footer | R | 10% | 0% | -10.0pp | 4 / 40 | 0 / 40 |
| Has a single maincontent | R | 40% | 0% | -40.0pp | 16 / 40 | 0 / 40 |
Shopping Home Page | Vanilla JS | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Has a single banner | R | 10% | 2% | -7.5pp | 4 / 40 | 1 / 40 |
| Has an h1 | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Has at least one h2 | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Has single h1 | BP | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Has at least one navigation | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Has a single footer | R | 3% | 0% | -2.6pp | 1 / 38 | 0 / 40 |
| Has a single maincontent | R | 32% | 0% | -32.5pp | 13 / 40 | 0 / 40 |
| Has a skip navigation link | R | 100% | 0% | -100.0pp | 40 / 40 | 0 / 40 |
Shopping Home Page | Vanilla JS | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Has a single banner | R | 10% | 10% | +0.0pp | 4 / 40 | 4 / 40 |
| Has a skip navigation link | R | 100% | 5% | -95.0pp | 40 / 40 | 2 / 40 |
| Has a single maincontent | R | 35% | 2% | -32.5pp | 14 / 40 | 1 / 40 |
| Has an h1 | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Has at least one h2 | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Has single h1 | BP | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Has at least one navigation | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Has a single footer | R | 3% | 0% | -3.0pp | 1 / 33 | 0 / 40 |
Simple Contact Form | React | Dark | Error Message Present
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 60% | 30% | -30.0pp | 24 / 40 | 12 / 40 |
| Required fields are indicated programmatically | R | 30% | 5% | -24.7pp | 11 / 37 | 2 / 40 |
| Visible label is included in accessible name | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
| Each text input has an accessible name | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each text input has textbox role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Placeholder text is programmatically defined as a property | R | 0% | 0% | +0.0pp | 0 / 38 | 0 / 22 |
| Text inputs are keyboard focusable | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Helper text is programmatically associated | R | 15% | 0% | -15.0pp | 6 / 40 | 0 / 40 |
| Required fields are indicated visually | R | 16% | 0% | -16.2pp | 6 / 37 | 0 / 40 |
Simple Contact Form | React | Dark | No Error Message
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 52% | 30% | -22.5pp | 21 / 40 | 12 / 40 |
| Required fields are indicated programmatically | R | 8% | 8% | -0.2pp | 3 / 39 | 3 / 40 |
| Required fields are indicated visually | R | 36% | 5% | -30.9pp | 14 / 39 | 2 / 40 |
| Each text input has an accessible name | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each text input has textbox role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Placeholder text is programmatically defined as a property | R | 0% | 0% | +0.0pp | 0 / 34 | 0 / 20 |
| Text inputs are keyboard focusable | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Helper text is programmatically associated | R | 18% | 0% | -17.5pp | 7 / 40 | 0 / 40 |
Simple Contact Form | React | Modern | Error Message Present
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 52% | 30% | -22.5pp | 21 / 40 | 12 / 40 |
| Required fields are indicated programmatically | R | 23% | 2% | -20.4pp | 8 / 35 | 1 / 40 |
| Each text input has an accessible name | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each text input has textbox role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Helper text is programmatically associated | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Placeholder text is programmatically defined as a property | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 24 |
| Text inputs are keyboard focusable | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Required fields are indicated visually | R | 23% | 0% | -22.9pp | 8 / 35 | 0 / 40 |
Simple Contact Form | React | Modern | No Error Message
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 55% | 32% | -22.5pp | 22 / 40 | 13 / 40 |
| Required fields are indicated programmatically | R | 8% | 5% | -2.7pp | 3 / 39 | 2 / 40 |
| Required fields are indicated visually | R | 44% | 5% | -38.6pp | 17 / 39 | 2 / 40 |
| Visual labels are defined and persistent | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
| Each text input has an accessible name | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each text input has textbox role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Placeholder text is programmatically defined as a property | R | 0% | 0% | +0.0pp | 0 / 36 | 0 / 20 |
| Text inputs are keyboard focusable | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 39 |
| Helper text is programmatically associated | R | 12% | 0% | -12.5pp | 5 / 40 | 0 / 40 |
Simple Contact Form | Vanilla JS | Dark | Error Message Present
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 40% | 28% | -12.5pp | 16 / 40 | 11 / 40 |
| Required fields are indicated programmatically | R | 8% | 10% | +2.3pp | 3 / 39 | 4 / 40 |
| Each text input has an accessible name | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each text input has textbox role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Placeholder text is programmatically defined as a property | R | 0% | 0% | +0.0pp | 0 / 35 | 0 / 18 |
| Text inputs are keyboard focusable | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 8% | 0% | -7.5pp | 3 / 40 | 0 / 40 |
| Helper text is programmatically associated | R | 12% | 0% | -12.5pp | 5 / 40 | 0 / 40 |
| Required fields are indicated visually | R | 46% | 0% | -46.2pp | 18 / 39 | 0 / 40 |
Simple Contact Form | Vanilla JS | Dark | No Error Message
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 40% | 30% | -10.0pp | 16 / 40 | 12 / 40 |
| Required fields are indicated programmatically | R | 5% | 15% | +10.0pp | 2 / 40 | 6 / 40 |
| Each text input has an accessible name | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each text input has textbox role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Placeholder text is programmatically defined as a property | R | 0% | 0% | +0.0pp | 0 / 32 | 0 / 12 |
| Text inputs are keyboard focusable | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 8% | 0% | -7.5pp | 3 / 40 | 0 / 40 |
| Helper text is programmatically associated | R | 15% | 0% | -15.0pp | 6 / 40 | 0 / 40 |
| Required fields are indicated visually | R | 70% | 0% | -70.0pp | 28 / 40 | 0 / 40 |
Simple Contact Form | Vanilla JS | Modern | Error Message Present
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 38% | 22% | -15.0pp | 15 / 40 | 9 / 40 |
| Required fields are indicated programmatically | R | 2% | 5% | +2.5pp | 1 / 40 | 2 / 40 |
| Visual labels are defined and persistent | R | 2% | 2% | +0.0pp | 1 / 40 | 1 / 40 |
| Visible label is included in accessible name | R | 12% | 2% | -10.0pp | 5 / 40 | 1 / 40 |
| Required fields are indicated visually | R | 45% | 2% | -42.5pp | 18 / 40 | 1 / 40 |
| Each text input has an accessible name | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each text input has textbox role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Placeholder text is programmatically defined as a property | R | 0% | 0% | +0.0pp | 0 / 36 | 0 / 13 |
| Text inputs are keyboard focusable | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Helper text is programmatically associated | R | 8% | 0% | -7.5pp | 3 / 40 | 0 / 40 |
Simple Contact Form | Vanilla JS | Modern | No Error Message
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 38% | 30% | -7.5pp | 15 / 40 | 12 / 40 |
| Each text input has an accessible name | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each text input has textbox role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Placeholder text is programmatically defined as a property | R | 0% | 0% | +0.0pp | 0 / 32 | 0 / 18 |
| Text inputs are keyboard focusable | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Required fields are indicated programmatically | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 12% | 0% | -12.5pp | 5 / 40 | 0 / 40 |
| Helper text is programmatically associated | R | 15% | 0% | -15.0pp | 6 / 40 | 0 / 40 |
| Required fields are indicated visually | R | 72% | 0% | -72.5pp | 29 / 40 | 0 / 40 |
Single Checkbox | React | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Required fields are indicated visually | R | 15% | 10% | -5.0pp | 6 / 40 | 4 / 40 |
| Space toggles checkbox state | R | 2% | 5% | +2.5pp | 1 / 40 | 2 / 40 |
| Helper text is programmatically associated | R | 74% | 2% | -71.9pp | 29 / 39 | 1 / 40 |
| Required fields are indicated programmatically | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visible label is included in accessible name | R | 0% | 0% | +0.0pp | 0 / 39 | 0 / 40 |
| ARIA attributes match native checkbox attributes if used | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Checked state is programmatically exposed | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Each checkbox has a valid role | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Each checkbox has an accessible name | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Each checkbox is keyboard reachable | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
Single Checkbox | React | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Visible label is included in accessible name | R | 0% | 8% | +7.5pp | 0 / 38 | 3 / 40 |
| Required fields are indicated visually | R | 18% | 5% | -12.5pp | 7 / 40 | 2 / 40 |
| ARIA attributes match native checkbox attributes if used | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Checked state is programmatically exposed | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox has a valid role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox is keyboard reachable | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Required fields are indicated programmatically | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox has an accessible name | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Space toggles checkbox state | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 5% | 0% | -5.0pp | 2 / 40 | 0 / 40 |
| Helper text is programmatically associated | R | 62% | 0% | -62.5pp | 25 / 40 | 0 / 40 |
Single Checkbox | Vanilla JS | Dark
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Visible label is included in accessible name | R | 0% | 5% | +5.0pp | 0 / 40 | 2 / 40 |
| Required fields are indicated visually | R | 5% | 5% | +0.0pp | 2 / 40 | 2 / 40 |
| Helper text is programmatically associated | R | 80% | 2% | -77.5pp | 32 / 40 | 1 / 40 |
| ARIA attributes match native checkbox attributes if used | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Checked state is programmatically exposed | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox has a valid role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox has an accessible name | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox is keyboard reachable | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Required fields are indicated programmatically | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Space toggles checkbox state | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
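The "Visible label is included in accessible name" assertion in the tables above can be pictured as a containment check between two strings. The sketch below is an illustration only and is not the harness's implementation: the function names, the whitespace and case normalization, and the sample strings are all assumptions.

```typescript
// Sketch only: hypothetical helpers, not the harness's actual assertion code.
// Normalizes text by trimming, collapsing whitespace runs, and lowercasing.
function normalizeText(text: string): string {
  return text.trim().replace(/\s+/g, " ").toLowerCase();
}

// Returns true when the visible label text appears inside the accessible name.
// An empty visible label is treated as "nothing to match" and passes.
function labelInName(visibleLabel: string, accessibleName: string): boolean {
  const label = normalizeText(visibleLabel);
  if (label === "") {
    return true;
  }
  return normalizeText(accessibleName).indexOf(label) !== -1;
}

// Hypothetical inputs for illustration.
const labelMatchExample: boolean = labelInName(
  "Subscribe to newsletter",
  "Subscribe to newsletter (required)"
);
const labelMismatchExample: boolean = labelInName("Subscribe to newsletter", "Opt in");
```

Here `labelMatchExample` is true and `labelMismatchExample` is false. A real check would first have to compute the accessible name from the rendered page, which this sketch takes as an input.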
Single Checkbox | Vanilla JS | Modern
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Visible label is included in accessible name | R | 2% | 5% | +2.6pp | 1 / 40 | 2 / 39 |
| Required fields are indicated visually | R | 20% | 5% | -15.0pp | 8 / 40 | 2 / 40 |
| Visual labels are defined and persistent | R | 0% | 2% | +2.5pp | 0 / 40 | 1 / 40 |
| Helper text is programmatically associated | R | 74% | 2% | -71.9pp | 29 / 39 | 1 / 40 |
| ARIA attributes match native checkbox attributes if used | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Checked state is programmatically exposed | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox has a valid role | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Required fields are indicated programmatically | R | 0% | 0% | +0.0pp | 0 / 40 | 0 / 40 |
| Each checkbox has an accessible name | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Each checkbox is keyboard reachable | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
| Space toggles checkbox state | R | 2% | 0% | -2.5pp | 1 / 40 | 0 / 40 |
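The Δ fail rate column in the assertion tables above is consistent with computing each fail rate from its own failures/total pair and subtracting control from variant, which matters when the two totals differ (for example 39 vs 40). A minimal sketch, assuming that definition; the function names are hypothetical and the rounding is an assumption about how the report formats values:

```typescript
// Fail rate as a fraction in [0, 1]; throws on an empty sample set.
function failRate(failures: number, total: number): number {
  if (total <= 0) {
    throw new Error("total must be positive");
  }
  return failures / total;
}

// Difference in fail rate (variant minus control), in percentage points.
function failRateDeltaPp(
  controlFailures: number,
  controlTotal: number,
  variantFailures: number,
  variantTotal: number
): string {
  const deltaPp =
    (failRate(variantFailures, variantTotal) - failRate(controlFailures, controlTotal)) * 100;
  // Round to one decimal and always show a sign, e.g. "-71.9pp" or "+0.0pp".
  const rounded = Math.round(deltaPp * 10) / 10;
  const sign = rounded < 0 ? "-" : "+";
  return sign + Math.abs(rounded).toFixed(1) + "pp";
}

// Counts from the "Helper text is programmatically associated" row of the
// Single Checkbox | Vanilla JS | Modern table: 29 / 39 control, 1 / 40 variant.
const helperTextDelta: string = failRateDeltaPp(29, 39, 1, 40); // "-71.9pp"
```

Because the delta is taken between unrounded rates, it can differ slightly from the difference of the displayed whole-number percentages (74% and 2% in this row).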
Skills (vs Control)
Skills are self-contained packages (a directory containing SKILL.md and any support files) that are mounted into the sandboxed agent at runtime. Each skill defines its own multi-turn conversation; the agent's submission at the end of each turn is evaluated separately so we can compare how each turn performs against control.
Note on interpretation. Turn 1 is a single-turn generation directly comparable to control. Later turns operate on prior context, so their Δ reflects both the skill package content and the effect of having a review opportunity.
* Pass rate reflects only this harness's automated checks (a curated set of axe-core WCAG rules plus hand-written assertions per test case). Automated testing can detect only a subset of accessibility issues: 100% here means the sample passed every check that was run, not that the page is WCAG conformant or fully accessible.
Building Accessible UI
A skill that implicitly steers generation toward accessible HTML and can be explicitly invoked to review and remediate previously produced HTML.
| Rank | Model | Control* | Generate* | Review* | Δ last vs control* | Δ last vs turn 1* |
|---|---|---|---|---|---|---|
| 1 | Gemini 3.1 Pro Preview | 8% | 91% | 96% | +88.1pp | +5.6pp |
| 2 | GPT-5.4 Mini | 25% | 93% | 94% | +69.4pp | +1.3pp |
| 3 | GPT-5.4 | 15% | 91% | 92% | +76.9pp | +0.6pp |
| 4 | Claude Opus 4.7 | 14% | 91% | 91% | +76.9pp | +0.0pp |
| 5 | Gemini 3 Flash Preview | 6% | 78% | 91% | +85.0pp | +13.1pp |
| 6 | GPT-5.5 | 18% | 86% | 89% | +71.2pp | +3.1pp |
| 7 | Claude Sonnet 4.6 | 6% | 76% | 83% | +76.9pp | +6.9pp |
| 8 | Claude Haiku 4.5 | 3% | 52% | 56% | +52.5pp | +3.1pp |
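The two Δ columns in the table above compare the final turn against two different baselines. The sketch below shows one way to compute them from pass counts; the type and function names are hypothetical, and the example counts (0/5, 5/5, 4/5) are an assumption inferred from five samples per (test, model) and a displayed 0% / 100% / 80% row.

```typescript
// Pass counts for one condition (control or a single skill turn).
interface PassCounts {
  passes: number;
  total: number;
}

// Pass rate as a percentage; throws on an empty sample set.
function passRatePct(counts: PassCounts): number {
  if (counts.total <= 0) {
    throw new Error("total must be positive");
  }
  return (counts.passes / counts.total) * 100;
}

// Formats a percentage-point difference with an explicit sign, e.g. "+80.0pp".
function signedPp(deltaPp: number): string {
  const rounded = Math.round(deltaPp * 10) / 10;
  const sign = rounded < 0 ? "-" : "+";
  return sign + Math.abs(rounded).toFixed(1) + "pp";
}

// Computes both deltas: final turn vs control, and final turn vs turn 1.
function skillDeltas(
  control: PassCounts,
  firstTurn: PassCounts,
  lastTurn: PassCounts
): { lastVsControl: string; lastVsTurn1: string } {
  const last = passRatePct(lastTurn);
  return {
    lastVsControl: signedPp(last - passRatePct(control)),
    lastVsTurn1: signedPp(last - passRatePct(firstTurn)),
  };
}

// Assumed counts for illustration: control 0/5, Generate 5/5, Review 4/5.
const reviewRegressionExample = skillDeltas(
  { passes: 0, total: 5 },
  { passes: 5, total: 5 },
  { passes: 4, total: 5 }
);
// reviewRegressionExample.lastVsControl is "+80.0pp"
// reviewRegressionExample.lastVsTurn1 is "-20.0pp"
```

The example shows why both columns are useful: a review turn can still sit well above control while having regressed relative to the first turn.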
Pass rate by test case
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 80% | 80% | +80.0pp |
| Claude Opus 4.7 | 0% | 100% | 80% | +80.0pp |
| Claude Sonnet 4.6 | 0% | 100% | 100% | +100.0pp |
| Gemini 3 Flash Preview | 0% | 100% | 100% | +100.0pp |
| Gemini 3.1 Pro Preview | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 Mini | 0% | 100% | 100% | +100.0pp |
| GPT-5.5 | 0% | 100% | 100% | +100.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 80% | 80% | +80.0pp |
| Claude Opus 4.7 | 0% | 100% | 100% | +100.0pp |
| Claude Sonnet 4.6 | 0% | 80% | 80% | +80.0pp |
| Gemini 3 Flash Preview | 0% | 100% | 100% | +100.0pp |
| Gemini 3.1 Pro Preview | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 Mini | 0% | 100% | 100% | +100.0pp |
| GPT-5.5 | 40% | 100% | 100% | +60.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 40% | 40% | +40.0pp |
| Claude Opus 4.7 | 0% | 80% | 80% | +80.0pp |
| Claude Sonnet 4.6 | 0% | 100% | 100% | +100.0pp |
| Gemini 3 Flash Preview | 0% | 80% | 100% | +100.0pp |
| Gemini 3.1 Pro Preview | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 | 0% | 80% | 80% | +80.0pp |
| GPT-5.4 Mini | 0% | 100% | 100% | +100.0pp |
| GPT-5.5 | 80% | 100% | 100% | +20.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 60% | 60% | +60.0pp |
| Claude Opus 4.7 | 0% | 80% | 80% | +80.0pp |
| Claude Sonnet 4.6 | 0% | 80% | 80% | +80.0pp |
| Gemini 3 Flash Preview | 0% | 60% | 100% | +100.0pp |
| Gemini 3.1 Pro Preview | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 | 0% | 80% | 100% | +100.0pp |
| GPT-5.4 Mini | 0% | 80% | 80% | +80.0pp |
| GPT-5.5 | 0% | 100% | 100% | +100.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 20% | 80% | 100% | +80.0pp |
| Claude Opus 4.7 | 0% | 100% | 100% | +100.0pp |
| Claude Sonnet 4.6 | 40% | 100% | 100% | +60.0pp |
| Gemini 3 Flash Preview | 100% | 40% | 100% | +0.0pp |
| Gemini 3.1 Pro Preview | 100% | 100% | 100% | +0.0pp |
| GPT-5.4 | 40% | 100% | 100% | +60.0pp |
| GPT-5.4 Mini | 80% | 100% | 100% | +20.0pp |
| GPT-5.5 | 20% | 100% | 100% | +80.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 60% | 60% | +60.0pp |
| Claude Opus 4.7 | 0% | 80% | 80% | +80.0pp |
| Claude Sonnet 4.6 | 20% | 100% | 100% | +80.0pp |
| Gemini 3 Flash Preview | 60% | 60% | 100% | +40.0pp |
| Gemini 3.1 Pro Preview | 60% | 100% | 100% | +40.0pp |
| GPT-5.4 | 20% | 100% | 100% | +80.0pp |
| GPT-5.4 Mini | 80% | 100% | 100% | +20.0pp |
| GPT-5.5 | 0% | 100% | 100% | +100.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 20% | 60% | 80% | +60.0pp |
| Claude Opus 4.7 | 0% | 100% | 100% | +100.0pp |
| Claude Sonnet 4.6 | 0% | 100% | 100% | +100.0pp |
| Gemini 3 Flash Preview | 20% | 60% | 100% | +80.0pp |
| Gemini 3.1 Pro Preview | 0% | 80% | 100% | +100.0pp |
| GPT-5.4 | 40% | 100% | 100% | +60.0pp |
| GPT-5.4 Mini | 100% | 100% | 100% | +0.0pp |
| GPT-5.5 | 0% | 100% | 100% | +100.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 60% | 80% | 100% | +40.0pp |
| Claude Opus 4.7 | 0% | 100% | 100% | +100.0pp |
| Claude Sonnet 4.6 | 0% | 80% | 80% | +80.0pp |
| Gemini 3 Flash Preview | 0% | 100% | 100% | +100.0pp |
| Gemini 3.1 Pro Preview | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 Mini | 40% | 80% | 100% | +60.0pp |
| GPT-5.5 | 20% | 100% | 100% | +80.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 60% | 60% | +60.0pp |
| Claude Opus 4.7 | 0% | 80% | 100% | +100.0pp |
| Claude Sonnet 4.6 | 0% | 80% | 80% | +80.0pp |
| Gemini 3 Flash Preview | 0% | 60% | 100% | +100.0pp |
| Gemini 3.1 Pro Preview | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 | 80% | 100% | 100% | +20.0pp |
| GPT-5.4 Mini | 60% | 100% | 100% | +40.0pp |
| GPT-5.5 | 0% | 100% | 100% | +100.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 60% | 60% | +60.0pp |
| Claude Opus 4.7 | 0% | 100% | 100% | +100.0pp |
| Claude Sonnet 4.6 | 0% | 80% | 100% | +100.0pp |
| Gemini 3 Flash Preview | 0% | 40% | 80% | +80.0pp |
| Gemini 3.1 Pro Preview | 20% | 80% | 100% | +80.0pp |
| GPT-5.4 | 40% | 100% | 100% | +60.0pp |
| GPT-5.4 Mini | 80% | 80% | 100% | +20.0pp |
| GPT-5.5 | 40% | 100% | 100% | +60.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 40% | 40% | +40.0pp |
| Claude Opus 4.7 | 0% | 100% | 100% | +100.0pp |
| Claude Sonnet 4.6 | 0% | 40% | 40% | +40.0pp |
| Gemini 3 Flash Preview | 0% | 100% | 100% | +100.0pp |
| Gemini 3.1 Pro Preview | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 | 20% | 100% | 100% | +80.0pp |
| GPT-5.4 Mini | 0% | 100% | 100% | +100.0pp |
| GPT-5.5 | 20% | 80% | 100% | +80.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 60% | 80% | +80.0pp |
| Claude Opus 4.7 | 0% | 100% | 100% | +100.0pp |
| Claude Sonnet 4.6 | 0% | 80% | 80% | +80.0pp |
| Gemini 3 Flash Preview | 0% | 80% | 100% | +100.0pp |
| Gemini 3.1 Pro Preview | 0% | 60% | 100% | +100.0pp |
| GPT-5.4 | 20% | 100% | 100% | +80.0pp |
| GPT-5.4 Mini | 80% | 100% | 100% | +20.0pp |
| GPT-5.5 | 0% | 100% | 100% | +100.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 60% | 60% | +60.0pp |
| Claude Opus 4.7 | 0% | 100% | 100% | +100.0pp |
| Claude Sonnet 4.6 | 20% | 60% | 80% | +60.0pp |
| Gemini 3 Flash Preview | 0% | 80% | 80% | +80.0pp |
| Gemini 3.1 Pro Preview | 0% | 80% | 80% | +80.0pp |
| GPT-5.4 | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 Mini | 20% | 100% | 100% | +80.0pp |
| GPT-5.5 | 0% | 100% | 100% | +100.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 80% | 80% | +80.0pp |
| Claude Opus 4.7 | 0% | 100% | 100% | +100.0pp |
| Claude Sonnet 4.6 | 0% | 100% | 100% | +100.0pp |
| Gemini 3 Flash Preview | 0% | 60% | 100% | +100.0pp |
| Gemini 3.1 Pro Preview | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 Mini | 0% | 100% | 100% | +100.0pp |
| GPT-5.5 | 0% | 100% | 100% | +100.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 60% | 60% | +60.0pp |
| Claude Opus 4.7 | 60% | 100% | 100% | +40.0pp |
| Claude Sonnet 4.6 | 0% | 60% | 60% | +60.0pp |
| Gemini 3 Flash Preview | 0% | 40% | 40% | +40.0pp |
| Gemini 3.1 Pro Preview | 20% | 80% | 100% | +80.0pp |
| GPT-5.4 | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 Mini | 20% | 100% | 100% | +80.0pp |
| GPT-5.5 | 0% | 100% | 100% | +100.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 60% | 80% | +80.0pp |
| Claude Opus 4.7 | 40% | 100% | 100% | +60.0pp |
| Claude Sonnet 4.6 | 0% | 60% | 60% | +60.0pp |
| Gemini 3 Flash Preview | 0% | 100% | 80% | +80.0pp |
| Gemini 3.1 Pro Preview | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 Mini | 20% | 100% | 100% | +80.0pp |
| GPT-5.5 | 0% | 100% | 100% | +100.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 80% | 80% | +80.0pp |
| Claude Opus 4.7 | 0% | 60% | 60% | +60.0pp |
| Claude Sonnet 4.6 | 0% | 40% | 80% | +80.0pp |
| Gemini 3 Flash Preview | 0% | 100% | 100% | +100.0pp |
| Gemini 3.1 Pro Preview | 0% | 80% | 80% | +80.0pp |
| GPT-5.4 | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 Mini | 0% | 100% | 100% | +100.0pp |
| GPT-5.5 | 0% | 100% | 100% | +100.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 20% | 20% | +20.0pp |
| Claude Opus 4.7 | 0% | 60% | 60% | +60.0pp |
| Claude Sonnet 4.6 | 0% | 20% | 40% | +40.0pp |
| Gemini 3 Flash Preview | 0% | 100% | 80% | +80.0pp |
| Gemini 3.1 Pro Preview | 0% | 80% | 80% | +80.0pp |
| GPT-5.4 | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 Mini | 0% | 100% | 100% | +100.0pp |
| GPT-5.5 | 0% | 80% | 100% | +100.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 40% | 40% | +40.0pp |
| Claude Opus 4.7 | 0% | 80% | 80% | +80.0pp |
| Claude Sonnet 4.6 | 0% | 80% | 100% | +100.0pp |
| Gemini 3 Flash Preview | 0% | 80% | 80% | +80.0pp |
| Gemini 3.1 Pro Preview | 0% | 60% | 100% | +100.0pp |
| GPT-5.4 | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 Mini | 0% | 80% | 80% | +80.0pp |
| GPT-5.5 | 0% | 100% | 100% | +100.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 20% | 20% | +20.0pp |
| Claude Opus 4.7 | 0% | 60% | 60% | +60.0pp |
| Claude Sonnet 4.6 | 0% | 20% | 40% | +40.0pp |
| Gemini 3 Flash Preview | 0% | 40% | 100% | +100.0pp |
| Gemini 3.1 Pro Preview | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 | 0% | 80% | 100% | +100.0pp |
| GPT-5.4 Mini | 0% | 100% | 100% | +100.0pp |
| GPT-5.5 | 0% | 100% | 100% | +100.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 0% | 0% | +0.0pp |
| Claude Opus 4.7 | 100% | 100% | 100% | +0.0pp |
| Claude Sonnet 4.6 | 0% | 80% | 80% | +80.0pp |
| Gemini 3 Flash Preview | 0% | 100% | 100% | +100.0pp |
| Gemini 3.1 Pro Preview | 0% | 80% | 100% | +100.0pp |
| GPT-5.4 | 0% | 80% | 80% | +80.0pp |
| GPT-5.4 Mini | 40% | 80% | 80% | +40.0pp |
| GPT-5.5 | 40% | 60% | 80% | +40.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 40% | 40% | +40.0pp |
| Claude Opus 4.7 | 0% | 60% | 60% | +60.0pp |
| Claude Sonnet 4.6 | 0% | 80% | 80% | +80.0pp |
| Gemini 3 Flash Preview | 0% | 80% | 80% | +80.0pp |
| Gemini 3.1 Pro Preview | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 | 20% | 20% | 20% | +0.0pp |
| GPT-5.4 Mini | 0% | 80% | 80% | +80.0pp |
| GPT-5.5 | 20% | 0% | 0% | -20.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 20% | 20% | +20.0pp |
| Claude Opus 4.7 | 80% | 100% | 100% | +20.0pp |
| Claude Sonnet 4.6 | 20% | 80% | 80% | +60.0pp |
| Gemini 3 Flash Preview | 0% | 80% | 60% | +60.0pp |
| Gemini 3.1 Pro Preview | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 | 80% | 100% | 100% | +20.0pp |
| GPT-5.4 Mini | 0% | 60% | 60% | +60.0pp |
| GPT-5.5 | 0% | 40% | 80% | +80.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 60% | 60% | +60.0pp |
| Claude Opus 4.7 | 0% | 100% | 100% | +100.0pp |
| Claude Sonnet 4.6 | 20% | 100% | 100% | +80.0pp |
| Gemini 3 Flash Preview | 0% | 80% | 80% | +80.0pp |
| Gemini 3.1 Pro Preview | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 | 0% | 80% | 80% | +80.0pp |
| GPT-5.4 Mini | 40% | 60% | 60% | +20.0pp |
| GPT-5.5 | 0% | 60% | 60% | +60.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 80% | 80% | +80.0pp |
| Claude Opus 4.7 | 0% | 100% | 100% | +100.0pp |
| Claude Sonnet 4.6 | 0% | 60% | 80% | +80.0pp |
| Gemini 3 Flash Preview | 0% | 80% | 80% | +80.0pp |
| Gemini 3.1 Pro Preview | 0% | 80% | 80% | +80.0pp |
| GPT-5.4 | 0% | 40% | 40% | +40.0pp |
| GPT-5.4 Mini | 20% | 100% | 100% | +80.0pp |
| GPT-5.5 | 80% | 40% | 40% | -40.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 20% | 40% | +40.0pp |
| Claude Opus 4.7 | 40% | 100% | 100% | +60.0pp |
| Claude Sonnet 4.6 | 0% | 100% | 100% | +100.0pp |
| Gemini 3 Flash Preview | 0% | 60% | 80% | +80.0pp |
| Gemini 3.1 Pro Preview | 0% | 80% | 80% | +80.0pp |
| GPT-5.4 | 40% | 80% | 80% | +40.0pp |
| GPT-5.4 Mini | 40% | 100% | 100% | +60.0pp |
| GPT-5.5 | 40% | 20% | 20% | -20.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 60% | 60% | +60.0pp |
| Claude Opus 4.7 | 40% | 100% | 100% | +60.0pp |
| Claude Sonnet 4.6 | 20% | 80% | 100% | +80.0pp |
| Gemini 3 Flash Preview | 0% | 80% | 80% | +80.0pp |
| Gemini 3.1 Pro Preview | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 | 20% | 80% | 80% | +60.0pp |
| GPT-5.4 Mini | 40% | 100% | 100% | +60.0pp |
| GPT-5.5 | 80% | 80% | 80% | +0.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 20% | 20% | +20.0pp |
| Claude Opus 4.7 | 0% | 80% | 80% | +80.0pp |
| Claude Sonnet 4.6 | 20% | 100% | 100% | +80.0pp |
| Gemini 3 Flash Preview | 0% | 80% | 100% | +100.0pp |
| Gemini 3.1 Pro Preview | 0% | 80% | 80% | +80.0pp |
| GPT-5.4 | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 Mini | 40% | 80% | 80% | +40.0pp |
| GPT-5.5 | 40% | 80% | 80% | +40.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 60% | 60% | +60.0pp |
| Claude Opus 4.7 | 40% | 100% | 100% | +60.0pp |
| Claude Sonnet 4.6 | 0% | 80% | 80% | +80.0pp |
| Gemini 3 Flash Preview | 0% | 100% | 100% | +100.0pp |
| Gemini 3.1 Pro Preview | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 Mini | 0% | 100% | 100% | +100.0pp |
| GPT-5.5 | 0% | 100% | 100% | +100.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 40% | 40% | +40.0pp |
| Claude Opus 4.7 | 20% | 100% | 100% | +80.0pp |
| Claude Sonnet 4.6 | 40% | 80% | 80% | +40.0pp |
| Gemini 3 Flash Preview | 0% | 100% | 100% | +100.0pp |
| Gemini 3.1 Pro Preview | 0% | 100% | 100% | +100.0pp |
| GPT-5.4 | 20% | 100% | 100% | +80.0pp |
| GPT-5.4 Mini | 0% | 100% | 100% | +100.0pp |
| GPT-5.5 | 0% | 100% | 100% | +100.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 40% | 40% | +40.0pp |
| Claude Opus 4.7 | 20% | 100% | 100% | +80.0pp |
| Claude Sonnet 4.6 | 0% | 60% | 80% | +80.0pp |
| Gemini 3 Flash Preview | 0% | 80% | 100% | +100.0pp |
| Gemini 3.1 Pro Preview | 40% | 80% | 100% | +60.0pp |
| GPT-5.4 | 40% | 100% | 100% | +60.0pp |
| GPT-5.4 Mini | 0% | 100% | 100% | +100.0pp |
| GPT-5.5 | 20% | 100% | 100% | +80.0pp |
| Model | Control* | Generate* | Review* | Δ last vs control* |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0% | 60% | 40% | +40.0pp |
| Claude Opus 4.7 | 20% | 100% | 100% | +80.0pp |
| Claude Sonnet 4.6 | 0% | 80% | 100% | +100.0pp |
| Gemini 3 Flash Preview | 0% | 80% | 100% | +100.0pp |
| Gemini 3.1 Pro Preview | 20% | 100% | 100% | +80.0pp |
| GPT-5.4 | 0% | 100% | 80% | +80.0pp |
| GPT-5.4 Mini | 0% | 100% | 100% | +100.0pp |
| GPT-5.5 | 20% | 100% | 100% | +80.0pp |
Skill details
Each skill's mounted package, sandbox location, and per-turn prompt templates.
Building Accessible UI
A skill that implicitly steers generation toward accessible HTML and can be explicitly invoked to review and remediate previously produced HTML.
Samples per (test, model): 5
Skill package: ../../config/skills/building-accessible-ui
Turn prompts
- Generate (generate): {{test_case_prompt}}
- Review (review): Add and run accessibility tests, review results, and remediate the HTML. Fix any real accessibility issues you find. Leave correct markup alone. Submit one corrected standalone HTML document as your final answer. Do not wrap it in markdown fences or add commentary.
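The `{{test_case_prompt}}` placeholder in the Generate turn suggests the turn prompts are templates filled in per test case. How the harness does this is not shown in the report; the sketch below is one plausible substitution routine, with `renderTurnPrompt` and the sample prompt text being hypothetical.

```typescript
// Hypothetical helper: replaces every {{key}} placeholder with its value.
// Unknown placeholders are left untouched so a missing value is easy to spot.
function renderTurnPrompt(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string): string => {
    if (!Object.prototype.hasOwnProperty.call(values, key)) {
      return match;
    }
    const value = values[key];
    return value === undefined ? match : value;
  });
}

// Illustrative test case prompt; not taken from the report.
const generatePrompt: string = renderTurnPrompt("{{test_case_prompt}}", {
  test_case_prompt: "Build a single checkbox with helper text.",
});
```

The Review turn contains no placeholder, so the same routine would return its text unchanged.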
SKILL.md
---
name: building-accessible-ui
description: MUST BE USED for any UI work. Invoke this skill before generating, modifying, or reviewing any code that renders, styles, or wires up a user-facing interface — including markup, components, templates, styles, and the JavaScript/TypeScript that drives them. This skill encodes the accessibility (WCAG 2.2 AA) requirements every UI change must satisfy; skipping it produces inaccessible output. Applies across web, mobile, and desktop. If the task touches the UI layer in any way, use this skill first.
---
# building-accessible-ui
Checklist for producing and reviewing accessible UIs. Each rule leads with the platform-agnostic principle and, where relevant, the Web (HTML + ARIA + CSS) implementation. Apply the web guidance only when the output is web.
Detailed rationale lives in `references/`; widget-specific guidance in `components/`. **Open a file only when it's relevant to the current task.** Do not preload. Every file opened and every line a tool prints stays in context — don't re-read.
## Accessibility constitution
Ground rules. Use them to resolve conflicts and decide how much custom work is justified. The checklist below is their mechanical application.
### 1. Accessibility is a core outcome
A UI inaccessible to realistic users is not "done". Treat accessibility as a first-class criterion alongside correctness, performance, and security — not a finishing step. When scope must be cut, record the gap explicitly. Never claim output is "fully accessible"; state what was addressed and known limitations.
### 2. Build for real people
Evaluate designs against these personas; if a decision breaks one, justify it and offer an alternative:
- **Screen reader** — landmarks, headings, accessible name/role/state, reading order.
- **Keyboard-only** — Tab/arrows/Enter/Space/Escape, visible focus, no traps.
- **Low-vision** — zoom, reflow, contrast, Forced Colors.
- **Cognitive** — plain language, clear labels, actionable errors, forgiving interactions.
- **Deaf / hard of hearing** — captions/transcripts; no sound-only cues.
- **Motor / voice / switch** — large hit targets, named controls, no precise/timed gestures.
- **Situational** — sunlight, one-handed, noisy, flaky network.
### 3. Implementation priority
Use the highest option that fits:
1. Existing accessible component in this codebase / design system.
2. A component library.
3. Native platform semantics (`<button>`, `<a href>`, `<input>`, `<label>`, `<fieldset>`/`<legend>`, `<dialog>`, `<details>`, `<nav>`, `<main>`, headings).
4. Native element + minimum necessary ARIA (`aria-describedby`, `aria-expanded`, `aria-current`, etc.).
5. Fully custom ARIA widget — only when nothing above fits, and only if you implement the APG keyboard, focus, and state behavior end-to-end.
No ARIA is better than bad ARIA. Don't duplicate native semantics (no `role="button"` on `<button>`). Don't use `role="menu"` for site navigation. Don't invent new patterns when a standard one exists.
### 4. Balance, don't trade away
Accessibility, performance, security/privacy, and visual design are joint constraints — not dials to trade off. If an optimization removes a label, breaks focus, or hides content from AT, redesign the optimization. Accessible names must not leak secrets, but security is not a reason to ship an unlabeled control — find a labeling approach that doesn't leak data. Visual polish doesn't justify removing focus indicators or semantic structure. Under schedule pressure, prefer cutting scope over shipping an inaccessible feature. When constraints genuinely conflict, surface it explicitly.
### 5. Respect existing code
Don't rewrite an existing component or shared utility just because it could be more accessible — other code depends on it. When you see issues outside the current task's scope: note them (issue, affected persona, suggested fix) and ask before changing. Fix in place only when the change is required by the task, localized, and low-risk. Inside scope, fix real issues; never silently remove existing affordances (labels, landmarks, focus management, live regions) without an equal-or-better replacement.
## How to use this checklist
Identify which components the request involves (form, checkbox group, radio group, disclosure, modal, full view, etc.) and open the matching `components/<name>.md` once. Then work the checklist below. Open a `references/*.md` only when an item is unclear or you need the concrete fix pattern.
Do not claim the output is "fully accessible". State what was addressed and known limitations.
**Do NOT use this skill for:** backend-only changes, data migrations, build/CI configuration, non-UI tests, or tasks that do not touch the UI layer.
## Checklist
- **Prefer existing components.** If available, reuse existing UI components rather than creating new ones from scratch or custom implementations.
- **Platform-native semantics.** Prefer native platform controls and structures over custom constructs; add accessibility overrides only when a native control genuinely can't be used. → `references/structure.md`.
- **Web:** Prefer semantic HTML (`<button>`, `<a>`, `<input>`, `<label>`, `<fieldset>`/`<legend>`, `<nav>`, `<main>`, `<header>`, `<footer>`, `<h1>`–`<h6>`) over `<div>`/`<span>` with ARIA. Use ARIA only when no native element fits.
- **Regions / landmarks.** View structure is exposed via semantic regions/landmarks; duplicated landmarks have unique accessible names.
- **Web:** Exactly one `<main>`; `<header>`, `<nav>`, `<footer>` used when applicable.
- **Headings.** Logical outline labels sections without skipping levels; one top-level heading per view. → `references/structure.md`.
- **Web:** One `<h1>`, typically the first heading in `<main>`. Set a descriptive `<title>`.
- **Bypass blocks on web pages.** Provide a mechanism to skip repeated navigation when delivering traditional web pages. (Not required for Electron or non-web surfaces.) → `references/keyboard-focus.md`.
- **Web:** A "Skip to main content" link as the first focusable element.
- **Name / role / value.** Every interactive element exposes an accurate accessible name; role matches purpose; dynamic states (pressed, expanded, selected, checked, disabled, invalid) stay in sync with visuals.
- **Web:** Prefer native attributes over ARIA. If necessary, use the minimum ARIA needed and update state attributes alongside DOM/visual changes.
- **Name-label match.** The accessible name of each interactive element contains the visible label text.
- **Web:** If `aria-label` is used, include the visible label text. For multiple controls that share a label (e.g., "Remove"), add context ("Remove item: Socks").
- **Labels and help text.** Every form control has a programmatic label describing its purpose; help/error text is programmatically associated with its control. → `components/forms.md`.
- **Web:** `<label for>` or wrapping `<label>`; never placeholder alone. Associate help/error via `aria-describedby` / `aria-errormessage`.
- **Grouping.** Related options (checkboxes, radios) are grouped so their shared name is part of the accessible name of each option. Group-level help/error text is associated with the group itself — not with each option and not with an intermediate wrapper.
- **Web:** `<fieldset>` with a `<legend>`. Put `aria-describedby` on the `<fieldset>` (not on a child `<div>`, and never on an extra `<div role="group">` inside the fieldset — `<fieldset>` already is the group).
- **Required fields.** Marked both visually and programmatically; not indicated by color alone.
- **Web:** Use an asterisk to indicate required fields. Native `required` on the control or `aria-required="true"`.
- **Keyboard operability.** Every interactive element is keyboard operable; tab order matches reading/visual order; expected keys work (activation, arrow keys inside composite widgets, Escape closes overlays); no keyboard traps; static content is not sequentially focusable.
- **Web:** Do not remove focus outlines without equal-or-better replacement. Use `tabindex="-1"` only for elements that need programmatic (not sequential) focus. → `references/keyboard-focus.md`.
- **Focus management.** Focus is always visible. Overlays/dialogs/disclosures move focus appropriately and restore it on close; no focus traps outside modals.
- **Hidden content.** Content hidden from assistive technology is not focusable and is hidden consistently across visual, semantic, and focus layers.
- **Web:** `hidden` / `display: none` / `aria-hidden="true"` used consistently.
- **Graphics.** Informative graphics have meaningful text alternatives; decorative graphics are hidden from AT. → `references/images-graphics.md`.
- **Web:** `<img>` informative → `alt`; decorative → `alt=""`. Informative `<svg>` → `role="img"` + accessible name. Other decorative → `aria-hidden="true"`.
- **Contrast.** Text ≥ 4.5:1 (3:1 large); focus indicators and key boundaries ≥ 3:1. Never color-only cues. → `references/contrast-forced-colors.md`.
- **Respect OS accessibility settings.** Never override OS high contrast, reduced-motion, or color-scheme preferences; adapt to forced-colors / high-contrast. → `references/contrast-forced-colors.md`.
- **Reflow.** Content adapts to narrow viewports (target 320 CSS px) without two-dimensional scrolling for multi-line text; controls remain operable. → `references/reflow.md`.
- **Navigation.** Uses semantic navigation grouping with state-exposing toggles for expandable menus. → `references/navigation.md`.
- **Web:** `<nav>`, not `role="menu"`; `aria-expanded` on triggers.
- **Tables / grids.** Static tabular data uses table semantics with header/cell associations; interactive grids only when truly warranted. → `references/tables-grids.md`.
- **Status messages.** Provide status messages for dynamic content updates that are relevant to the user (loading indicators, form submission results, etc.). → `references/status-messages.md`.
- **Web:** Use `aria-live="polite"` or `aria-live="assertive"`.
- **Testing.** Add and run automated accessibility tests unless the project explicitly opts out. Writing or configuring a test is not enough — execution, fixes, and a result report are part of the deliverable. **The final automated test run must be on the exact artifact you submit: any edit after a passing test invalidates that test, so re-run before submitting.** **Open `references/testing.md` before writing any test code** for the opt-out signals, strategy precedence, runtime probe order, and reporting rules.
- **Web:** Prefer `@axe-core/*` bindings that match the existing test runner; render the component/page fully so interactive state, focus, and live regions are evaluated.
- **Other platforms:** Use the platform's native audit (Android `AccessibilityChecks`, iOS `XCUIAccessibilityAudit`, .NET `AccessibilityInsights`) under the same precedence.
- **Specs/Documentation.** Follow the project's documentation pattern and document accessibility considerations for each view, component, and interaction. → `references/specs-documentation.md`.
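As an illustration of the Contrast item in the checklist above (text ≥ 4.5:1, or 3:1 for large text), the following sketch computes a contrast ratio using the WCAG relative-luminance formula. It is not part of the skill package; the function names are hypothetical, and it assumes opaque `#rrggbb` colors only.

```typescript
// Converts an 8-bit sRGB channel to its linear value (WCAG relative luminance).
// The 0.04045 threshold is the current figure; older WCAG text used 0.03928,
// and no 8-bit channel value falls between the two.
function channelToLinear(channel8bit: number): number {
  const c = channel8bit / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of an opaque "#rrggbb" color.
function relativeLuminance(hex: string): number {
  if (!/^#[0-9a-fA-F]{6}$/.test(hex)) {
    throw new Error("expected #rrggbb, got " + hex);
  }
  const r = parseInt(hex.slice(1, 3), 16);
  const g = parseInt(hex.slice(3, 5), 16);
  const b = parseInt(hex.slice(5, 7), 16);
  return 0.2126 * channelToLinear(r) + 0.7152 * channelToLinear(g) + 0.0722 * channelToLinear(b);
}

// Contrast ratio between two colors, always >= 1.
function contrastRatio(colorA: string, colorB: string): number {
  const la = relativeLuminance(colorA);
  const lb = relativeLuminance(colorB);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// Applies the checklist thresholds: 4.5:1 for normal text, 3:1 for large text.
function meetsTextContrast(foreground: string, background: string, isLargeText: boolean = false): boolean {
  return contrastRatio(foreground, background) >= (isLargeText ? 3 : 4.5);
}

const grayOnWhitePasses: boolean = meetsTextContrast("#767676", "#ffffff"); // true
const lighterGrayFails: boolean = meetsTextContrast("#777777", "#ffffff"); // false
```

A check like this covers only solid colors; text over gradients, images, or semi-transparent layers needs the effective rendered colors, which this sketch does not attempt to derive.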
Detailed Results
The detailed sample browser is split into per-test sections.
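Each per-test section reports 200 samples. One reading that matches the run scope is 8 models × 5 samples × 5 evaluated conditions (control, two instruction sets, and the two skill turns). The report does not state this breakdown, so treat it as an assumption; the sketch below only checks the arithmetic and derives a pass rate from the listed counts.

```typescript
// Assumed decomposition of the per-test sample count (not stated in the report).
function expectedSamplesPerTest(models: number, samplesPerCell: number, conditions: number): number {
  return models * samplesPerCell * conditions;
}

// Pass rate as a percentage from the Passes and Samples counts of a section.
function sectionPassRatePct(passCount: number, sampleCount: number): number {
  if (sampleCount <= 0) {
    throw new Error("sampleCount must be positive");
  }
  return (passCount / sampleCount) * 100;
}

// 1 control + 2 instruction sets + 2 skill turns (Generate, Review).
const assumedConditions = 1 + 2 + 2;
const samplesPerTest = expectedSamplesPerTest(8, 5, assumedConditions); // 200

// Counts from the "Checkbox Group | React | Dark" section: 118 passes of 200.
const checkboxGroupPassRate = Math.round(sectionPassRatePct(118, 200)); // 59
```

Note that a section's pass rate pools all conditions, so it is not comparable to the control-only pass rates reported earlier.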
Checkbox Group | React | Dark
Samples: 200 | Passes: 118 | Fails: 82
Models: 8
Checkbox Group | React | Modern
Samples: 200 | Passes: 109 | Fails: 91
Models: 8
Checkbox Group | Vanilla JS | Dark
Samples: 200 | Passes: 117 | Fails: 83
Models: 8
Checkbox Group | Vanilla JS | Modern
Samples: 200 | Passes: 115 | Fails: 85
Models: 8
Disclosure Widget | React | Dark
Samples: 200 | Passes: 143 | Fails: 57
Models: 8
Disclosure Widget | React | Modern
Samples: 200 | Passes: 136 | Fails: 64
Models: 8
Disclosure Widget | Vanilla JS | Dark
Samples: 200 | Passes: 160 | Fails: 40
Models: 8
Disclosure Widget | Vanilla JS | Modern
Samples: 200 | Passes: 144 | Fails: 56
Models: 8
Modal Dialog | React | Dark
Samples: 200 | Passes: 94 | Fails: 106
Models: 8
Modal Dialog | React | Modern
Samples: 200 | Passes: 102 | Fails: 98
Models: 8
Modal Dialog | Vanilla JS | Dark
Samples: 200 | Passes: 119 | Fails: 81
Models: 8
Modal Dialog | Vanilla JS | Modern
Samples: 200 | Passes: 122 | Fails: 78
Models: 8
Radio Button Group | React | Dark
Samples: 200 | Passes: 88 | Fails: 112
Models: 8
Radio Button Group | React | Modern
Samples: 200 | Passes: 98 | Fails: 102
Models: 8
Radio Button Group | Vanilla JS | Dark
Samples: 200 | Passes: 88 | Fails: 112
Models: 8
Radio Button Group | Vanilla JS | Modern
Samples: 200 | Passes: 93 | Fails: 107
Models: 8
Shopping Home Page | React | Dark
Samples: 200 | Passes: 99 | Fails: 101
Models: 8
Shopping Home Page | React | Modern
Samples: 200 | Passes: 87 | Fails: 113
Models: 8
Shopping Home Page | Vanilla JS | Dark
Samples: 200 | Passes: 106 | Fails: 94
Models: 8
Shopping Home Page | Vanilla JS | Modern
Samples: 200 | Passes: 89 | Fails: 111
Models: 8
Simple Contact Form | React | Dark | Error Message Present
Samples: 200 | Passes: 101 | Fails: 99
Models: 8
Simple Contact Form | React | Dark | No Error Message
Samples: 200 | Passes: 98 | Fails: 102
Models: 8
Simple Contact Form | React | Modern | Error Message Present
Samples: 200 | Passes: 120 | Fails: 80
Models: 8
Simple Contact Form | React | Modern | No Error Message
Samples: 200 | Passes: 105 | Fails: 95
Models: 8
Simple Contact Form | Vanilla JS | Dark | Error Message Present
Samples: 200 | Passes: 97 | Fails: 103
Models: 8
Simple Contact Form | Vanilla JS | Dark | No Error Message
Samples: 200 | Passes: 76 | Fails: 124
Models: 8
Simple Contact Form | Vanilla JS | Modern | Error Message Present
Samples: 200 | Passes: 109 | Fails: 91
Models: 8
Simple Contact Form | Vanilla JS | Modern | No Error Message
Samples: 200 | Passes: 106 | Fails: 94
Models: 8
Single Checkbox | React | Dark
Samples: 200 | Passes: 122 | Fails: 78
Models: 8
Single Checkbox | React | Modern
Samples: 200 | Passes: 128 | Fails: 72
Models: 8
Single Checkbox | Vanilla JS | Dark
Samples: 200 | Passes: 130 | Fails: 70
Models: 8
Single Checkbox | Vanilla JS | Modern
Samples: 200 | Passes: 135 | Fails: 65
Models: 8
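The per-test counts above can be turned into pass rates with simple arithmetic (passes divided by samples). The following is a minimal sketch, not the report's own tooling: `CaseSummary` and `pass_rate` are hypothetical names introduced for illustration, and the three rows are copied from the sections above (the highest pass count, the lowest, and the case the overview names as hardest under control). Note that each section's 200 samples appear to pool control, instruction-set, and skill variants, so these pooled rates are not comparable to the control-only pass rates in the overview; that reading is an assumption, since the report does not state the breakdown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseSummary:
    """Hypothetical container for one per-test section of the report."""
    name: str
    samples: int
    passes: int
    fails: int

    def pass_rate(self) -> float:
        # Guard against transcription errors: passes + fails must equal samples.
        if self.passes + self.fails != self.samples:
            raise ValueError(f"counts do not add up for {self.name!r}")
        return self.passes / self.samples


# Values copied from the per-test sections above.
summaries = [
    CaseSummary("Disclosure Widget | Vanilla JS | Dark", 200, 160, 40),
    CaseSummary("Shopping Home Page | React | Dark", 200, 99, 101),
    CaseSummary(
        "Simple Contact Form | Vanilla JS | Dark | No Error Message", 200, 76, 124
    ),
]

# Rank from highest to lowest pooled pass rate.
ranked = sorted(summaries, key=lambda s: s.pass_rate(), reverse=True)

for s in ranked:
    print(f"{s.name}: {s.pass_rate():.1%}")
```

As with the headline figures, a pass here means only that a sample cleared the harness's automated checks.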