A11y LLM Eval

Overview

The A11y LLM Eval report provides a summary of accessibility evaluation results for different models, instruction sets, and skills. It can help identify how different approaches impact accessibility outcomes and highlight areas for improvement. All content is generated using GitHub Copilot SDK and results are based on automated checks and curated test cases.

Run scope: 8 models | 32 prompt cases | 1280 control samples | 2 instruction sets | 1 skills

Control baseline

12%

Overall control pass rate*; best model GPT-5.4 Mini at 25%

Hardest case

Shopping Home Page | React | Dark

0% pass rate*, 15.55 avg WCAG failures

Best instruction lift

1. Basic

Best delta +48.5pp vs control

Best skill lift

Building Accessible UI

Best final-turn delta +88.1pp vs control; +5.6pp vs turn 1

* Pass rate reflects only this harness's automated checks (a curated set of axe-core WCAG rules plus hand-written assertions per test case). Automated testing can detect only a subset of accessibility issues: 100% here means the sample passed every check that was run, not that the page is WCAG conformant or fully accessible.

Control snapshot

Control results show how well models produce accessible code with no instructions or prompts to specifically create accessible code.

ModelRankPass rate*Avg Total WCAG Failures
GPT-5.4 Mini 1 25% 1.46
GPT-5.5 2 18% 1.47
GPT-5.4 3 15% 1.59
Claude Opus 4.7 4 14% 4.51
Gemini 3.1 Pro Preview 5 8% 3.08

Instruction-set snapshot

Instruction-set results show how well models produce accessible code when given specific guidance at the system/instruction level. Instructions guide the agent's behavior throughout the generation session and can improve accessibility outcomes, but they also consume context, especially when they are lengthy or combined with other instructions.

Instruction setRankVariant pass rate*Delta vs control
1. Basic 1 60% +48.5pp
0. Minimal 2 37% +24.8pp

Skill snapshot

Skills are reusable, task-specific packages that can include guidance, examples, supporting files, scripts, and tool-use workflows, while instruction sets are always-on guidance added to the agent's context for a run. Use instructions for broad behavior you want applied consistently across tasks; use a skill when the guidance is specialized, larger, procedural, or depends on files, scripts, or a focused sequence of tool-assisted steps. Skills keep general instructions lighter and can guide the model through a process, such as generating an answer and then reviewing it against a checklist.

SkillAvg final-turn pass rate*Avg delta vs controlBest model(s)
Building Accessible UI 86% +74.6pp Gemini 3.1 Pro Preview

Variant token + pass-rate snapshot

This table compares control, instruction sets, and skill turns using per-sample averages from the evaluated run. API calls are counted from the underlying Copilot session transcript for each sample; tokens per API call are computed as average total tokens divided by average call count for that sample. The guidance token percentage reflects the share of input tokens that came from guidance files (instruction markdown or skill directory files).

VariantAvg pass rate*Avg API callsAvg tokens / API callAvg tokens inAvg tokens out% of possible guidance tokens
Control 12% 4.34 16,896 69,869 4,608 n/a
1. Basic 60% 5.45 19,689 104,678 5,904 100.0%
0. Minimal 37% 4.78 17,048 77,703 4,891 100.0%
Building Accessible UI - Generate (Turn 1) 82% 8.26 22,459 186,584 6,879 14.4%
Building Accessible UI - Review (Turn 2) 86% 11.57 32,731 383,001 6,346 18.5%

Control summary

Control results show how well models produce accessible code with no instructions or prompts to specifically create accessible code. Models are ranked by WCAG pass rate across 32 test cases and 5 samples per test (160 samples per model). These tests do not comprehensively test all WCAG requirements, only a subset of the most common issues. WCAG failures may still exist even for passing tests.

ModelRankPass rate*Avg Total WCAG FailuresAvg Axe WCAG FailuresAvg Assertion WCAG FailuresAvg Best Practice Failures
GPT-5.4 Mini 1 25% 1.46 0.26 1.20 1.06
GPT-5.5 2 18% 1.47 0.36 1.11 0.34
GPT-5.4 3 15% 1.59 0.24 1.34 0.70
Claude Opus 4.7 4 14% 4.51 3.56 0.96 3.98
Gemini 3.1 Pro Preview 5 8% 3.08 1.30 1.78 4.46
Claude Sonnet 4.6 6 6% 10.02 8.81 1.21 8.59
Gemini 3 Flash Preview 7 6% 3.38 1.32 2.05 4.71
Claude Haiku 4.5 8 3% 5.91 3.64 2.27 9.43

* Pass rate reflects only this harness's automated checks (a curated set of axe-core WCAG rules plus hand-written assertions per test case). Automated testing can detect only a subset of accessibility issues: 100% here means the sample passed every check that was run, not that the page is WCAG conformant or fully accessible.

Pass@k aggregates

Pass@k estimates the probability that at least one of k randomly selected samples passes. This is computed from control samples only.

Checkbox Group | React | Dark
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 0 0% 0% 0%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 0 0% 0% 0%
GPT-5.4 Mini 5 0 0% 0% 0%
GPT-5.5 5 4 80% 100% 100%
Checkbox Group | React | Modern
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 0 0% 0% 0%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 0 0% 0% 0%
GPT-5.4 Mini 5 0 0% 0% 0%
GPT-5.5 5 0 0% 0% 0%
Checkbox Group | Vanilla JS | Dark
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 0 0% 0% 0%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 0 0% 0% 0%
GPT-5.4 Mini 5 0 0% 0% 0%
GPT-5.5 5 0 0% 0% 0%
Checkbox Group | Vanilla JS | Modern
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 0 0% 0% 0%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 0 0% 0% 0%
GPT-5.4 Mini 5 0 0% 0% 0%
GPT-5.5 5 2 40% 100% 100%
Disclosure Widget | React | Dark
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 1 20% 100% 100%
Claude Opus 4.7 5 0 0% 0% 0%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 1 20% 100% 100%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 2 40% 100% 100%
GPT-5.4 Mini 5 5 100% 100% 100%
GPT-5.5 5 0 0% 0% 0%
Disclosure Widget | React | Modern
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 3 60% 100% 100%
Claude Opus 4.7 5 0 0% 0% 0%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 0 0% 0% 0%
GPT-5.4 Mini 5 2 40% 100% 100%
GPT-5.5 5 1 20% 100% 100%
Disclosure Widget | Vanilla JS | Dark
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 1 20% 100% 100%
Claude Opus 4.7 5 0 0% 0% 0%
Claude Sonnet 4.6 5 2 40% 100% 100%
Gemini 3 Flash Preview 5 5 100% 100% 100%
Gemini 3.1 Pro Preview 5 5 100% 100% 100%
GPT-5.4 5 2 40% 100% 100%
GPT-5.4 Mini 5 4 80% 100% 100%
GPT-5.5 5 1 20% 100% 100%
Disclosure Widget | Vanilla JS | Modern
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 0 0% 0% 0%
Claude Sonnet 4.6 5 1 20% 100% 100%
Gemini 3 Flash Preview 5 3 60% 100% 100%
Gemini 3.1 Pro Preview 5 3 60% 100% 100%
GPT-5.4 5 1 20% 100% 100%
GPT-5.4 Mini 5 4 80% 100% 100%
GPT-5.5 5 0 0% 0% 0%
Modal Dialog | React | Dark
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 0 0% 0% 0%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 1 20% 100% 100%
GPT-5.4 Mini 5 0 0% 0% 0%
GPT-5.5 5 1 20% 100% 100%
Modal Dialog | React | Modern
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 0 0% 0% 0%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 1 20% 100% 100%
GPT-5.4 Mini 5 4 80% 100% 100%
GPT-5.5 5 0 0% 0% 0%
Modal Dialog | Vanilla JS | Dark
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 0 0% 0% 0%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 4 80% 100% 100%
GPT-5.4 Mini 5 3 60% 100% 100%
GPT-5.5 5 0 0% 0% 0%
Modal Dialog | Vanilla JS | Modern
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 0 0% 0% 0%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 1 20% 100% 100%
GPT-5.4 5 2 40% 100% 100%
GPT-5.4 Mini 5 4 80% 100% 100%
GPT-5.5 5 2 40% 100% 100%
Radio Button Group | React | Dark
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 3 60% 100% 100%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 1 20% 100% 100%
GPT-5.4 5 0 0% 0% 0%
GPT-5.4 Mini 5 1 20% 100% 100%
GPT-5.5 5 0 0% 0% 0%
Radio Button Group | React | Modern
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 2 40% 100% 100%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 0 0% 0% 0%
GPT-5.4 Mini 5 1 20% 100% 100%
GPT-5.5 5 0 0% 0% 0%
Radio Button Group | Vanilla JS | Dark
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 0 0% 0% 0%
Claude Sonnet 4.6 5 1 20% 100% 100%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 0 0% 0% 0%
GPT-5.4 Mini 5 1 20% 100% 100%
GPT-5.5 5 0 0% 0% 0%
Radio Button Group | Vanilla JS | Modern
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 0 0% 0% 0%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 0 0% 0% 0%
GPT-5.4 Mini 5 0 0% 0% 0%
GPT-5.5 5 0 0% 0% 0%
Shopping Home Page | React | Dark
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 0 0% 0% 0%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 0 0% 0% 0%
GPT-5.4 Mini 5 0 0% 0% 0%
GPT-5.5 5 0 0% 0% 0%
Shopping Home Page | React | Modern
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 0 0% 0% 0%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 0 0% 0% 0%
GPT-5.4 Mini 5 0 0% 0% 0%
GPT-5.5 5 0 0% 0% 0%
Shopping Home Page | Vanilla JS | Dark
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 0 0% 0% 0%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 0 0% 0% 0%
GPT-5.4 Mini 5 0 0% 0% 0%
GPT-5.5 5 0 0% 0% 0%
Shopping Home Page | Vanilla JS | Modern
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 0 0% 0% 0%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 0 0% 0% 0%
GPT-5.4 Mini 5 0 0% 0% 0%
GPT-5.5 5 0 0% 0% 0%
Simple Contact Form | React | Dark | Error Message Present
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 0 0% 0% 0%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 0 0% 0% 0%
GPT-5.4 Mini 5 1 20% 100% 100%
GPT-5.5 5 4 80% 100% 100%
Simple Contact Form | React | Dark | No Error Message
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 2 40% 100% 100%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 2 40% 100% 100%
GPT-5.4 Mini 5 2 40% 100% 100%
GPT-5.5 5 2 40% 100% 100%
Simple Contact Form | React | Modern | Error Message Present
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 2 40% 100% 100%
Claude Sonnet 4.6 5 1 20% 100% 100%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 1 20% 100% 100%
GPT-5.4 Mini 5 2 40% 100% 100%
GPT-5.5 5 4 80% 100% 100%
Simple Contact Form | React | Modern | No Error Message
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 0 0% 0% 0%
Claude Sonnet 4.6 5 1 20% 100% 100%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 0 0% 0% 0%
GPT-5.4 Mini 5 2 40% 100% 100%
GPT-5.5 5 2 40% 100% 100%
Simple Contact Form | Vanilla JS | Dark | Error Message Present
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 5 100% 100% 100%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 0 0% 0% 0%
GPT-5.4 Mini 5 2 40% 100% 100%
GPT-5.5 5 2 40% 100% 100%
Simple Contact Form | Vanilla JS | Dark | No Error Message
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 0 0% 0% 0%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 1 20% 100% 100%
GPT-5.4 Mini 5 0 0% 0% 0%
GPT-5.5 5 1 20% 100% 100%
Simple Contact Form | Vanilla JS | Modern | Error Message Present
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 4 80% 100% 100%
Claude Sonnet 4.6 5 1 20% 100% 100%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 4 80% 100% 100%
GPT-5.4 Mini 5 0 0% 0% 0%
GPT-5.5 5 0 0% 0% 0%
Simple Contact Form | Vanilla JS | Modern | No Error Message
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 0 0% 0% 0%
Claude Sonnet 4.6 5 1 20% 100% 100%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 0 0% 0% 0%
GPT-5.4 Mini 5 2 40% 100% 100%
GPT-5.5 5 0 0% 0% 0%
Single Checkbox | React | Dark
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 1 20% 100% 100%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 2 40% 100% 100%
GPT-5.4 5 2 40% 100% 100%
GPT-5.4 Mini 5 0 0% 0% 0%
GPT-5.5 5 1 20% 100% 100%
Single Checkbox | React | Modern
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 1 20% 100% 100%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 1 20% 100% 100%
GPT-5.4 5 0 0% 0% 0%
GPT-5.4 Mini 5 0 0% 0% 0%
GPT-5.5 5 1 20% 100% 100%
Single Checkbox | Vanilla JS | Dark
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 2 40% 100% 100%
Claude Sonnet 4.6 5 0 0% 0% 0%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 0 0% 0% 0%
GPT-5.4 Mini 5 0 0% 0% 0%
GPT-5.5 5 0 0% 0% 0%
Single Checkbox | Vanilla JS | Modern
Model Samples Passes pass@1* pass@5* pass@10*
Claude Haiku 4.5 5 0 0% 0% 0%
Claude Opus 4.7 5 1 20% 100% 100%
Claude Sonnet 4.6 5 2 40% 100% 100%
Gemini 3 Flash Preview 5 0 0% 0% 0%
Gemini 3.1 Pro Preview 5 0 0% 0% 0%
GPT-5.4 5 1 20% 100% 100%
GPT-5.4 Mini 5 0 0% 0% 0%
GPT-5.5 5 0 0% 0% 0%

* Pass rate reflects only this harness's automated checks (a curated set of axe-core WCAG rules plus hand-written assertions per test case). Automated testing can detect only a subset of accessibility issues: 100% here means the sample passed every check that was run, not that the page is WCAG conformant or fully accessible.

Control analysis

This section summarizes where models perform well, where they struggle, and the most frequent types of accessibility issues observed across all samples.

Most common axe WCAG failures

Rule Impact Failures % of failures Seen in models Seen in test cases Description
color-contrast serious 532 89.1% 8 32 Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
aria-required-children critical 14 2.3% 4 5 Ensure elements with an ARIA role that require child roles contain them
button-name critical 14 2.3% 3 5 Ensure buttons have discernible text
link-in-text-block serious 8 1.3% 3 5 Ensure links are distinguished from surrounding text in a way that does not rely on color
link-name serious 8 1.3% 2 4 Ensure links have discernible text
aria-allowed-attr critical 5 0.8% 3 4 Ensure an element's role supports its ARIA attributes
listitem serious 5 0.8% 3 4 Ensure
  • elements are used semantically
  • aria-prohibited-attr serious 3 0.5% 2 2 Ensure ARIA attributes are not prohibited for an element's role
    image-alt critical 2 0.3% 1 1 Ensure elements have alternative text or a role of none or presentation
    label critical 2 0.3% 2 2 Ensure every form element has a label

    Most common axe best-practice failures

    Rule Impact Failures % of failures Seen in models Seen in test cases Description
    region moderate 610 44.4% 8 32 Ensure all page content is contained by landmarks
    landmark-one-main moderate 508 37.0% 8 29 Ensure the document has a main landmark
    heading-order moderate 103 7.5% 7 11 Ensure the order of headings is semantically correct
    page-has-heading-one moderate 71 5.2% 5 21 Ensure that the page, or at least one of its frames contains a level-one heading
    landmark-complementary-is-top-level moderate 35 2.5% 3 8 Ensure the complementary landmark or aside is at top level
    landmark-unique moderate 24 1.7% 3 6 Ensure landmarks are unique
    aria-dialog-name serious 12 0.9% 3 3 Ensure every ARIA dialog and alertdialog node has an accessible name
    aria-allowed-role minor 7 0.5% 1 6 Ensure role attribute has an appropriate value for the element
    image-redundant-alt minor 1 0.1% 1 1 Ensure image alternative is not repeated as text
    label-title-only serious 1 0.1% 1 1 Ensure that every form element has a visible label and is not solely labeled using hidden labels, or the title or aria-describedby attributes

    Assertion-level patterns (per test case)

    Checkbox Group | React | Dark

    Assertion Type Failure rate Failures / applicable Not applicable
    Helper text is programmatically associated R 90% 36 / 40 0
    Each checkbox group has a valid role R 70% 28 / 40 0
    Each checkbox group has an accessible label R 70% 28 / 40 0
    Space toggles checkbox state of each checkbox R 10% 4 / 40 0
    Visible label is included in accessible name R 8% 3 / 39 1

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Checkbox Group | React | Modern

    Assertion Type Failure rate Failures / applicable Not applicable
    Helper text is programmatically associated R 98% 39 / 40 0
    Each checkbox group has an accessible label R 70% 28 / 40 0
    Each checkbox group has a valid role R 68% 27 / 40 0
    Each checkbox has an accessible name R 5% 2 / 40 0
    Visible label is included in accessible name R 5% 2 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Checkbox Group | Vanilla JS | Dark

    Assertion Type Failure rate Failures / applicable Not applicable
    Helper text is programmatically associated R 100% 40 / 40 0
    Each checkbox group has an accessible label R 72% 29 / 40 0
    Each checkbox group has a valid role R 70% 28 / 40 0
    Each checkbox has an accessible name R 2% 1 / 40 0
    Each checkbox is in the tab order R 2% 1 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Checkbox Group | Vanilla JS | Modern

    Assertion Type Failure rate Failures / applicable Not applicable
    Helper text is programmatically associated R 90% 36 / 40 0
    Each checkbox group has an accessible label R 52% 21 / 40 0
    Each checkbox group has a valid role R 50% 20 / 40 0
    Visible label is included in accessible name R 8% 3 / 40 0
    Each checkbox has an accessible name R 5% 2 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Disclosure Widget | React | Dark

    Assertion Type Failure rate Failures / applicable Not applicable
    Collapsed content is hidden from everyone R 77% 24 / 31 9
    All examples have a valid semantics R 18% 7 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Disclosure Widget | React | Modern

    Assertion Type Failure rate Failures / applicable Not applicable
    Collapsed content is hidden from everyone R 84% 32 / 38 2
    All examples have a valid semantics R 2% 1 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Disclosure Widget | Vanilla JS | Dark

    Assertion Type Failure rate Failures / applicable Not applicable
    Collapsed content is hidden from everyone R 44% 16 / 36 4
    All examples have a valid semantics R 10% 4 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Disclosure Widget | Vanilla JS | Modern

    Assertion Type Failure rate Failures / applicable Not applicable
    Collapsed content is hidden from everyone R 57% 20 / 35 5
    All examples have a valid semantics R 12% 5 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Modal Dialog | React | Dark

    Assertion Type Failure rate Failures / applicable Not applicable
    Each modal dialog hides content behind it while open R 92% 37 / 40 0
    Each modal dialog takes focus when opened R 65% 26 / 40 0
    Focus is not lost when each dialog closes R 60% 24 / 40 0
    Each modal dialog traps keyboard focus R 52% 21 / 40 0
    Each dialog can be closed by escape key BP 40% 16 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Modal Dialog | React | Modern

    Assertion Type Failure rate Failures / applicable Not applicable
    Each modal dialog hides content behind it while open R 85% 34 / 40 0
    Each modal dialog takes focus when opened R 48% 19 / 40 0
    Focus is not lost when each dialog closes R 42% 17 / 40 0
    Each modal dialog traps keyboard focus R 32% 13 / 40 0
    Each dialog can be closed by escape key BP 25% 10 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Modal Dialog | Vanilla JS | Dark

    Assertion Type Failure rate Failures / applicable Not applicable
    Each modal dialog hides content behind it while open R 80% 32 / 40 0
    Each modal dialog takes focus when opened R 30% 12 / 40 0
    Each dialog can be closed by escape key BP 22% 9 / 40 0
    Each dialog has a dialog role R 22% 9 / 40 0
    Focus is not lost when each dialog closes R 22% 9 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Modal Dialog | Vanilla JS | Modern

    Assertion Type Failure rate Failures / applicable Not applicable
    Each modal dialog hides content behind it while open R 68% 27 / 40 0
    Each modal dialog takes focus when opened R 35% 14 / 40 0
    Focus is not lost when each dialog closes R 20% 8 / 40 0
    Each modal dialog traps keyboard focus R 18% 7 / 40 0
    Each dialog can be closed by escape key BP 15% 6 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Radio Button Group | React | Dark

    Assertion Type Failure rate Failures / applicable Not applicable
    Required fields are indicated visually R 100% 20 / 20 20
    Helper text is programmatically associated R 100% 10 / 10 30
    Each radio group has an accessible label R 45% 18 / 40 0
    Arrow keys change the selected radio within each group R 5% 2 / 40 0
    Each radio group is keyboard reachable R 5% 2 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Radio Button Group | React | Modern

    Assertion Type Failure rate Failures / applicable Not applicable
    Required fields are indicated visually R 100% 21 / 21 19
    Helper text is programmatically associated R 100% 18 / 18 22
    Each radio group has an accessible label R 42% 17 / 40 0
    Arrow keys change the selected radio within each group R 5% 2 / 40 0
    Each radio group is keyboard reachable R 5% 2 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Radio Button Group | Vanilla JS | Dark

    Assertion Type Failure rate Failures / applicable Not applicable
    Required fields are indicated visually R 100% 33 / 33 7
    Helper text is programmatically associated R 100% 18 / 18 22
    Each radio group has an accessible label R 50% 20 / 40 0
    Arrow keys change the selected radio within each group R 2% 1 / 40 0
    Each radio group is keyboard reachable R 2% 1 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Radio Button Group | Vanilla JS | Modern

    Assertion Type Failure rate Failures / applicable Not applicable
    Required fields are indicated visually R 100% 35 / 35 5
    Helper text is programmatically associated R 100% 14 / 14 26
    Each radio group has an accessible label R 25% 10 / 40 0
    Arrow keys change the selected radio within each group R 2% 1 / 40 0
    Each radio group is keyboard reachable R 2% 1 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Shopping Home Page | React | Dark

    Assertion Type Failure rate Failures / applicable Not applicable
    Has a skip navigation link R 100% 40 / 40 0
    Has a single maincontent R 52% 21 / 40 0
    Has a single banner R 25% 10 / 40 0
    Has single h1 BP 5% 2 / 40 0
    Has an h1 R 2% 1 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Shopping Home Page | React | Modern

    Assertion Type Failure rate Failures / applicable Not applicable
    Has a skip navigation link R 100% 40 / 40 0
    Has a single maincontent R 40% 16 / 40 0
    Has a single banner R 22% 9 / 40 0
    Has a single footer R 10% 4 / 40 0
    Has single h1 BP 5% 2 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Shopping Home Page | Vanilla JS | Dark

    Assertion Type Failure rate Failures / applicable Not applicable
    Has a skip navigation link R 100% 40 / 40 0
    Has a single maincontent R 32% 13 / 40 0
    Has a single banner R 10% 4 / 40 0
    Has a single footer R 3% 1 / 38 2
    Has at least one navigation R 2% 1 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Shopping Home Page | Vanilla JS | Modern

    Assertion Type Failure rate Failures / applicable Not applicable
    Has a skip navigation link R 100% 40 / 40 0
    Has a single maincontent R 35% 14 / 40 0
    Has a single banner R 10% 4 / 40 0
    Has a single footer R 3% 1 / 33 7
    Has at least one navigation R 2% 1 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Simple Contact Form | React | Dark | Error Message Present

    Assertion Type Failure rate Failures / applicable Not applicable
    Inputs use appropriate autocomplete for purpose R 60% 24 / 40 0
    Required fields are indicated programmatically R 30% 11 / 37 3
    Required fields are indicated visually R 16% 6 / 37 3
    Helper text is programmatically associated R 15% 6 / 40 0
    Placeholder text is programmatically defined as a property R 0% 0 / 38 2

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Simple Contact Form | React | Dark | No Error Message

    Assertion Type Failure rate Failures / applicable Not applicable
    Inputs use appropriate autocomplete for purpose R 52% 21 / 40 0
    Required fields are indicated visually R 36% 14 / 39 1
    Helper text is programmatically associated R 18% 7 / 40 0
    Required fields are indicated programmatically R 8% 3 / 39 1
    Visible label is included in accessible name R 2% 1 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Simple Contact Form | React | Modern | Error Message Present

    Assertion Type Failure rate Failures / applicable Not applicable
    Inputs use appropriate autocomplete for purpose R 52% 21 / 40 0
    Required fields are indicated programmatically R 23% 8 / 35 5
    Required fields are indicated visually R 23% 8 / 35 5
    Each text input has an accessible name R 0% 0 / 40 0
    Each text input has textbox role R 0% 0 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Simple Contact Form | React | Modern | No Error Message

    Assertion Type Failure rate Failures / applicable Not applicable
    Inputs use appropriate autocomplete for purpose R 55% 22 / 40 0
    Required fields are indicated visually R 44% 17 / 39 1
    Helper text is programmatically associated R 12% 5 / 40 0
    Required fields are indicated programmatically R 8% 3 / 39 1
    Visible label is included in accessible name R 2% 1 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Simple Contact Form | Vanilla JS | Dark | Error Message Present

    Assertion Type Failure rate Failures / applicable Not applicable
    Required fields are indicated visually R 46% 18 / 39 1
    Inputs use appropriate autocomplete for purpose R 40% 16 / 40 0
    Helper text is programmatically associated R 12% 5 / 40 0
    Required fields are indicated programmatically R 8% 3 / 39 1
    Visible label is included in accessible name R 8% 3 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Simple Contact Form | Vanilla JS | Dark | No Error Message

    Assertion Type Failure rate Failures / applicable Not applicable
    Required fields are indicated visually R 70% 28 / 40 0
    Inputs use appropriate autocomplete for purpose R 40% 16 / 40 0
    Helper text is programmatically associated R 15% 6 / 40 0
    Visible label is included in accessible name R 8% 3 / 40 0
    Required fields are indicated programmatically R 5% 2 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Simple Contact Form | Vanilla JS | Modern | Error Message Present

    Assertion Type Failure rate Failures / applicable Not applicable
    Required fields are indicated visually R 45% 18 / 40 0
    Inputs use appropriate autocomplete for purpose R 38% 15 / 40 0
    Visible label is included in accessible name R 12% 5 / 40 0
    Helper text is programmatically associated R 8% 3 / 40 0
    Required fields are indicated programmatically R 2% 1 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Simple Contact Form | Vanilla JS | Modern | No Error Message

    Assertion Type Failure rate Failures / applicable Not applicable
    Required fields are indicated visually R 72% 29 / 40 0
    Inputs use appropriate autocomplete for purpose R 38% 15 / 40 0
    Helper text is programmatically associated R 15% 6 / 40 0
    Visible label is included in accessible name R 12% 5 / 40 0
    Visual labels are defined and persistent R 5% 2 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Single Checkbox | React | Dark

    Assertion Type Failure rate Failures / applicable Not applicable
    Helper text is programmatically associated R 74% 29 / 39 1
    Required fields are indicated visually R 15% 6 / 40 0
    ARIA attributes match native checkbox attributes if used R 2% 1 / 40 0
    Checked state is programmatically exposed R 2% 1 / 40 0
    Each checkbox has a valid role R 2% 1 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Single Checkbox | React | Modern

    Assertion Type Failure rate Failures / applicable Not applicable
    Helper text is programmatically associated R 62% 25 / 40 0
    Required fields are indicated visually R 18% 7 / 40 0
    Visual labels are defined and persistent R 5% 2 / 40 0
    Each checkbox has an accessible name R 2% 1 / 40 0
    Space toggles checkbox state R 2% 1 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Single Checkbox | Vanilla JS | Dark

    Assertion Type Failure rate Failures / applicable Not applicable
    Helper text is programmatically associated R 80% 32 / 40 0
    Required fields are indicated visually R 5% 2 / 40 0
    ARIA attributes match native checkbox attributes if used R 0% 0 / 40 0
    Checked state is programmatically exposed R 0% 0 / 40 0
    Each checkbox has a valid role R 0% 0 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Single Checkbox | Vanilla JS | Modern

    Assertion Type Failure rate Failures / applicable Not applicable
    Helper text is programmatically associated R 74% 29 / 39 1
    Required fields are indicated visually R 20% 8 / 40 0
    Each checkbox has an accessible name R 2% 1 / 40 0
    Each checkbox is keyboard reachable R 2% 1 / 40 0
    Space toggles checkbox state R 2% 1 / 40 0

    Assertion statistics are computed within this test case only and are not compared across different test cases.

    Methodology

    Based on 32 prompt cases and 5 samples per case (160 samples per model), this run is roughly powered to detect model WCAG pass-rate differences of about 15.7 percentage points or larger in a two-model comparison (approximate 95% confidence, 80% power; assumes independent samples).

    This is a planning heuristic, not a confidence interval reported by the harness. Repeated samples within the same prompt case are correlated, so the true detectable difference may be somewhat larger.

    Output format instructions

    Save your answer to `index.html`. Feel free to use separate CSS and JS files in the same directory.

    All tests are automatic and deterministic (no human intervention). Only a fraction of accessibility requirements in WCAG can be covered in this way. Many requirements still need a human to evaluate. As such, these tests are not comprehensive. Even if a test passes, it may still fail WCAG and contain serious accessibility issues.

    Please leave feedback, review the source code, and contribute test cases, assertions, and other improvements at the GitHub Project.

    Contributors

    This report and evaluation harness are maintained by Michael Fairchild. For questions about methodology, generated samples, or report interpretation, contact mfairchild@microsoft.com. Contributors include Scott O'hara, Aaron Gustafson, Shawn Lauriat, Dylan Isaac, and Cameron Cundiff. This project would not be possible without the support of the Microsoft Accessibility team and the GitHub Copilot SDK team.

    Glossary

    Column Definitions

    Other Glossary Terms

    Change Log

    5/2026 Update

    2/2026 Update

    Instruction Benchmarks (vs Control)

    These results show how well each instruction set performs vs the control configuration (averaged across models). Instruction sets contain specific guidance intended to improve accessibility and are appended to the system prompt.

    Several instruction sets are used in this benchmark to help identify which instructions are most effective at improving accessibility. Models are ranked by average WCAG pass rate across all models and test cases for that instruction set.

    Summary (ranked by avg WCAG pass rate)

    Rank Instruction Set Avg Control Pass Rate* Avg Instruction Set Pass Rate* Δ Avg Pass Rate*
    1 1. Basic 12% 60% +48.5pp
    2 0. Minimal 12% 37% +24.8pp

    * Pass rate reflects only this harness's automated checks (a curated set of axe-core WCAG rules plus hand-written assertions per test case). Automated testing can detect only a subset of accessibility issues: 100% here means the sample passed every check that was run, not that the page is WCAG conformant or fully accessible.

    Instruction benchmark details

    This section includes per-model benchmark results and the full text of each instruction set.

    Instruction sets

    0. Minimal

    Minimal reminder that all output must be accessible.

    Full instruction set

    Variant samples per (test, model): 5

    Generation mode: copilot_agent

    All output MUST be accessible.
    
    1. Basic

    Basic reminder that all output must be accessible (includes slightly more instructions than minimal).

    Full instruction set

    Variant samples per (test, model): 5

    Generation mode: copilot_agent

    ---
    description: "Accessibility coding rules: WCAG 2.2 AA conformance, semantic structure, keyboard support, focus management. Apply when writing or modifying UI code."
    applyTo: "**"
    ---
    
    # Accessibility instructions (standard)
    
    Conform to [WCAG 2.2 Level AA](https://www.w3.org/TR/WCAG22/). Do not claim output is "fully accessible" — state what was addressed and any known limitations.
    
    ## Implementation priority
    
    Use the first option that fits:
    
    1. Existing accessible component in the project / design system.
    2. A component library already in use.
    3. Native platform semantics (`<button>`, `<a href>`, `<input>`, `<label>`, `<fieldset>`/`<legend>`, `<dialog>`, `<nav>`, `<main>`, `<h1>`–`<h6>`).
    4. Native element + minimum necessary ARIA.
    5. Fully custom ARIA widget — only when nothing above fits, and only with complete APG keyboard, focus, and state behavior.
    
    No ARIA is better than bad ARIA. Don't duplicate native semantics (no `role="button"` on `<button>`). Don't use `role="menu"` for site navigation.
    
    ## Structure
    
    - Use landmark elements (`<header>`, `<nav>`, `<main>`, `<footer>`). Exactly one `<main>`. Give duplicated landmarks unique accessible names.
    - One `<h1>` per view, typically first heading in `<main>`. Don't skip heading levels.
    - Set a descriptive `<title>`.
    - **Web pages only:** Provide a "Skip to main content" link as the first focusable element.
    
    ## Name, role, value
    
    - Every interactive element exposes an accurate accessible name. Role matches purpose. Dynamic states (pressed, expanded, selected, checked, disabled, invalid) stay in sync with visuals.
    - The accessible name MUST contain the visible label text. When multiple controls share a label (e.g. many "Remove" buttons), add context: `aria-label="Remove item: Socks"`.
    
    ## Keyboard and focus
    
    - All functionality can be achieved by both mouse an keyboard; tab order matches reading/visual order.
    - Focus is always visible — do not remove focus outlines without an equal-or-better replacement.
    - Avoid keyboard traps.
    - Escape should close overlays.
    - Static content MUST NOT be sequentially focusable. Use `tabindex="-1"` only for programmatic focus targets.
    - Content hidden from AT (`aria-hidden="true"`) MUST NOT be focusable.
    - Dialogs move focus in and restore it on close.
    - Composite widgets (tabs, listbox, menu, grid): one tab stop total; arrow keys move focus internally via roving `tabindex` or `aria-activedescendant`.
    
    ## Forms
    
    - Every form field has a visual and programmatic label (`<label for>` or wrapping `<label>`). Never rely on placeholder alone.
    - Associate help/error text via `aria-describedby`.
    - Group related options (checkboxes, radios) with `<fieldset>` + `<legend>`.
    - Required fields: visible indicator (e.g., `*`) AND `required` / `aria-required="true"`. Never color alone. This is a MUST when the form contains both required and optional fields.
    - Invalid fields: `aria-invalid="true"`; remove when corrected. Error messages explain how to fix.
    - On submit with invalid input, focus the first invalid control. Don't disable submit solely to prevent submission.
    
    ## Contrast and color
    
    - Text contrast ≥ 4.5:1 (≥ 3:1 for large text: 24px regular or 18.66px bold).
    - Focus indicators and key control boundaries ≥ 3:1 vs adjacent colors.
    - Never use color as the only cue for meaning (error, success, required, selected).
    - Use design tokens / CSS custom properties. Avoid `opacity`, `rgba`, `hsla` on text and essential affordances — contrast becomes background-dependent.
    - Ensure contrast in all states: default, hover, active, focus, visited, disabled.
    
    ## Forced colors / OS settings
    
    - Never override OS high-contrast, reduced-motion, or color-scheme preferences without good reason.
    - Do not use `forced-color-adjust: none` without good reason (e.g., data-viz where color needs to remain the same).
    - In `@media (forced-colors: active)`, use system color keywords (`ButtonText`, `ButtonBorder`, `CanvasText`, `Canvas`) — never fixed hex/RGB.
    - Use `currentColor` for SVG `fill`/`stroke` so icons inherit the foreground.
    - If relying on `box-shadow` for focus, add a transparent `outline` so focus renders in forced colors.
    
    ## Reflow (SC 1.4.10)
    
    - Content MUST be able to 320 CSS pixels wide without two-dimensional scrolling for multi-line text.
    - For multi-column layouts that are not necessary to convey meaning or important to the UX of the interface, content stacks; text wraps; controls remain operable.
    - Use fluid `flex`/`grid`. Set `max-width: 100%` on media, `min-width: 0` on flex/grid children, `overflow-wrap: anywhere` for long strings.
    - Exception: inherently 2D components (large tables, maps, charts, media, interfaces with toolbars or interfaces that require 2D layout) may scroll horizontally at component level; the surrounding view still reflows.
    
    ## Graphics
    
    - Informative `<img>` → meaningful `alt`. Decorative `<img>` → `alt=""`.
    - Informative `<svg>` → `role="img"` with `aria-label` / `aria-labelledby`. Decorative SVG/graphics → `aria-hidden="true"`.
    
    ## Navigation
    
    - Use `<nav>` with lists and links — not `role="menu"` / `role="menubar"`.
    - Expandable navigation: toggle uses `button[aria-expanded]`. Escape MAY close sub-navigations.
    
    ## Tables and grids
    
    - Static tabular data: `<table>` with `<th>` for column/row headers.
    - Use `role="grid"` only for genuinely interactive tabular UIs, with proper row/cell nesting and arrow-key navigation.
    
    ## Status messages
    
    - Announce dynamic updates (loading, success, failure, error, validation summaries) via `aria-live="polite"` or `aria-live="assertive"`.
    
    ## Final verification
    
    Before finalizing, verify: landmarks + one `<h1>`; keyboard operability with visible focus and no traps; visible labels included in accessible names; form labels + required + error association + focus-first-invalid; contrast thresholds; forced-colors adaptation; reflow at 320px; image alternatives; table header associations.

    Results

    Model Instruction Set Control Pass Rate* Instruction Set Pass Rate* Δ Pass Rate*
    Claude Haiku 4.5 0. Minimal 3% 11% +8.1pp
    Claude Haiku 4.5 1. Basic 3% 19% +16.2pp
    Claude Opus 4.7 0. Minimal 14% 42% +27.5pp
    Claude Opus 4.7 1. Basic 14% 64% +49.4pp
    Claude Sonnet 4.6 0. Minimal 6% 9% +2.5pp
    Claude Sonnet 4.6 1. Basic 6% 21% +15.0pp
    GPT-5.4 0. Minimal 15% 38% +22.5pp
    GPT-5.4 1. Basic 15% 84% +69.4pp
    GPT-5.4 Mini 0. Minimal 25% 46% +20.6pp
    GPT-5.4 Mini 1. Basic 25% 81% +55.6pp
    GPT-5.5 0. Minimal 18% 64% +46.2pp
    GPT-5.5 1. Basic 18% 89% +71.9pp
    Gemini 3 Flash Preview 0. Minimal 6% 31% +25.6pp
    Gemini 3 Flash Preview 1. Basic 6% 49% +43.8pp
    Gemini 3.1 Pro Preview 0. Minimal 8% 53% +45.0pp
    Gemini 3.1 Pro Preview 1. Basic 8% 75% +66.9pp

    * Pass rate reflects only this harness's automated checks (a curated set of axe-core WCAG rules plus hand-written assertions per test case). Automated testing can detect only a subset of accessibility issues: 100% here means the sample passed every check that was run, not that the page is WCAG conformant or fully accessible.

    Instruction set analysis vs control

    This section highlights where each instruction set helped (or hurt) compared to the control, aggregated across all samples for that instruction set.

    * Pass rate reflects only this harness's automated checks (a curated set of axe-core WCAG rules plus hand-written assertions per test case). Automated testing can detect only a subset of accessibility issues: 100% here means the sample passed every check that was run, not that the page is WCAG conformant or fully accessible.

    0. Minimal — overall Δ pass rate +24.8pp

    Overall: Control 12% (n=1280) → Variant 37% (n=1280). Avg WCAG failures/sample: 3.93 → 2.63 (Δ -1.30).

    Most improved test cases

    Test case Control pass rate* Variant pass rate* Δ pass rate* Δ avg WCAG failures
    Single Checkbox | Vanilla JS | Modern 10% 68% +57.5pp -0.92
    Single Checkbox | Vanilla JS | Dark 5% 60% +55.0pp -0.85
    Disclosure Widget | React | Dark 22% 72% +50.0pp -0.52
    Single Checkbox | React | Modern 8% 55% +47.5pp -0.60
    Checkbox Group | React | Modern 0% 42% +42.5pp -2.20

    Most regressed test cases

    Test case Control pass rate* Variant pass rate* Δ pass rate* Δ avg WCAG failures
    Simple Contact Form | Vanilla JS | Dark | Error Message Present 22% 18% -5.0pp -0.25

    Most reduced axe WCAG rules

    Rule Control rate Variant rate Δ rate Description
    color-contrast 41.6% 35.0% -6.6pp Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
    button-name 1.1% 0.0% -1.1pp Ensure buttons have discernible text
    link-name 0.6% 0.1% -0.5pp Ensure links have discernible text
    link-in-text-block 0.6% 0.3% -0.3pp Ensure links are distinguished from surrounding text in a way that does not rely on color
    image-alt 0.2% 0.0% -0.2pp Ensure <img> elements have alternative text or a role of none or presentation

    Most increased axe WCAG rules

    Rule Control rate Variant rate Δ rate Description
    aria-allowed-attr 0.4% 1.2% +0.9pp Ensure an element's role supports its ARIA attributes
    aria-hidden-focus 0.0% 0.4% +0.4pp Ensure aria-hidden elements are not focusable nor contain focusable elements
    aria-prohibited-attr 0.2% 0.5% +0.3pp Ensure ARIA attributes are not prohibited for an element's role
    listitem 0.4% 0.6% +0.2pp Ensure <li> elements are used semantically
    aria-conditional-attr 0.0% 0.1% +0.1pp Ensure ARIA attributes are used as described in the specification of the element's role

    Assertion analysis (vs control)

    Failure rates are computed per assertion (within each test case) and compared between the variant and control.

    Most improved assertions

    Test case Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Single Checkbox | Vanilla JS | Dark Helper text is programmatically associated R 80% 10% -70.0pp 32 / 40 4 / 40
    Shopping Home Page | Vanilla JS | Dark Has a skip navigation link R 100% 30% -70.0pp 40 / 40 12 / 40
    Shopping Home Page | React | Dark Has a skip navigation link R 100% 32% -67.5pp 40 / 40 13 / 40
    Shopping Home Page | Vanilla JS | Modern Has a skip navigation link R 100% 35% -65.0pp 40 / 40 14 / 40
    Checkbox Group | Vanilla JS | Dark Each checkbox group has an accessible label R 72% 10% -62.5pp 29 / 40 4 / 40
    Shopping Home Page | React | Modern Has a skip navigation link R 100% 38% -62.5pp 40 / 40 15 / 40
    Single Checkbox | Vanilla JS | Modern Helper text is programmatically associated R 74% 12% -61.9pp 29 / 39 5 / 40
    Checkbox Group | Vanilla JS | Dark Each checkbox group has a valid role R 70% 10% -60.0pp 28 / 40 4 / 40
    Disclosure Widget | React | Modern Collapsed content is hidden from everyone R 84% 29% -55.3pp 32 / 38 11 / 38
    Disclosure Widget | React | Dark Collapsed content is hidden from everyone R 77% 24% -53.1pp 24 / 31 9 / 37

    Most regressed assertions

    Test case Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Single Checkbox | Vanilla JS | Dark Required fields are indicated visually R 5% 15% +10.0pp 2 / 40 6 / 40
    Shopping Home Page | React | Dark Has a single footer R 0% 5% +5.0pp 0 / 39 2 / 40
    Single Checkbox | React | Dark Visible label is included in accessible name R 0% 5% +5.0pp 0 / 39 2 / 40
    Single Checkbox | Vanilla JS | Modern Visible label is included in accessible name R 2% 8% +5.0pp 1 / 40 3 / 40
    Single Checkbox | React | Modern Visible label is included in accessible name R 0% 3% +2.6pp 0 / 38 1 / 39
    Modal Dialog | Vanilla JS | Dark Each modal dialog takes focus when opened R 30% 32% +2.5pp 12 / 40 13 / 40
    Checkbox Group | React | Dark ARIA attributes match native checkbox attributes if used R 0% 2% +2.5pp 0 / 40 1 / 40
    Checkbox Group | React | Dark Each checkbox has a valid role R 0% 2% +2.5pp 0 / 40 1 / 40
    Checkbox Group | React | Modern Visual labels are defined and persistent R 0% 2% +2.5pp 0 / 40 1 / 40
    Checkbox Group | Vanilla JS | Dark Visual labels are defined and persistent R 0% 2% +2.5pp 0 / 40 1 / 40
    All assertion deltas (per test case)

    Checkbox Group | React | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Helper text is programmatically associated R 90% 44% -46.4pp 36 / 40 17 / 39
    Each checkbox group has an accessible label R 70% 20% -50.0pp 28 / 40 8 / 40
    Each checkbox group has a valid role R 70% 18% -52.5pp 28 / 40 7 / 40
    ARIA attributes match native checkbox attributes if used R 0% 2% +2.5pp 0 / 40 1 / 40
    Each checkbox has a valid role R 0% 2% +2.5pp 0 / 40 1 / 40
    Checked state is programmatically exposed R 2% 2% +0.0pp 1 / 40 1 / 40
    Each checkbox has an accessible name R 5% 2% -2.5pp 2 / 40 1 / 40
    Each checkbox is in the tab order R 5% 2% -2.5pp 2 / 40 1 / 40
    Space toggles checkbox state of each checkbox R 10% 2% -7.5pp 4 / 40 1 / 40
    Visual labels are defined and persistent R 2% 0% -2.5pp 1 / 40 0 / 40
    Visible label is included in accessible name R 8% 0% -7.7pp 3 / 39 0 / 40
    Required fields are indicated programmatically R - 0% - 0 / 0 0 / 1
    Required fields are indicated visually R - 0% - 0 / 0 0 / 1

    Checkbox Group | React | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Helper text is programmatically associated R 98% 53% -44.9pp 39 / 40 20 / 38
    Each checkbox group has an accessible label R 70% 18% -52.5pp 28 / 40 7 / 40
    Each checkbox group has a valid role R 68% 15% -52.5pp 27 / 40 6 / 40
    Visible label is included in accessible name R 5% 3% -2.4pp 2 / 40 1 / 39
    Visual labels are defined and persistent R 0% 2% +2.5pp 0 / 40 1 / 40
    ARIA attributes match native checkbox attributes if used R 0% 0% +0.0pp 0 / 40 0 / 40
    Checked state is programmatically exposed R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox has a valid role R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox is in the tab order R 2% 0% -2.5pp 1 / 40 0 / 40
    Space toggles checkbox state of each checkbox R 2% 0% -2.5pp 1 / 40 0 / 40
    Each checkbox has an accessible name R 5% 0% -5.0pp 2 / 40 0 / 40
    Required fields are indicated programmatically R - - - 0 / 0 0 / 0
    Required fields are indicated visually R - - - 0 / 0 0 / 0

    Checkbox Group | Vanilla JS | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Helper text is programmatically associated R 100% 60% -40.0pp 40 / 40 24 / 40
    Each checkbox group has a valid role R 70% 10% -60.0pp 28 / 40 4 / 40
    Each checkbox group has an accessible label R 72% 10% -62.5pp 29 / 40 4 / 40
    Visual labels are defined and persistent R 0% 2% +2.5pp 0 / 40 1 / 40
    Each checkbox has an accessible name R 2% 2% +0.0pp 1 / 40 1 / 40
    Each checkbox is in the tab order R 2% 2% +0.0pp 1 / 40 1 / 40
    Space toggles checkbox state of each checkbox R 2% 2% +0.0pp 1 / 40 1 / 40
    Visible label is included in accessible name R 2% 2% +0.0pp 1 / 40 1 / 40
    ARIA attributes match native checkbox attributes if used R 0% 0% +0.0pp 0 / 40 0 / 40
    Checked state is programmatically exposed R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox has a valid role R 0% 0% +0.0pp 0 / 40 0 / 40
    Required fields are indicated programmatically R - - - 0 / 0 0 / 0
    Required fields are indicated visually R - - - 0 / 0 0 / 0

    Checkbox Group | Vanilla JS | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Helper text is programmatically associated R 90% 57% -32.5pp 36 / 40 23 / 40
    Each checkbox group has an accessible label R 52% 25% -27.5pp 21 / 40 10 / 40
    Each checkbox group has a valid role R 50% 22% -27.5pp 20 / 40 9 / 40
    Each checkbox has an accessible name R 5% 2% -2.5pp 2 / 40 1 / 40
    Each checkbox is in the tab order R 5% 2% -2.5pp 2 / 40 1 / 40
    Space toggles checkbox state of each checkbox R 5% 2% -2.5pp 2 / 40 1 / 40
    Visible label is included in accessible name R 8% 2% -5.0pp 3 / 40 1 / 40
    ARIA attributes match native checkbox attributes if used R 0% 0% +0.0pp 0 / 40 0 / 40
    Checked state is programmatically exposed R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox has a valid role R 0% 0% +0.0pp 0 / 40 0 / 40
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40
    Required fields are indicated programmatically R - - - 0 / 0 0 / 0
    Required fields are indicated visually R - - - 0 / 0 0 / 0

    Disclosure Widget | React | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Collapsed content is hidden from everyone R 77% 24% -53.1pp 24 / 31 9 / 37
    All examples have a valid semantics R 18% 0% -17.5pp 7 / 40 0 / 40

    Disclosure Widget | React | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Collapsed content is hidden from everyone R 84% 29% -55.3pp 32 / 38 11 / 38
    All examples have a valid semantics R 2% 2% +0.0pp 1 / 40 1 / 40

    Disclosure Widget | Vanilla JS | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Collapsed content is hidden from everyone R 44% 22% -21.9pp 16 / 36 9 / 40
    All examples have a valid semantics R 10% 0% -10.0pp 4 / 40 0 / 40

    Disclosure Widget | Vanilla JS | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Collapsed content is hidden from everyone R 57% 25% -32.1pp 20 / 35 10 / 40
    All examples have a valid semantics R 12% 0% -12.5pp 5 / 40 0 / 40

    Modal Dialog | React | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Each modal dialog hides content behind it while open R 92% 72% -20.0pp 37 / 40 29 / 40
    Each modal dialog takes focus when opened R 65% 15% -50.0pp 26 / 40 6 / 40
    Focus is not lost when each dialog closes R 60% 12% -47.5pp 24 / 40 5 / 40
    Each modal dialog traps keyboard focus R 52% 8% -45.0pp 21 / 40 3 / 40
    Each dialog has a dialog role R 35% 5% -30.0pp 14 / 40 2 / 40
    Each dialog can be closed by escape key BP 40% 5% -35.0pp 16 / 40 2 / 40
    Closed dialogs are not exposed to assistive technology R 5% 2% -2.5pp 2 / 40 1 / 40

    Modal Dialog | React | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Each modal dialog hides content behind it while open R 85% 78% -7.5pp 34 / 40 31 / 40
    Each modal dialog takes focus when opened R 48% 22% -25.0pp 19 / 40 9 / 40
    Focus is not lost when each dialog closes R 42% 20% -22.5pp 17 / 40 8 / 40
    Closed dialogs are not exposed to assistive technology R 8% 8% +0.0pp 3 / 40 3 / 40
    Each dialog has a dialog role R 20% 8% -12.5pp 8 / 40 3 / 40
    Each dialog can be closed by escape key BP 25% 8% -17.5pp 10 / 40 3 / 40
    Each modal dialog traps keyboard focus R 32% 8% -25.0pp 13 / 40 3 / 40

    Modal Dialog | Vanilla JS | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Each modal dialog hides content behind it while open R 80% 60% -20.0pp 32 / 40 24 / 40
    Each modal dialog takes focus when opened R 30% 32% +2.5pp 12 / 40 13 / 40
    Each modal dialog traps keyboard focus R 18% 8% -10.0pp 7 / 40 3 / 40
    Each dialog can be closed by escape key BP 22% 5% -17.5pp 9 / 40 2 / 40
    Closed dialogs are not exposed to assistive technology R 2% 2% +0.0pp 1 / 40 1 / 40
    Each dialog has a dialog role R 22% 2% -20.0pp 9 / 40 1 / 40
    Focus is not lost when each dialog closes R 22% 2% -20.0pp 9 / 40 1 / 40

    Modal Dialog | Vanilla JS | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Each modal dialog hides content behind it while open R 68% 52% -15.0pp 27 / 40 21 / 40
    Each modal dialog takes focus when opened R 35% 30% -5.0pp 14 / 40 12 / 40
    Focus is not lost when each dialog closes R 20% 20% +0.0pp 8 / 40 8 / 40
    Each dialog can be closed by escape key BP 15% 15% +0.0pp 6 / 40 6 / 40
    Each dialog has a dialog role R 15% 15% +0.0pp 6 / 40 6 / 40
    Each modal dialog traps keyboard focus R 18% 10% -7.5pp 7 / 40 4 / 40
    Closed dialogs are not exposed to assistive technology R 2% 5% +2.5pp 1 / 40 2 / 40

    Radio Button Group | React | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Helper text is programmatically associated R 100% 100% +0.0pp 10 / 10 4 / 4
    Required fields are indicated visually R 100% 100% +0.0pp 20 / 20 30 / 30
    Each radio group has an accessible label R 45% 12% -32.5pp 18 / 40 5 / 40
    ARIA attributes match native radio attributes if used R 0% 0% +0.0pp 0 / 40 0 / 40
    Checked state is programmatically exposed R 0% 0% +0.0pp 0 / 40 0 / 40
    Required fields are indicated programmatically R 0% 0% +0.0pp 0 / 20 0 / 30
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40
    Arrow keys change the selected radio within each group R 5% 0% -5.0pp 2 / 40 0 / 40
    Each radio group is keyboard reachable R 5% 0% -5.0pp 2 / 40 0 / 40
    Each radio has an accessible name R 5% 0% -5.0pp 2 / 40 0 / 40
    Visible label is included in accessible name R 5% 0% -5.0pp 2 / 40 0 / 40

    Radio Button Group | React | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Required fields are indicated visually R 100% 100% +0.0pp 21 / 21 25 / 25
    Helper text is programmatically associated R 100% 89% -11.1pp 18 / 18 8 / 9
    Each radio group has an accessible label R 42% 5% -37.5pp 17 / 40 2 / 40
    ARIA attributes match native radio attributes if used R 0% 0% +0.0pp 0 / 40 0 / 40
    Checked state is programmatically exposed R 0% 0% +0.0pp 0 / 40 0 / 40
    Required fields are indicated programmatically R 0% 0% +0.0pp 0 / 21 0 / 25
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40
    Arrow keys change the selected radio within each group R 5% 0% -5.0pp 2 / 40 0 / 40
    Each radio group is keyboard reachable R 5% 0% -5.0pp 2 / 40 0 / 40
    Each radio has an accessible name R 5% 0% -5.0pp 2 / 40 0 / 40
    Visible label is included in accessible name R 5% 0% -5.0pp 2 / 40 0 / 40

    Radio Button Group | Vanilla JS | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Helper text is programmatically associated R 100% 100% +0.0pp 18 / 18 1 / 1
    Required fields are indicated visually R 100% 97% -2.8pp 33 / 33 35 / 36
    Each radio group has an accessible label R 50% 8% -42.5pp 20 / 40 3 / 40
    ARIA attributes match native radio attributes if used R 0% 0% +0.0pp 0 / 40 0 / 40
    Checked state is programmatically exposed R 0% 0% +0.0pp 0 / 40 0 / 40
    Required fields are indicated programmatically R 0% 0% +0.0pp 0 / 33 0 / 36
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40
    Arrow keys change the selected radio within each group R 2% 0% -2.5pp 1 / 40 0 / 40
    Each radio group is keyboard reachable R 2% 0% -2.5pp 1 / 40 0 / 40
    Each radio has an accessible name R 2% 0% -2.5pp 1 / 40 0 / 40
    Visible label is included in accessible name R 2% 0% -2.5pp 1 / 40 0 / 40

    Radio Button Group | Vanilla JS | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Helper text is programmatically associated R 100% 100% +0.0pp 14 / 14 7 / 7
    Required fields are indicated visually R 100% 90% -9.7pp 35 / 35 28 / 31
    Each radio group has an accessible label R 25% 5% -20.0pp 10 / 40 2 / 40
    Arrow keys change the selected radio within each group R 2% 2% +0.0pp 1 / 40 1 / 40
    Each radio group is keyboard reachable R 2% 2% +0.0pp 1 / 40 1 / 40
    Each radio has an accessible name R 2% 2% +0.0pp 1 / 40 1 / 40
    Visible label is included in accessible name R 2% 2% +0.0pp 1 / 40 1 / 40
    ARIA attributes match native radio attributes if used R 0% 0% +0.0pp 0 / 40 0 / 40
    Checked state is programmatically exposed R 0% 0% +0.0pp 0 / 40 0 / 40
    Required fields are indicated programmatically R 0% 0% +0.0pp 0 / 35 0 / 31
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40

    Shopping Home Page | React | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Has a skip navigation link R 100% 32% -67.5pp 40 / 40 13 / 40
    Has a single banner R 25% 15% -10.0pp 10 / 40 6 / 40
    Has a single maincontent R 52% 12% -40.0pp 21 / 40 5 / 40
    Has a single footer R 0% 5% +5.0pp 0 / 39 2 / 40
    Has an h1 R 2% 0% -2.5pp 1 / 40 0 / 40
    Has at least one h2 R 2% 0% -2.5pp 1 / 40 0 / 40
    Has at least one navigation R 2% 0% -2.5pp 1 / 40 0 / 40
    Has single h1 BP 5% 0% -5.0pp 2 / 40 0 / 40

    Shopping Home Page | React | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Has a skip navigation link R 100% 38% -62.5pp 40 / 40 15 / 40
    Has a single banner R 22% 18% -5.0pp 9 / 40 7 / 40
    Has a single maincontent R 40% 15% -25.0pp 16 / 40 6 / 40
    Has a single footer R 10% 5% -5.0pp 4 / 40 2 / 40
    Has an h1 R 2% 0% -2.5pp 1 / 40 0 / 40
    Has at least one h2 R 2% 0% -2.5pp 1 / 40 0 / 40
    Has at least one navigation R 2% 0% -2.5pp 1 / 40 0 / 40
    Has single h1 BP 5% 0% -5.0pp 2 / 40 0 / 40

    Shopping Home Page | Vanilla JS | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Has a skip navigation link R 100% 30% -70.0pp 40 / 40 12 / 40
    Has a single maincontent R 32% 12% -20.0pp 13 / 40 5 / 40
    Has a single banner R 10% 10% +0.0pp 4 / 40 4 / 40
    Has an h1 R 0% 0% +0.0pp 0 / 40 0 / 40
    Has at least one h2 R 0% 0% +0.0pp 0 / 40 0 / 40
    Has single h1 BP 0% 0% +0.0pp 0 / 40 0 / 40
    Has at least one navigation R 2% 0% -2.5pp 1 / 40 0 / 40
    Has a single footer R 3% 0% -2.6pp 1 / 38 0 / 37

    Shopping Home Page | Vanilla JS | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Has a skip navigation link R 100% 35% -65.0pp 40 / 40 14 / 40
    Has a single banner R 10% 12% +2.5pp 4 / 40 5 / 40
    Has a single maincontent R 35% 12% -22.5pp 14 / 40 5 / 40
    Has an h1 R 0% 0% +0.0pp 0 / 40 0 / 40
    Has at least one h2 R 0% 0% +0.0pp 0 / 40 0 / 40
    Has single h1 BP 0% 0% +0.0pp 0 / 40 0 / 40
    Has at least one navigation R 2% 0% -2.5pp 1 / 40 0 / 40
    Has a single footer R 3% 0% -3.0pp 1 / 33 0 / 39

    Simple Contact Form | React | Dark | Error Message Present

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Inputs use appropriate autocomplete for purpose R 60% 45% -15.0pp 24 / 40 18 / 40
    Required fields are indicated visually R 16% 10% -6.0pp 6 / 37 4 / 39
    Each text input has an accessible name R 0% 0% +0.0pp 0 / 40 0 / 40
    Each text input has textbox role R 0% 0% +0.0pp 0 / 40 0 / 40
    Placeholder text is programmatically defined as a property R 0% 0% +0.0pp 0 / 38 0 / 33
    Text inputs are keyboard focusable R 0% 0% +0.0pp 0 / 40 0 / 40
    Visible label is included in accessible name R 0% 0% +0.0pp 0 / 40 0 / 40
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40
    Helper text is programmatically associated R 15% 0% -15.0pp 6 / 40 0 / 40
    Required fields are indicated programmatically R 30% 0% -29.7pp 11 / 37 0 / 39

    Simple Contact Form | React | Dark | No Error Message

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Inputs use appropriate autocomplete for purpose R 52% 48% -5.0pp 21 / 40 19 / 40
    Required fields are indicated visually R 36% 25% -10.9pp 14 / 39 10 / 40
    Required fields are indicated programmatically R 8% 2% -5.2pp 3 / 39 1 / 40
    Each text input has an accessible name R 0% 0% +0.0pp 0 / 40 0 / 40
    Each text input has textbox role R 0% 0% +0.0pp 0 / 40 0 / 40
    Placeholder text is programmatically defined as a property R 0% 0% +0.0pp 0 / 34 0 / 30
    Text inputs are keyboard focusable R 0% 0% +0.0pp 0 / 40 0 / 40
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40
    Visible label is included in accessible name R 2% 0% -2.5pp 1 / 40 0 / 40
    Helper text is programmatically associated R 18% 0% -17.5pp 7 / 40 0 / 40

    Simple Contact Form | React | Modern | Error Message Present

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Inputs use appropriate autocomplete for purpose R 52% 48% -5.0pp 21 / 40 19 / 40
    Required fields are indicated visually R 23% 15% -7.5pp 8 / 35 6 / 39
    Required fields are indicated programmatically R 23% 5% -17.7pp 8 / 35 2 / 39
    Helper text is programmatically associated R 0% 2% +2.5pp 0 / 40 1 / 40
    Visible label is included in accessible name R 0% 2% +2.5pp 0 / 40 1 / 40
    Visual labels are defined and persistent R 0% 2% +2.5pp 0 / 40 1 / 40
    Each text input has an accessible name R 0% 0% +0.0pp 0 / 40 0 / 40
    Each text input has textbox role R 0% 0% +0.0pp 0 / 40 0 / 40
    Placeholder text is programmatically defined as a property R 0% 0% +0.0pp 0 / 40 0 / 33
    Text inputs are keyboard focusable R 0% 0% +0.0pp 0 / 40 0 / 40

    Simple Contact Form | React | Modern | No Error Message

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Inputs use appropriate autocomplete for purpose R 55% 35% -20.0pp 22 / 40 14 / 40
    Required fields are indicated visually R 44% 32% -11.1pp 17 / 39 13 / 40
    Each text input has textbox role R 0% 2% +2.5pp 0 / 40 1 / 40
    Each text input has an accessible name R 0% 0% +0.0pp 0 / 40 0 / 40
    Placeholder text is programmatically defined as a property R 0% 0% +0.0pp 0 / 36 0 / 32
    Text inputs are keyboard focusable R 0% 0% +0.0pp 0 / 40 0 / 40
    Visible label is included in accessible name R 2% 0% -2.5pp 1 / 40 0 / 40
    Visual labels are defined and persistent R 2% 0% -2.5pp 1 / 40 0 / 40
    Required fields are indicated programmatically R 8% 0% -7.7pp 3 / 39 0 / 40
    Helper text is programmatically associated R 12% 0% -12.5pp 5 / 40 0 / 40

    Simple Contact Form | Vanilla JS | Dark | Error Message Present

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Inputs use appropriate autocomplete for purpose R 40% 35% -5.0pp 16 / 40 14 / 40
    Required fields are indicated visually R 46% 35% -11.2pp 18 / 39 14 / 40
    Required fields are indicated programmatically R 8% 8% -0.2pp 3 / 39 3 / 40
    Visible label is included in accessible name R 8% 2% -5.0pp 3 / 40 1 / 40
    Each text input has an accessible name R 0% 0% +0.0pp 0 / 40 0 / 40
    Each text input has textbox role R 0% 0% +0.0pp 0 / 40 0 / 40
    Placeholder text is programmatically defined as a property R 0% 0% +0.0pp 0 / 35 0 / 28
    Text inputs are keyboard focusable R 0% 0% +0.0pp 0 / 40 0 / 40
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40
    Helper text is programmatically associated R 12% 0% -12.5pp 5 / 40 0 / 40

    Simple Contact Form | Vanilla JS | Dark | No Error Message

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Required fields are indicated visually R 70% 35% -35.0pp 28 / 40 14 / 40
    Inputs use appropriate autocomplete for purpose R 40% 32% -7.5pp 16 / 40 13 / 40
    Visible label is included in accessible name R 8% 5% -2.5pp 3 / 40 2 / 40
    Helper text is programmatically associated R 15% 2% -12.5pp 6 / 40 1 / 40
    Each text input has an accessible name R 0% 0% +0.0pp 0 / 40 0 / 40
    Each text input has textbox role R 0% 0% +0.0pp 0 / 40 0 / 40
    Placeholder text is programmatically defined as a property R 0% 0% +0.0pp 0 / 32 0 / 23
    Text inputs are keyboard focusable R 0% 0% +0.0pp 0 / 40 0 / 40
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40
    Required fields are indicated programmatically R 5% 0% -5.0pp 2 / 40 0 / 40

    Simple Contact Form | Vanilla JS | Modern | Error Message Present

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Required fields are indicated visually R 45% 38% -7.5pp 18 / 40 15 / 40
    Inputs use appropriate autocomplete for purpose R 38% 35% -2.5pp 15 / 40 14 / 40
    Each text input has an accessible name R 0% 0% +0.0pp 0 / 40 0 / 40
    Each text input has textbox role R 0% 0% +0.0pp 0 / 40 0 / 40
    Placeholder text is programmatically defined as a property R 0% 0% +0.0pp 0 / 36 0 / 28
    Text inputs are keyboard focusable R 0% 0% +0.0pp 0 / 40 0 / 40
    Required fields are indicated programmatically R 2% 0% -2.5pp 1 / 40 0 / 40
    Visual labels are defined and persistent R 2% 0% -2.5pp 1 / 40 0 / 40
    Helper text is programmatically associated R 8% 0% -7.5pp 3 / 40 0 / 40
    Visible label is included in accessible name R 12% 0% -12.5pp 5 / 40 0 / 40

    Simple Contact Form | Vanilla JS | Modern | No Error Message

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Required fields are indicated visually R 72% 48% -25.0pp 29 / 40 19 / 40
    Inputs use appropriate autocomplete for purpose R 38% 22% -15.0pp 15 / 40 9 / 40
    Helper text is programmatically associated R 15% 2% -12.5pp 6 / 40 1 / 40
    Each text input has an accessible name R 0% 0% +0.0pp 0 / 40 0 / 40
    Each text input has textbox role R 0% 0% +0.0pp 0 / 40 0 / 40
    Placeholder text is programmatically defined as a property R 0% 0% +0.0pp 0 / 32 0 / 22
    Text inputs are keyboard focusable R 0% 0% +0.0pp 0 / 40 0 / 40
    Required fields are indicated programmatically R 2% 0% -2.5pp 1 / 40 0 / 40
    Visual labels are defined and persistent R 5% 0% -5.0pp 2 / 40 0 / 40
    Visible label is included in accessible name R 12% 0% -12.5pp 5 / 40 0 / 40

    Single Checkbox | React | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Helper text is programmatically associated R 74% 25% -49.4pp 29 / 39 10 / 40
    Required fields are indicated visually R 15% 12% -2.5pp 6 / 40 5 / 40
    Visible label is included in accessible name R 0% 5% +5.0pp 0 / 39 2 / 40
    Required fields are indicated programmatically R 0% 2% +2.5pp 0 / 40 1 / 40
    ARIA attributes match native checkbox attributes if used R 2% 2% +0.0pp 1 / 40 1 / 40
    Checked state is programmatically exposed R 2% 0% -2.5pp 1 / 40 0 / 40
    Each checkbox has a valid role R 2% 0% -2.5pp 1 / 40 0 / 40
    Each checkbox has an accessible name R 2% 0% -2.5pp 1 / 40 0 / 40
    Each checkbox is keyboard reachable R 2% 0% -2.5pp 1 / 40 0 / 40
    Space toggles checkbox state R 2% 0% -2.5pp 1 / 40 0 / 40
    Visual labels are defined and persistent R 2% 0% -2.5pp 1 / 40 0 / 40

    Single Checkbox | React | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Required fields are indicated visually R 18% 15% -2.5pp 7 / 40 6 / 40
    Helper text is programmatically associated R 62% 12% -50.0pp 25 / 40 5 / 40
    Visible label is included in accessible name R 0% 3% +2.6pp 0 / 38 1 / 39
    ARIA attributes match native checkbox attributes if used R 0% 2% +2.5pp 0 / 40 1 / 40
    Required fields are indicated programmatically R 0% 2% +2.5pp 0 / 40 1 / 40
    Visual labels are defined and persistent R 5% 2% -2.5pp 2 / 40 1 / 40
    Checked state is programmatically exposed R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox has a valid role R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox is keyboard reachable R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox has an accessible name R 2% 0% -2.5pp 1 / 40 0 / 40
    Space toggles checkbox state R 2% 0% -2.5pp 1 / 40 0 / 40

    Single Checkbox | Vanilla JS | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Required fields are indicated visually R 5% 15% +10.0pp 2 / 40 6 / 40
    Helper text is programmatically associated R 80% 10% -70.0pp 32 / 40 4 / 40
    ARIA attributes match native checkbox attributes if used R 0% 0% +0.0pp 0 / 40 0 / 40
    Checked state is programmatically exposed R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox has a valid role R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox has an accessible name R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox is keyboard reachable R 0% 0% +0.0pp 0 / 40 0 / 40
    Required fields are indicated programmatically R 0% 0% +0.0pp 0 / 40 0 / 40
    Space toggles checkbox state R 0% 0% +0.0pp 0 / 40 0 / 40
    Visible label is included in accessible name R 0% 0% +0.0pp 0 / 40 0 / 40
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40

    Single Checkbox | Vanilla JS | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Helper text is programmatically associated R 74% 12% -61.9pp 29 / 39 5 / 40
    Required fields are indicated visually R 20% 10% -10.0pp 8 / 40 4 / 40
    Visible label is included in accessible name R 2% 8% +5.0pp 1 / 40 3 / 40
    Each checkbox has an accessible name R 2% 2% +0.0pp 1 / 40 1 / 40
    Each checkbox is keyboard reachable R 2% 2% +0.0pp 1 / 40 1 / 40
    Space toggles checkbox state R 2% 2% +0.0pp 1 / 40 1 / 40
    ARIA attributes match native checkbox attributes if used R 0% 0% +0.0pp 0 / 40 0 / 40
    Checked state is programmatically exposed R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox has a valid role R 0% 0% +0.0pp 0 / 40 0 / 40
    Required fields are indicated programmatically R 0% 0% +0.0pp 0 / 40 0 / 40
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40
    1. Basic — overall Δ pass rate +48.5pp

    Overall: Control 12% (n=1280) → Variant 60% (n=1280). Avg WCAG failures/sample: 3.93 → 1.63 (Δ -2.30).

    Most improved test cases

    Test case Control pass rate* Variant pass rate* Δ pass rate* Δ avg WCAG failures
    Single Checkbox | Vanilla JS | Modern 10% 80% +70.0pp -1.32
    Single Checkbox | React | Modern 8% 78% +70.0pp -1.03
    Single Checkbox | Vanilla JS | Dark 5% 75% +70.0pp -0.88
    Checkbox Group | Vanilla JS | Dark 0% 68% +67.5pp -2.85
    Checkbox Group | React | Modern 0% 62% +62.5pp -2.20

    Most reduced axe WCAG rules

    Rule Control rate Variant rate Δ rate Description
    color-contrast 41.6% 24.0% -17.6pp Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
    button-name 1.1% 0.0% -1.1pp Ensure buttons have discernible text
    aria-required-children 1.1% 0.2% -0.9pp Ensure elements with an ARIA role that require child roles contain them
    link-name 0.6% 0.0% -0.6pp Ensure links have discernible text
    link-in-text-block 0.6% 0.2% -0.4pp Ensure links are distinguished from surrounding text in a way that does not rely on color

    Most increased axe WCAG rules

    Rule Control rate Variant rate Δ rate Description
    aria-prohibited-attr 0.2% 1.0% +0.8pp Ensure ARIA attributes are not prohibited for an element's role
    aria-allowed-attr 0.4% 1.0% +0.6pp Ensure an element's role supports its ARIA attributes
    aria-hidden-focus 0.0% 0.2% +0.2pp Ensure aria-hidden elements are not focusable nor contain focusable elements
    definition-list 0.0% 0.1% +0.1pp Ensure <dl> elements are structured correctly
    nested-interactive 0.1% 0.2% +0.1pp Ensure interactive controls are not nested as they are not always announced by screen readers or can cause focus problems for assistive technologies

    Assertion analysis (vs control)

    Failure rates are computed per assertion (within each test case) and compared between the variant and control.

    Most improved assertions

    Test case Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Radio Button Group | Vanilla JS | Dark Helper text is programmatically associated R 100% 0% -100.0pp 18 / 18 0 / 9
    Shopping Home Page | Vanilla JS | Dark Has a skip navigation link R 100% 0% -100.0pp 40 / 40 0 / 40
    Shopping Home Page | Vanilla JS | Modern Has a skip navigation link R 100% 5% -95.0pp 40 / 40 2 / 40
    Shopping Home Page | React | Dark Has a skip navigation link R 100% 8% -92.5pp 40 / 40 3 / 40
    Shopping Home Page | React | Modern Has a skip navigation link R 100% 8% -92.5pp 40 / 40 3 / 40
    Radio Button Group | React | Dark Helper text is programmatically associated R 100% 16% -84.2pp 10 / 10 3 / 19
    Single Checkbox | Vanilla JS | Dark Helper text is programmatically associated R 80% 2% -77.5pp 32 / 40 1 / 40
    Checkbox Group | Vanilla JS | Dark Helper text is programmatically associated R 100% 23% -76.9pp 40 / 40 9 / 39
    Disclosure Widget | React | Modern Collapsed content is hidden from everyone R 84% 8% -75.9pp 32 / 38 3 / 36
    Checkbox Group | React | Dark Helper text is programmatically associated R 90% 15% -74.6pp 36 / 40 6 / 39

    Most regressed assertions

    Test case Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Simple Contact Form | Vanilla JS | Dark | No Error Message Required fields are indicated programmatically R 5% 15% +10.0pp 2 / 40 6 / 40
    Single Checkbox | React | Modern Visible label is included in accessible name R 0% 8% +7.5pp 0 / 38 3 / 40
    Single Checkbox | Vanilla JS | Dark Visible label is included in accessible name R 0% 5% +5.0pp 0 / 40 2 / 40
    Radio Button Group | React | Modern Required fields are indicated programmatically R 0% 3% +3.1pp 0 / 21 1 / 32
    Single Checkbox | Vanilla JS | Modern Visible label is included in accessible name R 2% 5% +2.6pp 1 / 40 2 / 39
    Checkbox Group | React | Modern ARIA attributes match native checkbox attributes if used R 0% 2% +2.5pp 0 / 40 1 / 40
    Radio Button Group | React | Dark ARIA attributes match native radio attributes if used R 0% 2% +2.5pp 0 / 40 1 / 40
    Radio Button Group | React | Dark Checked state is programmatically exposed R 0% 2% +2.5pp 0 / 40 1 / 40
    Simple Contact Form | React | Dark | Error Message Present Visible label is included in accessible name R 0% 2% +2.5pp 0 / 40 1 / 40
    Simple Contact Form | Vanilla JS | Modern | Error Message Present Required fields are indicated programmatically R 2% 5% +2.5pp 1 / 40 2 / 40
    All assertion deltas (per test case)

    Checkbox Group | React | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Required fields are indicated programmatically R - 100% - 0 / 0 1 / 1
    Helper text is programmatically associated R 90% 15% -74.6pp 36 / 40 6 / 39
    Each checkbox group has a valid role R 70% 2% -67.5pp 28 / 40 1 / 40
    Each checkbox group has an accessible label R 70% 2% -67.5pp 28 / 40 1 / 40
    ARIA attributes match native checkbox attributes if used R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox has a valid role R 0% 0% +0.0pp 0 / 40 0 / 40
    Checked state is programmatically exposed R 2% 0% -2.5pp 1 / 40 0 / 40
    Visual labels are defined and persistent R 2% 0% -2.5pp 1 / 40 0 / 40
    Each checkbox has an accessible name R 5% 0% -5.0pp 2 / 40 0 / 40
    Each checkbox is in the tab order R 5% 0% -5.0pp 2 / 40 0 / 40
    Visible label is included in accessible name R 8% 0% -7.7pp 3 / 39 0 / 40
    Space toggles checkbox state of each checkbox R 10% 0% -10.0pp 4 / 40 0 / 40
    Required fields are indicated visually R - 0% - 0 / 0 0 / 1

    Checkbox Group | React | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Required fields are indicated programmatically R - 100% - 0 / 0 1 / 1
    Helper text is programmatically associated R 98% 25% -72.5pp 39 / 40 10 / 40
    Each checkbox group has an accessible label R 70% 12% -57.5pp 28 / 40 5 / 40
    Each checkbox group has a valid role R 68% 8% -60.0pp 27 / 40 3 / 40
    ARIA attributes match native checkbox attributes if used R 0% 2% +2.5pp 0 / 40 1 / 40
    Each checkbox is in the tab order R 2% 2% +0.0pp 1 / 40 1 / 40
    Space toggles checkbox state of each checkbox R 2% 2% +0.0pp 1 / 40 1 / 40
    Each checkbox has an accessible name R 5% 2% -2.5pp 2 / 40 1 / 40
    Visible label is included in accessible name R 5% 2% -2.5pp 2 / 40 1 / 40
    Checked state is programmatically exposed R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox has a valid role R 0% 0% +0.0pp 0 / 40 0 / 40
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40
    Required fields are indicated visually R - 0% - 0 / 0 0 / 1

    Checkbox Group | Vanilla JS | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Helper text is programmatically associated R 100% 23% -76.9pp 40 / 40 9 / 39
    Each checkbox group has an accessible label R 72% 2% -70.0pp 29 / 40 1 / 40
    ARIA attributes match native checkbox attributes if used R 0% 0% +0.0pp 0 / 40 0 / 40
    Checked state is programmatically exposed R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox has a valid role R 0% 0% +0.0pp 0 / 40 0 / 40
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox has an accessible name R 2% 0% -2.5pp 1 / 40 0 / 40
    Each checkbox is in the tab order R 2% 0% -2.5pp 1 / 40 0 / 40
    Space toggles checkbox state of each checkbox R 2% 0% -2.5pp 1 / 40 0 / 40
    Visible label is included in accessible name R 2% 0% -2.5pp 1 / 40 0 / 40
    Each checkbox group has a valid role R 70% 0% -70.0pp 28 / 40 0 / 40
    Required fields are indicated programmatically R - 0% - 0 / 0 0 / 1
    Required fields are indicated visually R - 0% - 0 / 0 0 / 1

    Checkbox Group | Vanilla JS | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Helper text is programmatically associated R 90% 35% -55.0pp 36 / 40 14 / 40
    Each checkbox group has an accessible label R 52% 5% -47.5pp 21 / 40 2 / 40
    ARIA attributes match native checkbox attributes if used R 0% 0% +0.0pp 0 / 40 0 / 40
    Checked state is programmatically exposed R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox has a valid role R 0% 0% +0.0pp 0 / 40 0 / 40
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox has an accessible name R 5% 0% -5.0pp 2 / 40 0 / 40
    Each checkbox is in the tab order R 5% 0% -5.0pp 2 / 40 0 / 40
    Space toggles checkbox state of each checkbox R 5% 0% -5.0pp 2 / 40 0 / 40
    Visible label is included in accessible name R 8% 0% -7.5pp 3 / 40 0 / 40
    Each checkbox group has a valid role R 50% 0% -50.0pp 20 / 40 0 / 40
    Required fields are indicated programmatically R - - - 0 / 0 0 / 0
    Required fields are indicated visually R - - - 0 / 0 0 / 0

    Disclosure Widget | React | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Collapsed content is hidden from everyone R 77% 20% -57.4pp 24 / 31 8 / 40
    All examples have a valid semantics R 18% 0% -17.5pp 7 / 40 0 / 40

    Disclosure Widget | React | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Collapsed content is hidden from everyone R 84% 8% -75.9pp 32 / 38 3 / 36
    All examples have a valid semantics R 2% 0% -2.5pp 1 / 40 0 / 40

    Disclosure Widget | Vanilla JS | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Collapsed content is hidden from everyone R 44% 11% -33.9pp 16 / 36 4 / 38
    All examples have a valid semantics R 10% 0% -10.0pp 4 / 40 0 / 40

    Disclosure Widget | Vanilla JS | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Collapsed content is hidden from everyone R 57% 2% -54.6pp 20 / 35 1 / 40
    All examples have a valid semantics R 12% 0% -12.5pp 5 / 40 0 / 40

    Modal Dialog | React | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Each modal dialog hides content behind it while open R 92% 40% -52.5pp 37 / 40 16 / 40
    Each dialog can be closed by escape key BP 40% 8% -32.5pp 16 / 40 3 / 40
    Focus is not lost when each dialog closes R 60% 8% -52.5pp 24 / 40 3 / 40
    Each modal dialog takes focus when opened R 65% 5% -60.0pp 26 / 40 2 / 40
    Each dialog has a dialog role R 35% 2% -32.5pp 14 / 40 1 / 40
    Each modal dialog traps keyboard focus R 52% 2% -50.0pp 21 / 40 1 / 40
    Closed dialogs are not exposed to assistive technology R 5% 0% -5.0pp 2 / 40 0 / 40

    Modal Dialog | React | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Each modal dialog hides content behind it while open R 85% 35% -50.0pp 34 / 40 14 / 40
    Each dialog can be closed by escape key BP 25% 5% -20.0pp 10 / 40 2 / 40
    Focus is not lost when each dialog closes R 42% 5% -37.5pp 17 / 40 2 / 40
    Each modal dialog takes focus when opened R 48% 5% -42.5pp 19 / 40 2 / 40
    Each dialog has a dialog role R 20% 2% -17.5pp 8 / 40 1 / 40
    Each modal dialog traps keyboard focus R 32% 2% -30.0pp 13 / 40 1 / 40
    Closed dialogs are not exposed to assistive technology R 8% 0% -7.5pp 3 / 40 0 / 40

    Modal Dialog | Vanilla JS | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Each modal dialog hides content behind it while open R 80% 10% -70.0pp 32 / 40 4 / 40
    Each modal dialog takes focus when opened R 30% 5% -25.0pp 12 / 40 2 / 40
    Closed dialogs are not exposed to assistive technology R 2% 0% -2.5pp 1 / 40 0 / 40
    Each modal dialog traps keyboard focus R 18% 0% -17.5pp 7 / 40 0 / 40
    Each dialog can be closed by escape key BP 22% 0% -22.5pp 9 / 40 0 / 40
    Each dialog has a dialog role R 22% 0% -22.5pp 9 / 40 0 / 40
    Focus is not lost when each dialog closes R 22% 0% -22.5pp 9 / 40 0 / 40

    Modal Dialog | Vanilla JS | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Each modal dialog hides content behind it while open R 68% 15% -52.5pp 27 / 40 6 / 40
    Each dialog can be closed by escape key BP 15% 5% -10.0pp 6 / 40 2 / 40
    Each dialog has a dialog role R 15% 5% -10.0pp 6 / 40 2 / 40
    Each modal dialog traps keyboard focus R 18% 5% -12.5pp 7 / 40 2 / 40
    Focus is not lost when each dialog closes R 20% 5% -15.0pp 8 / 40 2 / 40
    Each modal dialog takes focus when opened R 35% 5% -30.0pp 14 / 40 2 / 40
    Closed dialogs are not exposed to assistive technology R 2% 2% +0.0pp 1 / 40 1 / 40

    Radio Button Group | React | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Required fields are indicated visually R 100% 58% -42.4pp 20 / 20 19 / 33
    Helper text is programmatically associated R 100% 16% -84.2pp 10 / 10 3 / 19
    Each radio group has an accessible label R 45% 8% -37.5pp 18 / 40 3 / 40
    ARIA attributes match native radio attributes if used R 0% 2% +2.5pp 0 / 40 1 / 40
    Checked state is programmatically exposed R 0% 2% +2.5pp 0 / 40 1 / 40
    Arrow keys change the selected radio within each group R 5% 2% -2.5pp 2 / 40 1 / 40
    Each radio group is keyboard reachable R 5% 2% -2.5pp 2 / 40 1 / 40
    Each radio has an accessible name R 5% 2% -2.5pp 2 / 40 1 / 40
    Required fields are indicated programmatically R 0% 0% +0.0pp 0 / 20 0 / 32
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40
    Visible label is included in accessible name R 5% 0% -5.0pp 2 / 40 0 / 40

    Radio Button Group | React | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Required fields are indicated visually R 100% 52% -48.5pp 21 / 21 17 / 33
    Helper text is programmatically associated R 100% 26% -73.9pp 18 / 18 6 / 23
    Required fields are indicated programmatically R 0% 3% +3.1pp 0 / 21 1 / 32
    Each radio group has an accessible label R 42% 2% -40.0pp 17 / 40 1 / 40
    ARIA attributes match native radio attributes if used R 0% 0% +0.0pp 0 / 40 0 / 40
    Checked state is programmatically exposed R 0% 0% +0.0pp 0 / 40 0 / 40
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40
    Arrow keys change the selected radio within each group R 5% 0% -5.0pp 2 / 40 0 / 40
    Each radio group is keyboard reachable R 5% 0% -5.0pp 2 / 40 0 / 40
    Each radio has an accessible name R 5% 0% -5.0pp 2 / 40 0 / 40
    Visible label is included in accessible name R 5% 0% -5.0pp 2 / 40 0 / 40

    Radio Button Group | Vanilla JS | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Required fields are indicated visually R 100% 56% -44.4pp 33 / 33 20 / 36
    Each radio group has an accessible label R 50% 10% -40.0pp 20 / 40 4 / 40
    Arrow keys change the selected radio within each group R 2% 2% +0.0pp 1 / 40 1 / 40
    ARIA attributes match native radio attributes if used R 0% 0% +0.0pp 0 / 40 0 / 40
    Checked state is programmatically exposed R 0% 0% +0.0pp 0 / 40 0 / 40
    Required fields are indicated programmatically R 0% 0% +0.0pp 0 / 33 0 / 36
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40
    Each radio group is keyboard reachable R 2% 0% -2.5pp 1 / 40 0 / 40
    Each radio has an accessible name R 2% 0% -2.5pp 1 / 40 0 / 40
    Visible label is included in accessible name R 2% 0% -2.5pp 1 / 40 0 / 40
    Helper text is programmatically associated R 100% 0% -100.0pp 18 / 18 0 / 9

    Radio Button Group | Vanilla JS | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Required fields are indicated visually R 100% 65% -35.1pp 35 / 35 24 / 37
    Helper text is programmatically associated R 100% 27% -72.7pp 14 / 14 3 / 11
    Each radio group has an accessible label R 25% 10% -15.0pp 10 / 40 4 / 40
    ARIA attributes match native radio attributes if used R 0% 0% +0.0pp 0 / 40 0 / 40
    Checked state is programmatically exposed R 0% 0% +0.0pp 0 / 40 0 / 40
    Required fields are indicated programmatically R 0% 0% +0.0pp 0 / 35 0 / 37
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40
    Arrow keys change the selected radio within each group R 2% 0% -2.5pp 1 / 40 0 / 40
    Each radio group is keyboard reachable R 2% 0% -2.5pp 1 / 40 0 / 40
    Each radio has an accessible name R 2% 0% -2.5pp 1 / 40 0 / 40
    Visible label is included in accessible name R 2% 0% -2.5pp 1 / 40 0 / 40

    Shopping Home Page | React | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Has a single banner R 25% 8% -17.5pp 10 / 40 3 / 40
    Has a skip navigation link R 100% 8% -92.5pp 40 / 40 3 / 40
    Has a single maincontent R 52% 5% -47.5pp 21 / 40 2 / 40
    Has an h1 R 2% 2% +0.0pp 1 / 40 1 / 40
    Has at least one h2 R 2% 2% +0.0pp 1 / 40 1 / 40
    Has at least one navigation R 2% 2% +0.0pp 1 / 40 1 / 40
    Has single h1 BP 5% 2% -2.5pp 2 / 40 1 / 40
    Has a single footer R 0% 0% +0.0pp 0 / 39 0 / 39

    Shopping Home Page | React | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Has a skip navigation link R 100% 8% -92.5pp 40 / 40 3 / 40
    Has a single banner R 22% 5% -17.5pp 9 / 40 2 / 40
    Has an h1 R 2% 0% -2.5pp 1 / 40 0 / 40
    Has at least one h2 R 2% 0% -2.5pp 1 / 40 0 / 40
    Has at least one navigation R 2% 0% -2.5pp 1 / 40 0 / 40
    Has single h1 BP 5% 0% -5.0pp 2 / 40 0 / 40
    Has a single footer R 10% 0% -10.0pp 4 / 40 0 / 40
    Has a single maincontent R 40% 0% -40.0pp 16 / 40 0 / 40

    Shopping Home Page | Vanilla JS | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Has a single banner R 10% 2% -7.5pp 4 / 40 1 / 40
    Has an h1 R 0% 0% +0.0pp 0 / 40 0 / 40
    Has at least one h2 R 0% 0% +0.0pp 0 / 40 0 / 40
    Has single h1 BP 0% 0% +0.0pp 0 / 40 0 / 40
    Has at least one navigation R 2% 0% -2.5pp 1 / 40 0 / 40
    Has a single footer R 3% 0% -2.6pp 1 / 38 0 / 40
    Has a single maincontent R 32% 0% -32.5pp 13 / 40 0 / 40
    Has a skip navigation link R 100% 0% -100.0pp 40 / 40 0 / 40

    Shopping Home Page | Vanilla JS | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Has a single banner R 10% 10% +0.0pp 4 / 40 4 / 40
    Has a skip navigation link R 100% 5% -95.0pp 40 / 40 2 / 40
    Has a single maincontent R 35% 2% -32.5pp 14 / 40 1 / 40
    Has an h1 R 0% 0% +0.0pp 0 / 40 0 / 40
    Has at least one h2 R 0% 0% +0.0pp 0 / 40 0 / 40
    Has single h1 BP 0% 0% +0.0pp 0 / 40 0 / 40
    Has at least one navigation R 2% 0% -2.5pp 1 / 40 0 / 40
    Has a single footer R 3% 0% -3.0pp 1 / 33 0 / 40

    Simple Contact Form | React | Dark | Error Message Present

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Inputs use appropriate autocomplete for purpose R 60% 30% -30.0pp 24 / 40 12 / 40
    Required fields are indicated programmatically R 30% 5% -24.7pp 11 / 37 2 / 40
    Visible label is included in accessible name R 0% 2% +2.5pp 0 / 40 1 / 40
    Each text input has an accessible name R 0% 0% +0.0pp 0 / 40 0 / 40
    Each text input has textbox role R 0% 0% +0.0pp 0 / 40 0 / 40
    Placeholder text is programmatically defined as a property R 0% 0% +0.0pp 0 / 38 0 / 22
    Text inputs are keyboard focusable R 0% 0% +0.0pp 0 / 40 0 / 40
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40
    Helper text is programmatically associated R 15% 0% -15.0pp 6 / 40 0 / 40
    Required fields are indicated visually R 16% 0% -16.2pp 6 / 37 0 / 40

    Simple Contact Form | React | Dark | No Error Message

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Inputs use appropriate autocomplete for purpose R 52% 30% -22.5pp 21 / 40 12 / 40
    Required fields are indicated programmatically R 8% 8% -0.2pp 3 / 39 3 / 40
    Required fields are indicated visually R 36% 5% -30.9pp 14 / 39 2 / 40
    Each text input has an accessible name R 0% 0% +0.0pp 0 / 40 0 / 40
    Each text input has textbox role R 0% 0% +0.0pp 0 / 40 0 / 40
    Placeholder text is programmatically defined as a property R 0% 0% +0.0pp 0 / 34 0 / 20
    Text inputs are keyboard focusable R 0% 0% +0.0pp 0 / 40 0 / 40
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40
    Visible label is included in accessible name R 2% 0% -2.5pp 1 / 40 0 / 40
    Helper text is programmatically associated R 18% 0% -17.5pp 7 / 40 0 / 40

    Simple Contact Form | React | Modern | Error Message Present

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Inputs use appropriate autocomplete for purpose R 52% 30% -22.5pp 21 / 40 12 / 40
    Required fields are indicated programmatically R 23% 2% -20.4pp 8 / 35 1 / 40
    Each text input has an accessible name R 0% 0% +0.0pp 0 / 40 0 / 40
    Each text input has textbox role R 0% 0% +0.0pp 0 / 40 0 / 40
    Helper text is programmatically associated R 0% 0% +0.0pp 0 / 40 0 / 40
    Placeholder text is programmatically defined as a property R 0% 0% +0.0pp 0 / 40 0 / 24
    Text inputs are keyboard focusable R 0% 0% +0.0pp 0 / 40 0 / 40
    Visible label is included in accessible name R 0% 0% +0.0pp 0 / 40 0 / 40
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40
    Required fields are indicated visually R 23% 0% -22.9pp 8 / 35 0 / 40

    Simple Contact Form | React | Modern | No Error Message

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Inputs use appropriate autocomplete for purpose R 55% 32% -22.5pp 22 / 40 13 / 40
    Required fields are indicated programmatically R 8% 5% -2.7pp 3 / 39 2 / 40
    Required fields are indicated visually R 44% 5% -38.6pp 17 / 39 2 / 40
    Visual labels are defined and persistent R 2% 2% +0.0pp 1 / 40 1 / 40
    Each text input has an accessible name R 0% 0% +0.0pp 0 / 40 0 / 40
    Each text input has textbox role R 0% 0% +0.0pp 0 / 40 0 / 40
    Placeholder text is programmatically defined as a property R 0% 0% +0.0pp 0 / 36 0 / 20
    Text inputs are keyboard focusable R 0% 0% +0.0pp 0 / 40 0 / 40
    Visible label is included in accessible name R 2% 0% -2.5pp 1 / 40 0 / 39
    Helper text is programmatically associated R 12% 0% -12.5pp 5 / 40 0 / 40

    Simple Contact Form | Vanilla JS | Dark | Error Message Present

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Inputs use appropriate autocomplete for purpose R 40% 28% -12.5pp 16 / 40 11 / 40
    Required fields are indicated programmatically R 8% 10% +2.3pp 3 / 39 4 / 40
    Each text input has an accessible name R 0% 0% +0.0pp 0 / 40 0 / 40
    Each text input has textbox role R 0% 0% +0.0pp 0 / 40 0 / 40
    Placeholder text is programmatically defined as a property R 0% 0% +0.0pp 0 / 35 0 / 18
    Text inputs are keyboard focusable R 0% 0% +0.0pp 0 / 40 0 / 40
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40
    Visible label is included in accessible name R 8% 0% -7.5pp 3 / 40 0 / 40
    Helper text is programmatically associated R 12% 0% -12.5pp 5 / 40 0 / 40
    Required fields are indicated visually R 46% 0% -46.2pp 18 / 39 0 / 40

    Simple Contact Form | Vanilla JS | Dark | No Error Message

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Inputs use appropriate autocomplete for purpose R 40% 30% -10.0pp 16 / 40 12 / 40
    Required fields are indicated programmatically R 5% 15% +10.0pp 2 / 40 6 / 40
    Each text input has an accessible name R 0% 0% +0.0pp 0 / 40 0 / 40
    Each text input has textbox role R 0% 0% +0.0pp 0 / 40 0 / 40
    Placeholder text is programmatically defined as a property R 0% 0% +0.0pp 0 / 32 0 / 12
    Text inputs are keyboard focusable R 0% 0% +0.0pp 0 / 40 0 / 40
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40
    Visible label is included in accessible name R 8% 0% -7.5pp 3 / 40 0 / 40
    Helper text is programmatically associated R 15% 0% -15.0pp 6 / 40 0 / 40
    Required fields are indicated visually R 70% 0% -70.0pp 28 / 40 0 / 40

    Simple Contact Form | Vanilla JS | Modern | Error Message Present

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Inputs use appropriate autocomplete for purpose R 38% 22% -15.0pp 15 / 40 9 / 40
    Required fields are indicated programmatically R 2% 5% +2.5pp 1 / 40 2 / 40
    Visual labels are defined and persistent R 2% 2% +0.0pp 1 / 40 1 / 40
    Visible label is included in accessible name R 12% 2% -10.0pp 5 / 40 1 / 40
    Required fields are indicated visually R 45% 2% -42.5pp 18 / 40 1 / 40
    Each text input has an accessible name R 0% 0% +0.0pp 0 / 40 0 / 40
    Each text input has textbox role R 0% 0% +0.0pp 0 / 40 0 / 40
    Placeholder text is programmatically defined as a property R 0% 0% +0.0pp 0 / 36 0 / 13
    Text inputs are keyboard focusable R 0% 0% +0.0pp 0 / 40 0 / 40
    Helper text is programmatically associated R 8% 0% -7.5pp 3 / 40 0 / 40

    Simple Contact Form | Vanilla JS | Modern | No Error Message

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Inputs use appropriate autocomplete for purpose R 38% 30% -7.5pp 15 / 40 12 / 40
    Each text input has an accessible name R 0% 0% +0.0pp 0 / 40 0 / 40
    Each text input has textbox role R 0% 0% +0.0pp 0 / 40 0 / 40
    Placeholder text is programmatically defined as a property R 0% 0% +0.0pp 0 / 32 0 / 18
    Text inputs are keyboard focusable R 0% 0% +0.0pp 0 / 40 0 / 40
    Required fields are indicated programmatically R 2% 0% -2.5pp 1 / 40 0 / 40
    Visual labels are defined and persistent R 5% 0% -5.0pp 2 / 40 0 / 40
    Visible label is included in accessible name R 12% 0% -12.5pp 5 / 40 0 / 40
    Helper text is programmatically associated R 15% 0% -15.0pp 6 / 40 0 / 40
    Required fields are indicated visually R 72% 0% -72.5pp 29 / 40 0 / 40

    Single Checkbox | React | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Required fields are indicated visually R 15% 10% -5.0pp 6 / 40 4 / 40
    Space toggles checkbox state R 2% 5% +2.5pp 1 / 40 2 / 40
    Helper text is programmatically associated R 74% 2% -71.9pp 29 / 39 1 / 40
    Required fields are indicated programmatically R 0% 0% +0.0pp 0 / 40 0 / 40
    Visible label is included in accessible name R 0% 0% +0.0pp 0 / 39 0 / 40
    ARIA attributes match native checkbox attributes if used R 2% 0% -2.5pp 1 / 40 0 / 40
    Checked state is programmatically exposed R 2% 0% -2.5pp 1 / 40 0 / 40
    Each checkbox has a valid role R 2% 0% -2.5pp 1 / 40 0 / 40
    Each checkbox has an accessible name R 2% 0% -2.5pp 1 / 40 0 / 40
    Each checkbox is keyboard reachable R 2% 0% -2.5pp 1 / 40 0 / 40
    Visual labels are defined and persistent R 2% 0% -2.5pp 1 / 40 0 / 40

    Single Checkbox | React | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Visible label is included in accessible name R 0% 8% +7.5pp 0 / 38 3 / 40
    Required fields are indicated visually R 18% 5% -12.5pp 7 / 40 2 / 40
    ARIA attributes match native checkbox attributes if used R 0% 0% +0.0pp 0 / 40 0 / 40
    Checked state is programmatically exposed R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox has a valid role R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox is keyboard reachable R 0% 0% +0.0pp 0 / 40 0 / 40
    Required fields are indicated programmatically R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox has an accessible name R 2% 0% -2.5pp 1 / 40 0 / 40
    Space toggles checkbox state R 2% 0% -2.5pp 1 / 40 0 / 40
    Visual labels are defined and persistent R 5% 0% -5.0pp 2 / 40 0 / 40
    Helper text is programmatically associated R 62% 0% -62.5pp 25 / 40 0 / 40

    Single Checkbox | Vanilla JS | Dark

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Visible label is included in accessible name R 0% 5% +5.0pp 0 / 40 2 / 40
    Required fields are indicated visually R 5% 5% +0.0pp 2 / 40 2 / 40
    Helper text is programmatically associated R 80% 2% -77.5pp 32 / 40 1 / 40
    ARIA attributes match native checkbox attributes if used R 0% 0% +0.0pp 0 / 40 0 / 40
    Checked state is programmatically exposed R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox has a valid role R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox has an accessible name R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox is keyboard reachable R 0% 0% +0.0pp 0 / 40 0 / 40
    Required fields are indicated programmatically R 0% 0% +0.0pp 0 / 40 0 / 40
    Space toggles checkbox state R 0% 0% +0.0pp 0 / 40 0 / 40
    Visual labels are defined and persistent R 0% 0% +0.0pp 0 / 40 0 / 40

    Single Checkbox | Vanilla JS | Modern

    Assertion Type Control fail rate Variant fail rate Δ fail rate Control failures/total Variant failures/total
    Visible label is included in accessible name R 2% 5% +2.6pp 1 / 40 2 / 39
    Required fields are indicated visually R 20% 5% -15.0pp 8 / 40 2 / 40
    Visual labels are defined and persistent R 0% 2% +2.5pp 0 / 40 1 / 40
    Helper text is programmatically associated R 74% 2% -71.9pp 29 / 39 1 / 40
    ARIA attributes match native checkbox attributes if used R 0% 0% +0.0pp 0 / 40 0 / 40
    Checked state is programmatically exposed R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox has a valid role R 0% 0% +0.0pp 0 / 40 0 / 40
    Required fields are indicated programmatically R 0% 0% +0.0pp 0 / 40 0 / 40
    Each checkbox has an accessible name R 2% 0% -2.5pp 1 / 40 0 / 40
    Each checkbox is keyboard reachable R 2% 0% -2.5pp 1 / 40 0 / 40
    Space toggles checkbox state R 2% 0% -2.5pp 1 / 40 0 / 40

    Skills (vs Control)

    Skills are self-contained packages (a directory containing SKILL.md and any support files) that are mounted into the sandboxed agent at runtime. Each skill defines its own multi-turn conversation; the agent's submission at the end of each turn is evaluated separately so we can compare how each turn performs against control.

    Note on interpretation. Turn 1 is a single-turn generation directly comparable to control. Later turns operate on prior context, so their Δ reflects both the skill package content and the effect of having a review opportunity.

    * Pass rate reflects only this harness's automated checks (a curated set of axe-core WCAG rules plus hand-written assertions per test case). Automated testing can detect only a subset of accessibility issues: 100% here means the sample passed every check that was run, not that the page is WCAG conformant or fully accessible.

    Building Accessible UI

    A skill that implicitly steers generation toward accessible HTML and can be explicitly invoked to review and remediate previously produced HTML.

    Rank Model Control* Generate* Review* Δ last vs control* Δ last vs turn 1*
    1 Gemini 3.1 Pro Preview 8% 91% 96% +88.1pp +5.6pp
    2 GPT-5.4 Mini 25% 93% 94% +69.4pp +1.3pp
    3 GPT-5.4 15% 91% 92% +76.9pp +0.6pp
    4 Claude Opus 4.7 14% 91% 91% +76.9pp +0.0pp
    5 Gemini 3 Flash Preview 6% 78% 91% +85.0pp +13.1pp
    6 GPT-5.5 18% 86% 89% +71.2pp +3.1pp
    7 Claude Sonnet 4.6 6% 76% 83% +76.9pp +6.9pp
    8 Claude Haiku 4.5 3% 52% 56% +52.5pp +3.1pp
    Pass rate by test case
    Checkbox Group | Vanilla JS | Dark
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 80% 80% +80.0pp
    Claude Opus 4.7 0% 100% 80% +80.0pp
    Claude Sonnet 4.6 0% 100% 100% +100.0pp
    Gemini 3 Flash Preview 0% 100% 100% +100.0pp
    Gemini 3.1 Pro Preview 0% 100% 100% +100.0pp
    GPT-5.4 0% 100% 100% +100.0pp
    GPT-5.4 Mini 0% 100% 100% +100.0pp
    GPT-5.5 0% 100% 100% +100.0pp
    Checkbox Group | Vanilla JS | Modern
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 80% 80% +80.0pp
    Claude Opus 4.7 0% 100% 100% +100.0pp
    Claude Sonnet 4.6 0% 80% 80% +80.0pp
    Gemini 3 Flash Preview 0% 100% 100% +100.0pp
    Gemini 3.1 Pro Preview 0% 100% 100% +100.0pp
    GPT-5.4 0% 100% 100% +100.0pp
    GPT-5.4 Mini 0% 100% 100% +100.0pp
    GPT-5.5 40% 100% 100% +60.0pp
    Checkbox Group | React | Dark
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 40% 40% +40.0pp
    Claude Opus 4.7 0% 80% 80% +80.0pp
    Claude Sonnet 4.6 0% 100% 100% +100.0pp
    Gemini 3 Flash Preview 0% 80% 100% +100.0pp
    Gemini 3.1 Pro Preview 0% 100% 100% +100.0pp
    GPT-5.4 0% 80% 80% +80.0pp
    GPT-5.4 Mini 0% 100% 100% +100.0pp
    GPT-5.5 80% 100% 100% +20.0pp
    Checkbox Group | React | Modern
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 60% 60% +60.0pp
    Claude Opus 4.7 0% 80% 80% +80.0pp
    Claude Sonnet 4.6 0% 80% 80% +80.0pp
    Gemini 3 Flash Preview 0% 60% 100% +100.0pp
    Gemini 3.1 Pro Preview 0% 100% 100% +100.0pp
    GPT-5.4 0% 80% 100% +100.0pp
    GPT-5.4 Mini 0% 80% 80% +80.0pp
    GPT-5.5 0% 100% 100% +100.0pp
    Disclosure Widget | Vanilla JS | Dark
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 20% 80% 100% +80.0pp
    Claude Opus 4.7 0% 100% 100% +100.0pp
    Claude Sonnet 4.6 40% 100% 100% +60.0pp
    Gemini 3 Flash Preview 100% 40% 100% +0.0pp
    Gemini 3.1 Pro Preview 100% 100% 100% +0.0pp
    GPT-5.4 40% 100% 100% +60.0pp
    GPT-5.4 Mini 80% 100% 100% +20.0pp
    GPT-5.5 20% 100% 100% +80.0pp
    Disclosure Widget | Vanilla JS | Modern
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 60% 60% +60.0pp
    Claude Opus 4.7 0% 80% 80% +80.0pp
    Claude Sonnet 4.6 20% 100% 100% +80.0pp
    Gemini 3 Flash Preview 60% 60% 100% +40.0pp
    Gemini 3.1 Pro Preview 60% 100% 100% +40.0pp
    GPT-5.4 20% 100% 100% +80.0pp
    GPT-5.4 Mini 80% 100% 100% +20.0pp
    GPT-5.5 0% 100% 100% +100.0pp
    Disclosure Widget | React | Dark
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 20% 60% 80% +60.0pp
    Claude Opus 4.7 0% 100% 100% +100.0pp
    Claude Sonnet 4.6 0% 100% 100% +100.0pp
    Gemini 3 Flash Preview 20% 60% 100% +80.0pp
    Gemini 3.1 Pro Preview 0% 80% 100% +100.0pp
    GPT-5.4 40% 100% 100% +60.0pp
    GPT-5.4 Mini 100% 100% 100% +0.0pp
    GPT-5.5 0% 100% 100% +100.0pp
    Disclosure Widget | React | Modern
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 60% 80% 100% +40.0pp
    Claude Opus 4.7 0% 100% 100% +100.0pp
    Claude Sonnet 4.6 0% 80% 80% +80.0pp
    Gemini 3 Flash Preview 0% 100% 100% +100.0pp
    Gemini 3.1 Pro Preview 0% 100% 100% +100.0pp
    GPT-5.4 0% 100% 100% +100.0pp
    GPT-5.4 Mini 40% 80% 100% +60.0pp
    GPT-5.5 20% 100% 100% +80.0pp
    Modal Dialog | Vanilla JS | Dark
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 60% 60% +60.0pp
    Claude Opus 4.7 0% 80% 100% +100.0pp
    Claude Sonnet 4.6 0% 80% 80% +80.0pp
    Gemini 3 Flash Preview 0% 60% 100% +100.0pp
    Gemini 3.1 Pro Preview 0% 100% 100% +100.0pp
    GPT-5.4 80% 100% 100% +20.0pp
    GPT-5.4 Mini 60% 100% 100% +40.0pp
    GPT-5.5 0% 100% 100% +100.0pp
    Modal Dialog | Vanilla JS | Modern
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 60% 60% +60.0pp
    Claude Opus 4.7 0% 100% 100% +100.0pp
    Claude Sonnet 4.6 0% 80% 100% +100.0pp
    Gemini 3 Flash Preview 0% 40% 80% +80.0pp
    Gemini 3.1 Pro Preview 20% 80% 100% +80.0pp
    GPT-5.4 40% 100% 100% +60.0pp
    GPT-5.4 Mini 80% 80% 100% +20.0pp
    GPT-5.5 40% 100% 100% +60.0pp
    Modal Dialog | React | Dark
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 40% 40% +40.0pp
    Claude Opus 4.7 0% 100% 100% +100.0pp
    Claude Sonnet 4.6 0% 40% 40% +40.0pp
    Gemini 3 Flash Preview 0% 100% 100% +100.0pp
    Gemini 3.1 Pro Preview 0% 100% 100% +100.0pp
    GPT-5.4 20% 100% 100% +80.0pp
    GPT-5.4 Mini 0% 100% 100% +100.0pp
    GPT-5.5 20% 80% 100% +80.0pp
    Modal Dialog | React | Modern
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 60% 80% +80.0pp
    Claude Opus 4.7 0% 100% 100% +100.0pp
    Claude Sonnet 4.6 0% 80% 80% +80.0pp
    Gemini 3 Flash Preview 0% 80% 100% +100.0pp
    Gemini 3.1 Pro Preview 0% 60% 100% +100.0pp
    GPT-5.4 20% 100% 100% +80.0pp
    GPT-5.4 Mini 80% 100% 100% +20.0pp
    GPT-5.5 0% 100% 100% +100.0pp
    Radio Button Group | Vanilla JS | Dark
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 60% 60% +60.0pp
    Claude Opus 4.7 0% 100% 100% +100.0pp
    Claude Sonnet 4.6 20% 60% 80% +60.0pp
    Gemini 3 Flash Preview 0% 80% 80% +80.0pp
    Gemini 3.1 Pro Preview 0% 80% 80% +80.0pp
    GPT-5.4 0% 100% 100% +100.0pp
    GPT-5.4 Mini 20% 100% 100% +80.0pp
    GPT-5.5 0% 100% 100% +100.0pp
    Radio Button Group | Vanilla JS | Modern
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 80% 80% +80.0pp
    Claude Opus 4.7 0% 100% 100% +100.0pp
    Claude Sonnet 4.6 0% 100% 100% +100.0pp
    Gemini 3 Flash Preview 0% 60% 100% +100.0pp
    Gemini 3.1 Pro Preview 0% 100% 100% +100.0pp
    GPT-5.4 0% 100% 100% +100.0pp
    GPT-5.4 Mini 0% 100% 100% +100.0pp
    GPT-5.5 0% 100% 100% +100.0pp
    Radio Button Group | React | Dark
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 60% 60% +60.0pp
    Claude Opus 4.7 60% 100% 100% +40.0pp
    Claude Sonnet 4.6 0% 60% 60% +60.0pp
    Gemini 3 Flash Preview 0% 40% 40% +40.0pp
    Gemini 3.1 Pro Preview 20% 80% 100% +80.0pp
    GPT-5.4 0% 100% 100% +100.0pp
    GPT-5.4 Mini 20% 100% 100% +80.0pp
    GPT-5.5 0% 100% 100% +100.0pp
    Radio Button Group | React | Modern
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 60% 80% +80.0pp
    Claude Opus 4.7 40% 100% 100% +60.0pp
    Claude Sonnet 4.6 0% 60% 60% +60.0pp
    Gemini 3 Flash Preview 0% 100% 80% +80.0pp
    Gemini 3.1 Pro Preview 0% 100% 100% +100.0pp
    GPT-5.4 0% 100% 100% +100.0pp
    GPT-5.4 Mini 20% 100% 100% +80.0pp
    GPT-5.5 0% 100% 100% +100.0pp
    Shopping Home Page | Vanilla JS | Dark
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 80% 80% +80.0pp
    Claude Opus 4.7 0% 60% 60% +60.0pp
    Claude Sonnet 4.6 0% 40% 80% +80.0pp
    Gemini 3 Flash Preview 0% 100% 100% +100.0pp
    Gemini 3.1 Pro Preview 0% 80% 80% +80.0pp
    GPT-5.4 0% 100% 100% +100.0pp
    GPT-5.4 Mini 0% 100% 100% +100.0pp
    GPT-5.5 0% 100% 100% +100.0pp
    Shopping Home Page | Vanilla JS | Modern
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 20% 20% +20.0pp
    Claude Opus 4.7 0% 60% 60% +60.0pp
    Claude Sonnet 4.6 0% 20% 40% +40.0pp
    Gemini 3 Flash Preview 0% 100% 80% +80.0pp
    Gemini 3.1 Pro Preview 0% 80% 80% +80.0pp
    GPT-5.4 0% 100% 100% +100.0pp
    GPT-5.4 Mini 0% 100% 100% +100.0pp
    GPT-5.5 0% 80% 100% +100.0pp
    Shopping Home Page | React | Dark
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 40% 40% +40.0pp
    Claude Opus 4.7 0% 80% 80% +80.0pp
    Claude Sonnet 4.6 0% 80% 100% +100.0pp
    Gemini 3 Flash Preview 0% 80% 80% +80.0pp
    Gemini 3.1 Pro Preview 0% 60% 100% +100.0pp
    GPT-5.4 0% 100% 100% +100.0pp
    GPT-5.4 Mini 0% 80% 80% +80.0pp
    GPT-5.5 0% 100% 100% +100.0pp
    Shopping Home Page | React | Modern
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 20% 20% +20.0pp
    Claude Opus 4.7 0% 60% 60% +60.0pp
    Claude Sonnet 4.6 0% 20% 40% +40.0pp
    Gemini 3 Flash Preview 0% 40% 100% +100.0pp
    Gemini 3.1 Pro Preview 0% 100% 100% +100.0pp
    GPT-5.4 0% 80% 100% +100.0pp
    GPT-5.4 Mini 0% 100% 100% +100.0pp
    GPT-5.5 0% 100% 100% +100.0pp
    Simple Contact Form | Vanilla JS | Dark | Error Message Present
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 0% 0% +0.0pp
    Claude Opus 4.7 100% 100% 100% +0.0pp
    Claude Sonnet 4.6 0% 80% 80% +80.0pp
    Gemini 3 Flash Preview 0% 100% 100% +100.0pp
    Gemini 3.1 Pro Preview 0% 80% 100% +100.0pp
    GPT-5.4 0% 80% 80% +80.0pp
    GPT-5.4 Mini 40% 80% 80% +40.0pp
    GPT-5.5 40% 60% 80% +40.0pp
    Simple Contact Form | Vanilla JS | Dark | No Error Message
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 40% 40% +40.0pp
    Claude Opus 4.7 0% 60% 60% +60.0pp
    Claude Sonnet 4.6 0% 80% 80% +80.0pp
    Gemini 3 Flash Preview 0% 80% 80% +80.0pp
    Gemini 3.1 Pro Preview 0% 100% 100% +100.0pp
    GPT-5.4 20% 20% 20% +0.0pp
    GPT-5.4 Mini 0% 80% 80% +80.0pp
    GPT-5.5 20% 0% 0% -20.0pp
    Simple Contact Form | Vanilla JS | Modern | Error Message Present
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 20% 20% +20.0pp
    Claude Opus 4.7 80% 100% 100% +20.0pp
    Claude Sonnet 4.6 20% 80% 80% +60.0pp
    Gemini 3 Flash Preview 0% 80% 60% +60.0pp
    Gemini 3.1 Pro Preview 0% 100% 100% +100.0pp
    GPT-5.4 80% 100% 100% +20.0pp
    GPT-5.4 Mini 0% 60% 60% +60.0pp
    GPT-5.5 0% 40% 80% +80.0pp
    Simple Contact Form | Vanilla JS | Modern | No Error Message
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 60% 60% +60.0pp
    Claude Opus 4.7 0% 100% 100% +100.0pp
    Claude Sonnet 4.6 20% 100% 100% +80.0pp
    Gemini 3 Flash Preview 0% 80% 80% +80.0pp
    Gemini 3.1 Pro Preview 0% 100% 100% +100.0pp
    GPT-5.4 0% 80% 80% +80.0pp
    GPT-5.4 Mini 40% 60% 60% +20.0pp
    GPT-5.5 0% 60% 60% +60.0pp
    Simple Contact Form | React | Dark | Error Message Present
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 80% 80% +80.0pp
    Claude Opus 4.7 0% 100% 100% +100.0pp
    Claude Sonnet 4.6 0% 60% 80% +80.0pp
    Gemini 3 Flash Preview 0% 80% 80% +80.0pp
    Gemini 3.1 Pro Preview 0% 80% 80% +80.0pp
    GPT-5.4 0% 40% 40% +40.0pp
    GPT-5.4 Mini 20% 100% 100% +80.0pp
    GPT-5.5 80% 40% 40% -40.0pp
    Simple Contact Form | React | Dark | No Error Message
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 20% 40% +40.0pp
    Claude Opus 4.7 40% 100% 100% +60.0pp
    Claude Sonnet 4.6 0% 100% 100% +100.0pp
    Gemini 3 Flash Preview 0% 60% 80% +80.0pp
    Gemini 3.1 Pro Preview 0% 80% 80% +80.0pp
    GPT-5.4 40% 80% 80% +40.0pp
    GPT-5.4 Mini 40% 100% 100% +60.0pp
    GPT-5.5 40% 20% 20% -20.0pp
    Simple Contact Form | React | Modern | Error Message Present
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 60% 60% +60.0pp
    Claude Opus 4.7 40% 100% 100% +60.0pp
    Claude Sonnet 4.6 20% 80% 100% +80.0pp
    Gemini 3 Flash Preview 0% 80% 80% +80.0pp
    Gemini 3.1 Pro Preview 0% 100% 100% +100.0pp
    GPT-5.4 20% 80% 80% +60.0pp
    GPT-5.4 Mini 40% 100% 100% +60.0pp
    GPT-5.5 80% 80% 80% +0.0pp
    Simple Contact Form | React | Modern | No Error Message
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 20% 20% +20.0pp
    Claude Opus 4.7 0% 80% 80% +80.0pp
    Claude Sonnet 4.6 20% 100% 100% +80.0pp
    Gemini 3 Flash Preview 0% 80% 100% +100.0pp
    Gemini 3.1 Pro Preview 0% 80% 80% +80.0pp
    GPT-5.4 0% 100% 100% +100.0pp
    GPT-5.4 Mini 40% 80% 80% +40.0pp
    GPT-5.5 40% 80% 80% +40.0pp
    Single Checkbox | Vanilla JS | Dark
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 60% 60% +60.0pp
    Claude Opus 4.7 40% 100% 100% +60.0pp
    Claude Sonnet 4.6 0% 80% 80% +80.0pp
    Gemini 3 Flash Preview 0% 100% 100% +100.0pp
    Gemini 3.1 Pro Preview 0% 100% 100% +100.0pp
    GPT-5.4 0% 100% 100% +100.0pp
    GPT-5.4 Mini 0% 100% 100% +100.0pp
    GPT-5.5 0% 100% 100% +100.0pp
    Single Checkbox | Vanilla JS | Modern
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 40% 40% +40.0pp
    Claude Opus 4.7 20% 100% 100% +80.0pp
    Claude Sonnet 4.6 40% 80% 80% +40.0pp
    Gemini 3 Flash Preview 0% 100% 100% +100.0pp
    Gemini 3.1 Pro Preview 0% 100% 100% +100.0pp
    GPT-5.4 20% 100% 100% +80.0pp
    GPT-5.4 Mini 0% 100% 100% +100.0pp
    GPT-5.5 0% 100% 100% +100.0pp
    Single Checkbox | React | Dark
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 40% 40% +40.0pp
    Claude Opus 4.7 20% 100% 100% +80.0pp
    Claude Sonnet 4.6 0% 60% 80% +80.0pp
    Gemini 3 Flash Preview 0% 80% 100% +100.0pp
    Gemini 3.1 Pro Preview 40% 80% 100% +60.0pp
    GPT-5.4 40% 100% 100% +60.0pp
    GPT-5.4 Mini 0% 100% 100% +100.0pp
    GPT-5.5 20% 100% 100% +80.0pp
    Single Checkbox | React | Modern
    Model Control* Generate* Review* Δ last vs control*
    Claude Haiku 4.5 0% 60% 40% +40.0pp
    Claude Opus 4.7 20% 100% 100% +80.0pp
    Claude Sonnet 4.6 0% 80% 100% +100.0pp
    Gemini 3 Flash Preview 0% 80% 100% +100.0pp
    Gemini 3.1 Pro Preview 20% 100% 100% +80.0pp
    GPT-5.4 0% 100% 80% +80.0pp
    GPT-5.4 Mini 0% 100% 100% +100.0pp
    GPT-5.5 20% 100% 100% +80.0pp

    * Pass rate reflects only this harness's automated checks (a curated set of axe-core WCAG rules plus hand-written assertions per test case). Automated testing can detect only a subset of accessibility issues: 100% here means the sample passed every check that was run, not that the page is WCAG conformant or fully accessible.

    Skill details

    Each skill's mounted package, sandbox location, and per-turn prompt templates.

    Building Accessible UI

    A skill that implicitly steers generation toward accessible HTML and can be explicitly invoked to review and remediate previously produced HTML.

    Full skill

    Samples per (test, model): 5

    Skill package: ../../config/skills/building-accessible-ui

    Turn prompts

    1. Generate (generate)
      {{test_case_prompt}}
      
    2. Review (review)
      Add and run accessibility tests, review results, and remediate the HTML.
      Fix any real accessibility issues you find. Leave correct markup
      alone. Submit one corrected standalone HTML document as your final
      answer. Do not wrap it in markdown fences or add commentary.
      

    SKILL.md

    ---
    name: building-accessible-ui
    description: MUST BE USED for any UI work. Invoke this skill before generating, modifying, or reviewing any code that renders, styles, or wires up a user-facing interface — including markup, components, templates, styles, and the JavaScript/TypeScript that drives them. This skill encodes the accessibility (WCAG 2.2 AA) requirements every UI change must satisfy; skipping it produces inaccessible output. Applies across web, mobile, and desktop. If the task touches the UI layer in any way, use this skill first.
    ---
    
    # building-accessible-ui
    
    Checklist for producing and reviewing accessible UIs. Each rule leads with the platform-agnostic principle and, where relevant, the Web (HTML + ARIA + CSS) implementation. Apply the web guidance only when the output is web.
    
    Detailed rationale lives in `references/`; widget-specific guidance in `components/`. **Open a file only when it's relevant to the current task.** Do not preload. Every file opened and every line a tool prints stays in context — don't re-read.
    
    ## Accessibility constitution
    
    Ground rules. Use them to resolve conflicts and decide how much custom work is justified. The checklist below is their mechanical application.
    
    ### 1. Accessibility is a core outcome
    
    A UI inaccessible to realistic users is not "done". Treat accessibility as a first-class criterion alongside correctness, performance, and security — not a finishing step. When scope must be cut, record the gap explicitly. Never claim output is "fully accessible"; state what was addressed and known limitations.
    
    ### 2. Build for real people
    
    Evaluate designs against these personas; if a decision breaks one, justify it and offer an alternative:
    
    - **Screen reader** — landmarks, headings, accessible name/role/state, reading order.
    - **Keyboard-only** — Tab/arrows/Enter/Space/Escape, visible focus, no traps.
    - **Low-vision** — zoom, reflow, contrast, Forced Colors.
    - **Cognitive** — plain language, clear labels, actionable errors, forgiving interactions.
    - **Deaf / hard of hearing** — captions/transcripts; no sound-only cues.
    - **Motor / voice / switch** — large hit targets, named controls, no precise/timed gestures.
    - **Situational** — sunlight, one-handed, noisy, flaky network.
    
    ### 3. Implementation priority
    
    Use the highest option that fits:
    
    1. Existing accessible component in this codebase / design system.
    2. A component library.
    3. Native platform semantics (`<button>`, `<a href>`, `<input>`, `<label>`, `<fieldset>`/`<legend>`, `<dialog>`, `<details>`, `<nav>`, `<main>`, headings).
    4. Native element + minimum necessary ARIA (`aria-describedby`, `aria-expanded`, `aria-current`, etc.).
    5. Fully custom ARIA widget — only when nothing above fits, and only if you implement the APG keyboard, focus, and state behavior end-to-end.
    
    No ARIA is better than bad ARIA. Don't duplicate native semantics (no `role="button"` on `<button>`). Don't use `role="menu"` for site navigation. Don't invent new patterns when a standard one exists.
    
    ### 4. Balance, don't trade away
    
    Accessibility, performance, security/privacy, and visual design are joint constraints — not dials to trade off. If an optimization removes a label, breaks focus, or hides content from AT, redesign the optimization. Accessible names must not leak secrets, but security is not a reason to ship an unlabeled control — find a labeling approach that doesn't leak data. Visual polish doesn't justify removing focus indicators or semantic structure. Under schedule pressure, prefer cutting scope over shipping an inaccessible feature. When constraints genuinely conflict, surface it explicitly.
    
    ### 5. Respect existing code
    
    Don't rewrite an existing component or shared utility just because it could be more accessible — other code depends on it. When you see issues outside the current task's scope: note them (issue, affected persona, suggested fix) and ask before changing. Fix in place only when the change is required by the task, localized, and low-risk. Inside scope, fix real issues; never silently remove existing affordances (labels, landmarks, focus management, live regions) without an equal-or-better replacement.
    
    ## How to use this checklist
    
    Identify which components the request involves (form, checkbox group, radio group, disclosure, modal, full view, etc.) and open the matching `components/<name>.md` once. Then work the checklist below. Open a `references/*.md` only when an item is unclear or you need the concrete fix pattern.
    
    Do not claim the output is "fully accessible". State what was addressed and known limitations.
    
    **Do NOT use this skill for:** backend-only changes, data migrations, build/CI configuration, non-UI tests, or tasks that do not touch the UI layer.
    
    ## Checklist
    
    - **Prefer existing components.** If available, reuse existing UI components rather than creating new ones from scratch or custom implementations.
    - **Platform-native semantics.** Prefer native platform controls and structures over custom constructs; add accessibility overrides only when a native control genuinely can't be used. → `references/structure.md`.
      - **Web:** Prefer semantic HTML (`<button>`, `<a>`, `<input>`, `<label>`, `<fieldset>`/`<legend>`, `<nav>`, `<main>`, `<header>`, `<footer>`, `<h1>`–`<h6>`) over `<div>`/`<span>` with ARIA. Use ARIA only when no native element fits.
    - **Regions / landmarks.** View structure is exposed via semantic regions/landmarks; duplicated landmarks have unique accessible names.
      - **Web:** Exactly one `<main>`; `<header>`, `<nav>`, `<footer>` used when applicable.
    - **Headings.** Logical outline labels sections without skipping levels; one top-level heading per view. → `references/structure.md`.
      - **Web:** One `<h1>`, typically the first heading in `<main>`. Set a descriptive `<title>`.
    - **Bypass blocks on web pages.** Provide a mechanism to skip repeated navigation when delivering traditional web pages. (Not required for Electron or non-web surfaces.) → `references/keyboard-focus.md`.
      - **Web:** A "Skip to main content" link as the first focusable element
    - **Name / role / value.** Every interactive element exposes an accurate accessible name; role matches purpose; dynamic states (pressed, expanded, selected, checked, disabled, invalid) stay in sync with visuals.
      - **Web:** Prefer native attributes over ARIA. If necessary, use the minimum ARIA needed and update state attributes alongside DOM/visual changes.
    - **Name-label match.** The accessible name of each interactive element contains the visible label text.
      - **Web:** If `aria-label` is used, include the visible label text. For multiple controls that share a label (e.g., "Remove"), add context ("Remove item: Socks").
    - **Labels and help text.** Every form control has a programmatic label describing its purpose; help/error text is programmatically associated with its control. → `components/forms.md`.
      - **Web:** `<label for>` or wrapping `<label>`; never placeholder alone. Associate help/error via `aria-describedby` / `aria-errormessage`.
    - **Grouping.** Related options (checkboxes, radios) are grouped so their shared name is part of the accessible name of each option. Group-level help/error text is associated with the group itself — not with each option and not with an intermediate wrapper.
      - **Web:** `<fieldset>` with a `<legend>`. Put `aria-describedby` on the `<fieldset>` (not on a child `<div>`, and never on an extra `<div role="group">` inside the fieldset — `<fieldset>` already is the group).
    - **Required fields.** Marked both visually and programmatically; not indicated by color alone.
      - **Web:** Use an asterisk to indicate required fields. Native `required` on the control or `aria-required="true"`.
    - **Keyboard operability.** Every interactive element is keyboard operable; tab order matches reading/visual order; expected keys work (activation, arrow keys inside composite widgets, Escape closes overlays); no keyboard traps; static content is not sequentially focusable.
      - **Web:** Do not remove focus outlines without equal-or-better replacement. Use `tabindex="-1"` only for elements that need programmatic (not sequential) focus. → `references/keyboard-focus.md`.
    - **Focus management.** Focus is always visible. Overlays/dialogs/disclosures move focus appropriately and restore it on close; no focus traps outside modals.
    - **Hidden content.** Content hidden from assistive technology is not focusable and is hidden consistently across visual, semantic, and focus layers.
      - **Web:** `hidden` / `display: none` / `aria-hidden="true"` used consistently.
    - **Graphics.** Informative graphics have meaningful text alternatives; decorative graphics are hidden from AT. → `references/images-graphics.md`.
      - **Web:** `<img>` informative → `alt`; decorative → `alt=""`. Informative `<svg>` → `role="img"` + accessible name. Other decorative → `aria-hidden="true"`. 
    - **Contrast.** Text ≥ 4.5:1 (3:1 large); focus indicators and key boundaries ≥ 3:1. Never color-only cues. → `references/contrast-forced-colors.md`.
    - **Respect OS accessibility settings.** Never override OS high contrast, reduced-motion, or color-scheme preferences; adapt to forced-colors / high-contrast. → `references/contrast-forced-colors.md`.
    - **Reflow.** Content adapts to narrow viewports (target 320 CSS px) without two-dimensional scrolling for multi-line text; controls remain operable. → `references/reflow.md`.
    - **Navigation.** Uses semantic navigation grouping with state-exposing toggles for expandable menus. → `references/navigation.md`.
      - **Web:** `<nav>`, not `role="menu"`; `aria-expanded` on triggers.
    - **Tables / grids.** Static tabular data uses table semantics with header/cell associations; interactive grids only when truly warranted. → `references/tables-grids.md`.
    - **Status messages.** Provide status messages for dynamic content updates that are relevant to the user (loading indicators, form submission results, etc.). → `references/status-messages.md`
      - **Web:** Use `aria-live="polite"` or `aria-live="assertive"`.
    - **Testing.** Add and run automated accessibility tests unless the project explicitly opts out. Writing or configuring a test is not enough — execution, fixes, and a result report are part of the deliverable. **The final automated test run must be on the exact artifact you submit: any edit after a passing test invalidates that test, so re-run before submitting.** **Open `references/testing.md` before writing any test code** for the opt-out signals, strategy precedence, runtime probe order, and reporting rules.
      - **Web:** Prefer `@axe-core/*` bindings that match the existing test runner; render the component/page fully so interactive state, focus, and live regions are evaluated.
      - **Other platforms:** Use the platform's native audit (Android `AccessibilityChecks`, iOS `XCUIAccessibilityAudit`, .NET `AccessibilityInsights`) under the same precedence.
    - **Specs/Documentation.** Follow the project's documentation pattern and document accessibility considerations for each view, component, and interaction. → `references/specs-documentation.md`.
    
    

    Detailed Results

    The detailed sample browser is split into per-test sections. Expand a test below to lazy-load its sample details.

    Checkbox Group | React | Dark

    Samples: 200 | Passes: 118 | Fails: 82

    Models: 8

    Open this panel to load the sample-level details.

    Checkbox Group | React | Modern

    Samples: 200 | Passes: 109 | Fails: 91

    Models: 8

    Open this panel to load the sample-level details.

    Checkbox Group | Vanilla JS | Dark

    Samples: 200 | Passes: 117 | Fails: 83

    Models: 8

    Open this panel to load the sample-level details.

    Checkbox Group | Vanilla JS | Modern

    Samples: 200 | Passes: 115 | Fails: 85

    Models: 8

    Open this panel to load the sample-level details.

    Disclosure Widget | React | Dark

    Samples: 200 | Passes: 143 | Fails: 57

    Models: 8

    Open this panel to load the sample-level details.

    Disclosure Widget | React | Modern

    Samples: 200 | Passes: 136 | Fails: 64

    Models: 8

    Open this panel to load the sample-level details.

    Disclosure Widget | Vanilla JS | Dark

    Samples: 200 | Passes: 160 | Fails: 40

    Models: 8

    Open this panel to load the sample-level details.

    Disclosure Widget | Vanilla JS | Modern

    Samples: 200 | Passes: 144 | Fails: 56

    Models: 8

    Open this panel to load the sample-level details.

    Modal Dialog | React | Dark

    Samples: 200 | Passes: 94 | Fails: 106

    Models: 8

    Open this panel to load the sample-level details.

    Modal Dialog | React | Modern

    Samples: 200 | Passes: 102 | Fails: 98

    Models: 8

    Open this panel to load the sample-level details.

    Modal Dialog | Vanilla JS | Dark

    Samples: 200 | Passes: 119 | Fails: 81

    Models: 8

    Open this panel to load the sample-level details.

    Modal Dialog | Vanilla JS | Modern

    Samples: 200 | Passes: 122 | Fails: 78

    Models: 8

    Open this panel to load the sample-level details.

    Radio Button Group | React | Dark

    Samples: 200 | Passes: 88 | Fails: 112

    Models: 8

    Open this panel to load the sample-level details.

    Radio Button Group | React | Modern

    Samples: 200 | Passes: 98 | Fails: 102

    Models: 8

    Open this panel to load the sample-level details.

    Radio Button Group | Vanilla JS | Dark

    Samples: 200 | Passes: 88 | Fails: 112

    Models: 8

    Open this panel to load the sample-level details.

    Radio Button Group | Vanilla JS | Modern

    Samples: 200 | Passes: 93 | Fails: 107

    Models: 8

    Open this panel to load the sample-level details.

    Shopping Home Page | React | Dark

    Samples: 200 | Passes: 99 | Fails: 101

    Models: 8

    Open this panel to load the sample-level details.

    Shopping Home Page | React | Modern

    Samples: 200 | Passes: 87 | Fails: 113

    Models: 8

    Open this panel to load the sample-level details.

    Shopping Home Page | Vanilla JS | Dark

    Samples: 200 | Passes: 106 | Fails: 94

    Models: 8

    Open this panel to load the sample-level details.

    Shopping Home Page | Vanilla JS | Modern

    Samples: 200 | Passes: 89 | Fails: 111

    Models: 8

    Open this panel to load the sample-level details.

    Simple Contact Form | React | Dark | Error Message Present

    Samples: 200 | Passes: 101 | Fails: 99

    Models: 8

    Open this panel to load the sample-level details.

    Simple Contact Form | React | Dark | No Error Message

    Samples: 200 | Passes: 98 | Fails: 102

    Models: 8

    Open this panel to load the sample-level details.

    Simple Contact Form | React | Modern | Error Message Present

    Samples: 200 | Passes: 120 | Fails: 80

    Models: 8

    Open this panel to load the sample-level details.

    Simple Contact Form | React | Modern | No Error Message

    Samples: 200 | Passes: 105 | Fails: 95

    Models: 8

    Open this panel to load the sample-level details.

    Simple Contact Form | Vanilla JS | Dark | Error Message Present

    Samples: 200 | Passes: 97 | Fails: 103

    Models: 8

    Open this panel to load the sample-level details.

    Simple Contact Form | Vanilla JS | Dark | No Error Message

    Samples: 200 | Passes: 76 | Fails: 124

    Models: 8

    Open this panel to load the sample-level details.

    Simple Contact Form | Vanilla JS | Modern | Error Message Present

    Samples: 200 | Passes: 109 | Fails: 91

    Models: 8

    Open this panel to load the sample-level details.

    Simple Contact Form | Vanilla JS | Modern | No Error Message

    Samples: 200 | Passes: 106 | Fails: 94

    Models: 8

    Open this panel to load the sample-level details.

    Single Checkbox | React | Dark

    Samples: 200 | Passes: 122 | Fails: 78

    Models: 8

    Open this panel to load the sample-level details.

    Single Checkbox | React | Modern

    Samples: 200 | Passes: 128 | Fails: 72

    Models: 8

    Open this panel to load the sample-level details.

    Single Checkbox | Vanilla JS | Dark

    Samples: 200 | Passes: 130 | Fails: 70

    Models: 8

    Open this panel to load the sample-level details.

    Single Checkbox | Vanilla JS | Modern

    Samples: 200 | Passes: 135 | Fails: 65

    Models: 8

    Open this panel to load the sample-level details.