Control summary
Control results show how well models produce accessible code when the prompt contains no accessibility instructions. Models are ranked by WCAG pass rate across 4 test cases with 25 samples per test case (100 samples per model). The tests cover only a subset of the most common issues, not all WCAG requirements, so WCAG failures may still exist in samples that pass every test.
| Model | Rank | WCAG Pass Rate* | Avg Total WCAG Failures | Avg Axe WCAG Failures | Avg Assertion WCAG Failures | Avg Best Practice Failures |
|---|---|---|---|---|---|---|
| GPT-5.2 | 1 | 41% | 8.94 | 8.47 | 0.47 | 3.57 |
| GPT-5 Mini | 2 | 30% | 3.93 | 3.07 | 0.86 | 3.30 |
| GPT-5.2 Codex | 3 | 23% | 4.04 | 2.52 | 1.52 | 3.80 |
| Gemini 3 Pro Preview | 4 | 2% | 5.88 | 4.06 | 1.82 | 8.47 |
| Grok 4 Fast Non-Reasoning | 5 | 0% | 4.16 | 1.76 | 2.40 | 6.57 |
| Gemini 3 Flash Preview | 6 | 0% | 4.24 | 1.84 | 2.40 | 5.15 |
| DeepSeek V3.2 | 7 | 0% | 9.96 | 7.96 | 2.00 | 4.56 |
| Claude Haiku 4.5 | 8 | 0% | 11.30 | 9.06 | 2.24 | 13.35 |
| Claude Sonnet 4.5 | 9 | 0% | 12.23 | 9.94 | 2.29 | 15.53 |
| Claude Opus 4.6 | 10 | 0% | 17.99 | 16.47 | 1.52 | 14.37 |
Pass@k aggregates
Pass@k estimates the probability that at least one of k randomly selected samples passes. It is computed from control samples only, with one table per test case (25 samples per model).
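The pass@k values reported here are consistent with the standard unbiased estimator, 1 - C(n - c, k) / C(n, k), where n is the number of samples and c is the number of passing samples. The report does not state which formula the harness uses, so the following is a minimal sketch under that assumption; the function name `passAtK` is hypothetical.

```javascript
// Sketch of the unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k).
// Assumption: the report does not name its formula; `passAtK` is a hypothetical helper.
function passAtK(n, c, k) {
  if (k > n) throw new RangeError("k must not exceed the number of samples");
  // Fewer than k failing samples: every draw of k must include a pass.
  if (n - c < k) return 1;
  // C(n - c, k) / C(n, k) as a running product, avoiding large factorials.
  let allFail = 1;
  for (let i = 0; i < k; i++) {
    allFail *= (n - c - i) / (n - i);
  }
  return 1 - allFail;
}

// Example: 2 passes out of 25 samples.
console.log(passAtK(25, 2, 1).toFixed(2)); // 0.08
console.log(passAtK(25, 2, 5).toFixed(2)); // 0.37
console.log(passAtK(25, 2, 10).toFixed(2)); // 0.65
```

With 25 samples, even a handful of passes pushes pass@10 close to 100%, which is why pass@1 is the more conservative figure to compare.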
disclosure-widget
| Model | Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|---|
| DeepSeek V3.2 | 25 | 0 | 0% | 0% | 0% |
| Claude Haiku 4.5 | 25 | 0 | 0% | 0% | 0% |
| Claude Opus 4.6 | 25 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.5 | 25 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 25 | 0 | 0% | 0% | 0% |
| Gemini 3 Pro Preview | 25 | 2 | 8% | 37% | 65% |
| GPT-5 Mini | 25 | 14 | 56% | 99% | 100% |
| GPT-5.2 | 25 | 12 | 48% | 98% | 100% |
| GPT-5.2 Codex | 25 | 23 | 92% | 100% | 100% |
| Grok 4 Fast Non-Reasoning | 25 | 0 | 0% | 0% | 0% |
modal-dialog
| Model | Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|---|
| DeepSeek V3.2 | 25 | 0 | 0% | 0% | 0% |
| Claude Haiku 4.5 | 25 | 0 | 0% | 0% | 0% |
| Claude Opus 4.6 | 25 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.5 | 25 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 25 | 0 | 0% | 0% | 0% |
| Gemini 3 Pro Preview | 25 | 0 | 0% | 0% | 0% |
| GPT-5 Mini | 25 | 9 | 36% | 92% | 100% |
| GPT-5.2 | 25 | 7 | 28% | 84% | 99% |
| GPT-5.2 Codex | 25 | 0 | 0% | 0% | 0% |
| Grok 4 Fast Non-Reasoning | 25 | 0 | 0% | 0% | 0% |
shopping-home-page
| Model | Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|---|
| DeepSeek V3.2 | 25 | 0 | 0% | 0% | 0% |
| Claude Haiku 4.5 | 25 | 0 | 0% | 0% | 0% |
| Claude Opus 4.6 | 25 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.5 | 25 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 25 | 0 | 0% | 0% | 0% |
| Gemini 3 Pro Preview | 25 | 0 | 0% | 0% | 0% |
| GPT-5 Mini | 25 | 2 | 8% | 37% | 65% |
| GPT-5.2 | 25 | 5 | 20% | 71% | 94% |
| GPT-5.2 Codex | 25 | 0 | 0% | 0% | 0% |
| Grok 4 Fast Non-Reasoning | 25 | 0 | 0% | 0% | 0% |
simple-contact-form
| Model | Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|---|
| DeepSeek V3.2 | 25 | 0 | 0% | 0% | 0% |
| Claude Haiku 4.5 | 25 | 0 | 0% | 0% | 0% |
| Claude Opus 4.6 | 25 | 0 | 0% | 0% | 0% |
| Claude Sonnet 4.5 | 25 | 0 | 0% | 0% | 0% |
| Gemini 3 Flash Preview | 25 | 0 | 0% | 0% | 0% |
| Gemini 3 Pro Preview | 25 | 0 | 0% | 0% | 0% |
| GPT-5 Mini | 25 | 5 | 20% | 71% | 94% |
| GPT-5.2 | 25 | 17 | 68% | 100% | 100% |
| GPT-5.2 Codex | 25 | 0 | 0% | 0% | 0% |
| Grok 4 Fast Non-Reasoning | 25 | 0 | 0% | 0% | 0% |
Control analysis
This section summarizes where models perform well, where they struggle, and the most frequent types of accessibility issues observed across all samples.
Most common axe WCAG failures
| Rule | Impact | Failures | % of failures | Seen in models | Seen in test cases | Description |
|---|---|---|---|---|---|---|
| color-contrast | serious | 763 | 92.3% | 10 | 4 | Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds |
| link-name | serious | 28 | 3.4% | 3 | 1 | Ensure links have discernible text |
| aria-hidden-focus | serious | 12 | 1.5% | 2 | 2 | Ensure aria-hidden elements are not focusable nor contain focusable elements |
| aria-prohibited-attr | serious | 6 | 0.7% | 1 | 1 | Ensure ARIA attributes are not prohibited for an element's role |
| button-name | critical | 5 | 0.6% | 1 | 1 | Ensure buttons have discernible text |
| aria-required-children | critical | 3 | 0.4% | 1 | 1 | Ensure elements with an ARIA role that require child roles contain them |
| link-in-text-block | serious | 3 | 0.4% | 3 | 2 | Ensure links are distinguished from surrounding text in a way that does not rely on color |
| aria-required-parent | critical | 2 | 0.2% | 1 | 1 | Ensure elements with an ARIA role that require parent roles are contained by them |
| image-alt | critical | 2 | 0.2% | 1 | 1 | Ensure `<img>` elements have alternative text or a role of none or presentation |
| aria-allowed-attr | critical | 1 | 0.1% | 1 | 1 | Ensure an element's role supports its ARIA attributes |
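Because `color-contrast` accounts for the large majority of axe WCAG failures, it helps to see what the rule measures. The sketch below computes the WCAG 2 contrast ratio from the relative luminance of two sRGB colors. The helper names are hypothetical, and axe-core's actual check also has to account for factors such as transparency, background images, and text size, which this sketch ignores.

```javascript
// Sketch of the WCAG 2 contrast-ratio formula. Helper names are hypothetical.
function relativeLuminance(hex) {
  // Expects a 6-digit hex color such as "#767676".
  const channels = [1, 3, 5].map((i) => parseInt(hex.slice(i, i + 2), 16) / 255);
  // Convert each sRGB channel to linear light as defined by WCAG 2.
  const [r, g, b] = channels.map((c) =>
    c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4)
  );
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

function contrastRatio(foreground, background) {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  const lighter = Math.max(a, b);
  const darker = Math.min(a, b);
  return (lighter + 0.05) / (darker + 0.05);
}

console.log(contrastRatio("#000000", "#ffffff").toFixed(2)); // 21.00
console.log(contrastRatio("#767676", "#ffffff") >= 4.5); // true
console.log(contrastRatio("#999999", "#ffffff") >= 4.5); // false
```

Mid-gray text on a white background, as in the last line, is a typical way to fall below the 4.5:1 threshold for normal-size text.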
Most common axe best-practice failures
| Rule | Impact | Failures | % of failures | Seen in models | Seen in test cases | Description |
|---|---|---|---|---|---|---|
| region | moderate | 894 | 46.9% | 10 | 4 | Ensure all page content is contained by landmarks |
| landmark-one-main | moderate | 639 | 33.5% | 10 | 3 | Ensure the document has a main landmark |
| page-has-heading-one | moderate | 142 | 7.5% | 7 | 2 | Ensure that the page, or at least one of its frames contains a level-one heading |
| heading-order | moderate | 118 | 6.2% | 10 | 3 | Ensure the order of headings is semantically correct |
| aria-allowed-role | minor | 30 | 1.6% | 3 | 2 | Ensure role attribute has an appropriate value for the element |
| landmark-complementary-is-top-level | moderate | 24 | 1.3% | 2 | 1 | Ensure the complementary landmark or aside is at top level |
| landmark-unique | moderate | 19 | 1.0% | 3 | 3 | Ensure landmarks are unique |
| landmark-no-duplicate-banner | moderate | 14 | 0.7% | 2 | 3 | Ensure the document has at most one banner landmark |
| landmark-contentinfo-is-top-level | moderate | 10 | 0.5% | 2 | 1 | Ensure the contentinfo landmark is at top level |
| landmark-banner-is-top-level | moderate | 9 | 0.5% | 2 | 1 | Ensure the banner landmark is at top level |
Assertion-level patterns (per test case)
disclosure-widget
| Assertion | Type | Failure rate | Failures / total |
|---|---|---|---|
| All examples have a valid semantics | R | 54% | 134 / 250 |
| Collapsed content is hidden from assistive technology | R | 22% | 55 / 250 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
modal-dialog
| Assertion | Type | Failure rate | Failures / total |
|---|---|---|---|
| Each modal dialog hides content behind it while open | R | 90% | 225 / 250 |
| Each modal dialog takes focus when opened | R | 80% | 201 / 250 |
| Each dialog can be closed by escape key | BP | 58% | 144 / 250 |
| Focus is not lost when each dialog closes | R | 57% | 142 / 250 |
| Each modal dialog traps keyboard focus | R | 53% | 132 / 250 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
shopping-home-page
| Assertion | Type | Failure rate | Failures / total |
|---|---|---|---|
| Has a single maincontent | R | 64% | 161 / 250 |
| Has a single footer | R | 6% | 14 / 250 |
| Has a single banner | R | 4% | 11 / 250 |
| Has at least one navigation | R | 4% | 11 / 250 |
| Has at least one h2 | R | 2% | 4 / 250 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
simple-contact-form
| Assertion | Type | Failure rate | Failures / total |
|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 87% | 218 / 250 |
| Helper text is programmatically associated | R | 82% | 204 / 250 |
| Required fields are indicated (visually and programmatically) | R | 44% | 109 / 250 |
| Each text input has an accessible name | R | 0% | 0 / 250 |
| Each text input has textbox role | R | 0% | 0 / 250 |
Assertion statistics are computed within this test case only and are not compared across different test cases.
Methodology
This report shows how well various LLMs generate accessible HTML.
- Each test uses a prompt to generate HTML. The generated HTML is then tested for accessibility.
- The prompts intentionally do not include specific accessibility instructions. The goal is to see if the LLMs produce accessible HTML by default.
- The resulting HTML is rendered in a browser via Playwright (Chromium). This allows the HTML's JavaScript and CSS to execute, which can impact accessibility.
- The rendered HTML is evaluated using axe-core to identify common accessibility issues.
- A custom test script (JavaScript) is executed against the rendered page to check for accessibility requirements that are specific to the test case and not covered by axe-core. These tests look for WCAG 2.2 failures and best practices. Best practices do not impact pass/fail results.
- Each test case is run multiple times (samples) to evaluate the consistency and reliability of the LLM's output.
- By default, the harness does not explicitly set a temperature, so each provider/model uses its own default sampling behavior.
- Instruction sets are also evaluated to see how specific accessibility instructions impact results. See the "Instruction sets" section for details.
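To illustrate how axe-core findings can be separated into WCAG failures and best-practice failures, the sketch below partitions an axe-style `violations` array by rule tags. It assumes the result shape that axe-core returns (`id`, `tags`, `nodes`) and that best-practice rules carry the `best-practice` tag. Counting one failure per affected node is an assumption, since this report does not say how failures are counted, and `summarizeViolations` is a hypothetical name rather than part of the harness.

```javascript
// Sketch: split axe-style violations into WCAG and best-practice failure counts.
// Assumptions: one failure per affected node; `summarizeViolations` is hypothetical.
function summarizeViolations(violations) {
  const summary = { wcagFailures: 0, bestPracticeFailures: 0 };
  for (const violation of violations) {
    const count = violation.nodes.length;
    if (violation.tags.includes("best-practice")) {
      summary.bestPracticeFailures += count;
    } else if (violation.tags.some((tag) => tag.startsWith("wcag"))) {
      summary.wcagFailures += count;
    }
  }
  return summary;
}

// Hand-written stand-in for an axe-core result, not real benchmark data.
const exampleViolations = [
  { id: "color-contrast", tags: ["cat.color", "wcag2aa", "wcag143"], nodes: [{}, {}, {}] },
  { id: "region", tags: ["cat.keyboard", "best-practice"], nodes: [{}] },
];

console.log(summarizeViolations(exampleViolations));
// { wcagFailures: 3, bestPracticeFailures: 1 }
```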
System Prompt
You are generating a single standalone HTML document. Do NOT wrap output in markdown fences. Include <head> and <body>. Do NOT explain the code, just output it.
All tests are automated and deterministic (no human intervention). Only a fraction of WCAG's accessibility requirements can be covered this way; many still need human evaluation. As such, these tests are not comprehensive. Even when a sample passes every test, the generated HTML may still fail WCAG and contain serious accessibility issues.
Please leave feedback, review the source code, and contribute test cases, assertions, and other improvements at the GitHub Project.
Glossary
Column Definitions
- Rank: The model's position when sorted by WCAG Pass Rate from highest to lowest (a lower rank number is better).
- WCAG Pass Rate: The percentage of samples that passed all WCAG tests, including both axe-core WCAG checks and custom WCAG assertions. This does not include best practices.
- Avg Total WCAG Failures: The average number of total WCAG failures (axe-core + assertions) per sample for the model. This does not include best practices.
- Avg Axe WCAG Failures: The average number of axe-core detected WCAG failures per sample for the model. This does not include best practices.
- Avg Assertion WCAG Failures: The average number of custom WCAG assertion failures per sample for the model. This does not include best practices.
- Avg Best Practice Failures: The average number of best practice accessibility issues (informational only) per sample for the model. This includes axe-core best practices and best practice assertions.
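As a rough illustration of how these columns relate to one another, the sketch below derives them from per-sample failure counts. The record fields (`axeWcag`, `assertionWcag`, `bestPractice`) and the function name `summarizeModel` are hypothetical; the harness's actual data model is not shown in this report.

```javascript
// Sketch: derive the summary columns from per-sample counts (hypothetical record shape).
function summarizeModel(samples) {
  const mean = (values) => values.reduce((sum, v) => sum + v, 0) / values.length;
  // A sample passes only when it has no axe WCAG and no assertion WCAG failures.
  // Best-practice failures are informational and do not affect pass/fail.
  const passes = samples.filter((s) => s.axeWcag + s.assertionWcag === 0).length;
  return {
    wcagPassRate: passes / samples.length,
    avgTotalWcagFailures: mean(samples.map((s) => s.axeWcag + s.assertionWcag)),
    avgAxeWcagFailures: mean(samples.map((s) => s.axeWcag)),
    avgAssertionWcagFailures: mean(samples.map((s) => s.assertionWcag)),
    avgBestPracticeFailures: mean(samples.map((s) => s.bestPractice)),
  };
}

// Made-up samples for illustration only.
const demoSamples = [
  { axeWcag: 0, assertionWcag: 0, bestPractice: 2 },
  { axeWcag: 3, assertionWcag: 1, bestPractice: 0 },
  { axeWcag: 0, assertionWcag: 2, bestPractice: 4 },
  { axeWcag: 1, assertionWcag: 0, bestPractice: 2 },
];

console.log(summarizeModel(demoSamples));
```

Only the first demo sample passes: it has best-practice failures but no WCAG failures, so the pass rate for this made-up model is 25%.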
Other Glossary Terms
- Assertion: A specific accessibility check defined in the test script. Each assertion checks for a particular accessibility requirement or best practice for the specific test case which is not already tested by axe.
- Axe-core: An open-source accessibility testing engine developed by Deque Systems. It is widely used for automated accessibility testing of web applications.
- Pass@k: A metric that estimates the likelihood of at least one sample passing a test when k samples are randomly selected.
- WCAG: Web Content Accessibility Guidelines, a set of guidelines for making web content more accessible to people with disabilities.
- Test Case: A specific scenario designed to evaluate the accessibility of generated HTML content. Each test case includes a prompt, expected accessibility requirements, and a test script.
Change Log
2/2026 Update
- Test Cases: Added a test case for a simple contact form with assertions for simple form controls. Also fixed some minor bugs in other test cases.
- Instruction Sets: Added instruction set evaluation.
- Report: Updated report layout and added new sections for instruction sets and analysis. Also added filtering by instruction set and by specific assertions within test cases.
Instruction Benchmarks (vs Control)
These results show how well each instruction set performs vs the control configuration (averaged across models). Instruction sets contain specific guidance intended to improve accessibility and are appended to the system prompt.
Several instruction sets are used in this benchmark to help identify which instructions are most effective at improving accessibility. Instruction sets are ranked by their average WCAG pass rate across all models and test cases.
Summary (ranked by avg WCAG pass rate)
| Rank | Instruction Set | Avg Control Pass Rate | Avg Instruction Set Pass Rate | Δ Avg Pass Rate |
|---|---|---|---|---|
| 1 | 2. Detailed Instructions | 10% | 58% | +48.4pp |
| 2 | 1. Basic | 10% | 46% | +36.9pp |
| 3 | 0. Minimal | 10% | 28% | +17.9pp |
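The deltas in this table are differences between averaged pass rates, expressed in percentage points (pp). The sketch below reproduces the "0. Minimal" row, assuming each average is the unweighted mean of the per-model pass rates; the report says results are averaged across models but does not spell out the weighting. `deltaPercentagePoints` is a hypothetical helper, and the inputs are the per-model rates listed in the Results table.

```javascript
// Sketch: average per-model pass rates and report the change in percentage points.
// Assumption: unweighted mean across models; helper name is hypothetical.
function deltaPercentagePoints(controlRates, variantRates) {
  const mean = (values) => values.reduce((sum, v) => sum + v, 0) / values.length;
  return mean(variantRates) - mean(controlRates);
}

// Per-model pass rates (%) from the Results table, in the table's model order.
const controlRates = [0, 0, 0, 0, 30, 41, 23, 0, 2, 0];
const minimalRates = [0, 15, 5, 0, 55, 65, 40, 50, 45, 0];

console.log(deltaPercentagePoints(controlRates, minimalRates).toFixed(1)); // 17.9
```

The delta is computed from the unrounded averages (9.6% and 27.5%), which is why it can differ slightly from the difference between the rounded percentages shown in the table.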
Instruction benchmark details
This section includes per-model benchmark results and the full text of each instruction set.
Instruction sets
0. Minimal
Minimal reminder that all output must be accessible.
Variant samples per (test, model): 5
All output MUST be accessible.
1. Basic
Basic reminder that all output must be accessible (includes slightly more instructions than minimal).
Variant samples per (test, model): 5
All output MUST be accessible. Use semantic HTML first; only use ARIA when necessary, and ensure full keyboard support. Conform to [WCAG 2.2 Level AA](https://www.w3.org/TR/WCAG22/).
2. Detailed Instructions
Detailed instructions for accessibility.
Variant samples per (test, model): 5
# Accessibility instructions (detailed)
You are an expert in accessibility with deep software engineering expertise.
## Non-negotiables (MUST)
- Conform to [WCAG 2.2 Level AA](https://www.w3.org/TR/WCAG22/).
- Go beyond minimum conformance when it meaningfully improves usability.
- If the project uses a UI component library, you MUST use the component patterns as defined from the library. Do not recreate patterns.
- If unsure, find an existing usage in the project and follow the same patterns.
- Ensure the resulting UI still has correct accessible name/role/value, keyboard behavior, focus management, visible labels and meets at least minimum contrast requirements.
- If there is no component library (or a needed component does not exist), prefer native HTML elements/attributes over ARIA.
- Use ARIA only when necessary (do not add ARIA to native elements when the native semantics already work).
- Ensure correct accessible **name, role, value, states, and properties**.
- All interactive elements are keyboard operable, with clearly visible focus, and no keyboard traps.
- Do not claim the output is “fully accessible”.
## Inclusive language (MUST)
- Use respectful, inclusive, people-first language in any user-facing text.
- Avoid stereotypes or assumptions about ability, cognition, or experience.
## Cognitive load (SHOULD)
- Prefer plain language.
- Use consistent page structure (landmarks).
- Keep navigation order consistent.
- Keep the interface clean and simple (avoid unnecessary distractions).
## Structure and semantics
### Page structure (MUST)
- Use landmarks (`header`, `nav`, `main`, `footer`) appropriately.
- Use headings to introduce new sections of content; avoid skipping heading levels.
- Prefer one `h1` for the page topic. Generally, the first heading within the `main` element / landmark.
### Page title (SHOULD)
- Set a descriptive `<title>`.
- Prefer: “Unique page - section - site”.
## Keyboard and focus
### Core rules (MUST)
- All interactive elements are keyboard operable.
- Tab order follows reading order and is predictable.
- Focus is always visible.
- Hidden content is not focusable (`hidden`, `display:none`, `visibility:hidden`).
- If content is hidden to assistive technology by using `aria-hidden=true` then that content, nor any of its descendants, can be focusable.
- Static content MUST NOT be tabbable.
- Exception: if an element needs programmatic focus, use `tabindex="-1"`.
### Skip link / bypass blocks (MUST)
Provide a skip link as the first focusable element.
```html
<header>
<a href="#maincontent" class="sr-only">Skip to main content</a>
<!-- header content -->
</header>
<nav>
<!-- navigation -->
</nav>
<main id="maincontent" tabindex="-1">
<h1><!-- page title --></h1>
<!-- content -->
</main>
```
```css
.sr-only:not(:focus):not(:active) {
clip: rect(0 0 0 0);
clip-path: inset(50%);
height: 1px;
overflow: hidden;
position: absolute;
white-space: nowrap;
width: 1px;
}
```
### Composite widgets (SHOULD)
If a component uses arrow-key navigation within itself (tabs, listbox, menu-like UI, grid/date picker):
- Provide one tab stop for the composite container or one child.
- Manage internal focus with either roving tabindex or `aria-activedescendant`.
Roving tabindex (SHOULD):
- Exactly one focusable item has `tabindex="0"`; all others are `-1`.
- Arrow keys move focus by swapping tabindex and calling `.focus()`.
`aria-activedescendant` (SHOULD):
- Container is implicitly focusable or has `tabindex="0"` and `aria-activedescendant="IDREF"`.
- Arrow keys update `aria-activedescendant`.
## Low vision and contrast (MUST)
### Contrast requirements (MUST)
- Text contrast: at least 4.5:1 (large text: 3:1).
- Large text is at least 24px regular or 18.66px bold.
- Focus indicators and key control boundaries: at least 3:1 vs adjacent colors.
- Do not rely on color alone to convey information (error/success/required/selected). Provide text and/or icons with accessible names.
### Color generation rules (MUST)
- Do not invent arbitrary colors.
- Use project-approved design tokens (CSS variables).
- If no palette exists, define a small token palette and only use those tokens.
- Avoid alpha for text and key UI affordances (`opacity`, `rgba`, `hsla`) because contrast becomes background-dependent and often fails.
- Ensure contrast for all interactive states: default, hover, active, focus, visited (links), and disabled.
### Safe defaults when unsure (SHOULD)
- Prefer very dark text on very light backgrounds, or the reverse.
- Avoid mid-gray text on white; muted text should still meet 4.5:1.
### Tokenized palette contract (SHOULD)
- Define and use tokens like: `--color-bg`, `--color-text`, `--color-muted-text`, `--color-link`, `--color-border`, `--color-focus`, `--color-danger`, `--color-success`.
- Only assign UI colors via these tokens (avoid scattered inline hex values).
### Verification (MUST)
Contrast verification is covered by the Final verification checklist.
## High contrast / forced colors mode (MUST)
### Support OS-level accessibility features (MUST)
- Never override or disrupt OS accessibility settings.
- The UI MUST adapt to High Contrast / Forced Colors mode automatically.
- Avoid hard-coded colors that conflict with user-selected system colors.
### Use the `forced-colors` media query when needed (SHOULD)
Use `@media (forced-colors: active)` only when system defaults are not sufficient.
```css
@media (forced-colors: active) {
/* Example: Replace box-shadow (suppressed in forced-colors) with a border */
.button {
border: 2px solid ButtonBorder;
}
}
/* if using box-shadow for a focus style, also use a transparent outline
so that the outline will render when the high contrast setting is enabled */
.btn:focus {
box-shadow: 0 0 4px 3px rgba(90, 50, 200, .7);
outline: 2px solid transparent;
}
```
In Forced Colors mode, avoid relying on:
- Box shadows
- Decorative gradients
### Respect user color schemes in forced colors (MUST)
- Use system color keywords (e.g., `ButtonText`, `ButtonBorder`, `CanvasText`, `Canvas`).
- Do not use fixed hex/RGB colors inside `@media (forced-colors: active)`.
### Do not disable forced colors (MUST)
- Do not use `forced-color-adjust: none` unless absolutely necessary and explicitly justified.
- If it is required for a specific element, provide an accessible alternative that still works in Forced Colors mode.
### Icons (MUST)
- Icons MUST adapt to text color.
- Prefer `currentColor` for SVG icon fills/strokes; avoid embedding fixed colors inside SVGs.
```css
svg {
fill: currentColor;
stroke: currentColor;
}
```
## Reflow (WCAG 2.2 SC 1.4.10) (MUST)
### Goal (MUST)
Multi-line text must be able to fit within 320px wide containers or viewports, so that users do not need to scroll in two-dimensions to read sections of content.
### Core principles (MUST)
- Preserve information and function: nothing essential is removed, obscured, or truncated.
- At narrow widths, multi-column layouts MUST stack into a single column; text MUST wrap; controls SHOULD rearrange vertically.
- Users MUST NOT need to scroll left/right to read multi-line text.
- If content is collapsed in the narrow layout, the full content/function MUST be available within 1 click (e.g., overflow menu, dialog, tooltip).
### Engineering requirements (MUST)
- Use responsive layout primitives (`flex`, `grid`) with fluid sizing; enable text wrapping.
- Avoid fixed widths that force two-dimensional scrolling at 320px.
- Avoid absolute positioning and `overflow: hidden` when it causes content loss, or would result in the obscuring of content at smaller viewport sizes.
- Media and containers SHOULD NOT overflow the viewport at 320px (for example, prefer `max-width: 100%` for images/video/canvas/iframes).
- In flex/grid layouts, ensure children can shrink/wrap (common fix: `min-width: 0` on flex/grid children).
- Handle long strings (URLs, tokens) without forcing overflow (common fix: `overflow-wrap: anywhere` or equivalent).
- Ensure all interactive elements remain visible, reachable, and operable at 320px.
### Exceptions (SHOULD)
If a component truly requires a two-dimensional layout for meaning/usage (e.g., large data tables, maps, diagrams, charts, games, presentations), allow horizontal scrolling only at the component level.
- The page as a whole MUST still reflow (unless the page layout truely requires two-dimensional layout for usage).
- The component MUST remain fully usable (all content reachable; controls operable).
## Controls and labels
### Visible labels (MUST)
- Every interactive element has a visible label.
- The label cannot disappear while entering text or after the field has a value.
### Voice access (MUST)
- The accessible name of each interactive element MUST contain the visible label.
- If using `aria-label`, include the visual label text.
- If multiple controls share the same visible label (e.g., many “Remove” buttons), use an `aria-label` that keeps the visible label text and adds context (e.g., “Remove item: Socks”).
## Forms
### Labels and help text (MUST)
- Every form control has a programmatic label.
- Prefer `<label for="...">`.
- Labels describe the input purpose.
- If help text exists, associate it with `aria-describedby`.
### Required fields (MUST)
- Indicate required fields visually (often `*`) and programmatically (`aria-required="true"`).
### Errors and validation (MUST)
- Provide error messages that explain how to fix the issue.
- Use `aria-invalid="true"` for invalid fields; remove it when valid.
- Associate inline errors with the field via `aria-describedby`.
- Submit buttons SHOULD NOT be disabled solely to prevent submission.
- On submit with invalid input, focus the first invalid control.
## Graphics and images
All graphics include `img`, `svg`, icon fonts, and emojis.
- Informative graphics MUST have meaningful alternatives.
- `img`: use `alt`.
- `svg`: prefer `role="img"` and `aria-label`/`aria-labelledby`.
- Decorative graphics MUST be hidden.
- `img`: `alt=""`.
- Other: `aria-hidden="true"`.
## Navigation and menus
- Use semantic navigation: `<nav>` with lists and links.
- Do not use `role="menu"` / `role="menubar"` for site navigation.
- For expandable navigation:
- Include button elements to toggle navigation and/or sub-navigations. Use `aria-expanded` on the button to indicate state.
- `Escape` MAY close open sub-navigations.
## Tables and grids
### Tables for static data (MUST)
- Use `<table>` for static tabular data.
- Use `<th>` to associate headers.
- Column headers are in the first row.
- Row headers (when present) use `<th>` in each row.
### Grids for dynamic UIs (SHOULD)
- Use grid roles only for truly interactive/dynamic experiences.
- If using `role="grid"`, grid cells MUST be nested in rows so header/cell relationships are determinable.
- Use arrow navigation to navigate within the grid.
## Final verification checklist (MUST)
Before finalizing output, explicitly verify:
- Structure and semantics: landmarks, headings, and one `h1` for the page topic.
- Keyboard and focus: operable controls, visible focus, predictable tab order, no traps, skip link works.
- Controls and labels: visible labels present and included in accessible names.
- Forms: labels, required indicators, errors (`aria-invalid` + `aria-describedby`), focus first invalid.
- Contrast: meets 4.5:1 / 3:1 thresholds, focus/boundaries meet 3:1, color not the only cue.
- Forced colors: does not break OS High Contrast / Forced Colors; uses system colors in `forced-colors: active`.
- Reflow: sections of content should be able to adjust to 320px width without the need for two-dimensional scrolling to read multi-line text; no content loss; controls remain operable.
- Graphics: informative alternatives; decorative graphics hidden.
- Tables/grids: tables use `<th>`; grids (when needed) are structured with rows and cells.
## Final note
Generate the HTML with accessibility in mind, but accessibility issues may still exist; manual review and testing (for example with Accessibility Insights) is still recommended.
Results
| Model | Instruction Set | Control Pass Rate | Instruction Set Pass Rate | Δ Pass Rate |
|---|---|---|---|---|
| Claude Haiku 4.5 | 0. Minimal | 0% | 0% | +0.0pp |
| Claude Haiku 4.5 | 1. Basic | 0% | 10% | +10.0pp |
| Claude Haiku 4.5 | 2. Detailed Instructions | 0% | 25% | +25.0pp |
| Claude Opus 4.6 | 0. Minimal | 0% | 15% | +15.0pp |
| Claude Opus 4.6 | 1. Basic | 0% | 35% | +35.0pp |
| Claude Opus 4.6 | 2. Detailed Instructions | 0% | 80% | +80.0pp |
| Claude Sonnet 4.5 | 0. Minimal | 0% | 5% | +5.0pp |
| Claude Sonnet 4.5 | 1. Basic | 0% | 15% | +15.0pp |
| Claude Sonnet 4.5 | 2. Detailed Instructions | 0% | 60% | +60.0pp |
| DeepSeek V3.2 | 0. Minimal | 0% | 0% | +0.0pp |
| DeepSeek V3.2 | 1. Basic | 0% | 5% | +5.0pp |
| DeepSeek V3.2 | 2. Detailed Instructions | 0% | 35% | +35.0pp |
| GPT-5 Mini | 0. Minimal | 30% | 55% | +25.0pp |
| GPT-5 Mini | 1. Basic | 30% | 75% | +45.0pp |
| GPT-5 Mini | 2. Detailed Instructions | 30% | 55% | +25.0pp |
| GPT-5.2 | 0. Minimal | 41% | 65% | +24.0pp |
| GPT-5.2 | 1. Basic | 41% | 90% | +49.0pp |
| GPT-5.2 | 2. Detailed Instructions | 41% | 95% | +54.0pp |
| GPT-5.2 Codex | 0. Minimal | 23% | 40% | +17.0pp |
| GPT-5.2 Codex | 1. Basic | 23% | 90% | +67.0pp |
| GPT-5.2 Codex | 2. Detailed Instructions | 23% | 85% | +62.0pp |
| Gemini 3 Flash Preview | 0. Minimal | 0% | 50% | +50.0pp |
| Gemini 3 Flash Preview | 1. Basic | 0% | 80% | +80.0pp |
| Gemini 3 Flash Preview | 2. Detailed Instructions | 0% | 75% | +75.0pp |
| Gemini 3 Pro Preview | 0. Minimal | 2% | 45% | +43.0pp |
| Gemini 3 Pro Preview | 1. Basic | 2% | 65% | +63.0pp |
| Gemini 3 Pro Preview | 2. Detailed Instructions | 2% | 65% | +63.0pp |
| Grok 4 Fast Non-Reasoning | 0. Minimal | 0% | 0% | +0.0pp |
| Grok 4 Fast Non-Reasoning | 1. Basic | 0% | 0% | +0.0pp |
| Grok 4 Fast Non-Reasoning | 2. Detailed Instructions | 0% | 5% | +5.0pp |
Instruction set analysis vs control
This section highlights where each instruction set helped (or hurt) compared to the control, aggregated across all samples for that instruction set.
0. Minimal — overall Δ pass rate +17.9pp
Overall: Control 10% (n=1000) → Variant 28% (n=200). Avg WCAG failures/sample: 8.27 → 5.71 (Δ -2.56).
Most improved test cases
| Test case | Control pass rate | Variant pass rate | Δ pass rate | Δ avg WCAG failures |
|---|---|---|---|---|
| disclosure-widget | 20% | 44% | +23.6pp | -0.36 |
| simple-contact-form | 9% | 30% | +21.2pp | -1.72 |
| modal-dialog | 6% | 24% | +17.6pp | -2.82 |
| shopping-home-page | 3% | 12% | +9.2pp | -5.35 |
Most reduced axe WCAG rules
| Rule | Control rate | Variant rate | Δ rate | Description |
|---|---|---|---|---|
| color-contrast | 76.3% | 53.0% | -23.3pp | Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds |
| link-name | 2.8% | 1.0% | -1.8pp | Ensure links have discernible text |
| aria-hidden-focus | 1.2% | 0.5% | -0.7pp | Ensure aria-hidden elements are not focusable nor contain focusable elements |
| aria-prohibited-attr | 0.6% | 0.0% | -0.6pp | Ensure ARIA attributes are not prohibited for an element's role |
| link-in-text-block | 0.3% | 0.0% | -0.3pp | Ensure links are distinguished from surrounding text in a way that does not rely on color |
Most increased axe WCAG rules
| Rule | Control rate | Variant rate | Δ rate | Description |
|---|---|---|---|---|
| list | 0.0% | 0.5% | +0.5pp | Ensure that lists are structured correctly |
| nested-interactive | 0.0% | 0.5% | +0.5pp | Ensure interactive controls are not nested as they are not always announced by screen readers or can cause focus problems for assistive technologies |
| aria-required-children | 0.3% | 0.5% | +0.2pp | Ensure elements with an ARIA role that require child roles contain them |
Assertion analysis (vs control)
Failure rates are computed per assertion (within each test case) and compared between the variant and control.
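For example, the delta for a single assertion can be reproduced from its failure counts. This is a minimal sketch; `failRateDelta` is a hypothetical helper, and the inputs are the counts reported for "Each modal dialog takes focus when opened" in the control and "0. Minimal" runs.

```javascript
// Sketch: compare an assertion's failure rate between control and variant samples.
// `failRateDelta` is a hypothetical helper, not part of the harness.
function failRateDelta(control, variant) {
  const controlRate = (control.failures / control.total) * 100;
  const variantRate = (variant.failures / variant.total) * 100;
  return { controlRate, variantRate, deltaPp: variantRate - controlRate };
}

// Counts for "Each modal dialog takes focus when opened" (control vs 0. Minimal).
const focusDelta = failRateDelta({ failures: 201, total: 250 }, { failures: 4, total: 50 });
console.log(focusDelta.deltaPp.toFixed(1)); // -72.4
```

A negative delta means the assertion failed less often with the instruction set than in the control.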
Most improved assertions
| Test case | Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|---|
| modal-dialog | Each modal dialog takes focus when opened | R | 80% | 8% | -72.4pp | 201 / 250 | 4 / 50 |
| shopping-home-page | Has a single maincontent | R | 64% | 2% | -62.4pp | 161 / 250 | 1 / 50 |
| simple-contact-form | Helper text is programmatically associated | R | 82% | 22% | -59.6pp | 204 / 250 | 11 / 50 |
| modal-dialog | Focus is not lost when each dialog closes | R | 57% | 0% | -56.8pp | 142 / 250 | 0 / 50 |
| modal-dialog | Each dialog can be closed by escape key | BP | 58% | 2% | -55.6pp | 144 / 250 | 1 / 50 |
| modal-dialog | Each modal dialog traps keyboard focus | R | 53% | 0% | -52.8pp | 132 / 250 | 0 / 50 |
| disclosure-widget | All examples have a valid semantics | R | 54% | 2% | -51.6pp | 134 / 250 | 1 / 50 |
| modal-dialog | Each dialog has a dialog role | R | 52% | 2% | -50.4pp | 131 / 250 | 1 / 50 |
| simple-contact-form | Required fields are indicated (visually and programmatically) | R | 44% | 8% | -35.6pp | 109 / 250 | 4 / 50 |
| simple-contact-form | Inputs use appropriate autocomplete for purpose | R | 87% | 58% | -29.2pp | 218 / 250 | 29 / 50 |
Most regressed assertions
| Test case | Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|---|
| disclosure-widget | Collapsed content is hidden from assistive technology | R | 22% | 54% | +32.0pp | 55 / 250 | 27 / 50 |
| shopping-home-page | Has a single banner | R | 4% | 8% | +3.6pp | 11 / 250 | 4 / 50 |
All assertion deltas (per test case)
disclosure-widget
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Collapsed content is hidden from assistive technology | R | 22% | 54% | +32.0pp | 55 / 250 | 27 / 50 |
| All examples have a valid semantics | R | 54% | 2% | -51.6pp | 134 / 250 | 1 / 50 |
modal-dialog
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Each modal dialog hides content behind it while open | R | 90% | 74% | -16.0pp | 225 / 250 | 37 / 50 |
| Each modal dialog takes focus when opened | R | 80% | 8% | -72.4pp | 201 / 250 | 4 / 50 |
| Each dialog has a dialog role | R | 52% | 2% | -50.4pp | 131 / 250 | 1 / 50 |
| Each dialog can be closed by escape key | BP | 58% | 2% | -55.6pp | 144 / 250 | 1 / 50 |
| Each modal dialog traps keyboard focus | R | 53% | 0% | -52.8pp | 132 / 250 | 0 / 50 |
| Focus is not lost when each dialog closes | R | 57% | 0% | -56.8pp | 142 / 250 | 0 / 50 |
shopping-home-page
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Has a single banner | R | 4% | 8% | +3.6pp | 11 / 250 | 4 / 50 |
| Has a single footer | R | 6% | 2% | -3.6pp | 14 / 250 | 1 / 50 |
| Has a single maincontent | R | 64% | 2% | -62.4pp | 161 / 250 | 1 / 50 |
| Has an h1 | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Has at least one h2 | R | 2% | 0% | -1.6pp | 4 / 250 | 0 / 50 |
| Has single h1 | BP | 2% | 0% | -1.6pp | 4 / 250 | 0 / 50 |
| Has at least one navigation | R | 4% | 0% | -4.4pp | 11 / 250 | 0 / 50 |
simple-contact-form
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 87% | 58% | -29.2pp | 218 / 250 | 29 / 50 |
| Helper text is programmatically associated | R | 82% | 22% | -59.6pp | 204 / 250 | 11 / 50 |
| Required fields are indicated (visually and programmatically) | R | 44% | 8% | -35.6pp | 109 / 250 | 4 / 50 |
| Each text input has an accessible name | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Each text input has textbox role | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Placeholder text is programmatically defined as a property | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Text inputs are keyboard focusable | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Visible label is included in accessible name | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
1. Basic — overall Δ pass rate +36.9pp
Overall: Control 10% (n=1000) → Variant 46% (n=200). Avg WCAG failures/sample: 8.27 → 3.75 (Δ -4.52).
Most improved test cases
| Test case | Control pass rate | Variant pass rate | Δ pass rate | Δ avg WCAG failures |
|---|---|---|---|---|
| simple-contact-form | 9% | 62% | +53.2pp | -2.44 |
| modal-dialog | 6% | 46% | +39.6pp | -4.12 |
| disclosure-widget | 20% | 54% | +33.6pp | -0.74 |
| shopping-home-page | 3% | 24% | +21.2pp | -10.77 |
Most reduced axe WCAG rules
| Rule | Control rate | Variant rate | Δ rate | Description |
|---|---|---|---|---|
| color-contrast | 76.3% | 32.5% | -43.8pp | Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds |
| link-name | 2.8% | 0.0% | -2.8pp | Ensure links have discernible text |
| aria-prohibited-attr | 0.6% | 0.0% | -0.6pp | Ensure ARIA attributes are not prohibited for an element's role |
| button-name | 0.5% | 0.0% | -0.5pp | Ensure buttons have discernible text |
| aria-required-children | 0.3% | 0.0% | -0.3pp | Ensure elements with an ARIA role that require child roles contain them |
Most increased axe WCAG rules
| Rule | Control rate | Variant rate | Δ rate | Description |
|---|---|---|---|---|
| link-in-text-block | 0.3% | 1.0% | +0.7pp | Ensure links are distinguished from surrounding text in a way that does not rely on color |
| list | 0.0% | 0.5% | +0.5pp | Ensure that lists are structured correctly |
| aria-hidden-focus | 1.2% | 1.5% | +0.3pp | Ensure aria-hidden elements are not focusable nor contain focusable elements |
Assertion analysis (vs control)
Failure rates are computed per assertion (within each test case) and compared between the variant and control.
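As a rough sketch of the arithmetic behind the Δ fail rate column, assuming each rate is failures divided by total and the delta is the variant rate minus the control rate in percentage points, the snippet below recomputes the "Helper text is programmatically associated" row (204 / 250 control, 3 / 50 variant). The function name `fail_rate_delta_pp` is hypothetical and is not part of the benchmark tooling.

```python
def fail_rate_delta_pp(control_failures, control_total, variant_failures, variant_total):
    """Return (control rate %, variant rate %, delta in percentage points)."""
    control_rate = 100.0 * control_failures / control_total
    variant_rate = 100.0 * variant_failures / variant_total
    # Negative delta means the variant fails this assertion less often than control.
    return control_rate, variant_rate, variant_rate - control_rate


control, variant, delta = fail_rate_delta_pp(204, 250, 3, 50)
print(f"{control:.0f}% -> {variant:.0f}% ({delta:+.1f}pp)")  # 82% -> 6% (-75.6pp)
```

Under this reading, the delta is taken from the unrounded rates (81.6% and 6.0%), which would explain why a displayed 82% and 6% yield -75.6pp rather than -76pp.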
Most improved assertions
| Test case | Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|---|
| simple-contact-form | Helper text is programmatically associated | R | 82% | 6% | -75.6pp | 204 / 250 | 3 / 50 |
| modal-dialog | Each modal dialog takes focus when opened | R | 80% | 8% | -72.4pp | 201 / 250 | 4 / 50 |
| shopping-home-page | Has a single maincontent | R | 64% | 0% | -64.4pp | 161 / 250 | 0 / 50 |
| modal-dialog | Focus is not lost when each dialog closes | R | 57% | 2% | -54.8pp | 142 / 250 | 1 / 50 |
| modal-dialog | Each dialog can be closed by escape key | BP | 58% | 4% | -53.6pp | 144 / 250 | 2 / 50 |
| simple-contact-form | Inputs use appropriate autocomplete for purpose | R | 87% | 34% | -53.2pp | 218 / 250 | 17 / 50 |
| modal-dialog | Each modal dialog traps keyboard focus | R | 53% | 0% | -52.8pp | 132 / 250 | 0 / 50 |
| modal-dialog | Each dialog has a dialog role | R | 52% | 2% | -50.4pp | 131 / 250 | 1 / 50 |
| disclosure-widget | All examples have a valid semantics | R | 54% | 6% | -47.6pp | 134 / 250 | 3 / 50 |
| simple-contact-form | Required fields are indicated (visually and programmatically) | R | 44% | 0% | -43.6pp | 109 / 250 | 0 / 50 |
Most regressed assertions
| Test case | Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|---|
| disclosure-widget | Collapsed content is hidden from assistive technology | R | 22% | 32% | +10.0pp | 55 / 250 | 16 / 50 |
| shopping-home-page | Has a single banner | R | 4% | 10% | +5.6pp | 11 / 250 | 5 / 50 |
| shopping-home-page | Has single h1 | BP | 2% | 6% | +4.4pp | 4 / 250 | 3 / 50 |
| shopping-home-page | Has an h1 | R | 0% | 2% | +2.0pp | 0 / 250 | 1 / 50 |
All assertion deltas (per test case)
disclosure-widget
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Collapsed content is hidden from assistive technology | R | 22% | 32% | +10.0pp | 55 / 250 | 16 / 50 |
| All examples have a valid semantics | R | 54% | 6% | -47.6pp | 134 / 250 | 3 / 50 |
modal-dialog
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Each modal dialog hides content behind it while open | R | 90% | 50% | -40.0pp | 225 / 250 | 25 / 50 |
| Each modal dialog takes focus when opened | R | 80% | 8% | -72.4pp | 201 / 250 | 4 / 50 |
| Each dialog can be closed by escape key | BP | 58% | 4% | -53.6pp | 144 / 250 | 2 / 50 |
| Each dialog has a dialog role | R | 52% | 2% | -50.4pp | 131 / 250 | 1 / 50 |
| Focus is not lost when each dialog closes | R | 57% | 2% | -54.8pp | 142 / 250 | 1 / 50 |
| Each modal dialog traps keyboard focus | R | 53% | 0% | -52.8pp | 132 / 250 | 0 / 50 |
shopping-home-page
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Has a single banner | R | 4% | 10% | +5.6pp | 11 / 250 | 5 / 50 |
| Has single h1 | BP | 2% | 6% | +4.4pp | 4 / 250 | 3 / 50 |
| Has an h1 | R | 0% | 2% | +2.0pp | 0 / 250 | 1 / 50 |
| Has a single footer | R | 6% | 2% | -3.6pp | 14 / 250 | 1 / 50 |
| Has at least one h2 | R | 2% | 0% | -1.6pp | 4 / 250 | 0 / 50 |
| Has at least one navigation | R | 4% | 0% | -4.4pp | 11 / 250 | 0 / 50 |
| Has a single maincontent | R | 64% | 0% | -64.4pp | 161 / 250 | 0 / 50 |
simple-contact-form
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 87% | 34% | -53.2pp | 218 / 250 | 17 / 50 |
| Helper text is programmatically associated | R | 82% | 6% | -75.6pp | 204 / 250 | 3 / 50 |
| Each text input has an accessible name | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Each text input has textbox role | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Placeholder text is programmatically defined as a property | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Text inputs are keyboard focusable | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Visible label is included in accessible name | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Required fields are indicated (visually and programmatically) | R | 44% | 0% | -43.6pp | 109 / 250 | 0 / 50 |
2. Detailed Instructions — overall Δ pass rate +48.4pp
Overall: Control 10% (n=1000) → Variant 58% (n=200). Avg WCAG failures/sample: 8.27 → 1.27 (Δ -6.99).
Most improved test cases
| Test case | Control pass rate | Variant pass rate | Δ pass rate | Δ avg WCAG failures |
|---|---|---|---|---|
| shopping-home-page | 3% | 72% | +69.2pp | -20.95 |
| simple-contact-form | 9% | 60% | +51.2pp | -2.34 |
| disclosure-widget | 20% | 68% | +47.6pp | -1.26 |
| modal-dialog | 6% | 32% | +25.6pp | -3.42 |
Most reduced axe WCAG rules
| Rule | Control rate | Variant rate | Δ rate | Description |
|---|---|---|---|---|
| color-contrast | 76.3% | 15.0% | -61.3pp | Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds |
| link-name | 2.8% | 0.0% | -2.8pp | Ensure links have discernible text |
| aria-hidden-focus | 1.2% | 0.0% | -1.2pp | Ensure aria-hidden elements are not focusable nor contain focusable elements |
| aria-prohibited-attr | 0.6% | 0.0% | -0.6pp | Ensure ARIA attributes are not prohibited for an element's role |
| button-name | 0.5% | 0.0% | -0.5pp | Ensure buttons have discernible text |
Most increased axe WCAG rules
| Rule | Control rate | Variant rate | Δ rate | Description |
|---|---|---|---|---|
| nested-interactive | 0.0% | 0.5% | +0.5pp | Ensure interactive controls are not nested as they are not always announced by screen readers or can cause focus problems for assistive technologies |
| aria-required-children | 0.3% | 0.5% | +0.2pp | Ensure elements with an ARIA role that require child roles contain them |
Assertion analysis (vs control)
Failure rates are computed per assertion (within each test case) and compared between the variant and control.
Most improved assertions
| Test case | Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|---|
| simple-contact-form | Helper text is programmatically associated | R | 82% | 6% | -75.6pp | 204 / 250 | 3 / 50 |
| shopping-home-page | Has a single maincontent | R | 64% | 0% | -64.4pp | 161 / 250 | 0 / 50 |
| modal-dialog | Each modal dialog takes focus when opened | R | 80% | 24% | -56.4pp | 201 / 250 | 12 / 50 |
| simple-contact-form | Inputs use appropriate autocomplete for purpose | R | 87% | 36% | -51.2pp | 218 / 250 | 18 / 50 |
| disclosure-widget | All examples have a valid semantics | R | 54% | 6% | -47.6pp | 134 / 250 | 3 / 50 |
| simple-contact-form | Required fields are indicated (visually and programmatically) | R | 44% | 0% | -43.6pp | 109 / 250 | 0 / 50 |
| modal-dialog | Focus is not lost when each dialog closes | R | 57% | 16% | -40.8pp | 142 / 250 | 8 / 50 |
| modal-dialog | Each modal dialog traps keyboard focus | R | 53% | 14% | -38.8pp | 132 / 250 | 7 / 50 |
| modal-dialog | Each dialog can be closed by escape key | BP | 58% | 20% | -37.6pp | 144 / 250 | 10 / 50 |
| modal-dialog | Each dialog has a dialog role | R | 52% | 18% | -34.4pp | 131 / 250 | 9 / 50 |
Most regressed assertions
| Test case | Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|---|
| shopping-home-page | Has single h1 | BP | 2% | 6% | +4.4pp | 4 / 250 | 3 / 50 |
| shopping-home-page | Has a single banner | R | 4% | 8% | +3.6pp | 11 / 250 | 4 / 50 |
All assertion deltas (per test case)
disclosure-widget
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Collapsed content is hidden from assistive technology | R | 22% | 22% | +0.0pp | 55 / 250 | 11 / 50 |
| All examples have a valid semantics | R | 54% | 6% | -47.6pp | 134 / 250 | 3 / 50 |
modal-dialog
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Each modal dialog hides content behind it while open | R | 90% | 64% | -26.0pp | 225 / 250 | 32 / 50 |
| Each modal dialog takes focus when opened | R | 80% | 24% | -56.4pp | 201 / 250 | 12 / 50 |
| Each dialog can be closed by escape key | BP | 58% | 20% | -37.6pp | 144 / 250 | 10 / 50 |
| Each dialog has a dialog role | R | 52% | 18% | -34.4pp | 131 / 250 | 9 / 50 |
| Focus is not lost when each dialog closes | R | 57% | 16% | -40.8pp | 142 / 250 | 8 / 50 |
| Each modal dialog traps keyboard focus | R | 53% | 14% | -38.8pp | 132 / 250 | 7 / 50 |
shopping-home-page
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Has a single banner | R | 4% | 8% | +3.6pp | 11 / 250 | 4 / 50 |
| Has single h1 | BP | 2% | 6% | +4.4pp | 4 / 250 | 3 / 50 |
| Has a single footer | R | 6% | 2% | -3.6pp | 14 / 250 | 1 / 50 |
| Has an h1 | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Has at least one h2 | R | 2% | 0% | -1.6pp | 4 / 250 | 0 / 50 |
| Has at least one navigation | R | 4% | 0% | -4.4pp | 11 / 250 | 0 / 50 |
| Has a single maincontent | R | 64% | 0% | -64.4pp | 161 / 250 | 0 / 50 |
simple-contact-form
| Assertion | Type | Control fail rate | Variant fail rate | Δ fail rate | Control failures/total | Variant failures/total |
|---|---|---|---|---|---|---|
| Inputs use appropriate autocomplete for purpose | R | 87% | 36% | -51.2pp | 218 / 250 | 18 / 50 |
| Helper text is programmatically associated | R | 82% | 6% | -75.6pp | 204 / 250 | 3 / 50 |
| Each text input has an accessible name | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Each text input has textbox role | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Placeholder text is programmatically defined as a property | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Text inputs are keyboard focusable | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Visible label is included in accessible name | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Visual labels are defined and persistent | R | 0% | 0% | +0.0pp | 0 / 250 | 0 / 50 |
| Required fields are indicated (visually and programmatically) | R | 44% | 0% | -43.6pp | 109 / 250 | 0 / 50 |
Detailed Results
disclosure-widget
Prompt
- Generate an HTML file that demonstrates an expand/collapse widget.
- Wrap each widget with a div that has an `example` class.
- Give the container for controlled content a `details` class.
DeepSeek V3.2
— 0%
— 0%
— 20%
— 60%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 1
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 20% | 100% | 100% |
Samples: 5 | Passes: 3
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 60% | 100% | 100% |
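The aggregates above are consistent with the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k) for n samples with c passes. The sketch below assumes that estimator; the report does not state its exact formula. Treating k larger than n as 100% whenever at least one sample passed is also an assumption, chosen because it matches the pass@10 figures shown for the 5-sample aggregates. The function name `pass_at_k` is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k draws (without replacement)
    from n samples passes, given that c of the n samples passed."""
    if c == 0:
        return 0.0  # no passing samples, so no draw can pass
    if n - c < k:
        return 1.0  # fewer failing samples than draws (assumed to cover k > n too)
    return 1.0 - comb(n - c, k) / comb(n, k)


# (samples, passes) pairs taken from the aggregates above
for n, c in [(25, 0), (5, 0), (5, 1), (5, 3)]:
    rates = [f"{pass_at_k(n, c, k):.0%}" for k in (1, 5, 10)]
    print(f"Samples: {n} | Passes: {c} ->", rates)
```

With these assumptions the function gives 0% across the board for the two zero-pass aggregates, 20% / 100% / 100% for one pass in five, and 60% / 100% / 100% for three passes in five.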
Sample 0 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 6
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
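Sample 5 passes both assertions yet is still marked FAIL. That is consistent with a verdict that requires zero axe WCAG failures in addition to passing assertions, while best-practice issues are ignored, as the "does not affect pass/fail" notes state. A minimal sketch of such a rule follows. The `Sample` type, its field names, and the choice to treat BP-type assertions as non-blocking are assumptions for illustration, not the benchmark's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    axe_wcag_failures: int
    # Each entry is (assertion name, type, passed); type "BP" is assumed non-blocking.
    assertions: list = field(default_factory=list)
    best_practice_issues: int = 0  # reported, but never consulted for the verdict


def verdict(sample: Sample) -> str:
    """Return "PASS" only if there are no axe WCAG failures and no failed blocking assertion."""
    blocking_ok = all(passed for _, kind, passed in sample.assertions if kind != "BP")
    return "PASS" if sample.axe_wcag_failures == 0 and blocking_ok else "FAIL"


sample_5 = Sample(
    axe_wcag_failures=3,
    assertions=[
        ("All examples have a valid semantics", "R", True),
        ("Collapsed content is hidden from assistive technology", "R", True),
    ],
    best_practice_issues=5,
)
print(verdict(sample_5))  # FAIL: the three color-contrast failures block the pass
```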
Sample 6 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 8
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 8
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 8
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 8
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 5
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 2
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 8
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 8
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 7
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (7) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (DeepSeek V3.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 2
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - nested-interactive (serious): Ensure interactive controls are not nested as they are not always announced by screen readers or can cause focus problems for assistive technologies
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 9
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (9) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 2
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 2
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 5
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Claude Haiku 4.5
Control: 0%
0. Minimal: 0%
1. Basic: 20%
2. Detailed Instructions: 0%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 1
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 20% | 100% | 100% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
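The pass@k figures in these tables are consistent with the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), for n samples with c passes. The sketch below is an assumption about how the report derives them, not its actual code; the function name `pass_at_k` is hypothetical. It treats a model with zero passes as 0% and saturates at 100% when k exceeds the number of failing samples, which matches how the tables report pass@10 for 5-sample sets.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples passes.

    n: total samples, c: passing samples, k: samples drawn without replacement.
    Assumption: the report uses the standard unbiased estimator.
    """
    if c == 0:
        # No passing sample exists, so no draw can contain one.
        return 0.0
    if n - c < k:
        # Fewer failing samples than draws: at least one pass is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# The 5-sample set with 1 pass shown in the tables above.
for k in (1, 5, 10):
    print(f"pass@{k}: {pass_at_k(5, 1, k):.0%}")
```

With 5 samples and 1 pass this prints 20%, 100%, and 100%, the same values as the third table above. Note that pass@10 on a 5-sample set cannot be drawn literally, so the 100% there only reflects that any pass at all saturates the estimate.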
Sample 0 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0090
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 3 | $0.0079
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0079
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0079
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0091
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 2 | $0.0082
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 10 | $0.0085
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0084
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0080
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 8 | $0.0095
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0075
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 5 | BP: 12 | $0.0101
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (12) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0076
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0080
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0080
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 8 | BP: 11 | $0.0077
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (8) ❌
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (11) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 7 | $0.0074
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (7) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0084
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 7 | $0.0095
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (7) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0078
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0085
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0084
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0083
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0075
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0078
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 10 | $0.0129
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail - elementHandle.$eval: Failed to find element matching selector ".details"
Axe Best Practice Issues (10) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 12 | $0.0114
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (12) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Haiku 4.5)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 3 | $0.0091
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
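The sample above passes despite three best-practice issues, which fits the pattern visible across the listed samples: a sample is marked PASS only when it has zero Axe WCAG failures and every assertion passes, while best-practice issues are reported but ignored. A minimal sketch of that rule follows, inferred from the listed results rather than taken from the benchmark's source; the `SampleResult` type and `verdict` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class SampleResult:
    """One sample's counts as they appear in the report (hypothetical type)."""
    axe_wcag_failures: int
    best_practice_issues: int
    assertions: dict[str, bool] = field(default_factory=dict)


def verdict(sample: SampleResult) -> str:
    """Return PASS only if there are no WCAG failures and all assertions pass."""
    wcag_clean = sample.axe_wcag_failures == 0
    assertions_ok = all(sample.assertions.values())
    # best_practice_issues is deliberately not consulted here.
    return "PASS" if wcag_clean and assertions_ok else "FAIL"


# Axe WCAG 0, BP 3, both assertions pass (Claude Haiku 4.5, 1. Basic, Sample 2).
clean = SampleResult(0, 3, {"valid semantics": True, "collapsed content hidden": True})
# Axe WCAG 2, both assertions pass (DeepSeek V3.2, 2. Detailed Instructions, Sample 2).
contrast = SampleResult(2, 0, {"valid semantics": True, "collapsed content hidden": True})
# Axe WCAG 0, BP 9, one assertion fails (Claude Haiku 4.5, 1. Basic, Sample 3).
assertion_fail = SampleResult(0, 9, {"valid semantics": True, "collapsed content hidden": False})

print(verdict(clean), verdict(contrast), verdict(assertion_fail))
```

The three inputs mirror samples listed in this report and reproduce their recorded outcomes: PASS, FAIL, FAIL.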
Sample 3 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 9 | $0.0104
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (9) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 6 | BP: 7 | $0.0111
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (7) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 2 | BP: 9 | $0.0088
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (9) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 10 | $0.0098
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (10) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0101
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0098
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0109
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0206
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail - elementHandle.$eval: Failed to find element matching selector ".details"
Sample 1 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0158
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail - elementHandle.$eval: Failed to find element matching selector ".details"
Sample 2 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0194
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 3 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0193
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail - elementHandle.$eval: Failed to find element matching selector ".details"
Sample 4 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0207
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Claude Opus 4.6
Control: 0%
0. Minimal: 20%
1. Basic: 40%
2. Detailed Instructions: 100%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 1
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 20% | 100% | 100% |
Samples: 5 | Passes: 2
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 40% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
Sample 0 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0303
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0355
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0352
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0311
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0347
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0321
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0304
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0359
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0346
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0306
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0345
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0306
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0313
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0295
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0351
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0307
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0301
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0319
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0350
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0318
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0349
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0313
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0301
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0355
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0299
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0498
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Opus 4.6)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0684
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0631
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0619
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Opus 4.6)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0621
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0432
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0409
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0422
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Opus 4.6)
Instruction set: 0. Minimal
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0336
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0388
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0907
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0963
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.1345
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0969
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0979
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
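The PASS/FAIL verdicts in the samples above appear to follow a simple rule: a sample passes only when it has no Axe WCAG failures and every assertion passes, while best-practice issues are reported but ignored. The sketch below encodes that rule as inferred from the listed samples; the `SampleResult` type, its field names, and `verdict` are hypothetical and do not come from the benchmark's own code.

```python
# Sketch of the pass/fail rule as inferred from the sample listings.
# SampleResult and verdict are hypothetical names, not the harness's API.
from dataclasses import dataclass, field


@dataclass
class SampleResult:
    axe_wcag_failures: int       # count shown as "Axe WCAG: N"
    best_practice_issues: int    # count shown as "BP: N" (informational only)
    assertions: dict[str, bool] = field(default_factory=dict)


def verdict(sample: SampleResult) -> str:
    """Return "PASS" when there are no Axe WCAG failures and all assertions pass."""
    passed = sample.axe_wcag_failures == 0 and all(sample.assertions.values())
    return "PASS" if passed else "FAIL"


# Mirrors a Control sample with no Axe WCAG failures but a failed assertion.
control = SampleResult(0, 6, {"valid semantics": False, "collapsed content hidden": True})
# Mirrors a passing sample that still has best-practice issues.
basic = SampleResult(0, 2, {"valid semantics": True, "collapsed content hidden": True})

print(verdict(control))  # FAIL
print(verdict(basic))    # PASS
```

None of the samples listed here combines Axe WCAG failures with fully passing assertions, so the Axe half of the rule is an assumption based on the note that only best-practice issues "do not affect pass/fail".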
Claude Sonnet 4.5
Aggregates are shown when filtering to a specific instruction set.
| Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|
| 25 | 0 | 0% | 0% | 0% |
| 5 | 0 | 0% | 0% | 0% |
| 5 | 1 | 20% | 100% | 100% |
| 5 | 5 | 100% | 100% | 100% |
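The pass@k figures above are consistent with the standard combinatorial estimator, 1 - C(n - c, k) / C(n, k), where n is the number of samples and c the number of passes. The sketch below is one way to reproduce them; clamping k to n when fewer than k samples exist is an assumption made to match the reported values (for example, 0% at pass@10 with 5 samples and 0 passes), and `pass_at_k` is a hypothetical name rather than the benchmark's own function.

```python
# Sketch of a pass@k estimator that reproduces the aggregates shown above.
# The clamping of k to n is an assumption, not confirmed by the report.
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn without replacement
    from n samples (c of which pass) is a passing sample."""
    if n <= 0 or not 0 <= c <= n or k <= 0:
        raise ValueError("require n > 0, 0 <= c <= n, and k > 0")
    k = min(k, n)  # assumption: cannot draw more samples than exist
    # comb(n - c, k) is 0 when fewer than k failing samples exist,
    # so the estimate becomes 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


for n, c in [(25, 0), (5, 0), (5, 1), (5, 5)]:
    row = " | ".join(f"{round(100 * pass_at_k(n, c, k))}%" for k in (1, 5, 10))
    print(f"Samples: {n} | Passes: {c} | {row}")
```

With 5 samples and 1 pass, any draw of all 5 samples must include the passing one, which is why pass@5 and pass@10 jump to 100% even though pass@1 is only 20%.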
Sample 0 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0212
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0383
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 11 | $0.0233
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (11) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 2 | $0.0327
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 2 | $0.0319
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0206
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0205
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 3 | $0.0392
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0218
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 12 | $0.0391
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (12) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0191
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0404
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 3 | $0.0341
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 7 | $0.0218
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (7) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 11 | $0.0255
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (11) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0386
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0345
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0212
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 9 | $0.0224
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (9) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 11 | $0.0210
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (11) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 3 | $0.0313
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0370
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0190
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0186
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0235
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0228
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Sonnet 4.5)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0236
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0271
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0250
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0277
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0385
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 8 | $0.0279
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 2 | $0.0239
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0266
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 7 | $0.0291
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (7) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0619
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0500
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0492
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0491
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0507
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
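The PASS/FAIL labels in the samples above appear to follow one rule: a sample passes only when it has zero Axe WCAG failures and every assertion passes, while best-practice issues are reported but do not count. The sketch below restates that rule. The `SampleResult` class and its field names are hypothetical and are not taken from the benchmark's own code.

```python
from dataclasses import dataclass, field


@dataclass
class SampleResult:
    """One generated sample, using the counts printed in the report (hypothetical names)."""
    axe_wcag_failures: int
    best_practice_issues: int
    assertions: dict[str, bool] = field(default_factory=dict)


def verdict(sample: SampleResult) -> str:
    """Return "PASS" only if there are no Axe WCAG failures and all assertions pass."""
    all_assertions_pass = all(sample.assertions.values())
    # Best-practice issues are intentionally ignored: the report states
    # that they do not affect pass/fail.
    if sample.axe_wcag_failures == 0 and all_assertions_pass:
        return "PASS"
    return "FAIL"


# Values copied from three Claude Sonnet 4.5 samples in the report.
basic_sample_1 = SampleResult(0, 5, {"valid semantics": True, "collapsed hidden": True})
basic_sample_0 = SampleResult(3, 5, {"valid semantics": True, "collapsed hidden": True})
control_sample_16 = SampleResult(0, 2, {"valid semantics": False, "collapsed hidden": True})

print(verdict(basic_sample_1))     # PASS
print(verdict(basic_sample_0))     # FAIL (color-contrast failures)
print(verdict(control_sample_16))  # FAIL (assertion failure)
```

Under this reading, a sample with several best-practice issues can still pass, which is what "1. Basic" sample 1 shows.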
Gemini 3 Flash Preview
| Instruction set | Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|---|
| Control | 25 | 0 | 0% | 0% | 0% |
| 0. Minimal | 5 | 4 | 80% | 100% | 100% |
| 1. Basic | 5 | 3 | 60% | 100% | 100% |
| 2. Detailed Instructions | 5 | 5 | 100% | 100% | 100% |
Sample 0 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Gemini 3 Pro Preview
Control: 8%
0. Minimal: 60%
1. Basic: 80%
2. Detailed Instructions: 60%
Aggregates are shown when filtering to a specific instruction set.
Control | Samples: 25 | Passes: 2
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 8% | 37% | 65% |
0. Minimal | Samples: 5 | Passes: 3
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 60% | 100% | 100% |
1. Basic | Samples: 5 | Passes: 4
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 80% | 100% | 100% |
2. Detailed Instructions | Samples: 5 | Passes: 3
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 60% | 100% | 100% |
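The pass@k figures above match the widely used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples and c the number of passes. The page does not say which formula it uses, so the sketch below is an assumption that happens to reproduce the reported numbers. The function name `pass_at_k` and the handling of k larger than n are illustrative choices, not taken from the benchmark's code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples drawn from n passes.

    n: total samples, c: passing samples, k: draw size.
    Assumed formula: 1 - C(n - c, k) / C(n, k).
    """
    if c == 0:
        # No passing sample exists, so no draw can contain one.
        return 0.0
    if n - c < k:
        # Fewer failures than draws: every draw must include a pass.
        # This also covers k > n (assumption), which matches the 100%
        # reported for 5 samples with 3 passes at k = 10.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Gemini 3 Pro Preview, Control: 25 samples, 2 passes.
for k in (1, 5, 10):
    print(f"pass@{k}: {pass_at_k(25, 2, k):.0%}")
```

With 25 samples and 2 passes this gives 8%, 37%, and 65%, the values in the Control table. Because the 5-sample instruction sets have fewer samples than k = 10, their pass@10 values say little beyond "at least one sample passed".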
Sample 0 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Gemini 3 Pro Preview)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Gemini 3 Pro Preview)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Gemini 3 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (Gemini 3 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (Gemini 3 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 1
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (1) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (Gemini 3 Pro Preview)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 0 (Gemini 3 Pro Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (3) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3 Pro Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (3) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3 Pro Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 1
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (1) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3 Pro Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3 Pro Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Gemini 3 Pro Preview)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (Gemini 3 Pro Preview)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (Gemini 3 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (Gemini 3 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (Gemini 3 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 1
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (1) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
GPT-5 Mini
Control: 56%
0. Minimal: 80%
1. Basic: 100%
2. Detailed Instructions: 80%
Aggregates are shown when filtering to a specific instruction set.
Control | Samples: 25 | Passes: 14
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 56% | 99% | 100% |
0. Minimal | Samples: 5 | Passes: 4
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 80% | 100% | 100% |
1. Basic | Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
2. Detailed Instructions | Samples: 5 | Passes: 4
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 80% | 100% | 100% |
Sample 0 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 4
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (1) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (1) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 7
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (7) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (1) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
Sample 7 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (1) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 3
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (1) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 7
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (1) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
Axe Best Practice Issues (7) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 23 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 0 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 1
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (1) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5 Mini)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5 Mini)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 2 (GPT-5 Mini)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5 Mini)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (GPT-5 Mini)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 4 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
GPT-5.2
| Instruction set | Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|---|
| Control | 25 | 12 | 48% | 98% | 100% |
| 1. Basic | 5 | 5 | 100% | 100% | 100% |
| 0. Minimal | 5 | 5 | 100% | 100% | 100% |
| 2. Detailed Instructions | 5 | 4 | 80% | 100% | 100% |
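The pass@k figures reported for each model can be reproduced from the sample and pass counts alone. The sketch below assumes the standard unbiased estimator, pass@k = 1 - C(n - c, k) / C(n, k), where n is the number of samples and c the number of passes. The page does not state which formula it uses, but the reported values are consistent with this one. The function name `pass_at_k` is hypothetical, and returning 100% when k exceeds n and at least one sample passed is an assumption chosen to match the 5-sample rows.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k drawn samples passes.

    n: total samples, c: passing samples, k: number of samples drawn.
    Assumes the unbiased estimator 1 - C(n - c, k) / C(n, k).
    """
    if c == 0:
        # No passing samples: no draw can contain a pass.
        return 0.0
    if n - c < k:
        # Fewer failing samples than draws, so any draw of k must include
        # a pass. This also covers k > n (an assumption, see lead-in).
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# GPT-5.2 control run: 25 samples, 12 passes.
for k in (1, 5, 10):
    print(f"pass@{k}: {pass_at_k(25, 12, k):.0%}")
```

With 25 samples and 12 passes this prints 48%, 98%, and 100%, matching the control aggregates for GPT-5.2.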
Sample 0 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 1 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 4 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 6
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - aria-allowed-attr (critical): Ensure an element's role supports its ARIA attributes
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 6 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (1) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
Sample 7 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 9 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 12 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 13 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 8
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 18 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 19 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (1) ❌
- (1x) - link-in-text-block (serious): Ensure links are distinguished from surrounding text in a way that does not rely on color
Sample 22 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 23 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 0 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 0 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 0 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (GPT-5.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
GPT-5.2 Codex
| Instruction set | Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|---|
| Control | 25 | 23 | 92% | 100% | 100% |
| 1. Basic | 5 | 5 | 100% | 100% | 100% |
| 0. Minimal | 5 | 5 | 100% | 100% | 100% |
| 2. Detailed Instructions | 5 | 5 | 100% | 100% | 100% |
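The PASS and FAIL verdicts in the sample logs appear to follow a simple rule: a sample passes only when it has zero Axe WCAG failures and every assertion passes, while Axe best-practice issues are reported but do not affect the verdict. The sketch below illustrates that rule as inferred from the logs. It is not the benchmark's actual implementation, and the `SampleResult` type and `verdict` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class SampleResult:
    """Hypothetical record of one logged sample."""
    axe_wcag_failures: int
    assertions_passed: list[bool] = field(default_factory=list)
    best_practice_issues: int = 0  # reported only; never changes the verdict


def verdict(sample: SampleResult) -> str:
    """Return PASS when there are no Axe WCAG failures and no failed assertions."""
    ok = sample.axe_wcag_failures == 0 and all(sample.assertions_passed)
    return "PASS" if ok else "FAIL"


# Clean sample with best-practice issues only: still a pass.
print(verdict(SampleResult(0, [True, True], best_practice_issues=2)))
# One Axe WCAG failure (for example color-contrast) with passing assertions: fail.
print(verdict(SampleResult(1, [True, True], best_practice_issues=2)))
# No Axe WCAG failures but one failed assertion: fail.
print(verdict(SampleResult(0, [True, False])))
```

Keeping best-practice counts out of the verdict mirrors the "does not affect pass/fail" note attached to each best-practice issue in the logs.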
Sample 0 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 5 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 7 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 8 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 10 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 11 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 12 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 13 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 14 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 16 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 17 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 18 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 19 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 20 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 21 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 22 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (GPT-5.2 Codex)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 0 (GPT-5.2 Codex)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5.2 Codex)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5.2 Codex)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5.2 Codex)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (GPT-5.2 Codex)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 1 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 2 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 3 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Sample 4 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: All examples have a valid semantics (R): pass
- ✅: Collapsed content is hidden from assistive technology (R): pass
Grok 4 Fast Non-Reasoning
Pass rate by instruction set: 0% | 0% | 0% | 0%
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
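The report does not state how the pass@k columns are computed. A minimal sketch, assuming the standard unbiased estimator `1 - C(n - c, k) / C(n, k)` for `n` samples with `c` passes, is shown below. The function name `pass_at_k` is hypothetical, and clamping `k` to `n` (needed for the 5-sample blocks that still report pass@10) is an assumption rather than documented behaviour. Under these assumptions the sketch reproduces the 8% / 37% / 65% row reported for 25 samples with 2 passes in the pass@k table earlier in the report.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples drawn from n passes.

    n: total samples, c: passing samples, k: draw size.
    Assumed formula (not confirmed by the report): 1 - C(n - c, k) / C(n, k).
    """
    if n < 1 or not 0 <= c <= n or k < 1:
        raise ValueError("need n >= 1, 0 <= c <= n, k >= 1")
    # Assumption: when fewer than k samples exist, use all n of them.
    k = min(k, n)
    # comb(n - c, k) is 0 when k > n - c, which yields an estimate of 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 25 samples, 0 passes: every column is 0%.
    print([f"{pass_at_k(25, 0, k):.0%}" for k in (1, 5, 10)])
    # 25 samples, 2 passes: 8%, 37%, 65%.
    print([f"{pass_at_k(25, 2, k):.0%}" for k in (1, 5, 10)])
```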
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 9
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (9) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 9
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (9) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 9
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (9) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 12
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (12) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 9
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (9) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 8
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (6) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 7
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (7) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 8
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (5) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 8
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (8) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ❌: All examples have a valid semantics (R): fail
- ✅: Collapsed content is hidden from assistive technology (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: All examples have a valid semantics (R): pass
- ❌: Collapsed content is hidden from assistive technology (R): fail
modal-dialog
Prompt
Create an example of a modal dialog component. It is closed by default, and the button to open it has a `trigger` class.
DeepSeek V3.2
Pass rate by instruction set: 0% | 0% | 0% | 0%
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Sample 0 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 8
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (8) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
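The report does not spell out the rule behind each PASS/FAIL line. One reading consistent with the samples listed here is that a sample passes only when Axe reports zero WCAG failures and every assertion tagged `(R)` passes, while `(BP)` assertions and Axe best-practice issues are ignored. The sketch below encodes that reading. The names `Assertion` and `sample_passes` are hypothetical, and interpreting `(R)` as "required" and `(BP)` as "best practice" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    name: str
    level: str    # "R" or "BP", copied from the tag shown in the report
    passed: bool


def sample_passes(axe_wcag_failures: int, assertions: list[Assertion]) -> bool:
    """Assumed verdict rule: no Axe WCAG failures and no failed (R) assertion.

    Axe best-practice counts are deliberately not an input, since the report
    marks them as not affecting pass/fail.
    """
    required_ok = all(a.passed for a in assertions if a.level == "R")
    return axe_wcag_failures == 0 and required_ok


if __name__ == "__main__":
    # Mirrors the sample above: Axe WCAG is 0, but an (R) assertion fails.
    failing = [
        Assertion("Each dialog has a dialog role", "R", False),
        Assertion("Each dialog can be closed by escape key", "BP", False),
    ]
    print(sample_passes(0, failing))  # False
```

Under this reading, a sample with only best-practice issues (for example `Axe WCAG: 0 | BP: 2` with all `(R)` assertions passing) still counts as a pass.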
Sample 1 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 8
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 6 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (6) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 8
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (8) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 8
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 8
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (8) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 8
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 8
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 8
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 8
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 8 | BP: 8
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (8) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 8
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 1 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail - locator.click: Timeout 30000ms exceeded. Call log: - waiting for getByRole('button', { name: /\b(close|okay|ok|dismiss|exit|cancel|submit|apply|x)\b/iu }).first() - locator resolved to <button id="closeButton" class="modal-close" aria-label="Close dialog">↵ ×↵ </button> - attempting click action - 2 × waiting for element to be visible, enabled and stable - element is not stable - retrying click action - waiting 20ms - 2 × waiting for element to be visible, enabled and stable - element is not stable - retrying click action - waiting 100ms - 58 × waiting for element to be visible, enabled and stable - element is not visible - retrying click action - waiting 500ms
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail - locator.click: Timeout 30000ms exceeded. Call log: - waiting for getByRole('button', { name: /\b(close|okay|ok|dismiss|exit|cancel|submit|apply|x)\b/iu }).first() - locator resolved to <button id="closeButton" class="modal-close" aria-label="Close dialog">↵ ×↵ </button> - attempting click action - 2 × waiting for element to be visible, enabled and stable - element is not stable - retrying click action - waiting 20ms - 2 × waiting for element to be visible, enabled and stable - element is not stable - retrying click action - waiting 100ms - 58 × waiting for element to be visible, enabled and stable - element is not visible - retrying click action - waiting 500ms
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 3 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 4 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 0 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 4 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 6
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (6) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail - locator.click: Timeout 30000ms exceeded. Call log: - waiting for locator('.trigger').filter({ visible: true }).first() - locator resolved to <button type="button" class="button trigger" aria-haspopup="dialog" data-modal-target="demo-modal">Open Example Modal</button> - attempting click action - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div hidden="" role="dialog" id="demo-modal" aria-modal="true" class="modal-overlay" aria-labelledby="modal-title" aria-describedby="modal-description">…</div> intercepts pointer events - retrying click action - waiting 20ms - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div hidden="" role="dialog" id="demo-modal" aria-modal="true" class="modal-overlay" aria-labelledby="modal-title" aria-describedby="modal-description">…</div> intercepts pointer events - retrying click action - waiting 100ms - 58 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div hidden="" role="dialog" id="demo-modal" aria-modal="true" class="modal-overlay" aria-labelledby="modal-title" aria-describedby="modal-description">…</div> intercepts pointer events - retrying click action - waiting 500ms
- ❌: Each dialog can be closed by escape key (BP): fail - utils is not defined
- ❌: Each modal dialog traps keyboard focus (R): fail - utils is not defined
- ❌: Each modal dialog takes focus when opened (R): fail - utils is not defined
- ❌: Focus is not lost when each dialog closes (R): fail - utils is not defined
- ❌: Each modal dialog hides content behind it while open (R): fail - utils is not defined
Sample 1 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 4 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Claude Haiku 4.5
— 0%
— 0%
— 20%
— 20%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 1
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 20% | 100% | 100% |
Samples: 5 | Passes: 1
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 20% | 100% | 100% |
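The pass@k figures in the tables above can be reproduced with the commonly used combinatorial estimator, 1 - C(n - c, k) / C(n, k), where n is the number of samples and c the number of passes. The report does not state which formula it uses, so the sketch below is an assumption. The function name `pass_at_k` is hypothetical, as is the choice to clamp k to n when fewer than k samples exist.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k sampled outputs passes.

    n: total samples, c: passing samples, k: number of samples drawn.
    Assumption: when k exceeds n, k is clamped to n (the report shows
    pass@10 values for groups of only 5 samples).
    """
    if not 0 <= c <= n or k < 1:
        raise ValueError("require 0 <= c <= n and k >= 1")
    if c == 0:
        return 0.0  # no passing sample can ever be drawn
    k = min(k, n)
    if n - c < k:
        return 1.0  # every draw of k samples must include a pass
    # Probability that all k draws (without replacement) are failures,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    for n, c in [(25, 0), (5, 0), (5, 1)]:
        row = " | ".join(f"{pass_at_k(n, c, k):.0%}" for k in (1, 5, 10))
        print(f"Samples: {n} | Passes: {c} -> {row}")
```

Under these assumptions, 5 samples with 1 pass give 20% for pass@1 and 100% for pass@5 and pass@10, and any group with 0 passes gives 0% throughout, which agrees with the tables above.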
Sample 0 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 2 | BP: 4 | $0.0093
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0092
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0091
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0089
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0097
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0097
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0091
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0085
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0095
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0086
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 2 | BP: 4 | $0.0080
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0090
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0088
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0087
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0089
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0094
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0091
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0085
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 2 | BP: 4 | $0.0093
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0085
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0094
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0086
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0087
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 6 | $0.0083
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0083
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0111
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0084
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0112
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Haiku 4.5)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0114
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 2 | $0.0125
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0101
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0102
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0105
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0101
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0128
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 0 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0194
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 1 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0182
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Sample 2 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0211
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 3 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0183
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Sample 4 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0200
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Claude Opus 4.6
Aggregates for each instruction-set filter, in the order shown on the page:
| Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|
| 25 | 0 | 0% | 0% | 0% |
| 5 | 0 | 0% | 0% | 0% |
| 5 | 0 | 0% | 0% | 0% |
| 5 | 1 | 20% | 100% | 100% |
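The report does not state which formula produces these pass@k values. The sketch below uses the common combinatorial estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples and c the number of passes. It reproduces the aggregates above (5 samples with 1 pass gives 20%, 100%, 100%) and the 25-sample figures in the control summary (2 passes gives 8%, 37%, 65%). The function name `pass_at_k` and the choice to clamp k to n when k exceeds the sample count (as with pass@10 on 5 samples) are assumptions, not details taken from the benchmark's code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples, drawn without replacement
    from n samples of which c pass, is a passing sample."""
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if k < 1:
        raise ValueError("k must be at least 1")
    # Assumption: if k exceeds n, every sample is drawn, so clamp k to n.
    k = min(k, n)
    # comb(n - c, k) counts the draws made up only of failing samples;
    # it is 0 when fewer than k failing samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # (samples, passes) pairs taken from the aggregates above.
    for n, c in [(25, 0), (5, 0), (5, 1)]:
        row = " | ".join(f"{pass_at_k(n, c, k):.0%}" for k in (1, 5, 10))
        print(f"Samples: {n} | Passes: {c} | {row}")
```

With only 5 samples per instruction set, a single pass is enough to push pass@5 and pass@10 to 100%, so those columns are best read alongside the raw pass count.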
Sample 0 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0450
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0462
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0453
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 3 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0458
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 4 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0462
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 5 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0473
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail - locator.evaluate: AbortError: The user aborted a request.
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 6 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0460
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 7 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0453
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 8 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0453
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 9 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0462
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 10 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0453
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 11 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0458
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 12 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0435
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 13 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0462
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 14 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0465
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 15 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0465
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 16 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0478
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail - locator.evaluate: AbortError: The user aborted a request.
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 17 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0455
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 18 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0455
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 19 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0468
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 20 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0461
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 21 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0499
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 22 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0455
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 23 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0462
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 24 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0459
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 0 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0614
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0578
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0575
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 3 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0608
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0572
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 0 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0518
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 1 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0553
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
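The color-contrast failures reported in these samples refer to the WCAG 2 AA minimum contrast ratio. As a rough illustration of what that check measures, the sketch below computes the ratio from the WCAG 2.x relative-luminance definition. The function names are hypothetical and this is not the Axe implementation; it assumes opaque sRGB colors given as 8-bit integers, a 4.5:1 threshold for normal text, and 3:1 for large text.

```python
from typing import Tuple

RGB = Tuple[int, int, int]  # 8-bit sRGB channels, assumed fully opaque


def _linearize(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    """Relative luminance: 0.0 for black, about 1.0 for white."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: RGB, bg: RGB) -> float:
    """Contrast ratio (lighter + 0.05) / (darker + 0.05), from 1 to 21."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(fg: RGB, bg: RGB, large_text: bool = False) -> bool:
    """True if the pair meets the AA minimum (4.5:1 normal, 3:1 large text)."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


if __name__ == "__main__":
    white = (255, 255, 255)
    print(meets_aa((118, 118, 118), white))  # #767676 on white
    print(meets_aa((119, 119, 119), white))  # #777777 on white
```

Mid-gray text on a white background sits right at the threshold, which is one reason generated styles with muted gray text tend to trip this rule.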
Sample 2 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0499
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0546
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | $0.0505
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.1061
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 1 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.1115
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 2 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.1048
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 3 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.1018
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 4 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.1056
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Claude Sonnet 4.5
— 0%
— 0%
— 0%
— 0%
| Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|
| 25 | 0 | 0% | 0% | 0% |
| 5 | 0 | 0% | 0% | 0% |
| 5 | 0 | 0% | 0% | 0% |
| 5 | 0 | 0% | 0% | 0% |
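For reference, the pass@k figures above can be reproduced with the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples and c the number of passes. The report does not state its exact formula, so treat this as a sketch under that assumption. The function name is hypothetical, and capping k at n when fewer than k samples exist is also an assumption.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes.

    n: total samples, c: passing samples, k: samples drawn without replacement.
    """
    if n < 1 or k < 1 or not 0 <= c <= n:
        raise ValueError("require n >= 1, k >= 1 and 0 <= c <= n")
    k = min(k, n)  # assumption: with fewer than k samples, use all of them
    # comb(n - c, k) is 0 when k > n - c, so the estimate becomes 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 25 samples with 0 passes, as in the first aggregate row above
    for k in (1, 5, 10):
        print(f"pass@{k}: {pass_at_k(25, 0, k):.0%}")
```

With 0 passes the estimate is 0% for every k. With 2 passes out of 25 it gives 8%, 37% and 65% for k = 1, 5 and 10, which agrees with the Gemini 3 Pro Preview row in the summary table.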
Sample 0 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0279
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0273
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0262
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0263
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0266
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0260
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0265
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0260
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 6 | BP: 4 | $0.0270
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (6) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0277
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 6 | BP: 4 | $0.0257
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (6) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0258
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0267
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0266
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0236
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0269
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0256
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0277
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 6 | BP: 4 | $0.0262
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (6) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0296
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0278
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0266
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0232
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0274
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0259
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0357
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 1 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0349
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0353
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0367
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 4 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0345
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 2 | $0.0363
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 4 | $0.0329
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 4 | $0.0345
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 4 | BP: 2 | $0.0373
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0344
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 0 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0684
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 1 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0565
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0634
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 3 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0645
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 4 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0603
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
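The assertion that fails in every non-control sample above is "Each modal dialog hides content behind it while open". The source does not state how the harness tests this, so the following is only a sketch of one common approach: when the dialog opens, mark every element outside the dialog's ancestor chain as inert, and undo that on close. It uses a hypothetical `TreeNode` type so it runs without a browser; the analogous DOM properties would be `parentElement`, `children`, and `inert`.

```typescript
// Minimal stand-in for a DOM element (hypothetical type for illustration).
interface TreeNode {
  name: string;
  parent: TreeNode | null;
  children: TreeNode[];
  inert: boolean;
}

function makeNode(name: string, parent: TreeNode | null = null): TreeNode {
  const node: TreeNode = { name, parent, children: [], inert: false };
  if (parent !== null) parent.children.push(node);
  return node;
}

// Walk up from the dialog and mark each ancestor's other children as inert.
// Returns the nodes that were changed so they can be restored on close.
function hideBackground(dialog: TreeNode): TreeNode[] {
  const changed: TreeNode[] = [];
  let current: TreeNode = dialog;
  let parent: TreeNode | null = current.parent;
  while (parent !== null) {
    for (const sibling of parent.children) {
      if (sibling !== current && !sibling.inert) {
        sibling.inert = true;
        changed.push(sibling);
      }
    }
    current = parent;
    parent = current.parent;
  }
  return changed;
}

// Undo exactly what hideBackground changed, leaving pre-existing inert nodes alone.
function restoreBackground(changed: TreeNode[]): void {
  for (const node of changed) node.inert = false;
}
```

In a real page the same effect is commonly achieved with the `inert` attribute, with `aria-hidden="true"` on background content, or by opening a native `<dialog>` with `showModal()`. Which of these the test accepts is not stated in the source.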
Gemini 3 Flash Preview
Pass@k aggregates, one row per instruction set:
| Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|
| 25 | 0 | 0% | 0% | 0% |
| 5 | 4 | 80% | 100% | 100% |
| 5 | 5 | 100% | 100% | 100% |
| 5 | 0 | 0% | 0% | 0% |
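The aggregates above are consistent with the usual unbiased pass@k estimator, 1 - C(n - c, k) / C(n, k), where n is the number of samples and c the number of passes. The sketch below is an assumption about how such values could be computed, not the report's own code. The function name `passAtK` is hypothetical, and the handling of k larger than n simply mirrors what the tables show (100% when at least one sample passes, 0% otherwise).

```typescript
// Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).
// The ratio of binomials is computed as a running product to avoid large factorials.
function passAtK(n: number, c: number, k: number): number {
  if (c === 0) return 0; // no passing samples, so no draw can contain a pass
  if (n - c < k) return 1; // fewer than k failing samples (assumed to cover k > n as well)
  let allFail = 1; // probability that all k drawn samples fail
  for (let i = 0; i < k; i++) {
    allFail *= (n - c - i) / (n - i);
  }
  return 1 - allFail;
}

// Format as a whole-number percentage, as in the tables.
function asPercent(p: number): string {
  return `${Math.round(p * 100)}%`;
}
```

With 5 samples and 4 passes, `asPercent(passAtK(5, 4, 1))` gives `80%` and `asPercent(passAtK(5, 4, 5))` gives `100%`, matching the corresponding row.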
Sample 0 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (4) ⚠️
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
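The PASS/FAIL verdicts in the sample listings are consistent with a simple rule: a sample passes only when it has zero Axe WCAG failures and every assertion tagged (R) passes. The report states that Axe Best Practice issues do not affect pass/fail; treating (BP)-tagged assertions the same way is an assumption here, not something the report confirms. The sketch below uses hypothetical names (`Assertion`, `sample_passes`) to illustrate that rule and is not the benchmark's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    name: str
    kind: str      # "R" or "BP", copied from the tag shown in the report
    passed: bool


def sample_passes(axe_wcag_failures: int, assertions: list[Assertion]) -> bool:
    """Assumed verdict rule: no Axe WCAG failures and all (R) assertions pass.

    (BP) assertions and Axe Best Practice issues are ignored here; ignoring
    (BP) assertions is an assumption, not confirmed by the report.
    """
    required_ok = all(a.passed for a in assertions if a.kind == "R")
    return axe_wcag_failures == 0 and required_ok


# Mirrors a sample where only the "hides content behind it" check fails.
one_required_failure = [
    Assertion("Each dialog has a dialog role", "R", True),
    Assertion("Each dialog can be closed by escape key", "BP", True),
    Assertion("Each modal dialog traps keyboard focus", "R", True),
    Assertion("Each modal dialog takes focus when opened", "R", True),
    Assertion("Focus is not lost when each dialog closes", "R", True),
    Assertion("Each modal dialog hides content behind it while open", "R", False),
]
all_passing = [Assertion(a.name, a.kind, True) for a in one_required_failure]

print(sample_passes(0, one_required_failure))  # False
print(sample_passes(0, all_passing))           # True
print(sample_passes(2, all_passing))           # False
```

Under this rule a single failed (R) assertion is enough to fail a sample even when Axe reports no WCAG failures, which matches the verdicts listed for such samples.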
Gemini 3 Pro Preview
Aggregates are shown when filtering to a specific instruction set.
| Pass rate | Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|---|
| 0% | 25 | 0 | 0% | 0% | 0% |
| 20% | 5 | 1 | 20% | 100% | 100% |
| 100% | 5 | 5 | 100% | 100% | 100% |
| 20% | 5 | 1 | 20% | 100% | 100% |
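The pass@k figures above can be reproduced from the sample and pass counts alone. The sketch below uses the common combinatorial estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples and c the number of passes. The report does not state its exact formula, so this is an assumption that happens to match the published numbers. Returning 100% whenever fewer than k samples failed (which includes k larger than n) is also an assumption, chosen because it matches the 5-sample rows.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples, drawn
    without replacement from n samples of which c passed, is a pass."""
    if c == 0:
        return 0.0   # no passing sample exists to draw
    if n - c < k:
        return 1.0   # fewer than k failures: every draw of k includes a pass
    return 1.0 - comb(n - c, k) / comb(n, k)


# Rows with 5 samples and 1 pass: 20%, 100%, 100%
for k in (1, 5, 10):
    print(f"pass@{k}: {pass_at_k(5, 1, k):.0%}")
```

With 25 samples and 2 passes, the same function gives about 37% for pass@5 and 65% for pass@10, which agrees with the Control aggregate reported for this model.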
Sample 0 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 5 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 6 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 8 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 9 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 10 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 11 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 12 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 13 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 14 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 15 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 16 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 17 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 20 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 21 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 22 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 24 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (Gemini 3 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (Gemini 3 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 2 (Gemini 3 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (Gemini 3 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (Gemini 3 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 0 (Gemini 3 Pro Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 1 (Gemini 3 Pro Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (Gemini 3 Pro Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 3 (Gemini 3 Pro Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 4 (Gemini 3 Pro Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 0 (Gemini 3 Pro Preview)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 1 (Gemini 3 Pro Preview)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (Gemini 3 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (Gemini 3 Pro Preview)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 4 (Gemini 3 Pro Preview)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
GPT-5 Mini
— 36%
— 80%
— 80%
— 60%
Samples: 25 | Passes: 9
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 36% | 92% | 100% |
Samples: 5 | Passes: 4
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 80% | 100% | 100% |
Samples: 5 | Passes: 4
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 80% | 100% | 100% |
Samples: 5 | Passes: 3
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 60% | 100% | 100% |
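The aggregate rows above can be reproduced with the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples and c the number of passes. The report does not state which estimator it uses, so this is a minimal sketch under that assumption; the function name `pass_at_k` is hypothetical and not taken from the benchmark's code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k drawn samples passes.

    Assumes the standard unbiased estimator 1 - C(n - c, k) / C(n, k);
    the report itself does not name the formula it uses.
    """
    if c == 0:
        return 0.0  # no passing samples, so no draw can pass
    if n - c < k:
        return 1.0  # fewer than k failing samples: every draw includes a pass
    return 1.0 - comb(n - c, k) / comb(n, k)


# (samples, passes) pairs copied from the GPT-5 Mini aggregates above.
for n, c in [(25, 9), (5, 4), (5, 3)]:
    row = " | ".join(f"{round(100 * pass_at_k(n, c, k))}%" for k in (1, 5, 10))
    print(f"Samples: {n} | Passes: {c} -> {row}")
```

Under this assumption, 9 passes out of 25 samples gives 36%, 92%, and 100% for k = 1, 5, and 10, which matches the first table. With only 5 samples, any k of 5 or more saturates at 100% as soon as one sample passes, so pass@5 and pass@10 carry little information for the smaller instruction sets.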
Sample 0 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (2) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 2 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
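Sample 3 passes every assertion and fails only on color-contrast. For reference, the WCAG 2 contrast ratio behind that rule can be sketched as below. The formula and the AA thresholds (4.5:1 for normal text, 3:1 for large text) come from WCAG 2; the helper names and the example colors are hypothetical and are not taken from the sampled output.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2 definition)."""
    s = channel / 255
    # 0.03928 is the threshold as published in the WCAG 2 relative luminance definition.
    return s / 12.92 if s <= 0.03928 else ((s + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio (lighter + 0.05) / (darker + 0.05), from 1.0 to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """Check the WCAG 2 AA minimum: 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)


# Hypothetical colors for illustration only.
print(meets_aa("#767676", "#ffffff"))
print(meets_aa("#777777", "#ffffff"))
```

The two example grays differ by a single step per channel yet fall on opposite sides of the 4.5:1 threshold on a white background, which is why small styling choices can produce the color-contrast failures listed throughout these samples.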
Sample 4 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 5 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 6 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (3) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 7 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 8 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 9 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ❌: Each modal dialog traps keyboard focus (R): fail
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail - locator.click: Timeout 30000ms exceeded. Call log: - waiting for locator('.trigger').filter({ visible: true }).first() - locator resolved to <button id="open-1" class="trigger">Open Modal</button> - attempting click action - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div id="modal-desc" class="modal__body">…</div> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 20ms - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 100ms - 14 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div id="modal-desc" class="modal__body">…</div> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div id="modal-desc" class="modal__body">…</div> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms
- ❌: Each dialog can be closed by escape key (BP): fail - locator.click: Timeout 30000ms exceeded. Call log: - waiting for locator('.trigger').filter({ visible: true }).first() - locator resolved to <button id="open-1" class="trigger">Open Modal</button> - attempting click action - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div id="modal-desc" class="modal__body">…</div> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 20ms - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 100ms - 14 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div id="modal-desc" class="modal__body">…</div> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div id="modal-desc" class="modal__body">…</div> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms
- ❌: Each modal dialog traps keyboard focus (R): fail - locator.click: Timeout 30000ms exceeded. Call log: - waiting for locator('.trigger').filter({ visible: true }).first() - locator resolved to <button id="open-1" class="trigger">Open Modal</button> - attempting click action - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div id="modal-desc" class="modal__body">…</div> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 20ms - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 100ms - 14 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div id="modal-desc" class="modal__body">…</div> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div id="modal-desc" class="modal__body">…</div> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms
- ❌: Each modal dialog takes focus when opened (R): fail - locator.click: Timeout 30000ms exceeded. Call log: - waiting for locator('.trigger').filter({ visible: true }).first() - locator resolved to <button id="open-1" class="trigger">Open Modal</button> - attempting click action - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div id="modal-desc" class="modal__body">…</div> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 20ms - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 100ms - 14 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div id="modal-desc" class="modal__body">…</div> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div id="modal-desc" class="modal__body">…</div> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms
- ❌: Focus is not lost when each dialog closes (R): fail - locator.click: Timeout 30000ms exceeded. Call log: - waiting for locator('.trigger').filter({ visible: true }).first() - locator resolved to <button id="open-1" class="trigger">Open Modal</button> - attempting click action - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div id="modal-desc" class="modal__body">…</div> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 20ms - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 100ms - 14 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div id="modal-desc" class="modal__body">…</div> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div id="modal-desc" class="modal__body">…</div> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms
- ❌: Each modal dialog hides content behind it while open (R): fail - locator.click: Timeout 30000ms exceeded. Call log: - waiting for locator('.trigger').filter({ visible: true }).first() - locator resolved to <button id="open-1" class="trigger">Open Modal</button> - attempting click action - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div id="modal-desc" class="modal__body">…</div> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 20ms - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 100ms - 14 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div id="modal-desc" class="modal__body">…</div> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <label for="email">Email</label> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div id="modal-desc" class="modal__body">…</div> from <div id="modal" class="modal" aria-hidden="true">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms
Sample 11 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 12 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 13 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 14 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 15 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 16 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 17 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 18 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 19 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 20 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 21 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 22 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 23 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 24 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
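The PASS/FAIL verdicts in the control samples above appear consistent with a simple rule: a sample passes only when it has zero axe WCAG failures and every required (R) assertion passes, while best-practice (BP) findings are reported but ignored. The sketch below encodes that inferred rule. The type and field names are hypothetical and not taken from the benchmark's code, and treating the BP-level assertion as non-blocking is an assumption, since no sample in this section fails only that assertion.

```typescript
// Hypothetical result shape; names are illustrative, not the benchmark's own.
type Level = "R" | "BP"; // R = required, BP = best practice

interface Assertion {
  name: string;
  level: Level;
  passed: boolean;
}

interface SampleResult {
  axeWcagFailures: number;
  axeBestPracticeIssues: number; // reported only; not used in the verdict
  assertions: Assertion[];
}

// Inferred rule (an assumption): zero axe WCAG failures AND no failed
// required assertion. Best-practice results never change the verdict.
function samplePasses(s: SampleResult): boolean {
  const requiredOk = s.assertions
    .filter((a) => a.level === "R")
    .every((a) => a.passed);
  return s.axeWcagFailures === 0 && requiredOk;
}

// Builds the six assertions listed per sample; only the last one varies here.
function dialogAssertions(hidesContent: boolean): Assertion[] {
  return [
    { name: "Each dialog has a dialog role", level: "R", passed: true },
    { name: "Each dialog can be closed by escape key", level: "BP", passed: true },
    { name: "Each modal dialog traps keyboard focus", level: "R", passed: true },
    { name: "Each modal dialog takes focus when opened", level: "R", passed: true },
    { name: "Focus is not lost when each dialog closes", level: "R", passed: true },
    { name: "Each modal dialog hides content behind it while open", level: "R", passed: hidesContent },
  ];
}

// Mirrors Sample 16: no WCAG failures, 4 best-practice issues, all assertions pass.
const sample16: SampleResult = {
  axeWcagFailures: 0,
  axeBestPracticeIssues: 4,
  assertions: dialogAssertions(true),
};

// Mirrors Sample 13: no WCAG failures, one required assertion fails.
const sample13: SampleResult = {
  axeWcagFailures: 0,
  axeBestPracticeIssues: 0,
  assertions: dialogAssertions(false),
};

console.log(samplePasses(sample16)); // true
console.log(samplePasses(sample13)); // false
```

Under this reading, Sample 12 fails on its two color-contrast findings alone even though all six assertions pass, and Sample 16 passes despite four best-practice issues.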
Sample 0 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 2 (GPT-5 Mini)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 0 (GPT-5 Mini)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (GPT-5 Mini)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 2 (GPT-5 Mini)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (GPT-5 Mini)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5 Mini)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 0 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail - locator.click: Timeout 30000ms exceeded. Call log: - waiting for locator('.trigger').filter({ visible: true }).first() - locator resolved to <button id="openDialog" aria-expanded="false" aria-haspopup="dialog" class="btn primary trigger">↵ Open Dialog↵ </button> - attempting click action - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div hidden="" class="overlay" id="modalOverlay" aria-hidden="true">…</div> intercepts pointer events - retrying click action - waiting 20ms - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div hidden="" class="overlay" id="modalOverlay" aria-hidden="true">…</div> intercepts pointer events - retrying click action - waiting 100ms - 58 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div hidden="" class="overlay" id="modalOverlay" aria-hidden="true">…</div> intercepts pointer events - retrying click action - waiting 500ms
- ❌: Each dialog can be closed by escape key (BP): fail - locator.click: Timeout 30000ms exceeded. Call log: - waiting for locator('.trigger').filter({ visible: true }).first() - locator resolved to <button id="openDialog" aria-expanded="false" aria-haspopup="dialog" class="btn primary trigger">↵ Open Dialog↵ </button> - attempting click action - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div hidden="" class="overlay" id="modalOverlay" aria-hidden="true">…</div> intercepts pointer events - retrying click action - waiting 20ms - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div hidden="" class="overlay" id="modalOverlay" aria-hidden="true">…</div> intercepts pointer events - retrying click action - waiting 100ms - 58 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div hidden="" class="overlay" id="modalOverlay" aria-hidden="true">…</div> intercepts pointer events - retrying click action - waiting 500ms
- ❌: Each modal dialog traps keyboard focus (R): fail - locator.click: Timeout 30000ms exceeded. Call log: - waiting for locator('.trigger').filter({ visible: true }).first() - locator resolved to <button id="openDialog" aria-expanded="false" aria-haspopup="dialog" class="btn primary trigger">↵ Open Dialog↵ </button> - attempting click action - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div hidden="" class="overlay" id="modalOverlay" aria-hidden="true">…</div> intercepts pointer events - retrying click action - waiting 20ms - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div hidden="" class="overlay" id="modalOverlay" aria-hidden="true">…</div> intercepts pointer events - retrying click action - waiting 100ms - 58 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div hidden="" class="overlay" id="modalOverlay" aria-hidden="true">…</div> intercepts pointer events - retrying click action - waiting 500ms
- ❌: Each modal dialog takes focus when opened (R): fail - locator.click: Timeout 30000ms exceeded. Call log: - waiting for locator('.trigger').filter({ visible: true }).first() - locator resolved to <button id="openDialog" aria-expanded="false" aria-haspopup="dialog" class="btn primary trigger">↵ Open Dialog↵ </button> - attempting click action - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div hidden="" class="overlay" id="modalOverlay" aria-hidden="true">…</div> intercepts pointer events - retrying click action - waiting 20ms - 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div hidden="" class="overlay" id="modalOverlay" aria-hidden="true">…</div> intercepts pointer events - retrying click action - waiting 100ms - 58 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <div hidden="" class="overlay" id="modalOverlay" aria-hidden="true">…</div> intercepts pointer events - retrying click action - waiting 500ms
- ❌: Focus is not lost when each dialog closes (R): fail - locator.click: Timeout 30000ms exceeded. Call log: [2m - waiting for locator('.trigger').filter({ visible: true }).first()[22m [2m - locator resolved to <button id="openDialog" aria-expanded="false" aria-haspopup="dialog" class="btn primary trigger">↵ Open Dialog↵ </button>[22m [2m - attempting click action[22m [2m 2 × waiting for element to be visible, enabled and stable[22m [2m - element is visible, enabled and stable[22m [2m - scrolling into view if needed[22m [2m - done scrolling[22m [2m - <div hidden="" class="overlay" id="modalOverlay" aria-hidden="true">…</div> intercepts pointer events[22m [2m - retrying click action[22m [2m - waiting 20ms[22m [2m 2 × waiting for element to be visible, enabled and stable[22m [2m - element is visible, enabled and stable[22m [2m - scrolling into view if needed[22m [2m - done scrolling[22m [2m - <div hidden="" class="overlay" id="modalOverlay" aria-hidden="true">…</div> intercepts pointer events[22m [2m - retrying click action[22m [2m - waiting 100ms[22m [2m 58 × waiting for element to be visible, enabled and stable[22m [2m - element is visible, enabled and stable[22m [2m - scrolling into view if needed[22m [2m - done scrolling[22m [2m - <div hidden="" class="overlay" id="modalOverlay" aria-hidden="true">…</div> intercepts pointer events[22m [2m - retrying click action[22m [2m - waiting 500ms[22m
- ❌: Each modal dialog hides content behind it while open (R): fail - locator.click: Timeout 30000ms exceeded. Call log: [2m - waiting for locator('.trigger').filter({ visible: true }).first()[22m [2m - locator resolved to <button id="openDialog" aria-expanded="false" aria-haspopup="dialog" class="btn primary trigger">↵ Open Dialog↵ </button>[22m [2m - attempting click action[22m [2m 2 × waiting for element to be visible, enabled and stable[22m [2m - element is visible, enabled and stable[22m [2m - scrolling into view if needed[22m [2m - done scrolling[22m [2m - <div hidden="" class="overlay" id="modalOverlay" aria-hidden="true">…</div> intercepts pointer events[22m [2m - retrying click action[22m [2m - waiting 20ms[22m [2m 2 × waiting for element to be visible, enabled and stable[22m [2m - element is visible, enabled and stable[22m [2m - scrolling into view if needed[22m [2m - done scrolling[22m [2m - <div hidden="" class="overlay" id="modalOverlay" aria-hidden="true">…</div> intercepts pointer events[22m [2m - retrying click action[22m [2m - waiting 100ms[22m [2m 58 × waiting for element to be visible, enabled and stable[22m [2m - element is visible, enabled and stable[22m [2m - scrolling into view if needed[22m [2m - done scrolling[22m [2m - <div hidden="" class="overlay" id="modalOverlay" aria-hidden="true">…</div> intercepts pointer events[22m [2m - retrying click action[22m [2m - waiting 500ms[22m
Sample 2 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Sample 3 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
GPT-5.2
Control: 28%
0. Minimal: 60%
1. Basic: 80%
2. Detailed Instructions: 100%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 7
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 28% | 84% | 99% |
Samples: 5 | Passes: 3
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 60% | 100% | 100% |
Samples: 5 | Passes: 4
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 80% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
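The report does not state how its pass@k values are computed. The sketch below assumes the commonly used unbiased estimator, 1 - C(n - c, k) / C(n, k) for n samples with c passes, which reproduces the GPT-5.2 rows above. The function name `pass_at_k` is hypothetical, and clamping k to n for the 5-sample sets is an assumption made so that those rows also come out as shown.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Chance that at least one of k samples, drawn without replacement
    from n samples containing c passes, is a pass."""
    # Assumption: when fewer than k samples exist, evaluate on all n of them.
    k = min(k, n)
    if n - c < k:
        # Too few failing samples to fill a draw of size k, so a pass is certain.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # GPT-5.2 control set: 25 samples, 7 passes.
    for k in (1, 5, 10):
        print(f"pass@{k}: {pass_at_k(25, 7, k):.0%}")
```

Counting combinations of failing samples avoids simulating random draws and gives the same value on every run.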
Sample 0 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 1 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail - locator.click: Timeout 30000ms exceeded. Call log: - waiting for locator('.trigger').filter({ visible: true }).first() - locator resolved to <button type="button" class="trigger" data-modal-open="example-modal">↵ Open modal↵ </button> - attempting click action 2 × waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <p>You can place any content here: forms, confirmati…</p> from <div hidden="" class="modal" id="example-modal">…</div> subtree intercepts pointer events - retrying click action - waiting 20ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <p>You can place any content here: forms, confirmati…</p> from <div hidden="" class="modal" id="example-modal">…</div> subtree intercepts pointer events 2 × retrying click action - waiting 100ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <p>This is a simple modal dialog component. Click ou…</p> from <div hidden="" class="modal" id="example-modal">…</div> subtree intercepts pointer events 14 × retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <p>You can place any content here: forms, confirmati…</p> from <div hidden="" class="modal" id="example-modal">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <p>You can place any content here: forms, confirmati…</p> from <div hidden="" class="modal" id="example-modal">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <p>This is a simple modal dialog component. Click ou…</p> from <div hidden="" class="modal" id="example-modal">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <p>This is a simple modal dialog component. Click ou…</p> from <div hidden="" class="modal" id="example-modal">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is visible, enabled and stable - scrolling into view if needed - done scrolling - <p>You can place any content here: forms, confirmati…</p> from <div hidden="" class="modal" id="example-modal">…</div> subtree intercepts pointer events - retrying click action - waiting 500ms
- ❌: Each dialog can be closed by escape key (BP): fail - utils is not defined
- ❌: Each modal dialog traps keyboard focus (R): fail - utils is not defined
- ❌: Each modal dialog takes focus when opened (R): fail - utils is not defined
- ❌: Focus is not lost when each dialog closes (R): fail - utils is not defined
- ❌: Each modal dialog hides content behind it while open (R): fail - utils is not defined
Sample 2 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 4 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
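For reference, the color-contrast rule checks the WCAG 2 contrast ratio between text and its background. The sketch below follows the WCAG 2.x definitions of relative luminance and contrast ratio, with the AA thresholds of 4.5:1 for normal text and 3:1 for large text. The function names are hypothetical, and this is not how axe itself is implemented: axe also has to resolve the rendered colors from the page, which this sketch takes as given.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple) -> float:
    """Relative luminance of an (R, G, B) color with 0-255 channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple, bg: tuple) -> float:
    """Contrast ratio, from 1.0 (identical colors) up to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(fg: tuple, bg: tuple, large_text: bool = False) -> bool:
    """True if the color pair meets the WCAG 2 AA minimum contrast ratio."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


if __name__ == "__main__":
    white = (255, 255, 255)
    print(meets_aa((0x76, 0x76, 0x76), white))  # mid gray that meets AA on white
    print(meets_aa((0x77, 0x77, 0x77), white))  # slightly lighter gray that does not
```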
Sample 5 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 6 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 7 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 8 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 9 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 10 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 11 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 12 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 13 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 14 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 15 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
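The PASS/FAIL verdicts in these listings appear to follow a simple rule, sketched below under stated assumptions. A sample fails if axe reports any WCAG failure or if any assertion tagged (R) fails, while axe best-practice issues are ignored, as the report notes. Reading (R) as "required" and (BP) as "best practice", and treating (BP) assertions as non-blocking, are assumptions. The names `Assertion` and `sample_passes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    name: str
    kind: str     # "R" or "BP", as tagged in the report (meaning assumed)
    passed: bool


def sample_passes(axe_wcag_failures: int, assertions: list) -> bool:
    """Assumed verdict rule: no axe WCAG failures and every (R) assertion passes."""
    required_ok = all(a.passed for a in assertions if a.kind == "R")
    return axe_wcag_failures == 0 and required_ok


if __name__ == "__main__":
    all_green = [
        Assertion("Each dialog has a dialog role", "R", True),
        Assertion("Each dialog can be closed by escape key", "BP", True),
    ]
    # Passing assertions but two axe color-contrast failures: still a FAIL.
    print(sample_passes(2, all_green))
    print(sample_passes(0, all_green))
```

This matches samples that are marked FAIL with every assertion passing but a non-zero Axe WCAG count.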
Sample 16 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 17 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 18 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 19 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 20 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 21 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 22 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 23 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 24 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 0 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (GPT-5.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 0 (GPT-5.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 1 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 2 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (GPT-5.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 0 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 2 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
GPT-5.2 Codex
Control: 0%
0. Minimal: 0%
1. Basic: 80%
2. Detailed Instructions: 100%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 4
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 80% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
Sample 0 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 1 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 3 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 4 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 5 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 6 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 7 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 8 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 9 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 10 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 11 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 12 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 13 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 14 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 15 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 16 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 17 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 18 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 19 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 20 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 21 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 22 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 23 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 24 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 0 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
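The sample above passes despite four Axe best-practice issues, which matches the note that best-practice findings do not affect pass/fail. A minimal sketch of how a per-sample verdict could be derived from these fields is shown below. It is not the benchmark's actual implementation: the `Assertion` type, the `sample_passes` function, and the reading of `(R)` as "required" and `(BP)` as "best practice" are assumptions made for illustration, as is the rule that best-practice assertions are ignored in the same way as Axe best-practice issues.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    """One assertion line from a sample report (hypothetical structure)."""
    name: str
    kind: str      # "R" or "BP", copied from the report; meaning assumed
    passed: bool


def sample_passes(axe_wcag_failures: int, assertions: list[Assertion]) -> bool:
    """Assumed rule: PASS needs zero Axe WCAG failures and every (R) assertion passing.

    (BP) assertions and Axe best-practice issues are ignored here.
    """
    required_ok = all(a.passed for a in assertions if a.kind == "R")
    return axe_wcag_failures == 0 and required_ok


# Shape of a sample whose only failing assertion is an (R) check.
hides_content_fails = [
    Assertion("Each dialog has a dialog role", "R", True),
    Assertion("Each dialog can be closed by escape key", "BP", True),
    Assertion("Each modal dialog hides content behind it while open", "R", False),
]
print(sample_passes(0, hides_content_fails))  # False
```

Under this assumed rule, a sample with no Axe WCAG failures still fails as soon as one `(R)` assertion fails, which is consistent with the FAIL verdicts listed in this section.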
Sample 2 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (GPT-5.2 Codex)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 0 (GPT-5.2 Codex)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 1 (GPT-5.2 Codex)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (GPT-5.2 Codex)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5.2 Codex)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5.2 Codex)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 0 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 1 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 2 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 3 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Sample 4 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Grok 4 Fast Non-Reasoning
— 0%
— 0%
— 0%
— 0%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 12
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (12) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 8
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 2
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 12
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (12) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 8
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 10
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 8
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 10
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 10
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 12
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (12) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 12
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (12) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 10
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 10
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 10
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 10
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 6 | BP: 8
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (6) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 2
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 8
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 10
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 6
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 10
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 12
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (12) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
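Note: this sample passes every dialog assertion and fails only on color-contrast. As a rough illustration of what that Axe rule measures, the sketch below computes the WCAG 2 contrast ratio between two sRGB colors and compares it with the AA thresholds (4.5:1 for normal text, 3:1 for large text). The hex colors are hypothetical examples, not values taken from the generated page, and the function names are illustrative only. Axe's own rule does more than this, such as resolving the effective background.

```python
def _linearize(channel: int) -> float:
    """Convert one 8-bit sRGB channel to its linear-light value."""
    s = channel / 255
    return s / 12.92 if s <= 0.04045 else ((s + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' color, per the WCAG 2 definition."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio (lighter + 0.05) / (darker + 0.05); ranges from 1 to 21."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """True if the pair meets the WCAG 2 AA minimum for the given text size."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold


# Hypothetical colors: mid-gray text on a white background.
print(round(contrast_ratio("#777777", "#ffffff"), 2))
print(meets_aa("#777777", "#ffffff"))
print(meets_aa("#767676", "#ffffff"))
```

A slightly darker gray is enough to cross the 4.5:1 threshold, so failures of this kind can often be fixed with small color adjustments.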
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 6 | BP: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (6) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail - utils is not defined
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ✅
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ✅: Each modal dialog hides content behind it while open (R): pass
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail - locator.evaluate: AbortError: The user aborted a request.
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (2) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 6
Assertions ❌
- ✅: Each dialog has a dialog role (R): pass
- ✅: Each dialog can be closed by escape key (BP): pass
- ✅: Each modal dialog traps keyboard focus (R): pass
- ✅: Each modal dialog takes focus when opened (R): pass
- ✅: Focus is not lost when each dialog closes (R): pass
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (6) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail - locator.click: Timeout 30000ms exceeded. Call log: - waiting for getByRole('button', { name: /\b(close|okay|ok|dismiss|exit|cancel|submit|apply|x)\b/iu }).first() - locator resolved to <button type="button" class="modal-close" aria-label="Close dialog">…</button> - attempting click action - 2 × waiting for element to be visible, enabled and stable - element is not stable - retrying click action - waiting 20ms - waiting for element to be visible, enabled and stable - element is not stable - 2 × retrying click action - waiting 100ms - waiting for element to be visible, enabled and stable - element is not visible - 57 × retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is not visible - retrying click action - waiting 500ms
- ❌: Each dialog can be closed by escape key (BP): fail
- ✅: Each modal dialog traps keyboard focus (R): pass
- ❌: Each modal dialog takes focus when opened (R): fail
- ❌: Focus is not lost when each dialog closes (R): fail - locator.click: Timeout 30000ms exceeded. Call log: - waiting for getByRole('button', { name: /\b(close|okay|ok|dismiss|exit|cancel|submit|apply|x)\b/iu }).first() - locator resolved to <button type="button" class="modal-close" aria-label="Close dialog">…</button> - attempting click action - 2 × waiting for element to be visible, enabled and stable - element is not stable - retrying click action - waiting 20ms - waiting for element to be visible, enabled and stable - element is not stable - 2 × retrying click action - waiting 100ms - waiting for element to be visible, enabled and stable - element is not visible - 57 × retrying click action - waiting 500ms - waiting for element to be visible, enabled and stable - element is not visible - retrying click action - waiting 500ms
- ❌: Each modal dialog hides content behind it while open (R): fail
Axe WCAG Failures (4) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ❌: Each dialog has a dialog role (R): fail
- ❌: Each dialog can be closed by escape key (BP): fail - Unable to test because no dialog was found
- ❌: Each modal dialog traps keyboard focus (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog takes focus when opened (R): fail - Unable to test because no dialog was found
- ❌: Focus is not lost when each dialog closes (R): fail - Unable to test because no dialog was found
- ❌: Each modal dialog hides content behind it while open (R): fail - Unable to test because no dialog was found
shopping-home-page
Prompt
Generate an HTML file for the home page of a shopping website. The home page has several sections of content for popular items, deals, and blog posts.
DeepSeek V3.2
— 0%
— 0%
— 0%
— 60%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 3
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 60% | 100% | 100% |
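The pass@k figures in these aggregate tables are consistent with the standard unbiased estimator, 1 - C(n - c, k) / C(n, k), where n is the number of samples and c the number of passes. The sketch below is an assumption about how such values could be computed, not the report's actual implementation. In particular, clamping k to n when fewer than k samples exist is a guess, chosen only because it reproduces the 5-sample rows shown here.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k drawn samples passes.

    n: total samples, c: passing samples, k: number of samples drawn.
    """
    if n <= 0 or k <= 0 or not 0 <= c <= n:
        raise ValueError("require n > 0, k > 0 and 0 <= c <= n")
    # Assumption: with fewer than k samples available, draw all n of them.
    k = min(k, n)
    # comb(n - c, k) is 0 when k > n - c, i.e. every draw includes a pass.
    return 1.0 - comb(n - c, k) / comb(n, k)


# The row above: 5 samples, 3 passes.
for k in (1, 5, 10):
    print(f"pass@{k} = {pass_at_k(5, 3, k):.0%}")
```

With 5 samples and 3 passes this gives 60%, 100% and 100%, and with 0 passes it gives 0% for every k, matching the tables in this section.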
Sample 0 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 13 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (13) ❌
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (5x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 21 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (21) ❌
- (1x) - button-name (critical): Ensure buttons have discernible text
- (16x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 33 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (33) ❌
- (26x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (7x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 28 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (28) ❌
- (24x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 36
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (36) ❌
- (29x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (7x) - link-name (serious): Ensure links have discernible text
Sample 5 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 29
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (29) ❌
- (26x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (3x) - link-name (serious): Ensure links have discernible text
Sample 6 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 25
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (25) ❌
- (1x) - button-name (critical): Ensure buttons have discernible text
- (20x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Sample 7 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 23 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (23) ❌
- (21x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 25
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (25) ❌
- (1x) - button-name (critical): Ensure buttons have discernible text
- (20x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Sample 9 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 29
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (29) ❌
- (2x) - button-name (critical): Ensure buttons have discernible text
- (23x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Sample 10 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 20
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (20) ❌
- (16x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Sample 11 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 16 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (16) ❌
- (12x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 41
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (41) ❌
- (33x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (8x) - link-name (serious): Ensure links have discernible text
Sample 13 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 20 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (20) ❌
- (14x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (6x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 32 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (32) ❌
- (25x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (7x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 24
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (24) ❌
- (17x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (7x) - link-name (serious): Ensure links have discernible text
Sample 16 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 12 | BP: 4
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (12) ❌
- (7x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (5x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (4) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 13
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (13) ❌
- (13x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 18 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 30
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (30) ❌
- (26x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - link-name (serious): Ensure links have discernible text
Sample 19 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 36 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (36) ❌
- (29x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (7x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 16 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (16) ❌
- (16x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 26 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (26) ❌
- (20x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (6x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 51 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (51) ❌
- (44x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (7x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 25
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (25) ❌
- (1x) - button-name (critical): Ensure buttons have discernible text
- (18x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (6x) - link-name (serious): Ensure links have discernible text
Sample 24 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 27
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (27) ❌
- (20x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (7x) - link-name (serious): Ensure links have discernible text
Sample 0 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 18
Assertions ✅
- ✅: Has an h1 (R): pass
- ❌: Has single h1 (BP): fail
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (18) ❌
- (18x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (2) ❌
- (2x) - link-in-text-block (serious): Ensure links are distinguished from surrounding text in a way that does not rely on color
Sample 3 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 9
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (9) ❌
- (9x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 8
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (8) ❌
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 32
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (32) ❌
- (32x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 16
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (16) ❌
- (16x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 26
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (26) ❌
- (1x) - button-name (critical): Ensure buttons have discernible text
- (19x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (6x) - link-name (serious): Ensure links have discernible text
Sample 3 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 31
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (31) ❌
- (24x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (7x) - link-name (serious): Ensure links have discernible text
Sample 4 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 6
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ❌: Has single h1 (BP): fail
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 1 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 2 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 10
Assertions ✅
- ✅: Has an h1 (R): pass
- ❌: Has single h1 (BP): fail
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (10) ❌
- (10x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 15
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (15) ❌
- (15x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
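Reading the samples above, the PASS/FAIL verdict appears to depend only on the Axe WCAG failure count and the assertions tagged (R). Assertions tagged (BP) and Axe best-practice issues do not seem to change it: Sample 0 under "2. Detailed Instructions" is a PASS despite a failed "Has single h1 (BP)" check. The sketch below encodes that reading. It is an inference from the logged results, not the report's actual scoring code, and the names `Assertion` and `sample_passes` are hypothetical. No sample in this section shows zero Axe failures together with a failed (R) assertion, so treating that case as a FAIL is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """One assertion line from a sample log (hypothetical structure)."""
    name: str
    kind: str      # "R" (required) or "BP" (best practice), as tagged in the log
    passed: bool


def sample_passes(axe_wcag_failures: int, assertions: list[Assertion]) -> bool:
    """Return True when a sample would be reported as PASS under the assumed rule."""
    # Only required assertions count; best-practice ones are ignored.
    required_ok = all(a.passed for a in assertions if a.kind == "R")
    return axe_wcag_failures == 0 and required_ok


if __name__ == "__main__":
    # Mirrors Sample 0 under "2. Detailed Instructions": no Axe WCAG failures,
    # one failed best-practice assertion.
    detailed_sample_0 = [
        Assertion("Has an h1", "R", True),
        Assertion("Has single h1", "BP", False),
        Assertion("Has a single maincontent", "R", True),
    ]
    print(sample_passes(0, detailed_sample_0))
```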
Claude Haiku 4.5
— 0%
— 0%
— 0%
— 80%
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 4
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 80% | 100% | 100% |
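The pass@k values above are consistent with the commonly used unbiased estimator 1 - C(n - c, k) / C(n, k), where n is the number of samples and c the number of passes. For example, 5 samples with 4 passes gives pass@1 of 80%. The following is a minimal sketch under that assumption, not the report's actual implementation. The function name `pass_at_k` is hypothetical, and the edge-case handling is inferred from the rows above, where pass@10 on 5 samples is reported as 0% with no passes and 100% with 4 passes.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples drawn from n passes.

    n: total samples, c: passing samples, k: number of samples drawn.
    """
    if not 0 <= c <= n or k < 1:
        raise ValueError("need 0 <= c <= n and k >= 1")
    if c == 0:
        # Assumption: zero passes is reported as 0% for every k, even k > n.
        return 0.0
    if n - c < k:
        # Fewer than k failing samples exist, so every draw of k contains a pass.
        # Assumption: this also covers k > n, which the report shows as 100%.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # (samples, passes) pairs taken from the aggregate rows above.
    for n, c in [(25, 0), (5, 0), (5, 4)]:
        rates = ", ".join(f"pass@{k}={pass_at_k(n, c, k):.0%}" for k in (1, 5, 10))
        print(f"n={n} c={c}: {rates}")
```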
Sample 0 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 31 | BP: 34 | $0.0215
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (31) ❌
- (31x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (34) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 23 | BP: 30 | $0.0231
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (23) ❌
- (23x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (30) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 38 | BP: 45 | $0.0286
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (38) ❌
- (38x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (45) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 34 | BP: 35 | $0.0248
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (34) ❌
- (34x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (35) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 31 | BP: 42 | $0.0227
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (31) ❌
- (31x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (42) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 41 | BP: 54 | $0.0262
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (41) ❌
- (41x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (54) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 48 | BP: 56 | $0.0267
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (48) ❌
- (48x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (56) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 19 | BP: 31 | $0.0230
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (19) ❌
- (19x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (31) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 22 | BP: 36 | $0.0232
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (22) ❌
- (22x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (36) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 40 | BP: 56 | $0.0257
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (40) ❌
- (40x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (56) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 38 | BP: 43 | $0.0275
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (38) ❌
- (38x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (43) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 31 | BP: 41 | $0.0233
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ❌: Has at least one navigation (R): fail
- ✅: Has a single footer (R): pass
Axe WCAG Failures (31) ❌
- (31x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (41) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 30 | BP: 44 | $0.0233
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (30) ❌
- (30x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (44) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 36 | BP: 40 | $0.0253
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (36) ❌
- (36x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (40) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 27 | BP: 30 | $0.0224
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (27) ❌
- (27x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (30) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 21 | BP: 2 | $0.0208
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (21) ❌
- (21x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 17 | BP: 24 | $0.0236
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (17) ❌
- (17x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (24) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 32 | BP: 38 | $0.0246
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (32) ❌
- (32x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (38) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 44 | BP: 2 | $0.0258
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (44) ❌
- (44x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 18 | BP: 26 | $0.0244
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (18) ❌
- (18x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (26) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 23 | BP: 38 | $0.0263
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (23) ❌
- (23x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (38) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 40 | BP: 70 | $0.0239
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (40) ❌
- (40x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (70) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 19 | BP: 31 | $0.0238
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (19) ❌
- (19x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (31) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 35 | BP: 46 | $0.0231
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (35) ❌
- (35x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (46) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 24 | BP: 30 | $0.0237
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (24) ❌
- (24x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (30) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 7 | BP: 2 | $0.0265
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (7) ❌
- (7x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 29 | $0.0213
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (29) ❌
- (29x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 40 | $0.0250
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (40) ❌
- (40x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 33 | BP: 1 | $0.0291
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (33) ❌
- (33x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 52 | BP: 3 | $0.0317
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (52) ❌
- (52x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (3) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 38 | BP: 2 | $0.0247
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (38) ❌
- (38x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 22 | $0.0232
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (22) ❌
- (22x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 26 | BP: 2 | $0.0242
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (26) ❌
- (26x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 27 | BP: 39 | $0.0240
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (27) ❌
- (27x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (39) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 41 | BP: 3 | $0.0199
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (41) ❌
- (41x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (3) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 1 | $0.0361
Assertions ✅
- ✅: Has an h1 (R): pass
- ❌: Has single h1 (BP): fail
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
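The record above is marked PASS even though a Best Practice (BP) assertion failed and one Axe Best Practice issue was reported, while earlier records with all assertions passing are marked FAIL because of Axe WCAG failures. The rule this suggests can be sketched as follows. This is an inference from the records, not the report's actual implementation: the `Assertion` type and `sample_passes` function are hypothetical names, and treating a failed required (R) assertion as a FAIL on its own is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    """Hypothetical model of one assertion line in a sample record."""
    name: str
    required: bool  # True for "(R)" assertions, False for "(BP)" ones
    passed: bool


def sample_passes(axe_wcag_failures: int, assertions: list[Assertion]) -> bool:
    """Assumed rule: PASS needs zero Axe WCAG failures and no failed (R) assertion.

    Best Practice assertions and Axe Best Practice issues are ignored,
    matching the "does not affect pass/fail" notes in the records.
    """
    required_ok = all(a.passed for a in assertions if a.required)
    return axe_wcag_failures == 0 and required_ok


# A record with only a BP assertion failing and no Axe WCAG failures.
bp_only = [
    Assertion("Has an h1", required=True, passed=True),
    Assertion("Has single h1", required=False, passed=False),
    Assertion("Has a single maincontent", required=True, passed=True),
]
print(sample_passes(0, bp_only))   # True
print(sample_passes(21, bp_only))  # False
```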
Sample 1 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0433
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 2 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 1 | $0.0401
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
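Nearly every WCAG failure in these records is a color-contrast violation, so it may help to see how a contrast ratio is checked. The sketch below follows the WCAG 2 relative luminance and contrast ratio definitions, with the AA thresholds of 4.5:1 for normal text and 3:1 for large text. The function names are illustrative only, and Axe's own check also has to resolve computed styles, opacity, and backgrounds, which this sketch does not attempt.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2 definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors, from 1.0 up to 21.0."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """True if the pair meets the WCAG 2 AA minimum contrast threshold."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold


print(meets_aa("#767676", "#ffffff"))  # True
print(meets_aa("#777777", "#ffffff"))  # False
```

Light gray text on a white background is a typical way to fall just under the 4.5:1 threshold, as the second call shows.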
Sample 3 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0380
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 4 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0374
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Claude Opus 4.6
— 0%
— 0%
— 0%
— 100%
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
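The pass@k figures above can be reproduced from the sample and pass counts alone. The sketch below uses the standard combinatorial estimator, 1 - C(n-c, k) / C(n, k), for n samples with c passes. The report does not state its formula here, so both the estimator and the handling of k larger than n (clamping k to n, which yields the 0% and 100% shown for the 5-sample sets at pass@10) are assumptions, and `pass_at_k` is a hypothetical name.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn without replacement passes.

    n: total samples, c: passing samples, k: number of samples drawn.
    """
    if n <= 0 or k <= 0 or not 0 <= c <= n:
        raise ValueError("expected n > 0, k > 0 and 0 <= c <= n")
    k = min(k, n)  # assumption: cannot draw more samples than exist
    # comb(n - c, k) counts the draws that contain no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Aggregates listed above for Claude Opus 4.6.
print(f"{pass_at_k(25, 0, 5):.0%}")   # 0%
print(f"{pass_at_k(5, 5, 10):.0%}")   # 100%
```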
Sample 0 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 62 | BP: 52 | $0.3197
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (62) ❌
- (62x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (52) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 79 | BP: 59 | $0.3399
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (79) ❌
- (79x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (59) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 56 | BP: 58 | $0.3292
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (56) ❌
- (56x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (58) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 73 | BP: 28 | $0.3381
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (73) ❌
- (73x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (28) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 64 | BP: 37 | $0.3105
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (64) ❌
- (64x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (37) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 55 | BP: 60 | $0.3468
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (55) ❌
- (55x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (60) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 50 | BP: 35 | $0.3179
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (50) ❌
- (50x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (35) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 63 | BP: 52 | $0.3168
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (63) ❌
- (63x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (52) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 56 | BP: 34 | $0.3062
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (56) ❌
- (56x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (34) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 60 | BP: 62 | $0.3373
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (60) ❌
- (60x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (62) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 61 | BP: 53 | $0.3026
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (61) ❌
- (61x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (53) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 54 | BP: 35 | $0.3188
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (54) ❌
- (54x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (35) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 60 | BP: 35 | $0.3427
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (60) ❌
- (60x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (35) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 66 | BP: 15 | $0.3626
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (66) ❌
- (66x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (15) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 73 | BP: 42 | $0.3839
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (73) ❌
- (73x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (42) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 68 | BP: 61 | $0.3482
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (68) ❌
- (68x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (61) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 69 | BP: 34 | $0.3342
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (69) ❌
- (69x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (34) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 58 | BP: 70 | $0.3674
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (58) ❌
- (58x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (70) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 37 | $0.3562
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (37) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 71 | BP: 48 | $0.3568
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (71) ❌
- (71x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (48) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 48 | BP: 31 | $0.3483
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (48) ❌
- (46x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - link-in-text-block (serious): Ensure links are distinguished from surrounding text in a way that does not rely on color
Axe Best Practice Issues (31) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 82 | BP: 34 | $0.3803
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (82) ❌
- (82x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (34) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 66 | BP: 59 | $0.3405
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (66) ❌
- (66x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (59) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 79 | BP: 83 | $0.3754
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (79) ❌
- (79x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (83) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 59 | BP: 53 | $0.2829
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (59) ❌
- (59x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (53) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
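Every Control sample above fails the "Has a single maincontent" assertion, and each also reports the landmark-one-main best-practice issue. The report does not show how the assertion is implemented, so the following is only a minimal sketch of one way such a check could work, using Python's standard `html.parser`. The names `MainLandmarkCounter` and `has_single_main` are hypothetical. The sketch counts `<main>` elements and elements with `role="main"`, and ignores hidden elements and multi-token role values.

```python
from html.parser import HTMLParser


class MainLandmarkCounter(HTMLParser):
    """Counts elements exposing a main landmark: <main> or role="main".

    Hypothetical helper for illustration; not the benchmark's own code.
    """

    def __init__(self):
        super().__init__()
        self.main_count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        role = (dict(attrs).get("role") or "").strip().lower()
        # An explicit role="main" counts on any element; a <main> element
        # counts only when no other role overrides its implicit one.
        if role == "main" or (tag == "main" and role == ""):
            self.main_count += 1


def has_single_main(html: str) -> bool:
    """Return True when the markup contains exactly one main landmark."""
    counter = MainLandmarkCounter()
    counter.feed(html)
    counter.close()
    return counter.main_count == 1
```

Under this sketch, a page that wraps its content in a plain `<div>` fails the check, while the same page with that wrapper changed to `<main>` passes it.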
Sample 0 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 11 | BP: 2 | $0.3398
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (11) ❌
- (11x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 1 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 13 | $0.3401
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (13) ❌
- (13x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 7 | $0.3121
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (7) ❌
- (7x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 8 | BP: 3 | $0.3227
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (8) ❌
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (3) ⚠️
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 4 (Claude Opus 4.6)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 14 | BP: 2 | $0.3454
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (14) ❌
- (14x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 0 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 52 | $0.3467
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (52) ❌
- (52x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 47 | $0.3154
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (47) ❌
- (47x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 20 | BP: 1 | $0.3915
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (20) ❌
- (20x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 3 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 61 | $0.3428
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (61) ❌
- (61x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 38 | $0.3322
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (38) ❌
- (38x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
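In the Minimal samples above, color-contrast is the only remaining Axe WCAG failure. As a rough illustration of what that rule measures, the sketch below computes the WCAG 2 contrast ratio from two 6-digit hex colors and compares it with the AA thresholds (4.5:1 for normal text, 3:1 for large text). It covers only the published WCAG 2 formula. The real axe-core rule also has to resolve computed styles, opacity, and background images, which this sketch does not attempt. The function names are hypothetical.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2 definition)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a 6-digit hex color such as '#1a2b3c'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG 2 contrast ratio, from 1.0 (identical colors) to 21.0."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(fg: str, bg: str, large_text: bool = False) -> bool:
    """True when the pair meets the AA minimum for the given text size."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)
```

For example, `meets_aa("#767676", "#ffffff")` is true while `meets_aa("#777777", "#ffffff")` is false, which shows how a slightly too light gray on a white background can produce this kind of failure.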
Sample 0 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.2579
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 1 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.3086
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 2 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.2268
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 3 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.2831
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 4 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.2480
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
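The sample records above are consistent with a simple verdict rule: a sample is marked PASS only when it has zero Axe WCAG failures and no failing assertion, while Axe best-practice issues never change the outcome. The sketch below encodes that reading. It is an inference from the listed results, not the benchmark's own code. It assumes that "(R)" marks a required assertion and that "(BP)" assertions, like Axe best-practice issues, do not affect the verdict. All type and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assertion:
    name: str
    required: bool  # assumed: True for "(R)", False for "(BP)"
    passed: bool


@dataclass
class SampleResult:
    axe_wcag_failures: int
    best_practice_issues: int = 0  # reported, but ignored by the verdict
    assertions: List[Assertion] = field(default_factory=list)


def verdict(sample: SampleResult) -> str:
    """Return 'PASS' or 'FAIL' under the assumed rule described above."""
    required_ok = all(a.passed for a in sample.assertions if a.required)
    return "PASS" if sample.axe_wcag_failures == 0 and required_ok else "FAIL"


# Sample 1 of the Basic set: every assertion passes, but 13 Axe WCAG failures.
basic_1 = SampleResult(
    axe_wcag_failures=13,
    assertions=[Assertion("Has a single maincontent", True, True)],
)
# Sample 0 of the Detailed Instructions set: clean on both counts.
detailed_0 = SampleResult(
    axe_wcag_failures=0,
    assertions=[Assertion("Has a single maincontent", True, True)],
)
print(verdict(basic_1), verdict(detailed_0))  # FAIL PASS
```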
Claude Sonnet 4.5
Pass rate by instruction set: 0% | 0% | 0% | 80%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 4
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 80% | 100% | 100% |
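The report does not state the formula behind these pass@k figures. The sketch below uses the common combinatorial estimator, pass@k = 1 - C(n - c, k) / C(n, k) for n samples with c passes, which reproduces the rows shown here (for example, 5 samples with 4 passes gives 80% at k = 1). Two edge cases are assumptions chosen to match the tables: 0% is returned when there are no passes, even if k exceeds n, and 100% is returned when fewer than k samples failed. The function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k drawn samples passes.

    n: total samples, c: passing samples, k: samples drawn without replacement.
    """
    if c == 0:
        return 0.0  # no passing sample can ever be drawn
    if n - c < k:
        return 1.0  # too few failing samples to fill all k draws
    return 1.0 - comb(n - c, k) / comb(n, k)


# The four aggregate blocks above, as (samples, passes) pairs.
for n, c in [(25, 0), (5, 0), (5, 0), (5, 4)]:
    row = " | ".join(f"{pass_at_k(n, c, k):.0%}" for k in (1, 5, 10))
    print(f"Samples: {n} | Passes: {c} | {row}")
```

With only 5 samples, pass@10 cannot be estimated by actually drawing 10 samples, so the 100% and 0% values in those rows come entirely from the edge-case handling and carry less information than pass@1.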
Sample 0 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 23 | BP: 48 | $0.0990
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (23) ❌
- (23x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (48) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 26 | BP: 64 | $0.1002
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (26) ❌
- (26x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (64) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 31 | BP: 40 | $0.0996
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (31) ❌
- (31x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (40) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 31 | BP: 40 | $0.1093
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (31) ❌
- (31x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (40) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 34 | BP: 41 | $0.0979
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (34) ❌
- (34x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (41) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 22 | BP: 45 | $0.0815
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (22) ❌
- (22x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (45) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 33 | BP: 56 | $0.1013
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (33) ❌
- (33x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (56) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 54 | BP: 56 | $0.1120
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (54) ❌
- (54x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (56) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 24 | BP: 41 | $0.1058
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (24) ❌
- (24x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (41) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 22 | BP: 41 | $0.0878
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (22) ❌
- (22x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (41) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 47 | BP: 41 | $0.1217
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (47) ❌
- (47x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (41) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 14 | BP: 40 | $0.0998
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (14) ❌
- (14x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (40) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 40 | BP: 59 | $0.1000
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (40) ❌
- (40x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (59) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 22 | BP: 40 | $0.1013
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (22) ❌
- (22x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (40) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 29 | BP: 55 | $0.1067
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (29) ❌
- (29x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (55) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 32 | BP: 63 | $0.1122
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (32) ❌
- (32x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (63) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 34 | BP: 64 | $0.0975
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (34) ❌
- (34x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (64) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 33 | BP: 40 | $0.0991
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (33) ❌
- (33x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (40) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 41 | BP: 41 | $0.1047
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (41) ❌
- (41x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (41) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 39 | BP: 40 | $0.1002
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (39) ❌
- (39x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (40) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 27 | BP: 40 | $0.0911
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (27) ❌
- (27x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (40) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 23 | BP: 33 | $0.0967
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (23) ❌
- (23x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (33) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 42 | BP: 58 | $0.1025
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (42) ❌
- (42x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (58) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 31 | BP: 40 | $0.1043
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (31) ❌
- (31x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (40) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 39 | BP: 40 | $0.1059
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (39) ❌
- (39x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (40) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 24 | $0.0996
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (24) ❌
- (24x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 52 | BP: 1 | $0.1213
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (52) ❌
- (52x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 35 | $0.1260
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (35) ❌
- (35x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 26 | $0.0986
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (26) ❌
- (26x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 56 | $0.1199
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (56) ❌
- (56x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 22 | $0.1069
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (22) ❌
- (22x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 31 | $0.0947
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (31) ❌
- (31x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 37 | $0.1040
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (37) ❌
- (37x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 26 | $0.1077
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (26) ❌
- (26x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 70 | $0.1131
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (70) ❌
- (70x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.1322
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 1 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.1130
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 2 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0974
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 3 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 12 | $0.1222
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (12) ❌
- (12x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.1245
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
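In the Claude Sonnet 4.5 samples shown above, every Axe WCAG failure comes from the color-contrast rule. As a rough illustration of what that rule measures, the sketch below computes the WCAG 2 contrast ratio between two opaque hex colors and compares it with the AA thresholds (4.5:1 for normal text, 3:1 for large text). This is not axe's implementation, which also has to resolve transparency, background images, and font size from the rendered page. The function names here are hypothetical.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per the WCAG 2 definition."""
    s = channel / 255
    # 0.03928 is the threshold as written in the WCAG 2.0/2.1 definition.
    return s / 12.92 if s <= 0.03928 else ((s + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of an opaque color given as '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """True if the pair meets the WCAG 2 AA minimum contrast threshold."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold


if __name__ == "__main__":
    for fg in ("#767676", "#777777"):
        print(fg, "on #ffffff:", round(contrast_ratio(fg, "#ffffff"), 2),
              "AA normal text:", meets_aa(fg, "#ffffff"))
```

Under this formula, grey `#767676` on white just clears the 4.5:1 threshold, while the slightly lighter `#777777` narrowly misses it. Small margins like this are one plausible reason a generated page can accumulate dozens of color-contrast failures from a single palette choice.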
Gemini 3 Flash Preview
Pass rates by instruction set: 0%, 20%, 60%, 100%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 1
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 20% | 100% | 100% |
Samples: 5 | Passes: 3
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 60% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
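The aggregates above can be reproduced from the sample and pass counts alone. The sketch below assumes the standard unbiased pass@k estimator, 1 - C(n - c, k) / C(n, k), where n is the number of samples and c the number of passes. This report does not state which formula it uses, but this one matches its figures: for example, 2 passes out of 25 samples gives 8%, 37%, and 65% for k = 1, 5, and 10. Treating k greater than n as "certain pass if any sample passed" is an assumption made here to match the 100% values shown for the 5-sample sets.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes.

    n: total samples drawn, c: samples that passed, k: samples selected.
    """
    if n <= 0 or k <= 0 or not 0 <= c <= n:
        raise ValueError("require n > 0, k > 0 and 0 <= c <= n")
    if c == 0:
        return 0.0
    # Fewer than k failing samples exist, so any selection of k includes a pass.
    # This also covers k > n (assumption, see the lead-in).
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # (samples, passes) pairs taken from the aggregates above.
    for n, c in [(25, 0), (5, 1), (5, 3), (5, 5)]:
        row = " | ".join(f"{pass_at_k(n, c, k):.0%}" for k in (1, 5, 10))
        print(f"Samples: {n} | Passes: {c} -> {row}")
```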
Sample 0 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 7
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (7) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 6
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (4x) - image-alt (critical): Ensure <img> elements have alternative text or a role of none or presentation
Sample 5 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 6 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - image-alt (critical): Ensure <img> elements have alternative text or a role of none or presentation
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 7 | BP: 20
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (7) ❌
- (7x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (20) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 12 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (12) ❌
- (12x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 16
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (16) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - link-name (serious): Ensure links have discernible text
Sample 11 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 6 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 10 | BP: 6
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (10) ❌
- (10x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 16
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (16) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 17 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 8
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (8) ❌
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 20 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 5 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 6 | BP: 19
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (3x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (19) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 5 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 7
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (7) ❌
- (7x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
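The color-contrast rule that recurs in these samples checks text against the WCAG 2 contrast ratio. As a rough sketch of that formula (not axe's implementation, which also has to resolve backgrounds, opacity, and text size), the ratio for two opaque sRGB colors can be computed as below. The function names and the example colors are illustrative assumptions and are not taken from the tested pages.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per the WCAG 2 definition."""
    s = channel / 255
    # WCAG 2 states the cutoff as 0.03928; for 8-bit values this gives the same
    # result as the 0.04045 cutoff from the sRGB standard.
    return s / 12.92 if s <= 0.03928 else ((s + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(fg: str, bg: str, large_text: bool = False) -> bool:
    """AA requires at least 4.5:1 for normal text and 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


if __name__ == "__main__":
    # Hypothetical color pairs chosen for illustration only.
    for fg, bg in [("#000000", "#ffffff"), ("#777777", "#ffffff"), ("#767676", "#ffffff")]:
        print(fg, bg, round(contrast_ratio(fg, bg), 2), meets_aa(fg, bg))
```

Mid-gray text such as #777777 on white falls just short of 4.5:1, which is one common way a page can pick up several color-contrast failures at once.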
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 10
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (10) ❌
- (10x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 3
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 6
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Gemini 3 Pro Preview
| Instruction set | Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|---|
| Control | 25 | 0 | 0% | 0% | 0% |
| 0. Minimal | 5 | 1 | 20% | 100% | 100% |
| 1. Basic | 5 | 0 | 0% | 0% | 0% |
| 2. Detailed Instructions | 5 | 4 | 80% | 100% | 100% |
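The pass@k figures above can be reproduced from the sample and pass counts alone. The sketch below assumes the report uses the common unbiased estimator 1 - C(n-c, k) / C(n, k), where n is the number of samples and c the number of passes; the report does not state its formula. The handling of k larger than n (0% with no passes, otherwise 100%) is likewise an assumption chosen to match the 5-sample rows, and the function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples, drawn without
    replacement from n samples of which c passed, is a pass."""
    if n < 1 or k < 1 or not 0 <= c <= n:
        raise ValueError("expected n >= 1, k >= 1 and 0 <= c <= n")
    if c == 0:
        # Assumption: with no passes the estimate is 0, even when k > n.
        return 0.0
    if n - c < k:
        # Fewer than k failing samples exist, so any draw of k includes a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # (samples, passes) pairs as they appear in the report.
    for n, c in [(25, 0), (5, 1), (5, 0), (5, 4), (25, 2)]:
        cells = " | ".join(f"{pass_at_k(n, c, k):.0%}" for k in (1, 5, 10))
        print(f"Samples: {n} | Passes: {c} | {cells}")
```

Under these assumptions, 25 samples with 2 passes give 8%, 37%, and 65% for k = 1, 5, and 10, which agrees with the Gemini 3 Pro Preview row in the pass@k summary table.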
Sample 0 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 27 | BP: 44
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (27) ❌
- (27x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (44) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 5 | BP: 8
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 14 | BP: 20
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (14) ❌
- (13x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - scrollable-region-focusable (serious): Ensure elements that have scrollable content are accessible by keyboard
Axe Best Practice Issues (20) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 6 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - link-name (serious): Ensure links have discernible text
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 13 | BP: 28
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (13) ❌
- (13x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (28) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 28 | BP: 39
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (28) ❌
- (28x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (39) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 26 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (26) ❌
- (26x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 15 | BP: 33
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (15) ❌
- (15x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (33) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 14 | BP: 41
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (14) ❌
- (14x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (41) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 35 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (35) ❌
- (35x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 30
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (30) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 12 | BP: 45
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (12) ❌
- (12x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (45) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 5 | BP: 7
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (7) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 5 | BP: 45
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (45) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 5 | BP: 44
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ❌: Has at least one navigation (R): fail
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (44) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 13 | BP: 37
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (13) ❌
- (13x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (37) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 23 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (23) ❌
- (23x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 5 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ❌: Has at least one navigation (R): fail
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 20
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ❌: Has at least one navigation (R): fail
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (20) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 16
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (16) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 6 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 18 | BP: 22
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (18) ❌
- (18x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (22) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 19 | BP: 17
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (19) ❌
- (19x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (17) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 10 | BP: 2
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (10) ❌
- (10x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
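The PASS/FAIL lines in these samples look consistent with a simple rule: a sample passes only when it has zero Axe WCAG failures and every required assertion passes, while best-practice findings are ignored. The sketch below encodes that reading. It is inferred from the listed results rather than taken from the test harness, and the `Assertion` type, the `sample_passes` function, and the interpretation of (R) as required and (BP) as best practice are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    name: str
    required: bool  # True for "(R)" assertions, False for "(BP)" (assumed meaning)
    passed: bool


def sample_passes(axe_wcag_failures: int, assertions: list[Assertion]) -> bool:
    """Assumed verdict rule: no Axe WCAG failures and no failed required assertion."""
    required_ok = all(a.passed for a in assertions if a.required)
    return axe_wcag_failures == 0 and required_ok


def make_assertions(main_ok: bool = True) -> list[Assertion]:
    """Build the seven assertions listed for each sample; only the main one varies."""
    return [
        Assertion("Has an h1", True, True),
        Assertion("Has single h1", False, True),
        Assertion("Has at least one h2", True, True),
        Assertion("Has a single banner", True, True),
        Assertion("Has a single maincontent", True, main_ok),
        Assertion("Has at least one navigation", True, True),
        Assertion("Has a single footer", True, True),
    ]


if __name__ == "__main__":
    # All assertions pass but 10 Axe WCAG failures remain: reported as FAIL.
    print(sample_passes(10, make_assertions()))
    # No Axe WCAG failures but the required main landmark assertion fails: FAIL.
    print(sample_passes(0, make_assertions(main_ok=False)))
    # No Axe WCAG failures and all assertions pass: PASS.
    print(sample_passes(0, make_assertions()))
```

The listed samples do not include a case where only a (BP) assertion fails, so treating such a failure as non-blocking is a guess based on the "does not affect pass/fail" wording used for Axe best-practice issues.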
Sample 24 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 5 | BP: 40
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (40) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Gemini 3 Pro Preview)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 1
Assertions ❌
- ❌: Has an h1 (R): fail
- ❌: Has single h1 (BP): fail
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3 Pro Preview)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Gemini 3 Pro Preview)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Gemini 3 Pro Preview)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 7
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (7) ❌
- (7x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Gemini 3 Pro Preview)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 9
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (9) ❌
- (9x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (Gemini 3 Pro Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 4
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Gemini 3 Pro Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 8 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (8) ❌
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3 Pro Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 18
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (18) ❌
- (18x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Gemini 3 Pro Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 5
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Gemini 3 Pro Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 0 (Gemini 3 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 1 (Gemini 3 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 2 (Gemini 3 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 3 (Gemini 3 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 4 (Gemini 3 Pro Preview)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 9
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (9) ❌
- (9x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
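
Nearly every Axe WCAG failure listed for these samples is `color-contrast`. As a rough illustration of what that rule measures, the sketch below computes the WCAG 2 contrast ratio between two hex colors and compares it with the AA thresholds (4.5:1 for normal text, 3:1 for large text). It is not axe-core's implementation, which also has to resolve the effective foreground and background from the rendered page; the function names `relative_luminance`, `contrast_ratio`, and `meets_aa` are hypothetical.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2 definition)."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' color, from 0.0 (black) to 1.0 (white)."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError("expected a color in #rrggbb form")
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical colors) to 21.0 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """True if the pair meets the WCAG 2 AA minimum for the given text size."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold


if __name__ == "__main__":
    # Illustrative color pairs only; these are not taken from the samples.
    for fg, bg in [("#000000", "#ffffff"), ("#767676", "#ffffff"), ("#777777", "#ffffff")]:
        print(fg, bg, round(contrast_ratio(fg, bg), 2), meets_aa(fg, bg))
```

A small change in gray value can flip the result: under this formula `#767676` on white clears 4.5:1 while `#777777` falls just short, which may help explain why a single page can accumulate many contrast violations.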
GPT-5 Mini
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 2
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 8% | 37% | 65% |
Samples: 5 | Passes: 2
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 40% | 100% | 100% |
Samples: 5 | Passes: 1
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 20% | 100% | 100% |
Samples: 5 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
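
The pass@k figures in the four tables above are consistent with the commonly used unbiased estimator, pass@k = 1 - C(n - c, k) / C(n, k), where n is the number of samples and c the number of passes. The report does not state which formula it uses, so treat the sketch below as an assumption that happens to reproduce these rows. The function name `pass_at_k` is hypothetical, and clamping k to n when fewer than k samples exist is also an assumption, chosen because it matches the pass@10 values shown for the 5-sample sets.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples drawn
    without replacement from n samples (c of which pass) is a pass."""
    if n <= 0 or k <= 0 or not 0 <= c <= n:
        raise ValueError("need n > 0, k > 0 and 0 <= c <= n")
    # Assumption: when k exceeds the number of samples, use all n samples.
    k = min(k, n)
    # comb(n - c, k) is 0 when fewer than k failing samples exist,
    # so the estimate becomes 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # (samples, passes) pairs copied from the tables above.
    for n, c in [(25, 2), (5, 2), (5, 1), (5, 0)]:
        cells = " | ".join(f"{round(100 * pass_at_k(n, c, k))}%" for k in (1, 5, 10))
        print(f"Samples: {n} | Passes: {c} -> {cells}")
```

With 5 samples, a single pass already yields pass@5 = 100%, because every draw of five includes it. Estimates from such small sets are therefore coarse and are best read alongside the 25-sample control figures.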
Sample 0 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 31 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (31) ❌
- (31x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 18 | BP: 6
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (18) ❌
- (18x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 12 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (12) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
- (10x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - select-name (critical): Ensure select element has an accessible name
Axe Best Practice Issues (4) ⚠️
- landmark-contentinfo-is-top-level (moderate): Ensure the contentinfo landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-contentinfo (moderate): Ensure the document has at most one contentinfo landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 9
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (9) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-contentinfo-is-top-level (moderate): Ensure the contentinfo landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-main-is-top-level (moderate): Ensure the main landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-main (moderate): Ensure the document has at most one main landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 18 | BP: 12
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (18) ❌
- (18x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (12) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 5 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 14 | BP: 1
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ❌: Has at least one navigation (R): fail
- ❌: Has a single footer (R): fail
Axe WCAG Failures (14) ❌
- (14x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 6 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 10 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (10) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
- (9x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 7 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 8 | BP: 9
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (8) ❌
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (9) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 8 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 8 | BP: 3
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (8) ❌
- (2x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (3) ⚠️
- landmark-contentinfo-is-top-level (moderate): Ensure the contentinfo landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-contentinfo (moderate): Ensure the document has at most one contentinfo landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 9 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 6
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 10 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 9
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ❌: Has at least one navigation (R): fail
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (2x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
- (1x) - aria-required-children (critical): Ensure elements with an ARIA role that require child roles contain them
Axe Best Practice Issues (9) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-contentinfo-is-top-level (moderate): Ensure the contentinfo landmark is at top level (Best Practice - does not affect pass/fail)
Sample 11 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 6 | BP: 6
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
Sample 12 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 27 | BP: 10
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (27) ❌
- (1x) - aria-required-children (critical): Ensure elements with an ARIA role that require child roles contain them
- (26x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 13 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 11
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (11) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- landmark-contentinfo-is-top-level (moderate): Ensure the contentinfo landmark is at top level (Best Practice - does not affect pass/fail)
Sample 14 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- landmark-main-is-top-level (moderate): Ensure the main landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-main (moderate): Ensure the document has at most one main landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 15 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 16 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 6 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ❌: Has at least one h2 (R): fail
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 17 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 9 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (9) ❌
- (1x) - aria-required-children (critical): Ensure elements with an ARIA role that require child roles contain them
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 18 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 29 | BP: 13
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ❌: Has at least one h2 (R): fail
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (29) ❌
- (29x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (13) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-contentinfo-is-top-level (moderate): Ensure the contentinfo landmark is at top level (Best Practice - does not affect pass/fail)
Sample 19 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 16 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ❌: Has at least one h2 (R): fail
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (16) ❌
- (16x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 20 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 20 | BP: 8
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ❌: Has at least one h2 (R): fail
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (20) ❌
- (20x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 21 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 18 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (18) ❌
- (18x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 22 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 3
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (3) ⚠️
- landmark-contentinfo-is-top-level (moderate): Ensure the contentinfo landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-contentinfo (moderate): Ensure the document has at most one contentinfo landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 23 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 7 | BP: 6
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (7) ❌
- (7x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
Sample 24 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 5 | BP: 9
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (9) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5 Mini)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 41 | BP: 9
Assertions ❌
- ✅: Has an h1 (R): pass
- ❌: Has single h1 (BP): fail
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (41) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
- (38x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - link-in-text-block (serious): Ensure links are distinguished from surrounding text in a way that does not rely on color
Axe Best Practice Issues (9) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-contentinfo-is-top-level (moderate): Ensure the contentinfo landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-contentinfo (moderate): Ensure the document has at most one contentinfo landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5 Mini)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 3
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
Axe Best Practice Issues (3) ⚠️
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 3 (GPT-5 Mini)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 9 | BP: 6
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (9) ❌
- (2x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
- (7x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5 Mini)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 16 | BP: 12
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (16) ❌
- (15x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (1x) - list (serious): Ensure that lists are structured correctly
Axe Best Practice Issues (12) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5 Mini)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 5 | BP: 13
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
- (2x) - list (serious): Ensure that lists are structured correctly
Axe Best Practice Issues (13) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5 Mini)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (4) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5 Mini)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 26 | BP: 3
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (26) ❌
- (1x) - aria-hidden-focus (serious): Ensure aria-hidden elements are not focusable nor contain focusable elements
- (25x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (3) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5 Mini)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (2) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5 Mini)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 5 | BP: 3
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (1x) - aria-required-children (critical): Ensure elements with an ARIA role that require child roles contain them
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (3) ⚠️
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 9
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (9) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe Best Practice Issues (6) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-contentinfo-is-top-level (moderate): Ensure the contentinfo landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-contentinfo (moderate): Ensure the document has at most one contentinfo landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 7 | BP: 3
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (7) ❌
- (1x) - aria-required-children (critical): Ensure elements with an ARIA role that require child roles contain them
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (3) ⚠️
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 3
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - nested-interactive (serious): Ensure interactive controls are not nested as they are not always announced by screen readers or can cause focus problems for assistive technologies
Axe Best Practice Issues (3) ⚠️
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (3) ⚠️
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
GPT-5.2
Pass rate by instruction set: 20%, 20%, 80%, 100%
Samples: 25 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 20% | 71% | 94% |
Samples: 5 | Passes: 1
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 20% | 100% | 100% |
Samples: 5 | Passes: 4
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 80% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
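The pass@k figures in the tables above can be reproduced from the sample and pass counts alone. The sketch below assumes the report uses the standard combinatorial estimator (draw k samples without replacement from n, of which c pass) and that k is capped at n when fewer than k samples exist. Neither assumption is stated in the report, but both match the reported numbers. The function name `pass_at_k` is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k drawn samples passes.

    n: total samples, c: passing samples, k: samples drawn without replacement.
    Assumption: k is capped at n when fewer than k samples are available.
    """
    k = min(k, n)
    # comb(n - c, k) is 0 when k > n - c, so the result is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Reproduce the first aggregate above (Samples: 25 | Passes: 5).
for k in (1, 5, 10):
    print(f"pass@{k}: {pass_at_k(25, 5, k):.0%}")
```

With 25 samples and 5 passes this gives 20%, 71%, and 94%, which agrees with the first table. With only 5 samples, any k of 5 or more draws every sample, so a single pass is enough to reach 100%.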
Sample 0 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 90 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (90) ❌
- (90x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 3
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (1) ❌
- (1x) - aria-prohibited-attr (serious): Ensure ARIA attributes are not prohibited for an element's role
Axe Best Practice Issues (3) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 34 | BP: 16
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (34) ❌
- (4x) - aria-prohibited-attr (serious): Ensure ARIA attributes are not prohibited for an element's role
- (30x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (16) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 32 | BP: 13
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (32) ❌
- (32x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (13) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 24 | BP: 8
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (24) ❌
- (3x) - aria-prohibited-attr (serious): Ensure ARIA attributes are not prohibited for an element's role
- (21x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 5 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 27 | BP: 8
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (27) ❌
- (27x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (8) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 6 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 99 | BP: 2
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (99) ❌
- (99x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 7 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 132 | BP: 12
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (132) ❌
- (4x) - aria-required-parent (critical): Ensure elements with an ARIA role that require parent roles are contained by them
- (128x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (12) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 8 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 12
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (12) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 9 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (1) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 10 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 12 | BP: 3
Assertions ❌
- ✅: Has an h1 (R): pass
- ❌: Has single h1 (BP): fail
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (12) ❌
- (12x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (3) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
Sample 11 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe Best Practice Issues (5) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
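Sample 11 above has no Axe WCAG failures but is still marked FAIL, while other samples with Best Practice issues are marked PASS. One reading consistent with these logs is sketched below: a sample passes only when Axe reports zero WCAG failures and every (R) assertion passes. Treating (R) as "required" and ignoring (BP) assertions in the verdict are assumptions, and the names `Assertion` and `sample_passes` are hypothetical, not taken from the test harness.

```python
from typing import NamedTuple


class Assertion(NamedTuple):
    label: str
    kind: str      # "R" (assumed: required) or "BP" (best practice)
    passed: bool


def sample_passes(axe_wcag_failures: int, assertions: list[Assertion]) -> bool:
    """Assumed verdict rule: no Axe WCAG failures and no failed (R) assertion."""
    required_ok = all(a.passed for a in assertions if a.kind == "R")
    return axe_wcag_failures == 0 and required_ok


# Sample 11 (GPT-5.2, Control): Axe WCAG 0, but the footer assertion fails.
sample_11 = [
    Assertion("Has an h1", "R", True),
    Assertion("Has single h1", "BP", True),
    Assertion("Has a single footer", "R", False),
]
print(sample_passes(0, sample_11))  # False
```

Best Practice issue counts do not appear in the rule at all, matching the note on each Best Practice line that they do not affect pass/fail.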
Sample 12 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 92 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe WCAG Failures (92) ❌
- (3x) - aria-prohibited-attr (serious): Ensure ARIA attributes are not prohibited for an element's role
- (89x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 13 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (4) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- landmark-contentinfo-is-top-level (moderate): Ensure the contentinfo landmark is at top level (Best Practice - does not affect pass/fail)
Sample 14 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 85 | BP: 14
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (85) ❌
- (1x) - aria-prohibited-attr (serious): Ensure ARIA attributes are not prohibited for an element's role
- (4x) - aria-required-parent (critical): Ensure elements with an ARIA role that require parent roles are contained by them
- (80x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (14) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 15 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 6 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 16 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 30 | BP: 4
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (30) ❌
- (30x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- landmark-contentinfo-is-top-level (moderate): Ensure the contentinfo landmark is at top level (Best Practice - does not affect pass/fail)
Sample 17 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 29 | BP: 12
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (29) ❌
- (29x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (12) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 19 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 7 | BP: 2
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (7) ❌
- (7x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 20 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 44 | BP: 14
Assertions ❌
- ✅: Has an h1 (R): pass
- ❌: Has single h1 (BP): fail
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (44) ❌
- (44x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (14) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 21 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 39 | BP: 12
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (39) ❌
- (39x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (12) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 22 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 27 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (27) ❌
- (27x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 23 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 28 | BP: 11
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (28) ❌
- (1x) - aria-prohibited-attr (serious): Ensure ARIA attributes are not prohibited for an element's role
- (27x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (11) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 24 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 3
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (3) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- landmark-contentinfo-is-top-level (moderate): Ensure the contentinfo landmark is at top level (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 15
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (15) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 8
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (8) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 10
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (10) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 11
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (11) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 18 | BP: 14
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (18) ❌
- (18x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (14) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
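The color-contrast failures reported for this sample refer to the WCAG 2 AA minimum contrast ratio. As a rough illustration of what is being measured, the sketch below computes the WCAG 2 contrast ratio between two sRGB colors and compares it against the AA thresholds (4.5:1 for normal text, 3:1 for large text). The function names are hypothetical and this is not axe's implementation, which also has to resolve text size, opacity, and background layers.

```python
def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to its linear value (WCAG 2 definition)."""
    s = channel / 255
    return s / 12.92 if s <= 0.03928 else ((s + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance of an (R, G, B) color, from 0.0 (black) to 1.0 (white)."""
    r, g, b = (_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """Contrast ratio (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(fg, bg, large_text: bool = False) -> bool:
    """True when the pair meets the AA minimum: 4.5:1, or 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


if __name__ == "__main__":
    # Mid-gray text on white sits just below the 4.5:1 threshold.
    print(round(contrast_ratio((119, 119, 119), (255, 255, 255)), 2))
    print(meets_aa((119, 119, 119), (255, 255, 255)))
```

Each counted failure (for example the 18 in this sample) corresponds to one element whose foreground and background pair falls below the applicable threshold.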
Sample 0 (GPT-5.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 23
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ❌: Has a single footer (R): fail
Axe Best Practice Issues (23) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- landmark-contentinfo-is-top-level (moderate): Ensure the contentinfo landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-contentinfo (moderate): Ensure the document has at most one contentinfo landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
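The PASS/FAIL verdicts in these logs appear to follow a simple rule: a sample passes only when it has zero Axe WCAG failures and every required (R) assertion passes, while Axe best practice issues are reported but ignored. The sketch below encodes that inferred rule. The names are hypothetical, and treating a failed (BP) assertion as non-blocking is an assumption, since no sample in this section shows that case.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    name: str
    required: bool  # True for "(R)" assertions, False for "(BP)" ones
    passed: bool


def sample_passes(axe_wcag_failures: int, assertions: list[Assertion]) -> bool:
    """Inferred verdict rule: no Axe WCAG failures and all required assertions pass.

    Axe best practice counts are deliberately not an input, because the logs
    state that they do not affect pass/fail.
    """
    required_ok = all(a.passed for a in assertions if a.required)
    return axe_wcag_failures == 0 and required_ok


if __name__ == "__main__":
    # Mirrors the sample above: 0 Axe WCAG failures, but the banner and footer
    # assertions fail, so the verdict is FAIL.
    checks = [
        Assertion("Has an h1", True, True),
        Assertion("Has single h1", False, True),
        Assertion("Has at least one h2", True, True),
        Assertion("Has a single banner", True, False),
        Assertion("Has a single maincontent", True, True),
        Assertion("Has at least one navigation", True, True),
        Assertion("Has a single footer", True, False),
    ]
    print("PASS" if sample_passes(0, checks) else "FAIL")
```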
Sample 1 (GPT-5.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 15
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (15) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
- landmark-unique (moderate): Ensure landmarks are unique (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 28 | BP: 12
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (28) ❌
- (28x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (12) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 12
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (12) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
- landmark-contentinfo-is-top-level (moderate): Ensure the contentinfo landmark is at top level (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 23 | BP: 10
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (23) ❌
- (23x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (6) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 17
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (17) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 10
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (10) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 11
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (11) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 11
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (11) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
- landmark-complementary-is-top-level (moderate): Ensure the complementary landmark or aside is at top level (Best Practice - does not affect pass/fail)
GPT-5.2 Codex
| Instruction set | Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|---|
| Control | 25 | 0 | 0% | 0% | 0% |
| 0. Minimal | 5 | 1 | 20% | 100% | 100% |
| 1. Basic | 5 | 4 | 80% | 100% | 100% |
| 2. Detailed Instructions | 5 | 5 | 100% | 100% | 100% |
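The pass@k figures above are consistent with the standard combinatorial estimator, 1 - C(n - c, k) / C(n, k), where n is the number of samples and c the number of passes. The sketch below is a minimal version of that calculation, not the report's own code. Clamping k to n is an assumption made here so that pass@10 is defined for the 5-sample instruction sets; it reproduces the 100% values shown for them.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples, drawn without replacement
    from n samples of which c passed, is a passing sample."""
    k = min(k, n)  # assumption: k larger than the sample count is clamped
    # comb(n - c, k) is 0 when fewer than k failing samples exist,
    # which makes the result 1.0 without a special case.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 5 samples with 1 pass: pass@1, pass@5, pass@10
    for k in (1, 5, 10):
        print(k, f"{pass_at_k(5, 1, k):.0%}")
```

With 5 samples, any draw of 5 includes the single passing sample, which is why one pass is enough to reach 100% at pass@5.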
Sample 0 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 9 | BP: 9
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (9) ❌
- (9x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (9) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
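The seven structural assertions listed for each sample can be approximated with a small counter over headings and landmark elements. The sketch below is a hypothetical reimplementation using only the Python standard library, not the benchmark's actual checks. It assumes that "single" means exactly one, and it counts every header and footer element as a banner or contentinfo landmark, even though those elements only act as landmarks when they are not nested inside sectioning content.

```python
from collections import Counter
from html.parser import HTMLParser


class LandmarkCounter(HTMLParser):
    """Counts h1/h2 headings and landmark roles (explicit or implied by tag)."""

    # Simplified tag-to-role mapping; nesting rules are ignored (assumption).
    TAG_ROLES = {
        "header": "banner",
        "main": "main",
        "nav": "navigation",
        "footer": "contentinfo",
    }

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2"):
            self.counts[tag] += 1
        # An explicit role attribute takes precedence over the implied role.
        role = dict(attrs).get("role") or self.TAG_ROLES.get(tag)
        if role:
            self.counts[role] += 1


def structural_checks(html: str) -> dict[str, bool]:
    """Return a pass/fail flag for each assertion name used in the logs."""
    parser = LandmarkCounter()
    parser.feed(html)
    parser.close()
    c = parser.counts
    return {
        "Has an h1": c["h1"] >= 1,
        "Has single h1": c["h1"] == 1,
        "Has at least one h2": c["h2"] >= 1,
        "Has a single banner": c["banner"] == 1,
        "Has a single maincontent": c["main"] == 1,
        "Has at least one navigation": c["navigation"] >= 1,
        "Has a single footer": c["contentinfo"] == 1,
    }


if __name__ == "__main__":
    # A page with no main element, like the sample above.
    page = "<header></header><nav></nav><h1>Title</h1><h2>Part</h2><footer></footer>"
    for name, ok in structural_checks(page).items():
        print("pass" if ok else "fail", name)
```

Under these assumptions, a page without a main element fails only "Has a single maincontent", matching the pattern in the sample above, where the same omission also triggers the landmark-one-main and region best practice rules.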
Sample 1 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ❌: Has at least one navigation (R): fail
- ✅: Has a single footer (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 8 | BP: 4
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (8) ❌
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 12 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (12) ❌
- (12x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 38
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (38) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 19 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (19) ❌
- (19x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 15 | BP: 2
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (15) ❌
- (15x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 8 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 13
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (13) ❌
- (13x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 9 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 13 | BP: 14
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ❌: Has at least one navigation (R): fail
- ✅: Has a single footer (R): pass
Axe WCAG Failures (13) ❌
- (13x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (14) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 4 | BP: 40
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (4) ❌
- (4x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (40) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 8 | BP: 9
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (8) ❌
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (9) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 6 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 11 | BP: 12
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (11) ❌
- (11x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (12) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 48
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (48) ❌
- (48x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 15 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 5 | BP: 2
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 6 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 10 | BP: 10
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (10) ❌
- (10x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (10) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 9
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (9) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 9 | BP: 12
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (9) ❌
- (9x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (12) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 13 | BP: 1
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ❌: Has at least one navigation (R): fail
- ✅: Has a single footer (R): pass
Axe WCAG Failures (13) ❌
- (13x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 21 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 19 | BP: 1
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ❌: Has at least one navigation (R): fail
- ✅: Has a single footer (R): pass
Axe WCAG Failures (19) ❌
- (19x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ❌: Has at least one navigation (R): fail
- ✅: Has a single footer (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (4) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 23 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 9 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (9) ❌
- (9x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 24 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- heading-order (moderate): Ensure the order of headings is semantically correct (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 1 (GPT-5.2 Codex)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 3 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 4 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 0 (GPT-5.2 Codex)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 3
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (GPT-5.2 Codex)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 15
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (15) ❌
- (15x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (GPT-5.2 Codex)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 9
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (9) ❌
- (9x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (GPT-5.2 Codex)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 11
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (11) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5.2 Codex)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 5 | BP: 11
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (11) ⚠️
- aria-allowed-role (minor): Ensure role attribute has an appropriate value for the element (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 1 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 2 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 3 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Sample 4 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Grok 4 Fast Non-Reasoning
Control: 0%
1. Basic: 0%
0. Minimal: 0%
2. Detailed Instructions: 20%
| Instruction set | Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|---|
| Control | 25 | 0 | 0% | 0% | 0% |
| 1. Basic | 5 | 0 | 0% | 0% | 0% |
| 0. Minimal | 5 | 0 | 0% | 0% | 0% |
| 2. Detailed Instructions | 5 | 1 | 20% | 100% | 100% |
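The report does not state how pass@k is computed. One common choice is the combinatorial estimator 1 - C(n-c, k) / C(n, k) for n samples with c passes, and it appears to reproduce the figures shown here (for example 1 pass in 5 samples gives 20% at k=1 and 100% at k=5). The sketch below assumes that estimator. The function name and the handling of k larger than n are assumptions, not the report's code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k drawn samples passes.

    n: total samples, c: passing samples, k: draw size.
    Assumed convention: no passes gives 0, and when fewer than k
    failing samples exist, any draw of k must include a pass.
    """
    if c == 0:
        return 0.0
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # (n, c) pairs taken from aggregates in this report
    for n, c in ((5, 1), (25, 0), (25, 2)):
        row = [f"{pass_at_k(n, c, k):.0%}" for k in (1, 5, 10)]
        print(f"n={n} c={c}: pass@1={row[0]} pass@5={row[1]} pass@10={row[2]}")
```

With only 5 samples per instruction set, pass@5 and pass@10 saturate at 100% as soon as a single sample passes, so pass@1 is the more informative column for those rows.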
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
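The failed "Has a single maincontent" assertion and the `landmark-one-main` finding both point to a page without a main landmark. As a rough illustration of what such a check looks for, the sketch below counts landmark roles in an HTML string using Python's standard `html.parser`. It is not the benchmark's assertion code. The class and function names are hypothetical, and the scoping rule for `header` and `footer` (they count as banner and contentinfo only outside sectioning elements) is simplified.

```python
from collections import Counter
from html.parser import HTMLParser

# Implicit landmark roles for a few HTML elements (simplified).
IMPLICIT_ROLE = {"header": "banner", "main": "main", "nav": "navigation", "footer": "contentinfo"}
SECTIONING = {"article", "aside", "main", "nav", "section"}


class LandmarkCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.counts = Counter()
        self._open_sectioning = 0  # depth of currently open sectioning elements

    def handle_starttag(self, tag, attrs):
        role = dict(attrs).get("role")
        if role is None:
            if tag in ("header", "footer"):
                # Only page-level header/footer map to banner/contentinfo.
                role = None if self._open_sectioning else IMPLICIT_ROLE[tag]
            else:
                role = IMPLICIT_ROLE.get(tag)
        if role:
            self.counts[role] += 1
        if tag in SECTIONING:
            self._open_sectioning += 1

    def handle_endtag(self, tag):
        if tag in SECTIONING and self._open_sectioning:
            self._open_sectioning -= 1


def count_landmarks(html: str) -> Counter:
    parser = LandmarkCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


if __name__ == "__main__":
    page = "<header><h1>Site</h1></header><nav></nav><div class='content'></div><footer></footer>"
    counts = count_landmarks(page)
    print("single main landmark:", counts["main"] == 1)
```

In this sketch, swapping the generic content `div` for a `main` element is enough to make the count equal one.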
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 8 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (8) ❌
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 7 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (7) ❌
- (7x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 6 | BP: 27
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (27) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 6 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 4
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (4) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 25
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (25) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 7 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (7) ❌
- (7x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 7 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (7) ❌
- (7x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 6
Assertions ❌
- ✅: Has an h1 (R): pass
- ❌: Has single h1 (BP): fail
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 3
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (3) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Has an h1 (R): pass
- ❌: Has single h1 (BP): fail
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 7
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (7) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 2 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ❌: Has a single maincontent (R): fail
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (2) ❌
- (2x) - link-in-text-block (serious): Ensure links are distinguished from surrounding text in a way that does not rely on color
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 11
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (11) ❌
- (11x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 12
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (12) ❌
- (12x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 6
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (6) ❌
- (6x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 3
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 13
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (13) ❌
- (13x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 8
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (8) ❌
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 11 | BP: 2
Assertions ❌
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ❌: Has a single banner (R): fail
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (11) ❌
- (11x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- landmark-banner-is-top-level (moderate): Ensure the banner landmark is at top level (Best Practice - does not affect pass/fail)
- landmark-no-duplicate-banner (moderate): Ensure the document has at most one banner landmark (Best Practice - does not affect pass/fail)
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 7 | BP: 2
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (7) ❌
- (7x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (2) ⚠️
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 8
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (8) ❌
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 10
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (10) ❌
- (10x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 8
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (8) ❌
- (8x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 14
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (14) ❌
- (14x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 12
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (12) ❌
- (12x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 17
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
Axe WCAG Failures (17) ❌
- (17x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Has an h1 (R): pass
- ✅: Has single h1 (BP): pass
- ✅: Has at least one h2 (R): pass
- ✅: Has a single banner (R): pass
- ✅: Has a single maincontent (R): pass
- ✅: Has at least one navigation (R): pass
- ✅: Has a single footer (R): pass
simple-contact-form
Prompt
Create an HTML file with a contact form containing three text input fields:
- Full Name (required field)
- Email Address (required field, with helper text explaining the expected format)
- Phone Number (optional field, with helper text about the preferred format)
Wrap each field (label, input, and any helper text) in a div with class `form-field`. Include a submit button.
DeepSeek V3.2
Control: 0%
1. Basic: 0%
0. Minimal: 0%
2. Detailed Instructions: 20%
| Instruction set | Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|---|
| Control | 25 | 0 | 0% | 0% | 0% |
| 1. Basic | 5 | 0 | 0% | 0% | 0% |
| 0. Minimal | 5 | 0 | 0% | 0% | 0% |
| 2. Detailed Instructions | 5 | 1 | 20% | 100% | 100% |
Sample 0 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Please use format: XXX-XXX-XXXX.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Please use format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Please use format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Please use format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Please use the format: (XXX) XXX-XXXX`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: XXX-XXX-XXXX`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Please use the format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Please use the format: XXX-XXX-XXXX.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Please use format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Please use format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Please use format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (DeepSeek V3.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
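The "Helper text is programmatically associated" failures above list hint strings that were found on the page without a link to their input. The report does not show how the assertion is implemented, so the following is only a minimal sketch of one plausible check: it treats any element with a hypothetical `hint` class as helper text and reports the ones whose `id` is not referenced by any `aria-describedby` attribute. The class name, the function names, and the sample markup are assumptions made for illustration.

```python
from html.parser import HTMLParser
from typing import List, Optional


class HintAssociationParser(HTMLParser):
    """Records helper-text elements and every id named in aria-describedby."""

    def __init__(self, hint_class: str):
        super().__init__()
        self.hint_class = hint_class
        self.hint_ids: List[Optional[str]] = []  # ids of helper-text elements, in document order
        self.referenced = set()                  # ids referenced by any aria-describedby

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if self.hint_class in classes:
            # A hint without an id is recorded as None and can never be referenced.
            self.hint_ids.append(attr.get("id"))
        self.referenced.update((attr.get("aria-describedby") or "").split())


def unassociated_hints(html: str, hint_class: str = "hint") -> List[Optional[str]]:
    """Return the ids of helper-text elements that no aria-describedby points at."""
    parser = HintAssociationParser(hint_class)
    parser.feed(html)
    parser.close()
    return [hint_id for hint_id in parser.hint_ids if hint_id not in parser.referenced]


# Hypothetical markup: the email hint is associated, the phone hint is not.
SAMPLE_FORM = """
<label for="email">Email</label>
<input id="email" type="email" aria-describedby="email-hint" required>
<p id="email-hint" class="hint">Please enter a valid email address.</p>
<label for="phone">Phone</label>
<input id="phone" type="tel">
<p id="phone-hint" class="hint">Optional. Preferred format: 123-456-7890.</p>
"""

print(unassociated_hints(SAMPLE_FORM))
```

In this sketch, adding `aria-describedby="phone-hint"` to the phone input would clear the remaining entry.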
Sample 0 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `This field is required.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (DeepSeek V3.2)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 0 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Please use format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Please use the format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Please use format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (DeepSeek V3.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Please use format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Enter your first and last name.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (5) ❌
- (5x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
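The color-contrast failures reported by Axe refer to the WCAG 2 AA minimum contrast ratio. As a rough illustration, the sketch below computes that ratio from two hex colors using the relative luminance formula published in WCAG 2.x and compares it with the 4.5:1 threshold for normal-size text. The function names and example colors are hypothetical, and this is not Axe's implementation, which also has to resolve computed styles, opacity, and text size.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value, per the WCAG 2.x definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color written as '#rrggbb'."""
    value = hex_color.lstrip("#")
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa_normal_text(foreground: str, background: str) -> bool:
    """True if the pair reaches the 4.5:1 AA minimum for normal-size text."""
    return contrast_ratio(foreground, background) >= 4.5


# Example colors chosen for illustration only.
for fg in ("#767676", "#777777"):
    print(fg, round(contrast_ratio(fg, "#ffffff"), 2), meets_aa_normal_text(fg, "#ffffff"))
```

Large text has a lower AA threshold of 3:1, which this sketch does not model.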
Sample 1 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 2 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 4 (DeepSeek V3.2)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
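The most frequent failure in the DeepSeek V3.2 samples above is "Inputs use appropriate autocomplete for purpose". The report does not describe how the benchmark decides which token each field needs, so the sketch below assumes the expected tokens are supplied by the caller as a mapping from input `id` to an HTML autofill token such as `name`, `email`, or `tel`. The field ids, function names, and sample markup are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List


class AutocompleteParser(HTMLParser):
    """Collects the autocomplete token of every <input> that has an id."""

    def __init__(self):
        super().__init__()
        self.tokens: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        input_id = attr.get("id")
        if input_id:
            # A missing attribute is stored as an empty string.
            self.tokens[input_id] = (attr.get("autocomplete") or "").strip().lower()


def autocomplete_mismatches(html: str, expected: Dict[str, str]) -> List[str]:
    """Return the ids whose autocomplete token is absent or differs from the expected one."""
    parser = AutocompleteParser()
    parser.feed(html)
    parser.close()
    return sorted(
        field_id for field_id, token in expected.items()
        if parser.tokens.get(field_id) != token
    )


# Hypothetical contact form: only the email input lacks its token.
SAMPLE_FORM = """
<input id="name" type="text" autocomplete="name" required>
<input id="email" type="email" required>
<input id="phone" type="tel" autocomplete="tel">
"""
EXPECTED = {"name": "name", "email": "email", "phone": "tel"}

print(autocomplete_mismatches(SAMPLE_FORM, EXPECTED))
```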
Claude Haiku 4.5
— 0%
— 0%
— 0%
— 0%
Aggregates are shown when filtering to a specific instruction set.
| Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|
| 25 | 0 | 0% | 0% | 0% |
| 5 | 0 | 0% | 0% | 0% |
| 5 | 0 | 0% | 0% | 0% |
| 5 | 0 | 0% | 0% | 0% |
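The report describes pass@k as the probability that at least one of k randomly selected samples passes, but does not state the formula. A minimal sketch, assuming the commonly used combinatorial estimator 1 - C(n - c, k) / C(n, k) for n samples with c passes, is shown below. This assumed formula reproduces rows published in the report's pass@k summary, for example 2 passes out of 25 samples giving 8%, 37%, and 65%. Returning 0 when there are no passes, even when k exceeds the sample count, is a choice made here to match the 5-sample rows with 0% at pass@10; how the report handles k greater than n with at least one pass is not known.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples drawn from n passes.

    n: total samples, c: passing samples, k: number of samples drawn.
    """
    if c == 0:
        return 0.0  # no passing sample can ever be drawn
    if n - c < k:
        return 1.0  # too few failing samples to fill a draw of size k
    return 1.0 - comb(n - c, k) / comb(n, k)


# 2 passes out of 25 samples, as in one row of the report's summary table.
for k in (1, 5, 10):
    print(f"pass@{k}: {pass_at_k(25, 2, k):.0%}")
```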
Sample 0 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0050
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0053
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0055
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)` Found `Optional. Preferred format: (123) 456-7890 or +1 (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0054
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0051
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com)` Found `Preferred format: (123) 456-7890 or +1-234-567-8900`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0050
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or +1-123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0053
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0051
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or +1-123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0047
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0053
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com)` Found `Optional. Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0050
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0053
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Please use the format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0052
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or +1-123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0052
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Optional. Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0056
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `John Doe`; Found `john@example.com Please enter a valid email address (e.g., john@example.com)`; Found `(123) 456-7890 Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0052
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (XXX) XXX-XXXX or XXX-XXX-XXXX`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0052
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0047
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0055
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0051
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0053
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0059
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)`; Found `Optional. Please use the format: +1 (555) 123-4567 or similar`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0050
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)`; Found `Preferred format: (123) 456-7890 or +1 123 456 7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0045
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Claude Haiku 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0052
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0067
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0071
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 2 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 1 | $0.0070
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0067
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Haiku 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0070
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0061
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0064
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0069
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `John Doe`; Found `john@example.com Please enter a valid email address (e.g., yourname@example.com)`; Found `(555) 123-4567 Optional. Please use the format: (555) 123-4567 or +1-555-123-4567`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0059
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Haiku 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0053
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0139
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 1 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0189
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `John Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 2 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0182
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0166
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 4 (Claude Haiku 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0198
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `John Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Claude Opus 4.6
Aggregates are shown when filtering to a specific instruction set.
| Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|
| 25 | 0 | 0% | 0% | 0% |
| 5 | 2 | 40% | 100% | 100% |
| 5 | 5 | 100% | 100% | 100% |
| 5 | 5 | 100% | 100% | 100% |
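The pass@k figures above are consistent with the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), for n samples of which c passed. The report does not state which formula it uses, so the following is only a minimal sketch under that assumption. The function name `pass_at_k` and the handling of k larger than n are illustrative choices, not taken from the benchmark's code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes.

    n: total samples drawn, c: samples that passed, k: draw size.
    Assumes the standard unbiased estimator 1 - C(n-c, k) / C(n, k).
    """
    if n < 1 or k < 1 or not 0 <= c <= n:
        raise ValueError("require n >= 1, k >= 1 and 0 <= c <= n")
    if c == 0:
        # No passing sample exists, so no draw can contain one.
        return 0.0
    if n - c < k:
        # Fewer than k failures: every draw of k must include a pass.
        # This also covers k > n (an assumption about how the report
        # treats draw sizes larger than the sample count).
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # (samples, passes) pairs taken from the aggregate rows above.
    for n, c in [(25, 0), (5, 2), (5, 5)]:
        row = " | ".join(f"{pass_at_k(n, c, k):.0%}" for k in (1, 5, 10))
        print(f"n={n} c={c}: {row}")
```

Under this estimator, 2 passes out of 5 samples already gives 100% at k = 5, because only 3 samples failed and any draw of 5 must include a pass.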
Sample 0 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0159
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your email in the format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0172
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0175
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0159
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your email in the format: example@domain.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0172
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0175
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0161
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your email in the format: example@domain.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0172
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0159
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your email in the format: example@domain.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0172
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0159
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your email in the format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0159
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your email in the format: example@domain.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0159
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your email in the format: example@domain.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0172
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0175
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0161
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your email in the format: example@domain.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0175
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0160
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your email in the format: example@domain.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0182
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0175
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0175
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0159
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your email in the format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0159
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your email in the format: example@domain.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0159
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your email in the format: example@domain.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Claude Opus 4.6)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0159
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter your email in the format: example@domain.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
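All 25 Control samples above fail the autocomplete assertion, and 13 of them also fail the helper-text assertion. The harness's own checks are not shown in this report. As a rough, hypothetical sketch of what such checks look for, the following uses Python's standard `html.parser` to flag helper text that no input references through `aria-describedby`, and email or telephone inputs without a matching `autocomplete` token. The `hint` class name, the `EXPECTED_AUTOCOMPLETE` mapping, and the `audit` function are assumptions made for illustration only.

```python
from html.parser import HTMLParser

# Assumed mapping from input type to the autocomplete token expected for it.
EXPECTED_AUTOCOMPLETE = {"email": "email", "tel": "tel"}


class FormAudit(HTMLParser):
    """Collects <input> attributes and the ids of helper-text elements."""

    def __init__(self):
        super().__init__()
        self.inputs = []    # one attribute dict per <input>
        self.hint_ids = []  # id (or None) of each element with class "hint"

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "input":
            self.inputs.append(attr)
        # Hypothetical convention: helper text carries class="hint".
        if "hint" in (attr.get("class") or "").split():
            self.hint_ids.append(attr.get("id"))


def audit(html: str) -> list[str]:
    parser = FormAudit()
    parser.feed(html)
    parser.close()

    # Every id that some input points at through aria-describedby.
    referenced = set()
    for attr in parser.inputs:
        referenced.update((attr.get("aria-describedby") or "").split())

    problems = []
    for hint_id in parser.hint_ids:
        if hint_id is None or hint_id not in referenced:
            problems.append(f"hint {hint_id!r} is not referenced by aria-describedby")
    for attr in parser.inputs:
        expected = EXPECTED_AUTOCOMPLETE.get(attr.get("type") or "")
        tokens = (attr.get("autocomplete") or "").split()
        if expected and expected not in tokens:
            problems.append(f"input {attr.get('id')!r} lacks autocomplete={expected!r}")
    return problems


BAD = ('<label for="e">Email</label><input id="e" type="email" required>'
       '<p class="hint" id="e-hint">Format: name@example.com</p>')
GOOD = ('<label for="e">Email</label>'
        '<input id="e" type="email" autocomplete="email" aria-describedby="e-hint" required>'
        '<p class="hint" id="e-hint">Format: name@example.com</p>')

print(audit(BAD))
print(audit(GOOD))
```

This covers only two of the nine assertions and inspects static markup; color contrast and the visual required indicator need a rendered page and are out of scope for the sketch.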
Sample 0 (Claude Opus 4.6)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0287
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Opus 4.6)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0305
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Opus 4.6)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0280
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Opus 4.6)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 6 | $0.0331
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Opus 4.6)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0246
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0202
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Opus 4.6)
Instruction set: 0. Minimal
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0210
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0197
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Opus 4.6)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0197
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Opus 4.6)
Instruction set: 0. Minimal
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0204
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.1039
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0977
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 2 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.1003
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0926
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 4 (Claude Opus 4.6)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0912
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
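The PASS/FAIL verdicts in the samples above look consistent with a simple rule: a sample passes only when it has zero Axe WCAG failures and every assertion passes, while best-practice issues are reported but ignored. A minimal sketch of that rule follows. The `SampleResult` record and its field names are hypothetical, introduced only for illustration, and the rule itself is inferred from the listed results rather than taken from the benchmark's source.

```python
from dataclasses import dataclass, field


@dataclass
class SampleResult:
    """Hypothetical record for one generated sample (names are illustrative)."""
    axe_wcag_failures: int
    best_practice_issues: int
    # Maps assertion name -> True if the assertion passed.
    assertions: dict = field(default_factory=dict)


def sample_passes(result: SampleResult) -> bool:
    """Inferred rule: no Axe WCAG failures and no failed assertions.

    Best-practice issues are intentionally not consulted, matching the
    "does not affect pass/fail" note attached to them in the report.
    """
    return result.axe_wcag_failures == 0 and all(result.assertions.values())


# Shaped like a Minimal sample: Axe WCAG 0, BP 5, autocomplete assertion fails.
minimal_sample = SampleResult(
    axe_wcag_failures=0,
    best_practice_issues=5,
    assertions={"accessible name": True, "autocomplete": False},
)

# Shaped like a Basic sample: Axe WCAG 0, BP 5, all assertions pass.
basic_sample = SampleResult(
    axe_wcag_failures=0,
    best_practice_issues=5,
    assertions={"accessible name": True, "autocomplete": True},
)

minimal_verdict = "PASS" if sample_passes(minimal_sample) else "FAIL"
basic_verdict = "PASS" if sample_passes(basic_sample) else "FAIL"
```

Under this reading, a single failed assertion such as the autocomplete check is enough to fail a sample that is otherwise clean, which is the pattern seen in the Minimal instruction set results.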
Claude Sonnet 4.5
Pass rates across instruction sets: 0%, 20%, 40%, 60%
Aggregates are shown when filtering to a specific instruction set.
| Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|
| 25 | 0 | 0% | 0% | 0% |
| 5 | 1 | 20% | 100% | 100% |
| 5 | 2 | 40% | 100% | 100% |
| 5 | 3 | 60% | 100% | 100% |
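The pass@k figures in these aggregates are consistent with the commonly used combinatorial estimator, 1 - C(n - c, k) / C(n, k), where n is the number of samples and c the number of passes. The sketch below assumes that estimator; the report does not state its exact formula. How the report treats k larger than the sample count is also an assumption: here the function returns 1.0 whenever fewer than k failing samples exist and at least one sample passed, which reproduces the 100% entries shown.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k samples passes), given c passes in n samples.

    Assumed estimator: 1 - C(n - c, k) / C(n, k), i.e. one minus the chance
    that k samples drawn without replacement are all failures.
    """
    if c == 0:
        return 0.0  # no passing samples, so no draw can contain a pass
    if n - c < k:
        return 1.0  # fewer than k failures exist, so any draw of k must include a pass
    return 1.0 - comb(n - c, k) / comb(n, k)


# (samples, passes) pairs taken from the aggregates above.
aggregates = [(25, 0), (5, 1), (5, 2), (5, 3)]

# One row of rounded percentages per aggregate: [pass@1, pass@5, pass@10].
rows = [
    [round(100 * pass_at_k(n, c, k)) for k in (1, 5, 10)]
    for n, c in aggregates
]

for (n, c), row in zip(aggregates, rows):
    print(f"samples={n} passes={c} pass@1/5/10={row}")
```

With only 5 samples per instruction set, pass@5 and pass@10 saturate at 100% as soon as a single sample passes, so pass@1 is the more informative column for those rows.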
Sample 0 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0122
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0142
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0134
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0142
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0150
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0149
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0127
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0129
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0148
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com)` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0145
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0122
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0149
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0150
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0152
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)` Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0146
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0144
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0143
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 3 | BP: 5 | $0.0151
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0122
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0139
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0138
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0150
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0145
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., example@domain.com)`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0143
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)`; Found `Preferred format: (123) 456-7890 or 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Claude Sonnet 4.5)
Instruction set: Control
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0125
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com)`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
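Samples 13 through 24 above all fail the helper-text assertion, which reports hint text that is visible but not tied to its input. The benchmark's own check is not shown in this report. The following is a minimal Python sketch of one plausible version, assuming that "programmatically associated" means the input's `aria-describedby` attribute lists the `id` of the hint element. The class name, function name, and HTML snippets are hypothetical, and the sketch further assumes every input has helper text.

```python
from html.parser import HTMLParser


class DescribedByAudit(HTMLParser):
    """Collect every element id and the ids each <input> references
    through aria-describedby (hypothetical helper, not the benchmark's code)."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.references = {}  # input id -> list of referenced ids

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if attr.get("id"):
            self.ids.add(attr["id"])
        if tag == "input":
            refs = (attr.get("aria-describedby") or "").split()
            self.references[attr.get("id", "?")] = refs


def inputs_missing_description(html: str) -> list[str]:
    """Return ids of inputs with no aria-describedby, or with one that
    points at an id that does not exist in the document."""
    audit = DescribedByAudit()
    audit.feed(html)
    return [
        input_id
        for input_id, refs in audit.references.items()
        if not refs or any(ref not in audit.ids for ref in refs)
    ]


# Assumed markup: the hint is visible but unreferenced, as in the failing samples.
failing = (
    '<label for="email">Email</label><input id="email" type="email">'
    "<p>Please enter a valid email address</p>"
)
# Assumed markup: the same hint, now referenced by the input.
passing = (
    '<label for="email">Email</label>'
    '<input id="email" type="email" aria-describedby="email-hint">'
    '<p id="email-hint">Please enter a valid email address</p>'
)

print(inputs_missing_description(failing))  # ['email']
print(inputs_missing_description(passing))  # []
```

A real audit would also need to decide which text counts as helper text in the first place, which this sketch sidesteps.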
Sample 0 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0180
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (Claude Sonnet 4.5)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0178
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0208
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Sonnet 4.5)
Instruction set: 1. Basic
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0197
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 4 (Claude Sonnet 4.5)
Instruction set: 1. Basic
FAIL | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0176
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0150
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0163
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 2 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0185
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
PASS | Latency 0.00s cached
Axe WCAG: 0 | BP: 5 | $0.0185
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Claude Sonnet 4.5)
Instruction set: 0. Minimal
FAIL | Latency 0.00s cached
Axe WCAG: 1 | BP: 5 | $0.0196
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
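Minimal samples 2 and 4 above pass every assertion and fail only on axe's color-contrast rule. As a rough illustration of what that rule measures, the sketch below computes the WCAG 2.x contrast ratio between two opaque hex colors and compares it to the AA thresholds (4.5:1 for normal text, 3:1 for large text). Axe's actual implementation handles far more cases (opacity, background images, overlapping elements); the function names and the example colors are assumptions for illustration only.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per the WCAG 2.x definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a #rrggbb color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(fg: str, bg: str, large_text: bool = False) -> bool:
    """True if the pair meets the WCAG 2 AA minimum for the given text size."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


# Hypothetical grey text on a white background.
print(meets_aa("#767676", "#ffffff"))  # True
print(meets_aa("#777777", "#ffffff"))  # False
```

The two example greys differ by a single step per channel, which shows how a small styling choice can flip a sample from PASS to FAIL.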
Sample 0 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0505
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0524
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 2 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s cached
Axe WCAG: 0 | $0.0491
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0501
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 4 (Claude Sonnet 4.5)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s cached
Axe WCAG: 0 | $0.0515
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
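Across the Basic, Minimal, and Detailed Instructions samples above, the autocomplete assertion is the only assertion that still fails. The report does not show how that assertion is evaluated. The sketch below is one hypothetical way to approximate it in Python, assuming the form has an email and a phone input (as the helper text in the Control samples suggests) and that the expected tokens are the standard HTML `autocomplete` values `email` and `tel`. The mapping, class, and function names are illustrative.

```python
from html.parser import HTMLParser

# Assumed mapping from input type to the autocomplete token a check might expect.
EXPECTED_TOKEN = {"email": "email", "tel": "tel"}


class AutocompleteAudit(HTMLParser):
    """Record inputs whose autocomplete attribute lacks the expected token."""

    def __init__(self):
        super().__init__()
        self.missing = []  # (input id, expected token)

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        expected = EXPECTED_TOKEN.get(attr.get("type") or "text")
        tokens = (attr.get("autocomplete") or "").lower().split()
        if expected and expected not in tokens:
            self.missing.append((attr.get("id"), expected))


def missing_autocomplete(html: str):
    """Return (input id, expected token) pairs for inputs that fail the check."""
    audit = AutocompleteAudit()
    audit.feed(html)
    return audit.missing


# Hypothetical form: the email input is annotated, the phone input is not.
form = (
    '<input id="email" type="email" autocomplete="email">'
    '<input id="phone" type="tel">'
)
print(missing_autocomplete(form))  # [('phone', 'tel')]
```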
Gemini 3 Flash Preview
Pass rate by instruction set: 0%, 20%, 100%, 100%
| Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|
| 25 | 0 | 0% | 0% | 0% |
| 5 | 1 | 20% | 100% | 100% |
| 5 | 5 | 100% | 100% | 100% |
| 5 | 5 | 100% | 100% | 100% |
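The pass@k values above are consistent with the usual combinatorial estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples and c the number of passes. The report does not state its formula, so the following Python sketch is an assumption that happens to reproduce the displayed numbers. In particular, returning 100% when k exceeds the number of failing samples (including k larger than n) is a convention chosen here to match the 5-sample rows.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimated probability that at least one of k samples, drawn without
    replacement from n samples of which c pass, is a pass."""
    if c == 0:
        return 0.0  # no passing sample exists, so none can be drawn
    if n - c < k:
        return 1.0  # fewer than k failures exist, so any draw of k includes a pass
    return 1.0 - comb(n - c, k) / comb(n, k)


# (samples, passes) pairs taken from the aggregates above.
for n, c in [(25, 0), (5, 1), (5, 5)]:
    rates = [round(100 * pass_at_k(n, c, k)) for k in (1, 5, 10)]
    print(n, c, rates)
# 25 0 [0, 0, 0]
# 5 1 [20, 100, 100]
# 5 5 [100, 100, 100]
```

With only 5 samples, a single pass is enough to push pass@5 and pass@10 to 100%, so those columns say little beyond "at least one sample passed".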
Sample 0 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (555) 555-5555`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Example: username@domain.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator; Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Example: username@example.com`; Found `Preferred format: (555) 000-0000`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (555) 555-5555.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid address, e.g., name@example.com`; Found `Preferred format: (555) 000-0000`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Optional. Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (555) 555-5555`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Optional: Preferred format is (555) 000-0000.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: example@domain.com`; Found `Preferred format: (555) 555-5555`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Example: name@example.com`; Found `Preferred format: (555) 000-0000`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Example: username@example.com`; Found `Preferred format: (555) 000-0000`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: username@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (555) 555-5555.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: username@example.com`; Found `Preferred format: (555) 555-5555`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (555) 555-5555. (Optional)`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Example: username@domain.com`; Found `Preferred format: (555) 000-0000`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please use the format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid address, e.g., name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: username@example.com`; Found `Preferred format: (555) 555-5555`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Gemini 3 Flash Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: example@domain.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
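Two of the assertions that fail repeatedly in the Control samples above concern attributes on the inputs themselves: helper text that is not tied to its input, and a missing `autocomplete` value. The sketch below shows one way such checks could be approximated with Python's standard `html.parser`. It is not the benchmark's assertion code; the `FormAudit` class and `audit` function are hypothetical, and the sketch assumes every input in the snippet has helper text that should be referenced through `aria-describedby`.

```python
from html.parser import HTMLParser


class FormAudit(HTMLParser):
    """Collect every element id and the attributes of each <input>."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if attr.get("id"):
            self.ids.add(attr["id"])
        if tag == "input":
            self.inputs.append(attr)


def audit(html):
    """Return (input id, problem) pairs for a form snippet."""
    parser = FormAudit()
    parser.feed(html)
    problems = []
    for attr in parser.inputs:
        name = attr.get("id") or "?"
        # aria-describedby holds a space-separated list of element ids.
        refs = (attr.get("aria-describedby") or "").split()
        if not refs or not all(ref in parser.ids for ref in refs):
            problems.append((name, "helper text not associated"))
        if not attr.get("autocomplete"):
            problems.append((name, "missing autocomplete"))
    return problems
```

An input such as `<input id="email" type="email" required>` followed by an unlinked `<p>Format: username@example.com</p>` would be reported for both problems; adding `autocomplete="email"`, `aria-describedby="email-hint"`, and `id="email-hint"` on the paragraph clears them.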
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 2 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 4 (Gemini 3 Flash Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Gemini 3 Pro Preview
| Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|
| 25 | 0 | 0% | 0% | 0% |
| 5 | 4 | 80% | 100% | 100% |
| 5 | 4 | 80% | 100% | 100% |
| 5 | 5 | 100% | 100% | 100% |
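The pass@k figures above are consistent with the commonly used unbiased estimator, 1 - C(n - c, k) / C(n, k), where n is the number of samples and c the number of passes. The report does not state which formula it uses, so the following is a sketch under that assumption; the function name `pass_at_k` is hypothetical. How the report treats k larger than the sample count is also not stated; the sketch returns 1.0 whenever at least one pass exists and fewer than k samples failed, which matches the 100% shown for 5 samples at pass@10.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate the chance that at least one of k drawn samples passes.

    n: total samples, c: passing samples, k: number of samples drawn.
    """
    if c == 0:
        return 0.0  # no passing sample exists, so no draw can contain one
    if n - c < k:
        return 1.0  # fewer than k failures, so any k draws include a pass
    return 1.0 - comb(n - c, k) / comb(n, k)


# 5 samples with 4 passes, as in the aggregates above
for k in (1, 5, 10):
    print(f"pass@{k}: {pass_at_k(5, 4, k):.0%}")
```

With 5 samples and 4 passes this prints 80%, 100%, and 100%, in line with the aggregate rows for that sample count.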
Sample 0 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `John Doe`; Found `name@example.com Please enter a valid email (e.g., user@domain.com).`; Found `(555) 123-4567 Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please use a valid format like name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: user@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Optional. Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Optional. Preferred format: (555) 555-5555.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (555) 123-4567 or 555-123-4567.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `John Doe`; Found `john@example.com Please enter a valid email (e.g., name@domain.com).`; Found `(555) 123-4567 Preferred format: (XXX) XXX-XXXX. This field is optional.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email (e.g., user@example.com).`; Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please use a valid format, e.g., name@example.com`; Found `Preferred format: (555) 123-4567`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email (e.g., name@example.com).`; Found `Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email (e.g., user@example.com).` Found `Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email (e.g., user@example.com).` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 3 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Optional. Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please use a valid format like name@example.com` Found `Preferred format: 123-456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com` Found `Preferred format: (555) 123-4567`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email (e.g., name@example.com).` Found `Optional. Preferred format: 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email format (e.g., user@example.com).` Found `Preferred format: (555) 123-4567 or digits only.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email (e.g., user@example.com).` Found `Preferred format: (555) 123-4567.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (555) 123-4567. Digits only is also acceptable.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email (e.g., name@example.com).` Found `Optional. Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (Optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Gemini 3 Pro Preview)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please use a valid format, e.g., name@example.com` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
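The "Helper text is programmatically associated" assertion fails in the Control samples above because hint text such as `Preferred format: (123) 456-7890` sits next to the input without being referenced by it. The sketch below shows one way such a check could work, assuming the association is made with `aria-describedby`. The report does not describe the harness's actual logic, so the class `DescribedByChecker`, the function `inputs_missing_description`, and the sample markup are all hypothetical. As a simplification, it treats every input as needing a description.

```python
from html.parser import HTMLParser


class DescribedByChecker(HTMLParser):
    """Collects element ids and each input's aria-describedby references."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.inputs = []  # (input id or name, list of referenced ids)

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if attr.get("id"):
            self.ids.add(attr["id"])
        if tag == "input":
            refs = (attr.get("aria-describedby") or "").split()
            self.inputs.append((attr.get("id") or attr.get("name") or "?", refs))


def inputs_missing_description(html: str) -> list[str]:
    """Return inputs with no aria-describedby, or with a reference to a missing id."""
    checker = DescribedByChecker()
    checker.feed(html)
    checker.close()
    return [
        name
        for name, refs in checker.inputs
        if not refs or any(ref not in checker.ids for ref in refs)
    ]


# Hypothetical markup: the hint is visible but not associated with the input.
FAILING = """
<label for="email">Email</label>
<input id="email" type="email" required>
<p>Please enter a valid email (e.g., name@example.com).</p>
"""

# Same markup with the hint referenced through aria-describedby.
PASSING = """
<label for="email">Email</label>
<input id="email" type="email" required aria-describedby="email-hint">
<p id="email-hint">Please enter a valid email (e.g., name@example.com).</p>
"""

print(inputs_missing_description(FAILING))
print(inputs_missing_description(PASSING))
```

A real check would also need to decide which text counts as helper text for which input, which this sketch does not attempt.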
Sample 0 (Gemini 3 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (Gemini 3 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 2 (Gemini 3 Pro Preview)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `(required)`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (Gemini 3 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 4 (Gemini 3 Pro Preview)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 0 (Gemini 3 Pro Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (Gemini 3 Pro Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Gemini 3 Pro Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (Gemini 3 Pro Preview)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Gemini 3 Pro Preview)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
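The color-contrast failures reported in these samples refer to the WCAG 2 contrast ratio, which is derived from the relative luminance of the foreground and background colors. Level AA requires at least 4.5:1 for normal text and 3:1 for large text. The sketch below computes the ratio for opaque hex colors only. The function names are illustrative, and it is not a reproduction of how axe resolves colors on a rendered page (transparency, gradients, and background images are out of scope here).

```python
def relative_luminance(hex_color: str) -> float:
    """WCAG 2 relative luminance of an opaque sRGB color such as '#767676'."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Linearize each sRGB channel as defined in WCAG 2.
    linear = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio (lighter + 0.05) / (darker + 0.05), from 1 to 21."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """True when the pair meets the WCAG 2 AA minimum for the given text size."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold


print(round(contrast_ratio("#767676", "#ffffff"), 2))  # just above 4.5
print(meets_aa("#777777", "#ffffff"))                  # just below 4.5
```

Grey text on a white background is a common source of this failure: `#767676` passes for normal text, while the slightly lighter `#777777` does not.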
Sample 0 (Gemini 3 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (Gemini 3 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 2 (Gemini 3 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (Gemini 3 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 4 (Gemini 3 Pro Preview)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
GPT-5 Mini
Control: 20%
0. Minimal: 20%
1. Basic: 100%
2. Detailed Instructions: 80%
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 20% | 71% | 94% |
Samples: 5 | Passes: 1
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 20% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
Samples: 5 | Passes: 4
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 80% | 100% | 100% |
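The pass@k figures in these tables are consistent with the standard combinatorial estimator, 1 - C(n - c, k) / C(n, k), where n is the number of samples and c the number of passes. The sketch below assumes that estimator. The function name `pass_at_k` and the choice to cap k at n when fewer than k samples exist are assumptions made for illustration, not details confirmed by the report.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples passes.

    n: total samples drawn for the test
    c: number of those samples that passed
    k: number of samples picked at random without replacement

    Assumption: when k exceeds n, k is capped at n, so a test with
    5 samples reports the same value for pass@5 and pass@10.
    """
    if n <= 0 or k <= 0:
        raise ValueError("n and k must be positive")
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    k = min(k, n)
    # comb(n - c, k) is 0 when k > n - c, which yields 1.0 as expected.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Reproduce the first aggregate row above: 25 samples, 5 passes.
for k in (1, 5, 10):
    print(f"pass@{k}: {round(100 * pass_at_k(25, 5, k))}%")
# pass@1: 20%
# pass@5: 71%
# pass@10: 94%
```

With only 5 samples and at least one pass, any draw of 5 necessarily includes a passing sample, which is why the 5-sample tables show 100% for pass@5 and pass@10 even when pass@1 is 20%.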
Sample 0 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Jane Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 1 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Jane Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (1) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `First and last name`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Jane Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Jane Doe Required`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (1) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
Sample 5 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Jane Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 6 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `First Last`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Jane Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Jane Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `First Last`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 7
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (7) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Jane Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Jane Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (1) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
Sample 16 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `This field is required.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (1) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
Sample 18 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Jane Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 19 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `e.g. Alex Johnson`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 20 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (1) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
Sample 22 (GPT-5 Mini)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 23 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Jane Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (GPT-5 Mini)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `First Last`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 1 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 2 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 4 (GPT-5 Mini)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 0 (GPT-5 Mini)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 1 (GPT-5 Mini)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 2 (GPT-5 Mini)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `First Last`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 3 (GPT-5 Mini)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Jane Doe`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 4 (GPT-5 Mini)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 6
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 2
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (2) ❌
- (2x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 3 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 4 (GPT-5 Mini)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
GPT-5.2
Pass rates: 68% | 80% | 100% | 100%
Samples: 25 | Passes: 17
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 68% | 100% | 100% |
Samples: 5 | Passes: 4
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 80% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
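The report does not state how pass@k is computed. The figures above are consistent with the standard combinatorial estimator, 1 - C(n-c, k) / C(n, k), for n samples of which c pass. The sketch below is a minimal version of that estimator, not the benchmark's own code. The function name `pass_at_k` and the choice to clamp k to n when fewer than k samples exist (as in the 5-sample blocks, which still report pass@10) are assumptions.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Chance that at least one of k samples, drawn without
    replacement from n samples with c passes, is a pass."""
    # Assumption: when fewer than k samples exist, use all of them.
    k = min(k, n)
    # Fewer than k failing samples means any draw of k includes a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Control block above: 25 samples, 17 passes.
print(f"{pass_at_k(25, 17, 1):.0%}")  # 68%
print(f"{pass_at_k(25, 17, 5):.0%}")  # 100%
# 5-sample block above: 4 passes.
print(f"{pass_at_k(5, 4, 1):.0%}")    # 80%
```

With 17 of 25 samples passing, only 8 failing samples remain, so a draw of 10 must include a pass and pass@10 is exactly 100%, while pass@5 is about 99.9% and rounds to 100%.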
Sample 0 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Expected format: name@example.com`; found `Preferred format: +1 555-123-4567 (include country code if possible)`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 14 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 2
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (2) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 15 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (GPT-5.2)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (GPT-5.2)
Instruction set: Control
PASS | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 2 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 3 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 4 (GPT-5.2)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 0 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (GPT-5.2)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 3
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (3) ❌
- (3x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
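The color-contrast failure above refers to the WCAG 2 AA contrast ratio between text and its background. A minimal sketch of that calculation follows, assuming 6-digit hex colors; the function names and the grey values used in the usage note are illustrative assumptions, not the colors from this sample.

```python
# Sketch of the WCAG 2.x contrast-ratio formula (illustrative helper names).

def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear-light value."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a colour given as '#rrggbb'."""
    value = hex_color.lstrip("#")
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio (lighter + 0.05) / (darker + 0.05), from 1 to 21."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """AA requires at least 4.5:1 for normal text and 3:1 for large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)
```

For example, `meets_aa("#767676", "#ffffff")` holds while `meets_aa("#777777", "#ffffff")` does not, which is why small shifts in a grey text color can flip this rule.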
Sample 2 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 4 (GPT-5.2)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 0 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass
Sample 1 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 2 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 4 (GPT-5.2)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
GPT-5.2 Codex
Aggregates are shown when filtering to a specific instruction set.
Samples: 25 | Passes: 0
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 0% | 0% | 0% |
Samples: 5 | Passes: 2
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 40% | 100% | 100% |
Samples: 5 | Passes: 5
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 100% | 100% | 100% |
Samples: 5 | Passes: 2
| pass@1 | pass@5 | pass@10 |
|---|---|---|
| 40% | 100% | 100% |
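The report does not state how pass@k is computed. The sketch below uses the common unbiased estimator, 1 - C(n-c, k) / C(n, k) for n samples with c passes, which is consistent with the figures in these tables (for instance, 5 samples with 2 passes gives 40% for pass@1). Returning 100% when k exceeds the number of failing samples, including when k exceeds n, is an assumption that matches the 100% pass@10 values shown for 5-sample sets.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes.

    n: total samples, c: passing samples, k: number of samples drawn.
    """
    if c == 0:
        return 0.0  # no passing samples, so no draw can succeed
    if n - c < k:
        return 1.0  # fewer failures than draws: a pass is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    for samples, passes in [(25, 0), (5, 2), (5, 5)]:
        row = [f"{pass_at_k(samples, passes, k):.0%}" for k in (1, 5, 10)]
        print(samples, passes, row)
```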
Sample 0 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
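The failed assertions in the sample above concern helper text association, a visible required indicator, and `autocomplete`. As a rough illustration, the sketch below embeds hypothetical markup that addresses all three and runs a simplified check for two of them using only the standard library. The markup, the `FieldAudit` class, and the `audit` function are assumptions made for this example; they are not the benchmark's actual assertions, and the check flags every input lacking `aria-describedby` whether or not helper text exists.

```python
from html.parser import HTMLParser

# Hypothetical markup: visible asterisk, autocomplete token, and a hint
# linked to the input through aria-describedby.
FIXED_MARKUP = """
<label for="email">Email <span aria-hidden="true">*</span></label>
<input id="email" type="email" required autocomplete="email"
       aria-describedby="email-hint">
<p id="email-hint">Format: name@example.com</p>
"""


class FieldAudit(HTMLParser):
    """Collects every element id and the attributes of each <input>."""

    def __init__(self) -> None:
        super().__init__()
        self.inputs: list[dict] = []
        self.ids: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if attr_map.get("id"):
            self.ids.add(attr_map["id"])
        if tag == "input":
            self.inputs.append(attr_map)


def audit(markup: str) -> list[str]:
    """Return a list of problems found on the inputs in the markup."""
    parser = FieldAudit()
    parser.feed(markup)
    parser.close()
    problems = []
    for field in parser.inputs:
        name = field.get("id") or "<no id>"
        refs = (field.get("aria-describedby") or "").split()
        # Every referenced id must exist somewhere in the document.
        if not refs or not all(ref in parser.ids for ref in refs):
            problems.append(f"{name}: helper text not associated")
        if not field.get("autocomplete"):
            problems.append(f"{name}: missing autocomplete")
    return problems
```

Calling `audit(FIXED_MARKUP)` returns an empty list, while the same field without `aria-describedby` and `autocomplete` reports both problems. The visible required indicator is shown in the markup but is not checked by this sketch.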
Sample 1 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Expected format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: +1 (555) 123-4567`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Expected format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Expected format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Expected format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (123) 456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (123) 456-7890 or 123-456-7890.`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (555) 123-4567`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Expected format: name@example.com`; Found `Preferred format: (555) 123-4567`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: +1 (555) 123-4567`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Expected format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (GPT-5.2 Codex)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Format: name@example.com`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 2 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 4 (GPT-5.2 Codex)
Instruction set: 1. Basic
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 0 (GPT-5.2 Codex)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
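The failure detail in this sample, "Input is programmatically required but has no visual required indicator", describes inputs that carry `required` (or `aria-required="true"`) while their visible label gives no cue. How the harness decides what counts as a visual indicator is not stated in this report. The sketch below assumes a simple heuristic, an asterisk or the word "required" in the label text; that heuristic and the name `missing_visual_indicator` are hypothetical.

```python
from html.parser import HTMLParser


class RequiredScanner(HTMLParser):
    """Records required controls and the visible text of each <label for=...>."""

    def __init__(self):
        super().__init__()
        self.required_ids = []  # ids of controls marked required in markup
        self.label_text = {}    # label "for" value -> accumulated label text
        self._label_for = None  # "for" value of the label being read, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "label" and a.get("for"):
            self._label_for = a["for"]
            self.label_text.setdefault(self._label_for, "")
        elif tag in ("input", "textarea", "select"):
            if "required" in a or a.get("aria-required") == "true":
                self.required_ids.append(a.get("id"))

    def handle_data(self, data):
        if self._label_for is not None:
            self.label_text[self._label_for] += data

    def handle_endtag(self, tag):
        if tag == "label":
            self._label_for = None


def missing_visual_indicator(html):
    """Return ids of required controls whose label shows no required cue."""
    scanner = RequiredScanner()
    scanner.feed(html)
    scanner.close()

    def has_cue(text):
        # Assumed heuristic: an asterisk or the word "required" in the label.
        lowered = text.lower()
        return "*" in lowered or "required" in lowered

    return [
        control_id
        for control_id in scanner.required_ids
        if not has_cue(scanner.label_text.get(control_id, ""))
    ]


no_cue = '<label for="email">Email</label><input id="email" type="email" required>'
with_cue = (
    '<label for="email">Email <span aria-hidden="true">*</span></label>'
    '<input id="email" type="email" required>'
)
print(missing_visual_indicator(no_cue))
print(missing_visual_indicator(with_cue))
```

A real check would probably also need to consider rendered styling and legends that explain the asterisk, which static markup parsing cannot see.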
Sample 1 (GPT-5.2 Codex)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (GPT-5.2 Codex)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (GPT-5.2 Codex)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 0 | BP: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe Best Practice Issues (1) ⚠️
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
Sample 4 (GPT-5.2 Codex)
Instruction set: 0. Minimal
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 0 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
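In the sample above, the autocomplete assertion is the only one that fails, and the report gives no detail on which input or token was involved. The check is likely related to WCAG 1.3.5 (Identify Input Purpose), which relies on the HTML `autocomplete` attribute. As a rough illustration only, the sketch below flags `email` and `tel` inputs that lack the matching autofill token. The type-to-token mapping and the name `missing_autocomplete` are assumptions, not the harness's actual rules.

```python
from html.parser import HTMLParser

# Assumed mapping from input type to the autofill token expected for it.
EXPECTED_TOKEN = {"email": "email", "tel": "tel"}


class AutocompleteScanner(HTMLParser):
    """Flags inputs whose autocomplete attribute lacks the expected token."""

    def __init__(self):
        super().__init__()
        self.problems = []  # (input id, expected token) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        a = dict(attrs)
        expected = EXPECTED_TOKEN.get((a.get("type") or "text").lower())
        # autocomplete may hold several tokens, e.g. "home tel".
        tokens = (a.get("autocomplete") or "").lower().split()
        if expected and expected not in tokens:
            self.problems.append((a.get("id"), expected))


def missing_autocomplete(html):
    """Return (id, expected token) for each input missing its token."""
    scanner = AutocompleteScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.problems


partial = (
    '<input id="email" type="email">'
    '<input id="phone" type="tel" autocomplete="tel">'
)
complete = (
    '<input id="email" type="email" autocomplete="email">'
    '<input id="phone" type="tel" autocomplete="home tel">'
)
print(missing_autocomplete(partial))
print(missing_autocomplete(complete))
```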
Sample 1 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 2 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 3 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 4 (GPT-5.2 Codex)
Instruction set: 2. Detailed Instructions
PASS | Latency 0.00s
Axe WCAG: 0
Assertions ✅
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ✅: Inputs use appropriate autocomplete for purpose (R): pass
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Grok 4 Fast Non-Reasoning
— 0%
— 0%
— 0%
— 0%
| Samples | Passes | pass@1 | pass@5 | pass@10 |
|---|---|---|---|---|
| 25 | 0 | 0% | 0% | 0% |
| 5 | 0 | 0% | 0% | 0% |
| 5 | 0 | 0% | 0% | 0% |
| 5 | 0 | 0% | 0% | 0% |
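The report does not state the formula behind these pass@k figures. They are consistent with the standard combinatorial estimator, 1 - C(n-c, k) / C(n, k) for n samples with c passes, which also reproduces the control-table row with 2 passes in 25 samples (8%, 37%, 65%). The sketch below uses that estimator. Clamping k to n when fewer than k samples exist (as in the 5-sample rows) is an assumption made here, not something the report documents.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Chance that at least one of k samples, drawn without replacement
    from n samples of which c passed, is a passing sample."""
    if n < 1 or k < 1 or not 0 <= c <= n:
        raise ValueError("need n >= 1, k >= 1 and 0 <= c <= n")
    k = min(k, n)  # assumption: with fewer than k samples, draw all of them
    # comb(n - c, k) is 0 when k exceeds the number of failing samples,
    # so the result is 1.0 whenever a pass is unavoidable.
    return 1.0 - comb(n - c, k) / comb(n, k)


for n, c in [(25, 0), (5, 0), (25, 2)]:
    row = [round(100 * pass_at_k(n, c, k)) for k in (1, 5, 10)]
    print(n, c, row)
```

With zero passes the estimator is 0% for every k, which matches all four rows for this model.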
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).`; Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - (2x) Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
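The `color-contrast` failure above refers to the WCAG 2 AA minimum contrast ratio: 4.5:1 for normal text and 3:1 for large text. The report does not list the colors involved, so the values below are purely illustrative. This is a minimal sketch of the WCAG 2 contrast formula for opaque six-digit hex colors; axe itself also accounts for factors such as opacity and layered backgrounds, which are out of scope here.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2 definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio (lighter + 0.05) / (darker + 0.05), from 1.0 to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """True if the pair meets the WCAG 2 AA minimum for the given text size."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)


# Illustrative colors only; not taken from the evaluated samples.
for fg in ("#000000", "#767676", "#777777"):
    print(fg, round(contrast_ratio(fg, "#ffffff"), 2), meets_aa(fg, "#ffffff"))
```

Mid-gray text on white sits close to the threshold, so a one-step change in the gray value can flip the result for normal-size text.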
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).`; Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
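
The "Helper text is programmatically associated" assertion fails in these samples. The report does not show how the harness performs this check, so the following is only a sketch of one plausible approach: flag any input whose `aria-describedby` does not reference an element that exists in the markup. The function name and the sample markup are hypothetical, and the sketch assumes every input in the snippet has visible helper text.

```python
# Sketch: find inputs with no aria-describedby pointing at an existing element.
from html.parser import HTMLParser


class _FormScan(HTMLParser):
    """Collects all element ids and each input's aria-describedby references."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.inputs = []  # (input id or name, list of referenced ids)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("id"):
            self.ids.add(a["id"])
        if tag == "input":
            refs = (a.get("aria-describedby") or "").split()
            self.inputs.append((a.get("id") or a.get("name") or "?", refs))


def inputs_missing_description(html: str) -> list[str]:
    """Return the inputs whose aria-describedby resolves to no element."""
    scan = _FormScan()
    scan.feed(html)
    scan.close()
    return [name for name, refs in scan.inputs
            if not any(ref in scan.ids for ref in refs)]


# Hypothetical markup modeled on the helper text quoted in the report.
UNASSOCIATED = (
    '<label for="email">Email</label>'
    '<input id="email" type="email" required>'
    '<p id="email-help">Please enter a valid email address.</p>'
)
ASSOCIATED = UNASSOCIATED.replace(
    'required>', 'required aria-describedby="email-help">'
)

if __name__ == "__main__":
    print(inputs_missing_description(UNASSOCIATED))  # ['email']
    print(inputs_missing_description(ASSOCIATED))    # []
```

In the first snippet the helper paragraph is visible but a screen reader would not announce it when the input receives focus; adding `aria-describedby` is the usual way to make that link.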
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
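
Every sample in this group fails "Inputs use appropriate autocomplete for purpose". The generated markup is not included in the report, so the sketch below assumes, based on the quoted helper text, that the form has an email field and a phone field. It checks that `type="email"` and `type="tel"` inputs carry the matching HTML autofill tokens (`email`, `tel`). The mapping and function name are hypothetical and far narrower than a real check would be.

```python
# Sketch: report inputs lacking the autocomplete token expected for their type.
from html.parser import HTMLParser

# Assumed mapping from input type to the HTML autofill token for that purpose.
EXPECTED_TOKEN = {"email": "email", "tel": "tel"}


class _AutocompleteScan(HTMLParser):
    """Records (input id or name, expected token) for each input missing it."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        a = dict(attrs)
        expected = EXPECTED_TOKEN.get((a.get("type") or "text").lower())
        tokens = (a.get("autocomplete") or "").lower().split()
        if expected and expected not in tokens:
            self.missing.append((a.get("id") or a.get("name") or "?", expected))


def missing_autocomplete(html: str) -> list[tuple[str, str]]:
    """Return inputs that lack the autocomplete token expected for their type."""
    scan = _AutocompleteScan()
    scan.feed(html)
    scan.close()
    return scan.missing


# Hypothetical markup; the second input already declares its purpose.
FORM = (
    '<input id="email" type="email" required>'
    '<input id="phone" type="tel" autocomplete="tel">'
)

if __name__ == "__main__":
    print(missing_autocomplete(FORM))  # [('email', 'email')]
```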
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 5 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Enter a valid email address (e.g., user@example.com)` Found `Preferred format: (123) 456-7890 (optional)`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 6 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 7 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 8 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 9 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 10 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 11 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 12 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 13 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 14 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Enter a valid email address (e.g., user@example.com)` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 15 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 16 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 17 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 18 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 19 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 20 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 21 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 22 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 23 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 24 (Grok 4 Fast Non-Reasoning)
Instruction set: Control
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
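Note on the "Helper text is programmatically associated" failures above: the check lists hint strings it could not tie to an input. This report does not show how the benchmark implements that check. The sketch below assumes one common approach, in which each hint element carries an `id` that an input references through `aria-describedby`. The `HintAudit` class, the `hint` class name, and both markup snippets are hypothetical illustrations, not the benchmark's code or any model's output.

```python
from html.parser import HTMLParser


class HintAudit(HTMLParser):
    """Collect hint elements and the ids referenced by aria-describedby.

    Assumption: helper text is marked with class="hint". The real
    benchmark may identify helper text differently.
    """

    def __init__(self):
        super().__init__()
        self.hint_ids = []         # id (or None) of each hint element, in order
        self.described_by = set()  # every id referenced by some input

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "input":
            # aria-describedby holds a space-separated list of element ids.
            self.described_by.update((attr.get("aria-describedby") or "").split())
        if "hint" in (attr.get("class") or "").split():
            self.hint_ids.append(attr.get("id"))

    def unassociated(self):
        """Return the ids (None when missing) of hints no input points to."""
        return [h for h in self.hint_ids if h is None or h not in self.described_by]


def unassociated_hints(html):
    audit = HintAudit()
    audit.feed(html)
    audit.close()
    return audit.unassociated()


# Hypothetical markup: the hint is visible but not linked to the input.
FAILING = """
<label for="email">Email</label>
<input id="email" type="email" required>
<p class="hint">Please enter a valid email address.</p>
"""

# Same form with the hint linked through aria-describedby.
PASSING = """
<label for="email">Email</label>
<input id="email" type="email" required aria-describedby="email-hint">
<p class="hint" id="email-hint">Please enter a valid email address.</p>
"""

print(unassociated_hints(FAILING))
print(unassociated_hints(PASSING))
```

In this sketch a hint counts as associated only when some input's `aria-describedby` names its `id`. Other association techniques, such as placing the hint inside the `<label>`, are deliberately ignored.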
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 6
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (6) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 1. Basic
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ❌: Required fields are indicated (visually and programmatically) (R): fail - Input is programmatically required but has no visual required indicator Input is programmatically required but has no visual required indicator
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., user@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 0. Minimal
FAIL | Latency 0.00s
Axe WCAG: 1 | BP: 5
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ❌: Helper text is programmatically associated (R): fail - Found `Please enter a valid email address (e.g., name@example.com).` Found `Preferred format: (123) 456-7890 (optional).`
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Axe Best Practice Issues (5) ⚠️
- landmark-one-main (moderate): Ensure the document has a main landmark (Best Practice - does not affect pass/fail)
- page-has-heading-one (moderate): Ensure that the page, or at least one of its frames contains a level-one heading (Best Practice - does not affect pass/fail)
- region (moderate): Ensure all page content is contained by landmarks (Best Practice - does not affect pass/fail)
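Note on the "Inputs use appropriate autocomplete for purpose" failures above: the assertion fails in every sample of this group. The report does not state which fields or tokens the benchmark expects, so the sketch below is only a guess at the kind of rule involved. It assumes that an `<input type="email">` should carry the `email` autocomplete token and an `<input type="tel">` the `tel` token (both are standard HTML autofill tokens). The `EXPECTED_BY_TYPE` mapping and the helper names are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical rule set: input type -> autocomplete token expected for it.
EXPECTED_BY_TYPE = {"email": "email", "tel": "tel"}


class AutocompleteAudit(HTMLParser):
    """Record inputs whose autocomplete attribute lacks the expected token."""

    def __init__(self):
        super().__init__()
        self.problems = []  # list of (input id, expected token)

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        expected = EXPECTED_BY_TYPE.get(attr.get("type") or "text")
        if expected is None:
            return  # this sketch has no rule for the input type
        # autocomplete may hold several space-separated tokens.
        tokens = (attr.get("autocomplete") or "").lower().split()
        if expected not in tokens:
            self.problems.append((attr.get("id"), expected))


def missing_autocomplete(html):
    audit = AutocompleteAudit()
    audit.feed(html)
    audit.close()
    return audit.problems


# Hypothetical form: the email field lacks autocomplete, the phone field has it.
FORM = """
<input id="name" type="text">
<input id="email" type="email" required>
<input id="phone" type="tel" autocomplete="tel">
"""

print(missing_autocomplete(FORM))
```

A fuller check would also need to infer purpose for plain text inputs (for example a name field), which this sketch skips because the report gives no detail on how purpose is determined.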
Sample 0 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 0
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Sample 1 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 2 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 3 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
Sample 4 (Grok 4 Fast Non-Reasoning)
Instruction set: 2. Detailed Instructions
FAIL | Latency 0.00s
Axe WCAG: 1
Assertions ❌
- ✅: Each text input has an accessible name (R): pass
- ✅: Visible label is included in accessible name (R): pass
- ✅: Each text input has textbox role (R): pass - Text input fields with textbox role found
- ✅: Helper text is programmatically associated (R): pass
- ✅: Text inputs are keyboard focusable (R): pass
- ✅: Visual labels are defined and persistent (R): pass
- ✅: Required fields are indicated (visually and programmatically) (R): pass
- ❌: Inputs use appropriate autocomplete for purpose (R): fail
- ✅: Placeholder text is programmatically defined as a property (R): pass - No placeholder text present on text inputs
Axe WCAG Failures (1) ❌
- (1x) - color-contrast (serious): Ensure the contrast between foreground and background colors meets WCAG 2 AA minimum contrast ratio thresholds
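Note on the recurring color-contrast failure above: the rule compares foreground and background colors against the WCAG 2 AA thresholds, which are 4.5:1 for normal-size text and 3:1 for large text. The report does not list the offending color pairs, so the sketch below only illustrates the WCAG 2 contrast-ratio formula on example colors chosen for illustration. The function names are hypothetical, and real tools such as axe also resolve opacity, background images, and font size, none of which is modelled here.

```python
def _linear(channel_8bit):
    """Convert one 8-bit sRGB channel to its linear value (WCAG 2 definition)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a '#rrggbb' color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground, background):
    """Contrast ratio (L_lighter + 0.05) / (L_darker + 0.05), from 1 to 21."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground, background, large_text=False):
    """True when the pair meets the AA minimum for the given text size."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)


# Example colors only; they are not taken from any sample in this report.
for fg in ("#767676", "#777777"):
    print(fg, round(contrast_ratio(fg, "#ffffff"), 2), passes_aa(fg, "#ffffff"))
```

On a white background, `#767676` comes out at roughly 4.54:1 and passes for normal text, while the slightly lighter `#777777` comes out at roughly 4.48:1 and fails, which shows how a small shift in gray can flip the result.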