Fig 6 — Ideation radar

4 axes · 1–5 rating · 3,310 ideas total (1,200 Baseline + 1,200 Shadow-Frog + 910 Human-rewritten) · scores reported separately for each judge

Series: Baseline · Shadow-Frog · Human (rewritten)

Shadow-Frog − Baseline deltas

Rubric definitions

Groundedness — Does the proposal demonstrate project-specific knowledge (real APIs, modules, conventions) vs. plausible-sounding generalities?

Insight — How unlikely is this idea to emerge from a 5-minute brainstorm by a regular contributor?

User Impact — How many real users would benefit and how meaningfully?

Spec Clarity — Could a maintainer turn this into a PR scope without back-and-forth?
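
As a rough illustration of how the per-arm means and the Shadow-Frog − Baseline deltas on these four axes could be derived from 1–5 ratings, here is a minimal sketch. The axis keys, the function names `axis_means` and `arm_deltas`, and the ratings in `toy_ratings` are all hypothetical placeholders chosen for the example; they are not the study's data or its actual pipeline.

```python
from statistics import mean

# Hypothetical keys for the four rubric axes defined above.
AXES = ("groundedness", "insight", "user_impact", "spec_clarity")


def axis_means(ratings):
    """Return the mean 1-5 score per axis over a list of {axis: score} dicts."""
    return {axis: mean(r[axis] for r in ratings) for axis in AXES}


def arm_deltas(treatment_means, baseline_means):
    """Return the per-axis difference: treatment mean minus baseline mean."""
    return {axis: treatment_means[axis] - baseline_means[axis] for axis in AXES}


# Placeholder ratings for illustration only; NOT values from the figure.
toy_ratings = {
    "Baseline": [
        {"groundedness": 2, "insight": 2, "user_impact": 3, "spec_clarity": 4},
        {"groundedness": 4, "insight": 2, "user_impact": 3, "spec_clarity": 2},
    ],
    "Shadow-Frog": [
        {"groundedness": 4, "insight": 3, "user_impact": 3, "spec_clarity": 4},
        {"groundedness": 4, "insight": 4, "user_impact": 4, "spec_clarity": 4},
    ],
}

means_by_arm = {arm: axis_means(rows) for arm, rows in toy_ratings.items()}
deltas = arm_deltas(means_by_arm["Shadow-Frog"], means_by_arm["Baseline"])

for axis in AXES:
    print(f"{axis}: {deltas[axis]:+.2f}")
```

Running the same computation once per judge would give one set of means and deltas per rater, which is how a per-judge view of the radar could be reproduced from raw scores.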

Mean score per rubric dimension, by arm