Use this page to project the cost of adopting DevSquad Copilot. It converts the framework's token consumption into a planning baseline you can map to your own models and plan.
The baseline is expressed in tokens, which stay stable over time, with a simple method to convert tokens into dollars or GitHub AI Credits using your model's current rates. Actual usage varies with codebase size, model choice, retry loops, and how much context lives outside the framework.
Small feature
~0.5M tokens total (specify through review). Around 4 tasks, ~200–500 LOC.
Medium feature
~4.6M tokens total. Around 18 tasks, ~1,500–3,000 LOC, new endpoint with integration.
Large feature / migration
~36M tokens total. Around 60 tasks, ~8,000+ LOC, cross-service and debug-heavy.
Typical squad
~51M tokens/month for 3 developers at a sustainable cadence (~17M per developer).
Framework overhead versus an unstructured Copilot session is ~1–2% on real work. The bill is dominated by the work being done (code reads, test loops, generation), not by the framework's prompts.
The baseline never goes stale because the volatile input (your model's rate) is supplied at estimate time.
1. Pick the token figure for your scope from the per-feature or monthly tables below.
2. Look up your model's input and output rates (per 1M tokens) on the live Models and pricing for GitHub Copilot page.
3. Apply the formula:
USD = (input_tokens_M * input_rate) + (output_tokens_M * output_rate)

AI credits = USD / credit_value

Where input_tokens_M / output_tokens_M are your scope's token counts in millions (e.g. 4.65M = 4.65), input_rate / output_rate are your model's price per 1M input/output tokens from step 2, and credit_value is the USD value of one AI credit (1 credit = $0.01 at the time of writing; confirm the current value on your billing page).
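The formula above can be sketched as a small Python helper. This is only an illustration: the function name `estimate_cost` is hypothetical, and the $3.00 / $15.00 rates in the usage lines are placeholder values, not current prices for any model.

```python
def estimate_cost(input_tokens_m, output_tokens_m, input_rate, output_rate,
                  credit_value=0.01):
    """Return (usd, ai_credits) for a scope measured in millions of tokens.

    input_rate / output_rate are USD per 1M tokens; credit_value is the USD
    value of one AI credit (assumed $0.01 here, confirm on your billing page).
    """
    usd = input_tokens_m * input_rate + output_tokens_m * output_rate
    return usd, usd / credit_value


# Medium feature split (3.97M input, 0.68M output) at placeholder rates.
usd, credits = estimate_cost(3.97, 0.68, input_rate=3.00, output_rate=15.00)
print(f"${usd:.2f} or about {credits:.0f} AI credits")
```

Swap in the rates you looked up in step 2 to get an estimate for your own model.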
Output tokens are billed roughly 5x input on most models, so the input/output split matters. Per-feature splits:
| Feature | Input tokens | Output tokens | Total |
|---|---|---|---|
| Small | ~480K | ~83K | ~563K |
| Medium | ~3.97M | ~0.68M | ~4.65M |
| Large | ~30.8M | ~4.9M | ~35.7M |
Paid plans receive a 10% discount on model costs when using auto model selection. Cached input (the repeated prompt prefix on later turns) is billed up to 10x below the uncached input rate, and modern agentic harnesses keep cache hit rates high (around 94% on Anthropic models for agentic workloads), so the figures above are a conservative upper bound on input cost. Extended prompt caching (up to 24h retention on supported OpenAI models) keeps the cache warm across pauses.
| Feature | Calculation | USD | AI credits |
|---|---|---|---|
| Small | 0.48 × $3.00 + 0.083 × $15.00 | ~$2.69 | ~270 |
| Medium | 3.97 × $3.00 + 0.68 × $15.00 | ~$22.11 | ~2,210 |
| Large | 30.8 × $3.00 + 4.9 × $15.00 | ~$165.90 | ~16,600 |
The monthly squad volume scales the same way: apply the same two rates to its input/output split.
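To see how much the caching discount mentioned above could lower the input side, the sketch below blends cached and uncached pricing into one effective input rate. It is an optimistic simplification: it assumes every input token is cache-eligible at the stated hit rate, that cached tokens cost exactly one tenth of the uncached rate, and that there is no cache-write surcharge. The helper name and the placeholder rates are hypothetical.

```python
def effective_input_rate(input_rate, cache_hit_rate, cached_discount=10.0):
    """Blend cached and uncached input pricing into one per-1M-token rate.

    Assumption: cached tokens are billed at input_rate / cached_discount.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached_rate = input_rate / cached_discount
    return cache_hit_rate * cached_rate + (1.0 - cache_hit_rate) * input_rate


# Medium feature at placeholder rates with a 94% cache hit rate.
blended = effective_input_rate(3.00, cache_hit_rate=0.94)
medium_usd = 3.97 * blended + 0.68 * 15.00
print(f"blended input rate: ${blended:.3f}/1M, medium feature: ${medium_usd:.2f}")
```

Treat the result as a lower bound and the uncached table as the upper bound; real bills should land somewhere between the two.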
GitHub meters all of this in AI credits, and every paid plan includes a monthly credit allowance. To check whether your usage fits, compare your plan's allowance against the per-feature credit estimate above:
features per month before overage = monthly included AI credits / credits per feature

Read your plan's current included allowance (and the per-token overage rate) from the GitHub Copilot billing page. Allowances, plan tiers, and the variable flex portion change over time, so the live billing page is the only reliable source. Example: if your plan includes 7,000 credits and a medium feature costs ~2,210 credits at your current rates, you can ship roughly three medium features per month before additional usage applies.
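The allowance check is a single division. A minimal sketch, using the 7,000-credit allowance and ~2,210-credit medium feature from the example (both illustrative figures, and `features_before_overage` is a hypothetical helper name):

```python
import math


def features_before_overage(included_credits, credits_per_feature):
    """Whole features that fit inside the monthly included AI credits."""
    if credits_per_feature <= 0:
        raise ValueError("credits_per_feature must be positive")
    return math.floor(included_credits / credits_per_feature)


print(features_before_overage(7000, 2210))  # prints 3
```

Rounding down is deliberate: a partially covered feature still triggers additional usage.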
| Profile | Stories | Tasks per story | Total tasks | Code change |
|---|---|---|---|---|
| Small feature | 2 | 2 | 4 | ~200–500 LOC, well-scoped CRUD |
| Medium feature | 6 | 3 | 18 | ~1,500–3,000 LOC, new endpoint + integration |
| Large feature / migration | 18 | 3–4 | 60 | ~8,000+ LOC, cross-service, debug-heavy |
Token counts include all framework overhead, artifact reads, tool outputs, sub-agent calls, and produced artifacts.
| Phase | Small (in/out) | Medium (in/out) | Large (in/out) |
|---|---|---|---|
| envision (one-time per product) | 15K / 3K | 15K / 3K | 15K / 3K |
| kickoff (one-time per product) | 20K / 4K | 25K / 5K | 40K / 8K |
| specify (per feature) | 30K / 5K | 60K / 10K | 120K / 20K |
| plan (per feature) | 50K / 8K | 120K / 15K | 250K / 30K |
| decompose (per feature) | 30K / 5K | 70K / 10K | 150K / 25K |
| sprint (per sprint, amortized) | 25K / 4K | 25K / 4K | 25K / 4K |
| implement (per task) | 80K / 15K | 200K / 35K | 500K / 80K |
| review (per feature) | 50K / 5K | 120K / 10K | 300K / 20K |
| refine (per run, weekly) | 50K / 5K | 50K / 5K | 50K / 5K |
| security (when triggered) | 30K / 5K | 60K / 8K | 100K / 12K |
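Per-feature totals cover planning (specify, plan, decompose), implement, and review only. One-time and recurring phases (envision, kickoff, sprint, refine, security) are excluded.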
| Profile | specify | plan | decompose | implement (sum) | review | Total |
|---|---|---|---|---|---|---|
| Small (4 tasks) | 35K | 58K | 35K | 380K | 55K | ~563K |
| Medium (18 tasks) | 70K | 135K | 80K | 4.23M | 130K | ~4.65M |
| Large (60 tasks) | 140K | 280K | 175K | 34.8M | 320K | ~35.7M |
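Implementation dominates the feature budget: it accounts for roughly 67% of a small feature, 91% of a medium one, and 97% of a large one. The planning phases (specify, plan, decompose) together take about 23% of a small feature, about 6% of a medium one, and under 2% of a large one.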
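The per-feature totals can be rebuilt from the per-phase table and the task counts, which is useful if your own features have a different number of tasks. The sketch below copies the per-phase figures (in thousands of tokens) from the tables above. The helper names are hypothetical, and it assumes implement is the only phase charged once per task.

```python
# (input_K, output_K) per phase, copied from the per-phase table.
PHASES = {
    "small": {"specify": (30, 5), "plan": (50, 8), "decompose": (30, 5),
              "implement": (80, 15), "review": (50, 5)},
    "medium": {"specify": (60, 10), "plan": (120, 15), "decompose": (70, 10),
               "implement": (200, 35), "review": (120, 10)},
    "large": {"specify": (120, 20), "plan": (250, 30), "decompose": (150, 25),
              "implement": (500, 80), "review": (300, 20)},
}
TASKS = {"small": 4, "medium": 18, "large": 60}


def feature_tokens(profile, tasks=None):
    """Return (input_K, output_K) for one feature of the given profile."""
    tasks = TASKS[profile] if tasks is None else tasks
    total_in = total_out = 0
    for phase, (tokens_in, tokens_out) in PHASES[profile].items():
        runs = tasks if phase == "implement" else 1  # implement runs per task
        total_in += tokens_in * runs
        total_out += tokens_out * runs
    return total_in, total_out


def implement_share(profile):
    """Fraction of the feature's tokens spent in the implement phase."""
    tokens_in, tokens_out = PHASES[profile]["implement"]
    total_in, total_out = feature_tokens(profile)
    return (tokens_in + tokens_out) * TASKS[profile] / (total_in + total_out)


for profile in PHASES:
    tokens_in, tokens_out = feature_tokens(profile)
    print(profile, tokens_in, tokens_out, tokens_in + tokens_out,
          f"{implement_share(profile):.1%}")
```

Passing a different `tasks` value to `feature_tokens` gives a quick estimate for a feature that sits between two profiles.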
Assumptions: one squad of 3 developers, 1-week sprints, mixed feature sizes.
| Mix (per month) | Volume | Tokens |
|---|---|---|
| 2 small features | 2 × 563K | ~1.13M |
| 3 medium features | 3 × 4.65M | ~13.95M |
| 1 large feature | 1 × 35.7M | ~35.7M |
| 4 sprints | 4 × 29K | ~116K |
| 4 refine runs (weekly) | 4 × 55K | ~220K |
| Security reviews (2 triggered) | 2 × 68K | ~136K |
| Envision + kickoff (one-time amortized) | — | ~48K |
| Monthly total per squad | | ~51.3M |
| Per developer (3 devs) | | ~17.1M |
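The monthly total is a straight sum over the mix, so it is easy to re-run with your own cadence. A minimal sketch, with figures in thousands of tokens copied from the table above (the list and function names are hypothetical):

```python
# (label, count per month, tokens per unit in K), copied from the monthly mix.
MONTHLY_MIX_K = [
    ("small features", 2, 563),
    ("medium features", 3, 4650),
    ("large feature", 1, 35700),
    ("sprints", 4, 29),
    ("refine runs", 4, 55),
    ("security reviews", 2, 68),
    ("envision + kickoff (amortized)", 1, 48),
]


def monthly_tokens_k(mix):
    """Total monthly tokens (in K) for a list of (label, count, tokens_k)."""
    return sum(count * tokens_k for _, count, tokens_k in mix)


total_k = monthly_tokens_k(MONTHLY_MIX_K)
per_dev_k = total_k / 3  # squad of 3 developers
print(f"{total_k / 1000:.1f}M per squad, {per_dev_k / 1000:.1f}M per developer")
```

Changing the counts (for example, no large feature in a given month) shows how strongly the total depends on the largest item in the mix.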
Framework prompts are a small fraction of real cost. The dominant terms are the work itself: code reads, test loops, and generation.
What DevSquad Copilot adds on top of an unstructured Copilot session producing similar code:
| Source of overhead | Tokens per medium feature |
|---|---|
| Coordinator agent prompts (loaded per phase) | ~25K (mostly cached after first turn) |
| Sub-agent prompts (isolated contexts) | ~30K |
| Skill auto-triggers | ~15K |
| Artifact re-reading between phases | ~40K |
| Quality gates, handoff envelopes, reasoning logs | ~10K |
| Total | ~120K |
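Summed, that is about 2.6% of a medium feature's ~4.65M tokens (120K / 4.65M) before caching, and less in billed terms because the coordinator prompts are mostly cached after the first turn. The same 120K would be about 0.3% of a large feature's ~35.7M.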
Listed in order of leverage:
model field with open-source tooling such as agext-cli, which layers repo-local overrides on top of the installed plugin without modifying the originals. As a rule, assign a lightweight model family (Haiku-class or mini/flash-class) to routine agents (validate, verify, finalize, decompose) and reserve a frontier family (Sonnet-, Opus-, or GPT-5-class) for plan, implement.execute, and review.code.

Pick one medium feature.
Run the full sequence end-to-end: /devsquad.specify, then /devsquad.plan, then /devsquad.decompose, then /devsquad.implement, then /devsquad.review.
Download the usage report CSV from the premium request analytics page. Each row includes aic_quantity and aic_gross_amount.
Filter rows by the session window and compare the AI-credit total against the medium-feature estimate above. Because your account converts tokens to credits at whatever rates are current, this validates the baseline without hardcoding any price.
If actuals deviate by more than 2x, the cause is almost always model choice mismatch or retry loops on a small number of tasks.
Re-run this validation whenever you change your default model or after major harness updates, since both shift the token baseline.
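The filter-and-compare step can be scripted once the CSV is downloaded. The sketch below is an assumption-heavy illustration: only `aic_quantity` and `aic_gross_amount` are column names taken from this page, while the `date` column name, its ISO format, and the sample rows are made up for the example. Check the header row of your own export before using it.

```python
import csv
import io
from datetime import date


def credits_in_window(csv_text, start, end, date_column="date"):
    """Sum aic_quantity and aic_gross_amount for rows dated in [start, end].

    date_column is an assumed column name holding an ISO date or timestamp.
    """
    quantity = gross = 0.0
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = date.fromisoformat(row[date_column][:10])
        if start <= day <= end:
            quantity += float(row["aic_quantity"])
            gross += float(row["aic_gross_amount"])
    return quantity, gross


def deviates(actual_credits, estimated_credits, factor=2.0):
    """True if actuals are more than `factor` times above or below the estimate."""
    ratio = actual_credits / estimated_credits
    return ratio > factor or ratio < 1.0 / factor


# Made-up sample rows, for illustration only.
sample = (
    "date,aic_quantity,aic_gross_amount\n"
    "2025-01-06,900,9.00\n"
    "2025-01-07,1500,15.00\n"
    "2025-01-20,400,4.00\n"
)
quantity, gross = credits_in_window(sample, date(2025, 1, 6), date(2025, 1, 10))
print(quantity, gross, deviates(quantity, estimated_credits=2210))
```

For a real export, read the file contents into `csv_text` and set the window to the dates of your validation session.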