
Cost Management

LLM costs can grow unexpectedly — a runaway loop, an accidental GPT-4o call where GPT-4o-mini would do, or a spike in traffic that nobody planned for. This guide walks through every tool Niitaka provides to understand, cap, and reduce your agent costs.

How costs are tracked

Every instrumented LLM call records its cost in USD based on the model's per-token pricing. Costs accumulate in session.total_cost across the session's lifetime. The Sessions dashboard stat tile and the Analytics Cost tab both draw from this field.

Note: Cost data depends on your LLM provider reporting token counts. Niitaka uses Anthropic, OpenAI, Google, and Groq pricing tables to compute costs automatically. If you use a custom or self-hosted model, costs will show as $0.00 until pricing is configured.
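Here is a rough sketch of the bookkeeping described above. It assumes each call is priced per million input and output tokens. The model names and prices in `PRICING_PER_MTOK` are placeholders, not Niitaka's pricing tables, and `Session.record_call` is a hypothetical helper rather than part of any SDK. A model with no pricing entry contributes $0.00, which mirrors the behaviour described for custom or self-hosted models.

```python
from dataclasses import dataclass

# Hypothetical per-million-token prices in USD as (input, output).
# Placeholder figures for illustration only; not Niitaka's pricing tables.
PRICING_PER_MTOK = {
    "model-large": (2.50, 10.00),
    "model-small": (0.15, 0.60),
}


@dataclass
class Session:
    total_cost: float = 0.0

    def record_call(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Price one LLM call and add it to the running session total."""
        prices = PRICING_PER_MTOK.get(model)
        if prices is None:
            # No pricing configured (e.g. a self-hosted model): recorded as $0.00.
            return 0.0
        input_price, output_price = prices
        cost = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
        self.total_cost += cost
        return cost


session = Session()
session.record_call("model-large", input_tokens=1_200, output_tokens=300)
session.record_call("my-self-hosted-llama", input_tokens=5_000, output_tokens=800)
print(f"${session.total_cost:.4f}")  # $0.0060
```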

Setting cost guardrails

The fastest way to protect your budget is a two-tier cost_limit policy: a soft warn at a lower threshold and a hard abort at a higher one.

yaml

policies:
  # Warn at $0.10 — emits a signal, session keeps running
  - agent_id: report-summariser
    type: cost_limit
    priority: 5
    condition:
      cost_exceeded: 0.10
    action:
      type: warn

  # Hard abort at $0.25 — stops the session immediately
  - agent_id: report-summariser
    type: cost_limit
    priority: 10
    condition:
      cost_exceeded: 0.25
    action:
      type: abort

The warn tier emits a guardrail / cost_limit signal so you know the session is approaching its budget, without stopping it. The abort tier stops the session immediately and emits a second signal. Both events appear in the Signals feed and can be forwarded to Slack or email via Alert configurations.

Tip: Set the warn threshold at ~40% of your abort threshold. This gives you time to react to a runaway session before it hits the hard limit.
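You can simulate the two tiers in a few lines to see when each signal fires. This minimal sketch makes three assumptions: a tier triggers once the cumulative cost strictly exceeds its `cost_exceeded` value, each tier signals at most once, and `priority` can be ignored. `run_with_cost_tiers` and the per-call costs are hypothetical.

```python
def run_with_cost_tiers(call_costs, warn_at=0.10, abort_at=0.25):
    """Accumulate per-call costs and record when each tier would fire.

    Returns a list of (action, step, cumulative_cost) tuples.
    """
    total = 0.0
    signals = []
    warned = False
    for step, cost in enumerate(call_costs, start=1):
        total += cost
        if not warned and total > warn_at:
            # Soft tier: record one signal, keep running.
            signals.append(("warn", step, round(total, 4)))
            warned = True
        if total > abort_at:
            # Hard tier: record a signal and stop making further calls.
            signals.append(("abort", step, round(total, 4)))
            break
    return signals


# A session where every call costs $0.04 (hypothetical figure).
signals = run_with_cost_tiers([0.04] * 10)
print(signals)  # [('warn', 3, 0.12), ('abort', 7, 0.28)]
```

With the 40% ratio from the tip (0.10 against 0.25), the warning here arrives four calls before the abort.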

Step limits as a cost proxy

Most runaway cost situations are not caused by expensive models — they are caused by agents stuck in infinite loops making the same cheap call hundreds of times. A step_limit policy catches this pattern without requiring you to know the exact per-call cost in advance.

yaml

policies:
  # Abort after 40 steps — prevents infinite loops
  - agent_id: loop-prone-agent
    type: step_limit
    priority: 10
    condition:
      steps_exceeded: 40
    action:
      type: abort

Warning: The absence of a step_limit policy is the single most common root cause of runaway spend in production agents. Always configure one for any agent that uses tool calls or multi-step reasoning.
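At its core, a step limit is a counter checked before each step. The names in the sketch below (`run_agent`, `next_action`, `StepLimitExceeded`) are illustrative, not Niitaka APIs. The sketch assumes that `steps_exceeded: 40` means steps 1 to 40 run and step 41 is refused.

```python
class StepLimitExceeded(RuntimeError):
    """Raised when an agent takes more steps than its step limit allows."""


def run_agent(next_action, steps_exceeded: int = 40) -> int:
    """Drive a hypothetical agent loop, aborting once the step count passes the limit.

    `next_action(step)` returns None when the agent is done, or any other
    value to keep going. Returns the number of steps taken on normal completion.
    """
    steps = 0
    while True:
        steps += 1
        if steps > steps_exceeded:
            raise StepLimitExceeded(f"aborted at step {steps} (limit {steps_exceeded})")
        if next_action(steps) is None:
            return steps


# A stuck agent that repeats the same cheap call forever.
try:
    run_agent(lambda step: "search_docs")
except StepLimitExceeded as exc:
    print(exc)  # aborted at step 41 (limit 40)
```

The guard never needs to know what each call costs. It only bounds how many calls can happen.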

Finding expensive agents and sessions

Cost by agent — Analytics

Open Analytics → Cost. The by-agent bar chart ranks every agent by total spend in the selected date range. The by-model chart shows which models are driving spend across all agents. Use these together to answer: which agent costs the most, and which model is it using?
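If you export session records, the same two rankings reduce to a group-by over `total_cost`. This minimal sketch assumes each record carries `agent_id`, `model` and `total_cost` fields. The records are made up for illustration, and the `support-triage` agent is hypothetical.

```python
from collections import defaultdict


def rank_by(records, key):
    """Sum total_cost per group and sort groups from most to least expensive."""
    totals = defaultdict(float)
    for record in records:
        totals[key(record)] += record["total_cost"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical exported sessions; field names are assumed, figures are illustrative.
sessions = [
    {"agent_id": "report-summariser", "model": "gpt-4o", "total_cost": 0.21},
    {"agent_id": "report-summariser", "model": "gpt-4o", "total_cost": 0.18},
    {"agent_id": "support-triage", "model": "gpt-4o-mini", "total_cost": 0.02},
    {"agent_id": "loop-prone-agent", "model": "gpt-4o-mini", "total_cost": 0.09},
]

by_agent = rank_by(sessions, key=lambda r: r["agent_id"])
by_model = rank_by(sessions, key=lambda r: r["model"])
# Which agent costs the most, and which model is it using?
by_agent_and_model = rank_by(sessions, key=lambda r: (r["agent_id"], r["model"]))
print(by_agent_and_model[0][0])  # ('report-summariser', 'gpt-4o')
```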

Top sessions — outlier detection

The Top sessions by cost table in Analytics → Cost shows the ten most expensive individual sessions. The pattern to watch for: a session that costs 10× the median when the distribution is otherwise tight. That outlier almost always means one of:

A loop that ran until timeout (check step count — if it's at your step limit, the policy saved you; if there is no step limit, add one).
A very large document passed to the context window without chunking.
Accidental use of a large model (GPT-4o instead of GPT-4o-mini) in a path that doesn't need it.
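The "10× the median" rule of thumb is easy to apply to a list of session costs. This sketch assumes you have per-session costs keyed by a session ID. The `cost_outliers` helper, the IDs and the figures are all illustrative.

```python
from statistics import median


def cost_outliers(costs_by_session: dict[str, float], factor: float = 10.0) -> list[str]:
    """Return IDs of sessions costing at least `factor` times the median, most expensive first."""
    typical = median(costs_by_session.values())
    return sorted(
        (sid for sid, cost in costs_by_session.items() if cost >= factor * typical),
        key=lambda sid: costs_by_session[sid],
        reverse=True,
    )


# Hypothetical per-session costs in USD: a tight distribution plus one outlier.
costs = {"s1": 0.030, "s2": 0.028, "s3": 0.035, "s4": 0.031, "s5": 0.41}
print(cost_outliers(costs))  # ['s5']
```

The median is used instead of the mean because a single runaway session inflates the mean, which would hide the very outlier you are looking for.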

Reducing costs with model optimisation

Validate before switching

Switching to a cheaper model without measuring the quality impact is a false economy. You save on LLM costs but may lower your goal completion rate, and the downstream cost of failed goals can exceed the savings. Use an experiment to validate the switch before committing.

Identify the target agent: the highest spend in Analytics → Cost.
Create an experiment: baseline = current model, treatment = cheap model.
Run to statistical power: 200–500 sessions per variant.
Check the verdict: the goal_completed rate must not drop significantly.
Promote via Runtime Config: dashboard or PATCH API, no deploy needed.

See the Experiments documentation for how to set up variants and interpret the statistical results.
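To build intuition for what "must not significantly drop" means, the sketch below runs a standard one-sided two-proportion z-test on goal_completed counts. It is a textbook test chosen for illustration and is not necessarily the statistic Niitaka Experiments computes. The function name and the session counts are hypothetical.

```python
from math import erf, sqrt


def completion_drop_p_value(base_done, base_n, treat_done, treat_n):
    """One-sided two-proportion z-test for H1: treatment goal_completed rate < baseline.

    Returns the p-value. A small value (e.g. < 0.05) suggests the cheaper
    model significantly lowers the completion rate.
    """
    p_base = base_done / base_n
    p_treat = treat_done / treat_n
    pooled = (base_done + treat_done) / (base_n + treat_n)
    se = sqrt(pooled * (1 - pooled) * (1 / base_n + 1 / treat_n))
    if se == 0:
        # All sessions succeeded (or all failed) in both variants: no evidence of a drop.
        return 1.0 if p_treat >= p_base else 0.0
    z = (p_treat - p_base) / se
    # Standard normal CDF at z: chance of a drop at least this large if there is no real difference.
    return 0.5 * (1 + erf(z / sqrt(2)))


# 400 sessions per variant (hypothetical counts).
p_small_drop = completion_drop_p_value(352, 400, 344, 400)  # 88% vs 86%
p_large_drop = completion_drop_p_value(352, 400, 320, 400)  # 88% vs 80%
print(round(p_small_drop, 3), round(p_large_drop, 4))
```

With these counts, the two-point dip stays well above 0.05, so the switch would pass. The eight-point drop falls well below it, so the switch would fail.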

Roll out the cheaper model via Runtime Config

Once the experiment confirms the cheaper model meets your quality threshold, promote it via Runtime Config. The change takes effect on the next session with no deployment required.

python

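# After a successful experiment confirms gpt-4o-mini matches quality:
import os

import requests

# Hypothetical variable name: export your Niitaka API token as NIITAKA_JWT.
JWT = os.environ["NIITAKA_JWT"]

response = requests.patch(
    "https://api.niitaka.ai/agents/report-summariser/config",
    json={"llm": {"model": "gpt-4o-mini"}},
    headers={"Authorization": f"Bearer {JWT}"},
    timeout=10,
)
response.raise_for_status()  # raise on 4xx/5xx instead of failing silently
# Takes effect on the next session; no deployment needed.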

Fallback as a cost control mechanism

The fallback policy is primarily a reliability tool, but it also controls cost. If your primary model keeps failing, capping retries on it at one and then falling back to a cheap model avoids paying for repeated expensive calls.

yaml

policies:
  # Retry once with the primary model, then fall back to the cheap model
  - agent_id: report-summariser
    type: retry
    priority: 5
    condition:
      on_error: true
    action:
      max_retries: 1
      backoff: constant
      backoff_seconds: 1.0

  - agent_id: report-summariser
    type: fallback
    priority: 3
    condition:
      on_error: true
    action:
      fallback_model: gpt-4o-mini   # roughly an order of magnitude cheaper than gpt-4o per token
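Taken together, the retry and fallback policies above bound how many expensive calls a failing request can make. The sketch below shows the control flow. `call_with_retry_then_fallback`, `LLMError` and `flaky_call` are hypothetical names. The sketch assumes the higher-priority retry policy runs before the fallback, and that the 1-second constant backoff applies only between primary attempts.

```python
import time


class LLMError(Exception):
    """Hypothetical error raised by an LLM call."""


def call_with_retry_then_fallback(call, primary, fallback, max_retries=1, backoff_seconds=1.0):
    """Try `primary` up to 1 + max_retries times, then make one attempt with `fallback`.

    Returns (result, models_attempted) so each attempt's cost can be accounted for.
    """
    attempts = []
    for attempt in range(1 + max_retries):
        if attempt > 0:
            time.sleep(backoff_seconds)  # constant backoff before each retry
        attempts.append(primary)
        try:
            return call(primary), attempts
        except LLMError:
            continue
    # Primary exhausted: a single attempt with the cheap model (its errors propagate).
    attempts.append(fallback)
    return call(fallback), attempts


def flaky_call(model):
    # Simulates an outage that affects the primary model only.
    if model == "gpt-4o":
        raise LLMError("upstream error")
    return f"summary from {model}"


result, attempts = call_with_retry_then_fallback(
    flaky_call, "gpt-4o", "gpt-4o-mini", backoff_seconds=0.0
)
print(attempts)  # ['gpt-4o', 'gpt-4o', 'gpt-4o-mini']
```

In the worst case, the request pays for two gpt-4o calls and one gpt-4o-mini call instead of an open-ended series of gpt-4o retries.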

Estimating and planning your budget

Analytics gives you the data to forecast costs before you scale. Use the average cost per session and your expected traffic to derive a monthly budget figure, then set your abort threshold with headroom above the average.

python

# Estimate daily and monthly budget from Analytics data.
# Placeholder 7-day figures: substitute the totals shown in Analytics → Cost.
total_cost_last_7d = 112.00        # USD
total_sessions_last_7d = 3_500
expected_sessions_per_day = 500

avg_cost_per_session = total_cost_last_7d / total_sessions_last_7d  # $0.032
daily_budget = avg_cost_per_session * expected_sessions_per_day    # $16/day
monthly_budget = daily_budget * 30                                 # $480/month, i.e. ~$500

# Set your abort threshold with headroom:
abort_at = avg_cost_per_session * 3  # 3× average ($0.096) catches outliers, not normal runs

Tip: Re-run this calculation after every significant change, such as a new model, a new system prompt, or added tool calls. Each of these can shift the average cost significantly.

Cost hygiene checklist

✓ cost_limit policy set for every production agent
✓ step_limit policy set for every agent using tool calls
✓ Alert configured for guardrail signals (email or Slack)
✓ Top sessions reviewed weekly for outliers
✓ Model choice validated by experiment before rollout
✓ Monthly budget estimate calculated from Analytics data
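You can audit the first two checklist items automatically. This minimal sketch makes two assumptions: your policies have already been parsed from YAML into dicts shaped like the examples in this guide, and you maintain your own lists of production and tool-using agents. `missing_guardrails` is a hypothetical helper.

```python
from collections import defaultdict


def missing_guardrails(policies, production_agents, tool_using_agents):
    """Map each agent to the required policy types it lacks.

    cost_limit is required for production agents; step_limit for tool-using agents.
    """
    kinds = defaultdict(set)
    for policy in policies:
        kinds[policy["agent_id"]].add(policy["type"])
    gaps = {}
    for agent in sorted(set(production_agents) | set(tool_using_agents)):
        needed = set()
        if agent in production_agents:
            needed.add("cost_limit")
        if agent in tool_using_agents:
            needed.add("step_limit")
        missing = sorted(needed - kinds[agent])
        if missing:
            gaps[agent] = missing
    return gaps


# Policies mirroring the examples in this guide, already parsed into dicts.
policies = [
    {"agent_id": "report-summariser", "type": "cost_limit"},
    {"agent_id": "report-summariser", "type": "retry"},
    {"agent_id": "loop-prone-agent", "type": "step_limit"},
]
agents = {"report-summariser", "loop-prone-agent"}
print(missing_guardrails(policies, agents, agents))
# {'loop-prone-agent': ['cost_limit'], 'report-summariser': ['step_limit']}
```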

Next steps

Policies — configure cost_limit and step_limit rules in the dashboard or YAML.
Analytics — read the Cost tab and identify the agents to target first.
Experiments — validate a cheaper model before rolling it out.
Runtime Config — promote the cheaper model without a deployment.
