Cost Management
LLM costs can grow unexpectedly — a runaway loop, an accidental GPT-4o call where GPT-4o-mini would do, or a spike in traffic that nobody planned for. This guide walks through every tool Niitaka provides to understand, cap, and reduce your agent costs.
How costs are tracked
Every instrumented LLM call records its cost in USD based on the model's per-token pricing. Costs accumulate in session.total_cost across the session's lifetime. The Sessions dashboard stat tile and the Analytics Cost tab both draw from this field.
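As a rough illustration of the accumulation described above, the sketch below prices each call from its token counts and adds the result to a running session total. The `PRICE_PER_1M_TOKENS` table, the `Session` class, and `record_call` are hypothetical names made up for this example, not Niitaka's SDK, and the prices are placeholders rather than current vendor rates.

```python
from dataclasses import dataclass

# Placeholder prices in USD per 1M tokens (input, output); assumptions for illustration only.
PRICE_PER_1M_TOKENS = {
    "model-large": (2.50, 10.00),
    "model-small": (0.15, 0.60),
}


@dataclass
class Session:
    """Hypothetical stand-in for a tracked session."""
    total_cost: float = 0.0

    def record_call(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Price one LLM call and add it to the running session total."""
        input_price, output_price = PRICE_PER_1M_TOKENS[model]
        cost = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
        self.total_cost += cost
        return cost


session = Session()
session.record_call("model-small", input_tokens=2_000, output_tokens=500)
session.record_call("model-large", input_tokens=2_000, output_tokens=500)
print(f"session.total_cost = ${session.total_cost:.4f}")
```

With these placeholder prices, the same prompt costs far more on the large model than on the small one, which is why a single misrouted call can dominate a session's total.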
Setting cost guardrails
The fastest way to protect your budget is a two-tier cost_limit policy: a soft warn at a lower threshold and a hard abort at a higher one.
policies:
  # Warn at $0.10: emits a signal, session keeps running
  - agent_id: report-summariser
    type: cost_limit
    priority: 5
    condition:
      cost_exceeded: 0.10
    action:
      type: warn
  # Hard abort at $0.25: stops the session immediately
  - agent_id: report-summariser
    type: cost_limit
    priority: 10
    condition:
      cost_exceeded: 0.25
    action:
      type: abort
The warn tier emits a guardrail / cost_limit signal so you know the session is approaching its budget, without stopping it. The abort tier stops the session immediately and emits a second signal. Both events appear in the Signals feed and can be forwarded to Slack or email via Alert configurations.
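To make the two-tier behaviour concrete, here is a minimal sketch of how the thresholds map to actions as a session's cost grows. `CostLimit` and `triggered_actions` are hypothetical names for this example, and the strict "greater than" comparison is an assumption; the platform's actual evaluation logic, including how `priority` is used, may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostLimit:
    """Hypothetical in-memory mirror of one cost_limit policy."""
    cost_exceeded: float  # USD threshold
    action: str           # "warn" or "abort"


POLICIES = [
    CostLimit(cost_exceeded=0.10, action="warn"),
    CostLimit(cost_exceeded=0.25, action="abort"),
]


def triggered_actions(total_cost: float, policies: list[CostLimit]) -> list[str]:
    """Return the actions whose thresholds the session cost has passed."""
    # Assumption: a policy fires once the cost is strictly above its threshold.
    return [p.action for p in policies if total_cost > p.cost_exceeded]


for cost in (0.05, 0.12, 0.30):
    actions = triggered_actions(cost, POLICIES)
    print(f"${cost:.2f} -> {actions or 'no action'}")
```

A session at $0.12 has passed only the warn threshold, while one at $0.30 has passed both, matching the two signals described above.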
Step limits as a cost proxy
Most runaway cost situations are not caused by expensive models — they are caused by agents stuck in infinite loops making the same cheap call hundreds of times. A step_limit policy catches this pattern without requiring you to know the exact per-call cost in advance.
policies:
  # Abort after 40 steps to prevent infinite loops
  - agent_id: loop-prone-agent
    type: step_limit
    priority: 10
    condition:
      steps_exceeded: 40
    action:
      type: abort
A missing step_limit policy is the single most common root cause of runaway spend in production agents. Always configure one for any agent that uses tool calls or multi-step reasoning.
Finding expensive agents and sessions
Cost by agent — Analytics
Open Analytics → Cost. The by-agent bar chart ranks every agent by total spend in the selected date range. The by-model chart shows which models are driving spend across all agents. Use these together to answer: which agent costs the most, and which model is it using?
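If you export session records and want to reproduce the two rankings outside the dashboard, the aggregation can be sketched as below. The record fields (`agent_id`, `model`, `total_cost`) and the sample values are assumptions for illustration, not a documented export format.

```python
from collections import defaultdict

# Hypothetical exported session records; field names and values are illustrative.
sessions = [
    {"agent_id": "report-summariser", "model": "gpt-4o", "total_cost": 0.21},
    {"agent_id": "report-summariser", "model": "gpt-4o", "total_cost": 0.18},
    {"agent_id": "loop-prone-agent", "model": "gpt-4o-mini", "total_cost": 0.04},
]


def spend_by(records, key):
    """Sum total_cost per distinct value of `key`, highest spend first."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["total_cost"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


by_agent = spend_by(sessions, "agent_id")
by_model = spend_by(sessions, "model")
print(by_agent)
print(by_model)
```

Reading the first entry of each list together answers the question posed above: the top-spending agent and the model responsible for most of the spend.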
Top sessions — outlier detection
The Top sessions by cost table in Analytics → Cost shows the ten most expensive individual sessions. The pattern to watch for: a session that costs 10× the median when the distribution is otherwise tight. That outlier almost always means one of:
- A loop that ran until timeout (check step count — if it's at your step limit, the policy saved you; if there is no step limit, add one).
- A very large document passed to the context window without chunking.
- Accidental use of a large model (GPT-4o instead of GPT-4o-mini) in a path that doesn't need it.
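The 10× median rule of thumb above can be sketched in a few lines. The session IDs and costs are made-up sample data, and `cost_outliers` is a hypothetical helper, not part of the product.

```python
from statistics import median

# Hypothetical per-session costs in USD; values are illustrative.
session_costs = {
    "s-001": 0.031,
    "s-002": 0.029,
    "s-003": 0.034,
    "s-004": 0.030,
    "s-005": 0.410,  # the outlier
}


def cost_outliers(costs, factor=10.0):
    """Return session IDs whose cost is at least `factor` times the median."""
    typical = median(costs.values())
    return [sid for sid, cost in costs.items() if cost >= factor * typical]


print(cost_outliers(session_costs))
```

The median is used rather than the mean because a single very expensive session would pull the mean upward and hide itself.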
Reducing costs with model optimisation
Validate before switching
Switching to a cheaper model without measuring quality impact is a false economy — you save on LLM costs but may lose goal completion rate, which costs more in downstream effects. Use an experiment to validate the switch before committing.
1. Identify target agent: highest spend in Analytics → Cost.
2. Create experiment: baseline = current model, treatment = cheap model.
3. Run to statistical power: 200–500 sessions per variant.
4. Check verdict: goal_completed rate must not significantly drop.
5. Promote via Runtime Config: dashboard or PATCH API, no deploy.
See the Experiments documentation for how to set up variants and interpret the statistical results.
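To give a feel for what "must not significantly drop" means, the sketch below runs a one-sided two-proportion z-test on goal completion counts. This is a standard textbook test chosen for illustration; the Experiments feature may compute its verdict differently, and the counts are invented, not real results.

```python
from math import erf, sqrt


def completion_drop_p_value(base_ok, base_n, treat_ok, treat_n):
    """One-sided two-proportion z-test.

    Returns the p-value for the hypothesis that the treatment's
    completion rate is lower than the baseline's.
    """
    p_base = base_ok / base_n
    p_treat = treat_ok / treat_n
    pooled = (base_ok + treat_ok) / (base_n + treat_n)
    std_err = sqrt(pooled * (1 - pooled) * (1 / base_n + 1 / treat_n))
    z = (p_treat - p_base) / std_err
    # Standard normal CDF evaluated at z.
    return 0.5 * (1 + erf(z / sqrt(2)))


# Illustrative counts, not real results: 400 sessions per variant.
p_value = completion_drop_p_value(base_ok=360, base_n=400, treat_ok=352, treat_n=400)
print(f"p-value for a drop: {p_value:.3f}")
```

A small p-value (for example below 0.05, a common but arbitrary cut-off) would suggest the cheaper model genuinely completes fewer goals; a large one means the observed difference is within what chance alone could produce at this sample size.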
Roll out the cheaper model via Runtime Config
Once the experiment confirms the cheaper model meets your quality threshold, promote it via Runtime Config. The change takes effect on the next session with no deployment required.
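# After a successful experiment confirms gpt-4o-mini matches quality:
import os

import requests

# Assumption: the API token is supplied via an environment variable of your choosing.
JWT = os.environ["NIITAKA_JWT"]

response = requests.patch(
    "https://api.niitaka.ai/agents/report-summariser/config",
    json={"llm": {"model": "gpt-4o-mini"}},
    headers={"Authorization": f"Bearer {JWT}"},
    timeout=10,
)
response.raise_for_status()  # fail loudly if the config change was rejected
# Takes effect on the next session; no deployment needed.
Fallback as a cost control mechanism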
The fallback policy is primarily a reliability tool, but it also controls cost: if your primary model fails and retries are expensive, falling back to a cheap model after a single retry avoids paying for repeated expensive calls.
policies:
  # Retry once with the primary model, then fall back to the cheap model
  - agent_id: report-summariser
    type: retry
    priority: 5
    condition:
      on_error: true
    action:
      max_retries: 1
      backoff: constant
      backoff_seconds: 1.0
  - agent_id: report-summariser
    type: fallback
    priority: 3
    condition:
      on_error: true
    action:
      fallback_model: gpt-4o-mini # 20× cheaper than gpt-4o per token
Estimating and planning your budget
Analytics gives you the data to forecast costs before you scale. Use the average cost per session and your expected traffic to derive a daily budget figure (and from it a monthly one), then set your abort threshold with headroom above the average.
# Estimate daily budget from Analytics data.
# The figures below are illustrative placeholders; substitute your own.
total_cost_last_7d = 112.00       # USD
total_sessions_last_7d = 3500
expected_sessions_per_day = 500

avg_cost_per_session = total_cost_last_7d / total_sessions_last_7d  # $0.032
daily_budget = avg_cost_per_session * expected_sessions_per_day     # $16/day → ~$500/month

# Set your abort threshold with headroom:
abort_at = avg_cost_per_session * 3  # 3× average catches outliers, not normal runs
Cost hygiene checklist
Next steps
- Policies — configure cost_limit and step_limit rules in the dashboard or YAML.
- Analytics — read the Cost tab and identify the agents to target first.
- Experiments — validate a cheaper model before rolling it out.
- Runtime Config — promote the cheaper model without a deployment.