Setting up variants
A variant is a named configuration bundle — model, system prompt, guardrail thresholds — that defines one arm of your experiment. Every session is assigned to exactly one variant.
What a variant contains
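Each variant stores an optional config object with three top-level sections. The Runtime Config example below reads two of them: llm (model and system prompt) and guardrails (including cost_limit_usd).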
If you leave the config empty, you can still run the experiment by passing the variant name to start_session() and handling the model selection in your own code. Niitaka will still group and compare the sessions statistically.
Creating variants in the dashboard
1. Open the experiment
   Go to Experiments and open a Draft experiment. If you haven't created one yet, click New experiment and link it to your agent.
2. Add a variant
   Click Add variant. Give it a short, descriptive name that matches what you'll pass to start_session(variant=...), for example gpt-4o or gemini-flash. Names are case-sensitive.
3. Configure the variant
   Fill in the LLM and guardrail fields you want to test. Fields left blank inherit your agent's default policy configuration.
4. Repeat for all arms
   Add one variant per arm. A two-arm experiment (one baseline plus one treatment) is the simplest design and reaches significance fastest. You can add up to 8 variants in total.
5. Set the baseline
   One variant must be marked as the baseline (control arm). All statistical comparisons are made relative to it. Typically this is your current production config.
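Step 3 says that fields left blank inherit the agent's default policy configuration. One way to picture that rule is a nested merge in which any variant field that is missing or None falls back to the default. The sketch below assumes that shape; resolve_variant_config and the sample values are hypothetical, and only the llm.model, llm.system_prompt and guardrails.cost_limit_usd keys come from the Runtime Config example on this page. It illustrates the inheritance rule and is not Niitaka's implementation.

```python
import copy
from typing import Any


def resolve_variant_config(defaults: dict[str, Any], variant: dict[str, Any]) -> dict[str, Any]:
    """Overlay a variant's config on the agent defaults (hypothetical helper).

    A variant field that is missing or None counts as "left blank" and inherits
    the default. Nested sections ("llm", "guardrails") are merged field by field.
    Keys that exist only in the variant are ignored in this sketch.
    """
    resolved: dict[str, Any] = {}
    for key, default_value in defaults.items():
        override = variant.get(key)
        if isinstance(default_value, dict) and isinstance(override, dict):
            # Merge one level deeper so a variant can override a single field.
            resolved[key] = resolve_variant_config(default_value, override)
        elif override is None:
            # Blank in the variant: inherit a copy of the default.
            resolved[key] = copy.deepcopy(default_value)
        else:
            resolved[key] = override
    return resolved


# Illustrative values only.
agent_defaults = {
    "llm": {"model": "gpt-4o", "system_prompt": "You are a concise analyst."},
    "guardrails": {"cost_limit_usd": 0.50},
}
gemini_variant = {
    "llm": {"model": "gemini-2.5-flash", "system_prompt": None},  # prompt left blank
    "guardrails": None,  # whole section left blank
}

print(resolve_variant_config(agent_defaults, gemini_variant))
```

Here the treatment arm changes only the model; the system prompt and cost limit come from the agent defaults, which keeps the comparison focused on a single variable.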
Traffic weights
Each variant has a traffic_weight. Sessions are distributed in proportion to the weights: a weight of 1.0 on each of three variants gives an even three-way split (about 33% each). An equal split is recommended; skew the weights only if you want to limit exposure to an untested variant, for example 3.0 on the baseline and 1.0 on the treatment for a 75/25 split.
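The expected share for each variant is its weight divided by the sum of all weights, which is why only the ratios between weights matter. A small sketch of that arithmetic; expected_split is a hypothetical helper that computes target shares only and is not Niitaka's per-session assignment logic.

```python
def expected_split(weights: dict[str, float]) -> dict[str, float]:
    """Convert relative traffic weights into expected traffic shares between 0 and 1."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("traffic weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one traffic weight must be positive")
    return {name: w / total for name, w in weights.items()}


# Three equal weights: each variant is expected to get one third of sessions.
print(expected_split({"gpt-4o": 1.0, "gemini-flash": 1.0, "v1-concise-prompt": 1.0}))

# Skewed weights: the untested treatment is limited to 25% of sessions.
print(expected_split({"gpt-4o": 3.0, "gemini-flash": 1.0}))
```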
- Weights are relative, not percentages: 0.5 / 0.5 and 1.0 / 1.0 produce the same 50/50 split.
- For multi-variant experiments, equal weights maximise statistical power for a fixed total session count.
- You can update weights while an experiment is Running, but a large shift can cause a Sample Ratio Mismatch (SRM).
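A Sample Ratio Mismatch means the observed session counts per arm sit further from the configured weights than random assignment would plausibly produce. A common check is a chi-square goodness-of-fit test; Niitaka's own SRM detection may work differently. The minimal stdlib sketch below (srm_check is a hypothetical helper) assumes one fixed set of weights for the whole run, which is also why large mid-run weight changes are risky: counts collected under the old weights no longer match a single expected ratio.

```python
import math
from typing import Optional


def srm_check(observed: dict[str, int], weights: dict[str, float]) -> tuple[float, Optional[float]]:
    """Chi-square goodness-of-fit test of observed session counts against traffic weights.

    Returns (chi2, p_value). The p-value is computed only for two arms (1 degree
    of freedom), where P(chi2 > x) = erfc(sqrt(x / 2)); for more arms it is None
    and you would need a chi-square survival function such as scipy.stats.chi2.sf.
    Assumes the weights stayed constant for the whole experiment.
    """
    if observed.keys() != weights.keys():
        raise ValueError("observed counts and weights must cover the same variants")
    if any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be positive")
    total_sessions = sum(observed.values())
    if total_sessions == 0:
        raise ValueError("no sessions observed")
    total_weight = sum(weights.values())
    chi2 = 0.0
    for name, count in observed.items():
        # Expected count if sessions followed the configured weights exactly.
        expected = total_sessions * weights[name] / total_weight
        chi2 += (count - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2)) if len(observed) == 2 else None
    return chi2, p_value


# 5,200 vs 4,800 sessions under a configured 50/50 split.
chi2, p = srm_check({"gpt-4o": 5200, "gemini-flash": 4800}, {"gpt-4o": 1.0, "gemini-flash": 1.0})
print(f"chi2={chi2:.1f}, p={p:.2g}")
```

In this example the statistic is 16 and the p-value is about 6e-5, so a 52/48 split over 10,000 sessions is very unlikely to be chance when the configured split is 50/50.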
Assigning sessions to variants
Pass experiment_id and variant to start_session(). Both are required for the session to be counted in the experiment.
```python
import niitaka  # Niitaka Python SDK; client setup is not shown on this page
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

with niitaka.start_session(
    goal="Summarise quarterly report",
    agent_id="report-summariser",
    experiment_id="exp_a1b2c3d4e5f6",  # copy from Experiments dashboard
    variant="gpt-4o",  # must match the variant name you created exactly
) as session:
    # This arm is the gpt-4o variant, so the model is hard-coded to match it.
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "..."}],
    )
```

The variant value must match the variant name in the dashboard exactly (case-sensitive). A typo still attaches the session to the experiment, but with no matching variant, so the session silently drops out of every arm's count.
Using Runtime Config with variants
If you filled in the variant's LLM and guardrail fields, the SDK can automatically fetch that config at session start. This lets you write one agent that adapts to whichever variant it's assigned to, without if/else branches.
```python
import niitaka  # Niitaka Python SDK; client setup is not shown on this page
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

with niitaka.start_session(
    goal="Summarise quarterly report",
    agent_id="report-summariser",
    experiment_id="exp_a1b2c3d4e5f6",
    variant="gpt-4o",
) as session:
    # SDK auto-fetches this variant's config at session start
    config = niitaka.get_runtime_config(agent_id="report-summariser")
    model = config["llm"]["model"]  # e.g. "gpt-4o"
    system_prompt = config["llm"]["system_prompt"]
    cost_limit = config["guardrails"]["cost_limit_usd"]  # read for illustration; not used below
    response = openai_client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "..."},
        ],
    )
```

Naming conventions
- Use the model name for model comparison experiments: gpt-4o, gemini-2.5-flash.
- Use descriptive slugs for prompt or guardrail experiments: v1-concise-prompt, high-cost-limit.
- Avoid spaces and other special characters: use lowercase letters, numbers, and hyphens, plus dots where a model name contains them (as in gemini-2.5-flash).
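Because a misspelled variant name silently drops the session from every arm's count, it can help to check names in your own code before calling start_session(). A minimal sketch, assuming you keep a local copy of the variant names you created in the dashboard; KNOWN_VARIANTS, is_valid_variant_name and check_variant are hypothetical and not part of the Niitaka SDK. The pattern also accepts dots, since the gemini-2.5-flash example contains one.

```python
import re

# Naming convention: lowercase letters, digits and hyphens. Dots are also
# accepted here so that model names such as "gemini-2.5-flash" pass.
_VARIANT_NAME = re.compile(r"[a-z0-9]+(?:[.-][a-z0-9]+)*")

# Hypothetical local copy of the variant names created in the dashboard.
KNOWN_VARIANTS = {"gpt-4o", "gemini-2.5-flash", "v1-concise-prompt"}


def is_valid_variant_name(name: str) -> bool:
    """True if `name` follows the naming convention (no spaces, no uppercase)."""
    return _VARIANT_NAME.fullmatch(name) is not None


def check_variant(name: str, known: set[str] = KNOWN_VARIANTS) -> str:
    """Return `name` if it exactly matches a known variant, else raise ValueError.

    Matching is exact and case-sensitive, like the dashboard's variant names.
    """
    if name in known:
        return name
    # Suggest a variant that differs only in letter case, the most common typo.
    near_misses = sorted(v for v in known if v.lower() == name.lower())
    hint = f"; did you mean {near_misses[0]!r}?" if near_misses else ""
    raise ValueError(f"unknown variant {name!r}{hint}")


# Usage: validate before the session starts, e.g.
# with niitaka.start_session(..., variant=check_variant("gpt-4o")) as session: ...
```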