API reference

The curated public surface. Everything here is importable from traxr unless noted.

Experiments

traxr.experiment.Experiment

A controlled-perturbation experiment over one agent and its data.

Exactly one of agent (a stateless AgentRunner callable, reused across runs), agent_factory (a zero-argument factory called once per run: the fresh-state path), or llm (the built-in reference agent over your LLMClient) must be given.
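A minimal sketch of the exactly-one rule, and of why agent_factory matters for stateful agents. resolve_runner, CountingAgent and the run labels are hypothetical illustration names. Tasks are plain strings here for brevity, the sketch raises ValueError where traxr raises ExperimentConfigError, and the llm path is left out.

```python
from typing import Callable, Optional

Runner = Callable[[str], str]  # stand-in for AgentRunner (task in, answer out)


def resolve_runner(
    agent: Optional[Runner] = None,
    agent_factory: Optional[Callable[[], Runner]] = None,
    llm: Optional[object] = None,
) -> Callable[[], Runner]:
    """Return a zero-arg getter that yields the runner to use for one run."""
    if sum(x is not None for x in (agent, agent_factory, llm)) != 1:
        # traxr raises ExperimentConfigError; ValueError keeps the sketch standalone.
        raise ValueError("exactly one of agent, agent_factory, llm must be given")
    if agent is not None:
        return lambda: agent  # stateless path: one callable reused for every run
    if agent_factory is not None:
        return agent_factory  # fresh-state path: a new runner per run
    raise NotImplementedError("the llm path builds the built-in agent; omitted here")


class CountingAgent:
    """Toy stateful agent that counts the tasks this instance has handled."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, task: str) -> str:
        self.calls += 1
        return f"{task}:{self.calls}"


LABELS = ("baseline", "perturbed-1", "perturbed-2")

get_fresh = resolve_runner(agent_factory=CountingAgent)
fresh = [get_fresh()(label) for label in LABELS]

shared = CountingAgent()
get_shared = resolve_runner(agent=shared)
leaky = [get_shared()(label) for label in LABELS]  # state leaks across runs
```

With the factory, every run starts from a zero counter; the shared instance carries its state from the baseline into the perturbed runs, which would contaminate the comparison.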

run

run(dry_run=False)

Run the experiment (or, with dry_run=True, just plan it).

traxr.experiment.ExperimentConfig `dataclass`

Knobs for Experiment. Frozen, to uphold the controlled-variable invariant (paired runs must see identical configuration).

Attributes:

Name	Type	Description
`perturbations`	`str \| Sequence[PerturbationType]`	`"all"` or an explicit operator list.
`max_steps` / `max_tokens` / `enable_web_tools` / `enable_python_tool`	see `builtin_agent`	Built-in-agent knobs (ignored for external agents).
`max_llm_calls_per_run`	`int \| None`	External-agent budget, enforced inside the Tier 0 wrapper; the only honest runaway bound for code we don't own.
`store_llm_content`	`bool`	Include raw LLM/tool content in trace payloads (hashes only by default; final answers are always stored raw).
`require_sequential`	`bool`	Raise instead of warn when concurrent LLM calls are detected during a run.
`scorer`	`Scorer`	`(expected, actual) -> bool` for `task_success`.
`on_run_error`	`str`	`"record"` keeps a CRASHED run record and continues; `"raise"` propagates the agent's exception.
`keep_artifacts`	`bool`	Keep the per-run temp dirs (perturbed file copies).
`noise_floor_runs`	`int \| None`	Clean re-runs measuring the nondeterminism floor. `None` means the agent-kind default: 1 for external agents, 0 for the built-in agent.
`max_permutations`	`int`	Matrix size cap; exceeding it raises `MatrixTooLargeError`.
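Because the config is a frozen dataclass, a knob cannot be changed after construction, so both halves of a pair see the same values. A sketch with a hypothetical MiniConfig holding two of the documented fields (defaults chosen for illustration, not taken from traxr):

```python
import dataclasses
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class MiniConfig:
    # Illustrative subset of ExperimentConfig's knobs; defaults are not traxr's.
    max_llm_calls_per_run: Optional[int] = None
    on_run_error: str = "record"


cfg = MiniConfig(max_llm_calls_per_run=50)

try:
    cfg.on_run_error = "raise"  # assignment on a frozen dataclass raises
    mutated = True
except dataclasses.FrozenInstanceError:
    mutated = False

# To change a knob, build a new config instead of mutating the shared one.
strict = replace(cfg, on_run_error="raise")
```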

traxr.experiment.ExperimentPlan `dataclass`

The execution plan run(dry_run=True) returns (no agent ran).

Results

traxr.results.ExperimentResults `dataclass`

Everything one Experiment.run() produced.

Attributes:

Name	Type	Description
`pairs`	`list[PairResult]`	One `PairResult` per permutation.
`traces`	`dict[str, dict[str, Any]]`	`run_label -> serialized trace` (collector `to_dict()`).
`answers`	`dict[str, str \| None]`	`run_label -> raw final answer` (stored raw by design, since scoring and `answer_changed` need them; see the security docs).
`fingerprint`	`dict[str, Any]`	Environment/config fingerprint for reproducibility.
`noise_floor`	`float \| None`	Baseline-vs-itself `d_norm` (None when unmeasured).
`noise_floor_runs`	`int`	How many clean re-runs measured the floor.

manifestation_prevalence

manifestation_prevalence()

Fraction of scored pairs per fine manifestation category.

divergence_summary

divergence_summary()

Count / mean / max of d_norm and mean t*_norm over measured pairs.

recovery_rate

recovery_rate()

Fraction of diverged pairs whose answer survived (recovery=True).
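One plausible reading of divergence_summary and recovery_rate, sketched over a hypothetical Pair stand-in for PairResult. The field names and the rule that a pair has "diverged" when d_norm > 0 are assumptions; traxr may, for example, compare against the noise floor instead.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Pair:
    # Hypothetical stand-in for PairResult; field names are assumptions.
    d_norm: Optional[float]       # None when the pair was not measured
    t_star_norm: Optional[float]  # None when the traces never diverged
    recovery: Optional[bool]      # answer survived despite divergence


def divergence_summary(pairs):
    measured = [p for p in pairs if p.d_norm is not None]
    d = [p.d_norm for p in measured]
    t = [p.t_star_norm for p in measured if p.t_star_norm is not None]
    return {
        "count": len(d),
        "mean_d_norm": mean(d) if d else None,
        "max_d_norm": max(d) if d else None,
        "mean_t_star_norm": mean(t) if t else None,
    }


def recovery_rate(pairs):
    # Assumption: a pair "diverged" when its measured d_norm is above zero.
    diverged = [p for p in pairs if p.d_norm is not None and p.d_norm > 0]
    if not diverged:
        return None
    return sum(1 for p in diverged if p.recovery) / len(diverged)


pairs = [
    Pair(d_norm=0.0, t_star_norm=None, recovery=None),
    Pair(d_norm=0.5, t_star_norm=0.25, recovery=True),
    Pair(d_norm=0.25, t_star_norm=0.75, recovery=False),
    Pair(d_norm=None, t_star_norm=None, recovery=None),  # unscored pair
]
```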

token_overhead_summary

token_overhead_summary()

Mean / max token-inflation ratio over pairs with usage data.

to_dict

to_dict(*, include_traces=True)

Canonical dict form (timestamps excluded; see to_json).

to_json

to_json(path=None, *, include_traces=True)

Canonical JSON: sorted keys, timestamps excluded, byte-stable for deterministic (stub / mock-transport) experiments.

Parameters:

Name	Type	Description	Default
`path`	`Any`	Optional file path to also write the JSON to.	`None`
`include_traces`	`bool`	Include the full serialized traces.	`True`
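What "canonical" buys you: the same experiment serializes to the same bytes regardless of dict insertion order or wall-clock time. A sketch using the standard json module; the timestamp key names and compact separators are assumptions, not traxr's actual format.

```python
import json

# Assumed names of wall-clock fields that would break byte-stability.
TIMESTAMP_KEYS = frozenset({"started_at", "finished_at"})


def _drop_timestamps(obj):
    if isinstance(obj, dict):
        return {k: _drop_timestamps(v) for k, v in obj.items() if k not in TIMESTAMP_KEYS}
    if isinstance(obj, list):
        return [_drop_timestamps(v) for v in obj]
    return obj


def canonical_json(data) -> str:
    # sort_keys removes insertion-order differences; fixed separators remove whitespace drift.
    return json.dumps(_drop_timestamps(data), sort_keys=True, separators=(",", ":"))


first = {"b": 1, "a": [{"started_at": "10:00", "x": 2}]}
second = {"a": [{"x": 2, "started_at": "10:05"}], "b": 1}
```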

to_dataframe

to_dataframe()

The pairs as a pandas DataFrame (needs the [pandas] extra).

to_report

to_report(fmt='md')

Human-readable report ("md" or "html").

"md" is a plain markdown document (good for terminals and PRs). "html" is a single self-contained file (inline styles, no scripts, no external assets) that embeds the traxr.viz figures when matplotlib is installed and degrades gracefully when it is not.

summary

summary()

Compact printable summary.

Reads top-down as a diagnosis: how many pairs ran, how perturbations manifested, how far the traces diverged (against the noise floor), whether the answer survived, and what it cost.

traxr.results.PairResult `dataclass`

Metrics for one clean-vs-perturbed pair.

scored `property`

scored

Whether this pair produced metrics (the perturbed run happened).

Capture

traxr.capture.openai_wrap.instrument

instrument(client)

Capture LLM calls made through client during Traxr runs.

Wraps client.chat.completions.create in place (sync OpenAI or AsyncOpenAI, including streaming) and returns the same client. Construct your agent with the instrumented client; outside a Traxr run the wrapper is a pure passthrough, so the agent keeps working standalone.

Idempotent: instrumenting an already-instrumented client is a no-op.

Raises:

Type	Description
`TypeError`	If `client` has no `chat.completions.create`.
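How an in-place, idempotent, passthrough wrapper can work, sketched for a sync client only (the real instrument() also covers AsyncOpenAI and streaming). The contextvar standing in for "an active Traxr run", the _traced marker and the fake client are all hypothetical.

```python
import contextvars
import functools
from types import SimpleNamespace

# Hypothetical: the active run's event list, or None outside any run.
_active_run = contextvars.ContextVar("active_run", default=None)


def instrument(client):
    try:
        completions = client.chat.completions
        original = completions.create
    except AttributeError:
        raise TypeError("client has no chat.completions.create") from None
    if getattr(original, "_traced", False):
        return client  # already instrumented: no-op

    @functools.wraps(original)
    def create(*args, **kwargs):
        response = original(*args, **kwargs)
        events = _active_run.get()
        if events is not None:  # inside a run: record; outside: pure passthrough
            events.append({"type": "llm_call", "model": kwargs.get("model")})
        return response

    create._traced = True
    completions.create = create  # patch in place and hand back the same client
    return client


client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=lambda **kw: "reply"))
)
instrument(client)
instrument(client)  # idempotent: the second call changes nothing

standalone = client.chat.completions.create(model="m")  # no active run

events = []
token = _active_run.set(events)  # simulate being inside a Traxr run
try:
    inside = client.chat.completions.create(model="m")
finally:
    _active_run.reset(token)
```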

traxr.capture.patch.patch_openai

patch_openai()

Capture LLM calls from every OpenAI client created inside the context.

Raises:

Type	Description
`OptionalDependencyError`	If the `openai` package is not installed.

traxr.capture.context.emit

emit(event_type, payload=None, *, agent_name='user')

Manually emit a trace event from inside your agent (escape hatch).

The event lands at the current step of the active run. Unregistered event types fall back to the unknown:{event_type} signature (with a one-time UnknownEventTypeWarning); upgrade them via traxr.register_signature.

Outside a Traxr run this is a no-op (same passthrough principle as instrument()), so agents that call traxr.emit() keep working standalone.
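A sketch of the fallback described above: registered types get their own signature, anything else becomes unknown:{event_type} with a single warning per type. The registry, the cache_lookup type and the warning class are local stand-ins, not traxr internals.

```python
import warnings


class UnknownEventTypeWarning(UserWarning):
    """Local stand-in for traxr.errors.UnknownEventTypeWarning."""


# Hypothetical registry: event_type -> (payload -> structural signature).
SIGNATURES = {"cache_lookup": lambda payload: f"cache_lookup:{payload.get('hit')}"}
_warned_types = set()


def signature_for(event_type: str, payload: dict) -> str:
    builder = SIGNATURES.get(event_type)
    if builder is not None:
        return builder(payload)
    if event_type not in _warned_types:  # warn once per unknown type, not per event
        _warned_types.add(event_type)
        warnings.warn(
            f"unregistered event type {event_type!r}", UnknownEventTypeWarning, stacklevel=2
        )
    return f"unknown:{event_type}"
```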

The agent contract

traxr.agents.task.Task `dataclass`

One run's input, as handed to an AgentRunner.

Attributes:

Name	Type	Description
`question`	`str`	The task question.
`files`	`tuple[Path, ...]`	Input artifact paths. Clean runs receive the originals; perturbed runs receive copies with the ORIGINAL basenames in a fresh temp dir, so file names never leak the condition.
`run_label`	`str`	`"baseline"` or the perturbation name (informational).
`metadata`	`Mapping[str, Any]`	Extra experiment context (never condition-revealing).
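A toy agent written against the Task fields above, assuming an AgentRunner is a callable that takes a Task and returns a str (as invoke_agent's contract suggests). Task here is a local stand-in with the same field names so the example runs without traxr; row_count_agent is hypothetical.

```python
import csv
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Mapping


@dataclass
class Task:
    """Local stand-in mirroring the documented traxr Task fields."""

    question: str
    files: tuple[Path, ...]
    run_label: str = "baseline"
    metadata: Mapping[str, Any] = field(default_factory=dict)


def row_count_agent(task: Task) -> str:
    """Toy AgentRunner: reads only task.files, never task.run_label."""
    total = 0
    for path in task.files:
        if path.suffix == ".csv":
            with path.open(newline="") as fh:
                rows = list(csv.reader(fh))
            total += max(len(rows) - 1, 0)  # minus the header row
    return str(total)  # the contract requires a str


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data.csv"
    data.write_text("a,b\n1,2\n3,4\n")
    answer = row_count_agent(Task(question="How many rows?", files=(data,)))
```

Because the agent looks only at the files and the question, clean and perturbed runs differ only in file contents, which is what copying with the original basenames is meant to guarantee.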

traxr.agents.task.invoke_agent

invoke_agent(
    runner,
    task,
    collector,
    *,
    max_llm_calls_per_run=None,
    store_llm_content=False,
    session=None,
)

Run runner on task with Tier 0 capture bound to collector.

The harness, not the agent, emits the final_answer event from the return value. An agent exception is recorded as an agent_error event (the partial trace stays analyzable) and re-raised; the record-vs-raise policy is the experiment runner's concern.

Parameters:

Name	Type	Description	Default
`runner`	`AgentRunner`	The agent callable.	required
`task`	`Task`	This run's input.	required
`collector`	`TraceCollector`	The run's trace collector.	required
`max_llm_calls_per_run`	`int \| None`	LLM-call budget enforced inside the Tier 0 wrapper (the only honest runaway bound for code we don't own).	`None`
`store_llm_content`	`bool`	Include raw LLM/tool content in event payloads instead of hashes only.	`False`
`session`	`CaptureSession \| None`	Pre-built capture session to bind instead of constructing one (its collector must be `collector`). Lets the caller read per-run session state afterwards (e.g. `concurrent_detected`); the budget/content keywords are ignored when given.	`None`

Returns:

Type	Description
`str`	The agent's final answer.

Raises:

Type	Description
`AgentContractError`	If the agent returns a non-`str` value.
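The division of labor described above, sketched as a hypothetical invoke_agent_sketch that records events into a plain list: the harness records final_answer from the return value, records agent_error and re-raises on failure, and rejects non-str answers. Capture, budgets and sessions are omitted, and the exception class is a local stand-in.

```python
class AgentContractError(Exception):
    """Local stand-in for traxr.errors.AgentContractError."""


def invoke_agent_sketch(runner, task, events: list) -> str:
    """Hypothetical harness: the harness, not the agent, records the outcome."""
    try:
        answer = runner(task)
    except Exception as exc:
        # Keep the partial trace analyzable, then let the caller decide record vs raise.
        events.append({"type": "agent_error", "error": type(exc).__name__})
        raise
    if not isinstance(answer, str):
        raise AgentContractError(f"agent returned {type(answer).__name__}, expected str")
    events.append({"type": "final_answer", "answer": answer})
    return answer
```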

traxr.agents.langgraph.from_langgraph

from_langgraph(
    compiled_graph,
    input_builder=None,
    output_extractor=None,
)

Wrap a compiled LangGraph as an AgentRunner with Tier 1 capture.

Parameters:

Name	Type	Description	Default
`compiled_graph`	`Any`	The compiled graph (anything with `invoke(input, config=...)`).	required
`input_builder`	`Callable[[Task], Any] \| None`	`(Task) -> graph input`. Default builds a messages-state input from the question + file-path listing.	`None`
`output_extractor`	`Callable[[Any], str] \| None`	`(graph output) -> str`. Default takes the last message's content from a messages-state result.	`None`

Raises:

Type	Description
`OptionalDependencyError`	If langchain-core is not installed.

traxr.agents.builtin.builtin_agent

builtin_agent(
    llm,
    *,
    enable_web_tools=False,
    enable_python_tool=True,
    max_steps=12,
    max_tokens=100000,
    python_tool_timeout=30.0,
    seed=42,
)

Build a factory of BuiltinAgent instances over the reference MAS.

The returned zero-argument factory is called once per run (clean baseline or perturbation), so every run gets fresh router/plan state while sharing the same LLM client and configuration.

Parameters:

Name	Type	Description	Default
`llm`	`LLMClient`	Any `LLMClient` (OpenAICompatibleClient, DeterministicLLMStub, or your own implementation).	required
`enable_web_tools`	`bool`	Register the web_search/web_fetch tools. Off by default (opt-in), which keeps runs offline and deterministic.	`False`
`enable_python_tool`	`bool`	Register the (subprocess-sandboxed) python tool.	`True`
`max_steps`	`int`	Episode step budget.	`12`
`max_tokens`	`int \| None`	Episode token budget (`None` disables the check).	`100000`
`python_tool_timeout`	`float`	Seconds before LLM-written code is killed.	`30.0`
`seed`	`int`	Seed recorded in the episode spec and used by retrieval.	`42`

Returns:

Type	Description
`Callable[[], BuiltinAgent]`	A zero-argument callable producing fresh `BuiltinAgent` instances.

Raises:

Type	Description
`OptionalDependencyError`	If the reference agent's dependencies (pandas, ...) are not installed.

Example:

```python
from traxr.llm import DeterministicLLMStub
from traxr.agents import builtin_agent

factory = builtin_agent(llm=DeterministicLLMStub("identity"))
agent = factory()  # a fresh BuiltinAgent; call factory() again for each new run
answer = agent.run(["data.csv"], "How many rows does the table have?")  # data.csv must exist
```

LLM clients

traxr.llm.protocol.LLMClient

Bases: Protocol

Protocol for LLM clients usable with traxr.agents.builtin_agent.

response_type is a routing hint ("plan", "route", "data_analyst", "synthesize", ...) used to select system prompts or, for the deterministic stub, scripted replies.

generate

generate(prompt, response_type='default', context=None)

Generate a plain-text response.

generate_with_tools

generate_with_tools(
    prompt,
    tools,
    response_type="default",
    context=None,
    system_prompt_override=None,
)

Generate a response that may include structured tool calls.

tools is a list of tool schemas (ToolSchema for the built-in agent). Returned tool_calls items expose tool_name, operation, arguments and call_id.
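A minimal client of your own, assuming structural typing against the documented method signatures. LLMClientLike is a local copy of those signatures, and the ToolResponse shape (content plus a tool_calls list) is a hypothetical stand-in for whatever the built-in agent actually expects back.

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class LLMClientLike(Protocol):
    """Local copy of the documented LLMClient method signatures."""

    def generate(self, prompt: str, response_type: str = "default", context: Any = None) -> str:
        ...

    def generate_with_tools(
        self,
        prompt: str,
        tools: list,
        response_type: str = "default",
        context: Any = None,
        system_prompt_override: Optional[str] = None,
    ) -> Any:
        ...


@dataclass
class ToolResponse:
    # Hypothetical shape; items in tool_calls would expose
    # tool_name, operation, arguments and call_id.
    content: str
    tool_calls: list = field(default_factory=list)


class EchoClient:
    """Toy client: tags the prompt with its routing hint and never calls tools."""

    def generate(self, prompt, response_type="default", context=None):
        return f"[{response_type}] {prompt}"

    def generate_with_tools(
        self, prompt, tools, response_type="default", context=None, system_prompt_override=None
    ):
        return ToolResponse(content=self.generate(prompt, response_type, context))
```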

traxr.llm.openai_compat.OpenAICompatibleClient

LLMClient for any OpenAI-compatible endpoint.

Parameters:

Name	Type	Description	Default
`model`	`str`	Model name as known to the endpoint.	`'gpt-4o-mini'`
`api_key`	`str \| None`	API key; falls back to the `OPENAI_API_KEY` environment variable. Local servers (Ollama, LM Studio, ...) usually accept any non-empty string.	`None`
`base_url`	`str \| None`	Endpoint base URL (`None` = api.openai.com).	`None`
`seed`	`int`	Sampling seed forwarded to the endpoint (reproducibility).	`42`
`temperature`	`float \| None`	Sampling temperature; defaults to `0.0` for the controlled-variable invariant. `None` omits the parameter.	`0.0`
`max_retries`	`int`	How many times the OpenAI SDK retries a transient failure (network error, `429`, `5xx`) before raising. Forwarded to `openai.OpenAI(max_retries=...)`; defaults to `2` (the SDK default). Set `0` to disable automatic retries.	`2`

Raises:

Type	Description
`OptionalDependencyError`	If the `openai` package is not installed.
`LLMConnectionError`	If no API key is available.

call_count `property`

call_count

Number of LLM calls made by this client instance.

generate

generate(prompt, response_type='default', context=None)

Generate a plain-text response.

generate_with_tools

generate_with_tools(
    prompt,
    tools,
    response_type="default",
    context=None,
    system_prompt_override=None,
)

Generate a response with native OpenAI function calling.

generate_with_retrieval

generate_with_retrieval(
    prompt, retrieval_content, response_type="research"
)

Generate a response that incorporates retrieval content.

generate_with_image

generate_with_image(
    prompt,
    image_base64,
    media_type="image/png",
    response_type="research",
)

Generate a response that analyzes an image (vision-capable models).

Retained from the extracted client for interface fidelity; the built-in agent's image path is inert in v1.

reset_call_count

reset_call_count()

Reset the call counter (determinism between paired runs).

close

close()

Close the underlying HTTP client to release connections.

traxr.llm.stub.DeterministicLLMStub

Scripted LLMClient for tests, goldens, and demos.

Parameters:

Name	Type	Description	Default
`scenario`	`str`	One of `SCENARIOS` (ignored when `script` is given).	`'identity'`
`final_answer`	`str`	The answer the `identity`/`reroute`/`loop` synthesizer reply returns. Match it to the fixture data (e.g. the row count printed by the scripted analyst code).	`'4'`
`wrong_answer`	`str`	The answer the `wrong_answer` scenario returns.	`'999'`
`script`	`dict[str, list[StubReply]] \| None`	Custom mapping `response_type -> list[StubReply]`; overrides the scenario entirely.	`None`
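How a response_type -> list-of-replies script can drive deterministic runs: each response type replays its own list in order, and resetting the counters makes the paired run replay from the start. ScriptedStub is hypothetical; it uses plain strings instead of StubReply, and repeating the last reply once a list is exhausted is an assumption of this sketch.

```python
from collections import defaultdict


class ScriptedStub:
    """Hypothetical mini stub: replays replies per response_type, in order."""

    def __init__(self, script: dict[str, list[str]]):
        self.script = script
        self._counters = defaultdict(int)  # next reply index per response_type

    @property
    def call_count(self) -> int:
        return sum(self._counters.values())

    def generate(self, prompt, response_type="default", context=None):
        replies = self.script[response_type]
        i = self._counters[response_type]
        self._counters[response_type] += 1
        return replies[min(i, len(replies) - 1)]  # assumption: repeat the last reply

    def reset_call_count(self):
        self._counters.clear()  # the paired run replays the same script from the start


stub = ScriptedStub({"plan": ["p1", "p2"], "synthesize": ["4"]})
```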

call_count `property`

call_count

Number of LLM calls made since the last reset.

generate

generate(prompt, response_type='default', context=None)

Return the next scripted plain-text reply.

generate_with_tools

generate_with_tools(
    prompt,
    tools,
    response_type="default",
    context=None,
    system_prompt_override=None,
)

Return the next scripted reply, including any scripted tool calls.

generate_with_retrieval

generate_with_retrieval(
    prompt, retrieval_content, response_type="research"
)

Retrieval-augmented text generation (same scripted replies).

reset_call_count

reset_call_count()

Reset all call counters (determinism between paired runs).

Scoring and plots

traxr.scoring.check_answer_match

check_answer_match(expected, actual)

Whether actual matches expected after normalization.

Numeric answers compare as floats (so "42" matches "42.0"); everything else is normalized string equality.

```python
>>> check_answer_match("Paris", " PARIS. ")
True
>>> check_answer_match("1,000", "$1000")
True
>>> check_answer_match("42", "43")
False
```
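A normalizer that reproduces the documented examples (plus "42" vs "42.0"): numbers are compared as floats after dropping thousands separators and "$", everything else after case-folding, collapsing whitespace and trimming punctuation. This is an illustrative sketch with hypothetical helper names, not traxr's implementation, whose exact rules may differ.

```python
import math
import string

# Keep "-" so that "-5" and "5" stay different after trimming.
_STRIP_CHARS = string.punctuation.replace("-", "") + " "


def _as_number(text: str):
    cleaned = text.strip().replace(",", "").replace("$", "").rstrip(".")
    try:
        value = float(cleaned)
    except ValueError:
        return None
    return value if math.isfinite(value) else None  # "nan"/"inf" compare as text


def _normalize(text: str) -> str:
    collapsed = " ".join(text.lower().split())  # case-fold, collapse whitespace
    return collapsed.strip(_STRIP_CHARS)


def answer_match_sketch(expected: str, actual: str) -> bool:
    ne, na = _as_number(expected), _as_number(actual)
    if ne is not None and na is not None:
        return ne == na  # numeric answers compare as floats
    return _normalize(expected) == _normalize(actual)
```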

traxr.scoring.llm_judge_match

llm_judge_match(expected, actual, llm=None)

Semantic match via an LLM judge. Opt-in only; usable directly as ExperimentConfig(scorer=llm_judge_match).

Unlike check_answer_match, this is not deterministic: results can vary between runs and providers. Use it when literal/numeric matching is too strict for your question's expected phrasing.

If llm is omitted, lazily constructs and caches a default OpenAICompatibleClient; this reads OPENAI_API_KEY from the environment and makes a live network call per scored answer. Pass your own llm (e.g. via functools.partial) to use another provider or avoid the implicit network/API-key dependency.

traxr.viz.plot_d_norm

plot_d_norm(results, ax=None)

Per-pair normalized edit distance, with the noise floor when measured.
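A sketch of a normalized edit distance over per-step event signatures, assuming Levenshtein distance divided by the longer trace's length (so 0 means identical and 1 means nothing in common). The signature strings are hypothetical, and traxr's exact normalization is not stated here.

```python
def levenshtein(a, b) -> int:
    """Classic two-row dynamic-programming edit distance over any sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def d_norm(clean, perturbed) -> float:
    longest = max(len(clean), len(perturbed))
    return levenshtein(clean, perturbed) / longest if longest else 0.0


clean = ["plan", "route", "tool:python", "final_answer"]
perturbed = ["plan", "route", "tool:python", "tool:python", "final_answer"]
```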

traxr.viz.plot_t_star

plot_t_star(results, ax=None, bins=10)

Histogram of normalized first-divergence positions (t*/T).
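A sketch of the first-divergence position t* and its normalized form t*/T, assuming T is the clean trace's length. When one trace is a strict prefix of the other, t* is taken to be the shorter length; that convention is also an assumption of this sketch.

```python
from typing import Optional, Sequence


def first_divergence(clean: Sequence[str], perturbed: Sequence[str]) -> Optional[int]:
    """Index t* of the first step where the traces differ, or None if identical."""
    for t, (a, b) in enumerate(zip(clean, perturbed)):
        if a != b:
            return t
    if len(clean) != len(perturbed):
        return min(len(clean), len(perturbed))  # one trace is a prefix of the other
    return None


def t_star_norm(clean: Sequence[str], perturbed: Sequence[str]) -> Optional[float]:
    t = first_divergence(clean, perturbed)
    # Assumption: T = len(clean), so the result lies in [0, 1].
    return None if t is None else t / max(len(clean), 1)
```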

traxr.viz.plot_manifestations

plot_manifestations(results, ax=None)

Manifestation-category prevalence over scored pairs.

Extending the event vocabulary

traxr.trace.registry.register_signature

register_signature(
    event_type,
    signature,
    *,
    classifier=None,
    structural_types=frozenset(),
    key_fields_equal=None,
)

Upgrade a custom event type from the unknown:{event_type} fallback.

Parameters:

Name	Type	Description	Default
`event_type`	`str`	The custom event type emitted via `traxr.emit()`.	required
`signature`	`SignatureFn`	`payload -> str` structural signature builder.	required
`classifier`	`ClassifierFn \| None`	Optional `(clean_payload, perturbed_payload) -> str \| None` returning a structural divergence type (or `None` for lexical-only differences).	`None`
`structural_types`	`frozenset[str] \| set[str]`	The divergence-type strings the classifier can return (so `t*` typing recognizes them as structural).	`frozenset()`
`key_fields_equal`	`KeyFieldsFn \| None`	Optional semantic-equality predicate on payloads.	`None`
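A signature builder and classifier for a hypothetical db_query event. The signature keeps only structure (table and operation), the classifier returns a structural divergence type or None for lexical-only changes, and structural_types lists every string the classifier can return. The registration call at the end follows the parameters documented above and is left commented out so the sketch runs without traxr.

```python
from typing import Optional


def db_query_signature(payload: dict) -> str:
    # Structure only: which table and operation, not the literal SQL text.
    return f"db_query:{payload.get('table')}:{payload.get('op')}"


def db_query_classifier(clean: dict, perturbed: dict) -> Optional[str]:
    if clean.get("table") != perturbed.get("table"):
        return "db_table_changed"
    if clean.get("op") != perturbed.get("op"):
        return "db_op_changed"
    return None  # e.g. only the SQL text differs: lexical, not structural


DB_STRUCTURAL_TYPES = frozenset({"db_table_changed", "db_op_changed"})

# With traxr installed, registration would use the documented parameters:
# traxr.register_signature("db_query", db_query_signature,
#                          classifier=db_query_classifier,
#                          structural_types=DB_STRUCTURAL_TYPES)
```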

Errors and warnings

traxr.errors

Typed exception and warning hierarchy for Traxr.

Principle: fail loud with actionable messages; never silently produce wrong metrics. User-configuration errors fail fast; data-dependent failures are skipped and recorded (as warnings collected per pair).

All exceptions derive from TraxrError; all warnings derive from TraxrWarning.
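The skip-and-record half of this principle, sketched with the standard warnings module: each pair collects its own warnings instead of aborting the experiment. The warning classes are local stand-ins for TraxrWarning and PerturbationSkippedWarning, and run_pair and the perturbation labels are hypothetical.

```python
import warnings


class TraxrWarningStandIn(UserWarning):
    """Local stand-in for traxr.errors.TraxrWarning (a UserWarning subclass)."""


class SkippedStandIn(TraxrWarningStandIn):
    """Local stand-in for PerturbationSkippedWarning."""


def run_pair(label: str, applicable: bool) -> dict:
    """Hypothetical pair runner: data-dependent problems become recorded warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        if not applicable:
            warnings.warn(f"{label}: not applicable to this file", SkippedStandIn)
        # ... the clean and perturbed runs would happen here ...
    return {
        "label": label,
        "scored": applicable,
        "warnings": [
            str(w.message) for w in caught if issubclass(w.category, TraxrWarningStandIn)
        ],
    }


results = [run_pair("perturbation-a", True), run_pair("perturbation-b", False)]
```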

TraxrError

Bases: Exception

Base class for all Traxr exceptions.

UnsupportedModalityError

Bases: TraxrError

Unknown extension or unsupported modality.

Raised for docx/pptx/image/audio inputs in v1. The message names what IS supported (CSV/XLSX/TXT/MD/PDF) and points at the roadmap for the rest.

ModalityMismatchError

Bases: TraxrError

Declared modality does not match the detected modality of the file.

InvalidArtifactError

Bases: TraxrError

Input file is missing, unreadable, or corrupt.

OptionalDependencyError

Bases: TraxrError

A required optional dependency is not installed.

Raised when openai/PyMuPDF/pdfplumber/openpyxl/matplotlib/langchain-core is needed but missing. The message names the pip extra that provides it (e.g. pip install "traxr[document]").

LLMConnectionError

Bases: TraxrError

Missing/invalid API key or unreachable base_url (built-in agent path).

AgentContractError

Bases: TraxrError

The user's agent violated the AgentRunner contract.

For example, it returned a non-str value.

RunBudgetExceeded

Bases: TraxrError

The agent exceeded max_llm_calls_per_run.

Raised inside the agent by the Tier 0 capture wrapper.
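Why raising inside the agent bounds runaways: the exception surfaces at the agent's next LLM call, however the agent's own loop is written. A sketch of a per-run call budget; with_call_budget and the exception class are hypothetical stand-ins for the Tier 0 wrapper and RunBudgetExceeded, and a fresh wrapper per run is assumed.

```python
class RunBudgetExceededStandIn(Exception):
    """Local stand-in for traxr.errors.RunBudgetExceeded."""


def with_call_budget(call, max_calls):
    """Hypothetical wrapper: count calls and raise once the budget is spent."""
    count = 0

    def wrapped(*args, **kwargs):
        nonlocal count
        if max_calls is not None and count >= max_calls:
            raise RunBudgetExceededStandIn(f"exceeded max_llm_calls_per_run={max_calls}")
        count += 1
        return call(*args, **kwargs)

    return wrapped


llm_call = with_call_budget(lambda prompt: "ok", max_calls=2)
```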

ExperimentConfigError

Bases: TraxrError

Invalid experiment configuration.

Raised when none, or more than one, of agent / agent_factory / llm is given, and for other fail-fast configuration problems.

MatrixTooLargeError

Bases: TraxrError

The permutation matrix exceeds the configured cap.

MalformedEventError

Bases: TraxrError

Malformed or missing event payload, or empty/non-string event_type.

ControlledVariableError

Bases: TraxrError

Experiment configuration was mutated between paired runs.

TraxrWarning

Bases: UserWarning

Base class for all Traxr warnings (non-fatal, collected per pair).

UnknownEventTypeWarning

Bases: TraxrWarning

An unregistered event type fell back to the unknown:{event_type} signature.

Emitted once per unknown type; upgrade the type via traxr.register_signature() to include its structure in divergence metrics.

EmptyTraceWarning

Bases: TraxrWarning

A run produced no trace events.

NonDeterminismWarning

Bases: TraxrWarning

Paired runs differ by more than the measured noise floor would predict.

ConcurrentTraceWarning

Bases: TraxrWarning

Concurrent LLM calls detected while tracing a single run.

TokenUnavailableWarning

Bases: TraxrWarning

Token usage could not be captured for one or more LLM calls.

PerturbationSkippedWarning

Bases: TraxrWarning

A perturbation was not applicable and was skipped (recorded).

Corresponds to the engine reporting applied=False with a skip_reason.
