Your first audit
This page walks you from nothing to a real audit result in under a minute.
1. Create a conversation file
Save this as example.json:
{
"id": "first-audit",
"platform": "manual",
"reported_resolved": true,
"messages": [
{"role": "user", "content": "I need a refund for order #12345."},
{"role": "bot", "content": "Please check our FAQ at example.com/faq for details on refunds."},
{"role": "user", "content": "I already checked it. Can someone please process my refund?"},
{"role": "bot", "content": "Please check our FAQ at example.com/faq for details on refunds."},
{"role": "user", "content": "Can I speak to a real person?"},
{"role": "bot", "content": "I can help with most order issues. What's your order number?"},
{"role": "user", "content": "this is useless"},
{"role": "bot", "content": "Please check our FAQ at example.com/faq for details on refunds."}
]
}
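Before running the audit, it can help to confirm the file parses and has the shape shown above. The following is a minimal sketch, not part of chatbot_auditor: `check_conversation` is a hypothetical helper, and the rule that each message needs string `role` and `content` fields is an assumption inferred from the example, not a documented schema.

```python
import json


def check_conversation(text):
    """Return a list of problems found in a conversation JSON string (empty if none)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]

    problems = []
    messages = data.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("'messages' must be a non-empty list")
        messages = []

    for i, msg in enumerate(messages):
        # Assumption: every message carries string 'role' and 'content'
        # fields, as in the example file above.
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        for field in ("role", "content"):
            if not isinstance(msg.get(field), str):
                problems.append(f"message {i}: missing or non-string '{field}'")
    return problems


sample = '{"id": "first-audit", "messages": [{"role": "user", "content": "hi"}]}'
print(check_conversation(sample))        # []
print(check_conversation('{"id": 1,}'))  # a single "invalid JSON: ..." entry
```

To check the file itself, pass its contents, for example `check_conversation(open("example.json", encoding="utf-8").read())`.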
2. Run the audit
You'll see something like:
Analyzed 1 conversation(s).
Detected 3 failure(s).
[MEDIUM ] death_loop conv=first-audit conf=0.65
Bot gave 3 consecutive similar responses (pairwise similarity >= 0.85); customer expressed frustration ("useless").
[MEDIUM ] escalation_burial conv=first-audit conf=0.70
Customer requested a human agent 1 time(s) but the bot did not acknowledge or perform a transfer.
[HIGH ] silent_churn conv=first-audit conf=0.85
Conversation of 8 messages ended without a customer confirmation (no 'thanks', 'got it', or equivalent) — yet the platform reported the conversation as resolved.
Three of the seven failure modes triggered on this single conversation. This is typical of real chatbot conversations: failure modes tend to co-occur.
3. Get JSON output for pipelines
4. The same audit in Python
from chatbot_auditor import audit, JSONAdapter

adapter = JSONAdapter("example.json")
for detection in audit(adapter.fetch()):
    print(f"[{detection.severity.value}] {detection.detector}")
    print(f" {detection.explanation}")
    print(f" Confidence: {detection.confidence}")
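When auditing more than one conversation, a per-detector tally is often easier to read than the raw list. The sketch below is one way to build it. It relies only on the `detector` and `confidence` attributes used in the loop above. `summarize` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that mirror the sample output so the snippet runs without the library installed. It also assumes `detector` is a plain string such as `"death_loop"`.

```python
from collections import Counter
from types import SimpleNamespace


def summarize(detections):
    """Count detections per detector and track the highest confidence seen for each."""
    counts = Counter()
    top_conf = {}
    for d in detections:
        counts[d.detector] += 1
        top_conf[d.detector] = max(top_conf.get(d.detector, 0.0), d.confidence)
    return counts, top_conf


# Stand-ins for real detection objects (values copied from the sample output).
fake_detections = [
    SimpleNamespace(detector="death_loop", confidence=0.65),
    SimpleNamespace(detector="escalation_burial", confidence=0.70),
    SimpleNamespace(detector="silent_churn", confidence=0.85),
]

counts, top_conf = summarize(fake_detections)
for name, n in counts.items():
    print(f"{name}: {n} detection(s), max confidence {top_conf[name]:.2f}")
```

With the real library, the same function would presumably accept the iterable returned by `audit(adapter.fetch())` in place of `fake_detections`.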
What's happening
Behind the scenes, audit() loaded the default registry of detectors and ran each against your conversation:
- DeathLoopDetector found the repeated FAQ response
- EscalationBurialDetector found the unanswered "speak to a real person" request
- SilentChurnDetector found a conversation that ended without confirmation
All three detectors use pure-stdlib detection — no LLM calls, no network, no API keys.
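To give a feel for what stdlib-only detection can look like, here is a rough sketch of flagging repeated bot replies with `difflib`. It is an illustration under stated assumptions, not the DeathLoopDetector implementation: the function name `max_similar_replies` is hypothetical, and `SequenceMatcher` is a stand-in similarity measure. Only the 0.85 threshold comes from the sample output.

```python
from difflib import SequenceMatcher


def max_similar_replies(replies, threshold=0.85):
    """Return the size of the largest group of replies similar to a single reply.

    Each reply is compared with every reply (itself included), so a reply
    that occurs once and resembles nothing else contributes a count of 1.
    """
    best = 0
    for anchor in replies:
        similar = sum(
            1
            for other in replies
            if SequenceMatcher(None, anchor, other).ratio() >= threshold
        )
        best = max(best, similar)
    return best


# Bot replies from example.json, in order.
faq = "Please check our FAQ at example.com/faq for details on refunds."
bot_replies = [
    faq,
    faq,
    "I can help with most order issues. What's your order number?",
    faq,
]

print(max_similar_replies(bot_replies))  # 3
```

The result of 3 matches the count in the sample death_loop message. The real detector may define "consecutive" and "similar" differently.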
Next
- Learn the CLI — all commands and flags
- The 7 failure modes — what each detector catches
- Audit Intercom data — go from example files to real conversations