Your first audit

This page walks you from nothing to a real audit result in under a minute.

1. Create a conversation file

Save this as example.json:

{
  "id": "first-audit",
  "platform": "manual",
  "reported_resolved": true,
  "messages": [
    {"role": "user", "content": "I need a refund for order #12345."},
    {"role": "bot", "content": "Please check our FAQ at example.com/faq for details on refunds."},
    {"role": "user", "content": "I already checked it. Can someone please process my refund?"},
    {"role": "bot", "content": "Please check our FAQ at example.com/faq for details on refunds."},
    {"role": "user", "content": "Can I speak to a real person?"},
    {"role": "bot", "content": "I can help with most order issues. What's your order number?"},
    {"role": "user", "content": "this is useless"},
    {"role": "bot", "content": "Please check our FAQ at example.com/faq for details on refunds."}
  ]
}
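
If you build conversation files by hand, a quick shape check can catch typos before you run the audit. The sketch below is not part of chatbot-audit: `validate_conversation` is a hypothetical helper, and the rules it enforces (four top-level keys, roles limited to "user" and "bot") are assumptions taken only from the example above, not from a published schema.

```python
import json


def validate_conversation(conv: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape matches the example."""
    problems = []
    # Assumption: these four keys are required because the example file has them.
    for key in ("id", "platform", "reported_resolved", "messages"):
        if key not in conv:
            problems.append(f"missing top-level key: {key}")

    messages = conv.get("messages", [])
    if not isinstance(messages, list):
        return problems + ["messages must be a list"]

    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        # Assumption: only the two roles seen in the example are valid.
        if msg.get("role") not in ("user", "bot"):
            problems.append(f"message {i} has unexpected role: {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i} content is not a string")
    return problems


def validate_file(path: str) -> list[str]:
    """Load a JSON file and validate the conversation it contains."""
    with open(path, encoding="utf-8") as handle:
        return validate_conversation(json.load(handle))
```

Calling `validate_file("example.json")` on the file above should return an empty list under these assumptions.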

2. Run the audit

chatbot-audit analyze example.json

You'll see something like:

Analyzed 1 conversation(s).
Detected 3 failure(s).
  [MEDIUM  ] death_loop               conv=first-audit conf=0.65
      Bot gave 3 consecutive similar responses (pairwise similarity >= 0.85); customer expressed frustration ("useless").
  [MEDIUM  ] escalation_burial        conv=first-audit conf=0.70
      Customer requested a human agent 1 time(s) but the bot did not acknowledge or perform a transfer.
  [HIGH    ] silent_churn             conv=first-audit conf=0.85
      Conversation of 8 messages ended without a customer confirmation (no 'thanks', 'got it', or equivalent) — yet the platform reported the conversation as resolved.

Three of the seven failure modes triggered on this single conversation. This is typical of real chatbot conversations: failure modes tend to co-occur.

3. Get JSON output for pipelines

chatbot-audit analyze example.json --json > detections.json
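
If your pipeline is a Python script, you may prefer to capture the JSON directly instead of going through a file. The sketch below assumes that `--json` writes only JSON to standard output, which the redirect above suggests but this page does not confirm. `run_json_command` is a hypothetical helper, and the exit-code convention of the CLI is not documented here, so the return code is passed back rather than checked.

```python
import json
import subprocess
import sys


def run_json_command(argv):
    """Run a command and parse its standard output as JSON.

    Returns (return_code, parsed_json). The return code is not interpreted,
    because this page does not say how the CLI signals detected failures.
    """
    completed = subprocess.run(argv, capture_output=True, text=True)
    return completed.returncode, json.loads(completed.stdout)


# Intended use with the command from this page (requires the CLI installed):
#   code, detections = run_json_command(
#       ["chatbot-audit", "analyze", "example.json", "--json"]
#   )

# Stand-in command so the sketch runs without the CLI installed.
code, data = run_json_command(
    [sys.executable, "-c", "import json; print(json.dumps({'ok': True}))"]
)
```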

4. The same audit in Python

from chatbot_auditor import audit, JSONAdapter

adapter = JSONAdapter("example.json")
for detection in audit(adapter.fetch()):
    print(f"[{detection.severity.value}] {detection.detector}")
    print(f"  {detection.explanation}")
    print(f"  Confidence: {detection.confidence}")
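
Once you have detections, you will often want to filter and group them. The sketch below relies only on the three attributes used in the loop above (`severity.value`, `detector`, `confidence`). `group_by_severity` is a hypothetical helper, and the sample objects are stand-ins with assumed values so the example runs without the library installed.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_severity(detections, min_confidence=0.0):
    """Group detector names by severity, keeping detections at or above min_confidence."""
    grouped = defaultdict(list)
    for detection in detections:
        if detection.confidence >= min_confidence:
            grouped[detection.severity.value].append(detection.detector)
    return dict(grouped)


def fake_detection(severity, detector, confidence):
    """Stand-in for a real detection object (assumed shape, for illustration only)."""
    return SimpleNamespace(
        severity=SimpleNamespace(value=severity),
        detector=detector,
        confidence=confidence,
    )


# Values mirror the CLI output shown earlier; the exact strings the library
# uses for severity and detector names are assumptions.
sample = [
    fake_detection("medium", "death_loop", 0.65),
    fake_detection("medium", "escalation_burial", 0.70),
    fake_detection("high", "silent_churn", 0.85),
]

print(group_by_severity(sample, min_confidence=0.7))
```

With the real library, you would pass `audit(adapter.fetch())` in place of `sample`.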

What's happening

Behind the scenes, audit() loaded the default registry of detectors and ran each against your conversation:

  • DeathLoopDetector found the repeated FAQ response
  • EscalationBurialDetector found the unanswered "speak to a real person" request
  • SilentChurnDetector found a conversation that ended without confirmation

All three detectors rely only on the Python standard library: no LLM calls, no network access, no API keys.
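
To get a feel for what standard-library detection can look like, here is a minimal sketch of near-duplicate matching on the bot responses from example.json. It is not the DeathLoopDetector implementation: `difflib.SequenceMatcher` is a swapped-in technique, `similar_pairs` is a hypothetical helper, and comparing every pair (rather than only consecutive responses) is a simplification. Only the 0.85 threshold comes from the output shown earlier.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(responses, threshold=0.85):
    """Return (i, j, score) for every pair of responses at or above the threshold."""
    pairs = []
    for (i, first), (j, second) in combinations(enumerate(responses), 2):
        # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones.
        score = SequenceMatcher(None, first.lower(), second.lower()).ratio()
        if score >= threshold:
            pairs.append((i, j, round(score, 2)))
    return pairs


# The four bot messages from example.json, in order.
bot_responses = [
    "Please check our FAQ at example.com/faq for details on refunds.",
    "Please check our FAQ at example.com/faq for details on refunds.",
    "I can help with most order issues. What's your order number?",
    "Please check our FAQ at example.com/faq for details on refunds.",
]

for i, j, score in similar_pairs(bot_responses):
    print(f"responses {i} and {j} are near-duplicates (similarity {score})")
```

The three FAQ replies match each other exactly, while the order-number question falls well below the threshold.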