Audit Intercom data
This tutorial walks through connecting to a real Intercom workspace, pulling conversations, and producing an audit report.
Prerequisites

- An Intercom workspace with API access
- An Intercom Access Token with `Read conversations` permission (how to create one)
- `chatbot-auditor[intercom]` installed
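If the package isn't installed yet, a typical install (package name and extra taken from the Step 5 workflow) looks like:

```shell
pip install "chatbot-auditor[intercom]"
```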
Step 1: Authenticate
Put your token in an environment variable:
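For example, using the variable name the Step 5 workflow expects (the value here is a placeholder):

```shell
export INTERCOM_ACCESS_TOKEN="your-token-here"
```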
Never commit the token to source control. Use your CI secrets manager for scheduled runs.
Step 2: Run a small test
Start with a low limit so you can see the output format before pulling everything:
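A sketch of such a test run, using the `analyze-intercom` subcommand that appears in the Step 5 workflow (the limit value here is illustrative):

```shell
chatbot-audit analyze-intercom --limit 10
```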
You should see output like:
```
Analyzed 10 conversation(s).
Detected 14 failure(s).

[MEDIUM ] death_loop    conv=42301  conf=0.65
    Bot gave 3 consecutive similar responses ...
[HIGH   ] silent_churn  conv=42305  conf=0.85
    Conversation of 4 messages ended without a customer confirmation ...
...
```
Step 3: Audit a meaningful sample
A good first real audit is 200–500 recent conversations. Pipe to JSON so you can slice and dice:
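Assuming the same `analyze-intercom` subcommand and `--json` flag used in the Step 5 workflow, a 500-conversation pull might look like:

```shell
chatbot-audit analyze-intercom --limit 500 --json > audit.json
```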
Now analyze with jq:
```sh
# How many of each failure mode?
jq '[.[].detector] | group_by(.) | map({detector: .[0], count: length})' audit.json

# Top 10 most-flagged conversations
jq '[.[]] | group_by(.conversation_id)
    | map({id: .[0].conversation_id, count: length})
    | sort_by(-.count) | .[:10]' audit.json

# Only critical detections
jq '[.[] | select(.severity == "critical")]' audit.json
```
Step 4: Use the Python API for control
For anything beyond quick CLI exploration, use the Python API. You get streaming, custom configurations, and access to evidence.
```python
from chatbot_auditor import audit
from chatbot_auditor.adapters.intercom import IntercomAdapter

adapter = IntercomAdapter(max_conversations=500)

total_conversations = 0
failures_by_mode: dict[str, int] = {}

for conv in adapter.fetch():
    total_conversations += 1
    for d in audit([conv]):
        failures_by_mode[d.detector] = failures_by_mode.get(d.detector, 0) + 1

print(f"Audited {total_conversations} conversations")
print("Failures by mode:")
for mode, count in sorted(failures_by_mode.items(), key=lambda x: -x[1]):
    rate = count / total_conversations * 100
    print(f"  {mode:<24} {count:>4} ({rate:.1f}%)")
```
Example output:
```
Audited 500 conversations
Failures by mode:
  silent_churn              148 (29.6%)
  death_loop                 92 (18.4%)
  sentiment_collapse         47 (9.4%)
  escalation_burial          31 (6.2%)
  brand_damage                2 (0.4%)
```
Step 5: Schedule it
Put the audit on a cadence. One approach with GitHub Actions:
```yaml
# .github/workflows/weekly-audit.yml
on:
  schedule:
    - cron: "0 8 * * MON"

jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v3
      - run: |
          uv run --with "chatbot-auditor[intercom]" \
            chatbot-audit analyze-intercom --limit 1000 --json > audit.json
        env:
          INTERCOM_ACCESS_TOKEN: ${{ secrets.INTERCOM_ACCESS_TOKEN }}
      - uses: actions/upload-artifact@v4
        with:
          name: weekly-audit
          path: audit.json
```
Known limitations
- `IntercomAdapter` uses the list endpoint (`GET /conversations`), which returns conversations with inlined parts but may truncate very long threads. To get full transcripts, fetch each conversation by id using the per-conversation endpoint and reparse.
- HTML bodies are stripped to plain text using a regex. If you need rich formatting preserved, inject a custom parser.
- Rate limits are handled with exponential backoff (max 5 retries). For very large audits, expect the pull to take a few minutes.
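To illustrate the first two points, here is a minimal sketch. `fetch_full_conversation` is a hypothetical helper hitting Intercom's per-conversation REST endpoint, and `strip_html` shows the kind of regex-based stripping described; both names are assumptions, not part of the library's API.

```python
import os
import re
import urllib.request


def fetch_full_conversation(conv_id: str) -> bytes:
    """Hypothetical helper: fetch one untruncated thread from the per-conversation endpoint."""
    req = urllib.request.Request(
        f"https://api.intercom.io/conversations/{conv_id}",
        headers={
            "Authorization": f"Bearer {os.environ['INTERCOM_ACCESS_TOKEN']}",
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


TAG_RE = re.compile(r"<[^>]+>")


def strip_html(body: str) -> str:
    """Regex-based HTML stripping, in the same lossy spirit as the adapter's approach."""
    return re.sub(r"\s+", " ", TAG_RE.sub(" ", body)).strip()
```

Swap in a real HTML parser for `strip_html` if formatting matters; a regex is deliberately lossy.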
Next
- Audit CSV data — same workflow for a local export
- Configure a policy base — sharpen `ConfidentLiesDetector` on your real policies
- Write a custom detector — add your own failure modes