Audit Intercom data
This tutorial walks through connecting to a real Intercom workspace, pulling conversations, and producing an audit report.
Prerequisites

- An Intercom workspace with API access
- An Intercom Access Token with `Read conversations` permission (how to create one)
- `chatbot-auditor[intercom]` installed
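If the package isn't installed yet, a typical install (package name and extra taken from the Step 5 workflow) looks like:

```shell
pip install "chatbot-auditor[intercom]"
```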
Step 1: Authenticate
Put your token in an environment variable:
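For example, using the variable name the Step 5 workflow expects (the value here is a placeholder):

```shell
export INTERCOM_ACCESS_TOKEN="your-token-here"
```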
Never commit the token to source control. Use your CI secrets manager for scheduled runs.
Step 2: Run a small test
Start with a low limit so you can see the output format before pulling everything:
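A sketch of such a test run, using the `analyze-intercom` subcommand that appears in the Step 5 workflow (the limit value here is illustrative):

```shell
chatbot-audit analyze-intercom --limit 10
```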
You should see output like:
```
Analyzed 10 conversation(s).
Detected 14 failure(s).

[MEDIUM ] death_loop    conv=42301  conf=0.65
    Bot gave 3 consecutive similar responses ...
[HIGH   ] silent_churn  conv=42305  conf=0.85
    Conversation of 4 messages ended without a customer confirmation ...
...
```
Step 3: Audit a meaningful sample
A good first real audit is 200–500 recent conversations. Pipe to JSON so you can slice and dice:
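Assuming the same `analyze-intercom` subcommand and `--json` flag used in the Step 5 workflow, a 500-conversation pull might look like:

```shell
chatbot-audit analyze-intercom --limit 500 --json > audit.json
```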
Now analyze with jq:
```sh
# How many of each failure mode?
jq '[.[].detector] | group_by(.) | map({detector: .[0], count: length})' audit.json

# Top 10 most-flagged conversations
jq '[.[]] | group_by(.conversation_id)
    | map({id: .[0].conversation_id, count: length})
    | sort_by(-.count) | .[:10]' audit.json

# Only critical detections
jq '[.[] | select(.severity == "critical")]' audit.json
```
Step 4: Use the Python API for control
For anything beyond quick CLI exploration, use the Python API. You get streaming, custom configurations, and access to evidence.
```python
from chatbot_auditor import audit
from chatbot_auditor.adapters.intercom import IntercomAdapter

adapter = IntercomAdapter(max_conversations=500)

total_conversations = 0
failures_by_mode: dict[str, int] = {}

for conv in adapter.fetch():
    total_conversations += 1
    for d in audit([conv]):
        failures_by_mode[d.detector] = failures_by_mode.get(d.detector, 0) + 1

print(f"Audited {total_conversations} conversations")
print("Failures by mode:")
for mode, count in sorted(failures_by_mode.items(), key=lambda x: -x[1]):
    rate = count / total_conversations * 100
    print(f"  {mode:<24} {count:>4} ({rate:.1f}%)")
```
Example output:
```
Audited 500 conversations
Failures by mode:
  silent_churn              148 (29.6%)
  death_loop                 92 (18.4%)
  sentiment_collapse         47 (9.4%)
  escalation_burial          31 (6.2%)
  brand_damage                2 (0.4%)
```
Step 5: Schedule it
Put the audit on a cadence. One approach with GitHub Actions:
```yaml
# .github/workflows/weekly-audit.yml
on:
  schedule:
    - cron: "0 8 * * MON"

jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v3
      - run: |
          uv run --with "chatbot-auditor[intercom]" \
            chatbot-audit analyze-intercom --limit 1000 --json > audit.json
        env:
          INTERCOM_ACCESS_TOKEN: ${{ secrets.INTERCOM_ACCESS_TOKEN }}
      - uses: actions/upload-artifact@v4
        with:
          name: weekly-audit
          path: audit.json
```
Known limitations
- `IntercomAdapter` uses the list endpoint (`GET /conversations`), which returns conversations with inlined parts but may truncate very long threads. To get full transcripts, fetch each conversation by id using the per-conversation endpoint and reparse.
- HTML bodies are stripped to plain text using a regex. If you need rich formatting preserved, inject a custom parser.
- Rate limits are handled with exponential backoff (max 5 retries). For very large audits, expect the pull to take a few minutes.
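To illustrate the first two points, here is a minimal sketch. `fetch_full_conversation` is a hypothetical helper hitting Intercom's per-conversation REST endpoint, and `strip_html` shows the kind of regex-based stripping described; both names are assumptions, not part of the library's API.

```python
import os
import re
import urllib.request


def fetch_full_conversation(conv_id: str) -> bytes:
    """Hypothetical helper: fetch one untruncated thread from the per-conversation endpoint."""
    req = urllib.request.Request(
        f"https://api.intercom.io/conversations/{conv_id}",
        headers={
            "Authorization": f"Bearer {os.environ['INTERCOM_ACCESS_TOKEN']}",
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


TAG_RE = re.compile(r"<[^>]+>")


def strip_html(body: str) -> str:
    """Regex-based HTML stripping, in the same lossy spirit as the adapter's approach."""
    return re.sub(r"\s+", " ", TAG_RE.sub(" ", body)).strip()
```

Swap in a real HTML parser for `strip_html` if formatting matters; a regex is deliberately lossy.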
Next
- Audit CSV data — same workflow for a local export
- Configure a policy base — sharpen `ConfidentLiesDetector` on your real policies
- Write a custom detector — add your own failure modes