# Configure a policy base
Without a PolicyBase, ConfidentLiesDetector flags every commitment pattern (refund promises, timelines, guarantees) as a candidate for review — useful as a sentinel but noisy. This tutorial shows how to turn the noise into signal.
## The problem
Suppose a customer asks for a refund and the bot responds:
"I'll process your refund within 3 business days."
Is that a real commitment? It depends. If your policy says "refunds within 5 business days," the bot just over-promised by 2 days — a lie. If your policy says "within 3 business days for damaged goods," and the context is damaged goods, it's fine.
chatbot-auditor can't know which unless you tell it.
## Building the first PolicyBase
Start with the 10 commitments your bot should be making, and the 10 it should never be making. You can iterate — this isn't a forever document.
```python
from chatbot_auditor import PolicyBase

policy = PolicyBase.from_iterables(
    allowed=[
        # Time windows that match policy
        "within 30 days",
        "within 5 business days",
        "within 3 business days",
        # Specific allowed commitments
        "full refund for damaged goods",
        "free shipping on orders over $50",
        "15% off for verified students",
        # Specific actions we allow
        "i've cancelled your subscription",
        "you will receive a confirmation email",
    ],
    disallowed=[
        "lifetime warranty",
        "money-back guarantee",
        "price match",
        "same-day delivery",
        "free of charge",
        "i can refund that immediately",
    ],
)
```
## Wire it in
```python
from chatbot_auditor import (
    ConfidentLiesDetector,
    audit,
    default_registry,
)

registry = default_registry()
registry.register(ConfidentLiesDetector(policy=policy))
detections = audit(conversations, detectors=registry)
```
## How detection changes
Without a policy, every commitment becomes a medium-severity "review recommended" flag. With a policy:
| Bot says | Outcome |
|---|---|
| "I'll process your refund within 3 business days." (allowed) | no detection |
| "You'll get a full refund for damaged goods." (allowed) | no detection |
| "We have a lifetime warranty." (disallowed) | critical |
| "I'll cancel your plan and refund you." (neither) | high (unverifiable) |
## Iterating
After a first audit with your initial policy, review the unverifiable (high-severity) detections. For each recurring pattern, decide:
- Is it allowed? Add the phrase to `allowed_commitments`.
- Is it forbidden? Add the topic to `disallowed_commitment_topics`.
- Is it situational? Leave it unverifiable — human review is the right answer.
Rerun the audit. The unverifiable count should drop with each iteration until only genuine edge cases remain.
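One way to keep this loop mechanical is to record each ruling and fold it back into the policy. The `triage` dict and its ruling labels below are illustrative bookkeeping, not part of the chatbot-auditor API — only `PolicyBase.from_iterables` comes from the snippets above:

```python
# Rulings from reviewing the recurring unverifiable detections.
# Keys are recurring phrases; values are your decisions.
triage = {
    "i'll cancel your plan and refund you": "situational",
    "free returns within 14 days": "allowed",
    "double your money back": "forbidden",
}

new_allowed = [p for p, ruling in triage.items() if ruling == "allowed"]
new_disallowed = [p for p, ruling in triage.items() if ruling == "forbidden"]
# "situational" phrases are deliberately dropped: they stay unverifiable.

# Then rebuild the policy with the additions, e.g.:
# policy = PolicyBase.from_iterables(
#     allowed=[*old_allowed, *new_allowed],
#     disallowed=[*old_disallowed, *new_disallowed],
# )
```

Keeping the triage file in version control gives you a reviewable history of why each phrase ended up on either list.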
## Generating policy from existing docs

If you have a help center or refund policy page, you can bootstrap a `PolicyBase` by extracting the commitment-like sentences:
```python
import re

import httpx
from chatbot_auditor import PolicyBase


def bootstrap_policy_from_url(url: str) -> PolicyBase:
    html = httpx.get(url).text
    # Very rough: pick sentences containing commitment verbs.
    # The group must be non-capturing (?:...), otherwise findall
    # returns only the matched verb instead of the whole sentence.
    sentences = re.findall(
        r"[^.!?\n]*\b(?:refund|within|guarantee|cancel|ship|deliver)\b[^.!?\n]*[.!?]",
        html,
        flags=re.IGNORECASE,
    )
    cleaned = [s.strip().lower() for s in sentences if len(s) < 200]
    return PolicyBase.from_iterables(allowed=cleaned)
```
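Before pointing this at a live page, it can help to sanity-check the extraction regex on an inline snippet. The sample HTML here is invented, and note that because the regex runs over raw markup, tag fragments can leak into matches — part of why this is only a bootstrap:

```python
import re

# Invented sample; a real page would come from httpx.get(url).text.
sample = (
    "<p>We offer a full refund within 30 days of purchase. "
    "Orders ship within 2 business days.</p>"
)
pattern = r"[^.!?\n]*\b(?:refund|within|guarantee|cancel|ship|deliver)\b[^.!?\n]*[.!?]"
sentences = [
    s.strip().lower() for s in re.findall(pattern, sample, flags=re.IGNORECASE)
]
print(sentences)  # the first entry still carries the leading "<p>" tag
```

Stripping tags first (or parsing with an HTML library) would clean this up, which is exactly the kind of refinement the LLM-based extraction below replaces.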
For production, use an LLM to extract clean policy statements from your docs. Swap that extraction into the same PolicyBase shape — the detector doesn't care how the strings were obtained.
## The same pattern for FactBase

`FactBase` follows the same shape for `ConfidentMisinformationDetector`:
```python
from chatbot_auditor import ConfidentMisinformationDetector, FactBase

facts = FactBase(
    facts={
        "store_hours": "9 AM to 6 PM on weekdays",
        "support_email": "support@example.com",
        "premium_price": "$29.99 per month",
    },
    aliases={
        "store_hours": ("opening hours", "hours", "when you open"),
    },
)

registry.register(ConfidentMisinformationDetector(facts=facts))
```
See Knowledge bases for how FactBase contradictions are detected.
## Related
- Knowledge bases — conceptual overview
- Detector reference — full API