Groundedness Testing
Every AI response must be grounded in source material. We evaluate the following (a minimal automated check is sketched after the list):
- Response Accuracy: Does the response accurately reflect the source documents?
- Citation Quality: Are citations relevant and accurate?
- Hallucination Detection: Are claims unsupported by the sources automatically flagged?
- Confidence Scoring: Is a confidence level attached to each response component?
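The sketch below shows one way such a grounding check could work, assuming the response can be split into sentences and compared against retrieved source passages. The function names (`support_score`, `grounded_report`), the threshold, and the lexical-overlap scoring are illustrative assumptions; in practice an NLI model or LLM judge would replace the overlap score.

```python
# Hypothetical groundedness check: score each response sentence for support
# against the source passages via lexical overlap, then flag weakly supported
# sentences as hallucination candidates.
import re

def support_score(sentence: str, passage: str) -> float:
    """Fraction of the sentence's words that also appear in the passage."""
    sent_words = set(re.findall(r"[a-z0-9']+", sentence.lower()))
    passage_words = set(re.findall(r"[a-z0-9']+", passage.lower()))
    if not sent_words:
        return 0.0
    return len(sent_words & passage_words) / len(sent_words)

def grounded_report(response: str, sources: list[str], threshold: float = 0.6) -> dict:
    """Flag sentences whose best support score falls below the threshold."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", response) if s.strip()]
    scores, unsupported = [], []
    for sentence in sentences:
        best = max((support_score(sentence, src) for src in sources), default=0.0)
        scores.append(best)
        if best < threshold:
            unsupported.append(sentence)  # candidate hallucination for review
    return {
        "sentence_scores": scores,            # crude per-sentence confidence signal
        "unsupported_sentences": unsupported,
        "grounded": not unsupported,
    }
```

Sentences below the threshold become hallucination candidates for human review, and the per-sentence scores double as a rough confidence signal per response component.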
Citation Accuracy Measurement
Citations are only valuable if they're accurate. We measure the following; a scoring sketch follows the list:
- Citation Relevance: Do citations support the claims made?
- Citation Completeness: Are all claims properly cited?
- Source Verification: Can citations be verified against source documents?
- Citation Precision: Are citations specific enough to be useful?
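One possible way to compute these citation metrics is sketched below, under the assumption that each claim arrives as a dictionary with a `text` field and a `citations` list of passage IDs, and that `corpus` maps those IDs to source text. The structure, the `citation_metrics` name, and the overlap-based relevance test are assumptions, not a fixed API.

```python
# Hypothetical citation metrics: completeness (share of claims with citations),
# verifiability (cited IDs resolve to real sources), and relevance (the cited
# passage lexically supports the claim). A judge model would replace the
# overlap heuristic in a production evaluator.
import re

def overlap(a: str, b: str) -> float:
    wa = set(re.findall(r"[a-z0-9']+", a.lower()))
    wb = set(re.findall(r"[a-z0-9']+", b.lower()))
    return len(wa & wb) / len(wa) if wa else 0.0

def citation_metrics(claims: list[dict], corpus: dict[str, str], min_relevance: float = 0.5) -> dict:
    """claims: [{"text": ..., "citations": ["doc-3", ...]}, ...] (assumed shape)."""
    cited_claims = [c for c in claims if c["citations"]]
    verifiable, relevant = 0, 0
    for claim in cited_claims:
        passages = [corpus[cid] for cid in claim["citations"] if cid in corpus]
        if len(passages) == len(claim["citations"]):
            verifiable += 1                     # every cited ID resolves to a source
        if any(overlap(claim["text"], p) >= min_relevance for p in passages):
            relevant += 1                       # at least one citation supports the claim
    total = len(claims) or 1
    cited = len(cited_claims) or 1
    return {
        "completeness": len(cited_claims) / total,
        "verifiability": verifiable / cited,
        "relevance": relevant / cited,
    }
```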
Refusal Behavior Handling
AI systems must know when to refuse requests. We evaluate the following (see the sketch after this list):
- Appropriate Refusal: Does the system refuse out-of-scope or unsafe requests?
- Refusal Clarity: Are refusal messages clear and helpful?
- False Refusals: Does the system avoid refusing valid, in-scope requests?
- Escalation Paths: Are there clear paths for users to escalate refused requests?
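A minimal sketch of how refusal behavior might be scored over a labeled test set follows, assuming each case records whether the prompt should be refused and the model's reply. The phrase-matching `is_refusal` classifier and the `refusal_metrics` name are placeholders; a trained classifier or LLM judge is the realistic choice for detecting refusals.

```python
# Hypothetical refusal evaluation: compare the model's refusal decisions against
# labeled expectations to get appropriate- and false-refusal rates.
REFUSAL_MARKERS = ("i can't help with", "i cannot assist", "outside the scope", "i'm not able to")

def is_refusal(reply: str) -> bool:
    """Crude stand-in for a refusal classifier."""
    reply_lower = reply.lower()
    return any(marker in reply_lower for marker in REFUSAL_MARKERS)

def refusal_metrics(cases: list[dict]) -> dict:
    """cases: [{"should_refuse": bool, "reply": str}, ...] (assumed shape)."""
    should = [c for c in cases if c["should_refuse"]]
    should_not = [c for c in cases if not c["should_refuse"]]
    appropriate = sum(is_refusal(c["reply"]) for c in should)
    false_refusals = sum(is_refusal(c["reply"]) for c in should_not)
    return {
        "appropriate_refusal_rate": appropriate / (len(should) or 1),
        "false_refusal_rate": false_refusals / (len(should_not) or 1),
    }
```

Refusal clarity and escalation paths are harder to score automatically and are typically checked by rubric-based human review of the flagged replies.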