Hard | ML Engineering | System Design

How do you monitor performance drift and hallucinations in production LLMs?

Your production LLM application serves 1M requests per day. Over time, output quality degrades: some of the degradation comes from model drift, some from data distribution shift, and some from hallucinations. Design a monitoring system. Your answer should cover: which signals indicate quality degradation without human evaluation of every output; how to detect hallucinations at scale (factuality checks, citation validation, consistency checks); how to set alerting thresholds that catch real regressions without causing alert fatigue; and how to distinguish model degradation from prompt drift and from upstream data changes.
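The alerting-threshold requirement can be made concrete with a small sketch. It assumes a hypothetical proxy metric computed once per time window (for example, the fraction of responses whose citations fail an automated validation check). The class raises an alert only when that metric stays more than `k` standard deviations away from a rolling baseline for `patience` consecutive windows. The class name, the parameter defaults, and the choice of a z-score rule are illustrative assumptions, not part of the problem statement.

```python
import statistics
from collections import deque


class SustainedDeviationAlert:
    """Hypothetical alert rule: fire only when a per-window metric stays
    beyond k standard deviations of a rolling baseline for `patience`
    consecutive windows. A single noisy window never pages anyone."""

    def __init__(self, baseline_size: int = 14, k: float = 3.0, patience: int = 3):
        if baseline_size < 2:
            raise ValueError("baseline_size must be >= 2 to compute a stdev")
        self.baseline = deque(maxlen=baseline_size)  # recent healthy windows
        self.k = k
        self.patience = patience
        self.breaches = 0  # consecutive out-of-band windows seen so far

    def observe(self, value: float) -> bool:
        """Record one window's metric; return True while an alert is active."""
        # Warm-up: collect a full baseline before judging anything.
        if len(self.baseline) < self.baseline.maxlen:
            self.baseline.append(value)
            return False

        mean = statistics.fmean(self.baseline)
        std = statistics.stdev(self.baseline)
        # Floor the spread so a perfectly flat baseline does not divide the
        # world into "exactly equal" and "anomalous".
        threshold = self.k * max(std, 1e-9)

        if abs(value - mean) > threshold:
            self.breaches += 1
        else:
            self.breaches = 0
            # Only healthy windows update the baseline, so a sustained
            # regression is not silently absorbed as the new normal.
            self.baseline.append(value)

        return self.breaches >= self.patience


if __name__ == "__main__":
    alert = SustainedDeviationAlert(baseline_size=5, k=3.0, patience=3)
    for rate in [0.02, 0.021, 0.019, 0.02, 0.022, 0.05, 0.02, 0.05, 0.05, 0.05]:
        print(rate, alert.observe(rate))
```

In the usage run, the isolated spike to 0.05 does not fire, while three consecutive windows at 0.05 do. The `patience` parameter trades detection latency for fewer false pages, and in practice `k`, `patience`, and the window length would have to be tuned against historical incidents.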
