Anthropic
RAG evaluation metrics masking real-world answer quality failures
Medium · Software engineer
Pro problem
Company-verified problems, including their full description, IDE, scoring, and golden answer, are available on Velocode Pro:
- Full problem description and context
- Browser IDE to attempt and submit solutions
- Production scoring across 6 dimensions
- Score comparison against a golden answer selected by a 3-LLM tournament
- Senior engineer walkthrough
- 110+ problems from 50+ AI companies