This is a sample. Real assessments are custom-built from your job description.

One question. One submission. The full scoring report.

Below is exactly what a candidate sees, what they submit, and what you receive — for the same question used in our candidate-experience demo. No code is executed; everything is pre-recorded so you can read the rubric without a sales call.

Candidate's view
AI Engineering | Hard | Submitted at 38:24

Design a production RAG pipeline with citation tracking

Sub-500ms p95 latency. 2M legal documents. One citation per factual claim. Resolve conflicting retrieved facts. Walk through chunking, retrieval, and prompt assembly.

solution.py (Python)
def build_rag_pipeline(corpus, config):
    """Production RAG pipeline with citation tracking."""

    # Semantic chunking — paragraph-boundary preserves legal citations
    chunks = semantic_chunk(corpus, strategy="paragraph_boundary")

    # Hybrid index: BM25 catches exact citations, dense catches semantics
    index = HybridIndex(bm25=True, dense=True)
    index.build(chunks)

    def retrieve_and_stuff(query: str, budget: int = 4000):
        # Reserve tokens for system prompt + answer
        retrieval_budget = budget - 800

        results = index.search(query, k=20)
        results = rerank(results, query)

        # Greedy-pack at sentence level (not chunk level)
        return greedy_pack_sentences(results, retrieval_budget)

    return retrieve_and_stuff
Tab switches: 0
Paste events: 1
AI tool: Claude (allowed)
Total time: 38m 24s
Recruiter's view
Composite 87/100 | Advancing
Priya L.
Correctness
92 vs golden 95
All edge cases handled correctly. Conflict-resolution logic is present and tested against multi-source overlap.
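To make "conflict resolution against multi-source overlap" concrete, here is a minimal sketch of one possible policy. It is not the candidate's code: the `Fact` type, the `claim_key` field, and the "most recent source wins" rule are all assumptions made for illustration, and the sample data is invented. A real legal corpus would likely also weigh court hierarchy and jurisdiction.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    claim_key: str  # hypothetical normalized ID for what the fact is about
    value: str
    source_id: str
    decided: date


def resolve_conflicts(facts):
    """Keep one value per claim; the most recent source wins (assumed policy)."""
    by_claim = {}
    for fact in facts:
        by_claim.setdefault(fact.claim_key, []).append(fact)

    resolved = {}
    for key, group in by_claim.items():
        ordered = sorted(group, key=lambda f: f.decided, reverse=True)
        winner = ordered[0]
        resolved[key] = {
            "value": winner.value,
            "citation": winner.source_id,
            # Sources that disagree with the winner are kept for the audit trail
            "superseded": [f.source_id for f in ordered[1:] if f.value != winner.value],
        }
    return resolved


# Invented example data: two sources disagree on the same claim
facts = [
    Fact("notice_period", "30 days", "doc_a", date(2012, 3, 1)),
    Fact("notice_period", "60 days", "doc_b", date(2020, 9, 15)),
]
resolved = resolve_conflicts(facts)
```

Each resolved claim carries exactly one citation, which lines up with the "one citation per factual claim" requirement in the question.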
Architecture
85 vs golden 90
Clean separation of indexing and retrieval; service boundaries are sensible. Hybrid index choice is correct for legal corpus.
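The submission does not show how `HybridIndex` combines BM25 and dense results. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as what the candidate wrote. The function name, the `k=60` constant, and the document IDs are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); a larger k flattens rank differences
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic
    return sorted(scores, key=lambda d: (-scores[d], d))


# Illustrative rankings: BM25 favors an exact citation match, dense favors a paraphrase
bm25_hits = ["doc_7", "doc_2", "doc_9"]
dense_hits = ["doc_2", "doc_5", "doc_7"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Documents that appear in both lists (`doc_2`, `doc_7`) rise to the top, which is the behavior a hybrid index is meant to provide for queries that mix exact citations with semantic intent.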
Token efficiency
88 vs golden 92
Reserves output budget. Sentence-level packing avoids wasteful chunk boundaries. Slight over-reserve on system prompt.
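The helper `greedy_pack_sentences` is called in the submission but not defined there. A minimal sketch of what sentence-level greedy packing could look like follows. The whitespace token count and the regex sentence splitter are simplifying assumptions; legal text with abbreviations such as "v." would need a proper tokenizer and sentence segmenter.

```python
import re


def count_tokens(text):
    # Assumption: whitespace-separated words stand in for model tokens
    return len(text.split())


def greedy_pack_sentences(results, budget):
    """Pack sentences from ranked passages until the token budget is spent.

    `results` is a list of (source_id, passage_text) pairs, best first.
    Returns (source_id, sentence) pairs so every sentence stays citable.
    """
    packed, used = [], 0
    for source_id, passage in results:
        # Naive split on sentence-ending punctuation followed by whitespace
        for sentence in re.split(r"(?<=[.!?])\s+", passage.strip()):
            cost = count_tokens(sentence)
            if cost == 0:
                continue
            if used + cost > budget:
                continue  # too long to fit; a shorter sentence later still may
            packed.append((source_id, sentence))
            used += cost
    return packed


# Invented passages; the budget of 12 is tiny so the skipping behavior is visible
results = [
    ("case_12", "The lease ended in 2019. The tenant stayed on."),
    ("case_40", "A holdover tenancy arises by conduct of both parties involved. Rent was paid."),
]
packed = greedy_pack_sentences(results, budget=12)
```

Packing whole chunks would have dropped `case_40` entirely here. Packing by sentence still fits its short final sentence into the remaining budget.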
Security
82 vs golden 88
Sanitization on retrieved sources before prompt assembly. One missing auth check on the retriever endpoint.
Gap analysis: Strongest on correctness, which also has the smallest gap to golden (−3). Architecture trails golden by 5 points and security by 6. Recommend deeper probe on service-boundary design in the follow-up round.
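The per-dimension gaps can be recomputed directly from the scores in the report. The sketch below does that. The composite formula is an assumption: the report does not state how the composite is weighted, but an unweighted mean of the four dimensions rounds to the 87 shown.

```python
# (candidate score, golden score) per dimension, copied from the report above
scores = {
    "Correctness": (92, 95),
    "Architecture": (85, 90),
    "Token efficiency": (88, 92),
    "Security": (82, 88),
}

# Negative gap means the candidate scored below the golden solution
gaps = {dim: cand - gold for dim, (cand, gold) in scores.items()}

closest_to_golden = max(gaps, key=gaps.get)
furthest_from_golden = min(gaps, key=gaps.get)

# Assumed formula: unweighted mean of the four dimension scores
composite = round(sum(cand for cand, _ in scores.values()) / len(scores))
```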

Real assessments include 3-6 problems custom-built from your job description, full anti-cheat trail, and a senior engineer walkthrough for every dimension.