Sample assessment

One question. One submission. The full scoring report.

Below is exactly what a candidate sees, what they submit, and what you receive — for the same question used in our candidate-experience demo. No code is executed; everything is pre-recorded so you can read the rubric without a sales call.

Candidate's view

AI Engineering · Hard · Submitted at 38:24

Design a production RAG pipeline with citation tracking

Sub-500ms p95 latency. 2M legal documents. One citation per factual claim. Resolve conflicting retrieved facts. Walk through chunking, retrieval, and prompt assembly.

solution.py · Python

def build_rag_pipeline(corpus, config):
    """Production RAG pipeline with citation tracking."""

    # Semantic chunking — paragraph-boundary preserves legal citations
    chunks = semantic_chunk(corpus, strategy="paragraph_boundary")

    # Hybrid index: BM25 catches exact citations, dense catches semantics
    index = HybridIndex(bm25=True, dense=True)
    index.build(chunks)

    def retrieve_and_stuff(query: str, budget: int = 4000):
        # Reserve tokens for system prompt + answer
        retrieval_budget = budget - 800

        results = index.search(query, k=20)
        results = rerank(results, query)

        # Greedy-pack at sentence level (not chunk level)
        return greedy_pack_sentences(results, retrieval_budget)

    return retrieve_and_stuff
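The submission calls greedy_pack_sentences without showing its body. As a reading aid only, and not part of the candidate's code, here is a minimal sketch of what sentence-level greedy packing could look like. It assumes the reranked results are plain strings ordered best-first, and it uses a hypothetical whitespace word count (count_tokens) in place of a real tokenizer.

```python
import re
from typing import List


def count_tokens(text: str) -> int:
    """Hypothetical token estimate: whitespace-separated words."""
    return len(text.split())


def greedy_pack_sentences(passages: List[str], budget: int) -> List[str]:
    """Pack sentences from best-first passages until the budget is spent."""
    packed: List[str] = []
    used = 0
    for passage in passages:
        # Naive split on ., ! or ? followed by whitespace (assumption).
        for sentence in re.split(r"(?<=[.!?])\s+", passage.strip()):
            if not sentence:
                continue
            cost = count_tokens(sentence)
            if used + cost > budget:
                # Skip this sentence; a shorter one later may still fit.
                continue
            packed.append(sentence)
            used += cost
    return packed


passages = [
    "Alpha beta gamma. Delta epsilon zeta eta theta.",
    "Iota kappa.",
]
print(greedy_pack_sentences(passages, budget=5))
```

Packing per sentence lets a short sentence from a lower-ranked passage use budget that a whole chunk would have overflowed. The naive splitter would break on legal abbreviations such as "v." or "U.S.", so a real pipeline would likely need a citation-aware sentence segmenter.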

Tab switches: 0

Paste events: 1

AI tool: Claude (allowed)

Total time: 38m 24s

Recruiter's view

Composite 87/100 · Advancing

Priya L.

87/100

Correctness

92 vs golden 95

All edge cases handled correctly. Conflict-resolution logic is present and tested against multi-source overlap.

Architecture

85 vs golden 90

Clean separation of indexing and retrieval; service boundaries are sensible. Hybrid index choice is correct for legal corpus.
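The report does not show how the hybrid index merges its BM25 and dense results. One common way to combine two ranked lists is reciprocal rank fusion (RRF), sketched here as an assumption rather than as the submission's or the platform's actual method. The function name, the document IDs, and the constant k=60 are illustrative.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several best-first rankings of document IDs into one list."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for the documents it returns.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["a", "b", "c"]    # hypothetical exact-match ranking
dense_hits = ["b", "c", "d"]   # hypothetical embedding ranking
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
```

Documents that appear in both lists rise to the top, which is the behaviour a hybrid index wants when an exact citation match and a semantic match agree.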

Token efficiency

88 vs golden 92

Reserves output budget. Sentence-level packing avoids wasteful chunk boundaries. Slight over-reserve on system prompt.

Security

82 vs golden 88

Sanitization on retrieved sources before prompt assembly. One missing auth check on the retriever endpoint.

Gap analysis: Strongest on correctness, which also has the smallest margin to golden (−3); the widest margins are security (−6) and architecture (−5). Recommend deeper probe on service-boundary design in the follow-up round.
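The per-dimension margins can be recomputed from the four scores in the report. The short sketch below does that arithmetic. It also computes a composite as an unweighted mean, which is an assumption: the page does not state how the composite is derived, although the mean does round to the reported 87.

```python
# (candidate score, golden score) per dimension, copied from the report.
scores = {
    "Correctness": (92, 95),
    "Architecture": (85, 90),
    "Token efficiency": (88, 92),
    "Security": (82, 88),
}

# Negative gap means the candidate scored below the golden solution.
gaps = {name: cand - gold for name, (cand, gold) in scores.items()}

smallest_gap = min(gaps, key=lambda name: abs(gaps[name]))
largest_gap = max(gaps, key=lambda name: abs(gaps[name]))

# Assumption: composite is the unweighted mean of the four dimensions.
composite = round(sum(cand for cand, _ in scores.values()) / len(scores))

print(gaps)
print(smallest_gap, largest_gap, composite)
```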


Real assessments include 3-6 problems custom-built from your job description, full anti-cheat trail, and a senior engineer walkthrough for every dimension.