A company has three data sources:
- A Postgres production database (orders, customers)
- A Stripe payments API (charges, refunds)
- A Salesforce CRM (leads, opportunities)
The company wants analysts to join data from all three sources in a single SQL query, for example: "show me revenue by acquisition channel, broken down by lead source from Salesforce."
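To make the target concrete, here is a minimal sketch of the kind of query analysts would run once all three sources land in one place. Everything in it is an assumption for illustration: the table and column names are hypothetical, the join key between customers and Salesforce leads is assumed to be email, and an in-memory SQLite database stands in for whatever warehouse is actually chosen.

```python
import sqlite3

# Hypothetical schema: the scenario does not specify tables or columns.
# customers/orders mimic Postgres, charges/refunds mimic Stripe, leads mimics Salesforce.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, acquisition_channel TEXT);
CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE charges   (id TEXT PRIMARY KEY, order_id INTEGER, amount_cents INTEGER);
CREATE TABLE refunds   (charge_id TEXT, amount_cents INTEGER);
CREATE TABLE leads     (email TEXT, lead_source TEXT);
"""

# Net revenue (charges minus refunds) grouped by channel and lead source.
# Refunds are pre-aggregated per charge so the join cannot duplicate charge rows.
# Assumes at most one lead per email; duplicates would inflate the totals.
QUERY = """
SELECT c.acquisition_channel,
       l.lead_source,
       SUM(ch.amount_cents - COALESCE(r.refunded_cents, 0)) / 100.0 AS net_revenue
FROM charges ch
JOIN orders o    ON o.id = ch.order_id
JOIN customers c ON c.id = o.customer_id
LEFT JOIN leads l ON l.email = c.email
LEFT JOIN (
    SELECT charge_id, SUM(amount_cents) AS refunded_cents
    FROM refunds
    GROUP BY charge_id
) r ON r.charge_id = ch.id
GROUP BY c.acquisition_channel, l.lead_source
ORDER BY c.acquisition_channel, l.lead_source
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "a@example.com", "paid_search"), (2, "b@example.com", "organic")],
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 2), (12, 1)])
    conn.executemany(
        "INSERT INTO charges VALUES (?, ?, ?)",
        [("ch_1", 10, 5000), ("ch_2", 11, 3000), ("ch_3", 12, 2000)],
    )
    conn.executemany("INSERT INTO refunds VALUES (?, ?)", [("ch_3", 500)])
    conn.executemany(
        "INSERT INTO leads VALUES (?, ?)",
        [("a@example.com", "Webinar"), ("b@example.com", "Referral")],
    )
    return conn


def revenue_by_channel_and_lead_source(conn):
    """Run the cross-source query and return (channel, lead_source, net_revenue) rows."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in revenue_by_channel_and_lead_source(build_demo_db()):
        print(row)
```

The query itself is the easy part. It only works if the three sources have been landed side by side with a reliable key linking a Salesforce lead to a Postgres customer and a Stripe charge to an order, which is the problem the questions below are about.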
Describe the role of a data engineer in making this possible:
- What systems would you build or configure for ingestion, storage, and transformation?
- What tools would you use, and what tradeoffs guide those choices?
- Once the pipelines are running, what ongoing responsibilities would you own — what breaks, and how do you know?
- What is the single most likely production failure mode in the first 6 months, and how would you design the system to surface it quickly?