A fast-growing e-commerce startup uses PostgreSQL as their production database. The analytics team runs daily revenue reports that take 45 minutes and frequently slow down the application during peak hours.
(1) Explain why running analytics queries on the production PostgreSQL database causes these problems. Be specific about the database internals — why does an aggregation query slow down OLTP traffic?
(2) Propose a solution. What system would you add, how would data get there, and how would the architecture change? Pay particular attention to data freshness expectations.
(3) Why would a columnar database (Snowflake, BigQuery) handle the revenue report significantly faster than PostgreSQL on the same underlying data? Explain in terms of storage layout, IO, and parallel execution.
(4) The CFO asks if you should just buy a bigger PostgreSQL instance. Why is this the wrong solution at this scale?