“YOU CAN BUILD AI”
Building a Successful Modern Data Analytics Platform in the Cloud
I have worked with dozens of companies migrating their legacy data warehouses or analytical databases to the cloud. I saw how hard it is to let go of monolithic thinking and design, and to benefit fully from modern cloud architecture. In this article, I’ll share my pattern for a scalable, flexible, and cost-effective data analytics platform in the AWS cloud, which has been implemented successfully in these companies.
TL;DR: design the data platform in three layers: L1 for raw data files, L2 for optimized data files, and L3 for cached data. Ingest the data as it arrives into L1, transform it into L2 independently for each use case, and, when a specific access pattern demands it, cache some of the data in a dedicated data store.
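To make the three layers concrete, here is a minimal sketch of the flow in Python. It is an illustration under stated assumptions, not the implementation used at any of these companies: local directories stand in for cloud object storage, JSON Lines files partitioned by a column stand in for whatever optimized file format you choose, and a plain dictionary stands in for the dedicated data store. All function names, paths, and the sample orders data are hypothetical.

```python
import csv
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def ingest_to_l1(l1_dir: Path, name: str, raw: bytes) -> Path:
    """L1: store the incoming file exactly as it arrived, with no parsing."""
    l1_dir.mkdir(parents=True, exist_ok=True)
    target = l1_dir / name
    target.write_bytes(raw)
    return target


def transform_to_l2(l1_file: Path, l2_dir: Path, partition_key: str) -> list[Path]:
    """L2: rewrite one raw file for one use case, partitioned by a column."""
    partitions = defaultdict(list)
    with l1_file.open(newline="") as handle:
        for row in csv.DictReader(handle):
            partitions[row[partition_key]].append(row)

    written = []
    for value, rows in sorted(partitions.items()):
        part_dir = l2_dir / f"{partition_key}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        out = part_dir / "part-0000.jsonl"
        # Assumption: JSON Lines is only a stand-in for an optimized format.
        out.write_text("".join(json.dumps(row) + "\n" for row in rows))
        written.append(out)
    return written


def cache_to_l3(l2_files: list[Path], key_field: str) -> dict[str, dict]:
    """L3: load only the slice one access pattern needs into a fast lookup."""
    cache = {}
    for path in l2_files:
        for line in path.read_text().splitlines():
            record = json.loads(line)
            cache[record[key_field]] = record
    return cache


def run_demo() -> dict:
    # Hypothetical sample data: three orders over two days.
    raw = b"order_id,day,amount\n1,2024-01-01,10\n2,2024-01-01,5\n3,2024-01-02,7\n"
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        l1_file = ingest_to_l1(root / "l1" / "orders", "orders.csv", raw)
        l2_files = transform_to_l2(l1_file, root / "l2" / "orders_by_day", "day")
        # Cache only the most recent partition, as one access pattern might need.
        cache = cache_to_l3(l2_files[-1:], "order_id")
        return {
            "l1_unchanged": l1_file.read_bytes() == raw,
            "l2_partitions": [path.parent.name for path in l2_files],
            "l3_cache": cache,
        }


if __name__ == "__main__":
    print(run_demo())
```

The point of the sketch is the separation of concerns: L1 never changes the data, each L2 output belongs to a single use case and can be rebuilt from L1 at any time, and L3 holds only the subset that a particular access pattern needs.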
Mistake 1: “One Data Store To Rule Them All”
The main difficulty that companies face when modernizing their existing data analytics platform is giving up the single database that was used in their legacy system. It is hard to give it up after the massive investment in building and operating it. I met companies that spent millions of dollars and hundreds of person-years of development to build their data warehouse and the many ETL processes, stored procedures, and reporting…