“YOU CAN BUILD AI”

Building a Successful Modern Data Analytics Platform in the Cloud

ML-Guy
12 min read · Oct 20, 2019

I have worked with dozens of companies migrating their legacy data warehouses or analytical databases to the cloud. I saw how difficult it is to let go of monolithic thinking and design, and to benefit fully from a modern cloud architecture. In this article, I’ll share my pattern for a scalable, flexible, and cost-effective data analytics platform in the AWS cloud, which was successfully implemented in these companies.

TL;DR: design the data platform with three layers in mind: L1 with raw file data, L2 with optimized file data, and L3 with a cache. Ingest the data as it comes into L1, transform it for each use case independently into L2, and, when a specific access pattern demands it, cache some of the data in a dedicated data store.

The three layers of a modern data analytics platform
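To make the three layers concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the implementation used in the companies mentioned above: local directories stand in for cloud file storage, an in-memory dictionary stands in for the dedicated L3 data store, and the order records, the "daily revenue" use case, and all function names are hypothetical.

```python
import csv
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def ingest_raw(l1_dir, batch_name, records):
    """L1: land records exactly as they arrive, one JSON object per line."""
    l1_dir.mkdir(parents=True, exist_ok=True)
    path = l1_dir / f"{batch_name}.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return path


def build_daily_revenue(l1_dir, l2_dir):
    """L2: transform the raw files for ONE use case, partitioned by day."""
    totals = defaultdict(float)
    for raw_file in sorted(l1_dir.glob("*.jsonl")):
        with raw_file.open(encoding="utf-8") as f:
            for line in f:
                order = json.loads(line)
                totals[(order["day"], order["country"])] += order["amount"]

    # Group the aggregated rows by day so each day gets its own partition.
    by_day = defaultdict(list)
    for (day, country), amount in sorted(totals.items()):
        by_day[day].append((country, amount))

    written = []
    for day, rows in by_day.items():
        partition = l2_dir / "daily_revenue" / f"day={day}"
        partition.mkdir(parents=True, exist_ok=True)
        out = partition / "part-0.csv"
        with out.open("w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["country", "revenue"])
            writer.writerows(rows)
        written.append(out)
    return written


def load_cache(l2_dir, day):
    """L3: cache one day's partition in memory for fast point lookups."""
    path = l2_dir / "daily_revenue" / f"day={day}" / "part-0.csv"
    with path.open(newline="", encoding="utf-8") as f:
        return {row["country"]: float(row["revenue"]) for row in csv.DictReader(f)}


def demo():
    """Run all three layers end to end in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        l1, l2 = root / "l1", root / "l2"
        # Hypothetical sample data: two batches arriving independently.
        ingest_raw(l1, "orders-0001", [
            {"day": "2019-10-20", "country": "US", "amount": 10.0},
            {"day": "2019-10-20", "country": "DE", "amount": 7.5},
        ])
        ingest_raw(l1, "orders-0002", [
            {"day": "2019-10-20", "country": "US", "amount": 5.5},
            {"day": "2019-10-21", "country": "US", "amount": 2.0},
        ])
        build_daily_revenue(l1, l2)
        return load_cache(l2, "2019-10-20")
```

Calling demo() returns the cached view for one day, {"DE": 7.5, "US": 15.5}. The point of the split is that L1 never changes shape when a new use case appears: a second use case would get its own transform and its own L2 directory, and only the access patterns that need low latency would get an L3 cache.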

Mistake 1: “One Data Store To Rule Them All”

The main difficulty that companies face when modernizing their existing data analytics platform is giving up the single database that was used in their legacy system. It is hard to give up on it after the massive investment in building and operating it. I met companies that spent millions of dollars and hundreds of person-years of development to build their data warehouse and the many ETL processes, stored procedures, and reporting…

ML-Guy

Guy Ernest is the co-founder and CTO of @aiOla, a promising AI startup that closes the loop between knowledge, people & systems. He is also an AWS ML Hero.