A data engineer focused on building scalable distributed data systems.
Designed, developed, and implemented high-volume event pipelines and analytics platforms processing hundreds of terabytes of data daily.
Python, Spark, AWS, Airflow
Customers served: Advertising, IoT, Telecom
Ad Clickstream Analytics - Data Lakehouse
Large-Scale Data Processing - ETL Pipeline Design
Enhanced Quality - Data Observability & Reliability
Scaling and Optimization - Distributed Systems
Petabytes of data processed by event-driven systems
~100 TB of data ingested, transformed, and analyzed daily
90% fewer customer-reported data completeness issues
Resource usage and processing times reduced by 35%
Event Sources
Data Stream Ingestion
Raw Micro-Batch Processing
Transformations
Aggregations
Analytics & APIs
Athena + API
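The pipeline stages above can be sketched end to end as a minimal micro-batch flow. This is a plain-Python stand-in for the Spark jobs; the event schema (`campaign`, `clicks`) and the aggregation key are illustrative assumptions, not the production contract:

```python
from collections import defaultdict

def transform(event):
    # Normalize a raw clickstream event (hypothetical schema):
    # lowercase the campaign name, default missing click counts to 1.
    return {
        "campaign": event["campaign"].lower(),
        "clicks": int(event.get("clicks", 1)),
    }

def aggregate(micro_batch):
    # Roll up clicks per campaign for the analytics / query layer.
    totals = defaultdict(int)
    for event in micro_batch:
        row = transform(event)
        totals[row["campaign"]] += row["clicks"]
    return dict(totals)

# One ingested micro-batch of raw events.
batch = [
    {"campaign": "Summer", "clicks": 2},
    {"campaign": "summer"},
    {"campaign": "winter", "clicks": 5},
]
print(aggregate(batch))  # {'summer': 3, 'winter': 5}
```

In the real platform the transform and aggregate steps would run as distributed Spark jobs, with the aggregated tables exposed through Athena and the API layer.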
Platform for Completeness, Reliability
Monitoring - Metrics, alerts, dashboards
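The core completeness metric behind such monitoring can be sketched as follows; the 99% threshold and the alert payload shape are assumptions for illustration, not the platform's actual configuration:

```python
def completeness_ratio(expected_ids, received_ids):
    # Fraction of expected records that actually arrived.
    expected = set(expected_ids)
    if not expected:
        return 1.0
    return len(expected & set(received_ids)) / len(expected)

def check_completeness(expected_ids, received_ids, threshold=0.99):
    # Emit an alert-style result when completeness drops below threshold;
    # in production this would feed dashboards and paging alerts.
    ratio = completeness_ratio(expected_ids, received_ids)
    return {"ratio": ratio, "alert": ratio < threshold}

result = check_completeness(range(100), range(95))
print(result)  # {'ratio': 0.95, 'alert': True}
```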
Built around measurable objectives and a system design that keeps data quality visible, actionable, and resilient.
Operational Excellence
CI/CD
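As one example of a quality gate that a CI/CD pipeline can run before deploying a pipeline change, a schema contract check validates sample records against required fields; the field names here are hypothetical:

```python
def validate_schema(record, required_fields):
    # Return the set of missing fields; an empty set means the record passes.
    return {f for f in required_fields if f not in record}

# CI-style gate: fail fast if any sample record breaks the contract.
REQUIRED = ("event_id", "timestamp", "campaign")
samples = [
    {"event_id": 1, "timestamp": "2024-01-01T00:00:00Z", "campaign": "x"},
]
failures = [missing for r in samples if (missing := validate_schema(r, REQUIRED))]
assert not failures
print("schema gate passed")
```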
~100 TB of data processed daily
90% fewer data defects reported
detection time reduced from 2 days
35% reduction in resources and processing times
Petabytes of event data extracted, transformed, and loaded
of customers served with analytics and insights