Google BigQuery Data Platform

Project Overview:

This project designed and implemented a scalable, efficient data platform for processing, transforming, and analyzing sales and inventory data from a pizza store's operational system. The goal was to produce business metrics such as total sales, total profit, and quarterly profit. The platform combines orchestrated ETL, staged loading into a cloud data warehouse, and dbt-based data modeling. (Github Repository...)

Live Interactive Report ~ Tableau Public

Integration Diagram

Technologies and Their Roles:

Apache Airflow (Orchestration):
  • Used as the orchestration tool to schedule, manage, and monitor all ETL workflows.
  • Connected the pipeline stages, from data extraction through loading and transformation, into a single managed workflow.
  • Orchestrated Python scripts for data extraction and BigQuery CLI commands for data uploads.
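In Airflow, the stages above would be operators inside a DAG, chained with the `>>` dependency operator. Airflow then starts each task only after its upstream tasks succeed. The stdlib sketch below mimics just that dependency ordering with `graphlib.TopologicalSorter`. It assumes hypothetical task names (`extract_to_csv`, `load_to_staging`, `dbt_run`) and does not use the project's actual DAG.

```python
from graphlib import TopologicalSorter
from typing import Callable


def build_pipeline() -> dict[str, set[str]]:
    """Map each (hypothetical) task to the set of tasks it depends on."""
    return {
        "extract_to_csv": set(),                 # Python script: MS SQL Server -> CSV
        "load_to_staging": {"extract_to_csv"},   # BigQuery CLI load into staging tables
        "dbt_run": {"load_to_staging"},          # dbt transformations in BigQuery
    }


def run_in_order(graph: dict[str, set[str]], runner: Callable[[str], None]) -> list[str]:
    """Run every task after all of its upstream tasks, returning the execution order."""
    order = list(TopologicalSorter(graph).static_order())
    for task in order:
        runner(task)  # a real scheduler would also retry failures and record task state
    return order


if __name__ == "__main__":
    run_in_order(build_pipeline(), lambda name: print(f"running {name}"))
```

Airflow adds scheduling, retries, and monitoring on top of this ordering, which is why it was used rather than a hand-rolled runner.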
Python Scripts (Data Extraction):
  • Extracted raw data from the MS SQL Server operational database and wrote it out as structured CSV files.
  • Automated CSV generation so that each extract landed in a staging directory, ready for loading.
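A minimal sketch of the extraction step follows. It takes any DB-API 2.0 connection, streams a query's rows into a headered CSV file, and creates the staging directory if needed. To keep it runnable without a SQL Server instance, the usage example uses `sqlite3` as a stand-in. A driver such as pyodbc exposes the same cursor interface. The function name, the `orders` table, and its columns are hypothetical.

```python
import csv
import sqlite3
from pathlib import Path


def export_query_to_csv(conn, query: str, out_path: Path) -> int:
    """Run `query` on a DB-API 2.0 connection and write the result to a CSV file.

    Returns the number of data rows written (the header row is excluded).
    """
    cursor = conn.cursor()
    cursor.execute(query)
    header = [column[0] for column in cursor.description]  # column names from the result set
    out_path.parent.mkdir(parents=True, exist_ok=True)      # ensure the staging directory exists
    rows_written = 0
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in cursor:  # stream rows instead of loading the whole table into memory
            writer.writerow(row)
            rows_written += 1
    return rows_written


if __name__ == "__main__":
    import tempfile

    # sqlite3 stands in for MS SQL Server; the table and rows are illustrative only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, pizza_id TEXT, quantity INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, "margherita_m", 2), (2, "pepperoni_l", 1)])
    with tempfile.TemporaryDirectory() as tmp:
        print(export_query_to_csv(conn, "SELECT * FROM orders", Path(tmp) / "staging" / "orders.csv"))
```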
BigQuery CLI (Data Loading):
  • Loaded the CSV files into BigQuery's staging area.
  • Handled large data volumes efficiently and placed each file in the correct staging table for further processing.
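The loading step can be sketched as a Python helper that builds a `bq load` invocation, which an Airflow task could then run. The flags shown (`--source_format`, `--skip_leading_rows`, `--autodetect`, `--replace`) are standard `bq load` options. Choosing `--autodetect` and `--replace` is an assumption, as are the `staging` dataset name and file paths. An explicit schema file would be stricter than autodetection.

```python
import shlex
import subprocess
from pathlib import Path


def build_bq_load_cmd(dataset: str, table: str, csv_path: Path, replace: bool = True) -> list[str]:
    """Build a `bq load` command that loads one headered CSV file into a staging table."""
    cmd = [
        "bq", "load",
        "--source_format=CSV",
        "--skip_leading_rows=1",  # the extraction step writes a header row
        "--autodetect",           # assumption: let BigQuery infer the schema
    ]
    if replace:
        cmd.append("--replace")   # overwrite the staging table on every run
    cmd += [f"{dataset}.{table}", str(csv_path)]
    return cmd


def load_csv(dataset: str, table: str, csv_path: Path, dry_run: bool = True) -> list[str]:
    """Print the command (dry run) or execute it; execution needs the Google Cloud SDK and credentials."""
    cmd = build_bq_load_cmd(dataset, table, csv_path)
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)  # raises CalledProcessError if the load fails
    return cmd


if __name__ == "__main__":
    load_csv("staging", "orders", Path("data/orders.csv"))
```

Replacing the staging table on each run keeps reloads idempotent. An incremental design would append instead and deduplicate downstream.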
BigQuery (Data Repository):
  • Served as the primary data warehouse to store raw, transformed, and aggregated data.
  • Provided fast query execution and scaled to large datasets.
dbt Core (Data Transformation & Semantic Layer):
  • Transformed raw OLTP tables into a well-defined semantic layer, following data modeling best practices.
  • Created reusable, documented models to calculate business metrics and measures such as total sales, total profit, and inventory trends.
  • Kept metric definitions consistent across reports and made the reporting process more efficient.
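In this project the dbt models themselves would be SQL `SELECT` statements compiled and run in BigQuery. The Python sketch below only illustrates the kind of join-and-derive logic a sales fact model might encode, joining order lines to pizza prices and computing revenue and profit per line. The row shapes, column names, and profit definition ((unit price minus unit cost) times quantity) are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class OrderLine:      # hypothetical raw OLTP row
    order_date: date
    pizza_id: str
    quantity: int


@dataclass(frozen=True)
class PizzaPrice:     # hypothetical raw OLTP row
    pizza_id: str
    unit_price: Decimal
    unit_cost: Decimal


@dataclass(frozen=True)
class SalesFact:      # one row of the modeled fact table
    order_date: date
    pizza_id: str
    quantity: int
    revenue: Decimal
    profit: Decimal


def build_sales_fact(lines: list[OrderLine], prices: list[PizzaPrice]) -> list[SalesFact]:
    """Inner-join order lines to prices on pizza_id and derive revenue and profit per line."""
    price_by_id = {p.pizza_id: p for p in prices}
    facts = []
    for line in lines:
        price = price_by_id.get(line.pizza_id)
        if price is None:  # inner-join semantics: lines without a price row are dropped
            continue
        revenue = price.unit_price * line.quantity
        profit = (price.unit_price - price.unit_cost) * line.quantity  # assumed profit definition
        facts.append(SalesFact(line.order_date, line.pizza_id, line.quantity, revenue, profit))
    return facts
```

Decimal is used so that currency arithmetic does not pick up floating-point rounding errors, similar to using NUMERIC rather than FLOAT64 for money in BigQuery.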
Metrics Delivery:
  • Generated business-critical metrics like total sales, total profit, and quarterly performance to support decision-making.
  • The semantic layer provided self-service analytics capabilities for stakeholders.
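To make the metrics concrete, the sketch below computes total sales, total profit, and profit per calendar quarter from fact rows. It assumes a hypothetical row shape of `(order_date, revenue, profit)` tuples and is an illustration, not the project's actual model. In the warehouse, the same quarter grouping could be expressed with BigQuery's `EXTRACT(QUARTER FROM order_date)`.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal


def quarter_of(d: date) -> str:
    """Label a date with its calendar quarter, e.g. 2024-Q3."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def summarize(rows: list[tuple[date, Decimal, Decimal]]) -> dict:
    """Compute total sales, total profit, and profit per quarter from (order_date, revenue, profit) rows."""
    total_sales = Decimal("0")
    total_profit = Decimal("0")
    profit_by_quarter: defaultdict[str, Decimal] = defaultdict(Decimal)  # Decimal() starts at 0
    for order_date, revenue, profit in rows:
        total_sales += revenue
        total_profit += profit
        profit_by_quarter[quarter_of(order_date)] += profit
    return {
        "total_sales": total_sales,
        "total_profit": total_profit,
        "quarterly_profit": dict(sorted(profit_by_quarter.items())),  # chronological for 4-digit years
    }
```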