Stepping into a migration project of national scale, I quickly learned that transforming legacy systems isn't just technical; it's strategic. As a Microsoft Fabric Certified Data Engineer, I specialize in designing and implementing metadata-driven data pipelines and scalable, cloud-native medallion architectures. At the NHS Business Services Authority, I contribute to one of the UK's largest data platform transformations: re-architecting legacy Oracle-based systems onto a modern, cost-optimized Microsoft Fabric environment. This initiative involves complex ETL/ELT pipeline orchestration, data model refactoring, and the integration of Fabric OneLake and Lakehouses to support enterprise-grade analytics and machine learning models. The transformation underpins services that support the priorities of the NHS, government, and local health economies, enabling the management of approximately £48 billion in NHS spending annually.
This project demonstrates the implementation of a complete data ingestion and transformation pipeline using Microsoft Fabric and Apache Spark. It simulates a real-world data engineering workflow by ingesting external sales data via HTTP, storing it in a Lakehouse (OneLake), transforming it using PySpark notebooks, and automating the entire ETL process within a Fabric pipeline.
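The transform step of that ETL flow can be sketched in plain Python (stdlib only; the column names `OrderID`, `Quantity`, and `UnitPrice` are illustrative, and in Fabric this logic would run inside a PySpark notebook against a Lakehouse table rather than an in-memory string):

```python
import csv
import io

def transform_sales(raw_csv: str) -> list[dict]:
    """Parse an HTTP-ingested sales CSV, cast types, and derive a
    Total column -- a minimal stand-in for the pipeline's transform step."""
    rows = []
    for rec in csv.DictReader(io.StringIO(raw_csv)):
        qty = int(rec["Quantity"])
        price = float(rec["UnitPrice"])
        rows.append({
            "OrderID": rec["OrderID"],
            "Quantity": qty,
            "UnitPrice": price,
            "Total": round(qty * price, 2),  # derived column
        })
    return rows

# Sample payload standing in for the file pulled over HTTP
sample = "OrderID,Quantity,UnitPrice\nA1,2,9.99\nA2,5,3.50\n"
print(transform_sales(sample))
```

In the actual pipeline the cleaned rows would be written back to a Lakehouse Delta table, with the Fabric pipeline orchestrating the notebook run on a schedule.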
This project demonstrates the process of managing and analyzing the Snapshot Serengeti dataset using Microsoft Fabric, focusing on data loading, transformation, and machine learning. Key tasks include configuring a Data Factory pipeline to move data from Blob Storage to a Lakehouse, converting JSON files into Parquet format, and performing data exploration and transformation with Apache Spark.
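The JSON-to-tabular step can be sketched as below (a stdlib sketch only: the Snapshot Serengeti metadata ships as nested JSON, but the field names here are illustrative, and in Fabric the flattening and Parquet write would typically be a Spark job such as `spark.read.json(...)` followed by `.write.parquet(...)`):

```python
import json

def flatten_images(metadata_json: str) -> list[dict]:
    """Flatten the nested 'images' array of a Serengeti-style JSON
    document into flat rows suitable for a Parquet write."""
    doc = json.loads(metadata_json)
    return [
        {
            "image_id": img["id"],
            "file_name": img["file_name"],
            "season": img.get("season"),  # optional field, may be absent
        }
        for img in doc["images"]
    ]

# Minimal stand-in for the metadata file pulled from Blob Storage
sample = json.dumps({
    "images": [
        {"id": "S1/B04/1", "file_name": "S1/B04/1.JPG", "season": "S1"},
        {"id": "S1/B04/2", "file_name": "S1/B04/2.JPG", "season": "S1"},
    ]
})
print(flatten_images(sample))
```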
Designed a data model to analyze global developer demographics, including average age by continent, popular tools and platforms, language preferences, and education paths.
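The average-age-by-continent measure can be sketched with a stdlib group-by (survey field names are illustrative; in the actual model this aggregation would live in the semantic layer over the survey tables):

```python
from collections import defaultdict

def avg_age_by_continent(respondents: list[dict]) -> dict:
    """Group survey respondents by continent and average their ages."""
    buckets = defaultdict(list)
    for r in respondents:
        buckets[r["continent"]].append(r["age"])
    return {c: sum(ages) / len(ages) for c, ages in buckets.items()}

# Toy sample standing in for the survey fact table
survey = [
    {"continent": "Europe", "age": 30},
    {"continent": "Europe", "age": 40},
    {"continent": "Asia", "age": 25},
]
print(avg_age_by_continent(survey))  # {'Europe': 35.0, 'Asia': 25.0}
```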