Eneco Project
Client: Eneco
Duration: 2 years
Tools: Snowflake, Oracle, Azure Data Factory (ADF), Azure Data Lake Storage Gen2 (ADLS), Airflow/Python, Kafka
Goal: Migrate data from a legacy Oracle system to the Snowflake data platform and develop robust ETL pipelines for diverse data sources.
Outcomes: Successful migration to a modern data platform, automated data quality processes, and improved accessibility and scalability of data sources.
Description
I played a key role in executing the data migration from a legacy Oracle system to the Snowflake data platform. My responsibilities evolved across several phases, allowing me to contribute to and shape the data ingestion strategy.
- Data Migration to Snowflake: Initially, my role focused on migrating data from multiple databases into the Snowflake data lake. Using Azure Data Factory (ADF) as the ETL service and Azure Data Lake Storage Gen2 (ADLS) for temporary storage and archiving, I managed the creation of data objects within Snowflake. This involved designing, building, and managing ETL pipelines, as well as testing and monitoring the data for quality assurance (see the first sketch after this list).
- Developing ETL Pipelines for Non-Standard Data Sources: Later in my role, I focused on developing ETL pipelines for new and non-standard data sources, including data from SFTP servers, APIs, and the Kafka streaming platform. I designed and developed these pipelines using tools such as ADF, Airflow/Python, and native Snowflake features, handling semi-structured data formats such as JSON, XML, and Parquet (see the second sketch after this list).
- Building Automation for Data Quality: Beyond data ingestion, I created reusable building blocks within the data platform, including stored procedures for automated tasks, maintenance, and monitoring to ensure high data quality. These procedures automated data transformations, metadata logging, deletion handling, and data monitoring, increasing the platform's efficiency and reliability (see the third sketch after this list).
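The first sketch illustrates the Snowflake side of the migration loads described above: copying files that ADF has landed in ADLS into a Snowflake table through an external stage. This is a minimal sketch, assuming a pre-created external stage named @adls_stage; the account, credentials, stage, and table names are illustrative placeholders, not the project's actual objects.

```python
# Minimal sketch: copy Parquet files that ADF landed in ADLS Gen2 into a Snowflake
# table via a pre-created external stage. All names and credentials are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",   # placeholder account identifier
    user="etl_user",
    password="***",
    warehouse="etl_wh",
    database="analytics",
    schema="raw",
)
try:
    with conn.cursor() as cur:
        # @adls_stage is assumed to be an external stage pointing at the ADLS
        # container used for temporary storage and archiving.
        cur.execute(
            """
            COPY INTO raw.customer
            FROM @adls_stage/customer/
            FILE_FORMAT = (TYPE = PARQUET)
            MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
            """
        )
finally:
    conn.close()
```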
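The second sketch outlines, in simplified form, the kind of Airflow/Python pipeline used for non-standard sources: it pulls a JSON export from an SFTP server and loads it into a Snowflake landing table with a single VARIANT column. The connection IDs, paths, and object names are assumptions for the example.

```python
# Illustrative Airflow DAG: fetch a JSON export over SFTP, stage it in Snowflake,
# and COPY it into a VARIANT landing table. Connection IDs and paths are placeholders.
from datetime import datetime

from airflow.decorators import dag, task
from airflow.providers.sftp.hooks.sftp import SFTPHook
from airflow.providers.snowflake.hooks.snowflake import SnowflakeHook


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def sftp_to_snowflake():

    @task
    def download_export() -> str:
        """Fetch the latest JSON export from the SFTP server to local disk."""
        local_path = "/tmp/asset_export.json"
        SFTPHook(ssh_conn_id="sftp_default").retrieve_file(
            remote_full_path="/outbound/asset_export.json",
            local_full_path=local_path,
        )
        return local_path

    @task
    def load_into_snowflake(local_path: str) -> None:
        """Stage the file and load it into a VARIANT column for downstream parsing."""
        hook = SnowflakeHook(snowflake_conn_id="snowflake_default")
        hook.run(f"PUT file://{local_path} @raw.json_stage AUTO_COMPRESS=TRUE")
        # raw.asset_export is assumed to have a single VARIANT column.
        hook.run("COPY INTO raw.asset_export FROM @raw.json_stage FILE_FORMAT = (TYPE = JSON)")

    load_into_snowflake(download_export())


sftp_to_snowflake()
```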
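The third sketch shows one possible shape of such a reusable building block: a Snowflake stored procedure that logs per-table row counts into a metadata table, scheduled with a Snowflake task. The object names, schedule, and connection setup stand in for the pattern and are not the project's actual code.

```python
# Illustrative "building block": a stored procedure that records per-table row counts
# in a metadata table, plus a Snowflake task that runs it nightly. Names are placeholders.
import snowflake.connector

CREATE_PROC = """
CREATE OR REPLACE PROCEDURE util.log_row_count(table_name STRING)
RETURNS STRING
LANGUAGE SQL
AS
$$
DECLARE
    cnt INTEGER;
BEGIN
    SELECT COUNT(*) INTO :cnt FROM IDENTIFIER(:table_name);
    INSERT INTO util.load_metadata (table_name, row_count, logged_at)
        VALUES (:table_name, :cnt, CURRENT_TIMESTAMP());
    RETURN 'logged ' || table_name;
END;
$$
"""

CREATE_TASK = """
CREATE OR REPLACE TASK util.nightly_row_count_check
    WAREHOUSE = etl_wh
    SCHEDULE = 'USING CRON 0 2 * * * Europe/Amsterdam'
AS
    CALL util.log_row_count('RAW.ASSET_EXPORT')
"""

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",
    warehouse="etl_wh", database="analytics", schema="util",
)
try:
    with conn.cursor() as cur:
        cur.execute(CREATE_PROC)
        cur.execute(CREATE_TASK)
        cur.execute("ALTER TASK util.nightly_row_count_check RESUME")  # tasks start suspended
finally:
    conn.close()
```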
In addition to these responsibilities, I contributed to various smaller projects within Eneco, such as integrating the Genesys API into Snowflake and retrieving and storing asset data via Kafka streams. This role required a high degree of self-direction and creativity, and I enjoyed the challenges of working with a range of tools and evolving tasks.
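As a rough illustration of the Kafka-based asset ingestion, the sketch below consumes asset events from a topic and batches them into newline-delimited JSON files that a follow-up step could stage into Snowflake. The topic name, broker address, and file layout are assumptions made for the example.

```python
# Illustrative Kafka consumer: read asset events and batch them into NDJSON files
# for later staging into Snowflake. Topic, brokers, and paths are placeholders.
import json
from datetime import datetime, timezone

from kafka import KafkaConsumer  # kafka-python

consumer = KafkaConsumer(
    "asset-events",                     # placeholder topic name
    bootstrap_servers=["broker:9092"],  # placeholder broker
    group_id="asset-ingest",
    auto_offset_reset="earliest",
    enable_auto_commit=True,
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

batch, BATCH_SIZE = [], 1000
for message in consumer:
    batch.append(message.value)
    if len(batch) >= BATCH_SIZE:
        # One NDJSON file per batch; a separate step PUTs it to a Snowflake stage.
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
        with open(f"/tmp/assets_{stamp}.json", "w", encoding="utf-8") as fh:
            fh.writelines(json.dumps(record) + "\n" for record in batch)
        batch.clear()
```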