Data Science Engineer
Harnham
Job Description
Our client is seeking a versatile Data Science Engineer to join their team. You will be responsible for designing, building, and operationalizing advanced data-driven solutions that support business decision-making and product innovation. This role spans the entire data lifecycle: from architecting robust ingestion pipelines in Azure Data Factory to developing and deploying machine learning models within Microsoft Fabric.
Collaborating closely with the Senior Manager of Enterprise Applications & Data, you will lead discussions with cross-functional teams to design appropriate data architectures and build robust pipelines using Azure cloud services. This position requires a unique blend of technical depth in data engineering and the analytical skills necessary to extract meaningful insights from complex datasets. You will ensure that the underlying data infrastructure is performant, governed, and scalable.
This role combines strong statistical and machine learning expertise with robust software engineering and data engineering practices. The ideal candidate can develop models end-to-end—from data ingestion and feature engineering through deployment, monitoring, and optimization in production environments.
MAJOR RESPONSIBILITIES:
- Data Pipeline Development: Designs, builds, and maintains enterprise-scale ETL/ELT pipelines using Azure Data Factory and Fabric Data Factory.
- Leverages Microsoft Fabric (OneLake, Lakehouse, and Warehouse) to unify disparate data sources and to build and optimize data workflows for downstream data science workloads.
- Model Engineering: Develops, trains, and tunes machine learning models using Synapse Data Science (Notebooks) and MLflow while monitoring performance, data drift, and system reliability.
- Feature Engineering: Transforms raw data into curated, feature-rich datasets using PySpark and SQL to optimize model performance.
- Implements CI/CD patterns for machine learning, ensuring models are versioned, monitored, and easily redeployed.
- Implements data quality checks, monitoring, and validation processes to ensure data integrity.
ESSENTIAL FUNCTIONS:
- Fabric Management: Creates, updates, and secures Fabric items, specifically Lakehouses, Warehouses, Notebooks, and Dataflows Gen2 within the OneLake platform.
- Advanced Orchestration: Manages data orchestration using advanced knowledge of Azure Data Factory pipelines, activities, triggers, and Self-hosted Integration Runtimes.
- Data Manipulation: Utilizes a strong command of Python (Pandas, Scikit-learn, PySpark) and SQL to create dataflows and notebooks for ingestion and analytics.
- Architecture Best Practices: Implements Star Schema and Medallion Architecture (Bronze/Silver/Gold) principles to ensure data scalability.
- Quality Assurance: Participates in code reviews and contributes to the evolution of best practices for the team.
QUALIFICATIONS:
- Bachelor’s degree or equivalent in Computer Science or a related field preferred
- Minimum of three (3) years of work experience as a Data Science Engineer with a proven track record.
- Proven hands-on experience with the Azure Data Stack (ADLS Gen2, Azure SQL, Key Vault).
- Experience with Power BI for visualizing model outputs and creating user dashboards.
- Proficiency in SQL, Python, and PySpark.
- Experience with data pipeline development and AI/ML applications.
- Knowledge of data warehousing, data lake architecture, and Microsoft Fabric platform.
- Familiarity with machine learning, AI, and Generative AI technologies.