Data Engineer – AWS & Airflow
Will Sparrow Technologies
Job Description
About the Role
We are looking for a talented and motivated Data Engineer to join our growing engineering team at Will Sparrow Technologies. In this role, you will design, build, and maintain scalable data pipelines and infrastructure that power analytics and machine learning workloads. You will collaborate closely with data analysts, data scientists, and software engineers to ensure reliable, high-quality data delivery across the organisation.

Key Responsibilities
Design, develop, and maintain robust Apache Airflow DAGs for data orchestration on Astronomer Airflow (managed platform).
Build and optimise ETL/ELT pipelines ingesting data from diverse sources into AWS Redshift and AWS RDS.
Develop and manage data cataloguing, transformation, and integration jobs using AWS Glue (Glue Studio, Glue Crawlers, Glue Data Catalog).
Write efficient, well-tested Python scripts and SQL queries for data transformation, validation, and analysis.
Containerise data services with Docker and deploy/orchestrate workloads on Kubernetes clusters.
Monitor pipeline health, set up alerting, and perform root-cause analysis for data quality or latency issues.
Collaborate with stakeholders to understand data requirements and translate them into scalable engineering solutions.
Implement best practices for data security, access control, and compliance within AWS environments.
Maintain thorough documentation of data models, pipeline logic, and infrastructure decisions.
Participate in code reviews, contribute to engineering standards, and mentor junior team members.

Required Skills & Qualifications
Apache Airflow – Proven experience authoring and managing production-grade DAGs; familiarity with XComs, custom operators, and dynamic task mapping.
Astronomer – Experience deploying and operating Airflow on the Astronomer platform (Astro CLI, Deployments, Environments).
AWS Redshift – Strong command of Redshift architecture, distribution keys, sort keys, VACUUM/ANALYZE, and RA3 cluster management.
AWS Glue – Hands-on experience with Glue ETL jobs (PySpark & Python shell), Glue Crawlers, and Data Catalog integration.
Python – Proficient in writing clean, modular, and testable Python code; experience with Pandas, PySpark, or similar libraries.
SQL – Advanced SQL skills including window functions, CTEs, query optimisation, and performance tuning.
AWS RDS – Experience working with PostgreSQL/MySQL on RDS, including replication, parameter groups, and connection management.
Docker – Ability to build and publish container images, write multi-stage Dockerfiles, and use Docker Compose for local development.
Kubernetes – Familiarity with K8s concepts (Pods, Deployments, CronJobs, ConfigMaps, Secrets); experience deploying workloads via kubectl or Helm.
Strong understanding of data warehousing concepts, dimensional modelling, and ELT design patterns.
Experience with version control (Git) and CI/CD pipelines for data engineering workflows.

Nice to Have
Experience with dbt (data build tool) for transformation layer development.
Knowledge of AWS services: S3, Lambda, Step Functions, EventBridge, Secrets Manager.
Familiarity with streaming platforms such as Apache Kafka or AWS Kinesis.
Exposure to data quality frameworks (Great Expectations, Soda Core).
Experience with Terraform or AWS CDK for infrastructure-as-code.
Understanding of data governance and metadata management practices.
Contributions to open-source projects or a public GitHub portfolio.