Data Engineer
Koch
Job Description
Your Job

As a Data Engineer at Koch Capabilities, you will design, build, and operate scalable, reliable data pipelines and platforms that power advanced analytics and machine learning applications across the enterprise. You will work across distributed data environments, enabling high‑quality, trusted, and timely data for business, analytics, and AI use cases.

Our Team

You will be part of the Data & Analytics organization, collaborating closely with product managers, analytics teams, data scientists, and engineering peers. Our team focuses on building enterprise‑grade data systems, modernizing data platforms, and enabling self‑service consumption of high‑quality data through technical excellence and engineering best practices.

What You Will Do

- Design, build, optimize, and maintain scalable batch and streaming ETL/ELT pipelines supporting analytics, BI, and ML workloads.
- Work extensively with Enterprise Data Lake environments to manage ingestion, curation, storage, and transformation of large‑scale datasets.
- Own and maintain data models, schemas, and metadata, and promote strong data engineering standards and governance across platforms.
- Implement automated testing, monitoring, alerting, and quality frameworks to ensure the accuracy, reliability, and observability of data pipelines.
- Optimize data storage, query performance, and compute cost across cloud data warehouses and data lakes.
- Partner with analytics, product, and ML engineering teams to build robust data foundations, ensuring seamless consumption by downstream use cases.
- Mentor junior engineers and promote a culture of engineering excellence, innovation, and continuous improvement.
- Apply hands‑on knowledge of Agentic AI to enhance data engineering workflows and accelerate development productivity.
Who You Are (Basic Qualifications)

- 6+ years of experience as a Data Engineer or in a similar data engineering role.
- Strong hands‑on expertise with cloud platforms, particularly AWS (Glue, Lambda, CloudWatch, Bedrock).
- Proficiency in SQL and one or more programming languages such as Python, Scala, or Java.
- Practical experience with Apache Spark and Kafka for large‑scale data processing and streaming.
- Experience building and orchestrating workflows using tools such as Airflow or Dagster.
- Strong understanding of cloud storage technologies (e.g., Amazon S3, ADLS) and modern cloud data warehouses (e.g., Snowflake, BigQuery).
- Demonstrated ability to work in distributed, cloud‑native data environments.

What Will Put You Ahead

- Experience working in large‑scale enterprise data lake or data mesh architectures.
- Exposure to performance tuning, cost optimization, and distributed systems scaling.
- Familiarity with ML/AI workloads and the data requirements for model training and inference.
- Experience driving engineering best practices, automation, and CI/CD for data pipelines.
- Prior experience enabling Agentic AI or AI‑driven data engineering accelerators.
Additionally, everyone has individual work and personal needs. We seek to enable the work environment that best helps you and the business work together to produce superior results.