Job Summary
We are seeking a skilled Data Engineer with strong expertise in Java and big data technologies to design, develop, and maintain scalable batch data pipelines. The ideal candidate will have hands-on experience with modern data lakehouse architectures, cloud-native data platforms, and automation tools to support high-performance analytics and data processing workloads.
Experience - 5-8 Years
Must Haves
- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field.
- Strong proficiency in Java programming with a solid understanding of object-oriented design principles.
- Proven experience designing and building ETL/ELT pipelines and frameworks.
- Excellent command of SQL and familiarity with relational database management systems.
- Hands-on experience with big data technologies such as Apache Spark, Hadoop, and Kafka, or equivalent batch and streaming processing frameworks.
- Knowledge of cloud data platforms, preferably AWS services (Glue, EMR, Lambda) and Snowflake.
- Experience with data modeling, schema design, and data warehousing concepts.
- Understanding of distributed computing, parallel processing, and performance tuning in big data environments.
- Strong analytical, problem-solving, and debugging skills.
- Excellent communication and teamwork skills with experience working in Agile environments.
Nice to Have
- Experience with containerization and orchestration technologies such as Docker and Kubernetes.
- Familiarity with workflow orchestration tools like Apache Airflow.
- Basic scripting skills in languages like Python or Bash for automation tasks.
- Exposure to DevOps best practices and to building robust CI/CD pipelines.
- Prior experience managing data security, governance, and compliance in cloud environments.
Responsibilities
- Design, develop, and optimize scalable batch data pipelines using Java and Apache Spark to handle large volumes of structured and semi-structured data.
- Utilize Apache Iceberg to manage data lakehouse environments, supporting advanced features such as schema evolution and time travel for data versioning and auditing.
- Build and maintain reliable data ingestion and transformation workflows using AWS Glue, EMR, and Lambda services to ensure seamless data flow and integration.
- Integrate with Snowflake as the cloud data warehouse to enable efficient data storage, querying, and analytics workloads.
- Collaborate closely with DevOps and infrastructure teams to automate deployment, testing, and monitoring of data workflows using CI/CD tools like Jenkins.
- Develop and manage CI/CD pipelines for Spark/Java applications, ensuring automated testing and smooth releases in a cloud environment.
- Monitor and continuously optimize the performance, reliability, and cost-efficiency of data pipelines running on cloud-native platforms.
- Implement and enforce data security, compliance, and governance policies in line with organizational standards.
- Troubleshoot and resolve complex issues related to distributed data processing and integration.
- Work collaboratively within Agile teams to deliver high-quality data engineering solutions aligned with business requirements.