Data Engineer
LogixHealth
Job Description
Job Title: Data Engineer / Senior Data Engineer
Location: Bangalore
Experience: 5+ years
Job Type: Full-time (Hybrid)
Immediate joiners or candidates with a notice period of fewer than 10 days are required
Purpose:
As a Data Engineer at LogixHealth, you will work with a globally distributed team of engineers to design and build cutting-edge solutions that directly improve the healthcare industry. You’ll thrive in our fast-paced, collaborative environment, bringing your expertise to deliver innovative technology solutions while mentoring others.
Duties and Responsibilities:
- Contribute to the creation of a self-service data platform for reporting and analytics
- Design and build data solutions using Databricks, SQL, Python, Spark, and Delta Lake in the Azure ecosystem (Blob Storage, Data Factory, Event Hubs)
- Adhere to ETL/ELT best practices (data quality management, data processing, data partitioning, maintainability, and reusability)
- Collaborate with engineers, product, and business leaders to ensure the data platform integrates with other systems and technologies (Tableau, Power BI, APIs, custom applications)
- Establish CI/CD processes, test frameworks, infrastructure-as-code tools, and monitoring/alerting (Git, Terraform, Azure DevOps / GitHub Actions / Jenkins, Azure Monitor / Datadog)
- Adhere to the Code of Conduct and be familiar with all compliance policies and procedures stored in LogixGarden relevant to this position
Qualifications:
To perform this job successfully, an individual must be able to perform each duty satisfactorily.
The requirements listed below are representative of the knowledge, skills, and/or abilities required. Reasonable accommodation may be made to enable individuals with disabilities to perform the duties.
Education (Degrees, Certificates, Licenses, Etc.):
BS or higher (MS/PhD) in Computer Science or a related field, or equivalent technical experience.
Experience:
- 5+ years of strong hands-on experience in Apache Spark and Databricks, building scalable data pipelines and distributed data processing systems in cloud environments
- Deep expertise in the Databricks ecosystem, including:
- Delta Lake
- Delta Live Tables (DLT)
- Unity Catalog
- Workflow orchestration (Jobs)
- Strong programming experience in PySpark / Spark (Python or Scala preferred) for large-scale data engineering workflows
- Proven experience designing high-performance Spark jobs and applying optimization techniques: partitioning, caching, AQE, join strategies, and skew handling (see the sketch after this list)
- Experience integrating Databricks with the following (good to have):
- Azure Data Factory
- Event Hubs / streaming pipelines
- External orchestration tools like Airflow
- Working knowledge of cloud data platforms (Azure preferred), including Blob Storage and NoSQL databases
- Experience with relational databases (MS SQL, PostgreSQL, MySQL) is good to have
- Exposure to data governance, security, and compliance (Unity Catalog, RBAC, data lineage)
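As a rough illustration of the optimization techniques listed above, here is a minimal PySpark sketch; the billing tables, column names, and partition count are hypothetical, and this is a sketch rather than a prescribed implementation:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical local session; on Databricks, `spark` is provided by the runtime.
spark = (
    SparkSession.builder.appName("tuning-sketch")
    # Adaptive Query Execution: coalesces shuffle partitions and splits skewed joins.
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.skewJoin.enabled", "true")
    .getOrCreate()
)

# Hypothetical fact and dimension tables.
claims = spark.read.table("billing.claims")
providers = spark.read.table("billing.providers")

# Broadcast the small dimension table to avoid shuffling the large fact table.
joined = claims.join(F.broadcast(providers), "provider_id")

# Repartition on a well-distributed key before the wide aggregation;
# cache only because the aggregate is reused downstream.
totals = (
    joined.repartition(200, "provider_id")
    .groupBy("provider_id")
    .agg(F.sum("claim_amount").alias("total_amount"))
    .cache()
)
totals.count()  # materializes the cache
```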
Core Skills (Required):
Expert-level Spark (PySpark/Scala):
- DataFrames, Spark SQL, Structured Streaming
- Performance tuning & debugging
- Handling large-scale datasets (TB+ scale)
Databricks Expertise:
- Notebooks, Jobs, Workflows
- Delta Lake (ACID, schema evolution, optimization)
- Delta Live Tables (pipeline design & orchestration)
- Unity Catalog (data governance, access control)
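For a sense of the DLT pipeline design mentioned above, a minimal sketch follows; the landing path, table names, and quality expectation are hypothetical, and the `dlt` module is only available when the code runs inside a Databricks Delta Live Tables pipeline:

```python
# Runs only inside a Databricks Delta Live Tables pipeline, where the
# `dlt` module and the `spark` session are provided by the runtime.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Bronze: raw claim events ingested with Auto Loader.")
def claims_bronze():
    return (
        spark.readStream.format("cloudFiles")     # Auto Loader
        .option("cloudFiles.format", "json")
        .load("/mnt/raw/claims")                  # hypothetical landing path
    )

@dlt.table(comment="Silver: validated claims with an ingestion timestamp.")
@dlt.expect_or_drop("valid_claim_id", "claim_id IS NOT NULL")  # drop bad rows
def claims_silver():
    return (
        dlt.read_stream("claims_bronze")
        .withColumn("ingested_at", F.current_timestamp())
    )
```

DLT resolves the dependency between the two tables from the `dlt.read_stream` call and orchestrates them in order, which is what distinguishes it from hand-scheduled jobs.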
Data Engineering on Databricks:
- Batch + streaming pipelines
- Medallion architecture (Bronze/Silver/Gold)
- Incremental processing & CDC patterns
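A common way to realize the incremental/CDC pattern above is an upsert from Bronze into Silver via a Delta MERGE. The sketch below assumes hypothetical `claims_bronze_changes` and `claims_silver` tables and the delta-spark package:

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable  # requires the delta-spark package

spark = SparkSession.builder.getOrCreate()  # on Databricks, `spark` already exists

# Hypothetical batch of change records (inserts/updates) landed in Bronze.
changes = spark.read.table("claims_bronze_changes")

# Upsert the changes into the Silver table with a Delta MERGE.
silver = DeltaTable.forName(spark, "claims_silver")
(
    silver.alias("t")
    .merge(changes.alias("s"), "t.claim_id = s.claim_id")
    .whenMatchedUpdateAll()     # update rows whose claim_id already exists
    .whenNotMatchedInsertAll()  # insert brand-new claims
    .execute()
)
```

Because MERGE is keyed on `claim_id`, replaying the same change batch leaves the Silver table unchanged, which keeps the pipeline idempotent.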