What you’ll do (responsibilities):
- Design, develop, and enhance components related to database server internals, including storage, indexing, query execution, and transaction processing.
- Implement and refine query planners, optimizers, and execution engines with a focus on performance and scalability.
- Analyze and optimize complex SQL and distributed queries, ensuring minimal latency and efficient resource use.
- Contribute to Apache Spark or related open-source ecosystems, including performance improvements, extensions, and debugging.
- Build and maintain large-scale distributed data processing pipelines.
- Perform deep query analysis, profiling, troubleshooting, and root-cause investigation of performance bottlenecks.
- Design cloud-native microservices using Kubernetes and Docker.
- Troubleshoot and debug production issues using advanced Linux debugging tools, logs, and metrics.
- Collaborate with cross-functional engineering teams to define technical strategies and architectural improvements.
- Mentor junior developers, conduct code reviews, and contribute to development best practices.
What you’ll need:
- 9+ years of professional software development experience.
- Strong expertise in:
  - Database server internals
  - Query planners & optimizers
  - Query execution frameworks
- Hands-on experience with query optimization and SQL performance tuning.
- Proven contributions to open-source Apache Spark, or strong hands-on experience with Spark internals.
- Strong proficiency in Scala and/or Java, with deep understanding of concurrency, memory management, and functional programming concepts.
- Solid experience with Kubernetes (K8s) and Docker for container orchestration and deployment.
- Strong Linux fundamentals and hands-on experience with:
  - Linux profiling tools (perf, strace, lsof, etc.)
  - Kernel-level debugging (preferred)
- Deep knowledge of distributed system design (networking, partitioning, replication, fault tolerance).
- Experience with CI/CD pipelines and version control (Git).
What’s nice to have (preferred qualifications):
- Experience contributing to large-scale open-source projects (Apache Spark, Presto, Trino, etc.).
- Familiarity with columnar formats (Parquet, ORC) and vectorized execution engines.
- Experience with cloud platforms (AWS, Azure, GCP).
- Knowledge of JVM performance tuning and GC optimization.
- Exposure to big data query engines or OLAP systems.