What you’ll do (responsibilities):
- Design, develop, and enhance components related to database server internals, including storage, indexing, query execution, and transaction processing.
- Implement and refine query planners, optimizers, and execution engines with a focus on performance and scalability.
- Analyze and optimize complex SQL and distributed queries, ensuring minimal latency and efficient resource use.
- Contribute to Apache Spark or related open-source ecosystems, including performance improvements, extensions, and debugging.
- Build and maintain large-scale distributed data processing pipelines.
- Perform deep query analysis, profiling, troubleshooting, and root-cause investigation of performance bottlenecks.
- Design cloud-native microservices using Kubernetes and Docker.
- Troubleshoot and debug production issues using advanced Linux debugging tools, logs, and metrics.
- Collaborate with cross-functional engineering teams to define technical strategies and architectural improvements.
- Mentor junior developers, conduct code reviews, and contribute to development best practices.
What you’ll need:
- 9+ years of professional software development experience.
- Strong expertise in:
  - Database server internals
  - Query planners & optimizers
  - Query execution frameworks
- Hands-on experience with query optimization and SQL performance tuning.
- Proven contributions to open-source Apache Spark, or strong hands-on experience with Spark internals.
- Strong proficiency in Scala and/or Java, with deep understanding of concurrency, memory management, and functional programming concepts.
- Solid experience with Kubernetes (K8s) and Docker for container orchestration and deployment.
- Strong Linux fundamentals and hands-on experience with:
  - Linux profiling tools (perf, strace, lsof, etc.)
  - Kernel-level debugging (preferred)
- Deep knowledge of distributed system design (networking, partitioning, replication, fault tolerance).
- Experience with CI/CD pipelines and version control (Git).
What’s nice to have (preferred qualifications):
- Experience contributing to large-scale open-source projects (Apache Spark, Presto, Trino, etc.).
- Familiarity with columnar formats (Parquet, ORC) and vectorized execution engines.
- Experience with cloud platforms (AWS, Azure, GCP).
- Knowledge of JVM performance tuning and GC optimization.
- Exposure to big data query engines or OLAP systems.