Posted by Ellwood Consulting
Are you passionate about building robust, scalable systems that keep critical services running smoothly? We’re looking for an Engineer – Site Reliability to be a key force behind our infrastructure, ensuring peak performance, stability, and efficiency.
In this role, you'll bridge the gap between development and operations, designing resilient architectures, driving automation, and embedding a culture of reliability across the engineering team. Your work will directly impact the user experience and uptime of our most essential services.
- Salary up to MYR 14,000
- Working hours: 5pm - 3am (Afternoon Shift) | 8am - 5pm (Day Shift)
- Flexi Hybrid
What You’ll Be Doing:
- Design and implement high-availability, scalable system architectures that can handle production-grade workloads.
- Develop tools and automation to streamline operations, reduce manual tasks, and improve response times.
- Define and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure and improve system reliability.
- Conduct detailed post-incident reviews and lead root cause analyses to prevent recurrence.
- Collaborate across engineering, QA, and infrastructure teams to build and maintain resilient systems.
- Diagnose and resolve complex issues across databases, networks, and deployment pipelines, including Kubernetes and VM environments.
- Ensure adherence to Service Level Agreements (SLAs) by proactively managing incidents and performance.
- Continuously tune and optimize systems for performance, scalability, and reliability.
- Document processes, incident resolutions, and system designs to enable knowledge sharing and operational transparency.
What You’ll Bring:
- Strong programming skills in Python, Golang, Java, or similar languages, especially for automation and tooling.
- Experience designing and operating distributed systems at scale.
- Deep understanding of SRE and DevOps principles, including observability, reliability, and incident response.
- Hands-on experience in cloud environments such as AWS, Azure, or Google Cloud.
- Proficiency in Linux system administration, performance tuning, and troubleshooting.
- Solid grasp of networking concepts and infrastructure troubleshooting.
- A proactive mindset, excellent problem-solving skills, and the ability to drive improvements autonomously.
- Comfortable working independently while collaborating in cross-functional environments.
- Fluency in written and spoken Mandarin is preferred but not required, for communication with clients in the China market.
Bonus Points For:
- Experience with monitoring tools like Prometheus, Grafana, Datadog, or similar.
- Familiarity with CI/CD pipelines, Infrastructure as Code (IaC) (e.g., Terraform), and containerization tools (Docker, Kubernetes).
- Knowledge of automation and scripting for system tasks (e.g., Bash, Python).
- A strong understanding of DevOps culture and its best practices.
- Seniority level: Mid-Senior level
- Employment type: Full-time
- Job function: Information Technology
- Industries: Technology, Information and Media