[LPS] L3 Cloud Engineer
LPS
Job Description
We are seeking a highly skilled and experienced Cloud Engineer Lead (Level 3) to support cloud infrastructure for commercial and Singapore Government-appointed agencies operating across commercial cloud platforms. This role requires experience managing multi-cloud environments, predominantly on Amazon Web Services (AWS), with working knowledge of Microsoft Azure and Google Cloud Platform (GCP). The ideal candidate will demonstrate strong Infrastructure-as-Code (IaC) capabilities, comprehensive OS lifecycle and patching operations, application deployment and troubleshooting expertise, and proactive operational leadership.
This role emphasizes hands-on technical proficiency, security awareness, automation-driven practices, mentorship capabilities, and familiarity with strict uptime, compliance, and audit requirements in network-separated environments.

Key Responsibilities

Multi-Cloud Infrastructure Operations
- Operate and maintain cloud-native services in production across AWS, Microsoft Azure, and Google Cloud Platform, with hands-on experience in services including:
  - AWS: Lambda, ECS/EKS, FSx, Glue, SES, GuardDuty, WAF, Shield Advanced, Security Hub, KMS, Secrets Manager, SNS, SQS, EventBridge, API Gateway, EC2, S3, CloudWatch, Systems Manager
  - Azure: Azure Virtual Machines, Azure Kubernetes Service (AKS), Azure Functions, Azure Storage, Azure Monitor
  - GCP: Compute Engine, Google Kubernetes Engine (GKE), Cloud Functions, Cloud Storage, Cloud Monitoring
- Monitor and troubleshoot infrastructure performance, uptime, and scalability across all platforms
- Support production and staging environments with 24/7 reliability objectives
- Participate in a 24/7 shift rotation to provide round-the-clock operational support and assist a team of L2 engineers with hands-on troubleshooting of technical issues

Infrastructure as Code (IaC)
- Maintain infrastructure deployment pipelines with working knowledge of at least one of the following: Terraform, Ansible, or Azure Resource Manager (ARM) templates
- Troubleshoot environment drift and pipeline failures across multi-cloud environments
- Promote and drive automation in cloud operations and continuous improvement initiatives
- Implement and maintain GitOps practices for infrastructure deployment

Operating System Lifecycle & Patch Management
- Lead OS patching operations across RHEL (v8 to v10) and Windows Server (2016 to 2025) using AWS Patch Manager, Azure Update Management, WSUS, SCCM, and YUM/DNF
- Maintain basic knowledge of Linux administration with deep expertise in Wintel operating system patching and management
- Schedule, automate, and track patches across all environments
- Coordinate patch approvals and ensure compliance with organizational policies
- Execute monthly and quarterly patch cycles with minimal disruption
- Perform post-patch validation and remediation activities

Application Deployment & Troubleshooting
- Deploy and troubleshoot applications across Windows and Linux operating systems
- Support application teams with OS-level diagnostics and performance optimization
- Collaborate with development teams to resolve infrastructure and OS-related application issues
- Implement and maintain application monitoring and alerting frameworks

Security & Compliance
- Execute CIS (Center for Internet Security) security remediations across cloud platforms
- Perform security hardening based on CIS Benchmarks and government security baselines
- Conduct vulnerability remediation using tools such as Trend Micro Vision One, Qualys, Tenable, and AWS Config
- Track SSL certificate renewals across all environments
- Identify and remediate End-of-Life (EOL) components, including OS versions and Lambda runtimes
- Support compliance with government-level security, audit, and regulatory requirements

Container & DevSecOps
- Demonstrate knowledge of container technologies (Docker, Kubernetes, ECS, EKS, AKS, GKE)
- Familiarity with DevSecOps practices using SHIP-HATS (Secure Hybrid Integration Pipeline - Hive Agile Testing Solutions) under the Singapore Government technology stack
- Support CI/CD pipeline operations and integration with security scanning tools

ITIL & Service Management
- Adhere to ITIL processes including Incident, Problem, Change, and Request Management
- Manage and resolve ITSM tickets via ServiceNow, Jira, or similar platforms
- Drive ITSM ticket escalation between engineering teams and stakeholders
- Coordinate change management activities and participate in Change Advisory Board (CAB) reviews with junior engineers
- Maintain service level agreements (SLAs) and operational level agreements (OLAs)

Tool Integration & Observability
- Integrate third-party tools including NGINX, monitoring dashboards, and observability stacks
- Configure and maintain observability tools for metrics, logs, and alerts across multi-cloud environments
- Implement log aggregation and analysis using CloudWatch, Azure Monitor, and GCP Cloud Logging

Documentation & Knowledge Management
- Create and maintain comprehensive infrastructure runbooks, system documentation, change tracking logs, and infrastructure architecture designs for assigned applications
- Develop standard operating procedures (SOPs) and knowledge base articles
- Ensure audit-readiness through meticulous documentation discipline
- Maintain configuration management databases (CMDB) and asset inventories

Leadership & Mentorship
- Provide technical guidance and mentorship to Level 2 and junior engineers
- Lead technical discussions and architecture reviews
- Facilitate knowledge transfer sessions and training programs
- Act as escalation point for complex technical issues
- Drive continuous improvement initiatives and best practice adoption

Soft Skills & Competencies
- Problem Solving: Advanced troubleshooting of complex multi-cloud systems
- Communication: Clear and effective communication with technical and non-technical teams, stakeholders, and management
- Leadership: Ability to guide teams and drive technical initiatives
- Collaboration: Cross-functional teamwork across engineering, security, and business teams
- Adaptability: Responsive and effective in rapidly changing environments
- Accountability / Attention to Detail: Takes ownership of outcomes and service delivery; ensures accurate and secure implementations
- Customer Focus: Supportive, service-oriented approach with stakeholder management
- Continuous Learning: Stays current with evolving cloud and security practices
- Resilience: Performs effectively under pressure and during incident response
- Mentorship: Develops and supports junior engineers

SME Expectations: Role Behavior
This Subject Matter Expert (SME) role requires:
- Proficiency across Amazon Web Services with working knowledge of Azure and GCP
- Proven experience in uptime-critical and compliance-driven environments
- Strong mentorship and leadership capabilities for junior and mid-level engineers
- Proactive initiative in incident prevention and operational excellence
- Calm, structured, and methodical approach to incident handling with strict adherence to change management and incident response processes
- Audit-readiness mindset with comprehensive documentation practices
- Ability to drive escalations and manage stakeholder communications effectively
- Experience working within Singapore Government technology frameworks

Technical Skills & Experience

Area | Skills Required
Cloud Platforms | Hands-on production experience with Amazon Web Services, Microsoft Azure, or Google Cloud Platform
Infrastructure as Code | Terraform, ARM Templates
Operating Systems | Windows Server (2012/2016/2019/2022/2025), basic to intermediate Linux/RHEL administration
Patch Management | AWS Patch Manager, Azure Update Management, WSUS, SCCM, YUM/DNF, air-gapped Linux repositories
Application Support | OS-level application deployment, troubleshooting, and performance optimization
Security & Hardening | CIS Benchmarks, security remediation, vulnerability management, IAM best practices
Containers | Docker, Kubernetes, ECS, EKS, AKS
DevSecOps | Familiarity with SHIP-HATS and/or DevSecOps frameworks
ITIL & ITSM | Incident, Problem, Change, Request Management; ServiceNow, Jira
SSL/Certificate Management | End-to-end SSL certificate lifecycle and renewal tracking
Scripting & Automation | PowerShell, Bash, Python, AWS CLI, Azure CLI, gcloud CLI
Documentation | Runbooks, SOPs, logs, and technical documentation

Required Qualifications
- Bachelor's degree in Computer Science, Information Systems, or related field
- Minimum 3 years of experience in commercial cloud engineering roles
- At least 2 years of experience in public sector or regulated cloud environments
- Minimum 3 years of hands-on experience with AWS, Microsoft Azure, or Google Cloud Platform
- Experience in 24/7 operational support environments with shift rotation
- Demonstrated experience in mentoring and leading junior engineers
- Strong background in ITIL processes and ITSM platforms, with experience in CIS security hardening and remediation
- Familiarity with Singapore Government technology standards and frameworks (e.g., SHIP-HATS, IM8 Policy)

Preferred Certifications
- AWS Certified Solutions Architect – Associate / Professional
- AWS Certified SysOps Administrator – Associate (preferred)
- Microsoft Certified: Azure Administrator Associate or Azure Solutions Architect Expert
- Microsoft Certified: Windows Server Hybrid Administrator Associate
- RHCE or Linux Professional Institute Certification (LPIC)
- ITIL v3/v4 Foundation

Work Arrangements
- This role requires participation in a 24/7 shift rotation to support critical infrastructure operations
- Extended work hours may be required during incidents, maintenance windows, and change implementations
- On-call support responsibilities as part of the rotation schedule
- Flexibility to work outside normal office hours for patching activities and emergency response