Site Reliability Engineer
Converge ICT Solutions Pasig Temporary
Key Responsibilities
Production Operations & System Reliability
- Maintain the reliability, availability, and performance of cloud-native, distributed production environments.
- Write software and scripts to automate routine operational tasks, eliminating repetitive manual work ("toil").
- Participate in system architectural reviews to ensure new features and services conform to scalability and reliability standards.
- Build, maintain, and optimize the enterprise observability stack (metrics, logs, traces) to ensure comprehensive visibility into system health.
- Create and refine real-time dashboards and proactive alerting rules to catch anomalies before they impact end-users.
- Help define, track, and report on critical reliability metrics, including Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
- Participate in the team's on-call rotation, acting as a first responder to mitigate active production degradation and outages.
- Lead and support technical troubleshooting efforts across application, network, and database layers during incidents.
- Conduct thoroughly documented, blameless post-mortems to identify root causes and track long-term engineering fixes to prevent recurrence.
- Provision, configure, and maintain multi-region cloud infrastructure exclusively using Infrastructure as Code (IaC).
- Collaborate with DevOps and Platform teams to build and support safe, automated deployment pipelines (CI/CD) featuring canary or blue/green release patterns.
- Ensure data safety by maintaining and regularly testing disaster recovery (DR), backup, and automated failover systems.
- Analyze system utilization trends and perform capacity planning to handle traffic surges efficiently and cost-effectively.
- Introduce chaos engineering principles by intentionally injecting faults into controlled environments to uncover hidden system weaknesses.
- Conduct load, stress, and performance testing on core infrastructure components to identify bottlenecks.
- Work hand-in-hand with software engineering teams to help them architect their applications for high availability and fault tolerance.
- Create and update high-quality documentation, operational runbooks, and architectural diagrams.
- Champion an internal culture of engineering-driven operations and reliability best practices.
Job Requirements
Qualifications and Experience
- Education: Bachelor’s degree in Computer Science, Computer Engineering, Information Technology, or a related field (or equivalent practical experience).
- Experience: Minimum of 3–5 years of experience in an SRE, DevOps, or Software Engineering role supporting production environments.
- Systems & Networking: Deep understanding of Linux/Unix systems internals, networking protocols, and performance tuning.
- Software Development: Solid background in software development with the ability to write clean, maintainable code for automation and internal tooling.
- Work Setup: Must be willing to work fully on-site.
- Programming/Scripting: High proficiency in languages such as Python, Go, Bash, or Java.
- Cloud Architecture: Hands-on experience with major cloud platforms (AWS, GCP, or Azure), specifically around networking, IAM, and managed services.
- Containers & Orchestration: Strong operational knowledge of Docker and production Kubernetes (EKS, GKE, AKS, or self-managed).
- Infrastructure as Code: Proficiency with Terraform, OpenTofu, or Pulumi.
- Observability Tools: Experience configuring and querying tools like Prometheus, Grafana, OpenTelemetry, ELK Stack, Datadog, or New Relic.
- Core Networking: Solid understanding of TCP/IP, HTTP/S, DNS, load balancers, and CDN caching layers.
- Analytical Mindset: Exceptional troubleshooting skills; able to methodically isolate failures in complex, distributed architectures under high-pressure scenarios.
- Collaboration: Excellent communication skills with a proven track record of bridging the gap between product developers and infrastructure engineers.
- Continuous Learning: Curiosity and eagerness to keep up with the evolving cloud-native and CNCF ecosystem.
- Kubernetes: Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD).
- Cloud Platforms: AWS Certified Solutions Architect (Associate or Professional), AWS Certified DevOps Engineer, or Google Cloud Professional Cloud DevOps Engineer.
- HashiCorp: Terraform Associate.