Site Reliability Engineer - Makati - ref. c79178415

Flexisource IT | Makati | Full-time

SITE RELIABILITY ENGINEER

We are seeking a skilled Site Reliability Engineer (SRE) to join our dynamic team and enhance collaboration between our development and operations teams. As an SRE, you will play a crucial role in improving customer experience, optimizing operations planning, and implementing gradual changes to maintain system reliability.

Your expertise in application monitoring, automation, metrics analysis, and incident response will be essential in ensuring the seamless delivery of our software services.

RESPONSIBILITIES:

  • Facilitate improved collaboration between development and operations teams by implementing SRE practices and fostering effective communication channels.
  • Enhance customer experience by using SRE tools and automation to minimize software errors that impact end users, freeing the team to prioritize new feature development.
  • Plan and execute incident response strategies to minimize downtime and mitigate the impact of failures on business operations and end users.
  • Monitor application performance using service-level agreements (SLAs), service-level indicators (SLIs), and service-level objectives (SLOs) to ensure optimal system health.
  • Collaborate with developers to identify critical performance parameters and configure monitoring tools accordingly.
  • Collect and analyze metrics to identify resource consumption, abnormal behavior, and potential performance bottlenecks.
  • Utilize logs generated by software systems to troubleshoot and investigate issues, ensuring a thorough understanding of the chain of events leading to problems.
  • Monitor and manage application latency, traffic, errors, and saturation levels to maintain high system performance.
  • Develop and implement service-level objectives (SLOs) and service-level indicators (SLIs) to set quantifiable goals and measure actual performance against those goals.
  • Collaborate with stakeholders to define and enforce service-level agreements (SLAs) that outline the consequences of not meeting established SLOs.
  • Manage operations tasks, including emergency incident response, change management, and IT infrastructure management.
  • Provide system support to development teams, assisting in the creation of new features and stabilization of production systems.
  • Conduct post-incident reviews to identify areas for improvement and document solutions in a shared knowledge base, enabling efficient problem resolution in the future.

REQUIREMENTS:

  • Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent work experience).
  • Proven experience as a Site Reliability Engineer (SRE) or in a similar role.
  • Proven experience with AWS and Terraform.
  • Strong understanding of software development lifecycle and DevOps principles.
  • Proficiency in application monitoring tools and techniques.
  • Familiarity with service-level agreements (SLAs), service-level indicators (SLIs), and service-level objectives (SLOs).
  • Experience with incident response management and change management processes.
  • Solid knowledge of system performance metrics and analysis.
  • Strong troubleshooting and problem-solving skills.
  • Excellent communication and collaboration abilities, with the capacity to work effectively in cross-functional teams.
  • Knowledge of automation tools and scripting languages is preferred.
  • Familiarity with cloud technologies and distributed systems is a plus.
  • Ability to adapt to a fast-paced and dynamic work environment.

WORK DETAILS:

  • Shift: Monday to Friday, 6:00 am to 3:00 pm or 7:00 am to 4:00 pm PH time, depending on business needs
  • Location: Makati (work from home until further notice)
  • Status: Full-time / Contractor