Senior Data Platform Reliability Engineer | Onsite

TerraBarn Inc | Mandaluyong | Full-time

About OpsWerks

OpsWerks is a technical consulting company specializing in operational services for the high-tech industry. We partner with platform and infrastructure teams to operate multi-cloud environments, execute complex migrations, and enable seamless, scalable application deployments.

Your Role

As a Senior Data Platform Engineer, you will be responsible for the operation, reliability, and continuous improvement of data platforms running on Kubernetes (on-premises and/or AWS/GCP), including frameworks such as DoEKS (Data on EKS) and AIoEKS (AI on EKS).

Key Responsibilities
  • Operate, maintain, and enhance data platforms deployed on Kubernetes environments
  • Deploy platform updates, releases, and configuration changes using GitOps/DevOps practices
  • Monitor system health using logs, metrics, and observability tools to ensure high availability
  • Participate in incident response, root cause analysis (RCA), and 24x7 on-call rotations
  • Improve platform reliability through automation, observability, and self-service tooling
  • Troubleshoot user and system issues, including integrations, performance bottlenecks, and misconfigurations
  • Collaborate with cross-functional teams to ensure seamless data platform operations
  • Provide technical mentorship and guidance to junior engineers
  • Champion platform standards, security best practices, and operational excellence

Qualifications
  • 3+ years of experience supporting production data platforms (e.g., Spark, Airflow, Jupyter)
  • 5+ years of hands-on experience with ETL/ELT pipelines, data processing, and transformation (Python/Java and SQL)
  • Strong experience with Kubernetes, including managed services (AWS EKS / GCP GKE)
  • Solid understanding of Linux systems, microservices architecture, and service communication patterns
  • Strong troubleshooting skills (application failures, latency, scaling, resource contention)
  • Proficiency in monitoring, logging, and observability tools (e.g., Prometheus, Grafana, ELK, Splunk)

Nice to Have
  • Experience with modern data/AI platforms: Flink, Trino, Druid, Ray.io
  • Automation and scripting skills (Bash, Python)
  • Relevant certifications (e.g., CKAD, AWS Certified Data Engineer)

Why Join OpsWerks?
  • Work on cutting-edge data platforms in multi-cloud environments
  • Exposure to large-scale, enterprise-grade infrastructure
  • Collaborative, engineering-driven culture focused on reliability and innovation
  • Opportunities for technical growth, certification, and mentorship