Site Reliability Engineer

Important Information:

  • Years of Experience: 5+ years

  • Job Mode: Full-time

  • Work Mode: Remote within Mexico

Position Overview 

We are looking for a Site Reliability Engineer (SRE) to join our team and ensure the reliability, scalability, and performance of custom platforms built on AWS infrastructure and Kubernetes containers. This role will focus on resolving Tier 3 issues, collaborating with engineering teams to prepare operations for new releases, and proactively improving platform stability and customer experience. 

Key Responsibilities 

  • Troubleshoot and resolve Tier 3 platform issues on AWS-based custom applications. 
  • Work closely with engineering teams to prepare Operations for new releases and feature enhancements. 
  • Identify recurring issues and develop automated solutions or process improvements. 
  • Implement strategies to enhance platform reliability, scalability, and performance. 
  • Monitor system health and proactively address potential risks. 
  • Collaborate with internal stakeholders to improve customer experience and product robustness. 
  • Participate in incident response, root cause analysis, and post-mortem reviews. 
  • Contribute to documentation, runbooks, and operational readiness plans. 

Required Qualifications 

  • Hands-on experience with AWS cloud infrastructure and services. 
  • Strong knowledge of Kubernetes and container orchestration. 
  • Proficiency in Python or Go for scripting and automation. 
  • Experience in platform support, troubleshooting, and performance optimization. 
  • Familiarity with CI/CD pipelines, monitoring tools, and observability practices. 
  • Strong problem-solving skills and an engineering mindset. 
  • Familiarity with the open-source PLGJ observability stack (Prometheus, Loki, Grafana, Jaeger). 

Preferred Qualifications 

  • Experience with Infrastructure as Code (IaC) tools such as Terraform or CloudFormation. 
  • Knowledge of microservices architecture and distributed systems. 
  • Exposure to DevOps practices and SRE principles. 
  • AWS certifications (Solutions Architect, SysOps, or DevOps Engineer) are a plus. 

Soft Skills 

  • Excellent communication and collaboration skills. 
  • Ability to work in a fast-paced, dynamic environment. 
  • Strong analytical and critical thinking abilities. 

Encora provides tailored software engineering and digital product development solutions for fast-growing technology companies. With a global team of over 9,000 experts, we specialize in a wide range of practices, including cloud services, product engineering, and AI engineering, making us a trusted partner for enterprises looking to innovate and modernize their digital infrastructure.
