Site Reliability Engineer

Important Information:

  • Years of Experience: 5+ years
  • Job Mode: Full-time
  • Work Mode: Remote within Mexico

Position Overview 

We are looking for a Site Reliability Engineer (SRE) to join our team and ensure the reliability, scalability, and performance of custom platforms built on AWS infrastructure and Kubernetes containers. This role will focus on resolving Tier 3 issues, collaborating with engineering teams to prepare operations for new releases, and proactively improving platform stability and customer experience. 

Key Responsibilities 

  • Troubleshoot and resolve Tier 3 platform issues on AWS-based custom applications. 
  • Work closely with engineering teams to prepare operations for new releases and feature enhancements. 
  • Identify recurring issues and develop automated solutions or process improvements. 
  • Implement strategies to enhance platform reliability, scalability, and performance. 
  • Monitor system health and proactively address potential risks. 
  • Collaborate with internal stakeholders to improve customer experience and product robustness. 
  • Participate in incident response, root cause analysis, and post-mortem reviews. 
  • Contribute to documentation, runbooks, and operational readiness plans. 

Required Qualifications 

  • Hands-on experience with AWS cloud infrastructure and services. 
  • Strong knowledge of Kubernetes and container orchestration. 
  • Proficiency in Python or Go for scripting and automation. 
  • Experience in platform support, troubleshooting, and performance optimization. 
  • Familiarity with CI/CD pipelines, monitoring tools, and observability practices. 
  • Strong problem-solving skills and an engineering mindset. 
  • Familiarity with the open-source PLGJ observability stack (Prometheus, Loki, Grafana, Jaeger). 

Preferred Qualifications 

  • Experience with Infrastructure as Code (IaC) tools such as Terraform or CloudFormation. 
  • Knowledge of microservices architecture and distributed systems. 
  • Exposure to DevOps practices and SRE principles. 
  • AWS certifications (Solutions Architect, SysOps, or DevOps Engineer) are a plus. 

Soft Skills 

  • Excellent communication and collaboration skills. 
  • Ability to work in a fast-paced, dynamic environment. 
  • Strong analytical and critical thinking abilities. 

Encora specializes in delivering customized software engineering solutions and digital product development services to fast-growing technology firms, leveraging advanced technologies to foster innovation and growth across various industries.
