Site Reliability Engineer (SRE)

Important Information 

Location: Brazil
Job Mode: Full-time 
Work Mode: Work from home

 

Job Summary

We are looking for a Site Reliability Engineer (SRE) to join our team and ensure the reliability, scalability, and performance of custom platforms built on AWS infrastructure and Kubernetes containers. This role will focus on resolving Tier 3 issues, collaborating with engineering teams to prepare operations for new releases, and proactively improving platform stability and customer experience. 

 

Responsibilities and Duties

  • Troubleshoot and resolve Tier 3 platform issues on AWS-based custom applications;
  • Work closely with engineering teams to prepare operations for new releases and feature enhancements;
  • Identify recurring issues and develop automated solutions or process improvements;
  • Implement strategies to enhance platform reliability, scalability, and performance;
  • Monitor system health and proactively address potential risks;
  • Collaborate with internal stakeholders to improve customer experience and product robustness;
  • Participate in incident response, root cause analysis, and post-mortem reviews;
  • Contribute to documentation, runbooks, and operational readiness plans. 

 

Essential Skills

  • Hands-on experience with AWS cloud infrastructure and services;
  • Strong knowledge of Kubernetes and container orchestration;
  • Proficiency in Python or Go for scripting and automation;
  • Experience in platform support, troubleshooting, and performance optimization;
  • Familiarity with CI/CD pipelines, monitoring tools, and observability practices;
  • Strong problem-solving skills and an engineering mindset. 

 

Highly Desirable Skills

  • Experience with Infrastructure as Code (IaC) tools such as Terraform or CloudFormation;
  • Knowledge of microservices architecture and distributed systems;
  • Exposure to DevOps practices and SRE principles;
  • AWS certifications (Solutions Architect, SysOps, or DevOps Engineer) are a plus.

About Encora

Encora is the preferred digital engineering and modernization partner of some of the world’s leading enterprises and digital native companies. With over 9,000 experts in 47+ offices and innovation labs worldwide, Encora’s technology practices include Product Engineering & Development, Cloud Services, Quality Engineering, DevSecOps, Data & Analytics, Digital Experience, Cybersecurity, and AI & LLM Engineering.

At Encora, we hire professionals based solely on their skills and qualifications, and do not discriminate based on age, disability, religion, gender, sexual orientation, socioeconomic status, or nationality.

Encora specializes in delivering customized software engineering solutions and digital product development services to fast-growing technology firms, leveraging advanced technologies to foster innovation and growth across various industries.
