Cloud Site Reliability Engineer

Yarmouth, United States

AI overview

Enhance the reliability and performance of cloud-based solutions by collaborating across teams and implementing observability tooling and incident response best practices.
Responsibilities
  • Implement observability tooling to monitor AWS EKS-based systems focusing on performance, reliability, and scalability.
  • Participate in on-call rotations, providing critical support as needed. Ensure timely response to incidents and support requests, collaborating effectively on solutions.
  • Conduct root cause analysis and implement preventative measures to minimize toil and impact on customers.
  • Lead and participate in incident retrospectives to enhance future response efforts.
  • Ensure that architecture and deployment models are sufficient to support SLA commitments and are well prepared for future problems of scale.
  • Apply software engineering best practices to comprehensively address and resolve problems.
  • Collaborate with product support teams to improve reliability, drive efficiency, and enhance customer experience through self-service tools and automation.
Qualifications
  • 2-3+ years of experience in cloud operations, software engineering, or technical operations at reputable technology firms, particularly with large-scale cloud applications.
  • Experience building observability platforms and using monitoring tools, such as Datadog.
  • Experience driving incident response best practices and using incident management tools, such as PagerDuty and JSM Ops.
  • Experience deploying and supporting containerized applications on cloud platforms, preferably EKS on AWS.
  • Experience with infrastructure as code technologies, such as Terraform.
  • Experience with languages like Python, JavaScript, or Go.
  • Familiarity with DevOps and CI/CD methodologies.
  • Bachelor’s degree in Computer Science or a related field.