Senior Site Reliability Engineer

TL;DR

Design and scale high-performing systems while improving resilience and deployment practices, contributing to a culture of operational excellence.

Who we are

DigiCert is a global leader in intelligent trust. We protect the digital world by ensuring the security, privacy, and authenticity of every interaction. Our AI-powered DigiCert ONE platform unifies PKI, DNS, and certificate lifecycle management to secure infrastructure, software, devices, messages, AI content and agents. Learn why more than 100,000 organizations, including 90% of the Fortune 500, choose DigiCert to stop today’s threats and prepare for a quantum-safe future at www.digicert.com.

 

Job summary

As a Senior Site Reliability Engineer, you’ll play a key role in designing, operating, and scaling reliable, high-performing systems that support critical business services. You’ll partner closely with engineering teams to improve system resilience, observability, and deployment practices while driving automation across infrastructure and operations. This role is hands-on and highly technical, with ownership over Kubernetes-based platforms, cloud infrastructure, and CI/CD pipelines. You’ll also help shape engineering best practices, mentor teammates, and contribute to a culture of continuous improvement and operational excellence.

 

What you will do

  • Design, implement, and maintain highly available and scalable systems
  • Improve system reliability, performance, and observability
  • Automate infrastructure provisioning, configuration, and operational tasks
  • Support and evolve Kubernetes-based platforms, including cluster management
  • Collaborate with development teams to enable CI/CD and deployment best practices
  • Participate in an on-call rotation to support production systems and respond to incidents
  • Troubleshoot production issues across distributed systems
  • Mentor other team members by sharing knowledge, best practices, and guidance
  • Contribute to infrastructure standards, documentation, and continuous improvement initiatives

Technologies you’ll work with

  • Operating Systems: Linux & Windows
  • Scripting: Bash
  • Version Control: Git
  • Configuration Management: Salt
  • Container Orchestration & Management: Kubernetes, Rancher
  • CI/CD & Delivery: Harness, GitHub Actions
  • Infrastructure as Code: Terraform
  • Cloud Platforms: Private and public cloud environments (e.g., AWS, Azure, GCP, or equivalents)

 

What you will have

  • 5+ years of experience in Site Reliability Engineering, DevOps, or similar roles
  • Strong Linux systems administration and Bash scripting experience
  • Hands-on experience running and supporting Kubernetes in production
  • Experience with Kubernetes management platforms such as Rancher
  • Proven experience with Infrastructure as Code (Terraform preferred)
  • Experience building, maintaining, and supporting CI/CD pipelines
  • Solid understanding of cloud infrastructure (public and/or private)
  • Strong troubleshooting skills across complex, distributed systems
  • Comfortable collaborating across teams and mentoring junior or mid-level engineers

 

Nice to have

  • Experience with observability tools (monitoring, logging, alerting)
  • Experience with large-scale or high-availability systems
  • Familiarity with security best practices in cloud-native environments
  • Experience with AEM Cloud management
  • Experience supporting regulated or mission-critical environments

 

Benefits

  • Generous time-off policies
  • Top-shelf benefits
  • Education, wellness, and lifestyle support

 


 

