We are seeking an experienced DevOps Engineer to join our team. In this role, you will be a key technical expert ensuring the reliability, scalability, and performance of our systems. You will partner with development, infrastructure, and operations teams to design architectures, lead incident response, and drive continuous improvement across our platforms.

This role requires working and providing support during U.S. hours (5:00 PM - 1:00 AM CET).

What you’ll do:

Partner with development, infrastructure, and operations teams to design highly available, fault-tolerant, and disaster recovery–ready systems.
Implement Infrastructure-as-Code (e.g., Terraform) to automate provisioning, scaling, and management of cloud services (AWS, Azure).
Lead and support incident triage, resolution, and recovery efforts during critical events.
Provide advanced troubleshooting expertise and guide teams during outages.
Conduct detailed postmortems, document lessons learned, and drive improvements to reduce Mean Time to Recovery (MTTR).
Collaborate with developers, QA, and product teams to embed reliability principles throughout the software development lifecycle.
Mentor peers on observability tools, performance optimization, and SRE best practices.
Identify opportunities for continuous improvement in reliability, performance, and cost efficiency.
Evaluate and recommend emerging technologies to enhance scalability and resilience.
Contribute to internal documentation, ensuring best practices are accessible across the organization.

What we’re looking for:

4+ years of experience in DevOps, Site Reliability Engineering, or a related role.
Proven track record as a technical lead or subject matter expert (no direct people management required).
Hands-on expertise with cloud platforms (AWS, Azure) and Infrastructure-as-Code (Terraform preferred).
Strong understanding of systems architecture, high availability, fault tolerance, and disaster recovery.
Experience leading incident response and conducting root cause analysis.
Familiarity with observability tools and performance optimization practices.
Strong collaboration and communication skills with the ability to mentor peers and influence best practices.

Nice to have:

Experience with containerization and orchestration (e.g., Docker, Kubernetes).
Familiarity with CI/CD pipelines and automation frameworks.
Knowledge of security best practices in cloud environments.
Exposure to cost optimization strategies for large-scale cloud infrastructure.
Contributions to open-source projects or active engagement in the DevOps/SRE community.

We appreciate the interest of all applicants. Please note that only those whose qualifications align closely with the position requirements will be contacted for the next steps in the selection process.

All applications will be handled with confidentiality.

IW_Privacy Protection Statement for Job Applicants

Site Reliability Engineer (Serbia)
