Production Engineer III

TLDR

Make a high-impact contribution to the architecture and reliability of Veeam's first global SaaS product suite while shaping a modern SRE organization.

Veeam is the Data and AI Trust Company, specializing in helping organizations ensure their data and AI are fully understood, secured, and resilient to enable the acceleration of safe AI at scale. As the market leader in both data resilience and data security posture management, Veeam is built for the convergence of identity, data, security, and AI risk. Headquartered in Seattle with offices in more than 30 countries, Veeam protects over 550,000 customers worldwide, who trust Veeam to keep their businesses running. Join us as we go fearlessly forward together, growing, learning, and making a real impact for some of the world’s biggest brands.

About the Role

As a Production Engineer, you will play a key role in supporting reliable, scalable systems for Veeam's Data Cloud platform. You will own production efficiency, automation and documentation projects, contribute to reliability and observability improvements, and own or participate in the full incident lifecycle — from on-call response, through mitigation, to leading post-incident reviews and driving improvements across support and development teams.

You will work as part of a team of skilled engineers, collaborating with support and development as a bridge and driving force for change. You will communicate with product managers and security professionals to ensure our services are production-ready, performant, and fault-tolerant, and that we rapidly incorporate user feedback into improvements.

 

What You Will Do

Production

  • Own complex and escalated production issues from support, and drive long-term fixes in collaboration with engineering, including code, configuration, and architecture changes.

  • Proactively identify and address risks uncovered during the problem-solving process.

  • Lead production efficiency initiatives, and develop and maintain processes, runbooks, and the integrity of the knowledge base.

Operational Excellence

  • Define, build, and maintain production monitoring systems.

  • Continuously improve alerting to minimize noise, and ensure alerts are actionable and backed by well-documented runbooks.

  • Define and maintain SLIs/SLOs for key services, and use error budgets to guide operational and product decisions.

  • Automate manual processes.

  • Own and drive the post-mortem review process and the actions arising from incident analysis.

Team Collaboration

  • Collaborate with the support organization as an escalation point, and feed back knowledge and improvement recommendations.

  • Collaborate with developers throughout the lifecycle of changes, from design through rollout and patch delivery, ensuring safe deployments and efficient incident mitigation.

  • Participate in design reviews to ensure services are operable with minimal manual intervention in production (automation, safe deployments, clear runbooks), and share learnings through documentation and feedback.

 

What We Are Looking For

  • 5+ years of experience in software engineering, site reliability, production engineering, or senior technical support roles operating distributed systems.

  • Experience with log analysis and advanced troubleshooting.

  • Basic programming experience (e.g., JS, Go, TypeScript, Java, or C#).

  • Experience deploying and troubleshooting systems on public cloud platforms (Azure preferred).

  • Familiarity with observability tooling (e.g., Elastic, Prometheus, Grafana, OpenTelemetry).

  • Understanding of distributed systems, networking, automation and CI/CD.

Preferred

  • Prior on-call or incident response experience.

  • Background in automation, performance testing, or service scalability.

  • Familiarity with compliance or security best practices.

Note: We are not currently hiring for junior or intern positions, so we ask that junior candidates and interns not apply for this role. If such positions open, they will be posted separately.

Why Join Veeam?

  • Make a high-impact contribution to the architecture and reliability of Veeam's first global SaaS product suite.

  • Help shape a modern SRE organization from the ground up, influencing best practices, tooling, and culture.

  • Collaborate with highly skilled teams across product, cloud engineering, security, and support.

  • Access professional development resources including internal mentorship, technical training platforms, and volunteer days.

  • Enjoy competitive compensation and benefits tailored to local markets in the US, Czechia, India, and Australia.

Join us and help define the future of cloud-native data protection.


Veeam Software is an equal opportunity employer and does not tolerate discrimination in any form on the basis of race, color, religion, gender, age, national origin, citizenship, disability, veteran status or any other classification protected by federal, state or local law. All your information will be kept confidential.

Please note that any personal data collected from you during the recruitment process will be processed in accordance with our Recruiting Privacy Notice.  

The Privacy Notice sets out the basis on which the personal data collected from you, or that you provide to us, will be processed by us in connection with our recruitment processes. 

By applying for this position, you consent to the processing of your personal data in accordance with our Recruiting Privacy Notice.

By submitting your application, you acknowledge that the information provided in your job application and any supporting documents is complete and accurate to the best of your knowledge. Any misrepresentation, omission, or falsification of information may result in disqualification from consideration for employment or, if discovered after employment begins, termination of employment.

Veeam Software leads the market in data resilience, offering robust solutions for data backup, recovery, portability, security, and intelligence. Our platform supports a wide range of environments—including cloud, virtual, physical, SaaS, and Kubernetes—empowering organizations to maintain control over their data, ensuring it’s always protected and available. Trusted by over 550,000 customers globally, Veeam is dedicated to helping businesses not only recover from data loss but thrive beyond it.
