Senior System Reliability Engineer, AI Platform

Veeam, the #1 global market leader in data resilience, believes businesses should control all their data whenever and wherever they need it. Veeam provides data resilience through data backup, data recovery, data portability, data security, and data intelligence. Based in Seattle, Veeam protects over 550,000 customers worldwide who trust Veeam to keep their businesses running. Join us as we move forward together, growing, learning, and making a real impact for some of the world’s biggest brands. The future of data resilience is here - go fearlessly forward with us.

About the Role

As a Senior System Reliability Engineer, AI Platform, you will play a key role in ensuring the stability, security, and scalability of Veeam’s intelligent automation ecosystem. You’ll partner with engineering, security, and platform teams to build governance frameworks and resilient infrastructure that support both professional and citizen developers. Your work will help enable safe, scalable AI-driven automation across the organization while maintaining strong operational reliability and compliance.

What You’ll Do

  • Design and implement governance frameworks for automation and AI platforms across the enterprise
  • Manage deployment, scaling, and reliability of automation tools and supporting infrastructure
  • Build monitoring, observability, and alerting systems to ensure platform health and performance
  • Support automated incident response and recovery workflows to improve platform resilience
  • Drive lifecycle management practices to enable smooth promotion of automation assets to production
  • Ensure secure integrations, identity management, and compliance with company security standards
  • Collaborate with engineering and security teams to improve platform reliability and operational efficiency

Technologies You'll Work With

  • Microsoft Azure
  • Microsoft Power Platform
  • Copilot Studio
  • Microsoft Foundry
  • Automation platforms such as n8n, Zapier, or similar tools
  • Observability and monitoring platforms
  • Identity and security tooling

What You’ll Bring

  • 7+ years of experience in site reliability engineering, cloud architecture, or systems engineering
  • Strong expertise in Azure cloud services and enterprise cloud environments
  • Hands-on experience with Microsoft Power Platform, Copilot Studio, or similar automation ecosystems
  • Experience managing and scaling automation platforms in complex environments
  • Proficiency in scripting or automation using modern tools or languages
  • Ability to design secure, scalable platform architectures
  • Strong collaboration skills and experience working across engineering and security teams

Bonus Skills

  • Experience building governance frameworks or platform operating models
  • Familiarity with enterprise observability, incident automation, or self-healing systems
  • Knowledge of identity architecture, DLP strategies, or tenant-level security models
  • Experience supporting citizen developer platforms or internal automation programs
  • Exposure to AI platform operations or enterprise automation strategy

What You’ll Get 

  • Two weeks of paid vacation, 12 statutory holidays, plus 4 extra global VeeaMe Days for self-care and 24 paid volunteer hours annually through Veeam Cares
  • Paid parental leave: 8 days for fathers, 122 days for birthing parents, 92 days for adoptive parents
  • Medical, dental, and vision coverage fully funded through INS Premium for employees and dependents
  • Mental health support, therapy sessions, and virtual care via our Employee Assistance Program
  • Retirement and social security contributions through Costa Rica’s statutory programs
  • Life insurance equal to 24x monthly salary, plus disability and funeral coverage
  • Daily cafeteria subsidy
  • Fertility, adoption, and surrogacy support
  • Opportunities to learn and grow through on-demand libraries (LinkedIn Learning, O’Reilly), mentoring, workshops, and learning events like our annual Global Day of Learning

Please note: The position is based in San Jose, Costa Rica. If the applicant is permanently located outside of Costa Rica, Veeam reserves the right to decline the application. All applications must be submitted in English.

Veeam Software is an equal opportunity employer and does not tolerate discrimination in any form on the basis of race, color, religion, gender, age, national origin, citizenship, disability, veteran status or any other classification protected by federal, state or local law. All your information will be kept confidential.

Please note that any personal data collected from you during the recruitment process will be processed in accordance with our Recruiting Privacy Notice.  

The Privacy Notice sets out the basis on which the personal data collected from you, or that you provide to us, will be processed by us in connection with our recruitment processes. 

By applying for this position, you consent to the processing of your personal data in accordance with our Recruiting Privacy Notice.

By submitting your application, you acknowledge that the information provided in your job application and any supporting documents is complete and accurate to the best of your knowledge. Any misrepresentation, omission, or falsification of information may result in disqualification from consideration for employment or, if discovered after employment begins, termination of employment.

Veeam®, the #1 global market leader in data protection and ransomware recovery, is on a mission to empower every organization to not just bounce back from a data outage or loss but bounce forward. With Veeam, organizations achieve radical resilience through data security, data recovery, and data freedom for their hybrid cloud. The Veeam Data Platform delivers a single solution for cloud, virtual, physical, SaaS, and Kubernetes environments that gives IT and security leaders peace of mind that their apps and data are protected and always available. Headquartered in Seattle with offices in more than 30 countries, Veeam protects over 450,000 customers worldwide, including 74% of the Global 2000, who trust Veeam to keep their businesses running.
