FBS Site Reliability Engineer

Our Client is one of the United States’ largest insurers, providing a wide range of insurance and financial services products with gross written premiums well over US$25 billion (P&C). They proudly serve more than 10 million U.S. households with more than 19 million individual policies across all 50 states through the efforts of over 48,000 exclusive and independent agents and nearly 18,500 employees. Finally, our Client is part of one of the largest insurance groups in the world.

Job Summary

This position focuses on infrastructure and code reviews to ensure that the solutions built and delivered are highly available and that unplanned downtime is minimized.

Key Responsibilities

• Act as an expert troubleshooter within IT, with broad technical experience across multiple IT disciplines, and support our Incident and Problem Management teams.

• Understand the root cause of each incident and the tasks needed to ensure it does not recur.

• Validate the root cause of incidents in nonproduction regions, then work with teams to determine the best approach to resolve them.

• Participate in chaos testing, in which a third-party tool disables functions on a server and we verify that we can alert teams to the failure, then assemble a technical troubleshooting call to identify the fault and restore the service.

• Leverage the observability tool set to define key transactions and observe their performance within systems.

• Create golden signal reporting and error budgets for development teams. Knowledge of the golden signals framework is a must.

• Perform failure analysis, using chaos testing practices to break nonproduction systems and find weak points, and work with infrastructure and development teams to improve application resilience.

Requirements

• At least 6 years of experience in a similar role as a Reliability Engineer or Resilience Engineer

• Full English fluency

• BS in Computer Science or similar

• Very strong coding experience (writing and testing code, leveraging observability processes), ideally in Java or C++

• Hands-on approach, troubleshooting skills, and a very technical background

Technical & Business Skills

  • Site Reliability Engineer - Advanced
  • Trend & Pattern Analysis, Optimization – Advanced
  • Resilience Engineering – Advanced
  • Golden Signal Cyber Reliability (MUST)
  • Dynatrace - Intermediate (4-6 Years). Desirable, not a must; any other observability tool is acceptable
  • Gremlin - Entry Level (1-3 Years). Chaos testing and failure modeling experience, or similar (Very Desirable)
  • Cloud Infrastructure Experience (AWS / Azure / GCP) - Intermediate (4-6 Years)
  • Strong coding experience

Benefits

This position comes with a competitive compensation and benefits package:

  1. Competitive salary and performance-based bonuses
  2. Comprehensive benefits package
  3. Career development and training opportunities
  4. Flexible work arrangements (remote and/or office-based)
  5. Dynamic and inclusive work culture within a globally renowned group
  6. Private Health Insurance
  7. Pension Plan
  8. Paid Time Off
  9. Training & Development

About Capgemini

Capgemini is a global leader in partnering with companies to transform and manage their business by harnessing the power of technology. The Group is guided every day by its purpose of unleashing human energy through technology for an inclusive and sustainable future. It is a responsible and diverse organization of over 340,000 team members in more than 50 countries. With its strong 55-year heritage and deep industry expertise, Capgemini is trusted by its clients to address the entire breadth of their business needs, from strategy and design to operations, fueled by the fast-evolving and innovative world of cloud, data, AI, connectivity, software, digital engineering and platforms. The Group reported €22.5 billion in revenues in 2023.

Get the future you want

At Capgemini, we are driven by a shared purpose: Unleashing human energy through technology for an inclusive and sustainable future.

Technology shapes the way we live our lives. How we work, learn, move and communicate. That means our technology expertise, combined with our business knowledge, does more than help you transform and manage your business. It can help you realize a better future and create a more sustainable, inclusive world.

It’s a responsibility we don’t take lightly. That’s why, since our inception more than 50 years ago, we have always acted as a partner to our clients, not a service provider. A diverse collective of nearly 350,000 strategic and technological experts across more than 50 countries, we are all driven by one shared passion: to unleash human energy through technology.

As we leverage cloud, data, AI, connectivity, software, digital engineering, and platforms to address the entire breadth of business needs, this passion drives a powerful commitment. To unlock the true value of technology for your business, our planet, and society at large. From advancing the digital consumer experience, to accelerating intelligent industry and transforming enterprise efficiency, we help you look beyond ‘can it be done?’ to define the right path forward to a better future.
