Sr. Site Reliability Engineer, Incident Excellence

US (Remote)

Our Organization

HashiCorp helps solve development, operations, and security challenges in infrastructure so organizations can focus on business-critical tasks. We build products to give organizations a consistent way to manage their move to cloud-based IT infrastructures for running their applications.

We use the Tao of HashiCorp as our guiding principles for product development and operate according to a strong set of company principles for how we interact with each other. We value top-notch collaboration and communication skills, both among internal teams and in how we interact with our users.

Our Team

The HashiCorp Incident Excellence team is responsible for improving HashiCorp’s incident response while maximizing learning from incidents. Our focus is on helping all engineers feel confident when they are on-call and improving communication to efficiently resolve incidents and build trust in our brand. We partner closely with teams to drive a holistic incident management strategy and share learnings to help our business continuously improve.

About this Role

This engineering role is on a nascent engineering team. The team is responsible for products that touch many areas of engineering organizations at HashiCorp, so applicants will need to excel at collaboration, have product-focused mindsets, and be comfortable iterating in an agile manner towards solutions.

In this role, you can expect to:

Use your professional software engineering experience to solve problems, build automation, and create components of our incident lifecycle management processes.
Coordinate disaster recovery processes and identify strategic process improvements.
Be responsible for and drive incident management capabilities and culture.
Participate in incident command on-call rotation.
Support incident management tooling.
Build technical skills and relationships within a team of engineers and SREs.
Learn, teach, and collaborate cross-functionally.

You may be a good fit for our team if:

You have professional experience designing or operating disaster recovery processes in a distributed cloud environment.
You have professional experience with incident management in cloud environments.
You enjoy working on a variety of scopes spanning software engineering, cloud infrastructure, and SRE.
You have experience contributing to efficiency improvements of software at scale.
You have experience collaborating cross-functionally to deliver engineering culture change.
You have worked, with empathy and compassion, on infrastructure teams in customer-centric and agile organizations.
You have worked with SaaS or another type of managed software offering.
You have experience in one or more of the major public clouds.

Individual pay within the range will be determined based on job-related factors such as skills, experience, and education or training.

The base pay range for this role in the SF Bay Area / NYC area is:

$176,500–$207,600 USD

The base pay range for this role in Seattle Metro, Denver / Boulder Metro, New York (excluding NYC), Washington D.C., or California (excluding SF Bay Area) is:

$161,800–$190,300 USD

The base pay range for this role in Colorado (excluding Denver / Boulder Metro) and Washington (excluding Seattle Metro) is:

$147,100–$173,000 USD
