Exadel Inc is hiring a

Middle Site Reliability Engineer

We're looking for a talented Middle Site Reliability Engineer to be embedded within the product development team and manage the overall reliability and availability of its applications.

Work at Exadel - Who We Are 

Since 1998, Exadel has been engineering its products and custom software for clients of all sizes. Headquartered in Walnut Creek, California, Exadel has 2,000+ employees in development centers across America, Europe, and Asia. Our people drive Exadel’s success and are at the core of our values.

About the Customer

The leading provider of vehicle lifecycle solutions, with headquarters in Chicago, enables the companies that build, insure, and replace vehicles to power the next generation of transportation. Its platform delivers advanced mobile, artificial intelligence, and car technologies. It connects a network of 350+ insurance companies, 24,000+ repair facilities, hundreds of parts suppliers, and dozens of third-party data and service providers. The customer's collective solutions enhance productivity and help clients deliver better experiences for end consumers.

Project Tech Stack

Java 8, Java 11, Spring MVC, Spring Boot, Docker, Kubernetes, WebLogic, Oracle, Postgres, Query, Hibernate, Vue.js, AngularJS

Requirements

  • Understanding of troubleshooting Java web applications (issue resolution, escalations)
  • Proficiency in the full software delivery lifecycle
  • Experience implementing AWS CloudWatch is preferred (experience with similar solutions from other cloud providers may also be considered)
  • Knowledge of Kubernetes
  • Background in using application monitoring tools (for example, Grafana, Prometheus, AppDynamics, Dynatrace, Datadog, or similar)
  • Skills in managing deployment pipelines using tools such as Jenkins and/or GitHub
  • Ability to demonstrate strong skills in observability implementation on large-scale enterprise web applications and microservice frameworks 
  • Capability to analyze and troubleshoot complicated, cross-platform issues by handling OS, Networking, Database (SQL), and applications in cloud-based environments
  • Proven ability to dig through metrics, logs, and other available sources to triage and resolve an incident
  • Capacity to document solutions, SRE architectural patterns, and best practices to ensure that teams have guidance as needed

Nice to Have

  • Background in Java web application development
  • Any OpenTelemetry (OTel) implementation experience
  • Proficiency in infrastructure-as-code practices
  • Knowledge of building CI/CD pipelines from scratch

Responsibilities

  • Monitor application/infrastructure and take steps to improve overall system software performance, availability, and reliability by incorporating changes through defined feedback loops within the software delivery lifecycle
  • Configure and maintain the monitoring tooling for the target application
  • Document tribal knowledge as you acquire it over time by creating runbooks/playbooks and ensuring critical system information is readily available to those who need it through dashboards
  • Resolve NOC escalations and help prevent the recurrence of incidents by creating processes and automation
  • Apply automation to any tasks/parts of the system that are performed manually
  • Work closely with software developers and testers to ensure the product meets non-functional requirements such as security, performance, and availability
  • Be a key part of our response to high-severity internal customer incidents, ensuring we meet all SLAs and SLOs
  • Embrace failures and treat incidents as learning opportunities by conducting blameless postmortems
  • Participate in product engineering stand-ups and related design activities
  • Coach other team members to ensure systems are supported by following SRE best practices

Advantages of Working with Exadel

Exadel is a global company, and benefits can vary depending on your location and contract type. Your recruiter will provide specific information about the benefits available to you.

  • International projects
  • In-office, hybrid or remote mode
  • Medical healthcare
  • Recognition program
  • Professional & personal development opportunities
  • Foreign language classes
  • Well-being program
  • Corporate events
  • Sports compensation
  • Referral program
  • Equipment provision
  • Paid vacation & sick days