Xero is hiring a

Lead Engineer - Site Reliability (Metrics and Monitoring)

Melbourne, Australia
Full-Time

About the role:

As a Lead Engineer within the Site Reliability Engineering (SRE) Metrics & Monitoring team you’ll have a thorough knowledge of industry-leading observability practices and extensive hands-on experience. You’ll have a proven ability to provide strong technical mentorship, guiding engineers to upskill and enabling a focus on continual improvement. Working closely with the Product Manager and Team Lead, you’ll contribute your technical expertise and leadership to align team deliverables with the wider SRE and Xero initiatives.

You’ll help to adapt and grow observability at Xero, informed by a strong understanding of systems and reliability engineering and modern SRE principles. You’ll drive uplift in observability at Xero, paving the way for engineering teams to adopt OpenTelemetry. You’ll be a strong advocate for the customer while contributing to the technical direction and roadmap for SRE products. You’ll model a growth mindset and help improve our services by identifying gaps, promoting capability growth, discovering technical solutions to business problems and championing modern practices.

What you'll do:

  • Design systems to improve adoption of Xero's observability tools with a strong focus on reducing toil in managing our monitoring and logging platforms.
  • Develop and grow engineers through technical mentoring and coaching.
  • Provide leadership around observability standards and practices.
  • Create systems that support and enable our product teams to uplift their observability practices.
  • Improve the implementation of system instrumentation as and when required.
  • Be a key member of the pod leadership, contributing to technical strategy, feasibility, backlog management and enabling delivery.
  • Participate in the wider SRE team on-call roster responding to Xero-wide incidents.
  • Empower other engineering teams at Xero to achieve a high standard of system awareness so they can create efficient, scalable and reliable applications for Xero's customers.

What you'll bring:

  • Experience with agile software development methodology including continuous integration and delivery.
  • An understanding of how solutions architecture or architecture design works in a large software delivery organisation.
  • Experience building and implementing observability with large distributed cloud environments (ideally AWS).
  • Excellent knowledge of reliability and observability concepts and practices.
  • An understanding of OpenTelemetry and how it works.
  • Experience being on call and helping to resolve production incidents in a complex environment.
  • Experience in instrumenting applications and integrating with monitoring solutions like New Relic, Datadog, Dynatrace, SignalFx, Scalyr, Sumo Logic or Splunk (ideally New Relic).
  • Proficiency in one or more object-oriented programming languages such as C#, JavaScript, Golang or Python.
  • Experience with DevOps tooling, e.g. Linux, Docker, Kubernetes, IaC, CI/CD tools.
  • The ability to help structure work to make optimal use of the team’s resources.
  • The ability to set quarterly and annual objectives for the team in collaboration with the Product Manager and Team Lead.
  • Proven ability to engage, influence and build relationships with internal stakeholders.
  • Experience in managing and maintaining healthy observability platforms for a large user base.
