Senior Site Reliability Engineer, Cloud Platform

About Woven by Toyota
Woven by Toyota is enabling Toyota’s once-in-a-century transformation into a mobility company. Inspired by a legacy of innovating for the benefit of others, our mission is to challenge the current state of mobility through human-centric innovation, expanding what “mobility” means and how it serves society. Our work centers on four pillars: AD/ADAS, our autonomous driving and advanced driver assist technologies; Arene, our software development platform for software-defined vehicles; Woven City, a test course for mobility; and Cloud & AI, the digital infrastructure powering our collaborative foundation. Business-critical functions empower these teams to execute, and together, we’re working toward one bold goal: a world with zero accidents and enhanced well-being for all.

=========================================================================
TEAM
Our mission is to make software development across Woven by Toyota and the greater Toyota organization more productive and efficient. We use the latest technologies to help engineering teams go faster, with safety as our top priority. Our modern, agile, and transparent services are designed to bring to life Woven by Toyota's vision of "Mobility to Love, Safety to Live."

WHO ARE WE LOOKING FOR?
The Stargate SRE team collaborates with the product development team, sharing the same codebase, but with a primary focus on non-functional requirements. Our objective is to enhance production readiness and reliability. We are looking for a Senior SRE with a background in software engineering, observability, and cloud engineering. You are passionate about establishing SRE best practices, and you will report to our SRE Manager. This role is hybrid, requiring on-site presence three days per week.

RESPONSIBILITIES
  • Provide technical leadership to the team by guiding technical decision‑making, supporting roadmap planning, enabling effective cross‑team collaboration, and offering ongoing mentorship
  • Develop software systems for observability platforms, and enhance product monitoring, reliability, and development efficiency
  • Take on-call responsibilities to monitor and respond to incidents, ensuring service health. Our 8-hour on-call rotation includes workdays, weekends, and holidays, and can be done remotely
  • Provide L2 support for user requests, assisting customers with troubleshooting product usage issues
  • Learn from incidents through blameless post-mortems and address service reliability issues through hands-on coding
  • Establish SRE best practices within product teams, including capacity planning, chaos testing, and disaster recovery drills
  • Improve the efficiency of development and operations teams by reducing toil through automation

MINIMUM QUALIFICATIONS
  • 7+ years of experience in roles such as SRE, DevOps, cloud engineering, observability engineering, or backend development
  • Intermediate to advanced skills in Go, Python, or comparable programming languages, coupled with solid expertise in data structures, algorithms, and software design principles
  • Intermediate to advanced level of expertise in public cloud technologies, Kubernetes, and Infrastructure as Code
  • Proficient in production on-call, troubleshooting, and incident management
  • Business-level English skills

NICE TO HAVES
  • Hands-on experience with SRE best practices, including SLO monitoring, disaster recovery planning, chaos testing, capacity planning, automation, toil reduction, and more
  • Experience with APM solutions and monitoring systems such as Prometheus, Grafana, and GCP monitoring
  • Previous experience as a technical lead or team lead within SRE, DevOps, or Platform Engineering teams
  • AWS, GCP, or Kubernetes certifications
  • Japanese language skills for communicating with customers

=========================================================================
    Important Points
    ・All interviews will be arranged via Google Meet, unless otherwise stated.
    ・The same job description is available in both English and Japanese; therefore, we kindly ask that you apply to only one version.
    ・We kindly request that you submit your resume in English, if possible. However, Japanese resumes are also acceptable. Please note that, depending on the English proficiency requirements of the role, we may request an English version of your resume later in the process.

    WHAT WE OFFER
    ・Competitive Salary - Based on experience
    ・Work Hours - Flexible working time
    ・Paid Holiday - 20 days per year (prorated)
    ・Sick Leave - 6 days per year (prorated)
    ・Holiday - Sat & Sun, Japanese National Holidays, and other days defined by our company
    ・Japanese Social Insurance - Health Insurance, Pension, Workers’ Comp, Unemployment Insurance, and Long-term Care Insurance
    ・Housing Allowance
    ・Retirement Benefits
    ・Rental Car Support
    ・In-house Training Program (software study/language study)

    Our Commitment
    ・We are an equal opportunity employer and value diversity.
    ・Any information we receive from you will be used only in the hiring and onboarding process. Please see our privacy notice for more details.
