Associate Manager - Reliability Operations

Hyderabad, India
full-time

AI overview

Lead a team to ensure service level objectives in a demanding 24x7 SaaS environment, collaborating with SRE to drive operational accountability and reliability.
About us

Build the future of banking.

Zeta is a next-generation banking technology company providing cloud-native, fully stackable processing and core banking platforms for issuers. With a focus on scalability, compliance, and innovation, Zeta empowers financial institutions to modernize their technology infrastructure and deliver secure, seamless digital banking experiences.

Our impact runs at real-world scale. Today, over 25 million cards are live on Zeta-powered platforms across 7 countries, supported by a passionate team of 1,700+ Zetanauts across India, the US, EMEA, and Asia. Backed by SoftBank Vision Fund, Mastercard, and other reputed strategic investors, we reached a valuation of $2 billion in 2025.

We build product lines around key outcomes by addressing real customer pain points, modernizing legacy systems, and strengthening core fundamentals. As a result, our systems and platforms support a wide range of banking and payments capabilities, including:

1. Tachyon, our cloud-native banking stack built for population-scale systems.
2. Cipher, our unified authentication platform for secure, high-volume banking environments.
3. Digital Credit as a Service, enabling banks to launch credit lines on UPI.
4. Elena, our intelligent and conversational AI platform for banking.
5. Pixel, India’s first digital-native credit card, launched in partnership with HDFC Bank, for whom we also revamped the PayZapp mobile app: winner of the Celent Model Bank Award for Payments Innovation 2024.
6. Sparrow, the leading card experience for non-prime cardholders in the US.

…and more across cards, payments, lending, and core banking.

We are an engineering-first organization that values ownership, bias for action, and long-term thinking. Together, we solve some of the hardest problems in banking tech. Our culture is built around trust, collaboration, and creating the conditions for you to drive impact proportionate to your potential.

Reinforcing our commitment to creating an inclusive and supportive workplace, we have been consistently recognized as a Great Place to Work.

If you want to build cutting-edge banking tech that enables banks to serve millions reliably, securely, and at population scale, Zeta is your playground.

If you would like to learn more about how we have grown and evolved over the years, watch our journey here. You can also explore our website and follow us on LinkedIn, Instagram, YouTube, and X.

Role
  • The Associate Manager - Reliability Operations leads a team to rigorously uphold service level objectives (SLOs) through expert alert management, SOP-compliant ticket escalations, and coordinated support for SRE-signed deployments across multiple sites.
  • This role drives operational accountability, fosters seamless SRE partnerships, and ensures production stability in a high-stakes 24x7 SaaS environment.
Responsibilities
  • Drives SLO adherence by implementing advanced metric monitoring, enforcing error budgets, and spearheading proactive initiatives to prevent breaches and elevate system reliability.
  • Ensures all alerts receive immediate acknowledgment, with tickets escalated to SRE teams for any issues lacking defined SOPs, systematically reducing escalations, downtime, and MTTR.
  • Coordinates standard deployments across sites following SRE sign-off, overseeing logistics, real-time rollout health monitoring, and rigorous post-deployment SLO validation.
  • Collaborates strategically with SRE teams on deployment planning, comprehensive risk assessments, troubleshooting, and post-release optimizations for flawless execution and rapid recovery.
  • Oversees and refines team processes for alert triage, SOP documentation/updates, and knowledge sharing, integrating automation to minimize manual toil and enhance operational resilience.
  • Mentors staff on SLO-driven decision-making, conducts in-depth audits of alert/ticket workflows, analyzes trends in operational data, and delivers actionable reliability KPI reports to stakeholders.
Skills
  • Proven track record in 24x7 SaaS/cloud support operations, handling high-pressure incidents and customer-impacting events.
  • Strong proficiency in monitoring/incident tools (Prometheus, Grafana, Splunk, PagerDuty) and ticketing systems.
  • Effective leadership and people management, with excellent communication for technical/non-technical collaboration.
  • Analytical skills to interpret operational data, identify trends, and drive process recommendations.
Experience and Qualifications
  • Familiarity with ITIL frameworks, SRE principles (e.g., error budgets, toil reduction), and cloud platforms (AWS, Azure, GCP).
  • Experience with process improvement methodologies and shift handoff protocols.
  • Knowledge of basic reliability concepts and observability stacks.
  • Education: Bachelor's degree in Information Technology, Business, or related field; relevant IT certifications (e.g., ITIL Foundation) are a plus.
  • Experience: 6-8 years in operations support, reliability operations, or IT service management, including 2+ years in supervisory roles managing 24x7 teams.
Shift Information
  • 24x7 Operational Oversight: Role with on-call and shift responsibilities for escalations; provides oversight for 24x7 team operations, including shift scheduling and off-hour incident coordination.

Zeta is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We encourage applicants from all backgrounds, cultures, and communities to apply and believe that a diverse workforce is key to our success.

Zeta Optima is changing how corporates manage employee meal e-vouchers and other digital tax-saving benefits. All Optima grants can be used via app, card, or tag.
