Manager - Reliability Operations

Hyderabad, India
Full-time, On-site

AI overview

Manage a 24x7 team to ensure the reliability and performance of SaaS products while driving operational efficiencies and maintaining high customer support standards.
About Zeta

Build the future of banking. Zeta is a next-generation banking technology company providing cloud-native, fully stackable processing and core banking platforms for issuers. With a focus on scalability, compliance, and innovation, Zeta empowers financial institutions to modernize their technology infrastructure and deliver secure, seamless digital banking experiences.

Our impact runs at real-world scale. Today, over 25 million cards are live on Zeta-powered platforms across 7 countries, supported by a passionate team of 1,700+ Zetanauts across India, the US, EMEA, and Asia. Backed by SoftBank Vision Fund, Mastercard, and other reputed strategic investors, we reached a valuation of $2 billion in 2025.

Our focus is on establishing product lines that focus on key outcomes by addressing real customer pain points, modernizing legacy systems, and strengthening core fundamentals. As a result, our systems and platforms support a wide range of banking and payments capabilities, including:

  1. Tachyon, our cloud-native banking stack built for population-scale systems
  2. Cipher, our unified authentication platform for secure, high-volume banking environments
  3. Digital Credit as a Service, enabling banks to launch credit lines on UPI
  4. Elena, our intelligent and conversational AI platform for banking
  5. Pixel, India’s first digital-native credit card, launched in partnership with HDFC Bank, for whom we also revamped their PayZapp mobile app: Winner of the Celent Model Bank Award for Payments Innovation 2024
  6. Sparrow, the leading card experience for non-prime cardholders in the US

…and more across cards, payments, lending, and core banking.

We are an engineering-first organization that values ownership, bias for action, and long-term thinking. Together, we solve some of the hardest problems in banking tech. Our culture is built around trust, collaboration, and creating the conditions for you to drive impact proportionate to your potential.
Reinforcing our commitment to creating an inclusive and supportive workplace, we have been consistently recognized as a Great Place to Work.

If you want to build cutting-edge banking tech that enables banks to serve millions reliably, securely, and at a population scale, Zeta is your playground.

If you would like to learn more about how we have grown and evolved over the years, watch our journey here. You can also explore our website and follow us on LinkedIn, Instagram, YouTube, and X.

Job Summary:

The Manager - Reliability Operations role is responsible for overseeing the daily operations, availability, and performance of our SaaS products. This role includes managing a 24x7 team of reliability associates who monitor, respond to, and resolve service issues to ensure uninterrupted availability for our customers. The Manager will drive operational efficiencies, improve alert management processes, and uphold high standards of customer support and system reliability.

Responsibilities:
  • Team Leadership and Development: Lead, mentor, and develop a 24x7 operations support team, ensuring the team has the skills, tools, and motivation needed to deliver excellent service. Foster a collaborative and proactive culture focused on problem-solving and customer satisfaction.
  • Operational Management: Oversee daily operational support, including real-time monitoring, incident response, and troubleshooting for SaaS applications. Prioritize and manage escalations, coordinating with other teams to resolve issues promptly.
  • Alert/Incident and Escalation Management: Act as an escalation point for critical alerts/incidents, guiding the team through resolution processes, coordinating with other departments, and ensuring accurate, timely communications to stakeholders.
  • Service and Performance Monitoring: Work with the team to maintain service health dashboards and set up alerts, enabling quick detection and response to potential issues. Ensure adherence to SLAs and focus on optimizing response and resolution times.
  • Deployments and Change Management: Manage the deployment lifecycle of applications. Proactively engage with stakeholders to resolve deployment process issues or challenges.
  • Customer Experience: Partner with customer support teams to provide seamless customer experiences during incidents, ensuring prompt updates and thorough follow-ups on service issues impacting users.
  • Process Optimization: Identify and implement improvements in support and incident management processes to enhance operational efficiency and reduce response times. Focus on standardizing workflows and documentation to build team efficiency.
  • Shift Management: Ensure adequate shift coverage for 24x7 operations. Schedule shifts, manage on-call rotations, and oversee team performance during off-hours.
  • Collaboration: Collaborate with Product and Engineering teams to align support strategies with broader business and product goals. Work with cross-functional teams to provide input on product reliability and stability from an operations perspective.
  • Reporting and Analytics: Provide regular reports on alert, incident, CMR, and ad-hoc request trends, response times, and other key metrics to senior leadership, offering insights and recommendations to improve operational performance.
Skills:
  • Proven experience managing 24x7 support operations for SaaS or cloud-based services.
  • Familiarity with monitoring and incident management tools.
  • Strong leadership and people management skills, with experience handling high-pressure situations and customer-impacting incidents.
  • Excellent communication skills for collaborating with both technical and non-technical teams.
  • Ability to analyse operational data, report trends, and make informed recommendations.
  • Familiarity with ITIL or other service management frameworks.
  • Knowledge of cloud platforms (AWS, Azure, or GCP) and a basic understanding of reliability concepts.
  • Proficiency in monitoring and alerting tools, such as Prometheus, Grafana, Datadog, or Splunk.
  • Experience with process improvement methodologies.
Experience and Qualifications:
  • Education: Bachelor's degree in Information Technology, Business, or a related field; IT certifications are a plus.
  • Experience: 8+ years of experience in operations support, reliability operations, or IT service management, with 2+ years in a supervisory or management role.
Zeta is an equal opportunity employer.

At Zeta, we are committed to equal employment opportunities regardless of job history, disability, gender identity, religion, race, marital/parental status, or any other special status. We are proud to be an equitable workplace that welcomes individuals from all walks of life if they fit the roles and responsibilities.

    Zeta Optima is changing how corporates manage employee meal e-vouchers and other digital tax-saving benefits. All Optima grants can be used via app, card, or tag.
