Lead the Middleware & Messaging Operations team, overseeing the management and stability of messaging systems (Kafka, RabbitMQ, Redis) and other middleware supporting software applications.
Ensure the continuous health, performance, and availability of critical messaging infrastructure through proactive monitoring, alerting, and automation.
Develop and execute Business Continuity Planning (BCP) and Disaster Recovery (DR) strategies, ensuring the messaging infrastructure is resilient to outages and disasters.
Collaborate with cross-functional teams to implement self-service tools for messaging configuration, empowering developers to manage configuration through version control systems such as Git.
Drive capacity planning, performance optimization, and security enforcement across all messaging platforms, ensuring future scalability and regulatory compliance.
Conduct incident response, root cause analysis, and continuous improvement efforts to optimize performance and prevent recurring issues.
Foster a culture of automation and operational excellence, identifying and eliminating manual tasks through scripting and tooling.
Implement and maintain self-service configuration tools that allow teams to manage Kafka configurations through version control systems such as Git.
Maintain detailed documentation of all messaging systems, configurations, and processes, and encourage knowledge sharing within and outside the team.
Provide 24x7 on-call support, ensuring incidents are handled swiftly and downtime is minimized.
Educational background: Bachelor’s or Master’s degree in Computer Science or a related field.
3+ years of proven experience managing messaging platforms such as Kafka, RabbitMQ, and Redis in a production environment.
Experience managing small teams, including people leadership, mentoring, and resource allocation.
Demonstrated success in leading and scaling operations teams, with a strong focus on reliability and performance.
Expertise in business continuity planning, disaster recovery, capacity planning, and incident management.
Strong experience with automation tools and scripting (e.g., Python, Ansible, Terraform, GitHub Actions, Jira Groovy) for operational tasks.
Familiarity with security best practices, including securing data in transit and managing access controls.
Excellent communication skills, with the ability to collaborate with development and operations teams.
Proven track record in project management, with the ability to lead cross-functional initiatives and deliver results on time.
Nice-to-Haves:
Experience with containers and container orchestration platforms (e.g., Docker, Kubernetes).
Experience with other messaging platforms such as RabbitMQ or Redis.
Prior experience in setting up self-service configuration management systems using GitOps.
Experience with Solr and other middleware.
Who you are:
A proactive leader who thrives on solving complex operational challenges.
Detail-oriented and committed to ensuring high availability, performance, and security.
Passionate about fostering a collaborative and knowledge-sharing environment.
Comfortable leading through ambiguity and keeping the team focused on critical tasks.
A forward thinker, constantly seeking opportunities to automate and improve processes.