Job Summary:
We are seeking a highly skilled and motivated Infrastructure Manager to lead our Infrastructure Operations (InfraOps), Application Operations (AppOps), and Site Reliability Engineering (SRE) teams. This pivotal role within our engineering department is responsible for our platform's reliability, scalability, and security, and for fostering a high-performing team culture.
The ideal candidate will bring a strong technical background in infrastructure and operations management, coupled with exceptional leadership and organizational skills.
Main Areas of Responsibility:
1. Team Leadership and Development
- Lead and mentor three teams: InfraOps, AppOps, and SRE.
- Recruit, develop, and retain top talent to ensure a high-performing team.
- Foster a collaborative culture with a strong focus on accountability, innovation, and continuous improvement.
- Define team goals and KPIs aligned with organizational objectives.
2. Infrastructure Management
- Oversee the design, deployment, and maintenance of scalable, reliable, and secure infrastructure.
- Meet uptime SLAs (99.99%, which allows under an hour of downtime per year) through proactive monitoring and incident management.
- Drive automation initiatives to reduce manual work and improve efficiency.
- Manage capacity planning and cost optimization strategies.
3. Application Operations (AppOps)
- Ensure the seamless operation of deployed applications and services.
- Optimize application performance and reliability, working closely with engineering teams.
- Oversee release management processes to minimize downtime and ensure smooth rollouts.
4. Site Reliability Engineering (SRE)
- Implement and uphold SRE practices to enhance platform reliability and scalability.
- Oversee observability initiatives, including logging, monitoring, and alerting frameworks.
- Drive post-incident reviews to identify root causes and implement preventive measures.
5. Security and Compliance
- Collaborate with security teams to enforce best practices across infrastructure and applications.
- Ensure compliance with industry standards and regulations (e.g., ISO 27001, GDPR).
6. Cross-functional Collaboration
- Work closely with engineering, product, and business stakeholders to align infrastructure initiatives with organizational goals.
- Serve as a point of escalation for critical infrastructure and operational issues.
Requirements:
Qualifications and Skills:
Education:
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
Technical Experience:
- Proven experience in infrastructure management, operations, and SRE practices.
- Expertise in cloud platforms (e.g., AWS, Azure, Google Cloud).
- Strong knowledge of automation tools (e.g., Terraform, Ansible, or similar).
- Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes).
- Hands-on experience with CI/CD pipelines and DevOps best practices.
Leadership Experience:
- 5+ years of experience leading multiple teams.
- Proven ability to develop and implement team strategies aligned with company goals.
Soft Skills:
- Exceptional problem-solving and decision-making abilities.
- Strong interpersonal and communication skills.
- Ability to thrive in a fast-paced, dynamic environment.