Site Reliability Engineer (Ex-Fidelity Experience)

Role: Site Reliability Engineer (Ex-Fidelity Experience)

Location: Remote

Position Type: Contract

Key Responsibilities

• Design, implement, and manage Kubernetes environments, covering deployment, configuration, monitoring, and troubleshooting

• Build and maintain scalable, reliable infrastructure using infrastructure-as-code principles

• Develop comprehensive monitoring solutions and implement alerting strategies

• Analyze system performance bottlenecks and implement improvements

• Implement and maintain CI/CD pipelines for seamless deployments

• Conduct incident response and root cause analysis, and implement preventive measures

• Create and enhance automation tools leveraging AI/ML where applicable

• Collaborate with development teams to improve application reliability and performance

Required Qualifications

• 5-7 years of experience in SRE or DevOps roles

• Strong expertise with the Kubernetes ecosystem and container orchestration

• Deep understanding of Linux/Unix operating systems and performance analysis tools (such as nmon)

• Experience with log analysis, monitoring systems, and observability tools

• Proficiency in database administration and performance tuning (Oracle, SQL Server)

• Strong programming skills in at least one of Python, Go, Java, or Node.js

• Experience developing automation tools and frameworks

• Proven track record of proactive problem identification and resolution

Preferred Qualifications

• Experience with AI/ML integration into operational workflows

• Cloud platform experience (AWS, GCP, Azure)

• Knowledge of service mesh technologies

• Experience with distributed systems architecture

• Familiarity with security best practices and compliance requirements

Personal Qualities

• Proactive mindset with strong analytical and problem-solving abilities

• Collaborative approach to working across development and operations teams

• Excellent communication skills and ability to explain complex technical concepts

• Self-motivated with the ability to work independently and as part of a team

• Passion for continuous improvement and learning