Role: Site Reliability Engineer (prior Fidelity experience)
Location: Remote
Position Type: Contract
Key Responsibilities
• Design, implement, and manage Kubernetes environments across deployment, configuration, monitoring, and troubleshooting
• Build and maintain scalable and reliable infrastructure using infrastructure as code principles
• Develop comprehensive monitoring solutions and implement alerting strategies
• Analyze system performance bottlenecks and implement improvements
• Implement and maintain CI/CD pipelines for seamless deployments
• Conduct incident response and root cause analysis, and implement preventative measures
• Create and enhance automation tools leveraging AI/ML where applicable
• Collaborate with development teams to improve application reliability and performance
Required Qualifications
• 5-7 years of experience in SRE or DevOps roles
• Strong expertise with Kubernetes ecosystem and container orchestration
• Deep understanding of Linux/Unix operating systems and performance analysis tools (NMON, etc.)
• Experience with log analysis, monitoring systems, and observability tools
• Proficiency in database administration and performance tuning (Oracle, SQL Server)
• Strong programming skills in at least one of: Python, Go, Java, or Node.js
• Experience developing automation tools and frameworks
• Proven track record of proactive problem identification and resolution
Preferred Qualifications
• Experience with AI/ML integration into operational workflows
• Cloud platform experience (AWS, GCP, Azure)
• Knowledge of service mesh technologies
• Experience with distributed systems architecture
• Familiarity with security best practices and compliance requirements
Personal Qualities
• Proactive mindset with strong analytical and problem-solving abilities
• Collaborative approach to working across development and operations teams
• Excellent communication skills and ability to explain complex technical concepts
• Self-motivated with the ability to work independently and as part of a team
• Passion for continuous improvement and learning
Axiom is a global information technology, consulting and outsourcing company and services provider. Our IT solutions empower organizations and individuals throughout the world to maximize value and quality to succeed in today's challenging business environment. As a fast-growing new economy company, we focus our strengths to offer world-class solutions and services through the convergence of technology, innovation, expertise and experience. We provide software consulting, development and IT-enabled services to clients across the globe. We work towards delivering sustained value creation for customers, employees, industries and society at large. Core offerings include data warehousing, middleware development, product development and web-enablement of legacy applications in verticals like telecom, finance, healthcare, manufacturing, energy & utilities, and retail & distribution. Relentless exploration of technology horizons and a Global Delivery Model that is a judicious combination of onsite, offsite and offshore development together offer a complete range of high-ROI business solutions spanning the consulting, technology, operations and process outsourcing value chain.