Healthmap Solutions is hiring a

Site Reliability Engineer

Full-Time
Position Summary:
The Site Reliability Engineer (SRE) will be a key player in building and scaling our cloud infrastructure.
 
Responsibilities: 
  • Architect and manage Amazon Web Services (AWS) cloud environments, including EC2, VPC, S3, and other key resources, ensuring resilience, scalability, and cost-efficiency
  • Lead the design, deployment, and optimization of Kubernetes clusters using AWS EKS, leveraging container orchestration to support the scalability of our applications
  • Collaborate closely with our software engineers to streamline and enhance our CI/CD pipelines, infrastructure as code (IaC) practices, and containerization processes
  • Implement and maintain monitoring and alerting systems (Datadog or similar) to ensure performance, reliability, and early detection of potential issues
  • Manage and oversee high-impact incidents, swiftly troubleshooting and collaborating with cross-functional teams to restore services and ensure operational continuity
  • Strategically plan capacity requirements by analyzing, forecasting, and optimizing cloud infrastructure for future growth while maintaining cost-effectiveness
  • Develop and maintain automation tools that minimize manual tasks and elevate operational efficiency
  • Ensure our cloud infrastructure adheres to best practices in security and compliance, safeguarding our platform and services
  • Perform other duties as assigned
Requirements:
  • Bachelor’s degree in Computer Science or a related field, or equivalent practical experience
  • 5+ years of experience with AWS or other cloud platforms and cloud security
  • AWS certification (such as Solutions Architect) with in-depth knowledge of AWS services such as EC2, VPC, Lambda, and RDS is a plus
  • Knowledge of GitLab and Jenkins is a plus
  • Proven experience managing Kubernetes clusters in production, especially with AWS EKS
Skills:
  • Excellent communication skills
  • Strong analytical and problem-solving skills with a drive for continuous improvement
  • Ability to work with cutting-edge technologies and a desire for continuous learning
  • Strong monitoring capabilities to drive growth and performance 
  • Ability to contribute to a supportive, cross-functional work environment
Travel:
 
Limited travel, scheduled per the needs of the business


#LI-REMOTE
