HighLevel

Lead Site Reliability Engineer

New Delhi, India

Full-Time, Remote

About HighLevel:

HighLevel is a cloud-based, all-in-one white-label marketing and sales platform that empowers marketing agencies, entrepreneurs, and businesses to elevate their digital presence and drive growth. With a focus on streamlining marketing efforts and providing comprehensive solutions, HighLevel helps businesses of all sizes achieve their marketing goals. We currently have ~1200 employees across 15 countries, working remotely as well as at our headquarters in Dallas, Texas. Our goal as an employer is to maintain a strong company culture, foster creativity and collaboration, and encourage a healthy work-life balance for our employees wherever they call home.

Our Website - https://www.gohighlevel.com/

YouTube Channel - https://www.youtube.com/channel/UCXFiV4qDX5ipE-DQcsm1j4g

Blog Post - https://blog.gohighlevel.com/general-atlantic-joins-highlevel/

Our Customers:

HighLevel serves a diverse customer base, including over 60K agencies and entrepreneurs and 500K businesses globally. Our customers range from small and medium-sized businesses to enterprises, spanning various industries and sectors.

Scale at HighLevel:

We operate at scale, handling over 40 billion API hits and 120 billion events monthly, with more than 500 microservices in production. Our systems manage 200+ terabytes of application data and 6 petabytes of storage.

About the Role:

We are looking for a Lead Site Reliability Engineer to join our team and help ensure the availability, performance, and scalability of our critical systems. You will work closely with development and operations teams to automate processes, enhance system reliability, and improve observability.

Requirements:

Experience: 5+ years in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles

Cloud Expertise: Hands-on experience with GCP and AWS

Infrastructure as Code (IaC): Terraform, Helm, or equivalent tools

Containerisation & Orchestration: Docker, Kubernetes (GKE)

Observability: Experience with Prometheus, Grafana, ELK, OpenTelemetry, or similar monitoring/logging tools

Programming/Scripting: Proficiency in Python, Bash, or other shell scripting. Basic understanding of parsing API responses and manipulating JSON

CI/CD Pipelines: Hands-on experience with Jenkins, GitHub Actions, ArgoCD, or similar tools

Incident Management: Experience with on-call rotations, SLOs, SLIs, SLAs, escalation policies, and incident resolution

Databases: Experience monitoring MongoDB, Redis, ES, queue-based systems, etc.

Responsibilities:

Develop and improve observability using monitoring, logging, tracing, and alerting tools (Prometheus, Grafana, ELK, OpenTelemetry, etc.)

Optimize system performance, troubleshoot incidents, and conduct post-mortems/RCA to prevent future issues

Collaborate with developers to enhance application reliability, scalability, and performance

Drive cost optimisation efforts in cloud environments

Monitor multiple databases and data stores (MongoDB, Redis, ES, queue-based systems, etc.)

Provide technical leadership and mentorship to SRE team members, fostering a culture of continuous learning and knowledge sharing in site reliability practices

EEO Statement:

The company is an Equal Opportunity Employer. As an employer subject to affirmative action regulations, we invite you to voluntarily provide the following demographic information. This information is used solely for compliance with government recordkeeping, reporting, and other legal requirements. Providing this information is voluntary and refusal to do so will not affect your application status. This data will be kept separate from your application and will not be used in the hiring decision.


