Bengaluru, India

Full-Time

About the Team

When 5% of Indian households shop with us, it’s important to build resilient systems to manage millions of orders every day. We’ve done this – with zero downtime! 😎

Sounds impossible? Well, that’s the kind of engineering muscle that has helped Meesho become the e-commerce giant it is today. We value speed over perfection and see failures as opportunities to get better. We’ve taken steps to instil a strong ‘Founder’s Mindset’ across our engineering teams, helping us grow and move fast.

We place special emphasis on the continuous growth of each team member, and we do this through regular 1-1s and open communication. As a Site Reliability Engineer (SRE), you’ll be part of a team of self-starters who thrive on teamwork and constructive feedback.

We know how to party as hard as we work! If we aren’t building unparalleled tech solutions, you can find us debating the plot points of our favourite books and games – or even gossiping over chai. So, if a day filled with building impactful solutions with a fun team sounds appealing to you, join us.

About the Role

As a Site Reliability Engineer, you’ll contribute to the company’s growth by proactively planning, building, establishing and following site reliability engineering practices. You’ll also work closely with other engineering teams and build well-oiled collaborations across the organisation to ensure seamless reliability and scalability.

As an SDE2 SRE, you will manage and support the company’s business-as-usual (BAU) operations, handle observability across business units, and conduct proofs of concept (POCs) to meet the requirements of engineering teams. You will also be responsible for ensuring that the uptime targets of Meesho’s systems are met, and you will take part in incident management.

What you will do

Manage and maintain day-to-day BAU operations, including monitoring system performance, troubleshooting issues, and ensuring high availability.
Contribute to automating and platformizing repetitive tasks by building supporting software with both conventional and advanced technologies.
Collaborate with engineering teams to support their observability requirements, and maintain the observability stack (Grafana, VMStack, etc.).
Develop and maintain automation scripts and tools to streamline operational processes.
Maintain SLIs and SLOs, and ensure they adhere to the defined SLAs.
Ensure observability of system uptime is in place, and take the necessary actions to triage issues with the respective service teams and stakeholders.
Manage the org-wide observability setup, including metrics and logging, while building capability and becoming well-versed in PromQL queries.
Collaborate with engineering teams to provide quicker solutions during firefighting and help improve the overall process.
Support the DevOps team in managing BAU operations.
Monitor and analyze system logs and performance metrics to identify areas for improvement and take proactive measures.
Understand the incident management process and alerting systems such as PagerDuty and Alertmanager.
Stay up to date with industry trends and best practices in SRE, observability, alerting and infrastructure automation.
Actively participate in rotational on-call duties to ensure continuous operational support.

What you will need

At least 3 years of experience as a DevOps engineer or SRE.
Strong knowledge of cloud computing platforms such as AWS or Google Cloud.
Experience with containerization technologies such as Docker and orchestration tools like Kubernetes.
Proficiency in programming and scripting languages such as Java, Python, Go, JavaScript, and Bash.
Ability to develop and maintain software written in any programming language.
Understanding of infrastructure-as-code (IaC) tools such as Terraform, Ansible or CloudFormation.
A development mindset and excellent coding skills, with hands-on experience in API/UI development (mandatory).
Excellent communication and collaboration skills.
Bachelor's degree in Computer Science or IT from a premier university.

Meesho is hiring a Site Reliability Engineer II.