Site Reliability Engineer (SRE)

Hyderabad, India
Full-time

About ProArch:

At ProArch, we partner with businesses around the world to turn big ideas into better outcomes through IT services that span cybersecurity, cloud, data, AI, and app development. We’re 400+ team members strong across 3 countries (we call ourselves ProArchians)—and here’s what connects us all:

  • A love for solving real business problems
  • A belief in doing what’s right

What’s it like to work here?

  • You’ll keep growing. You’ll work alongside domain experts who love to share what they know.
  • You’ll be supported, heard, and trusted to make an impact.
  • You’ll take on projects that touch industries, communities, and lives.
  • You’ll have the time to focus on what matters most in your life outside of work.

At ProArch, you’ll be part of teams that design and deliver technology solutions solving real business challenges for our clients. With services spanning AI, Data, Application Development, Cybersecurity, Cloud & Infrastructure, and Industry Solutions, your work may involve building intelligent applications, securing business‑critical systems, or supporting cloud migrations and infrastructure modernization. 

Every role here contributes to shaping outcomes for global clients and driving meaningful impact. You'll collaborate with experts across data, AI, engineering, cloud, cybersecurity, and infrastructure, solving complex problems with creativity, precision, and purpose. You'll join a culture rooted in technology, curiosity, and continuous learning: a place where we move fast, trust you to make an impact, encourage innovation, and support your growth.

ProArch is looking for a passionate and skilled Site Reliability Engineer (SRE) to join our team. As an SRE, you will be responsible for ensuring the reliability, availability, and performance of our systems and services. You will collaborate with various teams to optimize production environments, troubleshoot performance issues, and implement best practices for service reliability. Your contributions will be critical to improving system uptime and enhancing user satisfaction.

Key Responsibilities:

  • Monitor system performance and reliability, ensuring uptime meets organizational SLAs.
  • Implement and maintain observability tools to gather metrics and logs for proactive issue detection.
  • Troubleshoot and resolve complex production issues across various components of our infrastructure.
  • Collaborate with software engineering teams to design and implement scalable, fault-tolerant architectures.
  • Develop and maintain automation scripts for deployment, monitoring, and system management.
  • Participate in on-call rotation to respond to production incidents and perform root cause analysis.
  • Contribute to capacity planning and performance tuning to ensure optimal resource utilization.
  • Document infrastructure, processes, and incident responses to promote knowledge sharing.

Requirements

Required Qualifications:

  • 8+ years of experience as a Site Reliability Engineer, DevOps Engineer, or related role.
  • Strong experience with cloud providers such as AWS, Azure, or GCP.
  • Proficiency in scripting and programming languages such as Python, Bash, or Go.
  • Experience with container orchestration tools like Kubernetes.
  • Familiarity with CI/CD pipelines and tools (e.g., Jenkins, GitLab CI).
  • Solid understanding of networking and security principles.
  • Experience with monitoring and logging tools such as Prometheus, Grafana, or the ELK Stack.
  • Excellent problem-solving skills and a proactive attitude.
  • Strong communication and teamwork skills, with an emphasis on collaboration.

Preferred Qualifications:

  • Experience with Infrastructure as Code (IaC) tools such as Terraform or CloudFormation.
  • Knowledge of service mesh architectures and modern microservices patterns.
  • Background in software development and familiarity with Agile methodologies.

We are a value-driven consulting and engineering partner, helping companies design and execute their most challenging digital transformations in the Cloud. Moving to the Cloud is merely the foundation of your digital transformation. Once migration is complete, we integrate cutting-edge technologies into all areas of your organization to redefine the way you do business. Our aim is to take you on a Cloud-centric journey to unlock the value hidden in your data and compete in an increasingly connected world. We take an evidence-based approach to setting up your transformation, leveraging ProArch's solution set to accelerate your time to value.
