ProdOps Engineer 3
TLDR
Support and stabilize large-scale production systems in a hands-on role, focusing on incident management, monitoring, and customer-facing communications.
Black Duck Software, Inc. helps organizations build secure, high-quality software, minimizing risks while maximizing speed and productivity. Black Duck, a recognized pioneer in application security, provides SAST, SCA, and DAST solutions that enable teams to quickly find and fix vulnerabilities and defects in proprietary code, open source components, and application behavior. With a combination of industry-leading tools, services, and expertise, only Black Duck helps organizations maximize security and quality in DevSecOps and throughout the software development life cycle.
Production Operations Engineer 3 – P3 (ProdOps / SRE)
Location: Bangalore – Hybrid
Experience: 5–8 years
Shift: 24/7 rotational shifts (including night shifts and weekend on-call)
About the Role
The Production Operations Engineer will support and stabilize large-scale production systems with a focus on incident management, monitoring, site reliability, and customer-facing communications. This is a hands-on role requiring ownership of critical production issues in a 24/7 environment.
Key Responsibilities
- Own and manage Critical and High production incidents end-to-end.
- Participate in SWARM / Tech Bridge calls and lead incidents during assigned shifts.
- Improve MTTR, MTTA, alert quality, and operational stability.
- Perform root cause analysis (RCA) and drive corrective actions.
- Monitor production systems and proactively detect issues.
- Write and review automation for operational tasks in Go, Python, Shell, or Perl.
- Maintain dashboards, alerts, runbooks, and SOPs.
- Handle customer-facing communications during incidents.
- Coordinate with Engineering, Product, CloudOps, and Support teams.
- Guide junior engineers and support shift handovers.
- Lead automation initiatives to reduce toil and manual intervention.
- Act as a technical reviewer for reliability‑critical changes.
- Influence architecture decisions with operability and reliability in mind.
- Own and standardize runbooks, SOPs, and disaster recovery processes.
Leadership & Mentorship
- Provide technical leadership and mentorship to ProdOps engineers.
- Guide shift teams during complex situations.
- Support onboarding, training, and upskilling of team members.
- Drive operational maturity across the team.
Tech Stack & Expertise
Required Technologies
- Containers & Orchestration: Docker, Kubernetes, Helm
- Cloud Platforms: AWS / GCP / Azure
- Infrastructure as Code: Terraform
- CI/CD: Jenkins, Harness, GitHub Actions, ArgoCD, GitLab CI
- Monitoring & Observability: Prometheus, Grafana, ELK, Datadog, New Relic, Loki
- Version Control: Git, GitHub, GitLab
- Scripting: Go, Python, Shell, or Perl
Qualifications
- 6+ years of experience in Production Operations, SRE, or Cloud Reliability roles.
- Proven experience leading major production incidents in customer‑facing systems.
- Strong background in distributed systems, Kubernetes, and cloud environments.
- Experience mentoring engineers and driving reliability initiatives.
- Excellent written and verbal communication skills.
What We Offer
- An opportunity to be part of a dynamic and innovative team.
- Inclusive and collaborative work environment.
- Continuous learning and professional development opportunities.
- Exposure to large-scale and customer-critical systems.
Black Duck considers all applicants for employment without regard to race, color, religion, sex, gender preference, national origin, age, disability, or status as a Covered Veteran in accordance with federal law. In addition, Black Duck complies with applicable state and local laws prohibiting discrimination in employment in every jurisdiction in which it maintains facilities. Black Duck also provides reasonable accommodation to individuals with a disability in accordance with applicable laws.
Black Duck Software, Inc. develops automated solutions for securing and managing open source software, targeting organizations striving for high-quality, secure software development. As a leader in application security, its offerings include SAST, SCA, and DAST tools that empower teams to swiftly identify and remediate vulnerabilities across both proprietary and open source components, integrating seamlessly into the software development lifecycle.
- Founded: 2002
- Employees: 500+
- Industry: Internet Software & Services
- Total raised: $2M