About Us
Selector is building an operational intelligence platform for digital infrastructure. By adopting an AI/ML-based analytics approach, the platform provides actionable multi-dimensional insights to network, cloud, and application operators. It enables operations teams to meet their KPIs through seamless collaboration, a search-driven conversational user experience, and automated data engineering pipelines.
Our solutions are used by leading Telecoms, Media Providers, Retail, and Professional Sports organizations across the world. Our novel approach and rapidly expanding footprint position us for continued growth and to become a category leader. To fuel our growth, we are seeking passionate, high-energy, results-oriented individuals to join our team.
Our mission is to deliver world-class solutions for large enterprises. Supported by leading investors, Selector is well positioned to address large enterprise requirements across the globe.
Job Overview
We are seeking a highly skilled Platform Engineer to join our Engineering team in India. In this role, you will design and author scalable platform architectures that solve real-world business challenges and present them in a consumable, operationally sound manner for DevOps and SRE teams.
You will play a critical role in defining observability, infrastructure, and reliability standards across the organization. This includes improving existing platform implementations, creating detailed technical documentation and playbooks, and serving as an escalation point for complex production issues.
Key Responsibilities
- Design and author scalable platform architectures to address business and operational needs across cloud-native environments.
- Provide architectural guidance for observability and monitoring solutions, including migrations (e.g., transitioning from Promtail to Grafana Alloy).
- Evaluate, test, and improve existing platform implementations to address performance and scalability shortcomings.
- Investigate and resolve infrastructure limitations, such as storage or indexing constraints (e.g., inode saturation in Loki), and recommend durable architectural improvements.
- Implement and optimize solutions leveraging Kubernetes (including RKE), Helm, and Kustomize.
- Author and maintain Infrastructure as Code (IaC) using Terraform/OpenTofu.
- Design and enhance observability stacks using Prometheus, OpenTelemetry, and related tooling.
- Develop clear technical documentation including architecture diagrams, implementation guides, playbooks, and runbooks with common issues and triage procedures.
- Provide Tier 3 escalation and triage support for complex platform issues, performing deep technical reviews and implementing corrective actions.
- Continuously refine documentation and operational guides following incident resolution to prevent recurrence.
- Collaborate closely with DevOps, SRE, and Engineering teams to ensure platform solutions are scalable, maintainable, and production-ready.
- Support CI/CD workflows using Jenkins and Git/GitHub best practices.
- Work within Google Cloud Platform (GCP) environments to build and maintain cloud-native infrastructure.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 6+ years of experience in Platform Engineering, Infrastructure Engineering, or DevOps roles.
- Strong hands-on experience with Kubernetes in production environments (RKE experience preferred).
- Experience designing and maintaining observability platforms using Prometheus and OpenTelemetry.
- Strong experience with Infrastructure as Code tools such as Terraform or OpenTofu.
- Experience working with Helm and Kustomize for Kubernetes configuration management.
- Proficiency in Python for automation, scripting, and tooling enhancements.
- Experience with CI/CD pipelines and Jenkins.
- Experience operating within Google Cloud Platform (GCP).
- Strong troubleshooting and root cause analysis skills across distributed systems.
- Ability to translate business needs into scalable technical architecture.
- Strong documentation skills with experience creating architecture diagrams, runbooks, and operational playbooks.
- Excellent communication skills and ability to collaborate across engineering functions.
Preferred Qualifications
- Experience with distributed platforms.
- Experience with Kafka or other distributed streaming platforms.
- Prior experience serving as a technical escalation point or Tier 3 support engineer.
- Experience leading platform modernization or observability transformation initiatives.
Benefits & Perks
- Health Insurance (GMC): Comprehensive medical coverage for employees and dependents, including hospitalization and maternity benefits.
- Personal Accident Insurance (GPA): Coverage for accidental injury, both on and off duty.
- Life Insurance (Term Plan): Life insurance coverage for eligible employees.
- Provident Fund (PF): Company contribution as per statutory requirements.
- Gratuity: As per the Payment of Gratuity Act.
- Paid Time Off: Sick Leave, Earned Leave, and Maternity Leave in line with company policy and applicable laws.
- Holidays: National and regional holidays as per the annual holiday calendar.