Production Operations Engineer

At Index Exchange, we’re reinventing how digital advertising works—at scale. As a global advertising supply-side platform, we empower the world’s leading media owners and marketers to thrive in a programmatic, privacy-first ecosystem. 

We’re a proud industry pioneer with over 20 years of experience accelerating the ad technology evolution. Our proprietary tech is trusted by some of the world’s largest brands and media owners and plays a crucial role in keeping the internet open, accessible, and largely free. 

We process more than 550 billion real-time auctions every day (in comparison, Google processes 8.5 billion searches per day) with ultra-low latency. Our platform is vertically integrated from servers to networks and runs primarily on our own metal and cloud infrastructure. This end-to-end infrastructure is designed to provide both stability and agility, enabling us to adapt quickly as the market evolves. 

At the core of it all is our engineering-first culture. Our engineers tackle internet-scale problems across tight-knit, global teams. From moving petabytes of data and optimizing with AI to making real-time infrastructure decisions, Indexers have the agency and influence to shape the future of advertising. We move fast, build thoughtfully, and stay grounded in our core values. 

About the Team:

The global Production Operations team is integral to ensuring the operational stability and reliability of our worldwide 24x7 on-premises and cloud environments. As the first line of defense, this team owns operations engineering. Collaborating closely with IT, SRE, Network, and Data engineering teams, as well as key stakeholders across business, product, and software engineering, we play a crucial role in maintaining system health, responding to incidents, and optimizing the performance, efficiency, and stability of complex global systems.

As a Production Operations Engineer, you'll be at the heart of maintaining and improving our global infrastructure. You'll work with a passionate team of engineers who take pride in building reliable, scalable systems that power our business. This role offers an exciting opportunity to work with enterprise-grade data center infrastructure while developing your skills in automation and system optimization. 

You'll be part of a global team that maintains 24x7 coverage of our systems. This means participating in an on-call rotation and occasionally working outside regular hours when urgent issues arise. We provide training and support to ensure you're well-prepared for these responsibilities. 

What We’re Looking For:

Your Technical Foundation: 

We're looking for someone who has built a solid foundation in systems engineering. You should be comfortable with Linux systems administration and have a strong command of bash and Python for automation. Experience with infrastructure automation tools like Ansible and GitLab CI/CD is key, as these are central to how we operate. 

Essential Technical Skills:

  • Strong proficiency in Linux systems administration (especially CentOS/Rocky Linux) and bash scripting 
  • Python programming skills for automation and tooling development 
  • Experience with infrastructure automation tools (e.g., Ansible, GitLab CI/CD) 
  • Hands-on experience with bare-metal server lifecycle management 
  • Understanding of networking fundamentals and troubleshooting 
  • Experience with observability and monitoring platforms (e.g., ELK Stack, Prometheus) 
  • Working knowledge of big data ecosystems (e.g., Hadoop/HDFS) 

We also value experience with: 

  • The Go programming language 
  • Virtualization platforms 
  • Kubernetes and container orchestration 
  • Infrastructure-as-code practices 
  • Advanced observability platform implementation and integration 
  • Deep understanding of big data tools and architectures 
  • Metrics collection and visualization tools 

Here's what you'll be doing:

Every day brings new challenges in our dynamic environment. You might find yourself managing and optimizing our bare-metal infrastructure across global data centers. You'll work with enterprise-grade hardware, handling everything from firmware updates to performance tuning. When systems need attention, you'll coordinate with remote hands and team members to quickly resolve issues. 

You'll also take ownership of automation initiatives that improve our operational efficiency. Whether it's crafting a new Ansible playbook or optimizing an existing deployment pipeline, you'll have opportunities to make our systems work smarter, not harder. 

Key Responsibilities:

  • Monitor and maintain system health across our global on-premises infrastructure 
  • Manage bare-metal server lifecycle, including firmware updates and break-fix procedures 
  • Participate in incident response and alert triage 
  • Implement and maintain automation frameworks 
  • Contribute to system documentation and team knowledge sharing 

Here's what you need:

The ideal candidate brings 5-7 years of experience in DevOps, Systems Administration, Site Reliability Engineering (SRE), or similar roles. During this time, you should have developed significant hands-on experience with enterprise infrastructure management and automation. 

We're particularly looking for someone who has: 

  • Infrastructure Management: Built or maintained private-cloud infrastructure running CentOS/Rocky Linux, working with a mix of bare-metal servers and virtualization technologies. Experience with server lifecycle management in distributed data centers is crucial, as you'll be handling everything from break-fix scenarios to firmware updates on enterprise-grade hardware like Dell and Supermicro systems. 
  • Automation & Orchestration: Developed and maintained automation frameworks for deployment and maintenance pipelines. You should be comfortable using tools like Ansible and GitLab CI/CD to push out code, manage configurations, and build new infrastructure systems. Experience with message queuing systems and workflow automation is valuable. 
  • System Integration: While we primarily operate on-premises, familiarity with public cloud environments (AWS, GCP, Azure) and how they can integrate with on-premises infrastructure is beneficial. Understanding how to bridge these environments effectively demonstrates the kind of systems thinking we value. 

Note: We recognize that everyone's path is different. If you've spent meaningful time working with similar technologies or in comparable environments, we'd like to hear about your experience. 

Your Approach:

Technical skills are important, but equally valuable is your approach to problem-solving and teamwork. The characteristics that will make you successful in this role go beyond technical expertise. 

  • Communication: Clear and effective communication within and across teams is essential. While we place a huge premium on technical skill, we value your ability to work with other people just as much. 
  • Curiosity: Things can (and will) break for different reasons; your curiosity will drive you to identify and fix the things that go wrong. 
  • Alertness: We can never predict when things will go wrong, so it is your job to be vigilant and prepared to respond when they do. You must be ready to reach out, ask questions, and sound the alarm when necessary. 
  • Analytical Thinking: Monitor and analyze activity, and collaborate with other departments to maintain technical defense. 
  • Reliability: Prioritize the reliability of our systems, ensuring our exchange customers can trust in our services 24x7. Adhere to operational procedures, best practices, and security protocols. 
  • Continuous Improvement: Embrace a culture of continuous learning and innovation, always seeking ways to enhance our operational efficiency. 
  • Customer-Centricity: Commit to providing the best possible experience for our customers, both internal and external. 
  • Accountability: Take ownership of your responsibilities and hold yourself accountable for the quality of your work. 

Why You’ll Love Working Here:

  • Comprehensive health, dental, and vision plans for you and your dependents  
  • Paid time off, health days, and personal obligation days plus flexible work schedules  
  • Competitive retirement matching plans  
  • Equity packages  
  • Generous parental leave available to birthing, non-birthing, and adoptive parents  
  • Annual well-being allowance plus fitness discounts and group wellness activities   
  • Commuter benefits and discounts, where available  
  • Employee assistance program  
  • Mental health first aid program that provides an in-the-moment point of contact and reassurance  
  • One day of volunteer time off per year and a donation-matching program  
  • Bi-weekly town halls and regular community-led team events  
  • Multiple resources and programming to support continuous learning 
  • A workplace that supports a diverse, equitable, and inclusive environment 

Equal employment opportunity

At Index Exchange, we believe that successful products are built by teams just as diverse as the audiences that use them. As such, we are committed to equal employment opportunities. We celebrate diversity of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or expression, and veteran status. Additionally, we realize that diversity is deeper than any status or classification: diversity is the human experience. For those who show grit, passion, and humility, Index will welcome you.

Accessibility for applicants with disabilities

Index Exchange welcomes and encourages individuals with disabilities to apply to work with us.  

If you require an accommodation, please share the details of your request, and any information on how we can assist you, with the recruiter when they contact you. Index Exchange will make reasonable efforts to ensure accommodation requests are met throughout the recruitment process. 

Index Everywhere, Index Anywhere

Our corporate headquarters are in Toronto, with major offices in New York, Montreal, Kitchener, London, San Francisco, and many other global cities. As a major global advertising exchange, we are committed to operating as a tightly knit global team and to embracing and empowering talent wherever our colleagues may be. 
