Company Overview
300+ media companies as clients, $40+ billion in revenue processed, 25,000+ worldwide users.
Operative is a revenue accelerant for media companies around the world. No other software company in the AdTech space brings a comparable depth of experience to creating truly innovative software that performs across all platforms, revenue models and business units. We are a SaaS (Software as a Service) platform that helps clients manage advertisements in both the linear (TV) and digital space. We have been in the market for over two decades and have 1100+ employees with 12 offices spread across the globe. Operative is proud to play a pivotal role in the way advertising is bought, sold and managed across the media industry.
JOB SUMMARY
We are looking for a highly motivated and detail-oriented NOC Engineer to join our Technology Operations Center (TOC) Monitoring Team with a focus on Cloud Platforms. In this role, you will be responsible for monitoring and troubleshooting cloud infrastructure, ensuring optimal performance, and escalating issues as required. You will play a crucial role in maintaining service availability, reliability, and performance for cloud-based applications and services.
MAIN DUTIES AND RESPONSIBILITIES
- Cloud Infrastructure Monitoring: Monitor cloud environments (AWS, GCP, etc.) for resource performance, availability, and security
- Incident Detection and Reporting: Detect and report network and system anomalies in cloud environments such as downtime, latency issues, or performance degradation
- Issue Escalation: Escalate critical issues related to cloud services, resources, and applications to senior technical teams for prompt resolution
- Cloud Resource Management: Assist in monitoring and managing cloud resources (servers, virtual machines, storage, databases, etc.), ensuring proper allocation and optimization
- Log Analysis: Review and analyse logs from cloud services and platforms (e.g., CloudWatch, ELK) to identify patterns or issues that need resolution
- Routine Checks: Perform regular health checks on cloud infrastructure, services, and applications to ensure uptime and prevent issues
- Build Automation: Set up monitoring and perform automation tasks, such as auto-scaling, load balancing, and resource provisioning in the cloud
- Documentation: Maintain and update records of cloud infrastructure status, incidents, troubleshooting steps, and resolutions
- Customer Communication: Provide status updates to internal stakeholders or customers regarding cloud-related incidents or maintenance schedules
- Collaboration: Work with senior cloud engineers and IT teams to resolve cloud infrastructure issues and optimize performance
COMPETENCIES
Must have skills:
- Knowledge of monitoring and observability tools (Grafana, New Relic, Zabbix, ELK, AWS CloudWatch, etc.)
- Familiarity with cloud platforms (AWS, GCP, etc.) and ability to monitor, manage, and troubleshoot cloud infrastructure and services.
- Working knowledge of AWS CloudWatch including creating monitors, setting up alerts, and analysing logs to detect and troubleshoot infrastructure issues.
- Familiarity with networking concepts (TCP/IP, DNS, DHCP, etc.) and cloud networking configurations.
- Understanding of virtual machines, cloud storage, and cloud databases.
- Must have Python/Shell scripting knowledge (at least a working knowledge is desirable).
- Good knowledge and understanding of operating systems (Linux, Windows).
Good to have / Desired skills:
- Working knowledge of AWS Lambda and serverless architecture; ability to monitor Lambda function performance, detect failures, and identify issues in serverless workflows.
- Experience with REST APIs and HTTP concepts; ability to monitor and troubleshoot backend service connectivity and performance issues.
- Good understanding of AI-powered monitoring tools and their role in automating incident detection and remediation in cloud infrastructure.
- Experience with ticketing and workflow tools such as Jira for incident and task management.
KEY COMPETENCIES
- Attention to Detail: Ability to identify potential issues and performance degradation in cloud environments.
- Problem-Solving: Strong troubleshooting skills, able to diagnose issues and suggest solutions under pressure.
- Communication Skills: Effective verbal and written communication skills for providing updates and documentation.
- Teamwork: Able to work collaboratively within a team and communicate effectively with senior engineers and technical teams.
- Adaptability: A willingness to learn more about cloud technologies, platforms, and monitoring tools in a dynamic environment.
EDUCATION, CERTIFICATION AND EXPERIENCE
- Educational Background: Bachelor’s degree in Computer Science, Information Technology, Cloud Computing, or a related field (or equivalent)
- Work Experience: Minimum 1 to 2 years.
Why join us?
- Operative is a technology-oriented product organization that believes in empowering its people
- We use the latest tech stack and empower our engineers to learn, work and ideate on new technologies available in the market
- We provide flexible work schedules and remote working to encourage work-life balance
- We are an equal opportunities employer and recruit based on experience and skill set
- We offer a competitive salary and benefits package
Please apply online and upload your CV.
“Operative is a merit-first, equal opportunity employer; diverse applications are encouraged.”
Operative cares about your privacy and protecting your data. By submitting an application for a position with Operative, you acknowledge that you have read the following and consent to how Operative treats your data: 1) the Candidate Privacy Policy available at https://www.operative.com/candidate-privacy-notice/ (or, if you are a candidate from Israel, the Candidate Privacy Notice (Israel), available at https://www.operative.com/candidate-privacy-notice-israel/), and 2) the Candidate Notice for Data Transfer and Retention available at https://www.operative.com/candidate-notice/.