About Us
Sophos is a global leader and innovator of advanced security solutions that defeat cyberattacks, including Managed Detection and Response (MDR) and incident response services and a broad portfolio of endpoint, network, email, and cloud security technologies. As one of the largest pure-play cybersecurity providers, Sophos defends more than 600,000 organizations and more than 100 million users worldwide from active adversaries, ransomware, phishing, malware, and more. Sophos’ services and products connect through the Sophos Central management console and are powered by Sophos X-Ops, the company’s cross-domain threat intelligence unit. Sophos X-Ops intelligence optimizes the entire Sophos Adaptive Cybersecurity Ecosystem, which includes a centralized data lake that leverages a rich set of open APIs available to customers, partners, developers, and other cybersecurity and information technology vendors. Sophos provides cybersecurity-as-a-service to organizations needing fully managed security solutions. Customers can also manage their cybersecurity directly with Sophos’ security operations platform or use a hybrid approach by supplementing their in-house teams with Sophos’ services, including threat hunting and remediation. Sophos sells through reseller partners and managed service providers (MSPs) worldwide. Sophos is headquartered in Oxford, U.K. More information is available at www.sophos.com.
Role Summary
We seek a highly motivated and technically skilled Observability Engineer to join our growing team. This role will ensure the optimal monitoring, observability, and automation of our IT infrastructure, applications, and services. The ideal candidate will have a strong background in IT monitoring, DevOps, and automation, with a focus on supporting customised monitoring solutions and streamlining operational processes through custom scripting and integrations.
This individual contributor will take ownership of the monitoring posture across our systems, leveraging tools like LogicMonitor, Opsgenie, CloudWatch, and other integrations.
What you will do
- Monitoring & Observability:
  - Design, implement, and manage comprehensive monitoring solutions for infrastructure and applications using LogicMonitor, Opsgenie and CloudWatch.
  - Take complete ownership of Operation Center monitoring tools and ensure they are used effectively and in line with standard practices.
  - Ensure all critical systems and services are proactively monitored, with alerts configured for critical thresholds.
  - Collect the right metrics at the right frequency so the data is readily available and consumable for alerting, reporting, and analysis.
  - Create and maintain custom monitoring solutions, including alerting thresholds, dashboards and performance metrics.
  - Integrate multiple monitoring systems and external APIs for end-to-end observability.
  - Ensure observability standards keep pace with rapidly evolving IT requirements as we continue to adopt new architectures, processes, and technologies.
  - Work with cross-functional teams to define key performance indicators (KPIs) and monitoring requirements for new services or applications.
- Automation & Scripting:
  - Develop and maintain automation scripts to support monitoring tasks (e.g., custom metric collection, alert suppression, auto-remediation).
  - Create scripts to integrate monitoring tools with internal systems (e.g., Jira, Opsgenie, Freshworks).
  - Automate repetitive tasks and workflows to improve team efficiency and incident response times.
- Incident & Alert Management:
  - Collaborate with on-call teams to ensure timely and effective resolution of incidents via Opsgenie integration.
  - Develop a solution for automatic incident creation across all monitoring tools.
  - Analyze incident trends to identify potential areas for improvement in monitoring, alerting, and incident response.
  - Proactively identify opportunities to reduce alert noise and improve the signal-to-noise ratio.
- Performance Tuning & Reporting:
  - Regularly review the performance and effectiveness of monitoring systems and make recommendations for optimisation.
  - Create reports and dashboards for leadership on system health, incident trends and operational performance.
  - Ensure that monitoring data is structured in a way that supports root cause analysis and continuous improvement.
- Collaboration & Documentation:
  - Work closely with different teams to ensure monitoring aligns with business needs and technical requirements.
  - Develop and maintain documentation for monitoring configurations, alerting procedures, and automation scripts.
  - Train team members and cross-functional stakeholders on monitoring tools and best practices.
What you will bring
- 4+ years of experience in IT Operations, Site Reliability Engineering (SRE), or a similar role, with a strong focus on monitoring, observability, and automation.
- Proven experience with Grafana, LogicMonitor and CloudWatch (or similar monitoring tools) for infrastructure and application monitoring.
- Hands-on experience with Opsgenie, PagerDuty or similar incident management tools.
- Experience with Jira for ITSM, ticket management, and incident resolution.
- Strong scripting skills in languages such as Python, Bash, and PowerShell to automate tasks and customise monitoring.
- Familiarity with cloud platforms (AWS, Azure, GCP) and their native monitoring tools.
- Experience working with REST APIs for integrations and automation.
- Understanding of distributed systems, cloud architectures, and microservices.
- Thorough understanding of DevOps practices.
- Familiarity with configuration management tools (e.g., Ansible, Terraform) is a plus.
- Strong troubleshooting and problem-solving skills, with the ability to analyze complex systems and identify root causes.
- Ability to think critically and make data-driven decisions under pressure.
- Excellent communication skills, with the ability to explain complex technical concepts to both technical and non-technical stakeholders.
- Strong attention to detail and a passion for continuous improvement.
- Ability to work independently, prioritise tasks, and manage time effectively.
Ready to Join Us?
At Sophos, we believe in the power of diverse perspectives to fuel innovation. Research shows that candidates sometimes hesitate to apply if they don't check every box in a job description. We challenge that notion. Your unique experiences and skills might be exactly what we need to enhance our team. Don't let a checklist hold you back – we encourage you to apply.
What's Great About Sophos?
- Sophos operates a remote-first working model, making remote work the primary option for most employees. However, some roles may necessitate a hybrid approach. Please refer to the location details in our job postings for further information.
- Our people: we innovate and create, always with a great sense of fun and team spirit
- Employee-led diversity and inclusion networks that build community and provide education and advocacy
- Annual charity and fundraising initiatives and volunteer days for employees to support local communities
- Global employee sustainability initiatives to reduce our environmental footprint
- Global fitness and trivia competitions to keep our bodies and minds sharp
- Global wellbeing days for employees to relax and recharge
- Monthly wellbeing webinars and training to support employee health and wellbeing
Our Commitment To You
We’re proud of the diverse and inclusive environment we have at Sophos, and we’re committed to ensuring equality of opportunity. We believe that diversity, combined with excellence, builds a better Sophos, so we encourage applicants who can contribute to the diversity of our team. All applicants will be treated in a fair and equal manner and in accordance with the law regardless of gender, sex, gender reassignment, marital status, race, religion or belief, color, age, military veteran status, disability, pregnancy, maternity or sexual orientation. We want to give you every opportunity to show us your best self, so if there are any adjustments we could make to the recruitment and selection process to support you, please let us know.
Data Protection
If you choose to explore an opportunity, and subsequently share your CV or other personal details with Sophos, these details will be held by Sophos for 12 months in accordance with our Privacy Policy and used by our recruitment team to contact you regarding this or other relevant opportunities at Sophos. If you would like Sophos to delete or update your details at any time, please follow the steps set out in the Privacy Policy describing your individual rights. If you have any questions about Sophos’ data protection practices, please contact [email protected].