We are looking for a driven and experienced Site Reliability Engineering Manager to join our innovative Tech Team. You will lead and empower a team of SREs, partnering closely with Engineering, Product, and Security to ensure our platforms are resilient, scalable, and secure. You will play a key role in shaping reliability strategy, improving system performance, and embedding best practices in observability, incident management, and automation to support the delivery of high-impact solutions in the fight against financial crime.
As a Site Reliability Engineering (SRE) Manager, you will
The role reports to the Director of Infrastructure. You’ll be managing a team of Cloud / Infra Engineers focused on the provision and support of the Cloud and Kubernetes ecosystem that underpins all products and systems across the organisation. The Kubernetes clusters run on AWS and GCP, with utilities provided by a common stack of open-source technologies. In addition, a Cloudflare edge layer provides extra compute and functionality. As the technology stack underpins all other engineering work, a collaborative mindset is a must.
Our tech stack:
ComplyAdvantage is fully cloud-based, with a modern Kubernetes-focused tech stack. All compute workloads run in Kubernetes, with clusters in multiple regions to support the needs of our global client base. Our production services are multi-cloud by design and are currently hosted in AWS and GCP.
We make heavy use of Terraform and Helm to define our infrastructure and services, and follow GitOps paradigms throughout: production and non-production environments are defined in Git, and changes to these environments (both cloud infrastructure and Kubernetes applications) are managed via Git.
ArgoCD is our tool of choice for controlling our deployments and, paired with our Istio mesh, allows our development teams to use advanced deployment patterns such as progressive rollouts. Our observability stack consists of Grafana Cloud, along with some on-prem Mimir, amongst others. We focus on OpenTelemetry for application metrics, with SLO- and metric-driven alerting at all levels, from cloud infra through to application performance.
Across the wider Technology team, teams build and release containerised applications supporting a wide array of activities, from developing low-latency client-facing APIs to machine learning models and data processing pipelines.
About you:
As a Site Reliability Engineering (SRE) Manager, you will
Nice to haves:
Benefits:
About us:
Our mission is to empower every business to eliminate financial crime.
By harnessing AI, a unified platform, and an extensive partner ecosystem, we help customers turn compliance into a catalyst for growth, operational resilience, and enduring regulatory trust.
More than 3,000 enterprises across 75 countries rely on our end-to-end platform and the world’s most comprehensive financial crime risk intelligence. With full-stack agentic automation, we help organizations automate up to 95% of KYC, AML, and sanctions reviews, cut onboarding times by 50%, reduce false positives by 70%, and handle 7x more work with the same staff.
ComplyAdvantage is headquartered in London and has global hubs in New York, Lisbon, Singapore, and Cluj-Napoca. It is backed by Balderton Capital, Index Ventures, Ontario Teachers’ Pension Plan, Goldman Sachs, and Andreessen Horowitz. Learn more about compliance re-engineered for the age of AI at complyadvantage.com.