Senior Reliability Engineer (Data Infrastructure)

TLDR

Work on critical data systems and infrastructure using AWS and GCP, leveraging a robust tech stack and collaborating with cross-functional teams to ensure reliability and performance.

What you will be doing:

We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to join our Data Infrastructure team. You will be responsible for ensuring the reliability, availability, and performance of our critical data systems running on AWS and GCP. Your expertise in cloud infrastructure, automation, and operational excellence will be crucial in supporting our product across our global client base.

As a Senior Site Reliability Engineer, you will:

  • Design, implement, and maintain highly available and reliable data infrastructure services, including SQL, NoSQL, Kafka, and Spark-based data layers. Define and monitor Service Level Objectives (SLOs) and Service Level Agreements (SLAs).
  • Participate in an on-call rotation to respond to incidents and ensure rapid resolution of production issues. Conduct thorough post-incident reviews to identify root causes and implement preventative measures.
  • Manage and automate cloud infrastructure using Terraform and Helm, adhering to GitOps principles.
  • Implement and maintain comprehensive monitoring, logging, and tracing solutions to proactively identify and resolve performance and reliability issues.
  • Monitor and manage data infrastructure capacity, plan for future growth, and optimize performance through tuning and automation.
  • Develop and maintain automation scripts and tools to streamline operational tasks, improve efficiency, and reduce manual effort.
  • Ensure the security and compliance of data infrastructure services, implementing best practices for access control, data protection, and vulnerability management.
  • Collaborate with development and data engineering teams to ensure smooth deployments and operational support. Maintain thorough documentation of infrastructure configurations, processes, and procedures.
  • Manage and maintain distributed databases running within a Kubernetes environment.

Our Tech Stack:

  • Cloud-Based Infrastructure: Fully cloud-based with a Kubernetes-focused tech stack. Compute workloads run in Kubernetes clusters across multiple regions.
  • Infrastructure Management: Heavy use of Terraform and Helm, adhering to GitOps paradigms for managing cloud infrastructure and Kubernetes applications.
  • Core Technologies: Extensive use of Kafka, distributed PostgreSQL and Cassandra QL, Elasticsearch, and Databricks/Spark. Development of inter-cloud failover options to support multi-cloud plans.
  • Wide Array of Applications: Teams build and release containerised applications for low-latency APIs, machine learning models, and data processing pipelines.

About You:

  • Experience as an SRE managing cloud infrastructure (AWS and/or GCP) and data systems (Apache Kafka, Apache Spark, Elasticsearch, PostgreSQL, Cassandra). Proven track record of improving reliability and availability in complex production environments.
  • Extensive experience codifying infrastructure using Terraform and Helm charts.
  • Proven experience managing and troubleshooting distributed databases within Kubernetes.
  • Deep understanding of monitoring, logging, and tracing tools and techniques.
  • Strong incident response and troubleshooting skills.
  • Proficiency in scripting and automation tools.
  • Understanding of security best practices for cloud infrastructure and data systems.
  • Familiarity with CI tooling, test pipelines, and asset generation (e.g., Docker images, Helm charts).

Education:

  • BSc/BA degree in computer science, engineering, or a related discipline, or equivalent experience in the required skills.

Nice to have:

  • Familiarity with distributed SQL and NoSQL databases such as YugabyteDB, CockroachDB, Spanner, HBase, or CouchDB.
  • Familiarity with data modelling, sharding, and indexing strategies for large-scale databases.

What’s in it for you? 

  • Equity, as we want you to have a part of what we are building
  • Private medical insurance, giving you peace of mind while you excel in your career
  • Unlimited Time Off Policy: work-life balance and a focus on well-being are critical to keeping us performing at our best
  • We embrace a hybrid approach that requires employees to be in the office for two days a week. We strongly believe that this approach fosters collaboration and enables the building of meaningful relationships
  • You will also get a new starter budget to kit out your home office 
  • Opportunity to work on innovative projects with smart-minded people keen to share their knowledge and continuously improve 
  • Annual learning budget (prorated based on start date) to drive your performance and career development 

About us:

Our mission is to empower every business to eliminate financial crime. 

By harnessing AI, a unified platform, and an extensive partner ecosystem, we help customers turn compliance into a catalyst for growth, operational resilience, and enduring regulatory trust.

More than 3,000 enterprises across 75 countries rely on our end-to-end platform and the world’s most comprehensive financial crime risk intelligence. With full-stack agentic automation, we help organizations automate up to 95% of KYC, AML, and sanctions reviews, cut onboarding times by 50%, reduce false positives by 70%, and handle 7x more work with the same staff.

ComplyAdvantage is headquartered in London and has global hubs in New York, Lisbon, Singapore, and Cluj-Napoca. It is backed by Balderton Capital, Index Ventures, Ontario Teachers’ Pension Plan, Goldman Sachs, and Andreessen Horowitz. Learn more about compliance re-engineered for the age of AI at complyadvantage.com.

ComplyAdvantage builds AI-driven solutions focused on fraud detection and anti-money laundering (AML) risk management for the financial markets and regulated industries. Our platform automates compliance processes, helping businesses turn regulatory requirements into a strategic advantage while ensuring robust protection against financial crime. With a rapidly growing customer base of over 3,000 enterprises across 75 countries, we provide cutting-edge risk intelligence that empowers organizations to thrive in a complex compliance landscape.
