Blink Health

Senior Cloud Resilience Architect

Remote

TL;DR

Design and implement resilient cloud architectures focused on disaster recovery, partnering with cross-functional teams to ensure fault tolerance and recovery best practices.

Company Overview:

Blink Health is the fastest-growing healthcare technology company that builds products to make prescriptions accessible and affordable to everybody. Our two primary products – BlinkRx and Quick Save – remove traditional roadblocks within the current prescription supply chain, resulting in better access to critical medications and improved health outcomes for patients.

BlinkRx is the world’s first pharma-to-patient cloud that offers a digital concierge service for patients who are prescribed branded medications. Patients benefit from transparent low prices, free home delivery, and world-class support on this first-of-its-kind centralized platform. With BlinkRx, never again will a patient show up at the pharmacy only to discover that they can’t afford their medication, their doctor needs to fill out a form for them, or the pharmacy doesn’t have the medication in stock.

We are a highly collaborative team of builders and operators who invent new ways of working in an industry that historically has resisted innovation. Join us!

Responsibilities

Evaluate and mature the organization’s disaster recovery posture, including recovery objectives (RTO/RPO), dependency mapping, and failure domain analysis across applications, data, and infrastructure.
Define, document, and establish disaster recovery standards and best practices across cloud infrastructure, platforms, and application architectures.
Partner with SRE, platform, security, and product engineering teams to design and implement resilient, fault-tolerant systems, progressing from backup-based recovery to multi-region and active-active architectures.
Lead the disaster recovery roadmap, balancing technical feasibility, cost, risk, and business priorities.
Design and recommend reference architectures for disaster recovery patterns, including pilot-light, warm standby, hot standby, and active-active.
Drive adoption of active-active disaster recovery for critical systems, including traffic management, data replication, consistency models, and automated failover.
Define and operationalize testing strategies for DR, including game days, chaos testing, and regular recovery exercises.
Establish clear documentation, runbooks, and escalation paths to ensure recoverability is well understood and not dependent on individuals.
Evaluate and recommend platform upgrades, cloud services, and tooling that improve resilience, recovery speed, and reliability.
Serve as a technical authority and advisor on disaster recovery and resilience for leadership and engineering teams.
Provide architectural guidance, design reviews, and mentorship to engineers implementing DR-related changes.
Partner with security and compliance teams to ensure DR strategies meet regulatory, audit, and data protection requirements.

Desired Experience

Bachelor’s or Master’s degree in Computer Science or equivalent practical experience.
8+ years of experience in cloud infrastructure, platform engineering, SRE, or reliability-focused architecture roles.

Disaster Recovery & Resilience

Deep understanding of disaster recovery concepts including RTO/RPO, blast radius reduction, failure domains, and dependency isolation.
Proven experience designing and implementing multi-region and multi-availability zone architectures.
Hands-on experience moving systems toward active-active or highly available architectures.
Strong grasp of data replication strategies, consistency tradeoffs, and recovery patterns for databases and stateful systems.

Cloud & Platform Engineering

Extensive experience with major cloud providers (AWS preferred, GCP/Azure acceptable).
Strong understanding of managed cloud services and their DR characteristics and limitations.
Experience with Kubernetes-based platforms, including regional failover, workload portability, and cluster recovery strategies.
Familiarity with global traffic management, DNS, load balancing, and service mesh patterns.

Automation & Infrastructure as Code

Experience designing and maintaining Infrastructure as Code using tools such as Terraform, Pulumi, CloudFormation, or Ansible.
Strong focus on automation for recovery workflows, failover testing, and environment provisioning.
Ability to eliminate manual recovery steps and reduce time-to-recovery through software.

Operational Excellence

Experience defining and running DR tests, game days, and failure simulations.
Comfortable working across organizational boundaries to influence priorities and standards.
Strong documentation and communication skills, with the ability to translate complex technical risk into business impact.

Why Join Us:

It is rare to find a company that both deeply impacts its customers and is able to provide its services across a massive population. At Blink, we have a huge impact on people when they are most vulnerable: at the intersection of their healthcare and finances. We are also the fastest-growing healthcare company in the country and are driving that impact across millions of new patients every year. Our business model not only helps people, but drives economics that allow us to build a generational company. We are a relentlessly learning, constantly curious, and aggressively collaborative cross-functional team dedicated to inventing new ways to improve the lives of our customers.

We are an equal opportunity employer and value diversity of all kinds. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Applicants who provide their phone number and consent to receive text messages may receive SMS or MMS updates from Blink Health regarding their application.

Blink Health

Blink Health is a healthcare technology company that builds innovative products designed to make prescriptions accessible and affordable for everyone. With solutions like BlinkRx and Quick Save, they tackle traditional barriers in the prescription supply chain, ensuring that millions of people can access essential medications and improve their health outcomes.

Founded: 2014
Employees: 201-500
Industry: Health Care Providers & Services
Total raised: $170M
