Senior Cloud Resilience Architect

Company Overview:

Blink Health is the fastest-growing healthcare technology company that builds products to make prescriptions accessible and affordable for everybody. Our two primary products, BlinkRx and Quick Save, remove traditional roadblocks within the current prescription supply chain, resulting in better access to critical medications and improved health outcomes for patients.

BlinkRx is the world’s first pharma-to-patient cloud that offers a digital concierge service for patients who are prescribed branded medications. Patients benefit from transparent low prices, free home delivery, and world-class support on this first-of-its-kind centralized platform. With BlinkRx, never again will a patient show up at the pharmacy only to discover that they can’t afford their medication, their doctor needs to fill out a form for them, or the pharmacy doesn’t have the medication in stock. 

We are a highly collaborative team of builders and operators who invent new ways of working in an industry that historically has resisted innovation. Join us!

Responsibilities

  • Evaluate and mature the organization’s disaster recovery posture, including recovery objectives (RTO/RPO), dependency mapping, and failure domain analysis across applications, data, and infrastructure.

  • Define, document, and establish disaster recovery standards and best practices across cloud infrastructure, platforms, and application architectures.

  • Partner with SRE, platform, security, and product engineering teams to design and implement resilient, fault-tolerant systems, progressing from backup-based recovery to multi-region and active-active architectures.

  • Lead the disaster recovery roadmap, balancing technical feasibility, cost, risk, and business priorities.

  • Design and recommend reference architectures for disaster recovery patterns, including pilot-light, warm standby, hot standby, and active-active.

  • Drive adoption of active-active disaster recovery for critical systems, including traffic management, data replication, consistency models, and automated failover.

  • Define and operationalize testing strategies for DR, including game days, chaos testing, and regular recovery exercises.

  • Establish clear documentation, runbooks, and escalation paths to ensure recoverability is well understood and not dependent on individuals.

  • Evaluate and recommend platform upgrades, cloud services, and tooling that improve resilience, recovery speed, and reliability.

  • Serve as a technical authority and advisor on disaster recovery and resilience for leadership and engineering teams.

  • Provide architectural guidance, design reviews, and mentorship to engineers implementing DR-related changes.

  • Partner with security and compliance teams to ensure DR strategies meet regulatory, audit, and data protection requirements.

Desired Experience

  • Bachelor’s or Master’s degree in Computer Science or equivalent practical experience.

  • 8+ years of experience in cloud infrastructure, platform engineering, SRE, or reliability-focused architecture roles.

Disaster Recovery & Resilience

  • Deep understanding of disaster recovery concepts including RTO/RPO, blast radius reduction, failure domains, and dependency isolation.

  • Proven experience designing and implementing multi-region and multi-availability-zone architectures.

  • Hands-on experience moving systems toward active-active or highly available architectures.

  • Strong grasp of data replication strategies, consistency tradeoffs, and recovery patterns for databases and stateful systems.

Cloud & Platform Engineering

  • Extensive experience with major cloud providers (AWS preferred, GCP/Azure acceptable).

  • Strong understanding of managed cloud services and their DR characteristics and limitations.

  • Experience with Kubernetes-based platforms, including regional failover, workload portability, and cluster recovery strategies.

  • Familiarity with global traffic management, DNS, load balancing, and service mesh patterns.

Automation & Infrastructure as Code

  • Experience designing and maintaining Infrastructure as Code using tools such as Terraform, Pulumi, CloudFormation, or Ansible.

  • Strong focus on automation for recovery workflows, failover testing, and environment provisioning.

  • Ability to eliminate manual recovery steps and reduce time-to-recovery through software.

Operational Excellence

  • Experience defining and running DR tests, game days, and failure simulations.

  • Comfortable working across organizational boundaries to influence priorities and standards.

  • Strong documentation and communication skills, with the ability to translate complex technical risk into business impact.

Why Join Us:

It is rare to find a company that both deeply impacts its customers and is able to provide its services across a massive population. At Blink, we have a huge impact on people when they are most vulnerable: at the intersection of their healthcare and finances. We are also the fastest-growing healthcare company in the country and are driving that impact across millions of new patients every year. Our business model not only helps people but also drives economics that allow us to build a generational company. We are a relentlessly learning, constantly curious, and aggressively collaborative cross-functional team dedicated to inventing new ways to improve the lives of our customers.

We are an equal opportunity employer and value diversity of all kinds. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
