Senior Platform Engineer/SRE: Tech Lead, Critical Infrastructure Transformation

Build the internal platform that powers our engineering teams, delivering mission-critical software to 4,000+ cloud hosting providers worldwide.

CloudLinux powers 4,000+ hosting providers managing millions of websites globally. Our infrastructure team is at a critical inflection point: moving from 8+ years of accumulated technical debt to building a modern platform. This isn't a typical SRE role; it's a chance to architect the future of infrastructure that cannot fail.

Where we are: legacy systems, reactive operations, and a bus factor of 1. OpenNebula bottlenecks are blocking releases, and 70% of our time goes to firefighting.

Where we're going: Self-service platform, Infrastructure as Code, proactive engineering. You'll be one of 2-3 senior engineers leading this transformation alongside a new Infrastructure Director with full B-level support.

What You'll Actually Do

Stabilize & Assess:

  • Deep dive into OpenNebula issues with the existing team
  • Map critical dependencies and single points of failure
  • Implement quick wins (automated VM cleanup, monitoring gaps)
  • Begin documenting undocumented systems

Build Foundation:

  • Lead the design and development of an Internal Developer Platform (IDP)
  • Implement GitOps for critical workflows
  • Establish SLIs/SLOs for core services
  • Create runbooks for top incidents

Transform Platform:

  • Architect self-service Internal Developer Platform
  • Drive Infrastructure as Code to 60%+ coverage
  • Eliminate single points of failure
  • Drive complex architectural decisions from design through implementation

Technical Stack You'll Transform

Current:

  • Virtualization: OpenNebula (main bottleneck), oVirt/OpenStack/CloudStack, KVM
  • Storage: Ceph (recently stabilized), Cephadm, Rook
  • Network: Juniper
  • Bare metal (3 data centers) + AWS + Google Cloud + Azure
  • Automation: ~5% Terraform coverage, manual operations dominant
  • CI/CD: GitLab, Jenkins, Gerrit, GitHub

Your Tools for Transformation:

  • Kubernetes and KubeVirt, plus any other tools the platform needs
  • Terraform/Terragrunt + Ansible
  • GitOps (ArgoCD/Flux)
  • Python/Go for custom tooling
  • Modern observability stack

Requirements

To thrive in this role, we are looking for someone who has:

  • Migrated legacy systems to modern platforms at scale
  • Strong Kubernetes production experience (multi-tenant, federation)
  • Infrastructure as Code expertise (Terraform/Ansible in production)
  • Linux at scale (RHEL/CentOS/AlmaLinux, 1000+ servers)
  • Network fundamentals: underlay and overlay networking (EVPN, BGP, VXLAN), DNS, network architecture and segmentation, and native pod networking at scale
  • Proven ability to work independently with minimal documentation
  • Experience building self-service platforms
  • English B2+ and excellent documentation skills

Critical Mindset:

  • Comfortable with ambiguity and technical debt
  • Pragmatic: know when to fix vs. replace vs. work around
  • Can balance firefighting with strategic improvements
  • Strong opinions, loosely held
  • Teaching mentality: you'll help upskill the team

What Makes You Successful Here:

  • You'll have significant technical decision-making power and direct impact
  • New Infrastructure Director + B-level backing for transformation
  • Approved investment in people and technology
  • Full authority to simplify and modernize
  • Protected time for strategic work, not just operations

The Opportunity

This isn't about maintaining the status quo. You'll:

  • Define infrastructure strategy affecting 4,000+ companies
  • Build an Internal Developer Platform
  • Lead technical transformation with real budget and support
  • Become the principal architect of a modern platform
  • Work directly with the Infrastructure Director
  • Shape how critical infrastructure software gets delivered globally

Benefits

What's in it for you?

  • Competitive senior-level compensation.
  • A focus on professional development.
  • Interesting and challenging projects.
  • Fully remote work with flexible working hours, which allows you to schedule your day and work from any location worldwide.
  • 24 paid vacation days per year, 10 paid national holidays, and unlimited sick leave.
  • Compensation for private medical insurance.
  • Co-working and gym/sports reimbursement.
  • Budget for education.
  • The opportunity to receive a reward for the most innovative idea that the company can patent.

Apply If You:

  • Thrive in high-impact, high-autonomy environments
  • Want to transform, not just maintain
  • Can see through chaos to architectural solutions
  • Are excited by the challenge, not scared by the current state
  • Believe infrastructure should be invisible when working, invaluable when measured

We're specifically looking for someone who has successfully navigated similar transformations. If you've only worked in already-stable environments, this role will be challenging. But if you've turned chaos into platform excellence before, let's talk.

By applying for this position, you consent to the processing of your personal data as described in our Privacy Policy (https://cloudlinux.com/candidate-privacy-notice), which provides detailed information on how we maintain and handle your data.

CloudLinux is on a mission to make Linux secure, stable, and profitable. We have spent more than 500 combined years working on Linux, and are changing how hosting companies and data centers use this technology we love by bringing it to millions of their customers. With more than 500,000 product installations and 4,000 customers, including Liquid Web, 1&1, and Dell, CloudLinux combines in-depth technical knowledge of hosting, kernel development, and open source with unique client care expertise.

CloudLinux team members are not tied to a physical office location, and everyone works remotely full-time. We provide flexible working hours and an open management style, to avoid unnecessary bureaucracy and excessive control while getting the best from our employees. This system allows each of us to fully realize our ideas and ambitions, while comfortably combining work with our usual lifestyles.
