Senior Platform Engineer/SRE - Tech Lead Critical Infrastructure Transformation

Build the internal platform that powers our engineering teams, delivering mission-critical software to 4,000+ cloud hosting providers worldwide.

CloudLinux powers 4,000+ hosting providers managing millions of websites globally. Our infrastructure team is at a critical inflection point – moving away from 8+ years of accumulated technical debt and toward a modern platform. This isn't a typical SRE role; it's a chance to architect the future of infrastructure that cannot fail.

Where we are: Legacy systems, reactive operations, bus factor = 1. OpenNebula bottlenecks are blocking releases. 70% of our time goes to firefighting.

Where we're going: Self-service platform, Infrastructure as Code, proactive engineering. You'll be one of 2-3 senior engineers leading this transformation alongside a new Infrastructure Director with full B-level support.

What You'll Actually Do

Stabilize & Assess:

Deep dive into OpenNebula issues with the existing team
Map critical dependencies and single points of failure
Implement quick wins (automated VM cleanup, monitoring gaps)
Begin documenting undocumented systems

Build Foundation:

Lead the design and development of an Internal Developer Platform (IDP)
Implement GitOps for critical workflows
Establish SLIs/SLOs for core services
Create runbooks for top incidents

Transform Platform:

Architect self-service Internal Developer Platform
Drive Infrastructure as Code to 60%+ coverage
Eliminate single points of failure
Drive development and implementation of complex architectural decisions

Technical Stack You'll Transform

Current:

Virtualization: OpenNebula (main bottleneck), oVirt/OpenStack/CloudStack, KVM
Storage: Ceph (recently stabilized), Cephadm, Rook
Network: Juniper
Bare metal (3 datacenters) + AWS + Google Cloud + Azure
Automation: ~5% Terraform coverage, manual operations dominant
CI/CD: GitLab, Jenkins, Gerrit, GitHub

Your Tools for Transformation:

Kubernetes and KubeVirt, plus any other tooling the job requires
Terraform/Terragrunt + Ansible
GitOps (ArgoCD/Flux)
Python/Go for custom tooling
Modern observability stack

Requirements

To thrive in this role, we are looking for someone who has:

Migrated legacy systems to modern platforms at scale
Strong Kubernetes production experience (multi-tenant, federation)
Infrastructure as Code expertise (Terraform/Ansible in production)
Experience running Linux at scale (RHEL/CentOS/AlmaLinux, 1000+ servers)
Network fundamentals across underlay and overlay (EVPN, BGP, VXLAN, DNS, network architecture and segmentation, native pod networking at scale)
Proven ability to work independently with minimal documentation
Experience building self-service platforms
English B2+ and excellent documentation skills

Critical Mindset:

Comfortable with ambiguity and technical debt
Pragmatic: know when to fix vs. replace vs. work around
Can balance firefighting with strategic improvements
Strong opinions, loosely held
Teaching mentality – you'll help upskill the team

What Makes You Successful Here:

You'll have significant technical decision-making power and direct impact
New Infrastructure Director + B-level backing for transformation
Approved investment in people and technology
Full authority to simplify and modernize
Protected time for strategic work, not just operations

The Opportunity

This isn't about maintaining the status quo. You'll:

Define infrastructure strategy affecting 4,000+ companies
Build an Internal Developer Platform
Lead technical transformation with real budget and support
Become the principal architect of a modern platform
Work directly with the Infrastructure Director
Shape how critical infrastructure software gets delivered globally

Benefits

What's in it for you?

Competitive senior-level compensation.
A focus on professional development.
Interesting and challenging projects.
Fully remote work with flexible working hours, which allows you to schedule your day and work from any location worldwide.
24 days of paid vacation per year, 10 days of national holidays, and unlimited sick leave.
Compensation for private medical insurance.
Co-working and gym/sports reimbursement.
Budget for education.
The opportunity to receive a reward for the most innovative idea that the company can patent.

Apply If You:

Thrive in high-impact, high-autonomy environments
Want to transform, not just maintain
Can see through chaos to architectural solutions
Are excited by the challenge, not scared by the current state
Believe infrastructure should be invisible when working, invaluable when measured

We're specifically looking for someone who has successfully navigated similar transformations. If you've only worked in already-stable environments, this role will be challenging. But if you've turned chaos into platform excellence before – let's talk.

By applying for this position, you consent to the processing of your personal data as described in our Privacy Policy (https://cloudlinux.com/candidate-privacy-notice), which provides detailed information on how we maintain and handle your data.
