Crusoe

Infrastructure Engineer

TLDR

Infrastructure Engineers ensure the stability of Crusoe's AI hardware platform through troubleshooting, automation, and collaboration with various teams.

Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack — from electrons to tokens — to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster.

We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that — with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.

We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved — people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services.

If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.

About the Role:

At Crusoe, the Infrastructure Engineers on our Fleet Operations team play a crucial role in ensuring the reliability and stability of our hardware platform. This role involves both hands-on diagnosis and repair of rack-level GPU hardware and the development of automation to streamline fleet management, capacity delivery, and maintenance operations.

The ideal candidate will work closely with Data Center Operations and Engineering teams and play a key part in the continuous improvement of our hardware platform's reliability and scalability, ensuring that our cutting-edge infrastructure, featuring the latest NVIDIA and AMD GPUs, continues to operate at peak efficiency for our customers.

What You’ll Be Working On

  • Problem Solving and Deep-Level Troubleshooting: Investigating and troubleshooting problems and hardware faults within our GPU platforms that our automation can’t diagnose. This will involve taking data from system logs, kernel logs, and BMC Redfish APIs and, if the data is not there, working with hardware and kernel engineers to add the information you need to make accurate determinations.

  • Coordination and Collaboration: Working closely with our Data Center Operations, Hardware Engineering and Capacity Planning teams to repair and remediate failed hardware, ensure consistent delivery of new hardware to customers, and roll out new upgrades across the fleet

  • Automation and Tool Development: Automate routine processes and build Crusoe’s hardware diagnostics, provisioning and repair tooling

  • Build Processes and Documentation: When you figure out the best way to do something, you’ll be working on building processes, documentation and tooling to help the next person who finds this problem

  • Validate and Test New Hardware: Crusoe is often the first company in the world to get the latest-generation AI hardware, before it’s fully tested. You’ll conduct rigorous testing and validation on this cutting-edge hardware and on servers that come back from repair

  • On-Call: Participate in our on-call rotation, partnering with our US teams to provide follow-the-sun coverage

What You’ll Bring to the Team

  • Strong analytical, troubleshooting and problem-solving skills: Our automation takes care of the easy problems; you’ll be digging deep to figure out the hard ones

  • Linux experience: You’ll have a solid understanding of Linux internals and feel at home working in a terminal

  • Server Hardware and Provisioning: Exposure to server-class hardware & provisioning

  • Fundamentals of Hardware and Networking: You don’t need to be an expert, but you should know if an error message is due to a failed hardware component, a firmware bug, or a networking misconfiguration without escalating

  • Excellent communication and collaboration skills: You’ll be working with many different people across a lot of different teams – communication is critical

  • Education: Bachelor's degree in Computer Science or a related field, or self-educated in computer science fundamentals.

Bonus Points

  • Large-scale GPU operations: We work with cutting-edge hardware and software, so we understand most people won’t have worked with it – but it would be nice if you have!

  • Programming Proficiency: Proficiency with at least one programming language (Python, Go, or similar).

Benefits:

Crusoe also offers a competitive benefits package designed to support financial security, health, and overall well-being, including pension contributions, private health and dental insurance, income protection, life assurance, and more.

Compensation:

Compensation will be paid as a salary or an hourly wage. It will be determined by the applicant’s education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.

Crusoe builds a vertically integrated AI infrastructure that encompasses every layer from energy production to powering sophisticated AI workloads. We're dedicated to aligning cutting-edge computing with climate solutions, offering approaches that not only maximize resource efficiency but also reduce greenhouse gas emissions.

Founded: 2018
Employees: 51-200
Industry: IT Services
Total raised: $750M