TensorWave

Storage Engineer

Las Vegas, Nevada, U.S.

Full-Time

On-site

TL;DR

Design and optimize NFS-based storage systems to support high-performance AI workloads, with a focus on low latency and reliability.

Our mission at TensorWave Cloud is to build seamless, secure, reliable, and resilient AI infrastructure at scale, eliminating barriers and challenging the status quo to empower builders and support AI innovation.

About the role

We are looking for a Storage Engineer with deep expertise in NFS-based storage and modern high-performance file systems, specifically VAST Data and WEKA. This role exists to ensure our shared storage platforms are fast, reliable, and scalable.

You will own the design, operation, and performance of our file storage layer, supporting workloads that depend on low latency, high throughput, and predictable behavior. This is a hands-on role for someone who understands storage at the protocol and system level.

If you think in terms of NFS semantics, metadata performance, failure domains, and throughput per node, this role is for you.

Responsibilities

Design, deploy, and operate NFS-based storage systems for production workloads
Own and operate VAST Data and WEKA clusters in production environments
Architect storage for high-throughput, low-latency shared file access
Tune and optimize NFS performance (mount options, client behavior, server-side tuning)
Manage capacity planning, scaling, and rebalancing for VAST and WEKA systems
Diagnose and resolve storage performance issues (latency spikes, metadata bottlenecks, throughput drops)
Design and test failure and recovery scenarios (node failures, network issues, disk loss)
Lead upgrades, expansions, and maintenance with minimal or zero downtime
Partner with infrastructure and application teams to ensure workloads are well-matched to storage behavior
Document operational runbooks and establish best practices for shared file storage

You Are Obsessed With:

NFS that behaves predictably under load
Consistent latency and throughput at scale
Understanding exactly how storage fails — before it does
File systems that scale without becoming fragile
Making shared storage invisible to users because it just works

Required Experience

Strong hands-on experience with NFS in production environments
Direct experience operating VAST Data and/or WEKA systems
Deep understanding of distributed file systems and shared storage architectures
Strong knowledge of storage performance fundamentals (latency, throughput, metadata operations)
Experience troubleshooting complex storage and networking interactions
Solid Linux systems knowledge, especially around filesystem and I/O behavior
Ability to reason about failure domains, recovery paths, and data integrity

Preferred Experience

Experience supporting AI/ML, HPC, or data-intensive workloads
Familiarity with RDMA, high-speed networking, or NVMe-based storage
Experience running Kubernetes workloads backed by a shared file system
Experience with multi-rack or multi-site storage deployments
Infrastructure-as-Code or automation experience

What We Bring

Mission-driven company
Competitive Salary
Stock Options
100% paid Medical, Dental, and Vision insurance
Life and Voluntary Supplemental Insurance
Short Term Disability Insurance
Flexible Spending Account
401(k)
Flexible PTO
Paid Holidays
Parental Leave
Mental Health Benefits through Spring Health

We’re looking for resilient, adaptable people who believe in the mission and think at massive scale. Solutions that worked on a handful of devices will not work at exascale. Be prepared to be pushed daily, to learn a lot, and to build the future.

TensorWave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, national origin, or veteran status.

TensorWave

TensorWave delivers a high-performance cloud computing platform that leverages AMD Instinct™ GPUs to supercharge AI research and advanced workloads. Tailored for developers and researchers in the AI space, our platform removes infrastructure hurdles, enabling innovators to focus on pushing the boundaries of technology.

Founded: 2023
Industry: Internet Software & Services
