Site Reliability Engineer (m/f/d) in Austin, TX

Description

Too much data, not enough insight?
We get it. At KNIME, we build software that helps people clean, combine, and understand their data: quickly, efficiently, and without code.

And with our focus on Data Analytics & AI, we empower everyone to turn complex challenges into clear, actionable insights.
You can help make that happen.

We’re not just an open-source data analytics company; we’re a fast-growing, globally recognized pioneer at the intersection of data and AI, with users in every industry, an international team of 30+ nationalities, and a thriving open community.

Join us as a Site Reliability Engineer in Austin* and help us build and operate reliable, scalable cloud platforms for our next generation of products.

Who you are

Reliability-driven: You care deeply about stable, scalable systems and design infrastructure with reliability in mind from day one.

Cloud-native engineer: You have hands-on experience building and operating cloud infrastructure, ideally in Azure; AWS experience is a plus.

Kubernetes-savvy: You’re comfortable running and supporting production Kubernetes clusters and understand common deployment patterns.

Automation-first: You use code and Infrastructure as Code tools to eliminate manual work and scale operations sustainably.

Systems thinker: You bring strong Linux and networking knowledge and understand how distributed systems behave in production.

Curious & improvement-oriented: You enjoy learning, challenging existing setups, and continuously improving how systems are built and operated.

That's the job

Production ownership: Build, operate, and scale cloud-native SaaS platforms used by thousands of users.

Infrastructure as Code: Design and maintain reproducible infrastructure using tools like Terraform, Helm, or similar.

Reliability standards: Define and improve practices around availability, observability, monitoring, and performance.

Incident response: Troubleshoot production issues, perform root cause analysis, and implement long-term fixes.

Identity & security: Support authentication and authorization in multi-tenant SaaS environments.

Cross-team collaboration: Work closely with product and engineering teams to bring features into production reliably.

What we offer

Impact: Ownership of critical infrastructure used by thousands of users worldwide.

Engineering depth: Modern cloud-native stack with real technical challenges.

Autonomy & trust: High responsibility with room to shape solutions.

Open-source mindset: User-first thinking and pragmatic engineering decisions.

Learning: Continuous growth alongside experienced platform engineers.

Flexibility: A work environment that supports sustainable ways of working.

