Site Reliability Engineer - Public Sector

Drive mission-grade reliability and security for cloud infrastructure, ensuring compliance and managing automated deployments while enhancing developer experience in federal environments.
At Unstructured, we’re building the backbone of generative AI, helping federal agencies transform PDFs, HTML, Word docs, images, and more into secure, high-performance data pipelines that scale. Our tools are trusted by nearly half of the Fortune 500 and downloaded more than 38 million times in the open-source community.

We’re expanding our federal/public sector practice, and we’re hiring a Public Sector Site Reliability Engineer (SRE) to help design, scale, and secure the systems that power the next generation of AI-driven workloads for government.

What You’ll Own & Drive

🔐 Mission-Grade Reliability & Security
- Design, build, and manage secure, highly available, and scalable cloud infrastructure for federal environments.
- Ensure compliance with FedRAMP, FISMA, and other relevant security and regulatory frameworks.
- Develop IaC with Terraform, Pulumi, or similar for repeatable, compliant deployments.
- Build and maintain automated CI/CD pipelines that move fast without sacrificing security or stability.

📊 Full Observability in Sensitive Environments
- Implement/maintain monitoring, logging, and alerting (Prometheus, Grafana, Datadog, Elastic).
- Enable real-time visibility and rapid response for mission-critical workloads.
- Partner with engineering and program teams for high-assurance rollouts.
- Lead capacity planning, deployment strategies, and resilient architecture design for federal networks.

🔥 Incident Response & Continuous Improvement
- Lead incident response and root-cause analysis with a blameless, systems-thinking approach.
- Drive postmortems and reliability improvements.
- Enhance developer experience with secure automation and streamlined workflows.
- Help teams iterate quickly while maintaining compliance and operational excellence.

What You Bring
- 5–9 years managing software deployed to US government or Department of Defense (DOD) networks.
- Active SECRET clearance required; TS/SCI strongly preferred.
- Expertise with AWS GovCloud and/or Azure Government.
- Deep experience with Kubernetes, Docker, and container orchestration at scale.
- Strong Linux systems and networking fundamentals.
- Scripting/automation: Python, Bash, or Go.
- IaC: Terraform, Pulumi, Ansible (or similar).
- Strong grasp of monitoring, logging, and observability best practices.
- Travel required up to 20%.

Bonus Points
- ML infrastructure or real-time data pipelines experience.
- Serverless or event-driven architectures.
- Contributions to open-source DevOps/SRE projects.
- Hands-on work with US government security/compliance in cloud-native settings.

Unstructured values service and encourages veterans of the US military and civilian agencies to apply to this role.

Why You’ll Love It Here
- Mission Impact: Power critical AI workloads in the public sector.
- Big Technical Challenges: High-assurance problems at the edge of AI, data, and cloud.
- Elite Team: Sharp, low-ego engineers who value execution and learning.
- Innovation + Security: Build cutting-edge systems with rigorous reliability for federal use cases.