AppZen is the leader in autonomous spend-to-pay software. Its patented artificial intelligence accurately and efficiently processes information from thousands of data sources so that organizations can better understand enterprise spend at scale to make smarter business decisions. It seamlessly integrates with existing accounts payable, expense, and card workflows to read, understand, and make real-time decisions based on your unique spend profile, leading to faster processing times and fewer instances of fraud or wasteful spend. Global enterprises, including one-third of the Fortune 500, use AppZen’s invoice, expense, and card transaction solutions to replace manual finance processes and accelerate the speed and agility of their businesses. To learn more, visit us at www.appzen.com.

About the Role:

We are seeking a highly skilled Senior DevOps Engineer to lead the design, implementation, and continuous improvement of our cloud infrastructure, Kubernetes platform, CI/CD pipelines, observability systems, and reliability practices. This role is critical to ensuring platform stability, scalability, security, and operational excellence across production and non-production environments. You will work closely with Engineering, Security, and Product teams to build resilient, automated, and high-performing infrastructure systems.

Key Responsibilities:

Infrastructure & Cloud Engineering:

Design, implement, and manage scalable cloud infrastructure (AWS preferred)

Lead infrastructure-as-code initiatives (Terraform / CloudFormation)

Improve high availability, disaster recovery, and multi-region resilience

Optimize cloud cost and resource utilization

Kubernetes & Container Platform:

Architect and manage production-grade Kubernetes clusters

Improve cluster reliability, auto-scaling, and performance

Implement workload monitoring, alerting, and SLO-based reliability standards

Enforce namespace isolation and resource governance

CI/CD & Automation:

Design and optimize CI/CD pipelines (Jenkins, ArgoCD)

Implement zero-downtime deployment strategies

Automate environment provisioning (fully touchless builds with seed data)

Improve deployment reliability and rollback mechanisms

Observability & Reliability:

Own monitoring, alerting, and logging strategy (Prometheus, Grafana, Datadog, etc.)

Ensure 100% monitoring coverage for critical services

Reduce Sev1/Sev2 incidents caused by infrastructure

Create and maintain runbooks (COPs) for incident response

Define SLOs, SLIs, and error budgets

Security & Compliance:

Implement IAM best practices and least-privilege access

Improve secrets management and credential rotation

Partner with the Security team on audits and compliance controls

Incident Management:

Lead root cause analysis for major incidents

Drive postmortems and preventive improvements

Improve MTTR and overall operational maturity

Required Skills & Experience:

6+ years in DevOps / SRE / Cloud Engineering

Strong experience with AWS (VPC, IAM, EC2, S3, RDS, EKS, etc.)

Deep Kubernetes experience (production clusters)

Strong understanding of networking and Linux systems

Experience with Infrastructure as Code (Terraform preferred)

Experience implementing monitoring & alerting systems (Datadog, Prometheus, Grafana)

Strong scripting skills (Python / Bash)

Experience managing production systems with high availability requirements

Good understanding of databases such as Postgres and MySQL

Strong written and verbal communication skills

Ability to follow structured processes while being proactive in identifying improvements

Analytical and problem-solving mindset

Willingness to work the night shift on a long-term basis
