Senior Observability Engineer - Datadog SME (LATAM)
We are looking for a Senior Observability Engineer with deep expertise in Datadog to join our Digital Ops team. This role is focused on owning and evolving the observability strategy for a large-scale, cloud-native environment supporting 150+ production services across multiple regions.
As a Datadog Subject Matter Expert, you will be responsible for designing, operating, and continuously improving observability capabilities, enabling engineering teams to build reliable, performant, and cost-efficient systems. You will work closely with DevOps, SRE, and development teams in an agile environment, acting as a technical reference for observability best practices.
Start date: ASAP
Contract type: Full-Time, Remote, Contractor
Work hours and location: 8:00 AM to 4:00 PM MST
What You'll Be Doing
- Own and lead the observability architecture and strategy across cloud-native services running in multiple environments and regions.
- Act as the Datadog Subject Matter Expert, owning configuration, governance, and best practices.
- Design, implement, and maintain Datadog dashboards, monitors, alerts, SLOs, and service health views.
- Operate and optimize Datadog APM, Logs, Metrics, Synthetic Monitoring, and RUM.
- Drive alert quality improvements, signal-to-noise reduction, and proactive detection of operational issues.
- Lead Datadog cost management and usage optimization initiatives in collaboration with engineering and finance stakeholders.
- Partner with development teams to embed observability into the SDLC and production readiness processes.
- Define and document runbooks, operational procedures, and observability standards.
- Eventually participate in a shared on-call rotation, triaging and resolving production incidents, acting as incident commander when needed, and leading post-incident reviews.
- Continuously identify opportunities for automation and toil reduction across observability and operational workflows.
- Set, track, and report on operational excellence metrics including reliability, performance, availability, security, and cost.
What You Need to Succeed
Must-haves
- 3+ years of deep, hands-on experience with Datadog as an observability platform in production environments.
- 5+ years of experience in DevOps, SRE, or Cloud Engineering roles supporting customer-facing systems.
- Strong practical experience with Datadog APM, Logs, Metrics, dashboards, monitors, alerts, and SLOs.
- Hands-on experience with Azure, Kubernetes, Terraform, Docker, and GitOps-based workflows.
- Proven experience operating 24x7 production environments, including incident response, root cause analysis, and post-mortems.
- Solid understanding of cloud-native architectures, distributed systems, and modern observability principles.
- Ability to work independently in a fully remote, distributed team, with strong communication and collaboration skills.
Nice to have
- Experience with ArgoCD, Azure DevOps CI/CD pipelines, and infrastructure automation.
- Exposure to Databricks, SQL-based systems, or data-intensive platforms.
- Hands-on experience building or extending custom DevOps/SRE tooling to reduce operational toil.
- Relevant certifications (e.g. Datadog, Azure, Cloud Architecture, ITIL).
Our Recruitment Process
Here's what to expect from our candidate-friendly interview process:
- Initial Interview - 60 minutes with our Talent Acquisition Specialist
- Culture Fit - 30 minutes with our Team Engagement Manager
- Technical Assessment - Online Challenge/Multiple Choice Questionnaire
- Final Stage - 60 minutes with the Hiring Manager
Why Join Launchpad?
We believe that great work starts with great people. At Launchpad, we offer:
- People-first culture
- Excellent compensation
- Hardware setup for working from home
- Agile methodologies
- Diverse and multicultural work environment
- Training allowances
...and more!
Ready to make your mark? Apply now and be part of something exciting.