QA Engineer


Collaborate with engineering teams to improve alerting workflows, validate incident pipelines, and enhance monitoring systems using tools like Datadog and Prometheus.

QA Engineer - Job Description

Key Responsibilities

  • Test product-specific use cases and validate end-to-end alerting workflows across monitoring systems.

  • Simulate incidents and test scenarios that trigger alerts in tools like Datadog, Prometheus, or similar monitoring platforms.

  • Verify that alerts raised in monitoring tools are correctly consumed and acted upon by downstream systems or automated workflows.

  • Understand alert rules well enough to design, execute, debug, and maintain effective test cases (alert configuration is owned by Developers/SREs, but QA must understand how the rules behave).

  • Collaborate closely with engineering teams (Developers, SREs/DevOps) to improve detection, investigation, and automated incident response.

  • Analyze alert behaviour, validate incident pipelines, and verify that monitoring and automation tools integrate correctly with one another.

  • Identify gaps in monitoring, logging, and alert workflows and provide clear, actionable QA feedback.

  • Document test scenarios, alert behaviour, and monitoring workflows in a clear and reproducible manner.

Mandatory Skills

  • Monitoring Tools Expertise: Hands-on experience with at least one major monitoring system (Datadog or Prometheus), including working with alerts, dashboards, and troubleshooting.

  • Alert Simulation & Validation: Ability to trigger, simulate, and validate alert events end-to-end.

  • Incident Workflow Understanding: Strong understanding of how alerts propagate through monitoring systems and how automated systems respond to them.

  • Automation Mindset: Ability to use or write simple scripts (Python, Shell, etc.) to simulate workloads or events that trigger alerts.

  • Communication & Problem Solving: Ability to collaborate effectively with Developers and SRE/DevOps teams to ensure monitoring accuracy.

Good to Have

  • Experience with automated incident investigation or remediation tools.

  • Familiarity with CI/CD pipelines and with integrating monitoring validation into them.

  • Understanding of observability fundamentals: metrics, logs, and traces.

  • Exposure to infrastructure or SRE environments.

  • Basic knowledge of Kubernetes, Docker, or cloud platforms (AWS/GCP/Azure).

InfraCloud is a premier technology company specializing in Cloud Native and Kubernetes Consulting Services. They help companies modernize applications and infrastructure with cloud native technologies for resilience and scalability. With a team of elit...
