Senior Engineer (AWS, CloudWatch)

Provide Level 1 support for AWS applications, ensuring efficient monitoring and troubleshooting while collaborating with teams to enhance operational processes.

REQUIREMENTS:

  • Basic understanding of AWS infrastructure and services (e.g., EC2, RDS, ALB, S3, CloudWatch, IAM).

  • Experience with application and infrastructure monitoring tools (e.g., CloudWatch, AppDynamics, New Relic, Dynatrace, Grafana).

  • Basic knowledge of databases (e.g., MySQL, PostgreSQL, Oracle, Amazon RDS) and the ability to perform simple checks.

  • Understanding of web applications, APIs, HTTP status codes, and application errors.

  • Ability to perform log and metric analysis for initial troubleshooting.

  • Familiarity with ITSM tools for incident management (e.g., ServiceNow, Jira, Remedy).

RESPONSIBILITIES:

  • Provide Level 1 support for all client-facing applications and platforms, ensuring SLAs are met.

  • Acknowledge, log, triage, and respond to incoming incidents and alerts.

  • Perform initial troubleshooting by validating impact, collecting logs/metrics, and executing predefined runbook procedures.

  • Resolve known issues using documented processes and workarounds.

  • Escalate complex incidents to L2/L3 teams with complete diagnostic information and context.

  • Participate in major incident response bridges and provide real-time status updates.

  • Proactively monitor the health and performance of client-facing web/mobile applications, APIs, and integrated services.

  • Monitor AWS infrastructure (EC2, RDS, ALB, S3, CloudWatch) and databases for alerts and performance degradation.

  • Conduct routine application and service health checks.

  • Identify performance anomalies, error patterns, and network latency issues, escalating as required.

  • Fine-tune monitoring alerts and thresholds to improve signal clarity and reduce noise.

  • Perform basic database operational checks (e.g., connectivity, disk usage, backup status).

  • Validate application functionality and user-reported issues at the L1 level.

  • Coordinate with application owners, infrastructure teams, and third-party vendors for issue resolution.

  • Maintain and update knowledge base articles, runbooks, and operational documentation.

  • Document recurring incidents, known errors, and effective workarounds.

  • Support root cause analysis (RCA) by providing detailed L1 observations and data.

  • Identify and suggest opportunities to improve monitoring, alerting, and operational processes.

Bachelor’s or master’s degree in Computer Science, Information Technology, or a related field.

👋🏼 We're Nagarro. We are a digital product engineering company that is scaling in a big way! We build products, services, and experiences that inspire, excite, and delight. We work at scale, across all devices and digital mediums, and our people are located all over the world (19,500+ experts across 36 countries, to be exact). Our work culture is dynamic and non-hierarchical. We're looking for great new colleagues. That's where you come in!

By this point in your career, it is not just about the tech you know or how well you can code. It is about what more you want to do with that knowledge. Can you help your teammates proceed in the right direction? Can you tackle the challenges our clients face while always looking to take our solutions one step further and succeed at an even higher level? Yes? You may be ready to join us.