[Job-26769] Senior/Specialist SRE, Brazil

AI overview

Enhance visibility and performance of over 100 applications through innovative observability solutions, transforming data into actionable insights for high-availability systems.
We are tech transformation specialists, uniting human expertise with AI to create scalable tech solutions. With over 8,000 CI&Ters around the world, we’ve built partnerships with more than 1,000 clients during our 30 years of history. Artificial Intelligence is our reality.

Your mission:

The Observability & Monitoring Specialist is responsible for enhancing the visibility, performance, and health of SCT’s application landscape. This role focuses on closing existing monitoring gaps across 100+ applications by applying modern observability solutions to ensure proactive issue detection and rapid incident resolution. Working as a core part of the infrastructure team, this specialist will optimize our monitoring stack (specifically Splunk, LogicMonitor, and AppDynamics) to transform raw data into actionable insights, ensuring that development and operations teams have the telemetry needed to maintain high-availability systems.

Key Responsibilities:

- Advanced Observability & Monitoring Strategy: Identify and remediate visibility gaps across a landscape of 100+ applications, ensuring full-stack monitoring from infrastructure to the end-user experience. Design and implement modern monitoring patterns to move from reactive alerting to proactive anomaly detection.
- Platform Management (Splunk, LogicMonitor, AppDynamics):
  - Splunk: Optimize log aggregation, create complex dashboards, and develop advanced queries (SPL) to support troubleshooting and security auditing.
  - LogicMonitor: Manage system-level monitoring (CPU, Memory, Disk, Network) and refine threshold logic to reduce alert noise while maintaining high sensitivity to critical failures.
  - AppDynamics: Configure Application Performance Monitoring (APM) to track business transactions, map dependencies, and identify code-level bottlenecks.
- Reliability Engineering & Performance Analysis: Correlate data across multiple platforms to provide a holistic view of system health and performance. Partner with application owners to define meaningful Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
- Automation & Modernization: Automate the deployment and configuration of monitoring agents across Windows and Linux environments to ensure "monitoring-as-code" standards. Advocate for and implement modern observability solutions (OpenTelemetry, tracing, etc.) as the application landscape evolves.
- Deployment & Incident Response Support: Provide deep-dive technical support during high-priority incidents by leveraging AppDynamics and Splunk for rapid root cause analysis (RCA). Create and maintain operational dashboards and runbooks that enable 24x7 support teams to respond effectively to alerts.
- Continuous Improvement & Governance: Audit the existing monitoring environment to eliminate redundant alerts and ensure compliance with internal ITGC and security standards. Conduct knowledge-sharing sessions to empower application teams to self-serve using the observability toolkit.

Professional Expectations:

- Fluent English to interact with a multicultural team and an American client every day.
- Demonstrate a "data-driven" mindset, using metrics to influence technical decisions and infrastructure investments.
- Exhibit strong collaboration skills, acting as the bridge between infrastructure stability and application performance.
- Proactively identify trends in system behavior to prevent outages before they impact the business.

If you like what you see, just apply. Good luck!

Our benefits:

-Health and dental insurance
-Meal and food allowance
-Childcare assistance
-Extended paternity leave
-Partnership with gyms and health and wellness professionals via Wellhub (Gympass) and TotalPass
-Profit Sharing and Results Participation (PLR)
-Life insurance
-Continuous learning platform (CI&T University)
-Discount club
-Free online platform dedicated to physical, mental, and overall well-being
-Pregnancy and responsible parenting course
-Partnerships with online learning platforms
-Language learning platform
And many more!

More details about our benefits here: https://ciandt.com/br/pt-br/carreiras

At CI&T, inclusion starts at the first contact. If you are a person with a disability, it is important to present your assessment during the selection process. See which data needs to be included in the report by clicking here. This way, we can ensure the support and accommodations that you deserve. If you do not yet have the assessment, don't worry: we can support you in obtaining it.

We have a dedicated Health and Well-being team, inclusion specialists, and affinity groups who will be with you at every stage. Count on us to make this journey side by side.

CI&T is the digital technology agency empowering agile growth for the world's biggest companies by leveraging advanced technologies including Cloud, IoT, Big Data, Machine Learning/AI, Social, Mobility. For over 20 years, CI&T has been a trusted partne...
