Capital Markets Gateway

Site Reliability Engineer

TLDR

As a Site Reliability Engineer, you will lead the design and implementation of observability solutions that improve infrastructure reliability and application performance.

Monitoring & Observability
  • Design, implement, and maintain monitoring and observability solutions using tools like Prometheus, the Grafana stack (Loki, Grafana, Tempo, Alertmanager), Datadog, and OpenTelemetry.
  • Define and implement SLOs, SLIs, and error budgets to measure system reliability.  
  • Develop and optimize dashboards, alerts, and reports for system performance and business metrics. 

Alerting & Incident Management
  • Design actionable alerting strategies to minimize noise and improve MTTR.  
  • Integrate alerting systems with Jira. 
  • Establish and refine runbooks for on-call teams to handle alerts efficiently. 
  • Empower teams to own their observability coverage and incident response practices.

Performance Optimization
  • Analyze system performance metrics, identify bottlenecks, and implement optimizations to improve system efficiency, scalability, and cost-effectiveness.  
  • Help conduct load testing and capacity planning to ensure systems can handle peak traffic loads.  

Automation and Tooling
  • Identify opportunities for automation and develop tools to streamline operational processes, such as fail-over, configuration management, and monitoring.  
  • Implement monitoring and alerting systems within automations to detect and resolve issues proactively.  

Collaboration and Communication
  • Collaborate closely with cross-functional teams, including software engineers, operations, and infrastructure teams, to understand system requirements, provide technical guidance, and drive solutions.  
  • Communicate effectively to stakeholders about system changes, incidents, and improvements.  
  • Promote SRE principles and practices across the company.

Qualifications
  • Must be based in Latin America.
  • English level: C1 or C2.
  • Proven experience as a Site Reliability Engineer or similar role.  
  • Proficiency in logging, metrics, and tracing frameworks (Datadog, Loki, Prometheus, OpenTelemetry).
  • Experience with cloud platforms (Azure preferred) and infrastructure-as-code tools (e.g., Terraform).  
  • Strong programming and scripting skills (Python, Bash).  
  • Proficiency in containerization technologies and orchestration tools (Docker, Kubernetes).  
  • Understanding of Linux-based systems, networking, and security principles related to containerized applications.
  • Strong problem-solving and troubleshooting skills, with a passion for identifying and resolving complex technical issues.  
  • Excellent communication and collaboration abilities.  
  • Ability to thrive in a fast-paced, constantly evolving environment.  
  • Nice to have: experience with PostgreSQL monitoring and optimization.

If you're passionate about building resilient financial systems, optimizing observability at scale, and solving real-world reliability challenges in capital markets, we’d love to have you on our team!

Our Tech Stack
  • Azure as an infrastructure provider. We are reviewing secondary cloud options.  
  • Docker + Kubernetes for microservice orchestration using Istio service mesh. 
  • PostgreSQL for relational data, Elasticsearch for indexing, Redis for caching.
  • Datadog, Grafana, and OpenTelemetry for observability.
  • GitHub for version control and CI (with our own runners).
  • Harness and FluxCD for CD.
  • Terraform and Terragrunt for IaC.
  • Python and Bash for scripting infrastructure.
  • React: we’re all in on React and maintain multiple single-page React apps.
  • TypeScript: 99% of our codebase is TypeScript.
  • Latest .NET version for our backend services.  
  • GraphQL: our standard for API communication, served by our .NET backend.

Our Values
  • We innovate with purpose  
  • We focus on outcomes vs. output  
  • We believe diverse and inclusive teams fuel innovation  
  • We are humble yet candid  
  • We do right by the customer 

What We Offer
  • Contract of 2+ years.
  • 15 business days of vacation.
  • Tech courses and conferences.
  • Top-of-the-line MacBook.
  • Flexible working hours.

CMG embraces our ongoing commitment to building a culture reflecting the people, perspectives, and passions it represents. We will accept nothing less than equity, inclusion, and belonging for all. With the only constant in life being change, we will always listen, learn, and improve for the betterment of our teams, customers, and communities. CMG is proud to be an Equal Opportunity Employer.

    Capital Markets Gateway is an innovative platform that offers workflow management and data intelligence solutions specifically designed for participants in the equity capital markets (ECM). By connecting the buy side and sell side, CMG enhances decision-making and streamlines processes across the entire offering cycle, making it a pivotal tool for leading institutions looking to improve efficiency in capital formation.

    Founded: 2015
    Employees: 11-50
    Industry: Internet Software & Services
    Total raised: $39M