Jotelulu
SysAdmin/SRE Mexico
TLDR
Help scale the reliability of Jotelulu's cloud platform by optimizing systems and leading incident response within a collaborative engineering team.
About this role
The challenge
As a Site Reliability Engineer / Systems Administrator, you’ll play a key role in scaling and strengthening the reliability of Jotelulu’s cloud platform. Your mission will be to monitor and optimize cloud systems, automate processes, ensure effective incident management, and maintain a robust, scalable, and secure infrastructure that supports mission-critical services.
You’ll be part of an Operations & SRE environment focused on reliability, performance, and continuous improvement. Working within a highly collaborative engineering setup, you will contribute to building and maintaining infrastructure across multiple availability zones, ensuring stability and operational excellence while supporting the growth of the platform.
Collaboration will be essential. You’ll work closely with DevOps, Product, and Development teams to build reliable services, support infrastructure decisions, lead incident response, proactively detect risks, and ensure systems and teams can scale efficiently and confidently.
Requirements that matter to us
We are looking for a SysAdmin / SRE with strong experience in cloud infrastructure, systems administration, and reliability practices, capable of operating and improving large-scale environments.
Relevant experience and expected outcomes:
- Proven experience managing large-scale cloud or MSP infrastructures.
- Expert-level Linux systems administration.
- Experience with Windows Server (2012–2025) in production environments.
- Strong troubleshooting skills across systems, networking, storage, and application layers.
- Solid networking knowledge including TCP/IP, DNS, load balancing, firewalling, BGP, and network virtualization.
- Experience with storage solutions such as Ceph, NFS or similar technologies.
- Familiarity with IaaS orchestration platforms such as CloudStack or similar.
- Experience implementing and maintaining monitoring and observability tools.
- Experience with Infrastructure as Code and automation using Ansible.
- Experience designing or maintaining CI/CD pipelines.
- Knowledge of databases such as MySQL, MariaDB or PostgreSQL.
- Strong understanding of ITIL processes for incident, problem, and change management.
- Strong documentation practices and focus on operational excellence.
Key skills and expected impact:
- Strong analytical mindset focused on reliability, scalability, and continuous improvement.
- Ability to monitor systems, detect risks proactively, and minimize downtime.
- Capability to lead incident response and ensure effective resolution.
- Strong communication skills in Spanish and an intermediate level of English.
- Ability to collaborate across teams and contribute to infrastructure and operational improvements.
- Experience optimizing distributed systems performance.
- Knowledge of advanced security and system hardening practices.
- Ability to improve operational workflows and work with ticketing systems.
Tools
- Operating Systems: Linux, Windows Server.
- Automation: Ansible, scripting (Bash, Python, PowerShell).
- CI/CD: pipeline implementations.
- Monitoring & Observability: Zabbix, Prometheus, Grafana, ELK Stack.
- Storage: Ceph, NFS or similar.
- Orchestration: CloudStack, OpenStack.
- Databases: MySQL, MariaDB, PostgreSQL.
- Collaboration & ITSM: tools aligned with ITIL practices.
Jotelulu builds a self-managed cloud infrastructure platform tailored specifically for small and medium-sized enterprises in the IT sector. By forging strategic alliances with managed service providers and IT integrators, we enhance collaborative revenue generation and provide tools that streamline IT management and automation at scale. Our focus on the Portuguese IT market sets us apart as a dedicated partner to drive growth in this niche.