We are looking for a Site Reliability Engineer Team Lead (SRE TL) to join the team that develops and supports several large trading platforms.
We expect the Site Reliability Engineer Team Lead (SRE TL) to:
- Lead and manage a team of 2-5 Site Reliability Engineers, providing guidance, mentorship, and support to ensure the team’s success.
- Take ownership of, standardize, and monitor our SRE practices, ensuring that the team's engineers implement and operate them effectively.
- Leverage your strong background in cloud distributed computing and reliability systems architecture to enhance the reliability and resilience of our systems.
- Work closely with developers on prototyping and designing new features as part of the infrastructure,
- Deploy, install, configure, and maintain sophisticated Trading/Finance and related software,
- Configure bare-metal and cloud instances using Infrastructure as Code,
- Make key decisions for scalability, reliability, and accessibility,
- Install and manage both in-house and well-known third-party monitoring systems,
- Design, deploy, and configure cloud-based servers and networks: provision servers and storage, configure firewalls, VPN, monitoring, etc.,
- Administer UNIX/Cloud infrastructure – installation, configuration, and maintenance,
- Work with the Nexus and Git repositories.
Must-have skills:
- Excellent communication and collaboration skills to work effectively with cross-functional teams and delivery squads.
- Minimum of 2 years of experience leading a Site Reliability Engineering (SRE) or DevOps team,
- Experience supporting JVM applications (garbage collection, memory leaks),
- Strong experience with OS-level administration on Linux and/or UNIX,
- Hands-on scripting experience with Bash, Python, and/or Groovy,
- Experience with configuring TeamCity CI/CD pipelines,
- Experience with IaaS solutions using Ansible and Terraform,
- Experience with Docker container orchestration (K8s/OpenShift/HashiCorp),
- Ability to read and analyze errors,
- In-depth knowledge of TCP/IP and ISO/OSI stack,
- Experience with monitoring and logging tools (Zabbix, Elasticsearch or OpenSearch, Grafana, Kibana, etc.),
- Experience working with Apache, Nginx, HAProxy, Envoy, etc.,
- Strong ability to solve problems using code and scripting,
- English level not lower than B2.
Nice-to-have skills:
- Experience with SQL-like query languages,
- Experience with Ansible (AWX),
- Knowledge of Java programming language,
- Experience using trading, exchange, or risk management software,
- Experience with Atlassian software (JIRA, Confluence, FishEye, etc.).
Caring for our employees is one of Devexperts' core values. For this position, we offer a benefits package designed to ensure the comfort of our new teammate.
Flexibility benefits:
- Possibility of hybrid/remote work mode,
- Flexible working hours.
Health and recreation benefits:
- 20 days of paid vacation,
- 5 additional fully paid wellness days,
- Medical insurance – premium package,
- Free MultiSport card.
Facility benefits:
- Modern office with new equipment,
- Panoramic view of Vitosha mountain,
- PlayStation, billiards, relaxation zone, and gym,
- Parking space/public transport card,
- Free drinks and snacks.
Community benefits:
- Teambuilding activities,
- Corporate parties,
- Football club,
- Speakers' club,
- Free admission to corporate external events,
- Possibility of joining conferences and professional fairs.
Professional training benefits:
- English language courses,
- Local language courses for foreign employees,
- Unlimited access to self-learning platforms,
- Certification opportunities,
- Mentorship Program.
Social benefits:
- Referral bonuses for specific roles,
- Paid leave for special events.