We are looking for a Site Reliability Engineer Team Lead (SRE TL) to join the team that develops and supports several large trading platforms.
We expect the Site Reliability Engineer Team Lead (SRE TL) to:
- Lead and manage a team of 2-5 Site Reliability Engineers, providing guidance, mentorship, and support to ensure the team’s success.
- Own, standardize, and monitor our SRE practices, ensuring that the team's engineers implement and operate them effectively.
- Leverage your strong background in cloud and distributed computing and in reliability-focused systems architecture to improve the reliability and resilience of our systems.
- Work closely with developers to prototype and design new infrastructure features.
- Deploy, install, configure, and maintain complex trading/finance and related software.
- Configure bare-metal and cloud instances using Infrastructure as Code.
- Make key decisions on scalability, reliability, and accessibility.
- Install and manage both in-house and well-known third-party monitoring systems.
- Design, deploy, and configure cloud-based servers and networks; provision servers and storage; configure firewalls, VPNs, monitoring, etc.
- Administer UNIX/cloud infrastructure, including installation, configuration, and maintenance.
- Work with Nexus and Git repositories.
Must-have skills:
- Excellent communication and collaboration skills to work effectively with cross-functional teams and delivery squads.
- Minimum of 2 years of experience leading a Site Reliability Engineering (SRE) or DevOps team.
- Experience supporting JVM applications (garbage collection, memory leaks).
- Strong experience with OS-level administration on Linux and/or UNIX.
- Hands-on scripting experience with Bash, Python, and/or Groovy.
- Experience configuring TeamCity CI/CD pipelines.
- Experience provisioning IaaS infrastructure with Ansible and Terraform.
- Experience with Docker container orchestration (Kubernetes/OpenShift/HashiCorp).
- Ability to read and analyze errors.
- In-depth knowledge of the TCP/IP stack and the ISO/OSI model.
- Experience with monitoring and logging tools (Zabbix, Elasticsearch or OpenSearch, Grafana, Kibana, etc.).
- Experience working with Apache, Nginx, HAProxy, Envoy, etc.
- Strong ability to solve problems using code and scripting.
- English at B2 level or higher.
Nice-to-have skills:
- Experience with SQL-like query languages.
- Experience with Ansible AWX.
- Knowledge of the Java programming language.
- Experience using trading, exchange, or risk management software.
- Experience with Atlassian software (JIRA, Confluence, FishEye, etc.).
What we offer:
- Paid vacation: 20 + 5 days
- Free MultiSport card
- Medical insurance – premium package
- Modern office space
- Panoramic view of Vitosha mountain
- Gym and billiards in the office
- Parking spot or public transport card
- Mentorship program
- Training, courses, workshops
- Paid professional certifications
- Subscriptions to professional resources
- Participation in conferences
- English courses
- Trading contest within the company
- dxTechTalk tech meetup
- Speakers' club
- Opportunity to develop your personal brand as a speaker
- Internal referral program
- Remote or hybrid work
- Flexible schedule
- Work & Travel program
- Relocation opportunities