We are looking for a Site Reliability Engineer Team Lead (SRE TL) to join the team that develops and supports several large trading platforms.
We expect the Site Reliability Engineer Team Lead (SRE TL) to:
- Lead and manage a team of 2-5 Site Reliability Engineers, providing guidance, mentorship, and support to ensure the team’s success.
- Take ownership of, standardize, and monitor our SRE practices, ensuring that the engineers implement and operate them effectively.
- Leverage your strong background in cloud distributed computing and reliability systems architecture to enhance the reliability and resilience of our systems.
- Work closely with developers on prototyping and designing new features as part of the infrastructure,
- Deploy, install, configure, and maintain sophisticated Trading/Finance and related software,
- Configure bare-metal and cloud instances using Infrastructure as Code,
- Make key decisions for scalability, reliability, and accessibility,
- Install and manage in-house developed and external well-known monitoring systems,
- Design, deploy, and configure cloud-based servers and networks: provision servers and storage, configure firewalls, VPN, monitoring, etc.,
- Administer UNIX/Cloud infrastructure: installation, configuration, and maintenance,
- Work with the Nexus and Git repositories.
Must-have skills:
- Excellent communication and collaboration skills to work effectively with cross-functional teams and delivery squads.
- Minimum of 2 years of experience leading a Site Reliability Engineering (SRE) or DevOps team,
- Experience supporting JVM applications (garbage collection, memory leaks),
- Strong experience with OS-level administration on Linux and/or UNIX,
- Hands-on scripting experience with Bash, Python, and/or Groovy,
- Experience with configuring TeamCity CI/CD pipelines,
- Experience with IaaS solutions using Ansible and Terraform,
- Experience with Docker container orchestration (K8s/OpenShift/HashiCorp),
- Ability to read and analyze errors,
- In-depth knowledge of TCP/IP and ISO/OSI stack,
- Experience with monitoring and logging tools (Zabbix, Elasticsearch or OpenSearch, Grafana, Kibana, etc.),
- Experience working with Apache, Nginx, HAProxy, Envoy, etc.,
- Strong ability to solve problems using code and scripting,
- English level not lower than B2.
Nice-to-have skills:
- Experience with SQL-like query languages,
- Experience with Ansible (AWX),
- Knowledge of the Java programming language,
- Experience using trading, exchange, or risk management software,
- Experience with Atlassian software (Jira, Confluence, FishEye, etc.).
Care for employees is one of Devexperts' core values. For this position, we offer a benefits package designed to ensure the comfort of our new teammate.
Work Regime Flexibility benefits:
Health and recreation benefits:
Fully paid additional wellness days (3 unwell days per year),
Medical insurance for employees and their children,
Reimbursement of fitness or Urban Sports Club membership,
Meal allowance (Coverflex Card),
Flexpay system (Coverflex).
Facility benefits:
Modern office with new equipment,
PlayStation, table football, and musical instruments in the office,
Parking spaces/transport reimbursement,
Free drinks and snacks.
Community benefits:
Teambuilding activities,
Corporate parties,
Football Club,
Music club,
Speakers' club,
Free admission to corporate external events,
Possibility of joining conferences and professional fairs,
Personal branding development support.
Professional training benefits:
English language courses,
Local language courses for foreign employees,
Unlimited access to self-learning platforms,
Certification opportunities,
Mentorship Program.
Social benefits: