Responsibilities
This is a key role that requires engineering knowledge, production experience and hands-on implementation ability. You will contribute in areas such as:
● Ensure the highest levels of performance, availability and scalability for our systems.
● Work closely with the development team to integrate new deployment processes and strategies.
● Seek out problems and opportunities in DevOps enablement and infrastructure, and solve them.
● Help develop and maintain a state-of-the-art platform-as-a-service solution, using the latest technologies and approaches (e.g. Kubernetes, Docker, microservices).
● Help develop the best possible continuous delivery pipelines, supporting features such as automated promotion to production, automated canary releases and blue-green deployments.
● Implement monitoring and logging solutions that enable the production systems to be monitored 24/7.
● Respond to requests from engineering by building self-service solutions.
● Make sure that any technical solution you put in place is robust and scalable, and that failover/BCP systems are in place.
● Implement robust security measures for infrastructure, including monitoring and responding to attacks on our systems.
● Guide other SRE team members on large, complex projects.
● Work collaboratively with the engineering team, proposing technical solutions and taking on the challenges they raise.
Requirements
● Strong computer engineering foundation, from work experience and a related academic degree.
● 5+ years of experience in IT operations, infrastructure and systems engineering.
● Must have experience maintaining data centers and managing large networks.
● Hands-on experience with containerisation and container orchestration (e.g. Docker and Kubernetes).
● Must have experience in Linux system administration and performance tuning on RHEL/CentOS/Debian/Ubuntu distributions.
● Must have experience with load balancing, clustering and failover technologies.
● Must have experience with CI/CD and configuration management tools such as Jenkins, GitLab and Ansible.
● Must have scripting skills in Bash.
● Must have experience configuring service discovery and monitoring tools, both cloud and non-cloud based (Consul, Nagios/Icinga, Cacti, Stackdriver, New Relic).
● Experience designing failover clusters using Nginx, Varnish, MongoDB, Postgres, TimescaleDB, Redis, Kafka and Elasticsearch is a plus.
● Hands-on experience with AWS or GCP is a plus.
● Strong team player with the ability to learn, communicate and guide other members on new technologies.