We are seeking a skilled and motivated Linux Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a strong background in Linux system administration, automation, and cloud infrastructure, with a passion for building reliable and scalable systems. You will collaborate with development and operations teams to ensure our services are highly available, performant, and fault-tolerant.
Key responsibilities
- Onboarding of New Customers: Ensure smooth deployment and operational readiness for new customers, document the processes, and provide initial support during the transition.
- System Administration: Manage, monitor, and optimize Linux servers in production and development environments. Identify and resolve bottlenecks in application and system performance.
- Automation: Develop and maintain infrastructure automation using tools such as Ansible, Terraform, or similar. Create and maintain the Ansible-based hardening and washing scripts.
- Performance Optimization: Diagnose and resolve performance bottlenecks at the OS, application, and network levels. Analyze system demands and plan for scaling accordingly.
- Incident Management: Lead efforts to quickly resolve production incidents, conduct post-mortems, and implement solutions to prevent recurrence.
- Scalability: Improve the scalability and reliability of the infrastructure behind high-traffic services.
- Collaboration: Partner with development teams to build CI/CD pipelines and integrate reliability practices into the development lifecycle. Coordinate changes with the operations teams.
- Security: Ensure system security through best practices in access control, patch management, and system hardening.
Required skills and experience
- Operating Systems: Extensive experience with Linux distributions such as RHEL, CentOS, or Ubuntu
- Scripting: Proficiency in scripting languages such as Bash, Python, or Ruby for automation
- Cloud Expertise: Familiarity with cloud platforms such as AWS, Azure, or GCP, and with container technologies such as Docker or Kubernetes
- Infrastructure as Code (IaC): Hands-on experience with tools such as Terraform, Ansible, or Chef
- Networking: Solid understanding of networking protocols, DNS, load balancers, and firewalls
- Version Control: Experience with Git or similar version control systems
- Web Servers & Middleware: Practical skills in configuring and managing Apache, Tomcat, JBoss, and NGINX in production environments
- Problem-Solving: Strong troubleshooting and debugging skills
- Communication: Strong communication and teamwork skills for cross-functional work; at least intermediate English
- Mindset: A drive to optimize and improve systems iteratively
Nice-to-have/preferred skills and experience
- Exposure to high-availability architectures and disaster recovery strategies
- Certifications: RHCE, AWS Certified SysOps Administrator, or equivalent
- Knowledge of monitoring and logging tools such as Prometheus, Grafana, ELK Stack, or Datadog
- Experience with WebSphere
- Knowledge of German
* Please note that remote work is only possible from within Hungary due to European tax regulations.