Debrecen, Hungary

Full-Time

We are seeking a skilled and motivated Linux Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a strong background in Linux system administration, automation, and cloud infrastructure, with a passion for building reliable and scalable systems. You will collaborate with development and operations teams to ensure our services are highly available, performant, and fault-tolerant.

Onboarding of New Customers: Ensure smooth deployment and operational readiness, document processes, and provide initial support during the transition.
System Administration: Manage, monitor, and optimize Linux servers in production and development environments. Identify and resolve bottlenecks in application and system performance.
Automation: Develop and maintain infrastructure automation using tools like Ansible, Terraform, or similar. Create and maintain hardening and washing scripts (Ansible).
Performance Optimization: Diagnose and resolve performance bottlenecks at the OS, application, and network levels. Analyze system demands and plan for scaling.
Incident Management: Lead efforts to quickly resolve production incidents, conduct post-mortems, and implement solutions to prevent future occurrences.
Scalability: Work on infrastructure scalability and reliability for high-traffic services.
Collaboration: Partner with development teams to create CI/CD pipelines and integrate reliability practices into the development lifecycle.
Coordinate changes with operations teams.
Security: Ensure system security through best practices in access control, patch management, and system hardening.

Operating Systems: Extensive experience with Linux distributions like RHEL, CentOS, or Ubuntu
Scripting: Proficiency in scripting languages like Bash, Python, or Ruby for automation
Cloud Expertise: Familiarity with cloud platforms like AWS, Azure or GCP and containerization technologies like Docker or Kubernetes
Infrastructure as Code (IaC): Hands-on experience with tools such as Terraform, Ansible, or Chef
Networking: Solid understanding of networking protocols, DNS, load balancers, and firewalls
Version Control: Experience with Git or similar version control systems
Web Servers & Middleware: Good skills in configuring and managing Apache, Tomcat, JBoss and NGINX for production environments
Problem-Solving: Strong troubleshooting and debugging skills
Communication: Strong communication and teamwork abilities for cross-functional work. At least intermediate English proficiency
Mindset: A mindset for optimizing and enhancing systems iteratively

Nice to have/preferred skills and experience

Exposure to high-availability architectures and disaster recovery strategies
Certifications: RHCE, AWS Certified SysOps Administrator, or equivalent
Knowledge of monitoring and logging tools such as Prometheus, Grafana, ELK Stack, or Datadog
Experience with WebSphere
German language knowledge

* Please note that remote work is available only within Hungary due to European taxation regulations.

Deutsche Telekom IT Solutions is hiring a Linux Site Reliability Engineer (REF3649Z)