Deutsche Telekom IT Solutions is hiring a

Linux Site Reliability Engineer (REF3649Z)

Debrecen, Hungary
Full-Time

We are seeking a skilled and motivated Linux Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a strong background in Linux system administration, automation, and cloud infrastructure, with a passion for building reliable and scalable systems. You will collaborate with development and operations teams to ensure our services are highly available, performant, and fault-tolerant.

  • Onboarding of New Customers: Ensure smooth deployment and operational readiness, document processes, and provide initial support during the transition.
  • System Administration: Manage, monitor, and optimize Linux servers in production and development environments. Identify and resolve bottlenecks in application and system performance.
  • Automation: Develop and maintain infrastructure automation using tools like Ansible, Terraform, or similar. Create and maintain hardening and washing scripts (Ansible).
  • Performance Optimization: Diagnose and resolve performance bottlenecks at the OS, application, and network levels. Analyze system demands and plan for scaling.
  • Incident Management: Lead efforts to quickly resolve production incidents, conduct post-mortems, and implement solutions to prevent future occurrences.
  • Scalability: Work on infrastructure scalability and reliability for high-traffic services.
  • Collaboration: Partner with development teams to create CI/CD pipelines and integrate reliability practices into the development lifecycle. Coordinate changes with operations teams.
  • Security: Ensure system security through best practices in access control, patch management, and system hardening.

Required skills and experience

  • Operating Systems: Extensive experience with Linux distributions like RHEL, CentOS, or Ubuntu
  • Scripting: Proficiency in scripting languages like Bash, Python, or Ruby for automation 
  • Cloud Expertise: Familiarity with cloud platforms like AWS, Azure or GCP and containerization technologies like Docker or Kubernetes
  • Infrastructure as Code (IaC): Hands-on experience with tools such as Terraform, Ansible, or Chef
  • Networking: Solid understanding of networking protocols, DNS, load balancers, and firewalls
  • Version Control: Experience with Git or similar version control systems
  • Web Servers & Middleware: Strong skills in configuring and managing Apache, Tomcat, JBoss, and NGINX in production environments
  • Problem-Solving: Strong troubleshooting and debugging skills
  • Communication: Strong communication and teamwork abilities for cross-functional work; at least intermediate English
  • Mindset: A mindset for optimizing and enhancing systems iteratively

Nice to have/preferred skills and experience

  • Exposure to high-availability architectures and disaster recovery strategies 
  • Certifications: RHCE, AWS Certified SysOps Administrator, or equivalent
  • Knowledge of monitoring and logging tools such as Prometheus, Grafana, ELK Stack, or Datadog
  • Experience with WebSphere
  • German language knowledge

* Please note that remote work is only possible within Hungary due to European taxation regulations.
