Site Reliability Engineer - Hadoop Modernization

TLDR

Contribute to enhancing the availability, scalability, and reliability of data lake environments while developing technical and interpersonal skills in a dynamic team.

You will join a team of highly skilled Hadoop engineers who are responsible for delivering Acceldata’s support services in vendor-agnostic environments. As a Site Reliability Engineer, you will actively learn from experienced team members, contributing to improving the availability, scalability, performance, and reliability of our products and our customers' data lake environments. You will be expected to listen actively to customer concerns and demonstrate empathy in providing solutions. This is an exciting opportunity to grow your technical, business, and interpersonal skills in a dynamic and collaborative environment.

What makes you the right fit for this position?
  • 5+ years of experience working with distributed systems, cloud environments, or data management services is preferred.
  • Good understanding of Hadoop technologies, including HDFS, YARN, and Hive/Impala. Working knowledge of Kafka, NiFi, Ambari, and Cloudera Manager.
  • Experience operating Linux, including configuring, tuning, and troubleshooting both Red Hat and Debian-based distributions.
  • Exposure to data security and data engineering principles in Hadoop environments is a plus.

We’re looking for someone who can:
  • Work alongside experienced SREs to improve the availability, scalability, and reliability of enterprise production services, both for Acceldata products and customers’ data lake environments.
  • Support implementation and troubleshooting for Hadoop data lake clusters under the guidance of senior engineers.
  • Learn to implement, stabilize, and tune Hadoop ecosystems in vendor-agnostic environments.
  • Provide technical support for Hadoop Data Lake clusters across multiple data centers.
  • Assist in troubleshooting issues across the entire stack – hardware, software, applications, and networks.

    This role requires flexibility to work in rotational shifts, based on team coverage needs and customer demand.
    Candidates should be comfortable supporting operations in a 24/7 environment and be willing to adjust their working hours accordingly.

    Acceldata builds a cutting-edge platform for Enterprise Data Observability, designed to empower data teams to monitor and manage their data systems effectively. Our solutions cater to global businesses that rely on mission-critical data capabilities, ensuring they can confidently operate and optimize their data products in any environment.
