Site Reliability Engineer

WE'RE HIRING!

At HTG, you’ll push boundaries with the latest tech and collaborate with a team that loves what they do. Be part of a design services company that is among the world's leaders in technology and innovation.

Your next chapter starts here. 

In this role, you will: 

  • Perform advanced troubleshooting and service recovery for residential and small-business networks, energy systems, and IoT technologies.

  • Support and troubleshoot Azure workloads, cloud integrations, and general IT issues across Windows, Linux, and hybrid environments.

  • Perform detailed incident investigations, document findings, and implement corrective actions. Strengthen the team’s ability to resolve complex technical issues.

  • Communicate clearly with customers and internal teams, providing timely updates, guidance, and expectations throughout the incident lifecycle.

  • Assist L1 analysts through training, coaching, and escalation support to increase first-contact resolution rates.

  • Stay current with developments in EV charging, solar energy, and related emerging technologies to enhance overall technical support.

  • Apply security-first practices in daily operations, including secrets management, patching cycles, baseline image maintenance, identity hygiene, and RBAC reviews.

  • Work with security and compliance teams on vulnerability management, audit documentation, and evaluation of control effectiveness across applicable frameworks.

  • Develop and maintain operational documentation such as SOPs, runbooks, knowledge base articles, escalation procedures, and service catalogs.

  • Ensure documentation accuracy, version control, and comprehensive coverage; treat documentation as a critical operational output.

  • Utilize Jira for incident, problem, and change workflows, including SLAs, dashboards, and reporting.

  • Collaborate with Engineering and DevOps teams to define operational requirements, enhance service design, and prioritize reliability improvements.

  • Provide ongoing mentorship and training opportunities to L1 analysts to support skill growth and improve initial resolution outcomes.

  • Communicate effectively with both internal and external stakeholders regarding incidents, maintenance activities, service enhancements, and post-incident analyses.

Requirements

  • At least 3 years of experience in network operations, site reliability, or cloud platform support roles managing production systems

  • Strong understanding of networking, VPNs, firewalls, load balancers, DNS, and certificate management

  • Hands-on experience with cloud services including compute, storage, networking, and identity management

  • Practical experience with both Linux and Windows systems administration

  • Proficiency in one or more scripting languages such as Python, PowerShell, or Bash, and ability to create dependable automation workflows

  • Familiarity with monitoring, alerting, and telemetry systems, including the design of meaningful service-level indicators

  • Working knowledge of service management platforms and workflow automation tools

  • Proven ability to write accurate operational documentation, including procedures and troubleshooting guides

  • Strong communication skills for both technical and customer-facing interactions

 

Preferred Qualifications:

  • Experience with Infrastructure-as-Code tools (e.g., Terraform, Bicep) and CI/CD systems

  • Knowledge of IoT or distributed device management at scale

  • Understanding of system reliability concepts such as graceful degradation and autoscaling

  • Exposure to industrial or energy systems involving telemetry, control, or gateway operations

  • Relevant certifications such as Azure Administrator, Azure Network Engineer, ITIL, or CCNA (or equivalents)

High Tech Genesis Inc. is an Equal Opportunity Employer. Diversity and inclusion are at the core of our values.

Please advise High Tech Genesis of any accommodation measures you may require.

Please be advised:

  1. Applicants must have the legal right to work in Canada.

  2. Please submit your resume in MS Word format when applying for this position.
