Senior Site Reliability Engineer / Senior DevOps Engineer (ConnectWise)

Contribute to the reliability and performance of cloud services by managing Elasticsearch infrastructure and collaborating on system monitoring for large-scale deployments.

RESPONSIBILITIES

  • Build systems and infrastructure for monitoring complex, large-scale distributed systems
  • Identify stability and performance problems, and work with developers to triage critical issues in production systems
  • Represent the SRE organization in design reviews and operational readiness exercises for new and existing services 
  • Devise ways to actively monitor system throughput, capacity, and reliability 
  • Debug complex systems and evolve a running environment without causing downtime 
  • Engage in service capacity planning and demand forecasting, as well as software performance analysis and system tuning 
  • Drive standardization efforts across multiple disciplines and services in conjunction with embedded SREs throughout the organization 
  • Monitor and troubleshoot Elasticsearch performance issues and outages 

REQUIREMENTS

  • Foundational knowledge across a broad range of disciplines, including virtualization, storage, networking, servers, and security
  • Bachelor’s degree in computer science or equivalent work experience as a System Administrator with programming skills 
  • Understanding of systems and application design, including the operational trade-offs of various designs 
  • Experience with monitoring and logging solutions such as Prometheus, Grafana, and the ELK Stack
  • Proficiency in scripting languages such as Python 
  • Experience with infrastructure-as-code tools such as Terraform or CloudFormation
  • Strong understanding of Linux system administration and networking concepts 
  • Demonstrable knowledge of Unix, TCP/IP, HTTP, web application security, and experience supporting multi-tier web application architectures 
  • Experience in analyzing logs and troubleshooting large-scale, distributed systems 

WOULD BE A PLUS 

  • Experience instrumenting and monitoring production systems with tools such as the ELK Stack, Zabbix, Nagios, StatsD/Graphite, APM, etc.
  • Experience with AWS infrastructure (including EC2, S3, VPC, Security Groups, and RDS) and related services
  • Practical knowledge of Docker, Vagrant, and configuration management tools such as Ansible, Chef, or Puppet
  • Experience with one or more general-purpose programming or scripting languages, including but not limited to Python, Bash, Perl, or Go 

PERSONAL PROFILE

  • Excellent troubleshooting and problem-solving skills 
  • Ability to work independently and collaboratively in a fast-paced environment 
  • Strong communication and interpersonal skills 
  • Excellent organizational and time management skills

Build a stunning career with Sigma Software! Find your dream job, send your CV, and become one of us!