Bitquery is hiring a

SRE / Site Reliability Engineer (Middle / Senior)

Full-Time
Remote

Bitquery is an API-first product company dedicated to solving blockchain data problems using ground-truth, on-chain data. Bitquery extracts this data and presents it via APIs. These APIs deliver solutions to multiple verticals, such as Decentralized Finance (DeFi), DEX Arbitrage Analytics, and Crypto Surveillance & Forensics, across all major blockchains, including Bitcoin, Ethereum, EOS, and Tezos.

We are an international company building software for the analysis of decentralized data (40+ chains). Bitquery is a distributed team. We are currently looking for a full-time SRE to further develop, monitor, and support our infrastructure and to automate various processes. The role may also include on-call duty in shifts.


Roles & Responsibilities:

  • Ensuring the smooth operation of software, environments and company services
  • Analyzing and improving the performance and availability of products
  • Identifying bottlenecks in the architecture and the infrastructure
  • Improving system alerting and incident management
  • Improving SLI-based monitoring systems (Prometheus, Icinga, Grafana, etc.)
  • Formalizing SLIs in line with the main business requirements
  • Defining SLOs for services and for the infrastructure in general
  • Minimizing system recovery time and data loss (RTO and RPO)
  • Analyzing incidents in the production environment
  • Capacity management

Requirements

  • 5+ years of work experience implementing, troubleshooting, and supporting infrastructure software and distributed systems
  • Experience supporting software written in Golang, Python, and Ruby
  • More than 2 years of experience with virtualization and containerization technologies (containerd, Docker, k8s)
  • Experience setting up CI of varying complexity (Jenkins) with CD to different environments
  • Experience in creating and maintaining fault-tolerant systems with log coverage, monitoring, and alerting
  • Understanding of the "infrastructure as code" principle and the ability to test it (Ansible, Terraform)
  • Knowledge of network security principles (IPsec, WAF, IPS)
  • Experience with maintenance of blockchain nodes
  • Availability in a US time zone is required

Our Tech Stack:

  • Infrastructure: Bare-metal / AWS
  • Databases: ClickHouse / MySQL
  • SCM: Git / GitHub
  • Message broker: Kafka
  • Repository: Nexus
  • CI/CD: Jenkins
  • Monitoring: Icinga 2, Grafana, Prometheus, VictoriaMetrics, ELK
  • Orchestration: k8s, Ansible, Terraform
  • Containers: LXC, Docker
  • Scripting: Python, Golang, Ruby, Groovy
  • OS: Debian/Ubuntu
  • Others: Docker Compose, IPsec

Benefits

  • Opportunity to work and collaborate with a truly global team spread across 5 countries
  • Work from anywhere in the world
  • Choose your own work hours
  • Yearly trip with the Bitquery team to a remote destination
  • A promise to finish the interview process within 1-2 weeks

As a startup, we make decisions and move fairly fast, while giving candidates a great interview experience. We have a flat hierarchy in which we empower individuals and give them the opportunity to deliver results in their own working style. Come join a great culture and build Bitquery with us.
