Alchemy is hiring a Site Reliability Engineer (Bucharest, Romania) - Full-Time

Our mission is to bring web3 to a billion people, by providing builders with the tools they need to build exceptional onchain products. Alchemy is the only complete developer platform that offers the powerful APIs, SDKs, and tools necessary to build and scale onchain apps and rollups.

Our infrastructure powers 70% of the top web3 teams, 90%+ of web2 companies building in web3, and 100+ million end users. Our customers include top web3 brands like Polymarket, OpenSea, Circle, and WorldCoin, as well as major global brands like Shopify and Adobe.

The Alchemy team draws from decades of deep expertise in massively scalable infrastructure, AI, and blockchain from leadership roles at leading companies and universities like Google, Microsoft, Facebook, Stanford, and MIT.

We're backed by the world's leading VCs and institutions, including Lightspeed, Silver Lake, a16z, Coatue, Pantera, Addition, Stanford University, Coinbase, and Charles Schwab.

The Role

Site Reliability Engineers excel at converting manual operational tasks into automated processes while building and maintaining tools and infrastructure. As an SRE, you should always tackle problems methodically, taking into account system scalability, high availability, latency, and resilience. Drawing on strong experience in operations, networking, infrastructure, software development, observability, and troubleshooting, SRE is one of the most versatile roles anyone can grow into.

Responsibilities

  • Design, build, and refactor major software components that improve the availability, resilience, performance, and efficiency of our system.
  • Take part in our on-call rotation and respond to infrastructure incidents in accordance with our policy.
  • Proactively address bugs and bottlenecks in our infrastructure.
  • Define and choose the best SLIs and SLOs in accordance with our system needs.
  • Choose the best tools for different problems and adapt to our ever-changing specifications and growth.
  • Improve our Incident Management process by reducing and fixing noisy alerts and lowering MTTD and MTTR, and support other team members in doing the same.
  • Identify and address design bottlenecks in our infrastructure.
  • Mentor new hires and onboard them to our tools and infrastructure.
  • Address code complexity and efficiency issues while continually fixing software bugs.
  • Support and guide other team members with code-related problems, and participate in and offer effective code reviews.

What We're Looking For

  • Experience writing efficient code in one or more programming languages (e.g. Python, Golang, Java, Rust).
  • Experience developing software applications and tools from scratch, with the clear structure, reusable code patterns, and guidance that let other team members use and extend them.
  • Past experience designing and managing the lifecycle of complex systems while taking into account multiple factors such as costs, systems performance, scalability, resilience and disaster recovery.
  • Expertise in all aspects of operating Linux-based systems, with a focus on troubleshooting, configuration, and monitoring.
  • Past experience managing large-scale infrastructure running on bare metal, public and private cloud (e.g. AWS, GCP, Azure), and container-based platforms (Kubernetes, OpenShift, Docker, etc.).
  • In-depth knowledge of the internals of protocols across the stack, such as HTTP, DNS, DHCP, and routing protocols.
  • Experience using programming languages and automation tools to reduce toil and automate repetitive tasks.
  • Past experience with Infrastructure as Code (IaC) tools such as Terraform or Pulumi, and Configuration Management tools (e.g. Ansible, Puppet, Chef).
  • Experience with one or more CI/CD solutions (e.g. Jenkins, ArgoCD, GitLab pipelines, Spinnaker, Harness) is a must.
  • Experience implementing monitoring and logging solutions for infrastructure and applications.
  • Must have experience with monitoring and logging tools such as Prometheus, Thanos, Splunk, Grafana, Graphite, Loki, etc.
  • Past experience leading a team is a big plus.
  • Great communication skills and the ability to express ideas to other team members effectively.

Perks

  • Attractive salary package
  • Opportunity to work with the latest cloud and blockchain technologies
  • Fully remote work or hybrid depending on candidate preferences
  • Token allocation similar to equity packages in traditional companies
  • Growth budget, to be spent at the candidate's discretion
  • Equipment stipend
  • Flexible time away
  • Private Medical Insurance
  • Start-up environment: internal off-site hackathons, access to company-rented hacker house during summer
  • Crypto market investment opportunities and guidance
