Senior Site Reliability Engineer

Yerevan, Armenia
full-time


OneMarketData is continuously searching for bright talent with the skills to make an impact. From developers to data scientists, at OneTick you will have the opportunity to develop and enhance your problem-solving skills using a combination of analytics, imagination, and talent.

 

Overview

Our DevOps team develops the infrastructure behind the hosted solutions and our software and data delivery lifecycle. 


Prior to advancing with your application, we kindly request that you review the CONSENT NOTICE FOR HR AND RECRUITING provided by OneMarketData. Your attention to this matter is greatly appreciated.

 

Our stack:

  • AWS (some of the services we use are: EKS, EC2, S3, SGW, ASG, ELB, Lambda, etc.);
  • Terraform and Ansible as an IaC approach;
  • GitLab and GitLab CI/CD;
  • Python is the main programming language for automation;
  • Kubernetes (mostly EKS, but GKE and other Kubernetes engines are also used) for orchestration, and Helm for its management;
  • Prometheus/VictoriaMetrics, Grafana, Loki, AWS CloudWatch, and CloudTrail for monitoring, logging, and some statistics collection;
  • OneTick (our platform for market data).

We also use other tools for various purposes, for example Packer, HashiCorp Vault, OpenVPN, Slack, and Confluence, along with other popular and well-known tools :)


More information about the projects

In the Cloud Project, we have a multi-account AWS infrastructure managed through AWS Organizations. Separate AWS accounts are necessary to host customer-facing environments, and we provide our customers with different setups of our application. We use most of the common AWS resources, such as EC2, EKS, S3, VPC, and ELB, and the overall stack of AWS resources is fairly broad. Most of our AWS infrastructure is covered by IaC. CI/CD runs on GitLab.

We have more than 4 petabytes of data in S3 and EFS. We expose part of the data in S3 to the file system using Storage Gateways. Currently, we are migrating from a setup on EC2 instances to Kubernetes, integrating centralized logging and monitoring solutions, migrating data loading processes to Airflow, and optimizing infrastructure costs while planning to improve performance at the same time.


We are looking for an experienced Site Reliability Engineer (SRE) to join our team. Your primary responsibility will be to guarantee the reliability, scalability, and performance of our applications and systems. Working closely with both our software engineers and product teams, you will dive deep into troubleshooting production issues, ensuring seamless operation. Additionally, you will collaborate on designing and implementing solutions to enhance our monitoring and alerting systems, aiming to optimize our overall efficiency and reliability. Your expertise in automation will play a crucial role in reducing manual toil and streamlining processes, ultimately contributing to the success of our operations.


Responsibilities:


  • Monitor and maintain the health and reliability of our production systems
  • Investigate and resolve production issues and outages
  • Develop and maintain monitoring, alerting, and incident response systems
  • Design and implement automation to reduce manual toil and improve system reliability
  • Collaborate with software engineers to design and implement highly scalable and resilient systems
  • Participate in on-call rotation and respond to incidents promptly
  • Continuously improve our systems and processes to ensure the highest level of reliability and availability
  • Document processes and procedures for maintaining and troubleshooting production systems


Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field
  • 3+ years of experience as a Site Reliability Engineer or related role
  • Strong knowledge of Linux/Unix systems and administration
  • Proficiency in at least one programming language (e.g., Python, Java, C++)
  • Experience with automation and configuration management tools (e.g., Ansible, Terraform)
  • Experience with AWS and Kubernetes 


General requirements:

  • English: Upper-Intermediate or higher
  • Good communication skills and the ability to explain complicated things in simple words
  • Eagerness to learn new technologies (including area-specific ones)
  • Strong analytical and problem-solving skills
  • Attentiveness, a hard-working and goal-oriented mindset (getting tasks done), and the ability to work both in a team and independently
  • Readiness to dig deep into the product and gain a comprehensive understanding of its functionality, since it is closely connected to how things work


Our main benefits are flexible hours and the absence of bureaucracy and time tracking. The main goal is for employees to feel comfortable and express themselves, performing at their best because they like what they work on. All ideas can be realized, and the work you do will be used by many large companies.



Equal Employment Opportunity


As an Equal Employment Opportunity (EEO) Employer, OneMarketData prohibits discriminatory employment actions against and treatment of its employees and applicants for employment based on actual or perceived race or color, size (including bone structure, body size, height, shape, and weight), religion or creed, alienage or citizenship status, sex (including pregnancy), national origin, age, sexual orientation, gender identity (one’s internal deeply-held sense of one’s gender which may be the same or different from one’s sex assigned at birth); gender expression (the representation of gender as expressed through, for example, one’s name, choice of pronouns, clothing, haircut, behavior, voice, or body characteristics; gender expression may not conform to traditional gender-based stereotypes assigned to specific gender identities), disability, marital status, relationship and family structure (including domestic partnerships, polyamorous families and individuals, chosen family, platonic co-parents, and multigenerational families), genetic information or predisposing genetic characteristics, military status, domestic violence victim status, arrest or pre-employment conviction record, credit history, unemployment status, caregiver status, salary history, or any other characteristic protected by law.


The position will require a background check, signed NDA, signed contract, and signed GDPR processor passthrough agreement (since we act as a data processor under GDPR). Salaries will be commensurate with experience, education, skillset, and local norms. Kindly note that only shortlisted candidates will be contacted for an interview.

