WCC is hiring a

Site Reliability Engineer

Kuala Lumpur, Malaysia

Join us in providing work that matters


WCC has changed lives since 1996. We are a group of highly ambitious professionals who believe in the greater story. WCC is more than just a software organization: we are a community that strives to improve human life. We provide software that matters.

Our product is an advanced Search and Match engine used in solutions for the private and public sector.

We specialize in:

  • ID & Security Solutions - WCC enables governments to manage large volumes of identity and security data, protecting borders and citizens while providing legal identity for all.
  • Employment Solutions - WCC enables public and private employment services to match people quickly and expertly with suitable and sustainable jobs.

What’s in It for You?


Our team - Our people believe unity is one of our strengths. If teamwork is important to you, we trust you will enjoy working in a team where people feel welcome, valued, and respected.

Work environment - We focus on talent and possibilities, not limitations. We love challenges and exploring new creative horizons. WCC has a diverse environment that gives every person the freedom to express their ideas.

We want to give you the conditions to do your best work, so here are the perks and benefits we provide:

  • Competitive salary
  • Indefinite contract
  • Health insurance
  • Travel allowance
  • 21 vacation days
  • 13th salary
  • Personal development opportunities
  • Hybrid policy: working from home and working from the office
  • Home office budget
  • An opportunity to create an international and diverse network



Role

As a Site Reliability Engineer, you have a unique role in our organization, working at the intersection of software development, system administration, and IT operations. You support our product owners and DevOps team in determining which new features can be launched and when. To do so, you use service-level agreements (SLAs) to define the required reliability of the system, expressed through service-level indicators (SLIs) and service-level objectives (SLOs).


Responsibilities

  • Ensure the availability and efficient working of the services in compliance with the non-functional expectations
  • Plan and implement continuous improvements and changes in the ecosystem through automation
  • Drive service interruptions to resolution within the defined SLAs, with a mindset of continuous improvement
  • React to events (monitoring alerts, support escalations, internal incidents), i.e. incidents that affect the application or the underlying infrastructure. Troubleshoot and resolve the service interruption, either hands-on or by guiding a third party through incident resolution with clear instructions
  • Provide information for root cause analysis and/or conduct postmortems and provide reports
  • Provide recommendations/workarounds for identified problems
  • Liaise and act with others (vendors, internal teams) on incident and problem management
  • Propose and implement proactive improvements: extend monitoring, tune alerting and alert thresholds, and increase observability of the services and log management
  • Documentation: Create documentation tuned for the intended audience, including runbooks, knowledge base articles, and how-to articles
  • Communication: Communicate with different stakeholders and vendors at a technical level. Translate the impact of technical issues and concepts for non-technical users to support impact assessment
  • Increase observability and manageability by:
      • Building and configuring logging, monitoring, and alerting
      • Providing information about what needs to be monitored, how, and the recommended thresholds
  • Participate in tuning and extending the monitoring implementation
  • Provide mechanisms for, and prepare for, possible system failures and outages, and increase the robustness of the system
  • Participate in performance and capacity planning
  • Standby/on-call roster participation


