Plum Fintech is hiring a

Lead Site Reliability Engineer

Athens, Greece

At Plum, we're on a mission to maximise wealth for all. We’re making saving money effortless and turning investing into something everyone can do. Our journey began back in 2017, when we became one of the first to use artificial intelligence and automation to simplify personal finance. Fast forward to today, and we've already helped people save £2 billion across 10 European markets.

Named the UK's fastest-growing fintech in the Deloitte Technology Fast 50, our success is down to the passion and dedication of our diverse team. Based in our London, Athens and Nicosia offices, 170 talented people work together to empower people to do more with their money. And now, the team is growing!

The Role

You will be joining our Infrastructure squad as a Lead Site Reliability Engineer to ensure that Plum’s systems are resilient, secure, scalable, observable and fully capable of supporting our growth. You will support our Engineering function in using our infrastructure in the most efficient way. You’ll proactively identify areas for improvement and propose initiatives to streamline the SRE function and reduce its overhead.

What you will be doing:

  • Lead the SRE team in their daily work, mentor them, and grow their skills and careers
  • Identify initiatives to improve efficiency, raise the bar of the SRE function, prioritise the team’s work and define a strategic vision aligned with the company’s goals
  • Advocate for cost management (FinOps) and propose solutions to optimise our infrastructure
  • Be hands-on in daily work and contribute to initiatives owned by the team
  • Operate and scale our infrastructure (GCP, Kubernetes, PostgreSQL, RabbitMQ, Redis). We have terabytes of data, and access to it needs to be blazing fast
  • Automate aspects of systems using infrastructure management tools of the trade (we use Terraform), with a code-once, deploy-everywhere mindset
  • Ensure our metrics give an accurate picture of how the system is performing (we use Prometheus). Leverage observability in your day-to-day processes
  • Build and maintain SLIs and SLOs for our infrastructure; provide a platform for squads to build their own SLIs and SLOs on top of collected metrics
  • Lead incident response and troubleshoot issues, correcting and improving systems to prevent incidents and grow at scale. Take point in handling service degradation
  • Collaborate with our Engineering function to help them deliver their work onto Plum’s infrastructure
  • Collaborate with the Principal Engineer to improve the Engineering function’s DevOps posture

For this role, we'd like to see:

  • Working experience of 5+ years as a Site Reliability Engineer, DevOps Engineer or in a similar position
  • Working experience of 2+ years leading an SRE squad to success
  • Proficiency in managing cloud infrastructure as code (IaC) with tools like Terraform
  • Ability to maintain the IaC codebase in an optimal and efficient way (clear codebase structure, Terraform modules, etc.)
  • Strong expertise in system architecture, networking, database management and administration of Kubernetes clusters
  • Strong expertise in observability (Logging, Monitoring, Tracing)
  • Analytical skills, troubleshooting attitude
  • Proactive approach to problems: able to identify them and propose solutions
  • Passion for continuous improvement and challenging the status quo
  • Excellent communication skills in English (verbal and written)

Good to have

  • Familiarity with RDBMS management and zero-downtime migration procedures
  • Having built an SRE team from scratch focusing on efficiency
  • Proven stakeholder management skills and the ability to negotiate priorities with internal teams
  • Experience in Python, ability to navigate large codebases

Plum's Perks

  • We're all in this together! Own part of the company through stock options 💷
  • Annual training budget
  • Private Health & Life Insurance
  • Free Plum Premium subscription (normally £9.99 a month).
  • Free parking slots
  • 25 days holiday a year, excluding public holidays
  • Employee referral scheme of up to €4,000
  • Flexible approach to remote working, though we encourage at least 2-3 days a week in our beautiful office in central Athens for optimal collaboration.
  • 45 days work from anywhere
  • Team breakfast on Tuesdays and team lunch on Thursdays in the office, as well as a plentiful supply of fruit, snacks and coffee.
  • 1 day paid leave for volunteering, supporting you giving back to society.
  • 2 weeks paid sabbatical after four years of service.
  • Team trip to secret destinations once a year ✈️
  • Great office location in the heart of Athens (Syntagma square), with an amazing view!
  • A vibe that’s 🦄🌈💯

If you think this sounds like a bit of you then don’t hesitate to get in touch!

Thanks,

Plum Team 💜

* Plum is an Equal Opportunity Employer. Plum does not discriminate on the basis of age, race, religion, sex, gender identity, sexual orientation, non-disqualifying physical or mental disability, national origin or any other basis covered by appropriate law. All employment is decided on the basis of qualifications, merit and business need.
