Lead Site Reliability Engineer

  • ARCUS SEARCH LIMITED
  • Dec 01, 2022
Full time Engineering

Job Description

Location: London - Hybrid

Type: Full-time/Permanent

A Data FinTech client of ours is looking for a Lead Site Reliability Engineer to join the existing team and work on exciting new technology, including Kubernetes, while gaining exposure to data systems and supporting the company in building out its brand-new data platform.

What you will be doing:

  • You will design, operate and support the infrastructure, middleware and internal services, while seeking to improve their performance, availability, scalability, latency and efficiency
  • You will be driving technical excellence across the business, following SRE best practices
  • You will be working alongside development teams to design and develop scalable, highly available services and to establish an effective build framework for continuous deployment and self-service automation
  • You will also work on incident resolution and engage with various teams (including 3rd parties) for support escalation.

Experience you need:

  • You need to be strong in AWS, including services such as: EC2, S3, ELB, RDS, IAM, Route 53, Auto Scaling Groups, Lambda, CloudWatch, CloudFormation and Security Groups
  • You will need expertise in containerisation with Kubernetes and Docker, and familiarity with the microservice architecture pattern. You'll also need to be able to define and troubleshoot container configuration
  • You'll need to be experienced with configuration management technologies including Terraform and Ansible, as well as associated paradigms such as IaC and Immutable Infrastructure
  • CI/CD - you need to be comfortable with build pipelines in tools such as TeamCity, Jenkins or Concourse
  • You must have hands-on experience developing in one or more programming or scripting languages (e.g. PowerShell, Bash, Python, JavaScript, Golang, Java), within an SCM environment (e.g. Bitbucket, GitHub).
  • Networking - you must have knowledge of routing and switching protocols as well as DNS, firewalling, load-balancing and global traffic management.
  • Persistence technologies - you need to be familiar with database technologies (NoSQL/SQL) and broker/queuing technologies, including knowledge of HA/clustering.
  • You need to be familiar with various logging, monitoring and alerting platforms - expertise in the usage (and, desirably, the deployment) of e.g. ELK, Splunk and CloudWatch, to enable forensic log analysis and system tuning as well as data-driven performance analysis (i.e. SLI/SLO) and capacity planning
  • Linux and Windows systems administration across multiple distributions, including storage management (e.g. LVM, RAID) and security practices (e.g. SSH, SSL/TLS, HMAC, IPS/IDS)
  • 3-4 years' experience in a similar role is required
  • Experience working within a FinTech company is desirable

This is an exciting time within this company as they are embarking on a huge growth period across the entire business, particularly within the Data and Analytics function to support the development of their brand-new Data Platform.