Cloud Site Reliability Engineer (SRE)

  • Metro Bank
  • Dec 04, 2021
Full time I.T. & Communications

Job Description

  • Team: IT, IT & Change
  • Location: Holborn Office
  • County: Central London
  • Ref: 14675
  • Closing Date: 17-Dec-2021

Do you have a passion for technology as well as helping team members and stakeholders? Are you an experienced Cloud/DevOps Engineer? Are you well versed in Site Reliability Engineering? If you've answered yes to these questions then we may have the role for you.

As the Cloud SRE you will be working within the amazing Metro Bank Engineering function, where you will be responsible for driving system automation, reliability and observability across critical Cloud Platforms & Applications, and ensuring they meet the requirements of both internal & external customers. The successful role holder will also be a key enabler for migrating services from on-prem to public Cloud.

Most of our jobs offer the opportunity for hybrid / remote working. Ask your recruiter for more details.

So what would you be doing?...

• Drive system reliability, performance and automation within non-production and production environments

• Resolve complex issues that L1 support are not able to fix, and triage the service end to end

• Eliminate repetitive manual processes, using automation and best practices to run and support stable & secure Cloud platforms

• Understand the target Cloud architecture for the bank's chosen public Cloud Service Provider(s) and platform(s)

• Understand the bank's business and technical objectives, principles and best practices and help ensure the Cloud solutions are aligned to these

• Work closely with the Cloud architects and engineers ensuring adherence to their architecture standards, technical reference architectures, service design standards and Cloud security architecture, and provide constructive feedback on opportunities to evolve these further

• Incorporate best practices into the running of the target Cloud landing zone, such as using infrastructure as code, re-using scripts and tooling, and being secure by design

• Contribute to the Cloud community within the bank to share knowledge and experience and promote re-use of proven solutions and code

You need to be this kind of person…

• Passionate about providing unparalleled levels of service and convenience for customers

• Able to work and learn quickly in a fast-paced, fun and dynamic environment

• Prepared to stick at something - we get nervous if someone has jumped from job to job as we want people who are prepared to learn and grow

• Care about doing a great job and exceeding expectations with the quality of what you do

And... we are a bank so risk is a part of everything we do. We love people who take responsibility, do the right thing for customers, colleagues and Metro Bank and have the courage to call out any concerns.

We always support colleagues to develop their skills. But to be successful in this job you really do need to already be able to do most of these wonderful things...

• Understand the risks associated with your job and what that means for you, Metro Bank and all our stakeholders

• Experience of SDLC tools & processes such as Jira, Confluence, ServiceNow and CI/CD (e.g. Jenkins, Azure DevOps, Git)

• Strong experience with automation & infrastructure as code tools like Terraform and Ansible

• Extensive experience of frameworks and technologies such as Kubernetes, Istio, Kong, Docker, Kafka, Mongo

• Good knowledge of Linux (including coding & scripting in languages like Bash, Python, JavaScript, etc.)

• Experience of Monitoring and Logging solutions (like AppDynamics, Splunk, Prometheus, Grafana, Datadog)

• Proven experience of building & supporting solutions for Public Cloud platforms (e.g. AWS, including EC2, EKS, Lambda and API Gateway), including IaaS and PaaS services

• Solid understanding of how to configure Cloud for high scalability and resilience, disaster recovery, tight security and proven compliance, high performance, fault detection and automated resolution, etc.

• Solid understanding of Cloud networking

• Knowledge of security and compliance principles and challenges within the Cloud, and able to build the appropriate solutions

• Very good understanding and experience of agile and DevOps processes and tooling, as well as operational support processes and tooling

• Experience working with architecture, security, networks, operations and other technology teams

• Ability to articulate often complex technical solutions in a simple and coherent manner to the appropriate audience

• Have fantastic communication and stakeholder management skills

• This role is regulated by the Financial Conduct Authority (FCA) under the Senior Managers and Certification Regime. This means that if you are successful in your application, we are required to carry out additional checks that will be repeated annually while you are in this role. For more information you can visit the FCA website or ask your recruiter, who can explain further.

IMPORTANT FOOTNOTE:

Diverse teams really are the best teams. We know that candidates (especially women, research tells us) may be put off applying for a job unless they can tick every box. We also know that 'normal' office hours aren't always doable, and while we can't accommodate every flexible working request we are happy to be asked. So if you are excited about working with us and think you can do much of what we are looking for but aren't sure if you are 100% there yet… why not give it a whirl? Please note that sometimes we may close a job to applications early if we are inundated with amazing candidates. Good luck!