Lead Reliability Engineer

  • Sky
  • Nov 23, 2022
Full time I.T. & Communications

Job Description

We believe in better. And we make it happen.

Better content. Better products. And better careers.

Working in Tech, Product or Data at Sky is about building the next and the new. From broadband to broadcast, streaming to mobile, Sky Q to Sky Glass, we never stand still. We optimise and innovate.

We turn big ideas into the products, content and services millions of people love.

And we do it all right here at Sky.

What you'll do

  • Design and build automated deployment, management and operation of content delivery systems and associated support services, running on a mixture of cloud and bare-metal, used to deliver video content to our subscribers across Europe at web-scale concurrencies.
  • Commission bare-metal from multiple vendors and VMs on a range of hypervisors using bootstrap automation technologies such as IPMI/iLO/DRAC, PXE, RHEL/CentOS Kickstart and cloud-init.
  • Build CI/CD deployment pipelines with Jenkins and champion an "automation first" rollout and maintenance strategy.
  • Manage configuration, develop templates and automate fleet upgrades of production operating systems and applications using Ansible with associated secrets management.
  • Develop a Systems Assurance framework with the team to enable qualification of physical server and cloud-based resources for use at scale, using automated functional and non-functional testing (FT/NFT) where possible.
  • Ensure effective monitoring, logging and ticketing interfaces across multiple platforms, on and off server. Design, deploy and manage basic and advanced log processing systems, from syslog and ELK stacks to ClickHouse and beyond.

What you'll bring

  • Specialism as a sysadmin, ideally with RHEL/CentOS distributions, with experience in areas such as performance tuning, iptables, pinned repos and security hardening.
  • Production familiarity with a range of virtualisation/container technologies, e.g. Terraform, Docker, LXC, Xen, LVM, VMware or OpenStack, and their cloud equivalents at GC, AWS or Azure.
  • Ability to network and secure bare-metal/VM/container systems, ensuring isolation, application performance and security.
  • Knowledge and practical understanding of TCP/IP, including IPv4, IPv6, DNS, DHCP and HTTP.
  • Experience of production-scale deployment and operation of open-source applications such as Apache Traffic Server, Envoy, Squid, Varnish, HAProxy and nginx.
  • A flair for producing clear documentation and diagrams for presentation, and the ability to manage configuration, shell scripts and Markdown using Git.

The rewards

There's one thing people can't stop talking about when it comes to Sky: the perks. Here's a taster:

  • Sky Q, for the TV you love all in one place
  • The magic of Sky Glass at an exclusive rate
  • A generous pension package
  • Private healthcare
  • Discounted mobile and broadband
  • A wide range of Sky VIP rewards and experiences

How you'll work - hybrid working

The world has changed. And so have we. We've embraced hybrid working and split our time between unique office spaces and the convenience of working from home.

You'll find out more about what hybrid working looks like for your role later in the recruitment process.

Your office space

Brick Lane

1 Brick Lane is in the heart of the East End of London. It's part of a vibrant and diverse community; close to street food, cafes and shops.

The closest tube station is Aldgate East, and Liverpool Street is about a 10-minute walk.

Outro

Inventive, forward-thinking minds come together to work in Tech, Product and Data at Sky. It's a place where you can explore what if, how far, and what next.

But better doesn't stop at what we do; it's how we do it, too. We embrace each other's differences. We support our community and contribute to a sustainable future for our business and the planet.

If you believe in better, we'll back you all the way.