• Home
  • Search Jobs
  • Register CV
  • Post a Job
  • Employer Pricing
  • Contact Us
  • Sign in
  • Sign up
Sorry, that job is no longer available. Here are some results that may be similar to the job you were looking for.

10 jobs found

Current Search
infrastructure engineer sre aws iac
Twinstream Limited
Site Reliability Engineer
Twinstream Limited Hereford, Herefordshire
Site Reliability Engineer
Hybrid (Near Hereford)
£80,000 - £110,000 DOE + Clearance

Join a team that builds technology where it matters most. In 2019, TwinStream was founded by a group of engineers with deep experience solving complex, cross-domain problems within government organisations. Today, we continue that mission, delivering cutting-edge solutions with a focus on technical excellence, reliability, and exceptional service. We work in hybrid teams: some of us work closely with clients onsite, while others operate remotely. What unites us is a commitment to innovation, collaboration, and a job well done.

The Role: Site Reliability Engineer

We're growing, and so is demand for the secure, high-performance systems we deliver to government clients. As a Site Reliability Engineer (SRE), you'll play a key role in ensuring that these critical services are always available, resilient, and cost-effective. You'll work closely with development and support teams to evolve infrastructure, optimise delivery pipelines, and proactively detect and resolve reliability risks, whether in the cloud or on-premise. This is more than just operations. It's a chance to shape the future of secure, high-impact systems at scale.

Why You'll Love Working Here

We believe in supporting our people personally and professionally. Here's what we offer:
• 8% Pension Contribution
• Private Medical (incl. Dental & Optical)
• £1,000 Annual Training Budget
• 25 Days Holiday + Bank Holidays
• Flexible & Hybrid Working
• EV Leasing via Salary Sacrifice
• Team Events & Celebrations
• Life Assurance & Cycle-to-Work Scheme

Key Responsibilities of the Site Reliability Engineer:
• Partner with developers to improve performance and reliability across systems
• Automate toil and reduce unnecessary alerts with smart tooling
• Evolve observability so we can prevent issues before they become incidents
• Improve CI/CD pipelines and support development teams in delivering quality faster
• Explore new technologies, tools, and services that improve how we build
• Drive cost and performance improvements in real systems used by government clients

Your Skills & Experience:

We're looking for engineers with strong foundational knowledge and a passion for infrastructure and automation.

Essential:
• Proficiency with Ansible (Chef or similar is a plus)
• Experience with Terraform and modern IaC practices
• Hands-on with Docker and orchestration tools (Kubernetes, OpenShift, or Docker Swarm)
• CI/CD experience (Jenkins or equivalent)
• Monitoring/observability tools: Grafana, Prometheus, or InfluxDB
• Event-driven messaging: RabbitMQ or similar
• Strong Linux skills, scripting, and understanding of network security protocols
• Experience with AWS: EC2, S3, RDS, Lambda

Security Requirements

Due to the sensitive nature of our work, candidates must be eligible for Developed Vetting (DV) clearance. All offers are subject to security screening.

Ready to Engineer Systems That Matter?

If you're a proactive SRE looking to work on challenging, high-impact projects in a flexible and supportive environment, we'd love to hear from you. Apply now and let's build the future together.
Jun 03, 2025
Full time
Site Reliability Engineer (SRE)
Delta Capita Group Wrexham, Clwyd
Job Title: Site Reliability Engineer (SRE)
Location: London / Wrexham / UK Remote
Type: Permanent

Role Summary

We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our dynamic engineering team to support critical application deployments in a "follow-the-sun" environment. In this role, you will leverage your expertise in cloud provisioning, infrastructure as code, and container orchestration to ensure the reliability, scalability, and performance of our services. We are looking for a self-starter with an open-minded attitude, someone who approaches challenges thoughtfully and strategically. You will collaborate closely with development teams to design and implement robust infrastructure solutions utilizing AWS, Azure, and containerized technologies.

The Role and Responsibilities
• Cloud Infrastructure Management: Design, implement, and manage cloud infrastructure in AWS and Azure, ensuring alignment with best practices and organizational standards.
• Infrastructure as Code (IaC): Utilize Terraform (HCL), AWS CDK, and AWS CloudFormation for scalable and maintainable IaC, enabling safe and efficient infrastructure builds, changes, and versioning.
• Containerization and Orchestration: Deploy, manage, and provide ongoing support for containerized applications using Kubernetes, including Amazon EKS (Elastic Kubernetes Service) and Azure Kubernetes Service (AKS), ensuring their reliability, availability, and performance.
• Monitoring and Alerting: Monitor application performance and system health through observability tools (e.g., Prometheus, Grafana, ELK stack), proactively identifying and resolving issues to ensure high availability and rapid incident response.
• Security and IAM: Implement security best practices, managing Identity and Access Management (IAM) policies across cloud environments. Utilize technologies such as OpenID Connect (OIDC), OAuth2, and SAML Single Sign-On (SSO) to ensure secure authentication and authorization across services.
• Database Technologies: Manage and optimize database systems, including SQL databases and AWS DynamoDB, ensuring high availability, performance tuning, and data security.
• CI/CD Practices: Automate manual processes to enhance operational efficiency, employing Continuous Integration/Continuous Deployment (CI/CD) best practices for efficient code deployment.
• Scripting Languages: Demonstrate proficient scripting skills in languages such as Java, TypeScript, and Python to automate tasks and manage configurations.
• Load Balancing: Implement and maintain load balancing solutions to ensure optimal distribution of application traffic and high availability.
• Collaboration with Development Teams: Collaborate with software engineering teams to design, develop, and maintain robust systems and solutions, including RESTful APIs, ensuring seamless integration across platforms.
• Post-Mortem Analysis: Conduct comprehensive post-mortem analyses following incidents, identifying root causes and recommending improvements to enhance system reliability and performance.
• Mentorship: Mentor and guide junior engineers, fostering a culture of knowledge sharing and continuous improvement within the engineering team.

Skills and Experience

As an experienced SRE, you will have:
• Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
• Proven work experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role within a high-availability environment.
• Strong experience with AWS and Azure cloud services, including a deep understanding of cloud architecture and services.
• Expertise in Infrastructure as Code (IaC) using Terraform (HCL) and AWS CloudFormation.
• Experience with AWS CDK for programmatic management of cloud resources, primarily using TypeScript.
• Hands-on experience with container orchestration technologies, particularly Kubernetes.
• Familiarity with version control systems (e.g., Git) and CI/CD pipelines for efficient code deployment.
• Knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack) to ensure system observability.
• Strong experience with SQL databases and AWS DynamoDB, focusing on performance tuning and optimization.
• Proven ability to design and manage RESTful APIs, ensuring their reliability and scalability.
• Excellent troubleshooting skills, with a proactive approach to resolving complex technical issues.
• Strong communication and teamwork skills, enabling effective collaboration across cross-functional teams.
• A curious and open-minded attitude, committed to challenging the status quo and exploring innovative solutions.

It would be great if you have:
• Experience with networking concepts and troubleshooting in cloud environments.
• Knowledge of security best practices in cloud computing.
• Contributions to open-source projects or the creation of technical articles/blog posts to share knowledge with the community.
• Familiarity with service mesh technologies.
• Exposure to Agile methodologies and project management tools.
• Financial services domain knowledge.

How We Work:

Delta Capita is an equal opportunity employer. We positively encourage applications from suitably qualified and eligible candidates regardless of age, colour, disability, national origin, ancestry, race, religion, gender, sexual orientation, gender identity and/or expression, veteran status, genetic information, or any other status protected by applicable law. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. If you require any reasonable adjustments through your interview process, please use the designated space within the application questionnaire.

This is a permanent full-time position based in the UK. Open to hybrid or remote working. As the selection and interview process is ongoing, please submit your application in English as soon as possible. If your profile is selected, a member of our team will contact you within 4 weeks. For this role, a valid working permit for the UK is mandatory.

Who We Are:

Delta Capita Group (a member of the Prytek Group) is a global managed services, consulting and solutions provider with a unique combination of experience in Financial Services and technology innovation capability. Our mission is to reinvent the financial services value chain, providing technology-based mutualized services for financial institutions for non-differentiating services. Our 3 offerings are:
• Managed Services
• Consulting & Solutions
• Technology

To learn more about Delta Capita and our culture, see: Working at DC - Delta Capita.
Feb 21, 2025
Full time
Site Reliability Engineer (SRE)
Delta Capita Group
Job Title: Site Reliability Engineer (SRE) Location: London / Wrexham / UK Remote Type: Permanent Role Summary We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our dynamic engineering team to support critical application deployments in a "follow-the-sun" environment. In this role, you will leverage your expertise in cloud provisioning, infrastructure as code, and container orchestration to ensure the reliability, scalability, and performance of our services. We are looking for a self-starter with an open-minded attitude-someone who approaches challenges thoughtfully and strategically. You will collaborate closely with development teams to design and implement robust infrastructure solutions utilizing AWS, Azure, and containerized technologies. The Role and Responsibilities Cloud Infrastructure Management: Design, implement, and manage cloud infrastructure in AWS and Azure, ensuring alignment with best practices and organizational standards. Infrastructure as Code (IaC): Utilize Terraform (HCL), AWS CDK, and AWS CloudFormation for scalable and maintainable IaC, enabling safe and efficient infrastructure builds, changes, and versioning. Containerization and Orchestration: Deploy, manage, and provide ongoing support for containerized applications using Kubernetes, including Amazon EKS (Elastic Kubernetes Service) and Azure Kubernetes Service (AKS), ensuring their reliability, availability, and performance. Monitoring and Alerting: Monitor application performance and system health through observability tools (e.g., Prometheus, Grafana, ELK stack), proactively identifying and resolving issues to ensure high availability and rapid incident response. Security and IAM: Implement security best practices, managing Identity and Access Management (IAM) policies across cloud environments. 
Utilize technologies such as OpenID Connect (OIDC), OAuth2, and SAML Single Sign-On (SSO) to ensure secure authentication and authorization across services. Database Technologies: Manage and optimize database systems, including SQL databases and AWS DynamoDB, ensuring high availability, performance tuning, and data security. CI/CD Practices: Automate manual processes to enhance operational efficiency, employing Continuous Integration/Continuous Deployment (CI/CD) best practices for efficient code deployment. Scripting Languages: Demonstrate proficient scripting skills in languages such as Java, TypeScript, and Python to automate tasks and manage configurations. Load Balancing: Implement and maintain load balancing solutions to ensure optimal distribution of application traffic and high availability. Collaboration with Development Teams: Collaborate with software engineering teams to design, develop, and maintain robust systems and solutions, including RESTful APIs, ensuring seamless integration across platforms. Post-Mortem Analysis: Conduct comprehensive post-mortem analyses following incidents, identifying root causes and recommending improvements to enhance system reliability and performance. Mentorship: Mentor and guide junior engineers, fostering a culture of knowledge sharing and continuous improvement within the engineering team. Skills and Experience As an experienced SRE, you will have: Bachelor's degree in Computer Science, Engineering, or equivalent practical experience. Proven work experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role within a high-availability environment. Strong experience with AWS and Azure cloud services, including a deep understanding of cloud architecture and services. Expertise in Infrastructure as Code (IaC) using Terraform (HCL) and AWS CloudFormation. Experience with AWS CDK for programmatic management of cloud resources, primarily using TypeScript. 
Hands-on experience with container orchestration technologies, particularly Kubernetes. Familiarity with version control systems (e.g., Git) and CI/CD pipelines for efficient code deployment. Knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack) to ensure system observability. Strong experience with SQL databases and AWS DynamoDB, focusing on performance tuning and optimization. Proven ability to design and manage RESTful APIs, ensuring their reliability and scalability. Excellent troubleshooting skills, with a proactive approach to resolving complex technical issues. Strong communication and teamwork skills, enabling effective collaboration across cross-functional teams. A curious and open-minded attitude, committed to challenging the status quo and exploring innovative solutions. It would be great if you: Experience with networking concepts and troubleshooting in cloud environments. Knowledge of security best practices in cloud computing. Contributions to open-source projects or the creation of technical articles/blog posts to share knowledge with the community. Familiarity with service mesh technologies. Exposure to Agile methodologies and project management tools. Financial services domain knowledge. How We Work: Delta Capita is an equal opportunity employer. We positively encourage applications from suitably qualified and eligible candidates regardless of age, colour, disability, national origin, ancestry, race, religion, gender, sexual orientation, gender identity and/or expression, veteran status, genetic information, or any other status protected by applicable law. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. 
If you require any reasonable adjustments through your interview process, please use the designated space within the application questionnaire. This is a permanent full-time position based in the UK. Open to hybrid or remote working. As the selection and interview process is ongoing, please submit your application in English as soon as possible. If your profile is selected, a member of our team will contact you within 4 weeks. For this role, a valid working permit for the UK is mandatory. Who We Are: Delta Capita Group (a member of the Prytek Group) is a global managed services, consulting and solutions provider with a unique combination of experience in Financial Services and technology innovation capability. Our mission is to reinvent the financial services value chain providing technology-based mutualized services for financial institutions for non-differentiating services. Our 3 offerings are: Managed Services Consulting & Solutions Technology To know more about Delta Capita and our culture click here: Working at DC - Delta Capita .
Feb 21, 2025
Full time
Job Title: Site Reliability Engineer (SRE) Location: London / Wrexham / UK Remote Type: Permanent Role Summary We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our dynamic engineering team to support critical application deployments in a "follow-the-sun" environment. In this role, you will leverage your expertise in cloud provisioning, infrastructure as code, and container orchestration to ensure the reliability, scalability, and performance of our services. We are looking for a self-starter with an open-minded attitude-someone who approaches challenges thoughtfully and strategically. You will collaborate closely with development teams to design and implement robust infrastructure solutions utilizing AWS, Azure, and containerized technologies. The Role and Responsibilities Cloud Infrastructure Management: Design, implement, and manage cloud infrastructure in AWS and Azure, ensuring alignment with best practices and organizational standards. Infrastructure as Code (IaC): Utilize Terraform (HCL), AWS CDK, and AWS CloudFormation for scalable and maintainable IaC, enabling safe and efficient infrastructure builds, changes, and versioning. Containerization and Orchestration: Deploy, manage, and provide ongoing support for containerized applications using Kubernetes, including Amazon EKS (Elastic Kubernetes Service) and Azure Kubernetes Service (AKS), ensuring their reliability, availability, and performance. Monitoring and Alerting: Monitor application performance and system health through observability tools (e.g., Prometheus, Grafana, ELK stack), proactively identifying and resolving issues to ensure high availability and rapid incident response. Security and IAM: Implement security best practices, managing Identity and Access Management (IAM) policies across cloud environments. 
Utilize technologies such as OpenID Connect (OIDC), OAuth2, and SAML Single Sign-On (SSO) to ensure secure authentication and authorization across services. Database Technologies: Manage and optimize database systems, including SQL databases and AWS DynamoDB, ensuring high availability, performance tuning, and data security. CI/CD Practices: Automate manual processes to enhance operational efficiency, employing Continuous Integration/Continuous Deployment (CI/CD) best practices for efficient code deployment. Scripting Languages: Demonstrate proficient scripting skills in languages such as Java, TypeScript, and Python to automate tasks and manage configurations. Load Balancing: Implement and maintain load balancing solutions to ensure optimal distribution of application traffic and high availability. Collaboration with Development Teams: Collaborate with software engineering teams to design, develop, and maintain robust systems and solutions, including RESTful APIs, ensuring seamless integration across platforms. Post-Mortem Analysis: Conduct comprehensive post-mortem analyses following incidents, identifying root causes and recommending improvements to enhance system reliability and performance. Mentorship: Mentor and guide junior engineers, fostering a culture of knowledge sharing and continuous improvement within the engineering team. Skills and Experience As an experienced SRE, you will have: Bachelor's degree in Computer Science, Engineering, or equivalent practical experience. Proven work experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role within a high-availability environment. Strong experience with AWS and Azure cloud services, including a deep understanding of cloud architecture and services. Expertise in Infrastructure as Code (IaC) using Terraform (HCL) and AWS CloudFormation. Experience with AWS CDK for programmatic management of cloud resources, primarily using TypeScript. 
Hands-on experience with container orchestration technologies, particularly Kubernetes. Familiarity with version control systems (e.g., Git) and CI/CD pipelines for efficient code deployment. Knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack) to ensure system observability. Strong experience with SQL databases and AWS DynamoDB, focusing on performance tuning and optimization. Proven ability to design and manage RESTful APIs, ensuring their reliability and scalability. Excellent troubleshooting skills, with a proactive approach to resolving complex technical issues. Strong communication and teamwork skills, enabling effective collaboration across cross-functional teams. A curious and open-minded attitude, committed to challenging the status quo and exploring innovative solutions. It would be great if you: Experience with networking concepts and troubleshooting in cloud environments. Knowledge of security best practices in cloud computing. Contributions to open-source projects or the creation of technical articles/blog posts to share knowledge with the community. Familiarity with service mesh technologies. Exposure to Agile methodologies and project management tools. Financial services domain knowledge. How We Work: Delta Capita is an equal opportunity employer. We positively encourage applications from suitably qualified and eligible candidates regardless of age, colour, disability, national origin, ancestry, race, religion, gender, sexual orientation, gender identity and/or expression, veteran status, genetic information, or any other status protected by applicable law. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. 
If you require any reasonable adjustments through your interview process, please use the designated space within the application questionnaire. This is a permanent full-time position based in the UK. Open to hybrid or remote working. As the selection and interview process is ongoing, please submit your application in English as soon as possible. If your profile is selected, a member of our team will contact you within 4 weeks. For this role, a valid working permit for the UK is mandatory. Who We Are: Delta Capita Group (a member of the Prytek Group) is a global managed services, consulting and solutions provider with a unique combination of experience in Financial Services and technology innovation capability. Our mission is to reinvent the financial services value chain providing technology-based mutualized services for financial institutions for non-differentiating services. Our 3 offerings are: Managed Services, Consulting & Solutions, and Technology.
Head of Cloud and Platform Engineering
Griffinfire
Head of Cloud and Platform Engineering - United Kingdom Our IT Practice Area is a key part of our success, and our clients include many large household name companies. We now have an exciting opportunity for a Head of Cloud and Platform Engineering to join our London office. As Head of Cloud and Platform Engineering, you will lead and build our Cloud and Platform engineering capabilities, overseeing architecture, data, security, SRE, and incident management. We are looking for someone who is a skilled leader with deep experience in cloud infrastructure and modern methodologies, who thinks "cloud-first" and can bring simplicity and efficiency to our platforms. We need an exceptional people leader who can recruit and develop top talent, fostering growth and adaptability within our teams as we embrace cloud-native systems and DevSecOps methodologies. You will guide our existing teams in learning and adapting to cloud-based approaches while building a team culture that prioritizes excellence and innovation. As an advocate for building fantastic technical and engineering teams, you will position Barnett Waddingham as a trusted partner for the financial services sector, recognized for driving operational excellence, system resilience, and reliability, enabling innovative solutions, and supporting the continuous development of our engineering teams. Responsibilities Lead, build, and develop a high-performing infrastructure and operational team Modernise operations and incident management Build and develop an SRE function Integrate security into DevOps practices Enhance developer experience Guide data and architecture teams We would love to hear from you if you have: Degree-level education with 5+ years of experience in technical leadership roles. Proven track record as a senior engineering manager, director of engineering, or leader in a tech sector growth or scale-up business. 
Comprehensive knowledge of DevOps and DevSecOps methodologies with hands-on experience working in Agile team environments. Expertise in modern observability practices and Site Reliability Engineering (SRE), including best practices, implementations, and familiarity with observability vendors. Deep hands-on cloud experience, with 5+ years working with Azure, AWS, or GCP. Experience managing and building teams across operations, site reliability, data, and security functions. Technical proficiency in both software development and Infrastructure as Code (IaC), with familiarity in engineering best practices associated with software development. What's in it for you: Competitive discretionary annual bonus Generous pension scheme Core benefits including private medical cover, life assurance, group income protection, and up to 30 days holiday per year with holiday trading A comprehensive range of voluntary benefits to suit you (and your family) Happy to talk flexible working. We are a Disability Confident Employer. If you require reasonable adjustments or want more information on accessibility, please click here.
Feb 17, 2025
Full time
Senior SRE / DevOps (Blockchain)
Kiln
Full time - Paris or remote - 90/100k€ + equity As a Senior Site Reliability Engineer / DevOps at Kiln, you will join our Infrastructure Team, composed of 10 Engineers, to build the future of our Validator product and deploy new blockchain protocols. You will report to our Head of Infrastructure, and collaborate with both the Product and Software Development teams to provide internal tools and services and contribute to the continuous improvement of Kiln's Infrastructure-as-Code. Responsibilities: Deploy new blockchain protocols in coordination with the Product team. Architect, deploy and maintain our multi-cloud infrastructure. Ensure that our services communicate with each other seamlessly, have minimal downtime, and recover quickly. Make sure we respect any software security norms (Kiln is a SOC 2 Type 1 and Type 2 company). Continuously support our Software/Smart Contract team to ship quality code. Actively suggest continuous improvement of Kiln's architecture. Assess any protocol deployment risks. Communicate with our Product & Sales team to make sure they understand any risk that may occur during protocol deployment. Stack: Infrastructure: AWS/GCP + bare metal, Kubernetes, Terraform/Terragrunt, Prometheus/Thanos, Helm, HashiCorp Vault, FluxCD Software: Golang, TypeScript, PostgreSQL Smart-Contract: Solidity, Foundry, OpenZeppelin Requirements: 5+ years of experience in Software or Infrastructure, within a high-standard engineering environment - preferably FinTech or Crypto. Proven experience as a Senior SRE with a very strong focus on Kubernetes. Proficiency with IaC (Terraform/Terragrunt) and infrastructure automation (Helm, GitOps). Familiar with Prometheus and PromQL. Familiar with infrastructure and data security (KMS, HashiCorp Vault). Ability to ship opinionated architectural choices and code, and to share software best practices. Fluent in both French and English. All our written communication is in English. 
Nice-to-have: First experience in a web3/crypto/blockchain company. Experience in running blockchain nodes, either professionally or as a hobby. Experience designing, building and deploying user-facing, and/or API-based products. Previous experience working within a certified environment (SOC2, ISO 27001, PCI DSS, HIPAA). About Kiln: Kiln is the leading enterprise-grade rewards platform that enables institutional customers to stake assets and integrate staking & DeFi functionality into their offerings. Our API-first platform provides fully automated validators, staking & DeFi protocols access, and comprehensive data and commission management. With $13+ billion in crypto assets staked through our platform, Kiln has established a strong presence on Ethereum, managing over 4.3% of the network through 45,000+ validators - all with zero slashing events. Kiln serves more than 140 leading customers, including Binance, BitPanda, Bitgo, Fireblocks, VanEck, and TrustWallet. Our team of 90 ecosystem enthusiasts brings experience from industry leaders like Google, Circle, Ledger, Chainalysis, and other prominent technology and cryptocurrency companies. We've raised $30M in total funding from prominent investors including 1kx, Illuminate Financial, Consensys, Wintermute, and Kraken Ventures. Join Kiln and help us make the web more secure, stable, decentralized, and fair! How Kiln will support you: A fast-paced, non-bureaucratic work environment. Equity Share Options in the Business: if Kiln succeeds, we all succeed! Competitive Salary. Unlimited holiday. Flexible remote working. Choose your IT equipment. Internet connection paid up to €50/month. Significant personal development and tech conf budget. Your interview process: Recruiter Interview (45 min). Take-home test. Technical Interview (90 min). Core Values Interview (45 min). Founders Interview (30 min). Offer! Please note that we are not sponsoring visas for persons without work authorization in the UK or the EU. 
This role is specifically for employees (no B2B or contractors) based in France, the UK, Italy, Spain, Portugal & Netherlands. Thank you!
Feb 15, 2025
Full time
Sky
Tech Lead
Sky
We believe in better. And we make it happen. Better content. Better products. And better careers. Working in Tech, Product or Data at Sky is about building the next and the new. From broadband to broadcast, streaming to mobile, SkyQ to Sky Glass, we never stand still. We optimise and innovate. We turn big ideas into the products, content and services millions of people love. And we do it all right here at Sky. Job Purpose As a Tech Lead, your role is to lead the Linux Infrastructure team, shaping the effectiveness of the day-to-day delivery and support processes, as well as providing technical guidance and mentoring to team members. You will be responsible for overseeing the design, implementation, and maintenance of Linux-based systems, ensuring their reliability, scalability, and security. Additionally, you will collaborate closely with other teams to integrate Linux systems with existing infrastructure and emerging technologies. Your input will be crucial in driving forward initiatives to improve the self-service nature of our Linux estate. Overall, you will play a pivotal role in maintaining the stability and efficiency of our Linux environment while driving innovation and continuous improvement. What You'll Do Strategy & Leadership Lead the Linux Infrastructure team in executing plans aligned with organisational objectives and internal client needs, fostering continual improvement of our Linux systems. Work to resolve the discontinuous nature of Linux deployments and integrating these with the strategic product plans for private cloud, automated host deployments and self-service. Work closely with the Senior Manager of SRE & Infrastructure and the Principal Linux Engineer to improve the Linux team delivery and support processes across the org, unifying all teams into a single way of working across our Linux environments. 
Working closely with the Product Management and SRE teams to drive new ideas into the product roadmaps around self-service capability, observability, security, and reliability. Team Management & Delivery Define and track key performance indicators (KPIs) to measure the success and impact of Linux initiatives. Fostering a team culture of high performance, innovation, and continuous improvement, while also providing professional development opportunities for all team members. Provide first-class on-call support through final escalation, where required. Engage with peers within our wider company, in Comcast and NBCU, to share ideas and find solutions where they may have already been solved. What You'll Bring Bachelor's or master's degree in computer science, engineering, or a related field. (Not essential if you have related experience or aptitude). Demonstrated experience leading Linux teams preferred, showcasing leadership skills and ability to coordinate and motivate team members effectively. Proficiency in scripting languages such as Bash and Python, with the capability to automate tasks and streamline processes efficiently. Strong understanding of DevOps principles, emphasising collaboration, automation, and continuous integration/continuous deployment (CI/CD) pipelines, and using version control systems such as Git/GitLab. Familiarity with containerisation and orchestration technologies like Docker, ECS/EKS and Kubernetes for deploying, managing and scaling applications with an understanding of modern deployment methods. Experience with Red Hat Satellite, including managing system patching, provisioning, and configuration in enterprise environments. Knowledge of configuration management tools (e.g., Ansible), managing and maintaining system configurations at scale and implementing Infrastructure-as-Code (IaC) solutions (e.g., Terraform). Proven experience designing and managing cloud infrastructure across AWS (public cloud) and private cloud environments. 
Experience with monitoring and logging tools such as Prometheus, Grafana, and ELK stack. Certification in Linux administration (e.g., LPIC, RHCSA) desirable, demonstrating a commitment to continuous learning and validation of expertise in Linux systems management. Understanding of security frameworks such as NIST/SOC 2/ISO/IEC. The Rewards There's one thing people can't stop talking about when it comes to : the perks. Here's a taster: Sky Q, for the TV you love all in one place! The magic of Sky Glass at an exclusive rate A generous pension package Private healthcare Discounted mobile and broadband A wide range of Sky VIP rewards and experiences! Inclusion & How You'll Work We are a Disability Confident Employer, and welcome and encourage applications from all candidates. We will look to ensure
Feb 13, 2025
Full time
Nicholas Howard Ltd
Site Reliability Engineer
Nicholas Howard Ltd
Site Reliability Engineer Are you a Site Reliability Engineer, Environment Manager, Platform Engineer, or a senior-level DevOps Engineer? Are you looking for an exciting role in a newly formed team that will drive innovation and create best-in-class development environments to support product innovation and delivery? Does a remote-first role sound good to you? If so, then this could be right up your street! Nicholas Howard is delighted to be recruiting for a Site Reliability Engineer to join a leading systems integrator. Our client helps companies to establish, maintain and grow their IT services, and operate their critical technology in a more cost-effective manner. This is a brand-new role within the strategic engineering team, which sets and maintains design and development standards across IP development. As a Site Reliability Engineer (SRE), you will ensure the reliability, availability, and performance of services, primarily utilising Microsoft Azure with a focus on containers, serverless, AI, analytics, and database services. You will work closely with development teams to build scalable and resilient systems and provide advisory support to our support teams. Although Azure will be our main cloud platform, experience with AWS would be desirable. Fundamentally, the post-holder will play a crucial role in building the environment for internal development capability. This is a remote-first role, with time in the office in London once a month. Key Responsibilities: Collaborate with development teams to design scalable and resilient architectures in Azure. Develop and implement monitoring and alerting solutions to ensure service reliability. Automate operational processes and tasks using Infrastructure as Code (IaC) and scripting. Manage and optimise Azure resources, focusing on: Containers (e.g., Azure Kubernetes Service (AKS), Azure Container Apps). Serverless computing (e.g., Azure Functions, Logic Apps). 
AI and analytics (e.g., Azure Machine Learning, Synapse Analytics, Data Factory). Database services (e.g., Cosmos DB, Azure SQL, PostgreSQL). Perform root cause analysis for incidents and implement preventative measures. Provide advisory support to platform support teams. Work in a multi-cloud environment, and while Azure is the primary focus, experience with AWS (e.g., ECS, Lambda, RDS) is beneficial. Key Skills and Experience: Proven experience as an SRE, or in a similar role. Strong expertise in Azure services (containers, serverless, AI, analytics, databases). Experience with implementing and utilising monitoring & logging tools (Azure Monitor, Application Insights, Datadog, Grafana). Proficient in scripting & automation (Python, Bash, PowerShell). Infrastructure as Code (IaC) experience (Terraform, Bicep, ARM Templates). Experience with making technical decisions and implementing solutions that align with best practices and business goals. Excellent problem-solving and collaboration skills. AWS knowledge and experience would be a plus. The company offers a highly competitive salary, along with comprehensive benefits including flexible remote working, a generous company pension, health and dental insurance, life assurance, access to the Udemy training platform to support ongoing skills development and training, and a wide range of additional lifestyle perks. Please register your interest by applying now!
Feb 12, 2025
Full time
Lead Site Reliability Engineer
Bumble
Inclusion at Bumble Inc.
Bumble Inc. is an equal opportunity employer and we strongly encourage people of all ages, colour, lesbian, gay, bisexual, transgender, queer and non-binary people, veterans, parents, people with disabilities, and neurodivergent people to apply. We're happy to make any reasonable adjustments that will help you feel more confident throughout the process; please don't hesitate to let us know how we can help. In your application, please feel free to note which pronouns you use (for example: she/her, he/him, they/them, etc.).
At Bumble, Site Reliability Engineers (SREs) are responsible for ensuring the reliability, scalability and performance of software systems while bridging the gap between development, security and operations. We proactively manage, automate, and safeguard our infrastructure to deliver a robust foundation for the business and an exceptional experience for our stakeholders.
What you'll be doing
  • Design and build new tools and services from the ground up to solve complex problems
  • Build automation frameworks to streamline repetitive tasks
  • Design and maintain scalable, highly available and fault-tolerant systems
  • Build and maintain observability tooling including logging, monitoring, tracing and alerting systems
  • Develop and maintain automation tooling to reduce manual intervention
  • Implement infrastructure as code (IaC) for infrastructure provisioning
  • Monitor system health and performance, identifying and fixing issues
  • Respond to system outages, troubleshooting root causes and implementing preventative measures
  • Collaborate with engineering teams and security engineers to improve system reliability, security and performance
  • Participate in on-call rotations
  • Create and maintain documentation to improve knowledge sharing across teams
About you
  • Excellent problem-solving and analytical skills
  • Strong communication and collaboration skills are a must
  • Proficiency in at least one of the Python or Golang programming languages
  • Experience with CI/CD pipelines
  • Strong proficiency with Kubernetes architecture
  • Prior experience in SRE, system administration or DevOps roles
  • Strong proficiency with Linux/Unix operating systems, including hands-on experience in configuration and troubleshooting
  • Proficiency with using Puppet for configuration management, automation and system provisioning
  • Hands-on experience in monitoring and observability platforms such as Grafana, Prometheus, Elasticsearch, Jaeger
  • Experience with cloud architectures such as GCP or AWS
  • Familiarity with SQL databases and broker systems such as Kafka
  • You are a solution-oriented professional with a passion for problem-solving
  • You take pride in ensuring systems are performant, stable and efficient
  • You thrive in a collaborative environment
  • Continuous learning is important to you and you actively explore new tools and techniques
  • You are curiosity-driven and are constantly seeking new ways to improve processes and implement modern solutions
  • You are committed to ensuring quality is at the heart of every project
About Us
Bumble Inc. is the parent company of Bumble, Badoo, Fruitz and Official. The Bumble platform enables people to build healthy and equitable relationships through kind connections. Founded by Whitney Wolfe Herd in 2014, Bumble was one of the first dating apps built with women at the centre and connects people across dating (Bumble Date), friendship (Bumble BFF) and professional networking (Bumble Bizz). Badoo, which was founded in 2006, is one of the pioneers of web and mobile dating products. Fruitz, founded in 2017, encourages open and honest communication of dating intentions through playful fruit metaphors. Official, founded in 2020, is an app for couples that promotes open and honest communication between partners.
Jan 30, 2025
Full time
Lead Site Reliability Engineer
ARCUS SEARCH LIMITED
Location: London - Hybrid
Type: Full-time/Permanent
A Data FinTech client of ours is looking for a Lead Site Reliability Engineer to join the existing team and work on exciting new technology, including Kubernetes, as well as gaining exposure to data systems and supporting the company in building out their brand-new data platform.
What you will be doing:
  • You will design, operate and support the infrastructure, middleware and internal services, while seeking to improve their performance, availability, scalability, latency and efficiency
  • You will be driving technical excellence across the business, following SRE best practices
  • You will be working alongside development teams to develop and design scalable and highly available services and establish an effective build framework for continuous deployment and self-service automation
  • You will also work on incident resolution and engage with various teams (including 3rd parties) for support escalation
Experience you need:
  • AWS - you need to be strong in AWS, including services such as EC2, S3, ELB, RDS, IAM, Route 53, Auto Scaling Groups, Lambda, CloudWatch, CloudFormation and Security Groups
  • Containerisation - expertise with Kubernetes and Docker and familiarity with the Microservice Architecture pattern will be needed. You'll also need to be able to define container configuration and troubleshoot
  • Configuration management - you'll need to be experienced with technologies including Terraform and Ansible, as well as associated paradigms such as IaC and Immutable Infrastructure
  • CI/CD - you need to be comfortable with build pipelines in e.g. TeamCity/Jenkins/Concourse
  • Development - you must have hands-on experience developing in one or more programming or scripting languages (e.g. PowerShell, Bash, Python, JavaScript, Golang, Java), within an SCM environment (e.g. Bitbucket, GitHub)
  • Networking - you must have knowledge of routing & switching protocols as well as DNS, firewalling, load-balancing and global traffic management
  • Persistence technologies - you need to be familiar with database technologies (NoSQL/SQL) and broker/queuing technologies, including knowledge of HA/clustering
  • Logging, monitoring and alerting - you need to be familiar with various platforms, with expertise in the usage (and, desirably, the deployment) of e.g. ELK, Splunk, CloudWatch, to enable forensic log analysis and system tuning as well as data-driven performance analysis (i.e. SLI/SLO) and capacity planning
  • Systems administration - Linux & Windows administration in multiple distributions, including storage management (e.g. LVM, RAID) and security practices (e.g. SSH, SSL/TLS, HMAC, IPS/IDS)
  • 3-4 years' experience in a similar role is required
  • Experience working within a FinTech company is desirable
This is an exciting time within this company as they are embarking on a huge growth period across the entire business, particularly within the Data and Analytics function to support the development of their brand-new Data Platform.
Dec 01, 2022
Full time
Senior Site Reliability Engineer
ARCUS SEARCH LIMITED
Location: London - Hybrid
Type: Full-time/Permanent
A Data FinTech client of ours is looking for a Senior Site Reliability Engineer to join the existing team and work on exciting new technology, including Kubernetes, as well as gaining exposure to data systems and supporting the company in building out their brand-new data platform.
What you will be doing:
  • You will design, operate and support the infrastructure, middleware and internal services, while seeking to improve their performance, availability, scalability, latency and efficiency
  • You will be driving technical excellence across the business, following SRE best practices
  • You will be working alongside development teams to develop and design scalable and highly available services and establish an effective build framework for continuous deployment and self-service automation
  • You will also work on incident resolution and engage with various teams (including 3rd parties) for support escalation
Experience you need:
  • AWS - you need to be strong in AWS, including services such as EC2, S3, ELB, RDS, IAM, Route 53, Auto Scaling Groups, Lambda, CloudWatch, CloudFormation and Security Groups
  • Containerisation - expertise with Kubernetes and Docker and familiarity with the Microservice Architecture pattern will be needed. You'll also need to be able to define container configuration and troubleshoot
  • Configuration management - you'll need to be experienced with technologies including Terraform and Ansible, as well as associated paradigms such as IaC and Immutable Infrastructure
  • CI/CD - you need to be comfortable with build pipelines in e.g. TeamCity/Jenkins/Concourse
  • Development - you must have hands-on experience developing in one or more programming or scripting languages (e.g. PowerShell, Bash, Python, JavaScript, Golang, Java), within an SCM environment (e.g. Bitbucket, GitHub)
  • Networking - you must have knowledge of routing & switching protocols as well as DNS, firewalling, load-balancing and global traffic management
  • Persistence technologies - you need to be familiar with database technologies (NoSQL/SQL) and broker/queuing technologies, including knowledge of HA/clustering
  • Logging, monitoring and alerting - you need to be familiar with various platforms, with expertise in the usage (and, desirably, the deployment) of e.g. ELK, Splunk, CloudWatch, to enable forensic log analysis and system tuning as well as data-driven performance analysis (i.e. SLI/SLO) and capacity planning
  • Systems administration - Linux & Windows administration in multiple distributions, including storage management (e.g. LVM, RAID) and security practices (e.g. SSH, SSL/TLS, HMAC, IPS/IDS)
  • 3-4 years' experience in a similar role is required
  • Experience working within a FinTech company is desirable
This is an exciting time within this company as they are embarking on a huge growth period across the entire business, particularly within the Data and Analytics function to support the development of their brand-new Data Platform.
Dec 01, 2022
Full time

© 2008-2025 Jobsite Jobs | Designed by Web Design Agency