Anson McCade Gloucester, Gloucestershire

Site Reliability Engineer (SRE), £65,000 base salary. Role Overview: Our National Security business in Gloucester is expanding, creating vital opportunities to support National Security clients through innovative technical solutions. We are looking for a Site Reliability Engineer to join a growing team that prioritizes both client delivery and community engagement, helping to build tech and cyber skills ...

Apr 05, 2026

Full time

Site Reliability Engineering (SRE) Manager

Halian Technology Limited

Senior Site Reliability Engineer (SRE). UK Remote, Permanent. Up to £120,000. Fully Remote (UK Only). This Is NOT a DevOps Role: Real SRE Work Only. We're looking for a true Senior Site Reliability Engineer with deep incident management experience, strong operational ownership, and expert Linux/AWS troubleshooting skills. This role is focused entirely on reliability, availability, incident response, ...

Apr 05, 2026

Full time

Network SRE - DC - Network WAN

Technopride Ltd

Overview
Title of role: Network SRE - DC - Network WAN
Location: London, Onsite
Employment Type: Contract, 6 Months

Job Description: Senior Network SRE (London)

Role Overview:
We are seeking a highly experienced Senior Network Site Reliability Engineer (SRE) to join our global network operations team. This role is critical in ensuring the reliability, scalability, and performance of our network infrastructure. You will lead incident responses, troubleshoot complex issues, and drive automation initiatives to maintain world-class network services.

Required Skills:
- Minimum 10 years' hands-on experience in network engineering and operations.
- Deep expertise in routing, switching, firewalling, and wireless across multiple vendors.
- Strong troubleshooting skills, including overlay/underlay network understanding.
- Proficiency in Linux/Unix environments.
- Experience with automation and monitoring platforms.
- Ability to work independently, set technical direction, and mentor others.

Tools:
- Netbox/Nautobot
- Prometheus / VictoriaMetrics
- Salt

Networking (either one of the following):
- EVPN
- Segment routing (although suitable MPLS depth on resume acceptable)

Key Responsibilities:
- Lead Incident Management: Own and resolve critical network incidents, manage outages, and provide expert guidance during high-pressure situations.
- Advanced Troubleshooting: Diagnose and resolve complex issues across routing, switching, firewalling, and wireless domains.
- Technical Leadership: Set technical direction, mentor junior engineers, and foster a culture of operational excellence.
- 24/7 Operations: Participate in a shift-based model to ensure continuous availability of critical network services.
- Multi-Vendor Expertise: Operate across diverse environments including Arista, Cisco, Cumulus, Spectrum Ethernet, InfiniBand, Palo Alto, Check Point, Mist, Aruba, A10, Netscaler, and F5.
- Security & Segmentation: Support network segmentation, policy enforcement, and VPN solutions (GlobalProtect, AnyConnect).
- Automation & Observability: Utilize tools like Grafana, Big Panda, ServiceNow, ITMP, syslog, Splunk, Salt, Ansible, and Prometheus to enhance monitoring and automation.
- Innovation Projects: Collaborate on wireless design and AI cluster deployments to support cutting-edge initiatives.

Preferred Skills:
- Experience with InfiniBand and AI cluster deployments.
- Familiarity with network asset management systems (e.g., Nautobot).
- Wireless design experience with Cisco, Mist, Aruba.

Apr 03, 2026

Full time

Senior Network SRE: Incident Response & Automation

Technopride Ltd

A technology company is seeking a highly experienced Senior Network Site Reliability Engineer (SRE) in London. This role focuses on ensuring the reliability and performance of network infrastructure, requiring a minimum of 10 years in network engineering. Responsibilities include managing critical network incidents, troubleshooting complex issues, and leading automation initiatives. The ideal candidate will have deep expertise across multiple vendors and technologies, ensuring operational excellence in a high-pressure environment.

Apr 03, 2026

Full time

Software Architect

Autodesk, Inc.

Job Requisition ID #: 26WD94957

Position Overview:
In this role, you will be responsible for designing and building the systems and services that power cross-cutting data capabilities across our organization and their integration with Autodesk's existing and future AEC products. A key focus area will be the design and evolution of AEC data schemas - the semantic models and data structures that enable seamless interoperability between design, construction, and operations workflows.

This opportunity is for you if you have a passion for data modeling and schema design, experience with complex system architectures, and are excited by the idea of transforming how AEC professionals use data to capture knowledge, inform decisions, and deliver projects.

As a Software Architect in the AEC Data team, you will be at the forefront of designing the next generation of capabilities for the Autodesk Forma Industry Cloud. You'll collaborate with software architects, domain experts, and product teams across Revit, Civil 3D, AutoCAD, Autodesk Construction Cloud, and Autodesk Forma. You'll define the software architecture that reimagines the continuous flow of AEC Data (e.g., 3D models, 2D drawings, issue tracking, cost, sensor streams, etc.) and information throughout the entire lifecycle of a built asset, from design and construction through operation and maintenance.

This is an individual contributor role reporting to the Distinguished Architect, AEC Data team.

Responsibilities:
- System Architecture: Define and evolve cross-team architecture for data platforms and services in the AEC Data organization; design distributed systems including APIs, event streams, and data pipelines
- Schema Design & Evolution: Design, document, and evolve data schemas for AEC domain entities (buildings, infrastructure, spaces, systems, components, relationships) ensuring extensibility and backward compatibility; create semantic data models that capture AEC domain knowledge, including property sets, classification systems, and relationship hierarchies
- Schema Governance: Establish schema versioning strategies, deprecation policies, and migration patterns; maintain a schema registry and change management process; define schema validation rules, constraints, and quality checks
- Standards & Governance: Establish standards, reference architectures, and reusable components; lead architectural decision records (ADRs) and run design reviews across teams
- Interoperability: Define and implement schema mappings between industry standards (IFC, COBie, CityGML, LandXML) and Autodesk's internal data models
- API & Service Design: Design APIs and services optimized for REST, GraphQL, and gRPC interfaces; define serialization formats (JSON Schema, Avro, Protobuf) and ensure backward compatibility at scale
- Reliability & Performance: Ensure reliability, security, and performance of systems; define SLOs and drive observability (metrics, tracing, logging)
- Cross-Team Alignment: Partner across the AEC organization's product and platform teams to align roadmaps, gather requirements, validate against real-world use cases, and drive adoption
- Communication: Communicate architectures with clear views and diagrams (e.g., C4) and executive-ready narratives; produce comprehensive documentation and onboarding materials

Minimum Qualifications:
- Bachelor's degree in Computer Science, or equivalent experience
- 10+ years as a Software Architect in data-intensive cloud environments
- Experience with cloud services, API design, database architecture, big data tools and frameworks
- Strong understanding of data modeling and schema design, including proficiency in schema definition languages (JSON Schema, etc.) and experience with schema versioning, backward/forward compatibility, and evolution strategies
- Excellent knowledge of software design and architecture patterns
- Demonstrated ability to influence without authority and drive cross-team alignment
- Mastery of taking complex ideas and conveying them in a concise and impactful manner
- Excellent verbal, written communication, and presentation abilities to effectively communicate software architecture strategy to a variety of stakeholders
- Ability to collaborate with a global team

Preferred Qualifications:
- AEC Domain Knowledge: Experience in the Architecture, Engineering, and Construction industry; familiarity with industry standards (IFC, COBie, gbXML, CityGML, LandXML, UniFormat, OmniClass)
- Cloud Services: Experience with AWS strongly preferred (EC2, ECS, Lambda, API Gateway, S3, DynamoDB, RDS)
- Database Architecture: Experience with Snowflake, relational databases, NoSQL; understanding of data modeling best practices
- Event-Driven Architectures: Kafka or Kinesis; exactly-once processing; schema registries (Confluent, AWS Glue)
- API & Service Design: REST, gRPC, GraphQL; versioning and backward compatibility at scale
- Distributed Systems: Microservices, service mesh, event-driven architecture, stream processing
- Observability/SRE: OpenTelemetry, distributed tracing, metrics/SLOs for data services
- Semantic Technologies: Knowledge graphs, RDF/OWL, property graphs, or feature stores for ML (nice-to-have)
- BIM Expertise: Experience with Building Information Modeling concepts, Revit families, or similar parametric modeling systems

About the Team:
The AEC Data team is building the foundational data infrastructure that powers Autodesk's vision for connected construction. We're creating the platforms, services, and unified data models that span the entire built asset lifecycle - from initial design concepts through construction, operations, and eventual renovation or decommissioning. Our work enables the seamless flow of information across Autodesk products and third-party integrations, helping AEC professionals make better decisions with better data.

Why Join Us:
- Shape the future of AEC data platforms and standards at an industry-leading company
- Design systems and schemas that will be used by millions of architects, engineers, and construction professionals worldwide
- Collaborate with world-class software architects and engineers across Autodesk's AEC portfolio
- Contribute to open standards and interoperability initiatives that benefit the entire industry
- Be part of a team that values technical excellence, innovation, and continuous learning

Learn More About Autodesk:
Welcome to Autodesk! Amazing things are created every day with our software - from the greenest buildings and cleanest cars to the smartest factories and biggest hit movies. We help innovators turn their ideas into reality, transforming not only how things are made, but what can be made.

We take great pride in our culture here at Autodesk - it's at the core of everything we do. Our culture guides the way we work and treat each other, informs how we connect with customers and partners, and defines how we show up in the world.

Salary transparency:
Salary is one part of Autodesk's competitive compensation package. Offers are based on the candidate's experience and geographic location. In addition to base salaries, our compensation package may include annual cash bonuses, commissions for sales roles, stock grants, and a comprehensive benefits package.

Diversity & Belonging:
We take pride in cultivating a culture of belonging where everyone can thrive. Learn more here:

Please search for open jobs and apply internally (not on this external site).

Autodesk's Architecture, Engineering, and Construction (AEC) Data organization is seeking an experienced Software Architect.

Apr 03, 2026

Full time

Senior Software Engineer / SRE - Application Middleware London, GBR Posted today

Bloomberg L.P.

Senior Software Engineer / SRE - Application Middleware
Location: London
Business Area: Engineering and CTO
Ref #

Description & Requirements:
Are you passionate about building high-performance systems that are fast, resilient, and operate at global scale? Join Bloomberg's Application Middleware SRE team, where you'll combine software engineering and systems expertise to keep the backbone of the Bloomberg Terminal running smoothly for hundreds of thousands of users around the world.

We're not your typical SRE team. We're embedded in a group that powers real-time connectivity, and we own systems where uptime isn't just important - it's essential to the global financial system. This is your opportunity to engineer resilience at scale, automate critical infrastructure, and shape reliability practices across one of the world's most powerful tech platforms.

The Team:
We're the Site Reliability Engineering team within Bloomberg's Application Middleware group. Our mission: ensure that Bloomberg's core connectivity and messaging layers are resilient, scalable, and fully observable. We own systems that operate at high throughput and low latency, including:
- Gateways: Secure, high-performance TCP/SSL entry points to our data centers
- HFN & NSTP: A global HTTP CDN and SOCKS5 proxy network delivering fast access from any geography
- Playlist Services: Dynamic path configuration systems optimizing user connectivity in real time
- PGM Relays: Infrastructure for reliable multicast data delivery
We use automation, observability, and software engineering to detect issues before they impact customers and reduce manual toil wherever we can.

What You'll Do:
- Build production-grade software that powers Bloomberg's global infrastructure
- Design and implement scalable, fault-tolerant systems with a focus on observability, performance, and automation
- Collaborate across engineering teams to introduce automated, self-service operational workflows
- Conduct deep systems analysis and root cause investigations for complex, distributed systems
- Propose and prototype innovative approaches to reliability and risk mitigation
- Contribute to design docs, runbooks, and post-incident reviews - clear communication is part of the job

You'll Need to Have:
- A degree in Computer Science, Engineering, Mathematics, or equivalent practical experience
- Strong software engineering skills in any high-level language (we mainly use Python and C++)
- A deep understanding of software system reliability and risk management, including how to identify potential points of failure and design mitigation strategies
- A good understanding of data structures, algorithms, and system design
- Experience navigating and improving large, distributed codebases
- An ability to identify system risks and engineer around points of failure
- Clear written and verbal communication, including technical documentation and incident analysis

We'd Love to See:
We are building a team with a breadth of expertise and value depth in any of the following areas:
- Systems Knowledge: A strong grasp of operating systems, fundamental networking protocols (TCP, UDP, multicast), or core database concepts as they apply to modern infrastructure.
- Cluster Management: Experience with deployments, staging, and configuration management. Direct experience with Argo and/or Kubernetes or other Pipeline Management Platforms is a significant advantage.
- Machine Management at Scale: Experience with capacity planning and automating the lifecycle of large machine fleets.
- System Observability and Monitoring: Deep understanding of SLIs/SLOs/SLAs, alerting, and building dashboards for complex systems.
- Reliability in Distributed Systems: Knowledge of fault tolerance and the unique challenges of network and node failure in distributed environments.
- Mentoring: Proven experience mentoring and growing junior engineers.

Discover what makes Bloomberg unique - watch our for an inside look at our culture, values, and the people behind our success.

Bloomberg is an equal opportunity employer and we value diversity at our company. We do not discriminate on the basis of age, ancestry, color, gender identity or expression, genetic predisposition or carrier status, marital status, national or ethnic origin, race, religion or belief, sex, sexual orientation, sexual and other reproductive health decisions, parental or caring status, physical or mental disability, pregnancy or parental leave, protected veteran status, status as a victim of domestic violence, or any other classification protected by applicable law.

Bloomberg is a disability inclusive employer. Please let us know if you require any reasonable adjustments to be made for the recruitment process. If you would prefer to discuss this confidentially, please email

Apr 03, 2026

Full time

Principal Site Reliability Engineer

Orgvue Limited

Overview Orgvue is a leading organizational design and planning software platform that captures the power of data visualization and modelling to build more adaptable, and better performing organizations. HR, finance and business leaders use Orgvue for actionable insight and analysis that helps them make faster workforce decisions in a constantly changing world. Orgvue is used by the world's largest and best-known enterprises and management consulting firms to visualize and confidently build the businesses they want tomorrow, today. The company is headquartered in London, with offices in Philadelphia, The Hague, Toronto, and Sydney. Role We are seeking a Principal Site Reliability Engineer who will be a senior technical leader focused on scaling and hardening our AWS- and Kubernetes-based infrastructure. Responsibilities Define and enforce SLOs, SLIs, and error budgets across critical services Craft and implement a cloud infrastructure and tooling strategy Work across our Org to level up SRE practices Help implement robust observability metrics, logs & traces using our observability tool Guide the team in building automated, self-healing systems Own and evolve our incident response processes, including on-call practices and post-mortem culture Mentor engineers across the org on best practices in reliability, operational readiness, and scalable infrastructure Drive Infrastructure as Code (IaC) using Terraform, Kubernetes, CloudFormation and GitOps practices Collaborate closely with security, DevOps, and software teams to ensure compliance, scalability, and operational excellence Evaluate and introduce tools, patterns, and practices that improve the performance and reliability of our SaaS platform Qualifications Demonstrable experience leading SRE transformations Deep hands-on expertise with Kubernetes (EKS preferred) in production environments Strong experience with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.)
Expert in Infrastructure as Code using tools such as Terraform, with knowledge of GitOps workflows Strong background in observability: metrics, visualization, logging, and tracing Understanding of automation, SDLC, CI/CD pipelines, deployment automation, and blue/green or canary releases Proven experience with incident management, disaster recovery planning, root cause analysis, and post-incident reviews Benefits Hybrid working - 1+ days a week in the London office Wellbeing: Sanctus Coaching, Virtual fitness sessions, Wellbeing webinars, Annual Wellbeing day Subsidised Gym Membership Private Medical Insurance (including Dental and Vision) and Life Assurance 25 days holiday (increasing to 30 days at a rate of 1 extra day per year) Summer Fridays (half-day Fridays for the months of July and August) Employer pension contribution of 5% of your gross salary, if you contribute a minimum of 3% Season ticket Loan Cycle to Work Scheme Annual Discretionary Bonus Here at Orgvue we promote individualism and a diverse workforce to build on our future success

Apr 03, 2026

Full time


Site Reliability Engineer / SRE

Partnerscale Manchester, Lancashire

Site Reliability Engineer / SRE Manchester City Centre / Hybrid £55k - £65k DOE + Bonus We are working with a leading Tech Brand in Manchester who operate with a large and mature in-house engineering function responsible for a platform used by Millions of consumers, daily. It is a company that takes its technology seriously, with mission-critical systems that need to perform reliably at all times. They are looking for a Site Reliability Engineer to join a well-funded, settled SRE team, working across multiple engineering squads to build tooling and automation, improve observability practices and drive a culture of continuous improvement. It's a great opportunity to do genuinely impactful work within a business that invests heavily in its people and its technology. Our ideal SRE / Platform Engineer will have a Software Development background and around 5 years+ commercial experience in a similar role. Key requirements: A software development background, with commercial experience writing and contributing to production code Strong SRE knowledge across SLIs, SLOs and reliability frameworks Hands-on experience with Splunk, New Relic, Grafana etc Experience with IaC tools including Ansible or Terraform Background in a large-scale, 24/7 enterprise environment Interest in Platform Engineering and modern observability practices If you're a passionate SRE looking for a step up into a well-resourced, fast-paced environment, apply now.

Apr 03, 2026

Full time


Hybrid Infrastructure Engineer (DevOps / SRE)

Guidant Global

Hybrid Infrastructure Engineer (DevOps / SRE) Location: London Contract Type: 12 months About the Role Guidant Global is supporting our client in hiring a Hybrid Infrastructure Engineer with strong DevOps or Site Reliability Engineering experience. This role bridges traditional on premises IT operations with modern cloud native engineering, ensuring seamless integration, automation and performance optimisation across hybrid environments. If you're passionate about resilient systems, cloud infrastructure, automation and scaling platforms responsibly - this is a role where you'll have a huge impact. Key Responsibilities Infrastructure Management: Design, deploy and maintain hybrid infrastructure solutions across cloud platforms and on premises environments CI/CD Automation: Build and manage automated delivery pipelines to support rapid development cycles Monitoring & Reliability: Implement and maintain monitoring, logging and alerting systems to ensure availability, performance and early issue detection Security & Compliance: Apply robust security best practices across all environments and ensure compliance with industry standards Infrastructure as Code: Develop, maintain and version control infrastructure using Terraform or similar IaC tooling Partner with development and product teams to improve system reliability, deployment speed and operational efficiency Participate in on call rotations, responding to outages and performance incidents Maintain clear, comprehensive documentation for systems, processes and configurations Top Skills (in order of priority) Background in DevOps, SRE, or Systems Administration Strong hands on experience with both cloud platforms and on premises infrastructure Proficiency in scripting languages (Python, Bash, PowerShell, etc.) Experience with containerisation and orchestration (Docker, Kubernetes, etc.) 
Familiarity with CI/CD pipelines and modern deployment practices Solid understanding of networking - firewalls, VPNs, load balancers, routing fundamentals Experience using monitoring & observability tools (Prometheus, Grafana, ELK, CloudWatch, etc.) Required Skills & Experience Strong experience designing, operating and troubleshooting hybrid infrastructure Deep knowledge of cloud platforms (AWS, Azure, GCP) Experience using Terraform or similar tools for IaC Hands on experience with CI/CD tools such as GitLab CI, GitHub Actions, Jenkins or Azure DevOps Strong scripting skills for automation and operational tooling Understanding of security, compliance and best practice governance Ability to work cross functionally with engineering and product teams Nice to Have Cloud certifications (AWS/Azure/GCP) Experience with hybrid cloud solutions and migration strategies Familiarity with service mesh, zero trust architectures or advanced networking models Experience working in regulated industries (FS, government, healthcare, defence, etc.) Who You Are You're a versatile engineer who thrives in hybrid environments, combining reliability engineering discipline with cloud native mindset. You're collaborative, calm under pressure, and excited about building systems that are secure, scalable and resilient.

Apr 02, 2026

Full time


Hybrid Cloud DevOps & SRE Engineer

Guidant Global

A global recruitment agency is seeking a Hybrid Infrastructure Engineer in London for a 12-month contract. The role requires strong DevOps or Site Reliability Engineering experience, focusing on integrating and automating hybrid infrastructure solutions. Key responsibilities include maintaining infrastructure, implementing CI/CD automation, and ensuring security compliance. Ideal candidates will have hands-on experience with cloud platforms and strong scripting skills. This role is hybrid and offers the chance to work in a challenging environment, contributing to resilient and scalable systems.

Apr 02, 2026

Full time


Principal Site Reliability Engineer

83Zero Ltd

Principal Site Reliability Engineer - Active SC Required! Up to £100,000 + benefits Wokingham - Hybrid (UK-based) We're looking for a Principal Site Reliability Engineer to provide technical leadership across large-scale, complex platforms. This is a strategic role where you'll shape reliability engineering practices, influence architecture, and drive operational excellence across the organisation. What you'll be doing: Defining and driving SRE strategy, standards, and best practices Architecting highly resilient, scalable, and secure systems Leading major incident reviews and driving organisational improvements Influencing platform and application design at an architectural level Championing automation, self-healing systems, and reliability by design Acting as a mentor and technical authority across multiple teams What we're looking for: Extensive experience in SRE, DevOps, or platform engineering Proven track record of designing and operating large-scale distributed systems Deep expertise in cloud platforms and cloud-native architectures Strong experience with Kubernetes, infrastructure as code, and automation Excellent stakeholder management and leadership skills Ability to operate at both strategic and hands-on levels Why apply? High-impact role with influence across engineering and architecture Opportunity to shape reliability strategy at scale Work with cutting-edge technologies in a complex environment

Apr 02, 2026

Full time


Principal Developer Team Lead

Cambridge University Press & Assessment Cambridge, Cambridgeshire

Job Title: Principal Developer Team Lead Salary: £51,400 - £68,800 Location: Cambridge/Hybrid Contract: Permanent This Principal Developer Team Lead position offers a pivotal opportunity to shape the technical future of a world-renowned academic organisation. You'll spearhead the migration of enterprise systems to cutting-edge cloud-native AWS architectures, while balancing hands-on technical leadership with people management responsibilities. We are Cambridge University Press & Assessment, a world-leading academic publisher and assessment organisation and a proud part of the University of Cambridge. About the role We're seeking a hands-on Principal Developer Team Lead to drive the technical transformation of our Exam Technology Organisation as we migrate legacy enterprise applications to modern, cloud-native architectures on AWS. You'll balance technical leadership with people management, leading a team of 4-8 developers while establishing the foundations for our future technology stack. Your initial focus will be on two strategic priorities: Evolving our SRE function - Building the DevOps infrastructure, automation, and tooling that enables Site Reliability Engineering practices across development and operations teams Advancing our AI development practice - Establishing standards, frameworks, and best practices for responsibly integrating AI capabilities into our education platforms. What You'll Do Technical Leadership Lead migration of legacy applications to cloud-native AWS architectures Build DevOps automation to support SRE practices Establish AI/ML development standards and frameworks Set observability, monitoring, and incident response standards Promote best practices in web, event-driven, and cloud-native technologies Provide technical expertise and oversee code reviews People Leadership Manage and mentor a team of 4-8 developers, providing coaching and development plans Identify training needs in AI/ML and SRE
Support recruitment and foster a culture of continual improvement and wellbeing. Delivery & Collaboration Deliver software in agile squads Collaborate with architects, SREs, product owners, and infrastructure teams Liaise with stakeholders to identify education sector needs Plan and estimate migrations and feature delivery Coordinate with service management, security, and AWS experts About you Essential experience Degree or equivalent Proven technical team leadership Skilled in two or more modern programming languages Experience with AWS cloud and infrastructure DevOps skills: automation, CI/CD, infrastructure-as-code Understanding of SRE and observability Experience in web-apps and modern frameworks Strong communicator with technical and non-technical audiences Technical Expertise CI/CD pipelines, automation frameworks, and developer tooling Observability tools, monitoring, logging, and alerting systems Responsible AI practices and governance Event-driven architecture and microservices patterns Software design patterns and scalability best practices Security principles in cloud environments Leadership Qualities Ability to set technical standards and provide thought leadership Experience balancing people management with hands-on contribution Strong mentoring and coaching skills Collaborative approach that builds trust across teams Passion for continuous learning in AI/ML and DevOps Promotes inclusion and continuous improvement You'll be instrumental in our digital transformation, establishing the foundations for reliable, innovative systems that serve millions of learners, teachers, and researchers worldwide. By evolving our SRE function and advancing our AI practice, you'll empower teams to deliver high-performance solutions while responsibly harnessing cutting-edge technologies. If you would like to know more about this opportunity and what will make you successful, please see the full job description attached to the bottom of this vacancy on our careers site.
Rewards and benefits We will support you to be at your best in work and to live well outside of it. In addition to competitive salaries, we offer a world-class, flexible rewards package, featuring family-friendly and planet-friendly benefits including: 28 days annual leave plus bank holidays Private medical and Permanent Health Insurance Discretionary annual bonus Group personal pension scheme Life assurance up to 4 x annual salary Green travel schemes We are a hybrid working organisation, and we offer a range of flexible working options from day one. We expect most hybrid-working colleagues to spend 40-60% of their time at their dedicated office or location. We will also consider other work arrangements if you wish to work more flexibly or require adjustments due to a disability. Ready to pursue your potential? Apply now. We review applications on an ongoing basis, with a closing date for all applications being 16th April 2026. As part of the application process you can expect: Two questions to select one answer from multiple options. A 15-minute screening call with the Hiring Manager. First stage interview via MS Teams or in person. You will be provided with a brief to complete a role related task which will need to be returned by email in advance of your interview. Please note that successful applicants will be subject to satisfactory background checks including DBS due to working in a regulated industry. Cambridge University Press & Assessment is an approved UK employer for the sponsorship of eligible roles and applicants under the Skilled Worker visa route. Please refer to the gov.uk website for guidance to understand your own eligibility based on the role you are applying for. Why join us Joining us is your opportunity to pursue potential. You'll belong to a collaborative team that's exploring new and better ways to serve students, teachers and researchers across the globe - for the benefit of individuals, society and the world.
Sharing our mission will inspire your own growth, development and progress, in an environment which embraces difference, change and aspiration. Cambridge University Press & Assessment is committed to being a place where anyone can enjoy a successful career, where it's safe to speak up, and where we learn continuously to improve together. We welcome applications from all candidates, regardless of demographic characteristics (age, disability, educational attainment, ethnicity, gender, marital status, neurodiversity, religion, sex, gender identity and sexual identity), cultural, or social class/background. We believe better outcomes come through diversity of thought, background and approach. We welcome applications from people from all backgrounds and communities, actively seeking to employ people from a wide range of different communities.

Apr 02, 2026

Full time


Senior SRE Leader: Scale, Observability & IaC

Orgvue Limited

A leading software platform in London is seeking a Principal Site Reliability Engineer to focus on scaling and hardening their AWS- and Kubernetes-based infrastructure. The successful candidate will oversee SRE practices, guide automation efforts, and ensure operational excellence within the team. With benefits like hybrid working, private medical insurance, and a pension contribution, this role is ideal for those looking to advance their career while promoting a diverse workforce.

Apr 02, 2026

Full time

Senior Site Reliability Engineer (Public Cloud)

Head Resourcing Edinburgh, Midlothian

I'm partnered with a major organisation that's going through a huge SRE modernisation, and they're growing a brand-new, engineering-focused SRE function across their cloud platforms. We're now looking for an experienced Senior Site Reliability Engineer to join the team. This is real SRE work: reducing toil, building automation, improving system reliability and observability, and supporting large-scale cloud environments across Azure and GCP. The Role You'll be part of a unified SRE team supporting multiple cloud teams, working on: Reliability, performance and observability across Azure/GCP Automation to reduce repeat incidents, tickets, and manual processes Improving SLOs, SLIs, error budgets and platform health Building and maintaining Terraform modules, GitHub pipelines and IaC Supporting app teams as they migrate large workloads to cloud 1-in-4 on-call (enhanced pay) What They're Looking For 5+ years' experience as an SRE in large/complex environments Strong Azure and/or GCP capability Terraform + CI/CD experience (GitHub, IaC, scripting) Deep understanding of observability, data, logs and alerting Someone who wants to help shape a modern SRE culture - not just keep the lights on Why It's a Great Move Massive modernisation programme Opportunity to influence tooling, processes and culture Multi-cloud exposure (Azure + GCP) Proper engineering autonomy Clear progression opportunities as the team scales What they are offering: Hybrid working environment with a requirement to be in the office 2 days per week (Leeds, Halifax, Manchester, Bristol or Edinburgh). Enhanced benefits package which includes flexible cash sum, private medical, enhanced pension contribution, 28 days + bank holidays and more. If you are interested in finding out more, please send across an updated version of your CV, clearly demonstrating your relevant experience!

Apr 02, 2026

Full time

Site Reliability Engineer Intern - London, UK

Apple Inc.

Site Reliability Engineer Intern - London, UK London, England, United Kingdom Software and Services People at Apple don't just build products - they craft experiences our customers love and depend on. Apple Services Engineering (ASE) builds and supports the systems that make many of these daily experiences possible. If you've used Apple products, you've likely interacted with us. iCloud Services SRE teams are responsible for the systems and services that directly support our customers and their experiences. We are looking for passionate and talented Site Reliability Engineers to continue our focus on providing our customers the highest quality Apple Services experience. Our services have to scale globally, stay highly available, and "just work." If you love designing, engineering, and running systems and infrastructure that will help millions of customers, then this is the place for you! Description Our team leads the reliability engineering for iCloud Identity core services, serving millions of customers worldwide. As a reliability engineering team, we are deeply involved in the entire product delivery lifecycle, ensuring our services are always available and just work. Operating in a diverse and large-scale environment, we thrive on innovative ideas to build robust, automated systems. Here, you'll play a key role in transforming conceptual ideas into tangible solutions, ensuring our product seamlessly works for everyone. In this internship role, you'll take on the unique challenges of large-scale system operations while working closely with team members. You'll actively contribute to engineering projects aimed at improving service operability and quality, gaining invaluable hands-on experience in one of the most dynamic and impactful areas of modern technology. Minimum Qualifications Coding experience using Java or Golang. 
Experience with Linux/Unix, Networking, Systems Management, Systems Security Excellent troubleshooting and problem-solving skills Excellent written and verbal communication skills Preferred Qualifications Good understanding of, and hands-on experience with, Kubernetes and container orchestration. Experience managing Distributed Systems / Large Scale Systems Operations At Apple, we're not all the same. And that's our greatest strength. We draw on the differences in who we are, what we've experienced and how we think. Because to create products that serve everyone, we believe in including everyone. Therefore, we are committed to treating all applicants fairly and equally. As a registered Disability Confident employer, we will work with applicants to make any reasonable accommodations. Apple will consider for employment all qualified applicants with criminal backgrounds in a manner consistent with applicable law.

Apr 01, 2026

Full time

SRE Intern: Build Reliable, Scalable Services

Apple Inc.

A leading technology company is seeking a Site Reliability Engineer Intern in London, UK. This role focuses on enhancing the reliability of iCloud services through innovative engineering solutions. Ideal candidates will have coding experience in Java or Golang, along with a strong understanding of Linux/Unix systems. Strong troubleshooting and communication skills are essential, along with a passion for improving service quality. This internship provides a unique opportunity to work on impactful projects that reach millions of customers.

Apr 01, 2026

Full time

Global Banking & Markets - Trading Systems Support Engineer - Associate/Vice President - London

WeAreTechWomen

What We Do At Goldman Sachs, we connect people, capital and ideas to help solve problems for our clients. We are a leading global financial services firm providing investment banking, securities and investment management services to a substantial and diversified client base that includes corporations, financial institutions, governments, and individuals. Futures Engineering plays a key role in the firm's ability to provide liquidity and execution services for institutional clients around the world, two important revenue drivers for the firm. In Futures Engineering we use both open source industry standard and internal proprietary technologies to build cutting edge platforms for pricing, execution, and control over each of these millions of transactions. Who We Look For The Futures business consolidates and expands the firm's electronic market making and algorithmic trade execution. As part of the Futures Engineering team, Futures Mission Control Engineering partners with Futures Trading to develop and support the pricing and execution services for the firm and its clients. The team is primarily focused on site reliability engineering, including driving automation, improving real time monitoring, developing metrics to track performance, and managing the release and deployment lifecycle. Team members help support the day to day operations of the trading desk and the electronic trading systems; the team is expected to interact closely with Trading & Sales business users. Candidates must have the technical and analytical skills required to triage and resolve complex production issues and operate well in a fast paced, high pressure environment. A propensity to automate manual tasks, appreciation for large scale, and distributed computing systems will be necessary to succeed in the role. 
As part of a global support team, you will provide operational and technical assistance for Futures applications and infrastructure, both for external clients and internal business users. In addition, you will oversee every component of the production system to identify and resolve production problems as well as assess the risk of systems changes. Job Summary Technical and operational risk management of a fast paced, multi asset electronic trading business Analysis focused on creating sustainable systems and services that meet uptime and performance requirements through automation Finding opportunities for efficiency and cost savings in support process and physical environment Partnering with software and infrastructure owners to solve hardware/network issues Incident and crisis management Significant business interaction across Futures front office Participation in system design consulting, platform management, and capacity planning Basic Qualifications At least 5 years of professional experience in a technical support, SRE, or operations role within a fast paced trading or financial environment. Proven aptitude for understanding complex algorithms, data structures, and software design principles relevant to high performance systems. Solid understanding of Linux operating system internals and networking concepts. Strong analytical and problem solving skills, with the ability to quickly diagnose and mitigate issues under pressure in a real time trading environment. Excellent communication and interpersonal skills, crucial for effective interaction with trading desk personnel and technical teams. Ability to effectively multi task, prioritize, and manage incidents in a dynamic trading environment. Preferred Qualifications Direct experience providing 1st, 2nd, or 3rd line support to a trading desk or front office users. Hands on experience with Site Reliability Engineering (SRE) practices, including automation, monitoring, and incident response. 
Proficiency in at least one scripting or programming language (e.g., Python, Shell Scripting, Java, C++) for automation, tooling, and operational tasks. Experience with distributed systems design, maintenance, and troubleshooting. Knowledge of financial markets, electronic trading workflows, and the FIX protocol. Demonstrated ability to debug, optimize, and troubleshoot code and system performance issues.

Apr 01, 2026

Full time

Junior Site Reliability Engineer

Revybe IT Recruitment Ltd

Junior Site Reliability Engineer Central London (3 days a week in the office) Up to £55,000 per annum + Bonus + Generous Benefits Package We are working with an exciting technology company that is looking to bring in a Junior Site Reliability Engineer to help scale its cloud infrastructure and DevOps capability. They've built a high-performing engineering team and are now investing further in the platform side of things as demand grows. Think modern, cloud-native architecture, and a real emphasis on automation, scalability, and developer enablement. You'll join an experienced team you can learn and grow from. Tech stack AWS (Core services - EC2, RDS, S3, IAM, etc.) Monitoring and Observability (Grafana, Prometheus, Datadog) Kubernetes (building and managing production clusters) Terraform (IaC provisioning) Python, Bash or Go (scripting, automation) GitHub Actions (CI/CD pipelines) What They're Looking For Experience in AWS cloud infrastructure Previous experience working with Monitoring and Observability Tools - Datadog, Grafana or Prometheus Knowledge of how Kubernetes works. Understanding of IaC - Terraform. Experience with CI/CD (GitHub Actions or similar) A good communicator who enjoys working collaboratively across product and engineering. The client is willing to consider candidates without all the required skills and provide an environment to learn and grow on the job. Training and development are at the forefront of the business, and you will get plenty of opportunities to progress your career in whatever path you want. Click APPLY NOW to be considered for this position! AWS, SRE, Cloud, Kubernetes, EKS, Terraform, CI/CD, Automation etc.

Apr 01, 2026

Full time

Site Reliability Engineer

83Zero Ltd Wokingham, Berkshire

Site Reliability Engineer (SRE) - Active SC required! Up to £55,000 + benefits Hybrid (UK-based) We're looking for a Site Reliability Engineer to join a growing technology team delivering highly scalable, resilient systems across a range of enterprise environments. This is a fantastic opportunity for someone with a solid foundation in DevOps/SRE practices who wants to deepen their expertise in automation, reliability, and cloud-native technologies. What you'll be doing: Supporting the reliability, availability, and performance of production systems Monitoring applications and infrastructure, responding to incidents and driving resolution Automating manual processes to improve efficiency and reduce risk Collaborating with engineering teams to improve system design and resilience Contributing to CI/CD pipelines and infrastructure-as-code practices What we're looking for: Experience in an SRE, DevOps, or similar engineering role Knowledge of cloud platforms (AWS, Azure, or GCP) Familiarity with monitoring/logging tools (e.g. Prometheus, Grafana, ELK) Scripting or programming skills (e.g. Python, Bash, Go) Understanding of containers and orchestration (Docker/Kubernetes is a plus) Why apply? Work with modern, cloud-native technologies Supportive environment with strong learning and development opportunities Clear progression path into senior SRE roles

Apr 01, 2026

Full time

Senior Site Reliability Engineer

83Zero Ltd Wokingham, Berkshire

Senior Site Reliability Engineer - Active SC Required! Up to £75,000 + benefits Wokingham - Hybrid (UK-based) We're seeking a Senior Site Reliability Engineer to play a key role in designing and operating highly reliable, scalable systems in a fast-paced environment. You'll act as a technical leader within the team, driving best practices across reliability engineering, automation, and system performance. What you'll be doing: Designing and improving system reliability, scalability, and observability Leading incident management and driving root cause analysis Building and maintaining robust CI/CD pipelines and automation frameworks Partnering with development teams to embed SRE principles into the SDLC Mentoring junior engineers and promoting engineering best practices What we're looking for: Strong experience in SRE, DevOps, or platform engineering roles Deep understanding of cloud infrastructure (AWS, Azure, or GCP) Hands-on experience with Kubernetes and containerised environments Strong scripting/programming skills (Python, Go, or similar) Experience with monitoring, alerting, and observability tooling Proven ability to troubleshoot complex distributed systems Why apply? Opportunity to influence technical direction and best practices Work on large-scale, mission-critical systems Leadership exposure with clear progression to principal level

Apr 01, 2026

Full time

25 jobs found
