Huxley Associates
City, London
This is a rare opportunity to apply serious data engineering in a domain where latency, correctness, and reliability carry direct commercial weight.

Requirements

- 6+ years of data engineering in production environments
- Python expertise: idiomatic, well-tested, production-grade code, not notebook scripts
- ETL/ELT pipeline design and implementation at scale; orchestration with Airflow, Prefect, or equivalent; a reliability-first mindset covering backfill, retry, and exactly-once semantics
- Azure data platform: Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Data Lake Storage; infrastructure as code for data workloads (Terraform or Bicep)
- Databricks: Delta Lake, Unity Catalog, job vs interactive cluster trade-offs, cost-aware compute management, Spark job optimisation
- Relational databases: PostgreSQL at production scale, including query optimisation, indexing strategies, table partitioning, replication, and schema design for both OLTP and analytical workloads
- MongoDB: document modelling, aggregation pipelines, indexing strategy, replica sets; clear judgement on when document vs relational storage is the right architectural call
- Containerisation: Docker and Kubernetes-based deployment of data workloads; reproducible, environment-agnostic data infrastructure
- Data modelling for analytical workloads: dimensional modelling, data vault, or equivalent; schema evolution, slowly changing dimensions, and downstream impact analysis
- Stream and batch processing patterns: late-data handling, watermarking, and backfill strategies; throughput vs latency trade-offs in pipeline design
- Production data observability: data lineage, quality checks, SLA monitoring, alerting on freshness and completeness; treating data correctness as a first-class concern
- CI/CD for data infrastructure: version-controlled pipelines, automated data quality testing, reproducible and auditable deploys
- Ability to work directly with quant researchers, risk managers, and traders, translating business requirements into reliable, well-documented data products

Nice to Have

- Financial markets data: market data feeds (Bloomberg, Refinitiv), tick data, trade history, reference data, or instrument master management
- Apache Spark or Flink for large-scale stream and batch processing beyond the Databricks ecosystem
- dbt or an equivalent SQL transformation layer; experience building and maintaining dbt projects in a production data warehouse
- Event streaming with Kafka or Confluent Platform: topic design, consumer group management, exactly-once delivery guarantees
- OLAP-optimised stores: ClickHouse, DuckDB, or equivalent; understanding of columnar storage and vectorised query execution
- Energy, commodities, or broader financial markets domain knowledge

What We're Looking For

You treat data as a product, not a side effect. You know what it takes to make a pipeline trustworthy: not just running, but observable, tested, and recoverable when something upstream changes at 3am. You think in systems: schema evolution, lineage, freshness SLAs, and the downstream impact of every modelling decision. At ETrading, that data is the foundation of billion-dollar trading decisions. You are the reason it is right.

To find out more about Huxley, please visit (url removed)

Huxley, a trading division of SThree Partnership LLP, is acting as an Employment Business in relation to this vacancy. Registered office: 8 Bishopsgate, London, EC2N 4BQ, United Kingdom. Partnership Number OC(phone number removed), England and Wales.
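To give one concrete flavour of the "alerting on freshness" expectation in the role above: a minimal, hypothetical sketch of a freshness-SLA check. The function name, the 15-minute SLA, and the timestamps are illustrative assumptions, not anything specified by the role.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def check_freshness(latest_event: datetime, sla: timedelta,
                    now: Optional[datetime] = None) -> bool:
    """Return True if the dataset is fresh (latest event within SLA), else False."""
    # Hypothetical check: compare the age of the newest record against the SLA.
    now = now or datetime.now(timezone.utc)
    return (now - latest_event) <= sla

# Illustrative usage: a table expected to refresh every 15 minutes,
# evaluated at a fixed "now" so the example is deterministic.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = check_freshness(datetime(2024, 1, 1, 11, 50, tzinfo=timezone.utc),
                        timedelta(minutes=15), now=now)  # 10 minutes old
stale = check_freshness(datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc),
                        timedelta(minutes=15), now=now)  # 60 minutes old
```

In practice such a check would run on a schedule (e.g. from an orchestrator) and page on a breach; the sketch only shows the comparison itself.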