Senior Data Scientist, Industrialized Workflows

  • Recursion
  • Apr 11, 2024
Full time I.T. & Communications

Job Description

The Impact You'll Make

With treatments for hundreds of diseases in our sights, we've built a data science team with domain expertise in computer science, bioinformatics, physics, biology, mathematics, applied statistics, and more. We work side-by-side with biologists, automation scientists, chemists, software engineers, and many others; together, we develop the tools and methods to turn our experimental data into treatments for pathologies that affect the lives of countless individuals. As a data scientist supporting the development of our industrialized workflows, you'll work with a highly dynamic team that is focused on improving how we move from ideation through to advanced candidate drugs in a way that accelerates decision-making and automates as much as possible to scale the impact that we can have.

You'll have access to unbelievable scales of data: up to 2.2 million experiments run each week; our ground-breaking Phenom-1 foundation model, trained on > 1 billion in-house images; and our maps of biology and chemistry, which contain > 5 trillion relationships across multiple biological and chemical contexts.

In this role, you will leverage this data as you:

  • Partner with chemists and biologists to understand their processes and the questions that they are asking at each stage of the drug discovery funnel
  • Contribute to the development of LOWE, a natural language interface that connects wet- and dry-lab components of the Recursion OS to streamline drug-discovery tasks
  • Develop methods, metrics, benchmarks, and models to help drive drug discovery in a standardized way
  • Convert exploratory analysis into production-quality functions that can be incorporated into in-house Python packages and that support at-scale generation of data packages to accelerate decisions on passing programs through internal stage gates
  • Create and analyze enormous sets of connected data for a variety of programs to learn how best to advance drug discovery in an industrialized way
  • Collaborate with engineering teams to mature your models and analyses and put them into productionized flows
  • Deliver quickly and iteratively in short-lived, agile workstreams, both supporting in-flight programs and building improvements for the long term
  • Learn to leverage new code packages and data science techniques as needed

Location:

Making London your home base is ideal; however, we will also consider on-site work in our Salt Lake City, Utah, or Toronto, Ontario, offices.

The Team You'll Join

We are an application-oriented group whose goal is to discover drugs at scale, using the toolkit of computational science in collaboration with our counterparts in other engineering (software and data engineering, laboratory automation), scientific (biology, chemistry, clinical science), and operational (laboratory operations, regulatory affairs) disciplines. We are value-driving: data science at Recursion is not just an accelerating function; it is a core part of our value proposition. As data scientists, we are responsible for showing up as leaders and visionaries, helping to shape how Recursion delivers on our mission. We work on what matters and deliver in timescales of weeks, not quarters. We focus on the impact that we are trying to make and the "why" of what we are trying to deliver, and we are resilient if the "how" of what we are doing needs to change.

The Experience You'll Need

  • 3-5+ years of practical experience applying probability, statistics, and machine learning to real-world datasets in service of academic or business applications and recommendations.
    • Strong preference for experience in the field of biosciences (particularly pharmaceuticals) or working on projects that require regular cross-disciplinary collaboration.
  • Experience working within a fast-paced interdisciplinary team to solve business-relevant problems and communicating complex concepts and methods to audiences with diverse technical backgrounds.
  • High fluency with the Python data stack (numpy, pandas, scikit-learn, etc.).
  • Experience in collaborative data product development and peer code review, including version control tools like git.
  • Experience developing, releasing, and maintaining data products in a continuous-use production environment.
  • Nice to have: experience in creating compelling visualizations of high-dimensional data that enable clear decision-making and interpretation, prompt engineering for LLMs, cheminformatics, or analysis of RNA sequencing data.

How You'll be Supported

  • You will be assigned a peer trail guide to support you as you onboard and get familiar with Recursion systems
  • Receive real-time feedback on code quality and best practices from a team of peers
  • Ability to participate in our regular all-hands, journal club, and tech talks for Data Science and to learn from your colleagues there
  • Option to attend conferences to learn from colleagues and wider networks and to build your skill set