Branxl Academy

Career Hub

Data Science & Machine Learning

Overview

Data scientists build statistical models and machine learning systems to extract predictive or descriptive insight from data. In practice the role blends data engineering, statistics, programming, and domain knowledge. Many data scientists in UK companies spend more time on data cleaning, exploration, and stakeholder communication than on model training. A junior data scientist who can deliver reliable analysis and communicate uncertainty clearly is more valuable than one who can name every model architecture but cannot debug a pandas pipeline.

What does the Data Science & Machine Learning role involve?

  1. Extracting and preparing datasets from databases, APIs, and flat files.
  2. Exploratory data analysis to understand distributions, correlations, and anomalies.
  3. Building, evaluating, and iterating on statistical and machine learning models.
  4. Writing clear documentation of methodology and assumptions.
  5. Working with engineers to deploy models into production or integrate outputs into reporting.
  6. Presenting findings to technical and non-technical audiences.
  7. Maintaining and monitoring deployed models for drift and degradation.
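Steps 1 and 2 above — pulling data in and exploring its distributions and anomalies — can be sketched in pandas. The dataset and column names below are invented for illustration; any tabular dataset works the same way:

```python
import numpy as np
import pandas as pd

# Hypothetical dataset of customer orders (columns invented for illustration).
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "order_value": rng.lognormal(mean=3.0, sigma=0.8, size=1_000),
    "region": rng.choice(["north", "south", "east", "west"], size=1_000),
})

# Exploratory checks: distributions, missingness, and group differences.
summary = df["order_value"].describe()           # count, mean, std, quartiles
missing = df.isna().sum()                        # missing values per column
by_region = df.groupby("region")["order_value"].median()

# Flag outliers crudely with the IQR rule before any modelling.
q1, q3 = df["order_value"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[df["order_value"] > q3 + 1.5 * iqr]
```

The point is the habit, not the code: describe, check missingness, and look for anomalies before any model is fitted.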

Skills Required

  1. Python (pandas, NumPy, scikit-learn, matplotlib).
  2. SQL for data extraction and transformation.
  3. Statistics: distributions, hypothesis testing, regression, and confidence intervals.
  4. Machine learning fundamentals: supervised learning (regression, classification), unsupervised learning (clustering), model evaluation metrics.
  5. Data visualisation: matplotlib, seaborn, or Plotly.
  6. Version control with Git.
  7. Understanding of cloud-based data tools (BigQuery, AWS S3/Athena, Azure ML).

UK Salary Range

  • Entry level (0-2 years): £28,000 to £40,000. Typical titles: graduate data scientist or junior data analyst with an ML focus. Higher end at fintech and tech companies.

  • Mid-level (2-5 years): £45,000 to £65,000. Independent ownership of ML projects from data to deployment. Expectation of strong Python engineering alongside statistical competence.

  • Senior (5+ years): £65,000 to £95,000. Staff data scientists and ML leads at large tech companies reach £100,000 to £130,000 with equity. Research scientist roles at AI labs pay significantly above market.

  • ML Engineering (adjacent path): MLOps and ML engineering roles that focus on production systems command a premium: £50,000 to £80,000 at mid-level.

UK Job Market

  1. UK data science roles are concentrated in London, with growing clusters in Manchester, Edinburgh, and Bristol.
  2. Fintech, healthcare tech, e-commerce, and government analytics teams are the most active hirers.
  3. Many advertised roles require two to three years of experience, but companies running graduate or apprenticeship schemes are accessible at entry level.
  4. The data scientist title was inflated during 2020 to 2023 and is now used more precisely: if a role primarily involves SQL and dashboards, it is an analyst role.
  5. True ML engineering roles require production deployment experience.

Who This Career Path Is For

  1. People with a quantitative background (mathematics, statistics, economics, physics) who want to apply that thinking to business problems.
  2. Developers who want to move into model building.
  3. Analysts who have outgrown SQL and want to add predictive modelling.

How to Get Started

Phase 1: Python and data fundamentals (weeks 1-6)

  • Python for data: pandas, NumPy, and matplotlib.
  • Practise on a clean dataset before touching messy real data.
  • SQL revision: window functions, CTEs, and query performance.
  • Statistics refresher: mean, variance, distributions, correlation, and hypothesis testing.
  • Build one end-to-end exploratory analysis project and document it fully.
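The statistics refresher above can be grounded quickly in NumPy — for example, checking that sample mean, variance, and correlation behave as expected on synthetic data with a known relationship:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic variables with a known linear relationship plus noise.
x = rng.normal(loc=50, scale=10, size=500)
y = 2.0 * x + rng.normal(scale=5, size=500)

mean_x = x.mean()
var_x = x.var(ddof=1)              # sample variance (n - 1 denominator)
corr = np.corrcoef(x, y)[0, 1]    # Pearson correlation; close to 1 here
```

Generating data where you know the answer, then confirming the statistics recover it, is a cheap way to test your own understanding.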

Phase 2: Machine learning foundations (weeks 7-14)

  • Scikit-learn: linear regression, logistic regression, decision trees, random forests, k-means clustering.
  • Understand cross-validation and evaluation metrics (accuracy is rarely the right metric).
  • Build two supervised learning models on public datasets with full documentation of data preparation, feature engineering, and evaluation.
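The cross-validation habit described above looks something like this in scikit-learn; a synthetic dataset stands in for a public one so the sketch runs anywhere:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic classification problem standing in for a public dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

model = LogisticRegression(max_iter=1000)

# Five-fold cross-validation gives a distribution of scores, not a single
# number — report the mean and the spread, not just the best fold.
scores = cross_val_score(model, X, y, cv=5, scoring="f1")
mean_score, spread = scores.mean(), scores.std()
```

Note the scoring choice: F1 rather than accuracy, which matters once classes are imbalanced.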

Phase 3: Production and tooling (weeks 15-20)

  • Git for version control.
  • Jupyter for exploration, Python scripts for production.
  • Understanding of model deployment options (API wrapper, batch scoring).
  • Introduction to cloud data tools.
  • Build a project that moves beyond a notebook: a Python script that re-runs analysis on new data and writes outputs to a database or file.
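The "beyond a notebook" project can be as small as a script with one function that reads input, computes, and writes output to a file. Everything below (file names, column name) is a placeholder:

```python
import csv
import statistics
import tempfile
from pathlib import Path

def run_analysis(input_path: Path, output_path: Path) -> None:
    """Re-runnable analysis: read a CSV of numeric values, write summary stats."""
    with input_path.open() as f:
        values = [float(row["value"]) for row in csv.DictReader(f)]

    summary = {
        "count": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }

    with output_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=summary.keys())
        writer.writeheader()
        writer.writerow(summary)

# Example run on a tiny generated input file.
tmp = Path(tempfile.mkdtemp())
inp, out = tmp / "input.csv", tmp / "summary.csv"
inp.write_text("value\n1.0\n2.0\n3.0\n")
run_analysis(inp, out)
```

Because the function takes paths as arguments, the same code re-runs unchanged on next week's data — which is exactly the signal this phase is about.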

Phase 4: Specialise (weeks 21-26)

  • Natural language processing for text-heavy domains.
  • Time series forecasting for operations or finance.
  • Deep learning fundamentals if targeting ML engineering.
  • Choose based on the roles you are targeting.

Deep guidance

Build Your Portfolio

Project 1: End-to-end ML project with a clear business question

  • Choose a public dataset with a real prediction problem (customer churn, house price estimation, loan default).
  • Document every step: data exploration, cleaning decisions, feature engineering choices, model selection rationale, and evaluation.
  • Write a brief explaining what the model would be used for and how you would monitor it.
  • Host on GitHub with a README that explains the project to someone who has not read the code.
  • Strong version: includes a confusion matrix interpreted in business terms (not just accuracy) and explains why you chose the evaluation metric you did.
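What "a confusion matrix interpreted in business terms" means in practice can be sketched as follows; the churn framing and the labels are hypothetical:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical churn predictions: 1 = churner, 0 = retained.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Business reading, not just accuracy:
#   fp = loyal customers we would needlessly target with retention offers
#   fn = churners we would miss entirely (usually the expensive error)
accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)   # share of real churners we actually caught
```

Here accuracy is 0.70 but recall is only 0.50: the model misses half the churners, which is the sentence a hiring manager wants to see in the README.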

Project 2: Statistical analysis case study

  • Take a question that can be answered with data and statistics (does this promotion increase conversion? is there a regional difference in this metric?).
  • Collect or use public data.
  • Perform hypothesis testing with correct interpretation of p-values and confidence intervals.
  • Present the conclusion in non-technical language.
  • Many junior data science candidates cannot do this correctly.
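A correct two-sample test and interpretation can be sketched with scipy; the conversion data below is synthetic, standing in for a promotion-versus-control comparison:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic conversion rates for two groups (promotion vs control).
control = rng.normal(loc=0.10, scale=0.02, size=200)
treated = rng.normal(loc=0.11, scale=0.02, size=200)

# Welch's t-test: does not assume equal variances between groups.
t_stat, p_value = stats.ttest_ind(treated, control, equal_var=False)

# Correct interpretation: the p-value is the probability of seeing a
# difference at least this large IF there were no true difference.
# It is NOT the probability that the promotion works.
significant = p_value < 0.05
```

The comment is the part interviewers probe: stating what the p-value is, and what it is not, in plain language.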

Project 3: Reproducible analysis pipeline

  • A notebook that works is good.
  • A script that can be run again on new data and produces the same output format is better.
  • Build a data pipeline (even a small one) that fetches data, cleans it, runs analysis, and outputs results.
  • Document how to run it.
  • This signals production awareness.

How to Apply

Competitive reality

  • Data science is one of the most popular career changes in the UK.
  • Competition for junior roles is high.
  • Differentiate through specificity: a portfolio that shows deep work on one domain problem beats a collection of Titanic survival tutorial notebooks.

Where to look

  • LinkedIn, Otta, and direct company careers pages.
  • Companies with data science graduate programmes: Sky, BBC, Lloyds Banking Group, HSBC, NHS AI Lab, GCHQ (STEM scheme).
  • Deep tech companies (DeepMind, Faculty AI, Wayve) hire graduates from strong quantitative backgrounds.

Upskilling signals that help

  • Kaggle competition participation (even without a medal, shows you worked on real problems).
  • Open-source contributions to data tools.
  • Writing about a specific technique you implemented and what you learnt from it.

Interview Preparation

Technical questions

  • "Explain overfitting and how you prevent it." Model has memorised training data and performs poorly on new data.
  • Prevent with cross-validation, regularisation, reducing model complexity, and more training data.
  • "What evaluation metric would you use for a heavily imbalanced classification problem?" Not accuracy.
  • Precision-recall AUC, F1-score, or ROC AUC depending on the cost asymmetry between false positives and false negatives.
  • Explain the trade-off.
  • "Walk me through how a random forest works." Ensemble of decision trees, each trained on a random bootstrap sample with a random subset of features at each split.
  • Predictions are aggregated by majority vote (classification) or mean (regression).
  • Explain the bias-variance trade-off benefit.
  • "How would you handle missing data?" It depends on why it is missing.
  • MCAR: impute with mean or median.
  • MAR: model-based imputation.
  • MNAR: investigate the mechanism.
  • Never blindly drop rows without understanding the pattern.
  • "Tell me about a time your model did not work as expected and what you did." Every interviewer asks this.
  • Be specific: what the model was, what the failure mode was, how you diagnosed it, and what you changed.
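The missing-data answer can be made concrete with pandas. The columns here are invented for illustration; the pattern-check, imputation, and indicator column are the transferable parts:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [34.0, np.nan, 45.0, 29.0, np.nan, 51.0],
    "income": [28_000, 31_000, np.nan, 24_000, 30_000, 52_000],
})

# First, understand the pattern before touching anything.
missing_counts = df.isna().sum()

# If plausibly MCAR, a simple median imputation is defensible;
# the median is computed from observed values only.
df["age_imputed"] = df["age"].fillna(df["age"].median())

# Keep an indicator column — whether a value was missing is often
# itself predictive, especially when the data is not MCAR.
df["age_was_missing"] = df["age"].isna()
```

Keeping the missingness indicator is the detail that separates a considered answer from a memorised one.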

Common Mistakes to Avoid

Mistake 1: Jumping to models before understanding the data

  • The single most common mistake.
  • Spend more time on exploratory analysis than model selection.
  • A clean, well-understood dataset with a simple logistic regression often outperforms a complex model built on poorly understood data.

Mistake 2: Using accuracy for imbalanced problems

  • If 95 percent of your data is class A, a model that always predicts class A has 95 percent accuracy and is completely useless.
  • Know which metrics apply to your problem.
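The 95 percent trap takes four lines to demonstrate with scikit-learn metrics:

```python
from sklearn.metrics import accuracy_score, f1_score

# 95% of labels are class 0; the "model" always predicts 0.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

acc = accuracy_score(y_true, y_pred)               # 0.95 — looks great
f1 = f1_score(y_true, y_pred, zero_division=0)     # 0.0 — model is useless
```

Accuracy rewards the degenerate model; F1, which depends on catching the minority class, exposes it immediately.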

Mistake 3: Ignoring reproducibility

  • A notebook that only runs on your machine with hardcoded file paths and no documented dependencies is not a portfolio artefact.
  • Every project must run from scratch in a fresh environment.

Mistake 4: Confusing data science with data engineering

  • Many junior data scientists spend more time moving and cleaning data than building models.
  • This is normal and valuable.
  • Do not be disappointed by it.
  • Being good at the engineering side makes you significantly more effective.

Mistake 5: Not communicating uncertainty

  • Every model prediction has uncertainty.
  • Presenting results without confidence intervals or without acknowledging limitations is a red flag in interviews and in professional settings.
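One lightweight way to attach uncertainty to any estimate is a bootstrap confidence interval. The data below is synthetic; the resampling pattern is the point:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic observed metric, e.g. per-customer spend.
sample = rng.exponential(scale=40.0, size=300)

# Bootstrap: resample with replacement, recompute the statistic each time.
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(2_000)
])

point = sample.mean()
lo, hi = np.percentile(boot_means, [2.5, 97.5])  # 95% interval
# Report "mean spend is roughly `point` (95% CI lo to hi)", never the point alone.
```

The bootstrap needs no distributional assumptions about the statistic, which makes it a safe default when presenting results to stakeholders.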