Markov chain Monte Carlo (MCMC) is a sampling method that lets us estimate the parameters of an intractable or unknown, possibly high-dimensional distribution (one that depends on many parameters) by drawing random samples from a simpler distribution known as the proposal distribution. This is particularly useful in Bayesian statistics for obtaining samples from an unknown or intractable posterior distribution (the revised probability distribution after accounting for the data). It is also useful in many biological applications involving large amounts of data that would otherwise be difficult to analyse.
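To make the idea concrete, here is a minimal sketch of one common MCMC variant, the random-walk Metropolis algorithm. The target density (a standard normal known only up to a constant), the Gaussian proposal, and the names `unnormalised_target` and `metropolis` are all assumptions chosen for illustration, not something taken from the article.

```python
import math
import random


def unnormalised_target(x):
    """Hypothetical target: a standard normal, known only up to a constant."""
    return math.exp(-0.5 * x * x)


def metropolis(n_samples, step=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        # Propose a move by sampling from the simple proposal distribution.
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, target(proposal) / target(current)).
        accept_prob = min(1.0, unnormalised_target(proposal) / unnormalised_target(x))
        if rng.random() < accept_prob:
            x = proposal
        # A rejected proposal means the current state is recorded again.
        samples.append(x)
    return samples


samples = metropolis(20000)
kept = samples[2000:]  # discard an (assumed) burn-in period
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(f"estimated mean: {mean:.3f}, estimated variance: {var:.3f}")
```

Because the proposal is symmetric, only the ratio of target densities matters, so the normalising constant never has to be computed. That is what makes this approach workable for intractable posteriors.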

In this article, we will give a short introduction to MCMC and…

If you have dabbled in machine learning, you might have come across the word ‘kernel’ being thrown around casually. In the sklearn library there are options to specify the type of kernel you want to use in some classifiers such as SVMs (support vector machines). So what exactly is a kernel and why does it matter?

A kernel in the context of machine learning usually refers to a function that can be plugged into a classifier’s decision function and that **only accesses the training data via inner products**. In the Euclidean vector space (i.e. real numbers), this usually refers to…
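As a rough illustration of "accessing the data only via inner products", the sketch below compares a degree-2 polynomial kernel with the inner product of an explicit feature map. The choice of kernel, the two-dimensional inputs, and the helper names `dot`, `poly2_kernel` and `phi` are assumptions made for this example.

```python
import math


def dot(u, v):
    """Ordinary Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel: k(x, y) = (x . y)^2."""
    return dot(x, y) ** 2


def phi(x):
    """Explicit feature map for 2-D inputs whose inner product equals poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


x, y = (1.0, 2.0), (3.0, -1.0)
k_implicit = poly2_kernel(x, y)       # computed in the original 2-D space
k_explicit = dot(phi(x), phi(y))      # computed in the 3-D feature space
print(k_implicit, k_explicit)
```

Both values agree (up to floating-point rounding), which shows that the kernel gives the inner product in the higher-dimensional feature space without ever constructing `phi(x)` explicitly.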

A common assumption when we build a model is that the variables are independent, but this assumption often doesn’t hold perfectly. Here we will show why covariance messes up your estimates. First, we derive the likelihood distribution for some model. Next, we show how the shape of this distribution, and hence the confidence interval of our estimates, changes with variance. Finally, we show how to visualise the effect of covariance graphically and how high covariance between the variables you are trying to estimate affects the confidence intervals.

Starting with Bayes’ Theorem, we can write…

Shortly before the unfortunate events that led to his untimely death, Alan Turing published a landmark paper, “The Chemical Basis of Morphogenesis”, that still deeply influences mathematical biology, and developmental biology in particular. It describes how chemical patterns can form from diffusion processes; these chemical patterns in turn provide a blueprint for the development of structures such as the digits of our hands and the stripes of a zebra.

The genius of Turing lies in that he saw that diffusion was not just a stabilising, homogenising force (since diffusion is the reason why a drop of ink spreads out and…

I have successfully brainwashed my Excel-fanatic colleague on the virtues of Python. She became an eager student after seeing how quickly and easily my Python scripts gave the desired results: half an hour instead of two painful weeks of scrubbing Excel. Python is an easy-to-learn, high-level programming language that is freely distributed. It is also one of the most popular and in-demand programming languages for data science applications. While Excel is incredibly powerful, there are some things it doesn’t do very well, such as joining large amounts of data. …

I was an engineering student, then a software developer at a wealth fund, and am now a graduate student studying computational biology.