# Bias-Variance Decomposition

In this note, I will derive the bias-variance decomposition following Refs. 1 and 2.

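Let us suppose there exists a true function $g({\bf x})$ that generates our data $y$ with additive noise, so that

$$
y = g({\bf x}) + \epsilon,
$$

where $\epsilon$ is a zero-mean noise term with variance $\sigma^2$.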

Let $T$ be the set of $N$ training examples,

$$
T = \{({\bf x}_1, y_1), \dots, ({\bf x}_N, y_N)\}.
$$
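As an illustration, the data-generating process and the training set $T$ can be simulated numerically. This is a minimal sketch under assumptions of my own, not part of the derivation: the inputs are one-dimensional and drawn uniformly from $[0,1]$, the true function is taken to be $g(x)=\sin(2\pi x)$, the noise is Gaussian with standard deviation `sigma`, and `make_training_set` is a hypothetical helper name.

```python
import numpy as np

def g(x):
    # Hypothetical "true" function, chosen only for illustration.
    return np.sin(2 * np.pi * x)

def make_training_set(n, sigma, rng):
    """Draw T = {(x_i, y_i)}, i = 1..n, with y_i = g(x_i) + eps_i.

    Assumes 1-D inputs on [0, 1] and Gaussian noise with mean 0 and
    standard deviation sigma (illustrative assumptions).
    """
    x = rng.uniform(0.0, 1.0, size=n)     # inputs x_i
    eps = rng.normal(0.0, sigma, size=n)  # additive noise eps_i
    y = g(x) + eps                        # observed targets y_i
    return x, y

rng = np.random.default_rng(0)
x, y = make_training_set(n=20, sigma=0.3, rng=rng)
print(x.shape, y.shape)  # (20,) (20,)
```

Each call with a fresh random state produces a different training set $T$ drawn from the same process.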

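Suppose we train a model $\hat{g}_T$ on this set. The cost function $J({\bf X},\hat{g}_T)$ that we consider here is the squared error

$$
J({\bf X},\hat{g}_T) = \left\|{\bf y} - {\bf \hat{g}_T(X)}\right\|^2 = \sum_{i=1}^{N}\left(y_i - \hat{g}_T({\bf x}_i)\right)^2,
$$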

where ${\bf X}=({\bf x}_1,\dots,{\bf x}_N)$, ${\bf y}=(y_1,\dots,y_N)$, and ${\bf \hat{g}_T(X)}=(\hat{g}_T({\bf x}_1),\dots,\hat{g}_T({\bf x}_N))$.
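To make $J$ concrete, here is a minimal sketch that evaluates it for one trained model. It assumes $J$ is the squared error summed over the $N$ training points (no $1/N$ factor), and it uses a cubic polynomial fitted with NumPy's `polyfit` as a stand-in for $\hat{g}_T$. The sine target, the noise level, the polynomial degree, and the helper name `cost_J` are illustrative choices, not taken from the references.

```python
import numpy as np

def cost_J(y, y_hat):
    """Squared error J = ||y - y_hat||^2 = sum_i (y_i - y_hat_i)^2 (no 1/N, assumed)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    r = y - y_hat           # residual vector y - g_T(X)
    return float(r @ r)     # squared Euclidean norm of the residual

# Hypothetical training set T: noisy samples of sin(2*pi*x).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=20)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=20)

coeffs = np.polyfit(x, y, deg=3)  # "train" g_T on T (least-squares cubic fit)
y_hat = np.polyval(coeffs, x)     # g_T(X) = (g_T(x_1), ..., g_T(x_N))
print(cost_J(y, y_hat))           # value of J for this particular T
```

Re-running with a different seed gives a different $T$, hence a different fitted $\hat{g}_T$ and a different value of $J$.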

The expectation value of $J$, denoted by $\langle J \rangle$, is given by