In this note, I derive the bias-variance decomposition, following Refs. 1 and 2.
Suppose there exists a true function $f(x)$ that generates our data with additive noise $\epsilon$, so that
$$y = f(x) + \epsilon.$$
Let $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$ be the set of training data.
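Suppose we train a model on this set and denote its prediction at a point $x$ by $\hat{f}(x)$. The cost function that we consider here is the squared error
$$C = \sum_{i} \left( y_i - \hat{f}(x_i) \right)^2,$$
where $y_i$ is the observed target at $x_i$ and $\hat{f}(x_i)$ is the model's prediction there.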
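To make the setup concrete, here is a minimal numerical sketch. Everything specific in it is my own assumption for illustration rather than something taken from the references: the sine function standing in for the true function, the Gaussian noise, the polynomial model, the squared-error form of the cost, and all function names.

```python
import numpy as np


def true_function(x):
    # Hypothetical stand-in for the unknown true function.
    return np.sin(2.0 * np.pi * x)


def sample_training_set(n_points, noise_std, rng):
    # Draw inputs uniformly on [0, 1] and add zero-mean Gaussian noise
    # (an assumed noise model) to the true function values.
    x = rng.uniform(0.0, 1.0, size=n_points)
    y = true_function(x) + rng.normal(0.0, noise_std, size=n_points)
    return x, y


def fit_model(x, y, degree):
    # Least-squares polynomial fit; returns a callable predictor.
    coeffs = np.polyfit(x, y, deg=degree)
    return lambda x_new: np.polyval(coeffs, x_new)


def squared_error_cost(y, y_pred):
    # Sum of squared residuals between targets and predictions.
    return float(np.sum((y - y_pred) ** 2))


rng = np.random.default_rng(0)
x_train, y_train = sample_training_set(n_points=30, noise_std=0.1, rng=rng)
model = fit_model(x_train, y_train, degree=3)
cost = squared_error_cost(y_train, model(x_train))
print(f"training cost: {cost:.4f}")
```

Calling `sample_training_set` repeatedly with the same generator gives independent training sets, so averaging a quantity of interest over many such draws is one way to approximate an expectation over training sets.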
The expectation value of the cost function $C$, denoted by $E[C]$, is given by