Statistical Learning: Maximum Likelihood and Maximum A Posteriori Examples


1. The Maximum Likelihood (ML) Approach

Here the parameter to be estimated is considered fixed but unknown, and we choose the value that makes the observed data most probable.

Example: A hardware defect started appearing in the assembly line of a computer manufacturer. In the past week, the following defect counts were observed: Monday (2), Tuesday (2), Wednesday (3), Thursday (1), Friday (0). The assembly line manager knows that the defects follow the Poisson distribution, a discrete distribution often used to model the number of random events that occur in an interval of time. The probability that K events occur in a given interval is:

P(K) = e^{-\lambda}\,\frac{\lambda^{K}}{K!}

\lambda is the parameter of the distribution and is equal to both its mean and its variance. The manager will use maximum likelihood to estimate \lambda.
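As a quick sanity check in R (a minimal sketch; dpois is R's built-in Poisson probability mass function, and the trial value of \lambda below simply anticipates the estimate derived next):

lambda <- 1.6                                # trial value of the Poisson parameter
k <- 0:5
data.frame(k = k, prob = dpois(k, lambda))   # P(K = k) = exp(-lambda) * lambda^k / k!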

The likelihood function to maximize is proportional to:

P(k_{1}=2, k_{2}=2, k_{3}=3, k_{4}=1, k_{5}=0 \mid \lambda) \propto e^{-5\lambda}\,\lambda^{\sum_{n=1}^{5} k_{n}}

In order to maximize this function, it is easier to work with its logarithm, \ell(\lambda) = -5\lambda + \left(\sum_{n=1}^{5} k_{n}\right)\ln\lambda + \text{const}. Taking the derivative and setting it to 0 gives -5 + \frac{\sum_{n} k_{n}}{\lambda} = 0, which yields:

\hat{\lambda} = \frac{\sum_{n=1}^{5} k_{n}}{5} = \frac{8}{5} = 1.6
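In R, this is just the sample mean of the five counts; a numerical maximization of the log-likelihood (a sketch, with variable names of my own choosing) agrees:

k <- c(2, 2, 3, 1, 0)              # Monday through Friday defect counts
mean(k)                            # 1.6, the closed-form ML estimate
loglik <- function(lambda) sum(dpois(k, lambda, log = TRUE))
optimize(loglik, interval = c(0.01, 10), maximum = TRUE)$maximum   # also ~1.6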

Statistical learning review question: What is the relationship between ML and least-squares error (LSE) estimation? When the observations are independent and normally distributed with constant variance, ML and LSE produce the same parameter estimates [1].
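A small illustration of this in R (a sketch on simulated data; the variable names are my own, and the noise standard deviation is assumed known for simplicity): fitting a line by least squares with lm and by directly maximizing the Gaussian likelihood with optim recovers essentially the same intercept and slope.

set.seed(1)
x <- 1:50
y <- 3 + 2 * x + rnorm(50, sd = 5)           # linear model with Gaussian noise
coef(lm(y ~ x))                              # least-squares estimates
negloglik <- function(b) -sum(dnorm(y, mean = b[1] + b[2] * x, sd = 5, log = TRUE))
optim(c(0, 0), negloglik)$par                # ML estimates: essentially identical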

2. The Maximum A Posteriori (MAP) Approach

Here the parameter to be estimated is considered to be a random variable with a known prior distribution, and we choose the value that maximizes the posterior.

Example: A technical institute in Greece accepts students from two pools each year: (a) most of the accepted students apply for the first time; (b) a smaller number of the accepted students apply for a second time. All students are required to take an entrance examination (maximum grade: 100). A second-try student who scored 70 wants to estimate the probability that he will be accepted.

The following information is available from previous years: (a) the grade distribution of accepted second-try students is Gaussian with mean 80 and standard deviation 10; (b) the grade distribution of not-accepted second-try students is Gaussian with mean 40 and standard deviation 20; (c) the prior probability of a second-try student being accepted is 1/5.

P(\text{accepted 2nd try} \mid 70) = \frac{P(70 \mid \text{accepted 2nd try})\, P(\text{accepted 2nd try})}{P(70 \mid \text{accepted 2nd try})\, P(\text{accepted 2nd try}) + P(70 \mid \text{not accepted 2nd try})\, P(\text{not accepted 2nd try})}

Doing the computations in R, we get:

p70_accepted <- dnorm(70, mean = 80, sd = 10)   # P(70 | accepted 2nd try) = 0.024
p70_rejected <- dnorm(70, mean = 40, sd = 20)   # P(70 | not accepted 2nd try) = 0.0064

So,

P(\text{accepted 2nd try} \mid 70) = \frac{0.024 \times 1/5}{0.024 \times 1/5 + 0.0064 \times 4/5} = \frac{0.0048}{0.0048 + 0.00512} = \frac{0.0048}{0.00992} \approx 0.483

Similarly,

P(\text{not accepted 2nd try} \mid 70) = \frac{P(70 \mid \text{not accepted 2nd try})\, P(\text{not accepted 2nd try})}{P(70 \mid \text{accepted 2nd try})\, P(\text{accepted 2nd try}) + P(70 \mid \text{not accepted 2nd try})\, P(\text{not accepted 2nd try})}

So, P(\text{not accepted 2nd try} \mid 70) = \frac{0.0064 \times 4/5}{0.0048 + 0.00512} = \frac{0.00512}{0.00992} \approx 0.517, which is simply 1 - 0.483.
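Putting the whole MAP computation together in R (a minimal runnable sketch; the variable names are my own):

prior_accepted <- 1/5
p70_accepted <- dnorm(70, mean = 80, sd = 10)
p70_rejected <- dnorm(70, mean = 40, sd = 20)
evidence <- p70_accepted * prior_accepted + p70_rejected * (1 - prior_accepted)
p70_accepted * prior_accepted / evidence        # P(accepted 2nd try | 70) ~ 0.483
p70_rejected * (1 - prior_accepted) / evidence  # P(not accepted 2nd try | 70) ~ 0.517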

Unfortunately, the probability of not being accepted is greater than that of being accepted, so the student is not too happy about this.

A bibliographical note: in reference [2], ML and MAP were compared for the estimation of physiological parameters, and it was concluded that MAP gave a more precise estimate than ML. In general, the authors conclude that when one has few or noise-impaired measurements, a priori information can be very useful when incorporated into a MAP-type estimator.

[1] Myung, I. J., "Tutorial on Maximum Likelihood Estimation," Journal of Mathematical Psychology, vol. 47, no. 1, pp. 90-100, 2003.

[2] Sparacino, G. et al., "Maximum-Likelihood Versus Maximum a Posteriori Parameter Estimation of Physiological System Models: The C-peptide Impulse Response Case Study," IEEE Transactions on Biomedical Engineering, vol. 47, no. 6, pp. 801-811, June 2000.