1. The Maximum Likelihood (ML) Approach
Here the parameter to be estimated is considered fixed, but unknown.
Example: A hardware defect started appearing in the assembly line of a computer manufacturer. In the past week, the following observations of the defect were made: Monday (2), Tuesday (2), Wednesday (3), Thursday (1), Friday (0). The assembly line manager knows that the distribution of the defects follows the Poisson distribution, a discrete distribution often used to model the number of random events that occur in an interval of time. The probability that $k$ events occur in a given interval is given by:

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \ldots$$

$\lambda$ is the parameter of the distribution and is equal to both the mean and the variance of the distribution. The manager will use maximum likelihood to estimate $\lambda$.
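As a quick sanity check, the pmf can be evaluated in R with dpois; the value $\lambda = 1.6$ used here is just illustrative (it happens to be the estimate derived below), and the variable names are my own:

lambda <- 1.6                                      # illustrative parameter value
k <- 0:4
manual <- lambda^k * exp(-lambda) / factorial(k)   # the formula above
built_in <- dpois(k, lambda)                       # R's Poisson pmf
all.equal(manual, built_in)                        # TRUE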
The likelihood function to maximize, given the $n = 5$ independent observations $x_1, \ldots, x_5 = 2, 2, 3, 1, 0$, is proportional to:

$$L(\lambda) \propto \prod_{i=1}^{5} \lambda^{x_i} e^{-\lambda} = \lambda^{8} e^{-5\lambda}$$
In order to maximize this function, we take the derivative and set it to 0, which eventually yields the sample mean:

$$\hat{\lambda} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{2 + 2 + 3 + 1 + 0}{5} = 1.6$$
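A minimal R sketch that confirms this numerically (variable names are my own):

x <- c(2, 2, 3, 1, 0)                                             # Mon-Fri defect counts
loglik <- function(lambda) sum(dpois(x, lambda, log = TRUE))      # Poisson log-likelihood
optimize(loglik, interval = c(0.01, 10), maximum = TRUE)$maximum  # ~1.6
mean(x)                                                           # 1.6, the closed-form MLE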
Statistical learning review question: What is the relationship between ML and least-squares error (LSE) estimation? When observations are independent and normally distributed with constant variance, ML and LSE produce the same parameter values [1].
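A small simulated illustration of this equivalence (the data and variable names here are made up for demonstration): fitting a line by least squares via lm, and by maximizing the Gaussian likelihood via optim, recovers essentially the same coefficients.

set.seed(1)
x <- 1:20
y <- 3 + 2 * x + rnorm(20, sd = 2)          # simulated data with Gaussian noise

lse <- coef(lm(y ~ x))                      # least-squares fit

negloglik <- function(p)                    # p = (intercept, slope, log sd)
  -sum(dnorm(y, mean = p[1] + p[2] * x, sd = exp(p[3]), log = TRUE))
ml <- optim(c(0, 0, 0), negloglik)$par[1:2] # maximum-likelihood fit

rbind(lse, ml)                              # the two estimates agree up to optimizer tolerance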
2. The Maximum A Posteriori (MAP) Approach
Here the parameter to be estimated is considered to be a random variable with a known prior distribution.
Example: A technical institute in Greece accepts students from two pools each year: (a) most of the accepted students are applying for the first time; (b) a smaller number are applying for a second time. All students are required to take an entrance examination (the maximum grade is 100). A second-try student who scored 70 wants to estimate the probability that he will be accepted.
The following information is available from previous years: (a) the grade distribution of accepted second-try students is Gaussian with mean 80 and standard deviation 10; (b) the grade distribution of not-accepted second-try students is Gaussian with mean 40 and standard deviation 20; (c) the probability of a second-try student being accepted is 1/5.
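By Bayes' rule, the posterior probability of acceptance given the observed grade of 70 is:

$$P(\text{accepted} \mid 70) = \frac{P(70 \mid \text{accepted})\,P(\text{accepted})}{P(70 \mid \text{accepted})\,P(\text{accepted}) + P(70 \mid \text{not accepted})\,P(\text{not accepted})}$$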
Doing the computations in R (note that dnorm's standard-deviation argument is named sd, and the second distribution has standard deviation 20, not 10):

p_70_given_accepted <- dnorm(70, mean = 80, sd = 10)   # gives 0.024
p_70_given_rejected <- dnorm(70, mean = 40, sd = 20)   # gives 0.0064
So, P(accepted 2nd try | 70) = (0.024 × 1/5) / (0.024 × 1/5 + 0.0064 × 4/5) = 0.0048 / (0.0048 + 0.00512) = 0.0048 / 0.00992 ≈ 0.483.
Similarly,
So, P(NOT accepted 2nd try | 70) = (0.0064 × 4/5) / (0.024 × 1/5 + 0.0064 × 4/5) = 0.00512 / (0.0048 + 0.00512) = 0.00512 / 0.00992 ≈ 0.517, as expected, since the two posteriors must sum to 1.
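For completeness, the whole posterior computation in R (variable names are my own):

prior_accept <- 1/5
lik_accept <- dnorm(70, mean = 80, sd = 10)              # P(70 | accepted)
lik_reject <- dnorm(70, mean = 40, sd = 20)              # P(70 | not accepted)

evidence <- lik_accept * prior_accept + lik_reject * (1 - prior_accept)
lik_accept * prior_accept / evidence                     # ~0.483
lik_reject * (1 - prior_accept) / evidence               # ~0.517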
Unfortunately, the probability of not being accepted is greater than that of being accepted, so the student is not too happy about this.
A bibliographical note: In reference [2], ML and MAP were compared for the estimation of physiological parameters, and it was concluded that MAP gave a more precise estimate than ML. In general, the authors conclude that when one has few or noise-impaired measurements, the a priori information can be very useful when incorporated into a MAP-type estimator.
[1] Myung, I. J., "Tutorial on Maximum Likelihood Estimation," Journal of Mathematical Psychology, vol. 47, no. 1, pp. 90-100, 2003.
[2] Sparacino, G. et al., "Maximum-Likelihood Versus Maximum a Posteriori Parameter Estimation of Physiological System Models: The C-peptide Impulse Response Case Study," IEEE Transactions on Biomedical Engineering, vol. 47, no. 6, pp. 801-811, June 2000.