Minimum Error Entropy Classification by Joaquim P. Marques de Sá, Luís M.A. Silva, Jorge M.F.

This book explains the minimum error entropy (MEE) concept applied to data classification machines. Theoretical results on the inner workings of the MEE concept, in its application to solving a variety of classification problems, are presented in the wider realm of risk functionals.

Researchers and practitioners will also find in the book a detailed presentation of practical data classifiers using MEE. These include multi-layer perceptrons, recurrent neural networks, complex-valued neural networks, modular neural networks, and decision trees. A clustering algorithm using an MEE-like concept is also presented. Examples, tests, evaluation experiments, and comparisons with similar machines using classic approaches complement the descriptions.

... $X_n$) from some continuous distribution with density $f(x)$, its estimate can be obtained in an efficient way by the Parzen window method (see Appendix E), which produces the estimate

$$\hat{f}(x) \equiv \hat{f}_n(x) = \frac{1}{n}\sum_{j=1}^{n}\frac{1}{h}K\!\left(\frac{x - x_j}{h}\right) = \frac{1}{n}\sum_{j=1}^{n}K_h(x - x_j).$$

This estimate is also known as the kernel density estimate (KDE). Properties and the optimal choice of the bandwidth $h$ for a kernel function $K$ are discussed in Appendix E, where the justification for using the Gaussian kernel $G_h(x) = \exp(-x^2/2h^2)/(\sqrt{2\pi}\,h)$ is also provided.
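The Parzen window estimate with a Gaussian kernel can be sketched in a few lines of NumPy. This is a minimal illustration, not the book's implementation: the function names `gaussian_kernel` and `parzen_kde`, the synthetic standard-normal sample, and the hand-picked bandwidth `h=0.4` are all assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(x, h):
    """Gaussian kernel G_h(x) = exp(-x^2 / (2 h^2)) / (sqrt(2 pi) h)."""
    return np.exp(-x**2 / (2.0 * h**2)) / (np.sqrt(2.0 * np.pi) * h)

def parzen_kde(x, samples, h):
    """Parzen window estimate f_hat(x) = (1/n) * sum_j G_h(x - x_j)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    samples = np.asarray(samples, dtype=float)
    # Pairwise differences: rows index evaluation points, columns index samples.
    diffs = x[:, None] - samples[None, :]
    # Averaging over the sample axis implements the (1/n) * sum over j.
    return gaussian_kernel(diffs, h).mean(axis=1)

# Illustrative data only: 500 draws from a standard normal distribution.
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=500)

grid = np.linspace(-6.0, 6.0, 1201)
density = parzen_kde(grid, samples, h=0.4)  # bandwidth chosen by hand (assumption)

# Riemann-sum check that the estimate integrates to roughly one on the grid.
area = float(np.sum(density) * (grid[1] - grid[0]))
print(f"approximate integral of the estimate: {area:.4f}")
```

Because each kernel term integrates to one, the average of the n terms does too, so the printed value should be close to 1. The bandwidth `h` controls the smoothness of the estimate: smaller values follow the sample more closely, larger values smooth it out.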

Fig. 9 Contours of equal-$L_{MSE}$ for $\omega_0$ instances. Darker tones correspond to smaller values.

[...] to the min $P_e$ issue? The answer to this question is not easy even when we restrict the classes of $(X, T)$ distributions and the classifier families under consideration. On one hand, none of the previously discussed risk functionals provides, in general, the min $P_e$ solution (although they can achieve that in particular cases); on the other hand, there is no theoretical evidence precluding the existence of a risk functional that would always provide the min $P_e$ solution.

In general practice, however, this radiant scenario is far from being met for the following main reasons:

1. The classifier must be able to provide a good approximation of the conditional expectations $E[T_k \mid x]$. [...] more hidden neurons in the case of MLPs) than is adequate for a good generalization of its performance.
2. The training algorithm must be able to reach the minimum of $\hat{R}_{MSE}$. This is a thorny issue, since one will never know whether the training process converged to a global minimum or to a local minimum instead.
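The second point, that a training run may stop in a local rather than a global minimum, can be illustrated with a toy example. The sketch below runs plain gradient descent on a hypothetical one-dimensional non-convex function that merely stands in for an empirical risk surface; it is not the MSE risk of any actual classifier, and the function, learning rate, and starting points are assumptions chosen for illustration.

```python
def loss(w):
    # Hypothetical non-convex stand-in for an empirical risk surface.
    return w**4 - 3.0 * w**2 + w

def grad(w):
    # Derivative of the toy loss above.
    return 4.0 * w**3 - 6.0 * w + 1.0

def gradient_descent(w0, lr=0.01, steps=2000):
    """Plain gradient descent from the initial value w0."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Two runs that differ only in their initial value.
w_left = gradient_descent(-2.0)
w_right = gradient_descent(2.0)

print(f"start -2.0 -> w = {w_left:.4f}, loss = {loss(w_left):.4f}")
print(f"start +2.0 -> w = {w_right:.4f}, loss = {loss(w_right):.4f}")
```

Both runs end at a point where the gradient vanishes, yet the run started at +2.0 settles in the shallower minimum with the higher loss. Nothing observed within that single run reveals that a better minimum exists elsewhere.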

