In this discussion, I tried to explain techniques for "Multi Class classification with Binary Classifiers" through the simplest possible way and finally added some observations (based on my own experience).
The popular methods to achieve multi-class classifiers with binary classifiers are: (1) "One-versus-all" [1],[3], method using winner-takes-all strategy; (2) one-versus-one method implemented by max-wins voting [4]; and (3) error-correcting codes [2].
The popular methods to achieve multi-class classifiers with binary classifiers are: (1) "One-versus-all" [1],[3], method using winner-takes-all strategy; (2) one-versus-one method implemented by max-wins voting [4]; and (3) error-correcting codes [2].
(1) "One-versus-all": Generally such methods uses hierarchical division of the output space. The method uses K−1 binary classifiers (represented as node of binary tree, see Figure-1) for a K-class problem. At each node of the tree, a simple classifier, usually a binary classifier, makes the discrimination between the different child classes Following a path from the root node to a leaf node leads to a classification of a new pattern. Several methods have been proposed for this hierarchical classification.
Figure-1: A tree for 5-class problem
(2) one-versus-one method: Hastie and Tibshirani [4] proposed a good general strategy called pairwise coupling for combining posterior probabilities provided by individual binary classifiers in order to do multiclass classification. Since SVMs do not naturally give out posterior probabilities, they suggested a particular way of generating these probabilities from the binary SVM outputs and then used these probabilities together with pairwise coupling to do multiclass classification.
My observations: I tried to focus on some important issues that can be explored to improve the performance of both techniques to achieve multi-class classifiers (based on my own experience):
- Point-1: The major problem with "Hierarchical methods" are: once, we assign the classes at initial stage, we cannot change it at later stages. So, miss-classification(s) at starting stage will continue till the end of the classification. Such problems may occur with data having weak or messy class boundaries.
- Point-2: We may face same problems with probabilistic pairwise merging of binary classes, if system does not contain the facility of revision of previously merged classes.
(3) error-correcting codes [2]: Each class is assigned a unique binary string of length 'n'; we will refer to these strings as codewords." Then 'n' binary functions are learned, one for each bit position in these binary strings. During training for an example from class 'i', the desired outputs of these 'n' binary functions are speci ed by the codeword for class 'i'. With artificial neural networks, these 'n' functions can be implemented by the 'n' output units of a single network. New values of 'x' are classi ed by evaluating each of the 'n' binary functions to generate an n-bit string 's'. This string is then compared to each of the 'k' codewords, and 'x' is assigned to the class whose codeword is closest, according to some distance measure, to the generated string 's'.
My observations: Other than issues related to robustness given in Subsection "3.3 Robustness" [2], I got some issues which can be explored (i.e., may result in improvement in score)
- Point-1: If it becomes possible to connect the features used in classification with "code-words" related to error correcting codes then I think the construction of "code-words" will be more realistic. I hope It will improve the performance.
References
- Ryan Rifkin and Aldebaro Klautau. Parallel networks that learn to pronounce english text. Journal of Machine Learning Research, pages 101–141, 2004.
- Dietterich, T., Bakiri, G.: Solving multiclass problem via error-correcting output code. Journal of Artificial Intelligence Research, Vol. 2 (1995) 263–286.
- Shailesh Kumar, Joydeep Ghosh, and Melba M. Crawford. Hierarchical fusion of multiple classifiers for hyperspectral data analysis. Pattern Analysis and Applications, 5:210–220, 2002.
- Hastie, T., Tibshirani, R.: Classification by pairwise coupling. In: Jordan, M.I., Kearns, M.J., Solla, A.S. (eds.): Advances in Neural Information Processing Systems 10. MIT Press (1998).
Hi Niraj. Your code for SVM is really good. How can I change the kernel to RBF and enter its value. Is it through Command Line Arguments?
ReplyDeleteDo have any idea of optimisation of SVM parameters?
ReplyDelete