A new embedded feature selection method for high dimensional datasets

Date(s) - 04/22/2013
4:00 pm - 5:15 pm

Dr. Panos M. Pardalos, Distinguished Professor, Director of CAO, Industrial and Systems Engineering, UF

High-dimensional datasets are now prevalent in many biomedical applications, and classification and feature selection are common tasks performed on them. In this talk, a new embedded feature selection method for high-dimensional datasets is introduced by incorporating sparsity into Proximal Support Vector Machines (PSVMs). Our method, called Sparse Proximal Support Vector Machines (sPSVMs), learns a sparse representation of PSVMs by first casting the PSVM problem as an equivalent least squares problem and then introducing the l1-norm to induce sparsity. An efficient algorithm based on alternating optimization techniques is proposed. Numerical experiments on several publicly available datasets show that the proposed method obtains competitive or better performance compared with other embedded feature selection methods. Moreover, sPSVMs remove more than 98% of the features in many high-dimensional datasets without compromising generalization performance, and they also show consistency in the feature selection process. Additionally, sPSVMs can be viewed as inducing class-specific local sparsity, rather than the global sparsity induced by other embedded methods, and thus offer the advantage of interpreting the selected features in the context of the classes.
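
For readers unfamiliar with how an l1-norm penalty on a least squares problem performs feature selection, the following is a minimal sketch of the general idea only. It is not the sPSVM algorithm from the talk: it fits a generic l1-penalized least squares model to ±1 class labels using iterative soft-thresholding (ISTA), a technique swapped in here for illustration. The function names, the synthetic data, and the penalty value `lam` are all assumptions made for this example.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t; entries within [-t, t] become zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_least_squares(X, y, lam, n_iter=500):
    """Approximately minimize 0.5*||X w - y||^2 + lam*||w||_1 with ISTA.

    Illustrative stand-in only, not the alternating scheme described in the talk.
    """
    w = np.zeros(X.shape[1])
    # Step size 1/L, where L is the largest eigenvalue of X^T X.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)          # gradient of the least squares term
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Synthetic data (assumed for illustration): 200 features, only the first two matter.
rng = np.random.default_rng(0)
n_samples, n_features = 60, 200
X = rng.standard_normal((n_samples, n_features))
y = np.sign(X[:, 0] - X[:, 1])            # class labels in {-1, +1}

lam = 10.0                                 # hypothetical penalty strength
w = l1_least_squares(X, y, lam)
selected = np.flatnonzero(w)               # features with nonzero weight are "selected"
train_accuracy = np.mean(np.sign(X @ w) == y)
print(f"{selected.size} of {n_features} features kept, "
      f"training accuracy {train_accuracy:.2f}")
```

The soft-thresholding step sets small weights exactly to zero, which is why the l1 penalty discards features as part of fitting the classifier rather than in a separate filtering stage. Larger values of `lam` remove more features, at some point at the cost of accuracy.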