Support vector machine for solving small dataset problem

Bibliographic Details
Main Author: Abdul Rahman, Ahmad Rijal
Format: Thesis
Language: English
Published: 2012
Subjects:
Online Access: http://eprints.utm.my/id/eprint/32547/1/AhmadRijalAbdulRahmanMFKE2012.pdf
Description
Summary: Data quantity is the main concern in the small data set problem, because insufficient data usually does not lead to robust classification performance. How to extract more effective information from a small data set is thus of considerable interest. This project proposes the Support Vector Machine (SVM), a computational technique that constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class (the so-called functional margin). In general, the larger the margin, the lower the generalization error of the classifier. In this research, SVM is employed to solve small dataset problems in binary classification. Many performance measures can be used to evaluate a classifier; this research uses accuracy. To improve accuracy, the SMOTE (Synthetic Minority Oversampling Technique) algorithm is used to balance the data: it creates synthetic samples in the minority class for imbalanced datasets, or in both the negative and positive classes for balanced datasets. The SVM and SMOTE algorithms were implemented in MATLAB.
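
The oversampling step described in the summary can be illustrated with a minimal sketch. This is not the thesis's MATLAB implementation. It is a simplified Python illustration of the general SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority sample and one of its nearest minority-class neighbours. The function name `smote`, its parameters, the toy data, and the choice to pick base samples at random are all assumptions made for this example.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE sketch (hypothetical helper, not from the thesis).

    Creates n_synthetic points by interpolating between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (assumed): 4 minority samples to be balanced against 10 majority ones.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority_count = 10
new_points = smote(minority, majority_count - len(minority), k=3)
balanced_minority = minority + new_points
```

After balancing, the minority class has as many samples as the majority class, and the combined data could then be passed to an SVM trainer. Because every synthetic point lies between two existing minority samples, the sketch adds no information from outside the region the minority class already occupies.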