BIDAC

Development of machine learning and deep learning algorithms for imbalanced and one-class classification problems

PI - Amir Ahmad, College of Information Technology, UAE University, Al Ain, UAE

Co-PI - Dr. Shehroz S. Khan, Toronto Rehabilitation Institute, University of Toronto, Canada.

Duration - 4 Years

Supervised classification is the process of mapping input data to predefined classes through mathematical modelling. Classification has many useful applications, including disease detection, spam detection, speech recognition, and information retrieval. Most supervised classification algorithms infer a function from labelled training data consisting of a set of training examples. These training examples are used either in raw form (e.g. images) or as meaningful features extracted from them. Supervised methods work better when the data objects are evenly distributed across the classes (e.g. disease vs healthy). If the distribution of data objects is skewed, the classification decisions may be heavily influenced by the majority class and the results may be unreliable. Another limitation of supervised classification methods is that they require data objects belonging to at least two classes. In some situations, datasets contain only one class.

The project will address the above two challenges. One way to handle data imbalance is to create new artificial data points for the minority class so that the classes are balanced. After this step, the project will propose new methods for one-class classification problems that can be trained on only positive samples and identify negative examples as anomalies. The outcomes of this project will lead to the development of better and more accurate classification algorithms and will help in solving important problems in health, network security, industry, and other domains.
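As a rough illustration of the two ideas described above, the following Python sketch shows one common way to create artificial minority-class points (interpolating between a minority sample and one of its nearest minority neighbours, in the style of SMOTE) and a very simple one-class detector that is fitted on positive samples only. This is not the project's proposed method. The names `oversample_minority` and `CentroidOneClass`, the parameters `k` and `quantile`, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def oversample_minority(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Hypothetical helper for illustration; needs at least two samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances between minority samples.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


class CentroidOneClass:
    """Toy one-class detector (an assumption for this sketch).

    Fitted on positive samples only. A point is labelled +1 if it lies
    within the radius that covers `quantile` of the training distances
    to the centroid, and -1 (anomaly) otherwise.
    """

    def __init__(self, quantile=0.95):
        self.quantile = quantile

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.center_ = X.mean(axis=0)
        dist = np.linalg.norm(X - self.center_, axis=1)
        self.radius_ = np.quantile(dist, self.quantile)
        return self

    def predict(self, X):
        dist = np.linalg.norm(np.asarray(X, dtype=float) - self.center_, axis=1)
        return np.where(dist <= self.radius_, 1, -1)


# Synthetic data: 200 samples in one class, 20 in the other.
rng = np.random.default_rng(42)
X_majority = rng.normal(0.0, 1.0, size=(200, 2))
X_minority = rng.normal(3.0, 0.5, size=(20, 2))

# Balance the classes by adding 180 synthetic minority points.
X_synth = oversample_minority(X_minority, n_new=180)
X_minority_balanced = np.vstack([X_minority, X_synth])

# One-class setting: fit on a single class, flag distant points as anomalies.
detector = CentroidOneClass().fit(X_majority)
far_points = np.array([[10.0, 10.0], [-9.0, 8.0]])
print(X_minority_balanced.shape, detector.predict(far_points))
```

Interpolation keeps every synthetic point between two real minority samples, which is why it is often preferred over simply duplicating samples. The centroid-based detector is deliberately minimal; practical one-class methods typically learn a more flexible boundary around the positive class.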

The Big Data Analytics Center (BIDAC)
