Evolutionary Vertical Size Reduction: A novel Approach for Big Data Computing


Authors

  • M. Najarian Department of Industrial Engineering, University of Houston, Houston, USA
  • Z. Sarmast Department of Mathematics, University of Houston, Houston, USA
  • S.M. Ghasemi Department of Mathematics, University of Houston, Houston, USA
  • S. Sarmadi Department of Mathematics, University of Houston, Houston, USA

Keywords:

Big Data, Data-Mining, Gene p53 Mutant, High-Performance Computing, Instance Selection, Size Reduction

Abstract

The complexity of model-based machine learning techniques depends mainly on the size of the feature vector, so various feature selection and feature extraction methods are widely employed for size reduction. Although these horizontal size reduction approaches can exponentially reduce the complexity of learning algorithms, using all instances to build and train a model can still be problematic. For example, problematic data can lead to over-fitting and over-generalization, and noisy data can lead to poor decisions. Furthermore, the performance and complexity of instance-based (memory-based) classifiers such as k-NN depend heavily on the training samples. In this paper, an evolutionary vertical size reduction approach is introduced to identify and filter problematic and noisy data, both to enhance the performance and robustness of machine learning techniques and to reduce the system resources they require. To this end, genetic algorithms are proposed to label problematic instances in datasets. To evaluate the proposed approach, benchmark classification datasets are used to quantify the impact of filtering noisy and problematic data in classification applications. Although the proposed vertical size reduction approach enhances machine learning techniques (mainly supervised learning methods), real-world applications involving big data would still require substantial system resources. To address this drawback, cloud computing frameworks (such as MapReduce on Hadoop) are recommended to make the proposed vertical size reduction more applicable to big data processing.
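
As an illustration of the general idea only, the following minimal sketch uses a genetic algorithm to select training instances for a 1-NN classifier. The abstract does not specify the encoding, fitness function, or genetic operators, so everything here is an assumption made for the example: the binary keep/drop mask, the fitness (validation accuracy minus a small size penalty), the one-point crossover, the bit-flip mutation, the toy data generator, and all function and parameter names.

```python
import random

def make_toy_data(n_train=40, n_valid=20, flip=0.1, seed=1):
    """Hypothetical toy data: two 2-D Gaussian classes; some training labels are flipped."""
    rng = random.Random(seed)

    def sample(n, noisy):
        data = []
        for i in range(n):
            label = i % 2
            center = 3.0 * label
            x = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
            if noisy and rng.random() < flip:
                label = 1 - label  # inject label noise into the training set
            data.append((x, label))
        return data

    return sample(n_train, True), sample(n_valid, False)

def knn1_predict(kept, query):
    """Return the label of the nearest kept instance (squared Euclidean distance)."""
    nearest = min(kept, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], query)))
    return nearest[1]

def fitness(mask, train, valid, penalty=0.01):
    """Assumed fitness: 1-NN validation accuracy minus a penalty on the kept fraction."""
    kept = [p for p, keep in zip(train, mask) if keep]
    if not kept:
        return 0.0
    acc = sum(knn1_predict(kept, x) == y for x, y in valid) / len(valid)
    return acc - penalty * len(kept) / len(train)

def evolve(train, valid, pop_size=20, generations=20, p_mut=0.02, seed=0):
    """Evolve a keep/drop mask over training instances with elitist selection."""
    rng = random.Random(seed)
    n = len(train)
    # Start from the "keep everything" baseline plus random masks.
    pop = [[True] * n] + [[rng.random() < 0.5 for _ in range(n)]
                          for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(m, train, valid), reverse=True)
        elite = ranked[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda m: fitness(m, train, valid))

train, valid = make_toy_data()
mask = evolve(train, valid)
print("kept", sum(mask), "of", len(train), "training instances")
```

Because the initial population contains the all-instances mask and the best half of each generation survives unchanged, the returned mask never scores below the unfiltered baseline under this assumed fitness. Each fitness evaluation is independent of the others, which is the property that would make this kind of search a candidate for a distributed framework such as MapReduce.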

Published

15-09-2018

How to Cite

M. Najarian, Z. Sarmast, S.M. Ghasemi, & S. Sarmadi. (2018). Evolutionary Vertical Size Reduction: A novel Approach for Big Data Computing. International Journal of Mathematics And Its Applications, 6(3), 215–225. Retrieved from http://ijmaa.in/index.php/ijmaa/article/view/373

Section

Research Article