Session Program

 

  • 10 July 2017
  • 04:30PM - 06:30PM
  • Room: Auditorium
  • Chairs: Francisco Herrera and Alberto Fernández

Fuzzy Systems for Big Data

Abstract - The significance and benefits of addressing classification tasks in Big Data applications are beyond any doubt. To do so, learning algorithms must be scalable to cope with such a high volume of data. The most suitable option to reach this objective is a MapReduce programming scheme, in which algorithms are automatically executed in a distributed and fault-tolerant way. Among the different available tools that support this framework, Spark has emerged as a "de facto" solution when using iterative approaches. In this work, our goal is to design and implement an Evolutionary Fuzzy Rule Selection algorithm within a Spark environment. To do so, we build different local rule bases within each Map task that are later optimized by means of a genetic process. With this procedure, we seek to minimize the total number of rules that are gathered by each Reduce task to obtain a compact and accurate Fuzzy Rule Based Classification System. In particular, we set the experimental framework in the scenario of imbalanced classification. Therefore, the final objective is to analyze the best synergy between the novel Evolutionary Fuzzy Rule Selection algorithm and the solutions applied to cope with skewed class distributions, namely cost-sensitive learning, random under-sampling and random over-sampling.
Abstract - The main drawback of Fuzzy Rule-Based Classification Systems (FRBCSs) when they are applied to Big Data problems is their lack of scalability. Previously proposed approaches consist in concurrently fitting several Chi et al. FRBCSs whose rule bases are then aggregated to obtain the final model. This methodology is seriously affected by the degree of parallelism used for the execution of the algorithm, showing a significant decrease in classification performance as the degree of parallelism increases. This work focuses on the design of a new FRBCS for Big Data classification problems (CHI-BD) that generates exactly the same rule base regardless of the degree of parallelism. Our approach recovers the model that would be built by the original Chi et al. algorithm if it were able to deal with Big Data problems.
Abstract - The reduction of energy consumption in buildings is one of the goals to improve energy efficiency. One way to achieve energy savings in buildings is to develop intelligent heating control strategies that are able to reduce the power consumption by predicting the behavior of the thermal dynamics under different control schemes. This can be accomplished by learning fuzzy rules from the data collected by different sensors installed in buildings, generating regression models that are accurate and interpretable, so that the generated models can be understood by the experts who approve the energy-saving schemes. However, one important issue is the generation of accurate knowledge bases of fuzzy rules for regression that can scale with the large amount of information generated by the many sensors installed in buildings, which will continue to grow in the coming years. For this purpose, in this paper we evaluate the scalability of two genetic fuzzy systems, FRULER and S-FRULER, in the domain of thermal dynamics in buildings, using real data from a residential college at the USC.
Abstract - Incremental approaches may be used to speed up the learning process when a classification algorithm is dealing with big databases. In this work we present a study on how the size and composition of the set of learning examples that are given to an incremental algorithm affect its behaviour.
Abstract - The Internet and new technologies are generating new scenarios with a significant increase in data volumes. The treatment of this huge quantity of information is impossible with traditional methodologies, and we need to design new approaches oriented towards distributed paradigms such as MapReduce. This situation is widely known in the literature as Big Data. This contribution presents a first approach to handle fuzzy emerging patterns in big data environments. This new algorithm is called EvAEFP-Spark and is developed in Apache Spark, based on MapReduce. The use of this paradigm allows us to analyze huge datasets efficiently. The main idea of EvAEFP-Spark is to modify the methodology used to evaluate the populations in the evolutionary process. In this way, a population is evaluated in the different maps obtained in the Map phase of the paradigm, and for each one a confusion matrix is obtained. Then, the Reduce function accumulates the confusion matrix of each map into a general matrix in order to evaluate the fitness of the individuals. An experimental study with high-dimensional datasets is performed in order to show the advantages of this algorithm in emerging pattern mining.
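The Map and Reduce evaluation described in this abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: it runs in plain Python rather than Spark, it treats pattern coverage as crisp (true or false) instead of fuzzy, and the names confusion_matrix, merge and evaluate are hypothetical. It only shows that per-partition confusion matrices can be summed into one global matrix.

```python
from functools import reduce

def confusion_matrix(partition, pattern):
    """Map step: count (TP, FP, TN, FN) for one pattern on one data partition.

    Each example is a pair (features, is_target_class); `pattern` is a
    predicate telling whether the example is covered (crisp, by assumption).
    """
    tp = fp = tn = fn = 0
    for features, is_target_class in partition:
        covered = pattern(features)
        if covered and is_target_class:
            tp += 1
        elif covered:
            fp += 1
        elif is_target_class:
            fn += 1
        else:
            tn += 1
    return (tp, fp, tn, fn)

def merge(a, b):
    """Reduce step: element-wise sum of two confusion matrices."""
    return tuple(x + y for x, y in zip(a, b))

def evaluate(partitions, pattern):
    """Evaluate one individual: map over partitions, then reduce to a global matrix."""
    local = [confusion_matrix(p, pattern) for p in partitions]
    return reduce(merge, local, (0, 0, 0, 0))

# Toy data split into two partitions (hypothetical values).
partitions = [
    [(0.2, True), (0.8, False)],
    [(0.1, True), (0.3, False), (0.9, True)],
]
pattern = lambda x: x < 0.5
global_matrix = evaluate(partitions, pattern)
```

Because the counts are additive, the global matrix is the same as the one computed on the unsplit data, so any fitness measure derived from it does not depend on how the data was partitioned.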
Abstract - The k-Nearest Neighbors (kNN) classifier is one of the most effective methods in supervised learning problems. It classifies unseen cases by comparing their similarity with the training data. Nevertheless, it gives each labeled sample the same importance when classifying. There are several approaches to enhance its precision, with the Fuzzy k-Nearest Neighbors (Fuzzy-kNN) classifier being among the most successful ones. Fuzzy-kNN computes a fuzzy degree of membership of each instance to the classes of the problem. As a result, it generates smoother borders between classes. Although there is an existing kNN approach to handle big datasets, there is no fuzzy variant able to manage that volume of data. Moreover, calculating this class membership adds an extra computational cost, making the method even less scalable for large datasets because of its memory needs and high runtime. In this work, we present an exact and distributed approach to run the Fuzzy-kNN classifier on big datasets based on Spark, which provides the same precision as the original algorithm. It consists of two separate stages. The first stage transforms the training set by adding the class membership degrees. The second stage classifies the test set with the kNN algorithm, using the class memberships computed previously. In our experiments, we study the scaling-up capabilities of the proposed approach with datasets of up to 11 million instances, showing promising results.
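The two stages described in this abstract can be sketched on a single machine as follows. This is only an illustration, not the distributed Spark approach presented in the paper: it assumes the classic Fuzzy-kNN membership rule of Keller et al. (0.51 plus 0.49 times the neighbor share for the instance's own class, 0.49 times the neighbor share otherwise) and an inverse-distance vote with fuzzifier m = 2. The function names and the toy data are hypothetical.

```python
import math

def k_nearest(point, data, k, skip=None):
    """Indices of the k points in `data` closest to `point` (Euclidean distance)."""
    candidates = (i for i in range(len(data)) if i != skip)
    return sorted(candidates, key=lambda i: math.dist(point, data[i]))[:k]

def training_memberships(X, y, classes, k):
    """Stage 1: replace each crisp label with a vector of class membership degrees."""
    memberships = []
    for i, x in enumerate(X):
        neighbors = k_nearest(x, X, k, skip=i)  # exclude the instance itself
        row = {}
        for c in classes:
            share = sum(1 for j in neighbors if y[j] == c) / k
            # Assumed rule (Keller et al.): own class gets a 0.51 head start.
            row[c] = 0.49 * share + (0.51 if c == y[i] else 0.0)
        memberships.append(row)
    return memberships

def classify(x, X, memberships, classes, k, m=2.0):
    """Stage 2: distance-weighted vote over the neighbors' membership vectors."""
    neighbors = k_nearest(x, X, k)
    # Small epsilon avoids division by zero when x coincides with a neighbor.
    weights = [1.0 / (math.dist(x, X[j]) ** (2.0 / (m - 1.0)) + 1e-12)
               for j in neighbors]
    total = sum(weights)
    scores = {
        c: sum(w * memberships[j][c] for w, j in zip(weights, neighbors)) / total
        for c in classes
    }
    return max(scores, key=scores.get), scores

# Toy training set with two well-separated classes (hypothetical values).
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]
classes = ["a", "b"]
memberships = training_memberships(X, y, classes, k=2)
label, scores = classify((0.5, 0.5), X, memberships, classes, k=2)
```

Stage 1 depends only on the training set, so its result can be computed once and reused for every test instance; stage 2 is then an ordinary kNN search whose votes are membership vectors instead of crisp labels.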
Abstract - This work deals with the design of scalable methodologies to build the Rule Bases of Linguistic Fuzzy Rule Based Systems from examples for Fuzzy Regression in Big Data environments. We propose a distributed MapReduce model based on an adaptation of a classic data-driven method, followed by an Evolutionary Adaptive Defuzzification to increase the accuracy of the final fuzzy model.