Nonsmooth Optimization Models and Algorithms for Data Clustering and Visualization

Author : Ehsan Mohebi
Total Pages : 326
OCLC : 967595662

Synopsis Nonsmooth Optimization Models and Algorithms for Data Clustering and Visualization by : Ehsan Mohebi

"Cluster analysis deals with the problem of organization of a collection of patterns into clusters based on a similarity measure. Various distance functions can be used to define this measure. Clustering problems with the similarity measure defined by the squared Euclidean distance have been studied extensively over the last five decades. However, problems with other Minkowski norms have attracted significantly less attention. The use of different similarity measures may help to identify different cluster structures of a data set. This in turn may help to significantly improve the decision making process. High dimensional data visualization is another important task in the field of data mining and pattern recognition. To date, the principal component analysis and the self-organizing maps techniques have been used to solve such problems. In this thesis we develop algorithms for solving clustering problems in large data sets using various similarity measures. Such similarity measures are based on the squared L2 as well as L1 and L∞ norms. In all cases the clustering problem is a global optimization problem with nonsmooth nonconvex objective functions. In many datasets these problems are large scale and the conventional global optimization algorithms are not efficient for solving such problems. Therefore we propose to apply local search methods for solving clustering problems; however, the success of these methods strongly depends on the choice of starting cluster centers. To deal with the nonconvexity of the clustering problems we propose incremental algorithms for their solution which helps us to design a special procedure to generate starting points for cluster centers. Such an approach allows one to find global or near global solutions to the clustering problem. In order to solve nonsmooth clustering problems we apply both efficient nonsmooth optimization algorithms as well as smoothing techniques.
To test the proposed algorithms we apply them to solve clustering problems in small, medium-sized and large data sets. Furthermore, these algorithms are compared with many other clustering algorithms using results of numerical experiments. The Self-Organizing Map (SOM) is a topology visualizing tool that contains a set of neurons that gradually adapt to input data space by competitive learning and form clusters. The topology preservation of the SOM strongly depends on the learning process. Due to this limitation one cannot guarantee the convergence of the SOM in data sets with clusters of arbitrary shape. Therefore it is important to develop more accurate data visualization and clustering algorithms. In this thesis, Constrained SOM (CSOM) is proposed as the new version of the SOM by modifying the learning algorithm. The idea is to introduce an adaptive constraint parameter to the learning process to improve the topology preservation and mapping quality of the basic SOM. The computational complexity of the CSOM is less than that of the SOM. Mapping quality of the SOM is sensitive to the map topology and initialization of neurons. Thus in this research, a modified version of the SOM (MSOM) is proposed to improve the convergence of the SOM. An initialization algorithm based on split and merge of clusters is introduced to initialize neurons of the SOM. The initialization algorithm speeds up the learning process in large high dimensional data sets. A topology based on this initialization is developed to minimize the vector quantization error and topology preservation of the self organizing maps. The CSOM and MSOM algorithms are tested on small to large size real-world datasets. Finally, a convolutional structure of the Recursive Modified SOM is proposed to cope with the diversity of styles and shapes in digit recognition. The proposed recursive structure can learn various behaviors of incoming images.
The numerical results on the well-known MNIST dataset demonstrate the superiority of the proposed algorithm over existing SOM-based approaches." -- Abstract.
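
As a rough illustration of why the clustering problem described in this abstract is nonsmooth, the sketch below evaluates one common form of the clustering objective: the average distance from each point to its nearest center, under the squared L2, L1, and L∞ measures. The function name `cluster_objective`, the norm labels, and the toy data are assumptions made for this example and are not taken from the thesis. The minimum over centers (and the absolute values and maximum in the L1 and L∞ cases) is what makes the function nonsmooth.

```python
import numpy as np

def cluster_objective(points, centers, norm="l2sq"):
    """Average distance from each point to its nearest center.

    Illustrative only: 'norm' labels are hypothetical names for the
    squared L2, L1, and L-infinity similarity measures.
    """
    # diff[i, j, :] is the vector from center j to point i
    diff = points[:, None, :] - centers[None, :, :]
    if norm == "l2sq":
        dist = np.sum(diff ** 2, axis=2)
    elif norm == "l1":
        dist = np.sum(np.abs(diff), axis=2)
    elif norm == "linf":
        dist = np.max(np.abs(diff), axis=2)
    else:
        raise ValueError(f"unknown norm: {norm}")
    # The min over centers is the source of nonsmoothness and nonconvexity
    return float(np.mean(np.min(dist, axis=1)))

# Toy data (made up for this example): two loose groups in the plane
points = np.array([[0.0, 0.0], [1.0, 2.0], [10.0, 10.0], [11.0, 13.0]])
centers = np.array([[0.0, 0.0], [10.0, 10.0]])

for name in ("l2sq", "l1", "linf"):
    print(name, cluster_objective(points, centers, name))
```

A local search method would minimize this value over `centers`, which is why the choice of starting centers matters.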

Partitional Clustering via Nonsmooth Optimization

Author : Adil M. Bagirov
Publisher : Springer Nature
Total Pages : 343
ISBN-10 : 3030378268
ISBN-13 : 9783030378264

Synopsis Partitional Clustering via Nonsmooth Optimization by : Adil M. Bagirov

This book describes optimization models of clustering problems and clustering algorithms based on optimization techniques, including their implementation, evaluation, and applications. The book gives a comprehensive and detailed description of optimization approaches for solving clustering problems; the authors' emphasis on clustering algorithms is based on deterministic methods of optimization. The book also includes results on real-time clustering algorithms based on optimization techniques, addresses implementation issues of these clustering algorithms, and discusses new challenges arising from big data. The book is ideal for anyone teaching or learning clustering algorithms. It provides an accessible introduction to the field and it is well suited for practitioners already familiar with the basics of optimization.

Graph-Based Clustering and Data Visualization Algorithms

Author : Ágnes Vathy-Fogarassy
Publisher : Springer Science & Business Media
Total Pages : 120
ISBN-10 : 1447151585
ISBN-13 : 9781447151586

Synopsis Graph-Based Clustering and Data Visualization Algorithms by : Ágnes Vathy-Fogarassy

This work presents a data visualization technique that combines graph-based topology representation and dimensionality reduction methods to visualize the intrinsic data structure in a low-dimensional vector space. The application of graphs in clustering and visualization has several advantages. A graph of important edges (where edges characterize relations and weights represent similarities or distances) provides a compact representation of the entire complex data set. This text describes clustering and visualization methods that are able to utilize information hidden in these graphs, based on the synergistic combination of clustering, graph-theory, neural networks, data visualization, dimensionality reduction, fuzzy methods, and topology learning. The work contains numerous examples to aid in the understanding and implementation of the proposed algorithms, supported by a MATLAB toolbox available at an associated website.

Partitional Clustering Algorithms

Author : M. Emre Celebi
Publisher : Springer
Total Pages : 420
ISBN-10 : 3319092596
ISBN-13 : 9783319092591

Synopsis Partitional Clustering Algorithms by : M. Emre Celebi

This book focuses on partitional clustering algorithms, which are commonly used in engineering and computer science applications. The goal of this volume is to summarize the state of the art in partitional clustering. The book includes such topics as center-based clustering, competitive learning clustering and density-based clustering. Each chapter is contributed by a leading expert in the field.

Cluster Analysis for Data Mining and System Identification

Author : János Abonyi
Publisher : Springer Science & Business Media
Total Pages : 317
ISBN-10 : 3764379871
ISBN-13 : 9783764379872

Synopsis Cluster Analysis for Data Mining and System Identification by : János Abonyi

The aim of this book is to illustrate that advanced fuzzy clustering algorithms can be used not only for partitioning data but also for visualization, regression, classification and time-series analysis. Fuzzy cluster analysis is therefore a good approach to solving complex data mining and system identification problems. This book is oriented to undergraduate and postgraduate students and is well suited for teaching purposes.

Data Clustering

Author : Guojun Gan
Publisher : SIAM
Total Pages : 471
ISBN-10 : 0898716233
ISBN-13 : 9780898716238

Synopsis Data Clustering by : Guojun Gan

Reference and compendium of algorithms for pattern recognition, data mining and statistical computing.

Optimization Models and Algorithms for Sample-preserved Classification and Clustering

Author : Ya-Ju Fan
Total Pages : 207
OCLC : 667834065

Synopsis Optimization Models and Algorithms for Sample-preserved Classification and Clustering by : Ya-Ju Fan

This dissertation presents the development of new optimization models and algorithms for sample-preserved classification and clustering. A sample-preserved method keeps some or all of the existing samples when training a rule for classification or clustering, and continues to use them in the testing or predicting phase. Developing a sample-preserved method provides the capability of analyzing time series data due to the largely applied similarity measures on time series. A proposed sample-preserved classification technique, called Support Feature Machine (SFM), finds an optimal combination of features that gives the best classification based on the nearest neighbor rule. It keeps all baseline samples of the selected features in the predicting phase. Variations of SFM models are also presented. In addition, the bilinear program sample-preserved k-median (BPSPKM) clustering algorithm is introduced. While the original k-median problem can be solved by a simple and efficient bilinear program algorithm, it does not have the sample-preserved property, and only works with the 1-norm distance. The sample-preserved k-median (SPKM) clustering method is formulated as an integer programming problem, which is very hard to solve. A bilinear program algorithm is herein proposed in order to obtain locally optimal solutions of the SPKM clustering method, as well as a new sequential search algorithm that can solve the SPKM clustering more efficiently. Finally, a novel feature space sample-preserved k-median (FSSPKM) clustering algorithm is proposed, as well as feature selection methods tailor-made for this clustering technique. The experimental results show that the original k-median clustering fails to classify time series data due to the lack of the sample-preserved property, and the utilization of time series similarity measures. The sample-preserved medians can avoid having invalid values in some application domains and can be used to represent the samples in the clusters.
The BPSPKM clustering algorithm with the Euclidean distance is suggested for clustering attribute (non-time series), univariate time series and multivariate time series data sets. Furthermore, the proposed feature selection methods consider the distances between cluster centers and cluster densities. The results show that the proposed algorithms outperform other feature selection techniques used in the original k-median methods.
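
The remark above about sample-preserved medians avoiding invalid values can be pictured with a small sketch. It assumes, purely for illustration, that a sample-preserved median behaves like a medoid: the existing sample with the smallest total 1-norm distance to the other samples in its cluster. This is not the dissertation's BPSPKM or SPKM formulation, and the function names and the one-hot toy data are hypothetical.

```python
import numpy as np

def coordinatewise_median(cluster):
    """Classic 1-norm center: the median of each coordinate separately."""
    return np.median(cluster, axis=0)

def sample_preserved_median(cluster):
    """Assumed medoid-style center: an actual sample from the cluster.

    Picks the row with the smallest summed 1-norm distance to all rows.
    """
    pairwise = np.sum(np.abs(cluster[:, None, :] - cluster[None, :, :]), axis=2)
    return cluster[np.argmin(pairwise.sum(axis=1))]

# Toy cluster (made up): each row is a one-hot code, so a valid
# representative must contain exactly one 1.
cluster = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

print(coordinatewise_median(cluster))    # all zeros: not a valid one-hot code
print(sample_preserved_median(cluster))  # one of the original rows
```

In this toy case the coordinate-wise median is a point that no sample could ever take, while the medoid-style center is always one of the original samples.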

Unsupervised Learning Approaches for Dimensionality Reduction and Data Visualization

Author : B.K. Tripathy
Publisher : CRC Press
Total Pages : 174
ISBN-10 : 1000438317
ISBN-13 : 9781000438314

Synopsis Unsupervised Learning Approaches for Dimensionality Reduction and Data Visualization by : B.K. Tripathy

Unsupervised Learning Approaches for Dimensionality Reduction and Data Visualization describes such algorithms as Locally Linear Embedding (LLE), Laplacian Eigenmaps, Isomap, Semidefinite Embedding, and t-SNE to resolve the problem of dimensionality reduction in the case of non-linear relationships within the data. Underlying mathematical concepts, derivations, and proofs with logical explanations for these algorithms are discussed, including strengths and limitations. The book highlights important use cases of these algorithms and provides examples along with visualizations. A comparative study of the algorithms is presented to give a clear idea of how to select the most suitable algorithm for a given dataset for efficient dimensionality reduction and data visualization.

Features: demonstrates how unsupervised learning approaches can be used for dimensionality reduction; neatly explains algorithms with a focus on the fundamentals and underlying mathematical concepts; describes the comparative study of the algorithms and discusses when and where each algorithm is most suitable for use; provides use cases, illustrative examples, and visualizations of each algorithm; helps visualize and create compact representations of high dimensional and intricate data for various real-world applications and data analysis.

This book is aimed at professionals, graduate students, and researchers in Computer Science and Engineering, Data Science, Machine Learning, Computer Vision, Data Mining, Deep Learning, Sensor Data Filtering, Feature Extraction for Control Systems, and Medical Instruments Input Extraction.

Metaheuristics for Data Clustering and Image Segmentation

Author : Meera Ramadas
Publisher : Springer
Total Pages : 167
ISBN-10 : 3030040976
ISBN-13 : 9783030040970

Synopsis Metaheuristics for Data Clustering and Image Segmentation by : Meera Ramadas

In this book, differential evolution and its modified variants are applied to the clustering of data and images. Metaheuristics have emerged as potential algorithms for dealing with complex optimization problems, which are otherwise difficult to solve using traditional methods. In this regard, differential evolution is considered to be a highly promising technique for optimization and is being used to solve various real-time problems. The book studies the algorithms in detail, tests them on a range of test images, and carefully analyzes their performance. Accordingly, it offers a valuable reference guide for all researchers, students and practitioners working in the fields of artificial intelligence, optimization and data analytics.

Clustering Methods for Big Data Analytics

Author : Olfa Nasraoui
Publisher : Springer
Total Pages : 192
ISBN-10 : 3319978640
ISBN-13 : 9783319978642

Synopsis Clustering Methods for Big Data Analytics by : Olfa Nasraoui

This book highlights the state of the art and recent advances in Big Data clustering methods and their innovative applications in contemporary AI-driven systems. The book chapters discuss Deep Learning for Clustering, Blockchain data clustering, Cybersecurity applications such as insider threat detection, scalable distributed clustering methods for massive volumes of data; clustering Big Data Streams such as streams generated by the confluence of Internet of Things, digital and mobile health, human-robot interaction, and social networks; Spark-based Big Data clustering using Particle Swarm Optimization; and Tensor-based clustering for Web graphs, sensor streams, and social networks. The chapters in the book include a balanced coverage of big data clustering theory, methods, tools, frameworks, applications, representation, visualization, and clustering validation.