2021 | Book

Machine Learning and Big Data Analytics Paradigms: Analysis, Applications and Challenges

Editors: Prof. Aboul Ella Hassanien, Prof. Ashraf Darwish

Publisher: Springer International Publishing

Book Series: Studies in Big Data

Part of: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"

About this book

This book presents the state of the art in research on machine learning and big data analytics. The accepted chapters cover many themes, including artificial intelligence and data mining applications, machine learning and applications, deep learning technology for big data analytics, and modeling, simulation, and security with big data. It is a valuable resource for researchers in the area of big data analytics and its applications.

Frontmatter

Artificial Intelligence and Data Mining Applications

Frontmatter

Rough Sets and Rule Induction from Indiscernibility Relations Based on Possible World Semantics in Incomplete Information Systems with Continuous Domains

Abstract

Rough sets and rule induction in an incomplete and continuous information table are investigated under possible world semantics. We present an approach that uses possible indiscernibility relations, whereas traditional approaches use possible tables. This is because the number of possible indiscernibility relations is finite, although an incomplete and continuous information table has infinitely many possible tables. First, lower and upper approximations are derived directly using the indiscernibility relation on a set of attributes in a complete and continuous information table. Second, we describe how these approximations are derived by applying possible world semantics to an incomplete and continuous information table. Many possible indiscernibility relations are obtained, and the actual indiscernibility relation is one of them. The family of possible indiscernibility relations is a lattice under inclusion, with minimum and maximum indiscernibility relations. Under the minimum and the maximum indiscernibility relations, we obtain four kinds of approximations: certain lower, certain upper, possible lower, and possible upper approximations. Therefore, there is no computational complexity problem with respect to the number of values with incomplete information. The approximations in possible world semantics are the same as those in our extended approach that directly uses indiscernibility relations. We obtain four kinds of single rules: certain and consistent, certain and inconsistent, possible and consistent, and possible and inconsistent ones, from certain lower, certain upper, possible lower, and possible upper approximations, respectively. Individual objects in an approximation support single rules. Serial single rules from the approximation are brought into one combined rule. The combined rule has greater applicability than the single rules that individual objects support.

Michinori Nakata, Hiroshi Sakai, Keitarou Hara
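
As a rough illustration of the first step described in the abstract above (lower and upper approximations derived directly from an indiscernibility relation in a complete, continuous table), the following Python sketch may help. It assumes, hypothetically, that two objects are indiscernible when their values differ by at most a threshold `delta` on every chosen attribute. The threshold rule, the function names, and the toy table are illustrative assumptions and are not taken from the chapter.

```python
def indiscernible_class(table, obj, attrs, delta):
    """Objects within `delta` of `obj` on every attribute in `attrs`.

    Assumption (hypothetical): a threshold-based relation for continuous values.
    """
    return {
        other
        for other in table
        if all(abs(table[obj][a] - table[other][a]) <= delta for a in attrs)
    }


def approximations(table, target, attrs, delta):
    """Return (lower, upper) approximations of the object set `target`."""
    lower, upper = set(), set()
    for obj in table:
        cls = indiscernible_class(table, obj, attrs, delta)
        if cls <= target:   # the whole class lies inside the target set
            lower.add(obj)
        if cls & target:    # the class overlaps the target set
            upper.add(obj)
    return lower, upper


# Toy complete table with one continuous attribute "a" (illustrative data).
table = {
    "o1": {"a": 1.0},
    "o2": {"a": 1.2},
    "o3": {"a": 2.0},
    "o4": {"a": 3.5},
}
lower, upper = approximations(table, {"o1", "o3"}, ["a"], delta=0.5)
print(sorted(lower), sorted(upper))
```

In this toy table, o1 and o2 are indiscernible from each other, so o1 cannot be placed in the lower approximation of {o1, o3}, while both o1 and o2 fall into the upper approximation.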

Big Data Analytics and Preprocessing

Abstract

Big data is a trending term in industry and academia that represents the huge flood of collected data, which is very complex in nature. Big data is a term used to describe many concepts related to data, in both technological and cultural senses. In the big data community, big data analytics is used to discover the hidden patterns and values that give an accurate representation of the data. Big data preprocessing is considered an important step in the analysis process. It is a key to the success of the analysis process in terms of analysis time, percentage of utilized resources, storage, the efficiency of the analyzed data, and the information gained as output. Preprocessing data involves dealing with concepts such as concept drift and data streams, which are considered significant challenges.

Noha Shehab, Mahmoud Badawy, Hesham Arafat

Artificial Intelligence-Based Plant Diseases Classification

Abstract

Machine learning techniques are used for classifying plant diseases. Recently, deep learning (DL) has been applied to classification in image processing. In this chapter, a convolutional neural network (CNN) is used to classify images of plant diseases. However, CNNs suffer from the hyperparameter problem, which can affect the proposed model. Therefore, a Gaussian optimization method is used to overcome the hyperparameter problem in the CNN. This chapter proposes an artificial intelligence model for plant disease classification based on a CNN. The proposed model consists of three phases: (a) a preprocessing phase, which augments the data and balances the dataset; (b) a classification and evaluation phase, based on the pre-trained CNN VGG16, which evaluates the results; and (c) optimization of the CNN hyperparameters using the Gaussian method. The proposed model is tested on a plant image dataset. The dataset consists of nine plants with thirty-three cases of diseased and healthy plant leaves. Before optimization, the pre-trained CNN VGG16 achieves 95.87% classification accuracy. The results improve to 98.67% classification accuracy after applying the Gaussian process to optimize the hyperparameters.

Lobna M. Abou El-Maged, Ashraf Darwish, Aboul Ella Hassanien

Artificial Intelligence in Potato Leaf Disease Classification: A Deep Learning Approach

Abstract

Potato leaf blight is one of the most devastating global plant diseases because it affects the productivity and quality of potato crops and adversely affects both individual farmers and the agricultural industry. Advances in the early classification and detection of crop blight using artificial intelligence technologies have increased the opportunity to enhance and expand plant protection. This paper presents an architecture for potato leaf blight classification based on a deep convolutional neural network. The training dataset of potato leaves contains three categories: healthy leaves, early blight leaves, and late blight leaves. The proposed architecture consists of 14 layers, including two main convolutional layers for feature extraction with different convolution window sizes, followed by two fully connected layers for classification. In this paper, augmentation processes were applied to increase the number of dataset images from 1,722 to 9,822 images, which led to a significant improvement in the overall testing accuracy. The proposed architecture achieved an overall mean testing accuracy of 98%. More than 6 performance metrics were applied in this research to ensure the accuracy and validity of the presented results. The testing accuracy of the proposed approach was compared with that of related works, and the proposed architecture achieved improved accuracy.

Nour Eldeen M. Khalifa, Mohamed Hamed N. Taha, Lobna M. Abou El-Maged, Aboul Ella Hassanien

Granules-Based Rough Set Theory for Circuit Breaker Fault Diagnosis

Abstract

This chapter presents a new granules strategy integrated with rough set theory (RST) to extract diagnosis rules from a redundant and inconsistent data set of high voltage circuit breakers (HVCB). In this approach, the diagnostic knowledge base is formed by the granules of indiscernible objects based on a tolerance relation, in which the objects are collected according to a permissible scheme. This permissible scheme is decided by the opinion of the expert or the decision maker. In addition, a topological view is introduced to induce the lower and upper approximations. Finally, the validity and effectiveness of the proposed granules strategy are investigated through a practical application to high voltage circuit breaker fault diagnosis.

Rizk M. Rizk-Allah, Aboul Ella Hassanien

SQL Injection Attacks Detection and Prevention Based on Neuro-Fuzzy Technique

Abstract

A Structured Query Language (SQL) injection attack (SQLIA) is one of the best-known code injection techniques that threaten web applications, as it can compromise the confidentiality, integrity and availability of the database system of an online application. Whereas other known attacks follow specific patterns, SQLIAs are often unpredictable and demonstrate no specific pattern, which has been greatly problematic for both researchers and developers. Therefore, the detection and prevention of SQLIAs has been a hot topic. This paper proposes a system intended to provide better results for SQLIA prevention than previous methodologies, taking into consideration the accuracy of the system and its learning capability and flexibility in dealing with uncertainty. The proposed system for SQLIA detection and prevention is built on an Adaptive Neuro-Fuzzy Inference System (ANFIS). In addition, the developed system has been enhanced through the use of Fuzzy C-Means (FCM) to deal with the uncertainty associated with SQL features. Moreover, the Scaled Conjugate Gradient (SCG) algorithm has been used to increase the speed of the proposed system drastically. The proposed system has been evaluated using a well-known dataset, and the results show a significant enhancement in the detection and prevention of SQLIAs.

Doaa E. Nofal, Abeer A. Amer

Convolutional Neural Network with Batch Normalization for Classification of Endoscopic Gastrointestinal Diseases

Abstract

In this paper, an approach for classifying gastrointestinal (GI) diseases from endoscopic images is proposed. The proposed approach is built using a convolutional neural network (CNN) with batch normalization (BN) and an exponential linear unit (ELU) as the activation function. The proposed approach consists of eight layers (six convolutional and two fully connected layers) and is used to identify eight types of GI diseases in version two of the Kvasir dataset. The proposed approach was compared with other CNN architectures (VGG16, VGG19, and Inception-v3) on five elements (number of convolutional layers, total number of parameters of the convolutional layers, number of epochs, validation accuracy and test accuracy). The proposed approach achieved good results relative to these architectures. It achieved a validation accuracy of 88%, which is superior to the other architectures, and a test accuracy of 87%, which outperforms the Inception-v3 architecture. Therefore, the proposed approach requires fewer training images and has lower computational complexity in the training phase.

Dalia Ezzat, Heba M. Afify, Mohamed Hamed N. Taha, Aboul Ella Hassanien

A Chaotic Search-Enhanced Genetic Algorithm for Bilevel Programming Problems

Abstract

In this chapter, we propose a chaotic search-enhanced genetic algorithm for solving the bilevel programming problem (BLPP). The proposed algorithm combines an enhanced genetic algorithm, based on a new selection technique named the effective selection technique (EST), with a chaos searching technique. First, the upper level problem is solved using the enhanced genetic algorithm based on EST. EST enables the upper level decision maker to choose an appropriate solution in anticipation of the lower level's decision. Then, the lower level problem is solved using a genetic algorithm for the upper level solution. Second, a local search based on chaos theory is applied to the upper level problem around the enhanced genetic algorithm solution. Finally, the lower level problem is solved again using a genetic algorithm for the chaos search solution. Combining the enhanced genetic algorithm supported by EST with chaos theory increases the search efficiency and helps the algorithm converge faster. The performance of the algorithm has been evaluated on different sets of test problems: linear and nonlinear problems, constrained and unconstrained problems, and low-dimensional and high-dimensional problems. In addition, the results of the proposed algorithm are compared with those of other state-of-the-art algorithms to show its effectiveness and efficiency.

Y. Abo-Elnaga, S. Nasr, I. El-Desoky, Z. Hendawy, A. Mousa

Bio-inspired Machine Learning Mechanism for Detecting Malicious URL Through Passive DNS in Big Data Platform

Abstract

Malicious links are used as a source by distribution channels to broadcast malware all over the Web. These links become instrumental in giving partial or full system control to the attackers. To overcome these issues, researchers have applied machine learning techniques for malicious URL detection. However, these techniques fail to identify distinguishable generic features that are able to define the maliciousness of a given domain. Generally, well-crafted URL features contribute considerably to the success of machine learning approaches, and on the contrary, poor features may ruin even good detection algorithms. In addition, the complex relationships between features are not easy to spot. The work presented in this paper explores how to detect malicious Web sites from passive DNS based features. This problem lends itself naturally to modern algorithms for selecting discriminative features in the continuously evolving distribution of malicious URLs. Accordingly, the suggested model adapts a bio-inspired feature selection technique to choose an optimal feature set in order to reduce the cost and running time of a given system while achieving an acceptably high recognition rate. Moreover, a two-step artificial bee colony (ABC) algorithm is used for efficient data clustering. The two approaches are incorporated within a unified framework that operates on top of the Hadoop infrastructure to deal with large samples of URLs. Both the experimental and statistical analyses show that the hybrid model has an advantage over some conventional algorithms for detecting malicious URL attacks. The results demonstrate that the suggested model is capable of scaling to 10 million query-answer pairs with more than 96.6% accuracy.

Saad M. Darwish, Ali E. Anber, Saleh Mesbah

Machine Learning and Applications

Frontmatter

TargetAnalytica: A Text Analytics Framework for Ranking Therapeutic Molecules in the Bibliome

Abstract

Biomedical scientists often search databases of therapeutic molecules to answer a set of molecule-related questions. When it comes to drugs, finding the most specific target is a crucial biological criterion. Whether the target is a gene, protein, or cell line, target specificity is what makes a therapeutic molecule significant. In this chapter, we present TargetAnalytica, a novel text analytics framework that is concerned with mining the biomedical literature. Starting with a set of publications of interest, the framework produces a set of biological entities related to gene, protein, RNA, cell type, and cell line. The framework is tested against a depression-related dataset for the purpose of demonstration. The analysis shows an interesting ranking that is significantly different from a counterpart based on drugs.com’s popularity factor (e.g., according to our analysis Cymbalta appears only at position #10 though it is number one in popularity according to the database). The framework is a crucial tool that identifies the targets to investigate, provides relevant specificity insights, and helps decision makers and scientists to answer critical questions that could not otherwise be answered.

Ahmed Abdeen Hamed, Agata Leszczynska, Megean Schoenberg, Gergely Temesi, Karin Verspoor

Earthquakes and Thermal Anomalies in a Remote Sensing Perspective

Abstract

Earthquakes are sudden tremors of the ground that leave behind damage to life and property, ranging from small to massive scale. Since very early times, earthquake prediction has been in the limelight of the scientific community. The texts of ancient civilizations contain earthquake predictions based on the position of the planets with respect to the earth. With the advent of real-time observations from various data sources, many attempts are under way in this direction. The present chapter investigates and puts forward some facts based on satellite data for an earthquake that occurred in Imphal, India, in 2016. It studies the thermal anomaly data recorded before the earthquake. The MODIS Land Surface Temperature (LST) product was used, with daily night-time images covering 6 years. Good quality pixels having maximum information were identified by performing Quality Assurance of the datasets. A change detection technique for satellite data analysis, namely the Robust Satellite Technique, has been used, and the RETIRA index has been calculated. The RETIRA index has been studied for 3 years, and it has been found that the index is considerably high for the earthquake year. However, it cannot be concluded that a high value of the RETIRA index is a sure indicator of an earthquake, which leaves scope for future studies.

Utpal Kanti Mukhopadhyay, Richa N. K. Sharma, Shamama Anwar, Atma Deep Dutta
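
The abstract above names the RETIRA index without giving its form. As a hedged sketch only: in the Robust Satellite Technique literature the index is commonly described as a standardized anomaly, (ΔT(r,t) − μ(r)) / σ(r), where ΔT is a pixel's temperature minus the scene average at time t, and μ and σ are that pixel's temporal mean and standard deviation of ΔT over comparable past images. The Python below follows that common description; the chapter's exact computation may differ, and the function name, the data layout, and the use of the population standard deviation are assumptions.

```python
from math import sqrt
from statistics import mean, pstdev


def retira_index(current_scene, history, pixel):
    """Standardized thermal anomaly of `pixel` in `current_scene`.

    current_scene: dict mapping pixel id -> land surface temperature
    history: list of such dicts from comparable past nights (assumed layout)
    """
    def delta(scene):
        # Pixel value minus the spatial average of the same scene.
        return scene[pixel] - mean(scene.values())

    past = [delta(scene) for scene in history]
    mu, sigma = mean(past), pstdev(past)  # population std is an assumption
    if sigma == 0:
        raise ValueError("reference data show no temporal variability")
    return (delta(current_scene) - mu) / sigma


# Illustrative values only, not data from the chapter.
history = [
    {"p": 280.0, "q": 280.0},
    {"p": 281.0, "q": 279.0},
    {"p": 279.0, "q": 281.0},
]
value = retira_index({"p": 284.0, "q": 280.0}, history, "p")
print(round(value, 3))
```

Subtracting the scene average first is what makes the index less sensitive to scene-wide warming or cooling, so that only locally unusual pixels stand out.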

Literature Review with Study and Analysis of the Quality Challenges of Recommendation Techniques and Their Application in Movie Ratings

Abstract

During the past few decades, web-based services and organizations such as Amazon, Netflix and YouTube have grown aggressively. These web services have shown the demand for recommender systems and their growing place in our lives. Looking deeper, we noticed that the quality and accuracy of these recommendation systems are critical for matching users with similar interests. For that reason, and to remain competitive with the most outstanding recommendation web services, these recommendation systems should always be monitored and evaluated from a quality perspective. However, due to the steep growth rate of the available web-based services, new challenges such as data sparsity, the scalability problem and the cold start issue have emerged and threaten the performance and quality of the predicted recommendations. Accordingly, many data scientists and researchers have been motivated to find solutions to these challenges, especially in scaled environments and distributed systems. These solutions could be achieved using multiple approaches such as machine learning and data mining.

Hagar El Fiky, Wedad Hussein, Rania El Gohary

Predicting Student Retention Among a Homogeneous Population Using Data Mining

Abstract

Student retention is one of the biggest challenges facing academic institutions worldwide. In this research, we present a novel data mining approach to predict retention among a homogeneous group of students with similar social and cultural backgrounds at an academic institution based in the Middle East. Several researchers have studied retention by focusing on student persistence from one term to another. Our study, on the other hand, builds a predictive model to study retention until graduation. Moreover, our research relies solely on pre-college and college performance data available in the institutional database. We use both standard and ensemble algorithms to predict dropouts at an early stage and apply the SMOTE balancing technique to reduce the performance bias of machine learning algorithms. Our study reveals that Gradient Boosted Trees is a robust algorithm that predicts dropouts with an accuracy of 79.31% and an AUC of 88.4% using only pre-enrollment data. The effectiveness of the algorithms further increases with the use of college performance data.

Ghazala Bilquise, Sherief Abdallah, Thaeer Kobbaey
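
The SMOTE balancing technique mentioned in the abstract above creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below shows only that core idea in plain Python; it is not the implementation used in the chapter or in any particular library, and the function name, parameters, and toy data are hypothetical.

```python
import math
import random


def smote_samples(minority, n_new, k=2, seed=0):
    """Generate `n_new` synthetic points from the minority-class points.

    Each new point lies on the segment between a randomly chosen minority
    point and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority points
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + u * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features (illustrative values only).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_samples(minority, n_new=5)
print(new_points)
```

Because every synthetic point is a convex combination of two real minority points, the new samples stay inside the region already occupied by the minority class rather than duplicating existing rows.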

An Approach for Textual Based Clustering Using Word Embedding

Abstract

Numerous endeavors have been made to improve the retrieval procedure in Textual Case-Based Reasoning (TCBR) using clustering and feature selection strategies. The SOPHisticated Information Analysis (SOPHIA) approach is one of the most successful efforts, characterized by its ability to work without domain knowledge or language dependency. SOPHIA is based on conditional probability, which facilitates an advanced Knowledge Discovery (KD) framework for case-based retrieval. SOPHIA attracts clusters by themes which contain only one word each. However, using one word is not sufficient to construct cluster attractors, because excluding the other words associated with that word in the same context cannot give a full picture of the theme. The main contribution of this chapter is to introduce an enhanced clustering approach called GloSOPHIA (GloVe SOPHIA) that extends SOPHIA by integrating a word embedding technique to enhance KD in TCBR. A new algorithm is proposed to feed SOPHIA with a vector space of similar terms gained from the Global Vector (GloVe) embedding technique. The proposed approach is evaluated on two different language corpora, and the results are compared with SOPHIA, K-means, and Self-Organizing Map (SOM) on several evaluation criteria. The results indicate that GloSOPHIA outperforms the other clustering methods on most of the evaluation criteria.

Ehab Terra, Ammar Mohammed, Hesham Hefny

A Survey on Speckle Noise Reduction for SAR Images

Abstract

Speckle noise disturbance is the most essential factor that affects the quality and the visual appearance of synthetic aperture radar (SAR) coherent images. For remote sensing systems, the initial step always involves a suitable method to reduce the effect of speckle noise. Several non-adaptive and adaptive filters have been proposed to enhance noisy SAR images. In this chapter, we introduce a comprehensive survey of speckle noise reduction in SAR images, followed by two proposed non-adaptive filters. These proposed filters use traditional mean, median, and root-mean-square values with large filter kernels to improve the SAR image appearance while maintaining image information. The performance of the proposed filters is compared with that of a number of non-adaptive filters to assess their ability to reduce speckle noise. For quantitative measurements, four metrics have been used to evaluate the performance of the proposed filters. In the experimental results, the proposed filters achieved promising results, significantly suppressing speckle noise and preserving image information compared with other well-known filters.

Ahmed S. Mashaly, Tarek A. Mahmoud
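
To make the notion of a non-adaptive filter in the abstract above concrete, here is a minimal Python sketch of a sliding-window filter that applies one fixed statistic (mean, median, or root-mean-square) to every neighbourhood. It is a generic illustration, not the two filters proposed in the chapter; the function names, the border handling (windows are clipped at the image edge), and the toy image are assumptions.

```python
import math
from statistics import mean, median


def rms(values):
    """Root-mean-square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def window_filter(image, size, statistic):
    """Apply `statistic` over a size x size window centred on each pixel.

    `image` is a list of equal-length rows; windows are clipped at borders.
    """
    half = size // 2
    rows, cols = len(image), len(image[0])
    filtered = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                image[i][j]
                for i in range(max(0, r - half), min(rows, r + half + 1))
                for j in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out_row.append(statistic(window))
        filtered.append(out_row)
    return filtered


# A flat patch with one bright speckle-like outlier (illustrative values).
image = [
    [10, 10, 10],
    [10, 200, 10],
    [10, 10, 10],
]
median_result = window_filter(image, 3, median)
mean_result = window_filter(image, 3, mean)
rms_result = window_filter(image, 3, rms)
print(median_result[1][1], mean_result[1][1])
```

The filter is non-adaptive in the sense that the same statistic and window size are used everywhere, regardless of local image content. On the toy patch the median removes the outlier entirely, while the mean only spreads it out.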

Comparative Analysis of Different Approaches to Human Activity Recognition Based on Accelerometer Signals

Abstract

Recently, automatic human activity recognition has drawn much attention. On the one hand, this is due to the rapid proliferation and falling cost of a wide variety of sensing hardware. On the other hand, there are growing and pressing demands from many application domains, such as in-home health monitoring (especially for the elderly), smart cities, safe driving by monitoring and predicting driver behavior, healthcare applications, entertainment, assessment of therapy, and performance evaluation in sports. In this paper we focus on activities of daily living (ADL), which are routine activities that people tend to do every day without needing assistance. We have used a public dataset of acceleration data collected with a wrist-worn accelerometer for 14 different ADL activities. Our objective is to perform an extensive comparative study of the predictive power of several paradigms to model and classify ADL activities. To the best of our knowledge, almost all techniques for activity recognition are based on methods from the machine learning literature (particularly supervised learning). Our comparative study widens the scope of techniques that can be used for automatic analysis of human activities and provides an evaluation of the relative effectiveness and efficiency of a potentially large pool of techniques. We apply two different paradigms for the analysis and classification of daily living activities: (1) techniques based on supervised machine learning and (2) techniques based on estimating the empirical distribution of the time series data and using metric-theoretic techniques to estimate the dissimilarity between two distributions. We used several evaluation metrics, including confusion matrices, overall accuracy, sensitivity and specificity for each activity, and relative computational performance. In each approach we used some of the well-known techniques in our experimentation and analysis. For example, in supervised learning we applied both support vector machines and random forests. One of the main conclusions from our analysis is that the simplest techniques, for example, using empirical discrete distributions as models, have almost the best performance in terms of both accuracy and computational efficiency.

Walid Gomaa
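
The second paradigm in the abstract above (modelling each activity by an empirical discrete distribution and comparing distributions with a metric) can be sketched in a few lines of Python. The sketch assumes fixed-width bins and uses total variation distance as the dissimilarity; the abstract does not say which binning or metric the paper uses, so both choices, along with all names and sample values, are illustrative.

```python
from collections import Counter


def empirical_distribution(samples, bin_width):
    """Histogram of `samples` over fixed-width bins, normalized to sum to 1."""
    counts = Counter(int(x // bin_width) for x in samples)
    total = len(samples)
    return {b: c / total for b, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    bins = set(p) | set(q)
    return 0.5 * sum(abs(p.get(b, 0.0) - q.get(b, 0.0)) for b in bins)


def classify(samples, class_models, bin_width):
    """Label of the class model closest to the distribution of `samples`."""
    query = empirical_distribution(samples, bin_width)
    return min(
        class_models,
        key=lambda label: total_variation(query, class_models[label]),
    )


# Made-up one-axis accelerometer readings for two hypothetical activities.
models = {
    "walk": empirical_distribution([0.1, 0.2, 1.1, 1.3], bin_width=1.0),
    "sit": empirical_distribution([5.0, 5.2, 5.4, 5.1], bin_width=1.0),
}
label = classify([0.3, 1.2, 0.4], models, bin_width=1.0)
print(label)
```

Building a model here is a single counting pass over the data and classification is a handful of dictionary lookups, which is consistent with the abstract's remark that such simple techniques are computationally efficient.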

Deep Learning Technology for Big Data Analytics

Frontmatter

Soil Morphology Based on Deep Learning, Polynomial Learning and Gabor Teager-Kaiser Energy Operators

Abstract

Soil morphology comprises the main observable characteristics of the different soil horizons. It helps farmers to determine what kind of soil they can use for their different plants. The observable characteristics include soil structure, color, and the distribution of roots and pores. The main concept of this chapter is to classify different soils based on their morphology. Furthermore, the chapter contains a comparison between polynomial neural networks and deep learning for soil classification. The chapter gives background on the different methods of feature extraction, including the Gabor wavelet transform, the Teager-Kaiser operator, deep learning, and polynomial neural networks. The chapter has two goals. The first goal is to improve the extraction of soil features using the Gabor wavelet transform followed by the Teager-Kaiser operator. The second goal is to classify the different morphological soil types using two methods: deep learning and polynomial neural networks. We achieved accuracies of 95–100% for the polynomial classification, and the deep learning classification achieved accuracy up to 95%, but deep learning is more accurate and very powerful. Finally, we compare our results with previous work and research. Results show an accuracy range of 98–100% for our work compared with 95.1–98.8% for the previous algorithms based on PNN. Furthermore, the DNN used in this chapter achieved good accuracy compared with previous works.

Kamel H. Rahouma, Rabab Hamed M. Aly

Deep Layer Convolutional Neural Network (CNN) Architecture for Breast Cancer Classification Using Histopathological Images

Abstract

In recent years, there have been various improvements in computational image processing methods to assist pathologists in detecting cancer cells. Consequently, the deep learning algorithm known as the Convolutional Neural Network (CNN) has become a popular method for image detection and analysis using histopathology images (images of tissues and cells). This study addresses the detection of breast cancer cells (mitosis and non-mitosis) in histopathology images. Mitosis is an important parameter for the prognosis/diagnosis of breast cancer. However, mitosis detection in histopathology images is a challenging problem that needs deeper investigation. This is because mitoses are small objects with a variety of shapes and are easily confused with other objects or artefacts present in the image. Hence, this study proposes four types of deep layer CNN architecture, called 6-layer CNN, 13-layer CNN, 17-layer CNN and 19-layer CNN, respectively, for detecting breast cancer cells in histopathology images. The aim of this study is to detect the breast cancer cells called mitoses in histopathology images using a suitable number of layers in a deep layer CNN, with the highest accuracy and True Positive Rate (TPR) and the lowest False Positive Rate (FPR) and loss. The results show promising performance for the deep layer CNN architecture, and the 19-layer CNN is suitable for the MITOS-ATYPIA and AMIDA13 datasets.

Zanariah Zainudin, Siti Mariyam Shamsuddin, Shafaatunnur Hasan

A Survey on Deep Learning for Time-Series Forecasting

Abstract

Deep learning, one of the most remarkable techniques of machine learning, has been a major success in many fields, including image processing, speech recognition, and text understanding. Deep learning models are powerful engines capable of learning arbitrary mapping functions; they do not require a scaled or stationary time series as input, and they support multivariate inputs and multi-step outputs. Together, these features make deep learning a useful tool for more complex time series prediction problems involving large amounts of data and multiple variables with complex relationships. This paper provides an overview of the most common deep learning types for time series forecasting and explains the relationships between deep learning models and classical approaches to time series forecasting. A brief background is provided on the particular challenges present in time-series data and on the deep learning techniques most often used for time series forecasting. Previous studies that applied deep learning to time series are reviewed.

Amal Mahmoud, Ammar Mohammed

Deep Learning for Taxonomic Classification of Biological Bacterial Sequences

Abstract

Biological sequence classification is a key task in Bioinformatics. For research labs today, the classification of unknown biological sequences is essential for facilitating the identification, grouping and study of organisms and their evolution. This work focuses on the task of taxonomic classification of bacterial species into their hierarchical taxonomic ranks. Barcode sequences of the 16S rRNA dataset, which are known for their relatively short sequence lengths and highly discriminative characteristics, are used for classification. Several combinations of sequence representations and CNN architectures are considered, each tested with the aim of finding the best approaches for efficient and effective taxonomic classification. Sequence representations include k-mer based representations, integer encoding, one-hot encoding and the use of embedding layers in the CNN. Experimental results and comparisons have shown that representations which hold some sequential information about a sequence perform much better than a raw representation. A maximum accuracy of 91.7% was achieved with a deeper CNN when the employed sequence representation was more representative of the sequence. However, with less representative representations, a wide and shallow network was able to efficiently extract information and provide a reasonable accuracy of 90.6%.

Marwah A. Helaly, Sherine Rady, Mostafa M. Aref
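
Three of the sequence representations listed in the abstract above (k-mer based, integer encoding, and one-hot encoding) are simple enough to sketch directly. The Python below is a generic illustration over a four-letter DNA alphabet; the alphabet ordering, the function names, and the example sequences are assumptions and do not reflect the chapter's preprocessing pipeline.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumed ordering; ambiguous bases such as "N" are not handled


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in `seq`."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def integer_encode(seq):
    """Map each base to its index in ALPHABET (raises ValueError otherwise)."""
    return [ALPHABET.index(base) for base in seq]


def one_hot_encode(seq):
    """Map each base to a 4-element indicator vector."""
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


counts = kmer_counts("ACGAC", 2)
ints = integer_encode("ACGT")
one_hot = one_hot_encode("AG")
print(dict(counts), ints, one_hot)
```

Integer and one-hot encodings keep the position of every base, whereas k-mer counts keep only local order within each window of length k, which is one way the representations differ in how much sequential information they retain.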

Particle Swarm Optimization and Grey Wolf Optimizer to Solve Continuous p-Median Location Problems

Abstract

The continuous p-median location problem is to locate p facilities in the Euclidean plane in such a way that the sum of distances between each demand point and its nearest median/facility is minimized. In this chapter, the continuous p-median problem is studied, and a proposed Grey Wolf Optimizer (GWO) algorithm, which has not previously been applied to this problem, is presented and compared to a proposed Particle Swarm Optimization (PSO) algorithm. As experimental evidence for the No Free Lunch (NFL) theorem, the experimental results showed that neither algorithm outperforms the other in all cases; however, the proposed PSO performs better in most cases. The experimental results also show that the two proposed algorithms perform better than other PSO methods in the literature.

Hassan Mohamed Rabie
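
The objective stated in the abstract above, the sum of Euclidean distances from each demand point to its nearest facility, can be written as a short Python function. This sketch covers only the cost that a metaheuristic such as PSO or GWO would try to minimize; the function name and the sample coordinates are illustrative, and no optimizer is included.

```python
import math


def p_median_cost(demand_points, facilities):
    """Sum over demand points of the distance to the nearest facility."""
    return sum(
        min(math.dist(d, f) for f in facilities)
        for d in demand_points
    )


# Three demand points and a candidate solution with p = 2 facilities.
demand = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0)]
candidate = [(0.0, 1.0), (4.0, 0.0)]
cost = p_median_cost(demand, candidate)
print(cost)
```

A candidate solution for such an optimizer would typically be a flat vector of 2p coordinates, and this function would serve as its fitness evaluation. In the example the first two demand points are each one unit from the facility at (0, 1) and the third coincides with the facility at (4, 0).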

Gene Ontology Analysis of Gene Expression Data Using Hybridized PSO Triclustering

Abstract

The hybridized PSO Triclustering Model combines Binary Particle Swarm Optimization with the Simulated Annealing algorithm to extract highly correlated triclusters from a given 3D gene expression dataset. The proposed hybrid triclustering algorithm, namely the HPSO-TriC model, generally produces higher quality results than standard meta-heuristic triclustering algorithms. Some of the issues in classical meta-heuristic triclustering models can be overcome in the HPSO-TriC model.

N. Narmadha, R. Rathipriya

Modeling, Simulation, Security with Big Data

Frontmatter

Experimental Studies of Variations Reduction in Chemometric Model Transfer for FT-NIR Miniaturized Sensors

Abstract

Recent technology trends to miniaturize spectrometers have opened the door to mass production of spectrometers and to new applications that were not possible before, in which the spectrometer can be used as a ubiquitous spectral sensor. However, with the miniaturization from large, reliable bench-top instruments to chip-size spectrometers, and with the associated mass production, new issues have to be addressed, such as unit-to-unit variations between spectrometers, variations due to changing the measurement setup, and variations due to changing the measurement medium. The unit-to-unit variations of the sensors usually result from changes in the mode of operation, aging, and production tolerances. The aim of this work is to study the issues emerging from the use of miniaturized Fourier Transform Near-Infrared (FT-NIR) spectral sensors and to evaluate their influence on the multivariate classification models used in many applications. In this work, we also introduce a technique to transfer a classification model from a reference calibration sensor to other target sensors, to help reduce the effect of the variations and to alleviate the degradation that occurs in the classification results. To validate the effectiveness of the model transfer technique, we developed a Gaussian Process Classification (GPC) model and a Soft Independent Modeling of Class Analogy (SIMCA) model, both using spectral data measured from ultra-high temperature (UHT) pasteurized milk with different levels of fat content. The models aim to classify milk samples according to the percentage of their fat content. Three different experiments were conducted on the models to mimic each type of variation and to test how far they affect the models' accuracy once the transfer technique is applied. Initially, we achieved perfect discrimination between milk classes with 100% classification accuracy. The largest degradation in accuracy appeared when changing the measurement medium, reaching 45.4% in one of the cases. However, the proposed calibration transfer technique showed a significant enhancement in most of the cases and restored the accuracy of all degraded cases to over 90%.

Mohamed Hossam, Amr Wassal, Mostafa Medhat, M. Watheq El-Kharashi

Smart Environments Concepts, Applications, and Challenges

Abstract

This chapter presents a clear definition of a smart environment, its advantages, previous motivations in various applications, and open research challenges. These challenges are classified into two parts: artificial intelligence and the internet of things. They can help researchers and students select a research topic. The chapter also dispels the blurred notion that limits the definition of a smart environment to the internet of things alone. It presents the importance of recently used smart environments such as smart homes, smart farming, smart cities, smart education, and smart factories. Recent statistics predict that the number of smart devices used for constructing smart environments will reach nine times the world population by 2025. The chapter also proposes criteria for building a good smart environment in any domain with respect to two dimensions: data and security.

Doaa Mohey El-Din, Aboul Ella Hassanien, Ehab E. Hassanien

Synonym Multi-keyword Search Over Encrypted Data Using Hierarchical Bloom Filters Index

Abstract

Search on encrypted data refers to the capability to identify and retrieve, from an encrypted collection, the set of objects that match a query without decrypting the data. Because users often lack knowledge of the data content, they may not search only for exact or fuzzy keywords. They might instead search for a word with the same meaning as a stored word but a different structure. This chapter therefore presents synonym multi-keyword search over encrypted data with a secure index represented by a hierarchical Bloom filter structure. The hierarchical index structure improves the search process, and it can be efficiently constructed and maintained. Extensive analysis based on controlled experiments and observations on selected data shows that the proposed scheme is efficient, accurate and secure.

Azza A. Ali, Shereen Saleh
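
As a rough illustration of the building block named in the abstract above, the following Python sketch shows a single flat Bloom filter whose index is expanded with synonyms at build time. It is not the chapter's scheme: the hierarchy, the encryption of keywords, and the security properties are all omitted, and the synonym table, class name, and parameters are hypothetical.

```python
import hashlib

# Hypothetical synonym table; a real system would use a thesaurus.
SYNONYMS = {"car": ["automobile", "vehicle"]}


class BloomFilter:
    """Plain (unkeyed) Bloom filter over strings; sizes are illustrative."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def build_index(keywords):
    """Insert every document keyword together with its assumed synonyms."""
    index = BloomFilter()
    for word in keywords:
        index.add(word)
        for synonym in SYNONYMS.get(word, []):
            index.add(synonym)
    return index


def matches(index, query_keywords):
    """Multi-keyword query: every query keyword must hit the filter."""
    return all(word in index for word in query_keywords)
```

A Bloom filter can report false positives but never false negatives, so a document indexed under "car" is always found by a query for "automobile" in this sketch.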

Assessing the Performance of E-government Services Through Multi-criteria Analysis: The Case of Egypt

Abstract

E-government projects have been mostly supply driven, with relatively little information about the performance of e-government services or about citizens' perception of them. Low demand for the available e-government services may indicate difficulties in using or accessing these services. Therefore, the performance of the available e-government services needs to be assessed. This chapter proposes a methodology to assess the performance of e-government services. The proposed methodology combines two of the most well-known and extensively used Multi-Criteria Decision Making methods: PROMETHEE and AHP. The methodology has been applied to assess the performance of the e-government services available on the Egyptian national portal. The research outcomes inform policy makers about how e-government services have performed from the citizens' perspective and help them take suitable corrective actions to improve the ranks of the underperforming services and fulfill the citizens' needs.

Abeer Mosaad Ghareeb, Nagy Ramadan Darwish, Hesham Hefny
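
To make the AHP half of the methodology above more concrete, here is a minimal Python sketch that derives criterion weights from a pairwise comparison matrix. It uses the row geometric mean approximation rather than the principal eigenvector, which is an assumption about method; the function name and the example matrix are illustrative, and the PROMETHEE ranking step is not shown.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights by normalizing row geometric means.

    pairwise[i][j] states how strongly criterion i is preferred over
    criterion j, with pairwise[j][i] == 1 / pairwise[i][j].
    """
    n = len(pairwise)
    geometric_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]
```

For a two-criterion matrix in which the first criterion is judged three times as important as the second, the weights come out as 0.75 and 0.25.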

IoTIwC: IoT Industrial Wireless Controller

Abstract

Industrial controller systems are essential to the modern power systems industry. Industrial controllers link the integrated technologies of computers, communication devices, and electric devices. The communication systems act as a physical intermediary layer for transferring, controlling, and acquiring data within the system from distant locations. This chapter discusses Supervisory Control And Data Acquisition (SCADA) systems and proposes a similar system: an IoT-based industrial wireless controller. The proposed system can control multiple devices through the network without the need to be physically near the devices. Because it uses simple and cheap devices, the system is low cost and easy to install. Additionally, the system is modular, because extra microcontrollers can easily be added to control more devices should the need arise.

Tassnim Awad, Walaa Mohamed, Mohammad M. Abdellatif

Applying Software Defined Network Concepts for Securing the Client Data Signals Over the Optical Transport Network of Egypt

Abstract

The physical layer of the Optical Transport Network (OTN) is the weakest layer in the network, as anyone can access the optical cables from any unauthorized location in the network and start an attack by exploiting any type of vulnerability. The paper discusses the security threats and the practical challenges in the Egyptian optical network and presents a new technique to protect the client's data on the physical layer. A new security layer is added to the OTN frames whenever an intrusion is detected in the optical layer. The proposed security layer is designed using a structure of XOR, a Linear Feedback Shift Register (LFSR), and a Random Number Generator (RNG) in a non-synchronous model. We propose the security model for different rates in the OTN and the wavelength division multiplexing (WDM) system. The proposed model protects only the important client signals over the optical layers by passing these signals through an extra layer, called the security layer, before the final frame of the OTN system is formed. This is done by adding a new card in the Network Element (NE) to perform this job and by using the software defined network (SDN) concept of a centralized controller for the whole network to find intrusions in the optical layers. The client signals are encrypted between the source and destination stations only, and they remain encrypted along the entire route between both sides. The centralized SDN controller manages the cryptographic model by distributing the encryption and decryption keys to the source and destination stations of the client signals. At the same time, it automatically detects intrusions in the OTN sections by continuously tracking variations in the optical signal-to-noise ratio (OSNR) in the OTN; these variations are proportionally related to the risk of optical hacking and may indicate that a new intrusion has started. The results show that using the centralized SDN controller in the proposed OTN encryption scheme provides high security against wiretapping attacks, while detecting intrusions in the optical layer across the whole network becomes easier than before. If an unauthorized attacker is able to access the fiber cables from any unmonitored location, the centralized SDN controller in the OTN detects the variations in the OSNR of the intruded section of the network and automatically enables the check phase. According to the results of the check phase, it activates the cryptographic techniques for the selected client signals passing through the intruded section. The attacker will then find only encrypted data signals and will need many years to find the right key to perform the decryption.

Kamel H. Rahouma, Ayman A. Elsayed
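
The XOR and LFSR components mentioned in the abstract above can be illustrated with a toy Python sketch. It is not the chapter's design: the 16-bit register width, the tap positions (a commonly used maximal-length polynomial), the seed, and the function names are assumptions, and the RNG and the non-synchronous operation are not modeled. A bare LFSR keystream is also not cryptographically secure on its own.

```python
def lfsr_keystream(seed, nbytes):
    """Generate nbytes of keystream from a 16-bit Fibonacci LFSR.

    Taps correspond to x^16 + x^14 + x^13 + x^11 + 1 (an assumed choice).
    """
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("seed must be non-zero, or the LFSR stays at zero")
    out = bytearray()
    for _ in range(nbytes):
        byte = 0
        for _ in range(8):
            # The output bit is the current least significant bit.
            byte = (byte << 1) | (state & 1)
            feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
            state = (state >> 1) | (feedback << 15)
        out.append(byte)
    return bytes(out)


def xor_cipher(data, seed):
    """XOR the payload with the keystream; the same call encrypts and decrypts."""
    keystream = lfsr_keystream(seed, len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))
```

Because XOR is its own inverse, applying `xor_cipher` twice with the same seed returns the original payload, which is why both end stations need the same key material.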

Watermarking 3D Printing Data Based on Coyote Optimization Algorithm

Abstract

The main objective of this work is to develop a watermarking-based data protection approach for 3D printing that treats watermarking as an optimization problem. Watermarking 3D objects is challenging. One reason is the existence of many representations of 3D objects. Moreover, research on watermarking 3D models is still in its early stages compared with published work on video and image watermarking. This work proposes a 3D watermarking approach that uses the Coyote Optimization Algorithm (COA) to optimize statistical watermark embedding for 3D mesh models. COA is considered a recent, fast and stable meta-heuristic algorithm. The proposed approach aims to introduce an intelligent layer into the watermarking process. It starts by selecting the best vertices to carry the watermark bits using the k-means clustering method. This is followed by the watermark embedding step, which uses COA to find the best local statistical measure modification value. Finally, the embedded watermark is extracted without any need for the original model. The proposed approach is validated using different visual fidelity and robustness measures. The experimental results are compared with other state-of-the-art approaches to demonstrate its superiority in embedding and extracting watermark bit sequences with respect to both robustness and imperceptibility.

Mourad R. Mouhamed, Mona M. Soliman, Ashraf Darwish, Aboul Ella Hassanien

A 3D Geolocation Analysis of an RF Emitter Source with Two RF Sensors Based on Time and Angle of Arrival

Abstract

The three-dimensional geolocation of a radio frequency (RF) emitting source is commonly determined using two RF sensors. Even today, most researchers work on one of three emitter-sensor motion platforms: stationary sensors with a stationary emitter, moving sensors with a stationary emitter, or stationary sensors with a moving emitter. The present work investigates a fourth scenario, in which moving RF sensors locate a moving RF emitter in space. A mathematical analysis considers the different cases and scenarios, and a corresponding algorithm is designed to simulate this analysis. We consider straight-line and maneuvering motions of both the emitter and the sensors. The algorithm uses a hybrid of the angle of arrival (AOA) and time of arrival (TOA) of the emitter RF signal to estimate the 3D geolocation of the moving emitter. Measurement errors of AOA and TOA are investigated and compared with the calculated values. We test the algorithm for long and short distances, and it is found to be dynamic and reliable. The algorithm is tested for different values of AOA and TOA with different standard deviations, and the resulting emitter position error is relatively small. A MATLAB programming environment is used to build the algorithm, carry out the calculations and present the output results and figures. Some applications of the analysis and algorithm are also presented.

Kamel H. Rahouma, Aya S. A. Mostafa
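
The basic geometry behind combining TOA and AOA, as mentioned in the abstract above, can be sketched in Python for a single sensor. This is a simplified illustration, not the chapter's two-sensor algorithm: it assumes the TOA is a one-way propagation time with a known emission instant, that the AOA is given as azimuth and elevation in the sensor's local frame, and that measurements are noise free. All names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def emitter_position(sensor_xyz, toa_seconds, azimuth_rad, elevation_rad):
    """Estimate the emitter position from one sensor's TOA and AOA.

    Range comes from the (assumed one-way) time of arrival; direction comes
    from azimuth (measured from the x axis in the x-y plane) and elevation
    (measured up from the x-y plane).
    """
    slant_range = SPEED_OF_LIGHT * toa_seconds
    horizontal = slant_range * math.cos(elevation_rad)
    sx, sy, sz = sensor_xyz
    return (
        sx + horizontal * math.cos(azimuth_rad),
        sy + horizontal * math.sin(azimuth_rad),
        sz + slant_range * math.sin(elevation_rad),
    )
```

With moving platforms, the same computation would be repeated at each time step using the sensor's current position, and estimates from two sensors could then be combined to reduce the effect of measurement errors.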

Title: Machine Learning and Big Data Analytics Paradigms: Analysis, Applications and Challenges
Editors: Prof. Aboul Ella Hassanien
Prof. Ashraf Darwish
Publisher: Springer International Publishing
Electronic ISBN: 978-3-030-59338-4
Print ISBN: 978-3-030-59337-7
DOI: https://doi.org/10.1007/978-3-030-59338-4

Table of Contents

Frontmatter

Artificial Intelligence and Data Mining Applications

Frontmatter

Rough Sets and Rule Induction from Indiscernibility Relations Based on Possible World Semantics in Incomplete Information Systems with Continuous Domains

Big Data Analytics and Preprocessing

Artificial Intelligence-Based Plant Diseases Classification

Artificial Intelligence in Potato Leaf Disease Classification: A Deep Learning Approach

Granules-Based Rough Set Theory for Circuit Breaker Fault Diagnosis

SQL Injection Attacks Detection and Prevention Based on Neuro—Fuzzy Technique

Convolutional Neural Network with Batch Normalization for Classification of Endoscopic Gastrointestinal Diseases

A Chaotic Search-Enhanced Genetic Algorithm for Bilevel Programming Problems

Bio-inspired Machine Learning Mechanism for Detecting Malicious URL Through Passive DNS in Big Data Platform

Machine Learning and Applications

Frontmatter

TargetAnalytica: A Text Analytics Framework for Ranking Therapeutic Molecules in the Bibliome

Earthquakes and Thermal Anomalies in a Remote Sensing Perspective

Literature Review with Study and Analysis of the Quality Challenges of Recommendation Techniques and Their Application in Movie Ratings

Predicting Student Retention Among a Homogeneous Population Using Data Mining

An Approach for Textual Based Clustering Using Word Embedding

A Survey on Speckle Noise Reduction for SAR Images

Comparative Analysis of Different Approaches to Human Activity Recognition Based on Accelerometer Signals

Deep Learning Technology for Big Data Analytics

Frontmatter

Soil Morphology Based on Deep Learning, Polynomial Learning and Gabor Teager-Kaiser Energy Operators

Deep Layer Convolutional Neural Network (CNN) Architecture for Breast Cancer Classification Using Histopathological Images

A Survey on Deep Learning for Time-Series Forecasting

Deep Learning for Taxonomic Classification of Biological Bacterial Sequences

Particle Swarm Optimization and Grey Wolf Optimizer to Solve Continuous p-Median Location Problems

Gene Ontology Analysis of Gene Expression Data Using Hybridized PSO Triclustering

Modeling, Simulation, Security with Big Data

Frontmatter

Experimental Studies of Variations Reduction in Chemometric Model Transfer for FT-NIR Miniaturized Sensors

Smart Environments Concepts, Applications, and Challenges

Synonym Multi-keyword Search Over Encrypted Data Using Hierarchical Bloom Filters Index

Assessing the Performance of E-government Services Through Multi-criteria Analysis: The Case of Egypt

IoTIwC: IoT Industrial Wireless Controller

Applying Software Defined Network Concepts for Securing the Client Data Signals Over the Optical Transport Network of Egypt

Watermarking 3D Printing Data Based on Coyote Optimization Algorithm

A 3D Geolocation Analysis of an RF Emitter Source with Two RF Sensors Based on Time and Angle of Arrival