A Survey on Machine Learning in Chemical Spectral Analysis

Abstract: Chemical spectral analysis is currently undergoing a revolution and drawing much attention from scientists owing to machine learning algorithms, in particular convolutional neural networks. This paper outlines the major machine learning, and especially deep learning, methods that have contributed to interpreting chemical images, and reviews the current applications, developments and breakthroughs across different spectral characterization techniques. The reviewed literature is briefly categorized by application apparatus: X-ray spectra, UV-Vis-IR spectra, microscopy, Raman spectra, and photoluminescence spectra. Closing with an overview of the current state of this research area, we provide unique insights and promising directions for the chemical imaging field to fully couple with machine learning.


Introduction
The strong ability of machine learning [1] to learn high-level features motivates scientists to use it in all walks of life. This is a direct consequence of the recent breakthroughs resulting from its application across a wide variety of scientific fields, such as data mining [2], computer vision [3], natural language processing [4], biometric recognition [5], medical diagnosis [6], brain circuit studies [7], particle physics [8], securities market analysis, DNA analysis [9], and voice and handwriting recognition [10,11]. Recently it has also attracted increasing interest in chemical analysis [12]. Machine learning holds great promise for analytical chemistry: its architectures, the core of artificial intelligence, enable machines to realize complicated mathematical models that extract accurate information from chemical spectra. The input image data are processed by hierarchies of linear and/or non-linear functions. Treating these functions as data-processing 'layers', the hierarchical use of a large number of such layers inspires the name 'deep' learning. The common goal of deep learning methods is to iteratively learn the parameters of a computational model from a training data set such that the model gradually gets better at a desired task (e.g., classification) on that data under a specified metric. The computational model itself generally takes the form of an artificial neural network (ANN) [13] that consists of multiple layers of neurons/perceptrons.
Deep convolutional networks have now become the technique of choice. Once trained for a particular task, deep learning models are also able to perform the same task accurately on a variety of previously unseen data (i.e., testing data). This strong generalization ability currently makes deep learning stand out from other machine learning techniques. The parameters of a deep model are learned with the help of the back-propagation strategy [14], which enables some form of the popular gradient descent technique [15,16] to iteratively arrive at the desired parameter values. Updating the model parameters using the complete training data once is known as a single epoch of network/model training. Contemporary deep learning models are normally trained for hundreds of epochs before they can be deployed.
A crucial step in the design of such systems is the extraction of discriminant features from the images. This process is still often done by human researchers and, as such, one speaks of systems with handcrafted features. A logical next step is to let computers learn the features that optimally represent the data for the problem at hand. This concept lies at the basis of many deep learning algorithms: models (networks) composed of many layers that transform input data (e.g., images) to outputs (e.g., defects present/absent) while learning increasingly higher-level features. The most successful type of model for image analysis to date is the convolutional neural network (CNN). CNNs contain many layers that transform their input with convolution filters of small extent. The first successful real-world application was LeNet, for handwritten digit recognition. Despite this initial success, the use of CNNs did not gather momentum until various new techniques were developed for efficiently training deep networks. The watershed was the contribution of Krizhevsky et al. to the ImageNet challenge in December 2012 [17]. The proposed CNN, called AlexNet, won that competition by a large margin. In subsequent years, further progress has been made using related but deeper architectures [18].
In this paper, we provide a comprehensive review of recent machine learning techniques in chemical imaging analysis, focusing mainly on the common spectral characterizations. We categorize these techniques by characterization instrument. Analyzing the reviewed literature, we establish the lack of appropriately annotated large-scale datasets for chemical imaging tasks as the fundamental challenge (among other challenges) in fully exploiting deep learning for those tasks. We then provide guidelines to deal with this and other challenges in chemical image analysis using machine learning. This review also touches upon the available public datasets for training machine learning models for chemical imaging tasks. Considering the lack of in-depth comprehension of machine learning frameworks within the broader chemical community, this article also explains the core technical concepts at an appropriate level.

Outline of Machine Learning Methods
The purpose of this section is to present the general types of machine learning techniques, supervised and unsupervised learning algorithms, before introducing deep learning methods. Supervised learning algorithms are provided with a training dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$ of training examples $x_i$ and their labels $y_i$. The training examples $x$ belong to different classes of data, and the label $y$ may be discrete (classification) or a vector of continuous values (regression). The goal of supervised learning is to find a computational model $f_\Theta$ that correctly predicts the label $y$ by minimizing a loss function $L(y, \hat{y})$. In contrast to supervised learning, unsupervised learning is used to find patterns without labels. Its aim is to cluster the data samples into different groups based on the similarities of their intrinsic characteristics, e.g., clustering the pixels of color images based on their RGB values. Similar to supervised learning tasks, models for unsupervised learning can also be trained by minimizing a loss function. In the context of deep learning, this loss function is normally designed such that the model learns an accurate mapping of an input signal to itself. Once the mapping is learned, the model is used to compute compact representations of data samples that cluster well. Deep learning frameworks have also been found to be very effective for unsupervised learning.
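To make the distinction concrete, the following sketch contrasts the two settings with toy NumPy data; the line-fit problem and the single k-means assignment step are illustrative assumptions, not taken from any reviewed work:

```python
import numpy as np

# Supervised: labels are given; fit parameters to minimize a loss L(y, y_hat).
# A least-squares line fit is the simplest possible instance.
X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])                # labels: y = 2x + 1
A = np.vstack([X, np.ones_like(X)]).T             # design matrix [x, 1]
theta, *_ = np.linalg.lstsq(A, y, rcond=None)     # minimizes ||A theta - y||^2
slope, intercept = theta

# Unsupervised: no labels; group samples by similarity. One k-means
# assignment step on 2D 'pixel' values:
pixels = np.array([[0.1, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.9]])
centroids = pixels[[0, 2]]                        # initial cluster centres
dists = np.linalg.norm(pixels[:, None] - centroids[None], axis=2)
labels = dists.argmin(axis=1)                     # cluster index per sample
```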
Along with supervised and unsupervised learning, other machine learning types include semi-supervised and reinforcement learning. Informally, semi-supervised learning computes models from training data that provide labels for only a small subset of the samples. Reinforcement learning, on the other hand, supervises the learning problem through rewards or punishments given to the algorithm. Due to their remote relevance to the tasks in chemical spectral analysis, we do not discuss these categories further.

Artificial Neural Networks
An artificial neural network (ANN) is a hierarchical composition of basic computational elements known as neurons. Neural networks are a type of learning algorithm that forms the basis of most deep learning methods. A neural network comprises neurons or units with some activation $a$ and parameters $\Theta = \{W, B\}$, where $W$ is a set of weights and $B$ a set of biases. The activation is a linear combination of the input $x$ to the neuron and the parameters, followed by an element-wise nonlinearity $\sigma(\cdot)$, referred to as a transfer function:

$$a = \sigma(w^{T}x + b). \tag{1}$$

Typical transfer functions for traditional neural networks are the sigmoid and the hyperbolic tangent. The multi-layer perceptron (MLP), the most well-known traditional neural network, stacks several layers of these transformations:

$$f(x;\, \Theta) = \sigma\!\left(W^{(L)}\,\sigma\!\left(W^{(L-1)} \cdots\, \sigma\!\left(W^{(0)}x + b^{(0)}\right) \cdots + b^{(L-1)}\right) + b^{(L)}\right).$$

Here, each $W^{(l)}$ is a matrix comprising columns $w_k$, associated with activation $k$ in the output. Layers in between the input and output are often referred to as 'hidden' layers. When a neural network contains multiple hidden layers, it is typically considered a 'deep' neural network, hence the term 'deep learning'.
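As a concrete illustration, the forward pass of such an MLP can be sketched in a few lines of NumPy; the layer sizes and random weights below are arbitrary choices for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two-layer MLP forward pass: each layer applies a = sigma(W x + b).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden layer: 3 -> 4 units
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # output layer: 4 -> 2 units

x = np.array([0.5, -1.0, 2.0])                  # one input sample
h = sigmoid(W1 @ x + b1)                        # hidden activations
out = sigmoid(W2 @ h + b2)                      # output activations
```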
At the final layer of the network the activations are mapped to a distribution over classes $P(y \mid x;\, \Theta)$ through a softmax function:

$$P(y = c \mid x;\, \Theta) = \frac{\exp(w_c^{T}x + b_c)}{\sum_{k}\exp(w_k^{T}x + b_k)},$$

where $w_c$ indicates the weight vector leading to the output node associated with class $c$. A schematic representation of a three-layer MLP is shown in Figure 1.
Maximum likelihood with stochastic gradient descent is currently the most popular method to fit the parameters $\Theta$ to a dataset $D$. In stochastic gradient descent a small subset of the data, a mini-batch, is used for each gradient update instead of the full data set. Optimizing the maximum likelihood in practice amounts to minimizing the negative log-likelihood:

$$\hat{\Theta} = \arg\min_{\Theta}\; -\sum_{i=1}^{N}\log P(y_i \mid x_i;\, \Theta).$$

This results in the binary cross-entropy loss for two-class problems and the categorical cross-entropy for multi-class tasks. A downside of this approach is that it typically does not directly optimize the quantity we are interested in, such as the area under the receiver operating characteristic curve, or common evaluation measures for segmentation such as the Dice coefficient.
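A minimal sketch of the softmax mapping and the resulting cross-entropy loss; the example logits are invented:

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # shift by the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def nll(p, label):
    return -np.log(p[label])     # negative log-likelihood of the true class

logits = np.array([2.0, 0.5, -1.0])   # raw final-layer activations
p = softmax(logits)                   # distribution over three classes
loss = nll(p, label=0)                # cross-entropy when class 0 is true
```

Note that the loss grows the less probability mass the model places on the true class, which is exactly what gradient descent then pushes down.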

Convolutional Neural Networks
In the context of deep learning techniques for image analysis, CNNs [17,19] are of primary importance. Similar to standard ANNs, CNNs consist of multiple layers. However, instead of simple perceptron layers, we encounter three different kinds of layers in these networks: (a) convolutional layers, (b) pooling layers, and (c) fully connected layers. We describe these layers below, focusing mainly on the convolutional layers, which are the main source of strength for CNNs, as shown in Fig. 1. Convolutional layers: The aim of convolutional layers is to learn the weights of so-called convolutional kernels/filters that perform convolution operations on images. Traditional image analysis has a long history of using such filters to highlight/extract image features, for example the Sobel filter for detecting edges. However, before CNNs, these filters had to be designed by manually setting the kernel weights in a careful manner. The breakthrough that CNNs provided is the automatic learning of these weights in the neural network setting. In the 2D setting (e.g., grey-scale images), this operation involves moving a small window (i.e., the kernel) over a 2D grid (i.e., the image). At each step, the corresponding elements of the two grids are multiplied and summed up to compute a scalar value. Completing the operation results in another 2D grid, referred to as the feature/activation map in the CNN literature. In the 3D setting, the same steps are performed for the individual pairs of corresponding channels of the 3D volumes, and the resulting feature maps are simply added to compute a 2D map as the final output.
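The sliding-window operation described above can be sketched directly; the step-edge image and Sobel-style kernel below are illustrative (in a real CNN the kernel weights would be learned):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (no kernel flip, i.e. the cross-correlation
    convention commonly used in CNNs)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # elementwise-multiply the window with the kernel and sum
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

# A Sobel-like kernel highlights vertical edges.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
image = np.zeros((5, 5))
image[:, 3:] = 1.0               # a vertical step edge
fmap = conv2d(image, sobel_x)    # feature map responds along the edge
```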
Pooling layers: The main objective of a pooling layer is to reduce the width and height of the activation maps in CNNs. The basic concept is to compute a single output value for a small n × n grid in the activation map, where the output is simply the maximum or average value of that grid. Based on the operation used, this layer is referred to as a max-pooling or average-pooling layer. Interestingly, there are no learnable parameters associated with a pooling layer; hence, it is sometimes seen as part of the convolutional layer. For instance, the popular VGG-16 network does not count the pooling layers as separate layers, hence the name VGG-16. On the other hand, other works that use VGG-16 often count more than 16 layers in this network by treating each pooling layer as a regular network layer.
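A minimal sketch of this n × n reduction for the max-pooling case; the 4 × 4 activation map is invented:

```python
import numpy as np

def max_pool(fmap, n=2):
    """Non-overlapping n x n max-pooling; halves width and height for n=2."""
    h, w = fmap.shape
    out = fmap[:h - h % n, :w - w % n]       # drop rows/cols that don't fit
    out = out.reshape(h // n, n, w // n, n)  # split into n x n tiles
    return out.max(axis=(1, 3))              # maximum over each tile

fmap = np.array([[1., 3., 2., 0.],
                 [4., 2., 1., 5.],
                 [0., 1., 9., 2.],
                 [3., 2., 4., 1.]])
pooled = max_pool(fmap)                      # 4x4 -> 2x2
```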
Fully connected layers: These layers are the same as the perceptron layers encountered in standard ANNs. The use of multiple convolutional and pooling layers in a CNN gradually reduces the size of the resulting activation maps. Finally, the activation maps from a deeper layer are rearranged into a vector, which is then fed to the fully connected layers. It is common knowledge now that the activation vectors of fully connected layers often serve as very good compact representations of the input signals (e.g., images).

Recurrent Neural Networks
Standard neural networks assume that input signals are independent of each other. However, often this is not the case. For instance, a word appearing in a sentence generally depends on the sequence of words preceding it. Recurrent neural networks (RNNs) are designed to model such sequences. An RNN can be thought of as maintaining a 'memory' of the sequence with the help of its internal states. In Fig. 2, we show a typical RNN that is unfolded, i.e., the complete network is shown for the sequence. If the RNN has three layers, it can model, e.g., sentences that are three words long. In the figure, $x_t$ is the input at the $t$-th time stamp. For instance, $x_t$ can be some quantitative representation of the $t$-th word in a sentence. The memory of the network is maintained by the state $s_t$, computed as $s_t = f(Ux_t + Ws_{t-1})$, where $f$ is typically a non-linear activation function, e.g., ReLU. The output at a given time stamp, $o_t$, is a function of a weighted version of the network state at that time, i.e., $o_t = g(Vs_t)$.
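The recurrent update above can be sketched as follows; the dimensions, random weights and input sequence are arbitrary illustrative choices (tanh is used here instead of ReLU):

```python
import numpy as np

# Minimal recurrent update: s_t = f(U x_t + W s_{t-1}), o_t = V s_t.
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(4, 3))   # input  -> state
W = rng.normal(scale=0.1, size=(4, 4))   # state  -> state (the 'memory')
V = rng.normal(scale=0.1, size=(2, 4))   # state  -> output

def rnn_step(x_t, s_prev):
    s_t = np.tanh(U @ x_t + W @ s_prev)  # new state depends on old state
    o_t = V @ s_t
    return s_t, o_t

s = np.zeros(4)                          # initial state (empty memory)
sequence = [np.ones(3), np.zeros(3), -np.ones(3)]
outputs = []
for x_t in sequence:                     # unfold over the sequence
    s, o = rnn_step(x_t, s)
    outputs.append(o)
```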

Machine Learning Applications in Chemical Spectrum
In this section, we review the recent contributions in chemical spectral analysis that exploit machine learning technology. We mainly focus on research papers published in recent years, while briefly mentioning the more influential contributions from earlier years. The literature pertaining to each task is then further sub-categorized by the different spectra.

X-Ray Spectra
In the domain of X-ray spectra, John et al. [20] were the first to analyze X-ray absorption fine structure (XAFS) through ANNs. They mainly focus on using ANN methods to build better models of the complicated, multidimensional XAFS data of catalysts, helping researchers understand the structure and function of catalysts and generate new knowledge about catalysis. Medford et al. [21] provided a hierarchy of data-information-knowledge derived from early works, utilizing machine learning and uncertainty quantification.
Timoshenko et al. [22] presented supervised machine learning (SML) with X-ray absorption near-edge structure (XANES) spectroscopy image datasets as input features to refine 3D metal catalyst structures. The SML technique seeks to bridge the gap between XANES fingerprints and catalyst geometry. Ab initio XANES simulations were used as training data for this SML model. Their experimental results reveal that SML can reconstruct the 3D geometry, including the average size, shape, and morphology, of well-defined nanoparticles (e.g., platinum). In fact, there are essential differences between computer-generated and experimental X-ray spectra. Moreover, training data such as XANES and XAFS spectra must typically be acquired with expensive and rare instruments. Extracting structural and electronic descriptors from XANES spectra is akin to solving a challenging inverse problem.

Raman Spectra
A Raman spectrum can be regarded as a one-dimensional image, so it can be analyzed by several kinds of neural network architectures. For instance, Lieber et al. [23] employed an automated method for fluorescence subtraction based on a modification to least-squares polynomial curve fitting, because the inherent fluorescence generated by many biological molecules usually obscures the true Raman information of the materials. Liu et al. [24] reported CNNs with multivariate treatment, including preprocessing, feature extraction and classification, to analyze Raman spectroscopy. This CNN method not only greatly simplifies the classification procedures for Raman spectroscopy, but also achieves high accuracy compared to earlier works using other machine learning methods, such as support vector machines (SVM) with baseline-corrected spectra. Chen et al. [25] described a residual neural network (ResNet) used to decode Raman-spectra-encoded suspension arrays. This ResNet shows classification stability and training convergence across different datasets, reaching a classification accuracy of 100%. Beyond one-dimensional Raman spectra, Raman microscopic 2D image analysis in the context of bladder cancer cytopathology has also been put forward using deep convolutional neural networks (DCNNs), which have been applied in numerous pattern recognition tasks [26]. Going forward, it will be significant to determine detailed information about chemical substances by automatic analysis of Raman spectra.
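The idea behind such fluorescence-background subtraction can be sketched with a simplified iterative polynomial fit in the spirit of (but not identical to) the modification in [23]; the synthetic spectrum below is invented for illustration:

```python
import numpy as np

def modpoly_baseline(spectrum, degree=5, n_iter=100):
    """Iterative ('modified') polynomial baseline: after each fit, points above
    the fit are clipped down to it, so sharp peaks stop pulling the fit up."""
    grid = np.linspace(-1.0, 1.0, spectrum.size)   # scaled axis for stable fits
    work = spectrum.astype(float).copy()
    for _ in range(n_iter):
        fit = np.polyval(np.polyfit(grid, work, degree), grid)
        work = np.minimum(work, fit)               # suppress peaks, keep baseline
    return fit

# Synthetic Raman-like trace: broad fluorescence background + one sharp peak.
x = np.linspace(0.0, 1.0, 200)
background = 2.0 + 1.5 * x - 0.8 * x**2
peak = 3.0 * np.exp(-((x - 0.5) / 0.01) ** 2)
spectrum = background + peak
corrected = spectrum - modpoly_baseline(spectrum)  # fluorescence removed
```

The clipping step is what distinguishes this from a plain least-squares polynomial fit, which a sharp Raman peak would bias upward.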

UV-Vis-IR Spectra
Baik et al. [27] demonstrated a system for identifying tablets using a portable visible-near-infrared (VIS-NIR) spectrometer and a CNN. The spectrum of each tablet has unique reflectance features. To classify tablets by their spectra, they implemented three comparative experiments over different wavelength ranges and successfully classified 14 kinds of tablets. The accuracies of the three experiments are 97.86% (using the VIS spectra), 96.90% (using the NIR spectra), and 98.81% (using the VIS-NIR spectra), showing high accuracy regardless of the wavelength range. Bjerrum et al. [28] analyzed a tablet dataset using CNNs with automatic tuning of the model hyperparameters and of the regularization level in the form of dropout layers. The dataset consists of assay results from the analysis of pharmaceutical tablets and NIR spectra recorded with two different instruments. Data preprocessing in the form of spectral data augmentation is implemented, and its performance is compared with extended multiplicative scatter correction. All dataset treatments are compared to hypothetical optimal models as a baseline. The models' performances are also compared on a specially crafted extrapolation challenge for both assay result values and instrument recordings.

Scanning Electron/Light Optical Microscope
The microstructure of a material stores its genesis and determines all its physical and chemical properties. Since a microstructure can be a combination of different phases or constituents with complex substructures, its automatic classification is very challenging. To assess microstructural features, Velichko et al. [29] employed data mining methods, extracting morphological features from cast iron and classifying them with SVMs. Jonathan et al. presented a max-pooling convolutional neural network approach for supervised steel defect classification [30]. Pauly et al. followed the same approach, applying it to a contrasted and etched dataset of steel acquired by scanning electron microscope (SEM) and light optical microscope (LOM) imaging, using the steel images of the Material Engineering Center Saarland (MECS) dataset [31]. This dataset is available at the MECS website: http://www.mec-s.de/. However, it could only reach 48.89% accuracy in microstructural classification on the given dataset for four different classes, due to the high complexity of the substructures and insufficiently discriminative features. More recently, Azimi et al. [32] proposed a deep learning method for microstructural classification, exemplified on certain microstructural constituents of low-carbon steel. This novel method employs pixel-wise segmentation via a fully convolutional neural network accompanied by a max-voting scheme. Their system achieves 93.94% classification accuracy, drastically outperforming the state-of-the-art accuracy of 48.89%. Beyond its strong performance, this line of research offers a more robust and, above all, objective way to approach the difficult task of steel quality assessment. The authors used three CNN architectures, namely CIFAR-Net proposed by LeCun et al. [19], AlexNet proposed by Krizhevsky et al. [33], and VGG19-Net [17], to categorize SEM and LOM images of steel.
The accuracies reported by the authors are 51.27%, 56.56%, 60.02%, and 93.94%, corresponding to CIFAR-Net, VGG-19-DeCAF with an RBF-kernel SVM, VGG-16, and MVFCNN, respectively. As shown in Fig. 3, Li et al. [34] developed an automated recognition tool based on a computer-vision approach comprising, in sequence, a cascade object detector, a convolutional neural network, and local image analysis methods. This automated tool achieves a high classification level in terms of recall and precision and attains quantitative image/defect analysis metrics close to the human average. Katsumi et al. [35] proposed a new image-processing, deep-learning-based approach for super-resolution of 3D images with asymmetric resolution, restoring the depth resolution to achieve symmetric resolution. The deep-learning-based method learns from high-resolution sub-images obtained via SEM and recovers low-resolution sub-images parallel to the FIB milling direction. The 3D morphologies of polymeric nanocomposites are used as test images, which are subjected to the deep-learning-based method as well as conventional methods. They find that the former yields superior restoration, particularly as the resolution asymmetry increases.

Photoluminescence
Wood et al. [36] proposed a machine learning algorithm to analyze time-resolved photoluminescence (TRPL) data and determine the probability distribution of decay rates of an arbitrary emitter without any a priori assumptions. They employed the Laplace transform and k-fold cross-validation to reduce the prediction error: Poisson statistics is used to solve the discrete Laplace transform, and k-fold cross-validation serves as the regularization that minimizes the prediction error. For example, the decay-rate distribution of Ag-In-Se nanocrystals is best described by this machine learning model. Researchers can apply this fitting method, in particular, to better understand the fluorescence principles of I-III-VI nanocrystals. The model has also been applied to energy-resolved, ultrafast emission from CsPbBr3 perovskite nanocrystals. Moreover, computer-generated TRPL data have been fitted and analyzed by their method, indicating the high practicability of this machine learning model. They provide their software (LumiML) as open source [37]. The machine learning approach reveals the complex dynamics of neutral excitons, charged excitons and multiexcitons [38,39], as well as their binding energies [40], highlighting the utility of this program for improving the comprehension of novel materials.
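As a much-simplified illustration of decay-rate estimation, the sketch below recovers a single rate by log-linear least squares, rather than the full rate distribution inferred by the reviewed method; all values are synthetic:

```python
import numpy as np

# A TRPL trace from a single-rate emitter ideally decays as
# I(t) = I0 * exp(-k t); on a log scale this is a straight line.
t = np.linspace(0.0, 10.0, 50)          # time (arbitrary units)
k_true, I0 = 0.7, 100.0
intensity = I0 * np.exp(-k_true * t)    # noiseless synthetic trace

# Fit log I = -k t + log I0 by linear least squares.
A = np.vstack([t, np.ones_like(t)]).T
slope, log_I0 = np.linalg.lstsq(A, np.log(intensity), rcond=None)[0]
k_est = -slope                          # recovered decay rate
```

Real traces mix many rates and Poisson noise, which is precisely why [36] infers a distribution of rates instead of a single exponential.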

Future Directions
Currently, the number of chemical spectral works employing machine learning has risen with the explosive revolution of image analysis based on CNNs. In the future, the academic area of chemical imaging analysis will see an increasingly large body of research findings. Hence, we present core guidelines and future directions to address the problems faced by machine learning in chemical spectral analysis. We provide our perspectives based on the reviewed papers and on the literature of similar interdisciplinary fields in machine learning/computer vision, such as hyperspectral remote sensing data analysis and medical image analysis. We therefore identify two present challenges and corresponding solutions. (1) Owing to the lack of large-scale chemical imaging datasets, chemical spectral analysis is limited in developing high-stability, sufficiently accurate models. Fortunately, transfer learning can address this dilemma. Transfer learning is the ability of a system to recognize and apply knowledge and skills learned in other domains/tasks to novel domains/tasks [41]. The medical image analysis community has already employed transfer learning to dramatic effect [42][43][44]; chemical spectroscopy should clearly make full use of its applications in that field. (2) In the absence of large-scale chemical imaging datasets, employing current machine learning models as high-level feature extractors and carrying out further training on top of them is a bright and promising direction. Furthermore, newly developed large-scale chemical spectral datasets should be built from experimental data rather than computer-generated spectra.
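The feature-extractor strategy in (2) can be sketched as follows; here a frozen random projection stands in for a real pretrained CNN, and the toy two-class 'spectra' dataset and all names are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained network: a frozen random projection + ReLU.
# In practice this would be a CNN pretrained on a large source dataset.
W_frozen = rng.normal(size=(8, 50))

def pretrained_features(x):
    return np.maximum(W_frozen @ x, 0.0)   # frozen, never retrained

# Tiny labeled target dataset: two synthetic 'spectra' classes.
X = np.vstack([rng.normal(0.0, 1.0, (20, 50)),
               rng.normal(2.0, 1.0, (20, 50))])
y = np.array([0] * 20 + [1] * 20)

# Only a small linear head is fitted on the new data (a linear probe).
F = np.array([pretrained_features(x) for x in X])
F = np.hstack([F, np.ones((len(F), 1))])   # append a bias column
w, *_ = np.linalg.lstsq(F, 2.0 * y - 1.0, rcond=None)
acc = float(((F @ w > 0).astype(int) == y).mean())
```

Because only the small head is trained, far fewer labeled chemical spectra are needed than for training a deep network from scratch.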

Conclusion
In summary, we present a survey of chemical spectral analysis via machine learning algorithms, built upon previous works. It generally presents the methodologies, applications and future directions. Fundamentally, given the lack of understanding of the basic principles of machine learning among chemical spectral analysts, this review introduces the essential concepts of machine learning methods (e.g., supervised and unsupervised learning) and deep learning frameworks (e.g., ANN, CNN, RNN), helping chemical spectral analysts better understand machine learning methods.
The second section focuses on comprehensive machine learning applications across different chemical spectra. Unlike other existing reviews in the field, this paper presents computer vision/machine learning applications from the chemical imaging researcher's perspective. This survey not only offers a deep understanding of the key concepts in machine learning for the chemical spectral field, but also emphasizes the intrinsic causes of the challenges in chemical spectral applications of machine learning.
Based on the recent literature, we point out that the deficiency of large chemical imaging datasets remains the main obstacle to machine learning for chemical image analysis at the present time. Finally, we hold the view that scientists should adopt multiple measures to address this dilemma, as other machine-learning-driven research areas such as medical image analysis and hyperspectral remote sensing data analysis have done. We hope that this frontier interdisciplinary research area, which combines chemical spectra and machine learning, will raise interest among chemical analysts and the computer vision and machine learning communities.
Funding Statement: This work is supported by National Natural Science Foundation of China (62072250).

Conflicts of Interest:
We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.