Automated Multi-Document Biomedical Text Summarization Using Deep Learning Model

Abstract: Due to the rapid development of the Internet and information technologies, the quantity of electronic data in the biomedical sector has increased exponentially. To handle this huge amount of biomedical data, automated multi-document biomedical text summarization has become an effective and robust approach for accessing the growing body of technical and medical literature, summarizing multiple source documents while retaining the most informative content. Multi-document biomedical text summarization therefore plays a vital role in alleviating the difficulty of accessing precise and up-to-date information. This paper presents a Deep Learning based Attention Long Short Term Memory (DL-ALSTM) model for multi-document biomedical text summarization. The proposed DL-ALSTM model initially performs data preprocessing to convert the available medical data into a format compatible with further processing. The DL-ALSTM model is then executed to summarize the contents of the multiple biomedical documents. To tune the summarization performance of the DL-ALSTM model, the chaotic glowworm swarm optimization (CGSO) algorithm is employed. Extensive experimental analysis is performed on the PubMed dataset to validate the DL-ALSTM model, and a comprehensive comparative analysis showcases its efficiency against recently presented models.


Introduction
Automatic text processing tools play a vital role in efficient knowledge acquisition from the massive sources of textual data in health care and the life sciences, such as clinical guidelines, electronic health records, and scientific publications [1]. Automatic text summarization is a subfield of text mining and Natural Language Processing (NLP) that aims to produce a condensed form of one or more input documents by extracting their most important contents [2,3]. Text summarization tools can save clinicians and researchers time and resources by automatically identifying and presenting the key concepts within long documents, without the need to read the entire text [4]. Early text summarization relied on frequency features to recognize the most relevant contents of textual documents. Later, summarization tools integrated a broad range of heuristics and features into the content selection procedure. The most commonly utilized features include sentence length, sentence position, keywords extracted from the text, the presence of cue phrases, title words, centroid-based cohesion, co-occurrence features, and the presence of numerical content [5].
To resolve these limitations, other strands of research investigated technologies that utilize domain knowledge sources to map texts into concept-based representations [6]. This allows the informative content of the text to be measured in terms of the semantics and context behind each sentence, rather than shallow features. However, there are difficulties in utilizing biomedical knowledge sources for text analysis, especially for summarization [7]. Building, maintaining, and utilizing knowledge bases can be challenging: a large amount of automatic annotation is required to broadly identify entities and concepts and to capture the relationships among them, and the selection of relevant domain knowledge sources, which can seriously affect summarization performance, is itself difficult [8]. Another problem is how to measure the informative content of sentences based on qualitative relationships among concepts. Deep neural network (DNN) based language models [9] can be used to tackle most of the knowledge-domain problems in context-aware biomedical summarization. A deep learning (DL) language model is pre-trained on massive quantities of text and learns to represent units of text, mostly words, in a vector space [10]. The pre-trained embeddings can be fine-tuned on downstream tasks or used directly as numerical features.
[11] proposed a deep-reinforced abstractive summarization method that reads a biomedical publication abstract and produces a summary in the form of a title or one-sentence headline. The authors present new reinforcement learning (RL) reward metrics based on biomedical expert resources, namely MeSH and the UMLS Metathesaurus, and show that the method can produce abstractive, domain-aware summaries. [12] presented a text summarization method based on Deep Learning Modifier Neural Network (DLMNN) classification, which produces an informative summary of the document based on entropy values. The DLMNN architecture consists of six stages: first, the input document is preprocessed; next, features are extracted from the preprocessed data; then, the most relevant features are selected by the improved fruit fly optimization algorithm (IFFOA); finally, the entropy value of each selected feature is calculated. [13] focused on the application of ML methods in two distinct subareas associated with the medical industry: the first is sentiment analysis (SA) of user-narrated drug reviews, and the second is engineering in food technology. Since ML and AI methods push the limits of scientific drug discovery, ML methods are chosen for two main reasons: first, ML encompasses distinct learning approaches and is feasible for numerous NLP operations; second, it has the inherent capacity to model various features that capture sentiment in text. [14] addressed this limitation by proposing a method that combines topic modelling, unsupervised neural networks, and document clustering to build effective document representations. First, a document clustering method based on the Extreme Learning Machine (ELM) is applied to a large text collection. Next, topic modelling is employed on the document collection to identify the topics present in each cluster.
Then, all documents are represented in a concept space using a matrix in which the columns represent the cluster topics and the rows represent the document sentences. The resulting matrix is used to train several ensemble learning algorithms and unsupervised neural networks to build abstract representations of the documents in the topic space. [15] designed a biomedical text summarization method that integrates two commonly used data mining techniques: frequent itemset mining and clustering. A biomedical paper is represented as a group of biomedical concepts using the UMLS Metathesaurus. The K-means method is used to cluster similar sentences, after which the Apriori algorithm is employed to discover frequent itemsets within the clustered sentences. Lastly, the most relevant sentence from each cluster is selected, using the detected frequent itemsets, to build the summary; a minimal sketch of this pipeline is given below.
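To make the clustering plus frequent-itemset pipeline of [15] concrete, the following is a minimal Python sketch. It substitutes plain TF-IDF terms for UMLS concepts and a simplified frequent term-pair count for the full Apriori algorithm; all function names and parameter defaults are illustrative, not the original authors' implementation.

```python
# Minimal sketch of the clustering + frequent-itemset summarization pipeline.
from itertools import combinations
from collections import Counter

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def summarize(sentences, n_clusters=3, min_support=2, top_n=3):
    # Represent each sentence in a vector space (stand-in for UMLS concepts).
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(sentences)

    # Group semantically similar sentences with K-means.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)

    # Count frequent term pairs (a toy substitute for Apriori itemsets).
    vocab = vectorizer.get_feature_names_out()
    term_sets = [set(vocab[i] for i in X[r].nonzero()[1]) for r in range(X.shape[0])]
    pair_counts = Counter(p for terms in term_sets
                          for p in combinations(sorted(terms), 2))
    frequent = {p for p, c in pair_counts.items() if c >= min_support}

    # Score each sentence by how many frequent itemsets it covers,
    # then pick the best-scoring sentence per cluster.
    scores = [sum(1 for p in frequent if set(p) <= terms) for terms in term_sets]
    best = {}
    for idx, lab in enumerate(labels):
        if lab not in best or scores[idx] > scores[best[lab]]:
            best[lab] = idx
    return [sentences[i] for i in sorted(best.values())][:top_n]
```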
[16] integrated frequent itemset mining and sentence clustering to build a biomedical text summarization model. A biomedical document is denoted as a set of UMLS concepts, and overly generic concepts are discarded. The vector space model is applied to represent the sentences, and K-means clustering is employed to group semantically similar sentences. Frequent itemsets are then extracted across the clusters and used to calculate a score for each sentence; the top N highest-scoring sentences are selected to form the final summary. [17] addressed this problem in the context of biomedical text summarization by measuring the effectiveness of a graph-based summarizer with distinct kinds of contextualized and context-free embeddings. The word representations are generated by pre-training neural language models on large amounts of biomedical text. The summarizer models the input texts as graphs in which the strength of the relationships between sentences is evaluated by domain-specific vector representations.

Paper Objective
The objective of this study is to design a novel deep learning based multi-document biomedical text summarization model with a hyperparameter tuning process.

Paper Contributions
This paper presents a Deep Learning based Attention Long Short Term Memory (DL-ALSTM) model for multi-document biomedical text summarization. The proposed DL-ALSTM model initially performs data preprocessing to convert the available medical data into a format compatible with further processing. The DL-ALSTM model is then executed to summarize the contents of the multiple biomedical documents. To tune the summarization performance of the DL-ALSTM model, the chaotic glowworm swarm optimization (CGSO) algorithm is employed. Extensive experimental analysis is performed using the PubMed dataset to validate the DL-ALSTM model.

Paper Organization
The remainder of the paper is arranged as follows. Section 2 presents the proposed DL-ALSTM model, and Section 3 discusses the performance validation. Finally, Section 4 draws the conclusions of the study.

The Proposed Biomedical Text Summarization Technique
In this study, an effective DL-ALSTM model is presented for multi-document biomedical text summarization. The proposed DL-ALSTM model performs pre-processing, summarization, and hyperparameter optimization. The detailed workings of these processes are described in the following sections.

Pre-processing
The summarization procedure begins with a pre-processing step. First, the parts of an input document that are not intended for inclusion in the summary are removed, and the main text is retained. The removed parts include the title, abstract, keywords, author information, section and subsection headers, figures and tables, and the bibliography section [18]. These parts are assumed to be unnecessary because they do not appear in the reference summary used to evaluate the output. The removal phase can be customized depending on the structure of the input text and the user's preferences: if section and subsection titles are to be included in the summary, additional data is stored with each sentence specifying the section of text to which it belongs.
Since the input to BERT's feature extraction scripts is a text file in which every sentence appears on a distinct line and every sentence is tokenized, the pre-processing step continues by splitting the input text into distinct sentences and tokenizing them. Using the Natural Language ToolKit (NLTK), the summarizer splits the main text into groups of sentences and represents each sentence as a group of tokens. After these pre-processing operations, the input sentences are ready to be mapped to contextualized vector representations, as sketched below.
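The following is a minimal sketch of this step using NLTK's standard sentence splitter and word tokenizer; the output file name and function names are illustrative, and the one-sentence-per-line output format matches the input expected by the BERT feature-extraction scripts described above.

```python
# Minimal sketch: sentence-split and tokenize the retained main text with NLTK.
import nltk
nltk.download("punkt", quiet=True)  # tokenizer models used by NLTK

from nltk.tokenize import sent_tokenize, word_tokenize

def preprocess(document_body: str) -> list[list[str]]:
    """Split the main text into sentences and tokenize each one."""
    sentences = sent_tokenize(document_body)
    return [word_tokenize(s) for s in sentences]

def write_bert_input(token_lists, path="bert_input.txt"):
    # One tokenized sentence per line, tokens separated by spaces.
    with open(path, "w", encoding="utf-8") as f:
        for tokens in token_lists:
            f.write(" ".join(tokens) + "\n")
```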

Biomedical Text Summarization
The preprocessed data is fed into the DL-ALSTM model to summarize the multi-document biomedical text. The LSTM cell comprises five essential components: an input gate $i$, a forget gate $f$, an output gate $o$, a recurrent cell state $c$, and a hidden state output $h$. In this variant of the LSTM cell, a modified procedure is used to compute $h_t$. Concretely, at every time step $t$, an internal memory cell $c_t \in \mathbb{R}^n$ is used to compute $h_t$: the LSTM cell uses the previous hidden state $h_{t-1}$ and the input $x_t$ to produce a temporary (candidate) internal memory state $\tilde{c}_t$, and then combines the previous internal memory state $c_{t-1}$ with $\tilde{c}_t$ to produce the new internal memory state $c_t$. The gating of the gradient flow in the LSTM cell mitigates gradient explosion (GE). The presented technique uses the LSTM cell as the fundamental unit of both the encoder and decoder components [19].
The fundamental calculations in the LSTM cell are as follows:

State Update
$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)$$
$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)$$
$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)$$
$$\tilde{c}_t = \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$h_t = o_t \odot \tanh(c_t)$$

The training stage aims to learn the parameters $W_{x*}$ and $W_{h*}$ for $x$ and $h$ respectively; $\sigma(\cdot)$ denotes the sigmoid function; $\tanh(\cdot)$ represents the hyperbolic tangent function; $\odot$ denotes element-wise multiplication; and $b$ denotes the bias.
In this model, stacked LSTM layers are used in the vertical direction, where the input of the current LSTM layer is the output of the preceding layer. Fig. 1 illustrates the structure of the LSTM model. A worked sketch of the state-update equations follows.
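The following NumPy sketch implements the state-update equations above for a single time step. The stacked weight layout (all four gates packed into one matrix) and the variable names are illustrative conveniences, not the authors' implementation.

```python
# One step of the LSTM state update, with the i, f, o, and candidate gates
# stacked into single weight matrices Wx (4n, d), Wh (4n, n), and bias b (4n,).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, Wx, Wh, b):
    n = h_prev.shape[0]
    z = Wx @ x_t + Wh @ h_prev + b
    i_t = sigmoid(z[:n])             # input gate
    f_t = sigmoid(z[n:2 * n])        # forget gate
    o_t = sigmoid(z[2 * n:3 * n])    # output gate
    c_tilde = np.tanh(z[3 * n:])     # candidate memory state \tilde{c}_t
    c_t = f_t * c_prev + i_t * c_tilde   # new internal memory state
    h_t = o_t * np.tanh(c_t)             # new hidden state
    return h_t, c_t
```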
This architecture follows the Google neural machine translation design and has three modules: an encoder, a decoder, and an attention network. In the PCA-LSTM technique, the encoder uses stacked LSTM layers consisting of one bi-directional LSTM layer and three uni-directional LSTM layers. In the bi-directional LSTM encoder, the information needed to paraphrase a particular word on the output side may lie anywhere on the source side. Often the source-side information is ordered roughly left-to-right, as on the target side; however, depending on the language pair, the information for a particular output word may be distributed across, or separated into, particular areas of the input side. The final hidden state $h_i$ for each input unit $x_i$ is therefore the concatenation of the forward and backward hidden states.
The decoder in the present technique is a standard LSTM that generates the paraphrase sequence $y = (y_0, \ldots, y_L)$ by computing the sequence of hidden states $(s_0, \ldots, s_{L-1})$, in which the context of the currently generated paraphrase unit is encoded by $s_{L-1}$. In the typical attention mechanism, a relevance score $\alpha_{ti}$ is computed for every hidden state $h_i$ and used to form the context vector
$$c_t = \sum_{i} \alpha_{ti}\, h_i,$$
where the relevance score $\alpha_{ti}$, which indicates the most significant source units to focus on, is obtained by normalizing an alignment score $e_{ti}$:
$$\alpha_{ti} = \frac{e_{ti}}{\sum_{k} e_{tk}}, \qquad e_{ti} = f(s_{t-1}, h_i),$$
where the alignment model $f$ is a neural network that generally uses a $\tanh$ activation over the two inputs $s_{t-1}$ and $h_i$; the coefficient $\beta$ introduced below has a default value of 1.
If a word whose $\tanh$ alignment value is minimal is assumed to play no role in generating the paraphrase, then under the typical attention mechanism nearly all words still contribute. In practice, some words play no role in the paraphrase generation problem (this differs from neural machine translation). To address this issue, a new parameter $\beta$ is added to the model; it acts on the $\tanh$ alignment score and is called the penalty coefficient (PC). The objective of $\beta$ is to suppress the contribution of source-sequence words whose $\tanh$ alignment value is $-1$ to the current word of the paraphrase sequence. Since this directly modifies the attention weights, the technique is called Penalty Coefficient Attention (PCA). Finally, the previous hidden state $s_{t-1}$, the most relevant source context $c_t$, and the previously generated textual unit $y_{t-1}$ are used to compute the decoder hidden state
$$s_t = g(s_{t-1}, y_{t-1}, c_t),$$
where $g$ represents the GRU-style recurrent update. Under this PCA-LSTM encoder-decoder structure, the generated paraphrase sequence $y = (y_0, \ldots, y_L)$ is modelled by the conditional distribution
$$P(y \mid x) = \prod_{t=0}^{L} P(y_t \mid y_{<t}, x).$$
A sketch of the PCA scoring follows.
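The following is a minimal sketch of the PCA scoring under one plausible reading of the formulation above: the alignment model $f$ is a single $\tanh$ unit over the concatenated decoder state and encoder state, the penalty coefficient shifts the score additively, and the weights are normalized directly (not by softmax) so that with $\beta = 1$ a word whose $\tanh$ score is $-1$ receives zero attention weight. The weight vector `w_a` and the normalization choice are assumptions for illustration.

```python
# Penalty Coefficient Attention (PCA) scoring sketch.
import numpy as np

def pca_attention(s_prev, H, w_a, beta=1.0):
    """s_prev: (n,) previous decoder state; H: (T, n) encoder hidden states;
    w_a: (2n,) illustrative alignment weights."""
    # Alignment score e_ti = tanh(f(s_{t-1}, h_i)) + beta for every source unit.
    e = np.array([np.tanh(w_a @ np.concatenate([s_prev, h_i])) + beta for h_i in H])
    # Direct normalization: with beta = 1, a tanh score of -1 maps to weight 0.
    alpha = e / e.sum()
    # Context vector c_t is the alpha-weighted sum of encoder hidden states.
    c_t = alpha @ H
    return alpha, c_t
```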

Design of CGSO Based Hyperparameter Optimization
To effectively adjust the hyperparameters involved in the DL-ALSTM model, the CGSO algorithm is utilized [20]. In the fundamental GSO, n glowworm individuals are randomly distributed in a D-dimensional search space, and every glowworm carries a quantity of luciferin $l_u$. Each glowworm emits a certain amount of fluorescence, interacts with nearby individuals, and has its own decision-making range $r_d^u$ ($0 < r_d^u \le r_s$). The luciferin level of each glowworm corresponds to the objective function value at its current position: the higher the luciferin, the brighter the glowworm and the better the target at its location, and conversely for a worse target. The size of the decision-making radius is driven by the number of individuals in the neighborhood: the lower the density of neighboring glowworms, the larger a glowworm's decision radius becomes in order to find more neighbors; conversely, when neighbors are dense, the decision radius shrinks. Eventually, the glowworms gather at a small number of locations. Initially, every glowworm carries the same luciferin concentration $l_0$ and perception radius $r_0$.

Luciferin update:
$$l_u(z+1) = (1 - \rho)\, l_u(z) + \gamma\, J(x_u(z+1)),$$
where $J(x_u(z))$ denotes the objective function value of glowworm $u$ at position $x_u(z)$ in the $z$-th iteration; $l_u(z)$ is the current luciferin value of the glowworm; $\rho$ is the luciferin decay constant; and $\gamma$ is the luciferin update rate.
Probability selection: the probability $p_{uv}(z)$ that glowworm $u$ selects and moves toward a brighter individual $v$ within its neighborhood set $N_u(z)$ is
$$p_{uv}(z) = \frac{l_v(z) - l_u(z)}{\sum_{w \in N_u(z)} \big(l_w(z) - l_u(z)\big)}. \qquad (14)$$
In particular, the neighborhood set is $N_u(z) = \{v : d_{uv}(z) < r_d^u(z);\ l_u(z) < l_v(z)\}$, with $0 < r_d^u(z) \le r_s$, where $r_s$ denotes the glowworms' perception radius. Fig. 2 demonstrates the flowchart of the GSO technique.

Figure 2: Flowchart of GSO
Location update: glowworm $u$ moves a step of size $s$ toward the selected neighbor $v$:
$$x_u(z+1) = x_u(z) + s\, \frac{x_v(z) - x_u(z)}{\lVert x_v(z) - x_u(z) \rVert},$$
where $s$ represents the moving step.
Dynamic decision-radius update:
$$r_d^u(z+1) = \min\!\left\{r_s,\ \max\!\left\{0,\ r_d^u(z) + \beta\big(n_t - |N_u(z)|\big)\right\}\right\},$$
where $\beta$ is the rate of change of the neighborhood range and $n_t$ is the desired number of neighbors. The GSO technique thus comprises an initial distribution of glowworms, the luciferin update, glowworm movement, and the decision-range update; a compact sketch of one GSO iteration is given below. To improve the performance of the GSO algorithm, the CGSO algorithm is derived by integrating concepts from chaos theory.
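The following Python sketch implements one GSO iteration covering the four update rules above. The parameter defaults (`rho`, `gamma`, `s`, `beta`, `n_t`) are typical values from the GSO literature, not the authors' settings; `J` is the objective function being maximized.

```python
# One iteration of glowworm swarm optimization (GSO), as a compact sketch.
import numpy as np

def gso_step(positions, l, r_d, J, rng,
             rho=0.4, gamma=0.6, s=0.03, r_s=3.0, beta=0.08, n_t=5):
    n = len(positions)
    # Luciferin update: l_u(z+1) = (1 - rho) l_u(z) + gamma J(x_u(z+1)).
    l = (1 - rho) * l + gamma * np.array([J(x) for x in positions])
    for u in range(n):
        # Neighborhood: closer than the decision radius and brighter than u.
        d = np.linalg.norm(positions - positions[u], axis=1)
        N = [v for v in range(n) if v != u and d[v] < r_d[u] and l[v] > l[u]]
        if N:
            # Probability selection proportional to l_v - l_u, Eq. (14).
            p = np.array([l[v] - l[u] for v in N])
            v = rng.choice(N, p=p / p.sum())
            # Location update: move a step of size s toward neighbor v.
            step = positions[v] - positions[u]
            positions[u] = positions[u] + s * step / np.linalg.norm(step)
        # Dynamic decision-radius update toward the desired neighbor count n_t.
        r_d[u] = min(r_s, max(0.0, r_d[u] + beta * (n_t - len(N))))
    return positions, l, r_d
```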
Chaos theory is a branch of mathematics that deals with nonlinear dynamical processes. Nonlinear means that it is practically impossible to predict the system's response from its input, and dynamic means that the system changes from one state to another over time. Chaotic functions describe a dynamical system using a deterministic formula; however, depending on the initial condition, a chaotic function exhibits divergent behaviour and generates wildly unpredictable values. Chaotic functions therefore improve both the diversification and intensification of optimization methods, i.e., they help avoid local optima and move solutions toward the neighbourhood of the global optimum. Such a function follows simple rules and has few interacting parts, but in every iteration the generated value depends on the initial condition and the preceding values.
In this work, three different chaotic maps are employed, namely the iterative map, the tent map, and the logistic map, for computing the power exponent (p) and sensory modality (c) as in the BOA. These chaotic functions have been found to exhibit higher efficacy compared with other chaotic functions.
Logistic map:
$$x_{t+1} = r\, x_t (1 - x_t),$$
where $x_t$ denotes the value at iteration $t$, and $r$ represents the growth rate, which takes values in [3.0, 4.0].
Iterative map:
$$x_{t+1} = \left|\sin\!\left(\frac{P \pi}{x_t}\right)\right|,$$
where the value of $P$ is chosen between zero and one, and the result $x_t$ is a chaotic parameter that takes values in [0, 1].
Tent map: the tent map is a one-dimensional map similar to the logistic map:
$$x_{t+1} = \begin{cases} 2 x_t, & x_t < 0.5 \\ 2(1 - x_t), & x_t \ge 0.5, \end{cases}$$
where the result $x_t$ is a chaotic variable that takes values in [0, 1].
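For concreteness, the three maps are given below as Python functions. The absolute value in the iterative map is an assumption made to keep its output in [0, 1] as the text states, and the fixed-point caveat is a general property of these maps rather than a detail from the source.

```python
# The three chaotic maps used by the CGSO variant; each returns the next
# chaotic value from the current one. Initial values should avoid the maps'
# fixed points (e.g., x0 = 0.75 is a fixed point of the logistic map at r = 4).
import numpy as np

def logistic_map(x, r=4.0):   # x_{t+1} = r * x_t * (1 - x_t), r in [3.0, 4.0]
    return r * x * (1.0 - x)

def iterative_map(x, p=0.7):  # x_{t+1} = |sin(p * pi / x_t)|, p in (0, 1)
    return abs(np.sin(p * np.pi / x))

def tent_map(x):              # x_{t+1} = 2 x_t if x_t < 0.5 else 2 (1 - x_t)
    return 2.0 * x if x < 0.5 else 2.0 * (1.0 - x)
```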

Experimental Validation
The proposed DL-ALSTM model has been validated using the PubMed dataset [21], which comprises instances in JSON format. The abstract, sections, and body are all sentence-tokenized. Each JSON object includes several fields: article_id, abstract_text, article_text, section_names, and sections.
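A minimal sketch of reading this dataset is shown below, assuming the common one-JSON-object-per-line layout of the PubMed summarization corpus and the fields listed above; the file name is a placeholder.

```python
# Load the PubMed summarization dataset (one JSON object per line).
import json

def load_pubmed(path="pubmed_train.jsonl"):
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            obj = json.loads(line)
            examples.append({
                "id": obj["article_id"],
                "source": obj["article_text"],    # sentence-tokenized body
                "target": obj["abstract_text"],   # sentence-tokenized abstract
            })
    return examples
```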
The accuracy graph of the LSTM model is depicted in Fig. 3. The figure shows that the LSTM model attained increased training and validation accuracy as the epoch count grew. At the same time, it is noticed that the LSTM model resulted in higher validation accuracy than training accuracy. From the tables and figures discussed above, it is apparent that the DL-ALSTM technique is an effective tool for the biomedical text summarization process.

Conclusion
In this study, an effective DL-ALSTM model has been presented for multi-document biomedical text summarization. The proposed DL-ALSTM model initially performs data preprocessing to convert the available medical data into a format compatible with further processing. The DL-ALSTM model is then executed to summarize the contents of the multiple biomedical documents. To tune the summarization performance of the DL-ALSTM model, the CGSO algorithm is employed. Extensive experimental analysis performed on the PubMed dataset confirms the effectiveness of the DL-ALSTM model, and a comprehensive comparative analysis showcases its efficiency against recently presented models. In the future, the performance of the DL-ALSTM model can be improved through the use of advanced hybrid metaheuristic optimization techniques.