Computers, Materials & Continua DOI:10.32604/cmc.2022.021342 | |
Article |
Incremental Learning Framework for Mining Big Data Stream
1Information Systems Department, Faculty of Computers and Information, Mansoura University, Mansoura, 35516, Egypt
2Machine Learning and Information Retrieval Department, Faculty of Artificial Intelligence, Kafrelsheikh University, Kafr El-Sheikh, Egypt
3Department of Computer Science, College of Computers and Information Technology, Taif University, Taif, 21944, Saudi Arabia
*Corresponding Author: Mohammad Dahman Alshehri. Email: alshehri@tu.edu.sa
Received: 29 June 2021; Accepted: 30 July 2021
Abstract: At this current time, data stream classification plays a key role in big data analytics due to its enormous growth. Most of the existing classification methods used ensemble learning, which is trustworthy but these methods are not effective to face the issues of learning from imbalanced big data, it also supposes that all data are pre-classified. Another weakness of current methods is that it takes a long evaluation time when the target data stream contains a high number of features. The main objective of this research is to develop a new method for incremental learning based on the proposed ant lion fuzzy-generative adversarial network model. The proposed model is implemented in spark architecture. For each data stream, the class output is computed at slave nodes by training a generative adversarial network with the back propagation error based on fuzzy bound computation. This method overcomes the limitations of existing methods as it can classify data streams that are slightly or completely unlabeled data and providing high scalability and efficiency. The results show that the proposed model outperforms state-of-the-art performance in terms of accuracy (0.861) precision (0.9328) and minimal MSE (0.0416).
Keywords: Ant lion optimization (ALO); big data stream; generative adversarial network (GAN); incremental learning; renyi entropy
In the advanced digital world, the data streams are generated by different sources, like sensors, social media networks, and internet of things (IoT) devices which are rapidly growing [1,2]. These data streams are characterized based on the changes in data distribution and high velocity with respect to time. As such, several research works are concentrated more on the issues of data stream classification, especially in the non-stationary data. The main issue in the classification process is the utilization of various concept drifts during the changes of data distribution based on time in unforeseen ways [3–6].
The data stream is the manifestation of big data that is characterized by five dimensions (5 V), namely value, variety, veracity, velocity, and volume. Data stream mining is the methodology used to deal with the data analysis of a huge volume of data samples in an ordered sequence [7–13]. Incremental learning follows the paradigm of machine learning methods. In this technique, the learning procedure can take place only when new examples appear and then adjusts to the one that is gained from previous examples. The ensemble learning model uses multiple base learners to integrate the predictions [14–19].
The large volume of data sequence leads to the need for data analysis techniques for special purposes, as it does not require recording the entire data stream in memory [20–23]. A method adopted for analyzing data streams explores the incremental production of informative patterns. This pattern signifies a synthesized vision of data records that were analyzed in the past and it progressively analysis for the availability of new records. On-line and incremental learning methods are used for dealing with the rapid arrival of continuous data, unbounded streams, and time-varying data [24–26]. Online environment is a non-stationary one that copes with the learning issues in big data conditions [27]. Moreover, learning frames are constant to develop more effective feature expression and renewal models. The prediction model requires several parameters and this model can always be rebuilt [28,29].
The main contribution of this paper can be summarized as follow:
• The authors propose an incremental learning framework for big data stream using ant lion fuzzy-generative adversarial network model (ALF-GAN) that provide speed, high efficiency, good convergence, and eliminates local optima.
• The proposed model is carried out in spark architecture that provides high scalability [30], in which master node and slave nodes are considered [31].
• The authors use a tri-model for the feature extraction process, which includes token keyword set, semantic keyword set, and contextual keyword set.
• The authors use renyi entropy for features selection that decreases over-fitting, reduces training time, and improved accuracy.
• The proposed framework is compared to other state-of-the-art methods using standard criteria.
• The authors explore the limitations of the current literature on big data stream mining techniques.
The paper is organized as follows: Section 2 describes the review of various data classification methods. Section 3 shows materials, methods and elaborates the proposed ALF-GAN model, Section 4 presents the results and discussion, the paper concluded in Section 5.
Most of the existing data stream classification methods used ensemble learning methods due to their flexibility in updating the classification scheme, like retraining, removing, and adding the constituent classifiers [32–35]. Most of these methods are trustworthy than the single classifier schemes particularly in the non-stationary environments [5,36]. The dynamic weighted majority (DWM) effectively maintains the ensemble of the classifier with the weighted majority vote model. The DWM dynamically generates and alters the classifiers with respect to concept drifts [37]. In the case of a classifier that misclassifies the instance, the weight is reduced to a certain value that disregards the output of the ensemble. The classifier with a weight less than the threshold value is removed from the ensemble [38–40].
Gupta et al. [41] introduced a scale free-particle swarm optimization (SF-PSO) and multi-class support vector machine (MC-SVM) for data classification. Features selected to minimize time complexity and to enhance the accuracy of classification. The authors validated their approach using six different high dimensional datasets but they failed to use the benefit of particle's learning mechanism. Lara-Benítez et al. [42] introduced an asynchronous dual-pipeline deep learning framework for data streaming (ADLStream). Training and testing process were simultaneously performed under different processes. The data stream was passed into both the layers in which quick prediction was concurrently offered and the deep learning model was updated. This method reduced processing time and obtained high accuracy. Unfortunately, this technique was not suitable if the label for each class instance was not immediately available. Casalino et al. [10] introduced a dynamic incremental semi-supervised fuzzy c-means for data stream classification (DISSFCM). The authors assumed that some of the labeled data that belongs to various classes were available in the form of chunks over time. The semi-supervised model was used to process each chunk that effectively achieved cluster-based classification. The authors increased the quality of classification by partitioning the data into clusters. They failed to integrate small-sized clusters, as they may hamper the cluster quality in terms of interpretability and structure. Ghomeshi et al. [5] introduced an ensemble method for various concept drifts in the process of data stream classification based on particle swarm optimization (PSO) and replicator dynamics algorithm (RD). This method was based on three-layer architecture that generated classification types with varying size. Each of the classification selected some amount of features from number of features in target data stream. This method takes long evaluation time when target data stream contains high number of features.
Yu et al. [29] introduced a combination weight online sequence extreme learning machine (CWEOS-ELM) for data stream classification. This method was evaluated based on the correlation and changing test error. The original weight value was determined using the adaboost algorithm. It forecast the performance using adaptable weight and with various base learners. It doesn't require any user intervention and the learning process was dynamic. This method was more effective but failed to predict for an increasing number of feature space. Gomes et al. [40] introduced an adaptive random forest (ARF) model for the classification of data streams. The authors generally increase the decision tree by training the original data in the re-sampled versions and thereby few features are randomly selected at each node. This method contains adaptive operators and a resampling model that cope with various concept drifts for various datasets. This method was accurate and used a feasible amount of resources. The authors failed to analyze the run-time performance by reducing the number of detectors. Dai et al. [43] introduced a distributed deep learning model for processing big data. They offered distributed training on the computed model of the data system. This method faced large overheads due to the adaptation layer among different schemes. Sleeman et al. [44] introduced an ensemble-based model for extracting the instance level characteristics by analyzing each class. It can learn the data from large-sized skewed datasets with different classes. The authors failed to solve the multi-class imbalanced issues in big data, like extreme class imbalance and class overlapping. Tab. 1 summarizes the studies undertaken for review:
Some of the issues faced by the existing data classification methods are explained as follows:
• The existing methods are not effective to face the issues of learning from imbalanced big data, as they are designed for small-sized datasets. Classification tasks can be more scalable by combining the data with high-performance computing architecture [44–47].
• In data stream classification, it is not pragmatic to suppose that all data are pre-classified. Therefore, there is a need to focus on classifying stream data that are slightly or completely unlabeled [48].
• The increasing rates of big data and the curse of its dimensionality produced difficult tasks in data classification [41,49].
• The adaptation to various concept drifts poses a great challenge for data steam classification when data distribution evolves with respect to time [50,51].
Soft computing (SC) techniques are an assemblage of intelligent, adjustable, and flexible problem-solving methods [52]. They are used in modeling complex real-world problems to achieve tractability, robustness, low solution cost. The singular characteristic of all SC techniques is their capacity for self-tuning, that is, they infer the power of generalization from approximating and learning from experimental data. SC techniques can be divided into five categories which are (machine learning, neural networks, evolutionary computation, fuzzy logic, and probabilistic reasoning) [53,54]. For more reading about SC, Sharma, S, et al. provide a comprehensive review and analysis of supervised learning (SL) and SC techniques for stress diagnosis in humans. They explore the strengths and weaknesses of different SL (support vector machine, nearest neighbors, random forest, and bayesian classifier) and SC (deep learning, fuzzy logic, and nature-inspired) [55]
3.2 Spark Architecture Based Big Data Stream Approach
This section describes the big data streaming approach using the proposed ALF-GAN in spark architecture. Most of the classification process requires a complete dataset to be loaded in memory before starting the process [56]. For online data classification, where the dataset size is extremely large, there is a need for scalability which is supported by spark architecture. The process of incremental learning is performed by considering the input data as big data. The proposed incremental learning process includes the phases, like pre-processing, feature extraction, feature selection, and incremental learning which are progressed in spark architecture. Let us consider n the number of input data B to be passed to the n number of slave nodes in order to perform pre-processing and feature extraction processes. From slave nodes, the extracted features are fed to the master node to achieve feature selection and the incremental learning process. Finally, the class output for each input of big data is computed at slave nodes. Fig. 1 represents the schematic diagram of the proposed ALF-GAN model on spark architecture.
3.3 Pre-Processing of Input Big Data
The input data considered to perform the learning process is defined as,
where
1. Stop word removal: it reduced the text count and enables to increase the system's performance.
2. Stemming: it is used to minimize variant word forms into a common representation termed as root.
3. Tokenization: it is the exploration of words in the sentence. The key role of tokenization is to identify meaningful words. The pre-processed data is represented as D.
3.4 Feature Extraction Using Tri-Model
The pre-processed data D is passed to the feature extraction phase, where features are extracted using a tri-model. It is designed by considering the techniques, like token keyword set, semantic keyword set, and contextual keyword set [57,58].
Token keyword set: it represents the word with a definite meaning. A paragraph may contain up to six tokens in the keyword set.
Semantic keyword set: in this phase word dictionary with two semantic relations is built. It represented as,
where, d represents keywords, b and c are the synonym and hyponym word.
Contextual keyword set: it identifies the related words by eliminating the word from irrelevant documents. It identifies the context terms and semantic meaning to form relevant context. Key terms are the initial indicators and the context terms are specified as validators to find whether the key terms are the indicators. Let us consider the training data as D, key term as
Identification of key term: let us consider the language model as M such that for each term, the keyword measure is computed as,
where
Identification of context term: for each key term, the contextual terms are needed to compute separately. The key term instances for both the relevant and irrelevant documents are computed and the sliding window W is applied to extract the key term around D. The relevant terms are denoted as
where
3.5 Feature Selection Using Renyi Entropy
After extracting the features from big data, feature selection becomes essential as it is complicated to mine and transform the huge volume of data into valuable insights [59]. The unique and the important features are selected using renyi entropy measure. It is defined as the generalization of shannon entropy that depends on the parameter r is given as,
The features selected using the entropy measure are represented as R with the dimension of
3.6 Incremental Learning Using Ant Lion Fuzzy-Generative Adversarial Network
Once, features are selected, the process of incremental learning is accomplished using the proposed ALF-GAN model. The significance of incremental learning is that while adding new samples, it does not require retraining all the samples, and hence it reduces the cost of training time and memory consumption.
The learning steps of the proposed ALF-GAN are presented in Fig. 2. Initially, the input data is considered as
The architecture of GAN is represented in Fig. 3. GAN [60] is an efficient network to learn the generative model from the unlabeled data. The benefit of using GAN is its ability to generate accurate and sharp distributions, and it does not require any form of functionality for training the generator network. GAN is composed of two models, namely the generator H and the discriminator C model, respectively. Let us consider the input H as the random noise
3.6.2 Fuzzy Ant Lion Optimization Algorithm
The algorithmic steps of the proposed ALF-GAN are explained as follows:
i) Initialization: let us consider the weights are initialized randomly.
ii) Error estimation: after computing the loss function of GAN, the error value
iii) Fuzzy bound computation: when a new data St+1is applied to the network, the error Qt+1 is computed. If
where,
where, E denotes fuzzy bound, and
where,
iv) Update weights: ALO algorithm [61–63] is used to select the optimal weight.
v) Termination: steps are repeated until the best solution is obtained. Algorithm 1 represents the pseudo-code of the proposed ALF-GAN.
In this section the authors discuss the datasets used in the implementation process of the proposed framework:
• WebKB dataset [64]: this dataset is collected from Mark craven's website. It consists of web pages and hyperlinks from different departments of computer science, namely the University of Texas, University of Wisconsin, University of Washington, and Cornell University.
• 20 Newsgroup dataset [65]: it is the popular dataset considered for the purpose of experiments that includes text applications, like text clustering and text classification. Newsgroup dataset includes 20, 000 newsgroup documents that are equally partitioned into 20 various newsgroups.
• Reuter dataset [66]: the total number of instances available in the dataset is 21578 without any missing values. The task used for the association purpose is classification and the attributes are ordered based on their categories. The characteristic features of the dataset are text.
The experiments carried out with the proposed ALF-GAN and result acquired to prove the effectiveness of this model is presented in this section.
The proposed ALF-GAN is developed in the PYTHON tool using WebKB, 20 Newsgroup, and Reuter datasets to conduct numerical experiments for evaluating the effectiveness of the ALF-GAN model.
The performance of the proposed ALF-GAN is analyzed by considering the metrics, like accuracy, MSE, and precision which are shown in Tab. 2.
The competitive methods used for analyzing the performance of the proposed model are scale-free particle swarm optimization (SF-PSO) [41], asynchronous dual-pipeline deep learning framework for data streaming (ADLStream) [42], and dynamic incremental semi-supervised fuzzy c-means (DISSFCM) [10].
The analysis made to show the effectiveness of the proposed framework by considering three different datasets is presented in this section.
4.4.1 Analysis Using 20 Newsgroup Dataset
Tab. 3 and Fig. 4 demonstrate and depict the analysis made using the proposed ALF-GAN based on the Newsgroup dataset. The analysis made with accuracy metric is shown in Fig. 4a. By considering the chunk size as 2, accuracy measured by the conventional SF-PSO, ADLStream, and DISSFCM is 0.6389, 0.6683, and 0.6992, while the proposed ALF-GAN computed the accuracy of 0.7068, respectively. When the chunk size is considered as 3, the accuracy obtained by the existing SF-PSO, ADLStream, and DISSFCM is 0.7008, 0.7114, and 0.7224, whereas the proposed ALF-GAN obtained higher accuracy of 0.72971, respectively. By considering the chunk size as 4, accuracy measured by the conventional SF-PSO, ADLStream, and DISSFCM is 0.7638, 0.7646, and 0.772758, while the proposed ALF-GAN computed the accuracy of 0.7774, respectively. When the chunk size is considered as 5, the accuracy of traditional SF-PSO, ADLStream, and DISSFCM is 0.8288, 0.8290, and 0.8311, while the proposed ALF-GAN achieved the accuracy of 0.8413, respectively.
The analysis carried out with MSE metric is portrayed in Fig. 4b. By considering the chunk size as 2, MSE measured by the conventional SF-PSO, ADLStream, and DISSFCM is 0.6652, 0.5861, and 0.2202, while the proposed ALF-GAN computed the MSE of 0.1463, respectively. The MSE measured by the conventional SF-PSO, ADLStream, and DISSFCM by considering the chunk size 3 is 0.3035, 0.1700, and 0.1164, while the proposed ALF-GAN computed the MSE of 0.1145, respectively. By considering the chunk size as 4, MSE measured by the conventional SF-PSO, ADLStream, and DISSFCM is 0.1528, 0.1390, and 0.0848, while the proposed ALF-GAN computed the MSE of 0.0703, respectively. When the chunk size is considered as 5, MSE measured by the conventional SF-PSO, ADLStream, and DISSFCM is 0.0724, 0.0544, and 0.0479, while the proposed ALF-GAN computed the MSE of 0.0416, respectively.
The analysis made with the precision metric is depicted in Fig. 4c. By considering the chunk size as 2, precision computed by the conventional SF-PSO, ADLStream, and DISSFCM is 0.76843, 0.7865, and 0.8065, while the proposed ALF-GAN computed the precision of 0.8109, respectively. By considering the chunk size as 3, precision computed by the conventional SF-PSO, ADLStream, and DISSFCM is 0.8165, 0.8229, and 0.8295, while the proposed ALF-GAN computed the precision of 0.83419, respectively. The precision obtained by the existing SF-PSO, ADLStream, and DISSFCM is 0.86312, 0.8635, and 0.8682, while the proposed ALF-GAN has the precision of 0.88099 for chunk size 4. By considering the chunk size as 5, precision computed by the conventional SF-PSO, ADLStream, and DISSFCM is 0.9053, 0.9055, and 0.9066, while the proposed ALF-GAN computed the precision of 0.91225, respectively.
4.4.2 Analysis Using Reuter Dataset
Tab. 4 and Fig. 5 demonstrate and depict the analysis made using the proposed ALF-GAN based on the Reuter dataset. The analysis made with accuracy metric is shown in Fig. 5a. By considering the chunk size as 2, accuracy measured by the conventional SF-PSO, ADLStream, and DISSFCM is 0.6929, 0.6961, and 0.7067, while the proposed ALF-GAN computed the accuracy of 0.72193, respectively. The accuracy of existing SF-PSO, ADLStream, and DISSFCM is 0.7047, 0.7144, and 0.7291, while the proposed ALF-GAN measure the accuracy of 0.7426 for chunk size 3. When the chunk size is considered as 4, the accuracy of existing SF-PSO, ADLStream, and DISSFCM is 0.781107, 0.7817, and 0.7903, while the proposed ALF-GAN computed the accuracy of 0.7907, respectively. By considering the chunk size as 5, accuracy measured by the conventional SF-PSO, ADLStream, and DISSFCM is 0.8383, 0.8471, and 0.8489, while the proposed ALF-GAN computed the accuracy of 0.8610, respectively.
The analysis carried out with MSE metric is portrayed in Fig. 5b. When the chunk size is considered as 2, MSE measured by traditional SF-PSO, ADLStream, and DISSFCM is 0.609, 0.4359, and 0.3585, whereas the proposed ALF-GAN measured the MSE of 0.356, respectively. The MSE measured by SF-PSO, ADLStream, and DISSFCM is 0.3384, 0.2357, and 0.14539, while the proposed ALF-GAN computed the MSE of 0.1441, for chunk size 3. By considering the chunk size as 4, MSE measured by the conventional SF-PSO, ADLStream, and DISSFCM is 0.2247, 0.13548, and 0.09774, while the proposed ALF-GAN computed the MSE of 0.0611, respectively. When the chunk size is considered as 5, MSE obtained by the existing SF-PSO, ADLStream, and DISSFCM is 0.0907, 0.07093, and 0.0668, whereas proposed ALF-GAN obtained the MSE of 0.0202, respectively.
The analysis made with the precision metric is depicted in Fig. 5c. By considering the chunk size as 2, precision computed by the conventional SF-PSO, ADLStream, and DISSFCM is 0.8078, 0.8097, and 0.8169, while the proposed ALF-GAN computed the precision of 0.82023, respectively. When the chunk size is considered as 3, precision computed by the conventional SF-PSO, ADLStream, and DISSFCM is 0.8214, 0.8276, and 0.8365, while the proposed ALF-GAN computed the precision of 0.8449, respectively. The precision achieved by the existing SF-PSO, ADLStream, and DISSFCM is 0.87453, 0.8749, and 0.8799, while the proposed ALF-GAN achieved the precision of 0.8900 for chunk size 4. By considering the chunk size as 5, precision computed by the conventional SF-PSO, ADLStream, and DISSFCM is 0.9111, 0.91603, and 0.91703, while the proposed ALF-GAN computed the precision of 0.9382, respectively.
4.4.3 Analysis Using WebKB Dataset
Tab. 5 and Fig. 6 demonstrate and depict the analysis made using the proposed ALF-GAN based on the WebKB dataset. The analysis made with accuracy metric is shown in Fig. 6a. By considering the chunk size as 2, accuracy measured by the conventional SF-PSO, ADL Stream, and DISSFCM is 0.48139, 0.4948, and 0.6687, while the proposed ALF-GAN computed the accuracy of 0.7442, respectively. When the chunk size is considered as 3, accuracy measured by the conventional SF-PSO, ADLStream, and DISSFCM is 0.48039, 0.5065, and 0.6917, while the proposed ALF-GAN computed the accuracy of 0.75150, respectively. The accuracy achieved by existing SF-PSO, ADLStream, and DISSFCM for chunk size 4 is 0.4832, 0.49754, and 0.6293, while the proposed ALF-GAN computed the accuracy of 0.75038, respectively. By considering the chunk size as 5, accuracy measured by the conventional SF-PSO, ADLStream, and DISSFCM is 0.4857, 0.5081, and 0.70018, while the proposed ALF-GAN computed the accuracy of 0.7625, respectively.
The analysis carried out with MSE metric is portrayed in Fig. 6b. By considering the chunk size as 2, MSE measured by the conventional SF-PSO, ADLStream, and DISSFCM is 0.3404, 0.23806, and 0.2086, while the proposed ALF-GAN computed the MSE of 0.2018, respectively. When the chunk size is considered as 3, MSE achieved by existing SF-PSO, ADLStream, and DISSFCM is 0.2531, 0.2505, and 0.1422, whereas proposed ALF-GAN computed the MSE of 0.1146. The MSE of SF-PSO, ADLStream, and DISSFCM is 0.3158, 0.2566, and 0.2211, while the proposed ALF-GAN computed the MSE of 0.1046 for chunk size 4. By considering the chunk size as 5, MSE measured by the conventional SF-PSO, ADLStream, and DISSFCM is 0.3108, 0.2361, and 0.1323, while the proposed ALF-GAN computed the MSE of 0.0835, respectively.
The analysis made with the precision metric is depicted in Fig. 6c. By considering the chunk size as 2, precision computed by the conventional SF-PSO, ADLStream, and DISSFCM is 0.4832, 0.4844, and 0.4914, while the proposed ALF-GAN computed the precision of 0.49754, respectively. When the chunk size is considered as 3, the precision measured by existing SF-PSO, ADLStream, and DISSFCM is 0.48139, 0.4858, and 0.49488, whereas the proposed ALF-GAN computed the precision of 0.49775. By considering the chunk size as 4, precision computed by the conventional SF-PSO, ADLStream, and DISSFCM is 0.4857, 0.4865, and 0.4946, while the proposed ALF-GAN computed the precision of 0.5081, respectively. The precision obtained by existing SF-PSO, ADLStream, and DISSFCM is 0.4803, 0.4828, and 0.5065, while the proposed ALF-GAN computed the precision of 0.51388 for chunk size 5.
These results show the higher performance of the proposed framework and prove that it provides higher accuracy and precision than comparative methods with lower MSE.
Tab. 6 portrays a comparative discussion of the proposed ALF-GAN model. From the below table, it is clearly showed that the proposed ALF-GAN model obtained better performance with the Reuter dataset for the metrics of accuracy, MSE, and precision. By considering the chunk size as 5, accuracy measured by the conventional SF-PSO, ADLStream, and DISSFCM is 0.8383, 0.8471, and 0.8489, while the proposed ALF-GAN computed the accuracy of 0.8610, respectively. By considering the chunk size as 5, MSE measured by the conventional SF-PSO, ADLStream, and DISSFCM is 0.0907, 0.07093, and 0.0668, while the proposed ALF-GAN computed the MSE of 0.0202, respectively. By considering the chunk size as 5, precision computed by the conventional SF-PSO, ADLStream, and DISSFCM is 0.91119, 0.91603, and 0.91703, while the proposed ALF-GAN computed the precision of 0.9382, respectively.
This paper provides a framework for big data stream classification in spark architecture using ALF-GAN. The proposed model achieved many merits such as scalability through using spark architecture. The incremental learning model provides high accuracy and has the ability to deal with the rapid arrival of continuous data. It uses GAN that can classify data streams that are slightly or completely unlabeled data. Renyi entropy is used to select features that decrease over-fitting, reduces training time, and improved accuracy. ALO algorithm provides speed, high efficiency, good convergence, and eliminate local optima The results showed that the proposed ALF-GAN obtained maximal accuracy which is 0.8610, precision is 0.9382, and minimal MSE value of 0.0416. The future work of research would be the enhancement of classification performance by considering some other optimization methods.
Funding Statement: Taif University Researchers Supporting Project Number (TURSP-2020/126), Taif University, Taif, Saudi Arabia.
Conflicts of Interest: The authors declare that there is no conflict of interest regarding the publication of the paper.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |