Enhanced Neuro-Fuzzy-Based Crop Ontology for Effective Information Retrieval

Ontology is the process of interpreting the concepts of an information domain for a group of users. Introducing ontology into information retrieval (IR) helps augment the search for user-required relevant information. Conventional keyword-matching-based IR relies on advanced algorithms for recovering facts from the Internet, mapping the connection between keywords and information, and categorizing the retrieval outcomes. The prevailing IR procedures consume considerable time and cannot retrieve information proficiently. In this study, a modified neuro-fuzzy algorithm (MNFA) is applied to mitigate the IR time and enhance the retrieval accuracy, overcoming the above-stated downsides. The proposed method encompasses three phases: i) development of a crop ontology, ii) implementation of the IR system, and iii) processing of the user query. In the initial phase, a crop ontology is developed and evaluated by gathering crop information. In the next phase, a hash tree is constructed using closed frequent patterns (CFPs), and the MNFA is used to train the database. In the last phase, the CFP of a given user query is calculated, and similarity-assessment results are retrieved from the database. The performance of the proposed system is measured and compared with that of existing techniques. Experimental results demonstrate that the proposed MNFA attains an accuracy of 92.77% for simple queries and 91.45% for complex queries.


Introduction
The information on the Web is growing at a vast speed with the progressive improvement in information technology. Although active research has been conducted for over 30 years, information retrieval (IR) became omnipresent only with the initiation of the World Wide Web. Retrieving information proficiently and precisely has therefore become highly imperative [1]. Although most IR systems depend on ontologies, they frequently practice one of the two following extreme methodologies [2]: either they utilize the maximum of the ontology's semantic expressiveness and, hence, necessitate intricate query languages, or they exploit only a small fraction of the ontology, effectively reducing the search to keyword matching.

Related Work
Ming et al. [18] recommended a methodology for hastening semantic object search and vegetable-trading information detection by utilizing a Steiner tree (ST). A series of ontology-construction methodologies was established through exploration in accordance with a domain ontology for vegetable transaction facts. Jena2 proffered a rule-based reasoning engine. The results indicated that the recall and precision of the ontology-based IR system were much superior to those of a keyword-based IR system. However, the IR efficiency was extremely low.
Rajendran et al. [19] introduced a multilevel object relational similarity (MORS)-based image retrieval algorithm. Manifolds were trained by extracting the objects and supporting the labeling with a supervised learning procedure. Concerning the MORS value, a single class was acknowledged [20], and the result for the acknowledged semantic class was returned. The presented method augmented the image-mining performance; nonetheless, it rendered a high false ratio in the retrieval procedure [21].
Sayed et al. [22] proposed an ontological search engine termed Ibri College of Applied Sciences Engine Ontology (IBRI-CASONTO) for the Colleges of Applied Sciences, Oman. This engine supported the Arabic and English languages [23]. It employed two sorts of exploration: a keyword-based search and a semantic-based search. IBRI-CASONTO was regarded as a divergent technology of Resource Description Framework data along with an ontological graph. Thus, it rendered some unrelated data in a big dataset [24].
Selvalakshmi et al. [25] suggested a semantic IR system that employed feature selection and classification to ameliorate the relevancy score (RS). An intelligent fuzzy rough set-based feature-selection algorithm and an intelligent ontology and Latent Dirichlet Allocation-based semantic IR algorithm were employed for IR [26]. Outcomes exhibited that the system augmented the RS to 98%. Nevertheless, the main downside of this system was that it was unsuitable for a big-data environment.

Modified Fuzzy Algorithm and Crop Ontology-Based IR
To achieve rapid and efficacious IR, a modified neuro-fuzzy-based crop ontology system is recommended. The proposed work encompasses three strides: i) development of a crop ontology, ii) implementation of the IR system, and iii) processing of the user query. The first progression of the proposed method is dataset creation, in which the appropriate information concerning farmers and their crops is accumulated from various resources. Next, these data are preprocessed by restructuring and repositioning them into a more understandable form in the Apache Jena Fuseki database. Then, a crop ontology is established by performing knowledge acquisition, OWL file generation, visualization, and ontology evaluation. The derived files are again saved in the Apache Jena Fuseki database. Subsequently, the IR system is implemented by performing the following operations on the dataset: establishing closed frequent patterns (CFPs), generating hash codes, and applying a modified neuro-fuzzy algorithm (MNFA) for IR. CFPs are established for the data values, and hash values are then created for all CFPs by using the Secure Hash Algorithm (SHA) 512. Lastly, a hash tree is generated from the hash values of the CFPs. For the information retrieval step, MNFA is applied, in which k-medoids clustering is combined with neuro-fuzzy classification. The architecture of the proposed technique is presented in Fig. 1.

Development of a Crop Ontology
This phase comprises three main steps: knowledge acquisition, ontology development, and ontology evaluation. First, the information regarding each crop, its diseases, and the solutions for those diseases is accumulated from the website. This sort of data collection for crops is known as knowledge acquisition. Once knowledge acquisition is done, the crop ontology is advanced by creating an OWL file, which comprises massive facts about the accumulated crops. Based on the semantic similarity between the indexed data and the user query, the ontology-based IR system retrieves data. Consequently, only the pertinent data are retrieved during the process of information retrieval, and the retrieval period is decreased. After the OWL file creation, the file is visualized. For file creation and visualization, the proposed system utilizes Protégé integrated with the Eclipse IDE platform. Protégé is a free, open-source ontology editor and knowledge management system. It proffers a graphical user interface for delineating ontologies. Similar to Eclipse, Protégé is a framework for which various projects provide plug-ins. The application is written in Java and uses Swing heavily for creating the user interface. The crop ontology generated in the recommended system by utilizing the Protégé tool is demonstrated in Fig. 2.
Then, the ontology evaluation phase is accomplished by extracting the values from the OWL file formed in the foregoing stage and utilizing the reasoner in the Protégé tool. Reasoning is the task of deriving implicit facts from a set of proffered explicit facts. The derived details are stored in the Apache Jena Fuseki dataset.

Dataset Creation
The initial phase of the proposed technique is data collection. Here, specifics of crops and farmers are collected from divergent resources and kept as a dataset. The dataset comprises two sorts of information. They are the details of farmers and crops. The farmers' details comprise farmer's name, address, contact number, and other information; the crops' details comprise the locality of the paddy field, yield, the duration of beginning and ending, paddy type, and the length of paddy growth.

Preprocessing
Preprocessing in the proposed method encompasses reorganizing and prearranging the farmers' and crops' details. During this phase, after the farmers' data are analyzed, the following data are removed from the dataset:

Execution of the IR System
The execution of the IR system consists of three processes: creating CFPs, generating hash codes, and applying MNFA. First, CFPs are established from the derived values (crops' and farmers' details). Next, the hash code for every CFP is created by utilizing the SHA 512 algorithm, and a hash tree is created from the hash values of the CFPs. A hash value is principally utilized in the hash tree for indexing. Every leaf node holds the indexed CFPs and a matching hash value. Lastly, MNFA is utilized for powerful IR. Each progression is elucidated in the following sections.

Constructing CFPs
To detect the CFPs in the dataset, frequent patterns (FPs) are first established. From the FPs, the CFPs are found. The n patterns in the dataset are expressed as D_s = {P_1, P_2, ..., P_n}, where D_s signifies the dataset and P_n denotes the n-th pattern in the dataset.
Afterward, the FPs of the dataset are computed. The frequency of a pattern refers to the number of occurrences of that precise pattern in the dataset; it is also acknowledged as the count or support of the pattern. Formally, F(P_n) = |{P ∈ D_s : P_n ⊆ P}|, where F(P_n) indicates the frequency of pattern P_n.
A frequent pattern is closed if it has no proper superset with the same frequency. The CFPs of the dataset are expressed as C(P_n), where C(P_n) symbolizes the CFPs of the dataset.
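As an illustration of the definitions above, the following Java sketch computes the supports of all itemsets in a toy transaction database and keeps only the closed ones. The brute-force enumeration and the sample crop-attribute transactions are our own simplifications, not the paper's implementation:

```java
import java.util.*;

public class ClosedFrequentPatterns {
    // support(pattern) = number of transactions containing every item of the pattern
    static int support(Set<String> pattern, List<Set<String>> db) {
        int c = 0;
        for (Set<String> t : db) if (t.containsAll(pattern)) c++;
        return c;
    }

    // Enumerate all frequent itemsets (support >= minSup) by brute force over the
    // power set of observed items, then keep only the closed ones: a frequent
    // pattern is closed if no proper superset has the same support.
    static Map<Set<String>, Integer> closedFrequent(List<Set<String>> db, int minSup) {
        Set<String> items = new TreeSet<>();
        db.forEach(items::addAll);
        List<String> itemList = new ArrayList<>(items);
        Map<Set<String>, Integer> frequent = new HashMap<>();
        for (long mask = 1; mask < (1L << itemList.size()); mask++) {
            Set<String> pattern = new HashSet<>();
            for (int i = 0; i < itemList.size(); i++)
                if ((mask & (1L << i)) != 0) pattern.add(itemList.get(i));
            int sup = support(pattern, db);
            if (sup >= minSup) frequent.put(pattern, sup);
        }
        Map<Set<String>, Integer> closed = new HashMap<>();
        for (Map.Entry<Set<String>, Integer> e : frequent.entrySet()) {
            boolean isClosed = true;
            for (Map.Entry<Set<String>, Integer> f : frequent.entrySet())
                if (f.getKey().size() > e.getKey().size()
                        && f.getKey().containsAll(e.getKey())
                        && f.getValue().equals(e.getValue())) { isClosed = false; break; }
            if (isClosed) closed.put(e.getKey(), e.getValue());
        }
        return closed;
    }

    public static void main(String[] args) {
        List<Set<String>> db = List.of(
            Set.of("paddy", "kharif", "delta"),
            Set.of("paddy", "kharif"),
            Set.of("paddy", "rabi"));
        System.out.println(closedFrequent(db, 2));
    }
}
```

In this toy database, {paddy} and {paddy, kharif} are closed, whereas {kharif} is not, because its superset {paddy, kharif} has the same support.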

Hash Code Generation
After CFP discovery, the hash value for every CFP is created by utilizing the SHA 512 algorithm. The SHA 512 algorithm is a hash algorithm that utilizes a one-way hash function. It is an advanced version of the prevailing hash algorithms SHA 0, SHA 1, SHA 256, and SHA 384. The SHA 512 hash function accepts input data of any size and creates a 512-bit message digest using a 1024-bit block length. First, the message bits are padded with extra bits to form a multiple of 1024 bits. Next, the padded message is split into blocks of 1024 bits. The first block is combined with the initialization vector, and a hash code is created; each subsequent block is combined with the previously generated hash code. Afterward, a hash tree is initialized, and the obtained hash code values are traversed to the hash tree's leaf nodes. Whether a leaf node is full is ascertained; if not, the hash value is inserted into the hash tree. The insertion of each hash value into the hash tree is performed in this way. Lastly, a hash tree is built using the formed hash values linked with the CFPs. Hash values are utilized for indexing in the hash tree. Each leaf node signifies the CFPs indexed with an associated hash value.
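For illustration, Java's standard `MessageDigest` API provides SHA-512 directly, so each serialized CFP can be reduced to a fixed 512-bit digest; the hex encoding and the sample CFP string are illustrative:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha512Demo {
    // Returns the SHA-512 digest of the input as a lowercase hex string.
    static String sha512Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-512");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-512 is mandatory in every JRE
        }
    }

    public static void main(String[] args) {
        // A CFP serialized as a string maps to a fixed 512-bit (128 hex chars) value.
        System.out.println(sha512Hex("paddy,kharif"));
    }
}
```

The resulting digests can then serve as the leaf-node index keys of the hash tree described above.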

MNFA Application
All CFPs from the leaf nodes are provided as the input to MNFA. The recommended MNFA is an amalgamation of two practices: the k-medoids algorithm (KMA) and the neuro-fuzzy algorithm (NFA). KMA is combined with NFA to enhance the significance of IR in the proposed work; thus, the proposed IR system is termed MNFA. In MNFA, the disarranged CFPs are first clustered via KMA. Next, the clustered CFPs are provided as the input to NFA, in which rules are created, and the hash values with specific CFPs are tested against the generated rules. The phases in MNFA are explicated as follows.
KMA functions in two phases: build and swap. First, k centrally located objects are chosen and regarded as the initial medoids. Next, KMA examines the following condition: if the objective function is reduced by substituting (swapping) a certain medoid with a nonselected object, then the swap is implemented. This process is repeated until the objective function can no longer be reduced. The KMA steps are elucidated below.
Step 1: k random points are chosen as the medoids from the listed n CFPs of the dataset.
Step 2: Each data point is associated with the closest medoid by utilizing a distance metric. The distance dist_i between every pair of CFPs is computed based on the selected dissimilarity measure.
where CFP_1i and CFP_2i denote the two data points of CFP_i. The cluster centroids s_i are calculated from the distances CFP_ic between data points i and c. The j CFPs having the j smallest values are selected as the initial medoids. The initial cluster outcome is acquired by assigning every CFP value to the nearest medoid.
Step 3: The cost of the configuration is decreased.
For each selected medoid S_m and nonselected object NS_m, S_m and NS_m are swapped, each data point is reassigned to the closest medoid, and the total swapping cost TS_c (the sum of the distances of points to their medoids) is recalculated.
If TS_c < 0, S_m is replaced with NS_m. If the cost of the configuration is increased in Step 3.1, then the swap is undone.
Step 4: The present medoid in every cluster is updated via replacement with a new medoid. Every object is allocated to the closest medoid, and the result of the cluster is obtained.
Step 5: The distances from every CFP to its medoid are summed. The algorithm stops if the sum is equivalent to the former one; otherwise, Step 2 is repeated. Lastly, K clusters are attained. The pseudocode of the k-medoids algorithm is presented in Fig. 3.
By following the above steps of KMA, the hash code values are grouped, and the KMA outcomes are integrated into the neuro-fuzzy system for IR. A neuro-fuzzy system is basically a fuzzy system that utilizes a learning algorithm motivated by neural network theory to determine its parameters (fuzzy sets and rules) by processing data samples. A neuro-fuzzy system is always elaborated as a system of fuzzy rules. It can be generated from scratch by training on data, and it can also be initialized with prior knowledge in the form of fuzzy rules. Such systems are generally characterized as special multilayer feed-forward neural networks.
The NFA structure contains five layers. Here, the hash-value-clustered data for a specific CFP resulting from the former step are the inputs to the first layer of NFA. Among the five layers, the first and fourth layers have adaptive nodes, whereas the other layers comprise fixed nodes. The information concerning a farmer or crop is precisely retrieved by utilizing NFA. The two elementary rules of NFA are stated in the following equations.
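The k-medoids clustering of Steps 1 through 5 above can be sketched compactly in Java. The 1-D points standing in for numeric hash codes, the greedy swap order, and the absolute-difference distance are our own simplifying assumptions:

```java
import java.util.*;

public class KMedoids {
    // Total distance of all points to their nearest medoid (the swap cost TS_c).
    static double cost(double[] pts, Set<Integer> medoids) {
        double total = 0;
        for (double p : pts) {
            double best = Double.MAX_VALUE;
            for (int m : medoids) best = Math.min(best, Math.abs(p - pts[m]));
            total += best;
        }
        return total;
    }

    // Build/swap k-medoids: start from k initial medoids, then repeatedly swap a
    // medoid with a non-medoid whenever the swap lowers the total cost (Steps 2-5).
    static Set<Integer> cluster(double[] pts, int k) {
        Set<Integer> medoids = new TreeSet<>();
        for (int i = 0; i < k; i++) medoids.add(i); // Step 1: initial medoids
        boolean improved = true;
        while (improved) {
            improved = false;
            double current = cost(pts, medoids);
            search:
            for (int m : new ArrayList<>(medoids)) {
                for (int c = 0; c < pts.length; c++) {
                    if (medoids.contains(c)) continue;
                    Set<Integer> trial = new TreeSet<>(medoids);
                    trial.remove(m);
                    trial.add(c);
                    if (cost(pts, trial) < current) { // accept only cost-reducing swaps
                        medoids = trial;
                        improved = true;
                        break search;                 // re-evaluate from the new configuration
                    }
                }
            }
        }
        return medoids;
    }

    public static void main(String[] args) {
        double[] pts = {1.0, 1.2, 0.9, 10.0, 10.3, 9.8};
        System.out.println(cluster(pts, 2)); // indices of the two chosen medoids
    }
}
```

Each accepted swap strictly decreases the cost, so the loop terminates, matching the stopping condition of Step 5.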
where A_i, T_i, A_{i+1}, and T_{i+1} denote the fuzzy sets. H_0 and H_1 present the divergent clustered hash values derived from KMA. The x_i, y_i, z_i, x_{i+1}, y_{i+1}, and z_{i+1} values are the parameter set. The layers in NFA are elucidated as follows. Layer 1: This layer is termed the fuzzification layer. Every node in this layer is an adaptive node with a node function.
where H_i is the input to node i. Every node adapts through a function parameter. The output of each node is the grade of membership given by the membership function (MF) for the input. The MF utilized in NFA is the bell MF, as indicated in the succeeding equation.
where x_i, y_i, and z_i are the MF parameters, which determine the shape of the MF. These parameters are referred to as the premise parameters.
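The bell MF with premise parameters x_i, y_i, and z_i can be sketched as follows. Mapping x to the width, y to the slope, and z to the center follows the usual generalized-bell convention and is our assumption about the paper's notation:

```java
public class BellMembership {
    // Generalized bell membership function: 1 / (1 + |(h - z) / x|^(2y)),
    // where x is the width, y the slope, and z the center of the bell.
    static double bell(double h, double x, double y, double z) {
        return 1.0 / (1.0 + Math.pow(Math.abs((h - z) / x), 2.0 * y));
    }

    public static void main(String[] args) {
        System.out.println(bell(5.0, 2.0, 2.0, 5.0)); // 1.0 at the center z = 5
        System.out.println(bell(9.0, 2.0, 2.0, 5.0)); // decays away from the center
    }
}
```

Because the premise parameters are adaptive, training adjusts x, y, and z so that each node's membership grade fits the clustered hash-value inputs.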
Layer 2: Each node in this layer is a fixed node marked as Π, whose output is the product of all incoming signals.
The output of this layer, L_{2,i}, indicates the firing strength (FS) of a rule.
Layer 3: Every node in this layer is a fixed node marked as N. These nodes compute the ratio of the i-th rule's FS to the sum of all the rules' FSs. The outcome is marked as the normalized FS. For convenience, the output of this layer is termed the normalized firing strength.
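Under the standard ANFIS formulation, which we assume the paper follows, the Layer-3 normalization can be written as:

```latex
B_i \;=\; \frac{FS_i}{\sum_{j=1}^{n} FS_j}, \qquad i = 1, \dots, n,
```

where FS_i is the firing strength of the i-th rule produced by Layer 2 and B_i is its normalized counterpart passed on to Layer 4.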
Layer 4: Every node in this layer is an adaptive node. Here, B_i suggests the normalized FS from the prior layer, and Rules_i indicates the i-th rule of the system. The employed parameters are labeled the consequent parameters.
Layer 5: The single node of this layer is a fixed node labeled Σ, which computes the overall output as the sum of all incoming signals.

Processing of User Query
After the implementation of the IR system, user query processing is performed by utilizing MNFA. The user's query, entered through the Semantic Web search engine, is provided as input. The testing procedure of the proposed method is similar to the training process. Here, the user input query is preprocessed, and the CFPs for the input query are established. Next, the SHA 512 algorithm is applied to the CFPs to derive hash values, and a hash tree is created from the generated hash values. Then, the CFP hash values of the input query are compared with the trained database, and the result is retrieved by utilizing MNFA. An assessment of the work is accomplished by utilizing the PageRank (PR) algorithm, given that the work is centered on the Semantic Web search engine.
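The query-time flow described above can be sketched as follows: the query's CFP is hashed with SHA-512 and the digest is looked up in the indexed database. The in-memory map standing in for the trained database, and all names and records, are illustrative assumptions rather than the paper's implementation:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class QueryLookup {
    // SHA-512 digest of a serialized CFP, as a lowercase hex string.
    static String digest(String s) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-512").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(d.length * 2);
            for (byte b : d) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> index = new HashMap<>();   // digest -> indexed crop record
        index.put(digest("paddy,kharif"), "Paddy: kharif season, 120-day duration");
        String queryCfp = "paddy,kharif";              // CFP extracted from the user query
        System.out.println(index.getOrDefault(digest(queryCfp), "no match"));
    }
}
```

Because training and testing share the same CFP-to-digest mapping, a query whose CFP was seen during training resolves directly to its indexed record.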

Result and Discussion
The recommended IR methodology utilizing MNFA is implemented on the Java working platform.

Performance Assessment
In this subsection, the performance of the proposed MNFA is compared with that of the prevailing techniques, IBRI-CASONTO and ST, regarding precision, recall, F-score, accuracy, returned vs. effective information, retrieved results, and query retrieval time. These measures are compared for divergent kinds of input queries: simple and complex queries. The acquired outcomes of the recommended MNFA and the prevalent methods are presented in Tab. 1.
Tab. 1 presents the comparison outcomes of the suggested MNFA and the prevailing IBRI-CASONTO and ST for simple and complex queries regarding precision, recall, F-score, and accuracy. For simple and complex queries, the precision values of the recommended MNFA are 96.56 and 94.45, respectively. By contrast, IBRI-CASONTO and ST provide 52.49 and 95.2 for simple queries and 50.12 and 93.56 for complex queries, respectively.

Performance Analysis for Simple Queries
The performance of the proposed MNFA and the prevailing methods regarding precision, recall, F-score, and accuracy for simple queries is depicted in Fig. 4.

Performance Investigation for Intricate Queries
A comparison of the proposed MNFA and the prevailing IBRI-CASONTO and ST on complex queries based on precision, recall, F-score, and accuracy is demonstrated in Fig. 6. In all measures, the entire flow of IBRI-CASONTO is excessively slow. ST and MNFA present comparatively good performance. However, the recommended MNFA attains the highest outcomes for every metric. The performance of the proposed MNFA and the prevailing techniques regarding returned vs. effective information, retrieved outcomes, and query retrieval duration is elucidated in Fig. 7. The retrieved results of MNFA reach 92%, whereas those of the prevailing IBRI-CASONTO and ST are only 66% and 78%, respectively. Thus, MNFA provides the greatest retrieval outcomes. For the query retrieval duration, MNFA takes 13964 ms, whereas the prevailing IBRI-CASONTO and ST take 18984 and 16687 ms, respectively. MNFA consumes minimal time in retrieving the input query compared with IBRI-CASONTO and ST. In sum, MNFA provides the maximum percentage of retrieved outcomes and takes minimal time to retrieve the query.

Conclusions
In this paper, an ontology-based IR system is proposed using MNFA. Three methodologies are included in this system: training, testing, and assessment. After the three phases are performed, the results of the proposed MNFA are compared with those of the prevailing technologies, IBRI-CASONTO and ST. Here, the performance evaluation of the proposed and existing methods is accomplished for two sorts of queries: simple and complex queries. For both sorts of queries, the evaluation is made in consideration of precision, recall, F-score, accuracy, returned vs. effective information, retrieved outcomes, and query retrieval duration. The proposed MNFA proffers an accuracy of 92.77% for simple queries and 91.45% for complex queries. It retrieves 92.77% and 92% of the information for simple and complex queries, respectively. The time needed by the proposed MNFA for retrieving the query is 9887 ms for simple queries and 13964 ms for complex queries. Hence, the recommended MNFA attains superior performance for all the aforesaid performance metrics compared with the prevailing techniques for simple and complex queries. In the future, the proposed work can be extended by incorporating natural language processing to further lessen the retrieval time.