Application of Quicksort Algorithm in Information Retrieval

Abstract: With the development of network information technology, a variety of large-scale online databases have emerged, such as Baidu Library and the Weipu Database, whose holdings have reached nearly one million documents. How can the desired information be retrieved quickly and effectively from such a huge database? This requires efficient algorithms that reduce the computational cost of Information Retrieval, improve retrieval efficiency, and keep pace with the rapid growth of document data. The approach in this paper assigns a weight to each position of a document, multiplies the weight of each position by the number of matches at that position, and sums the products to obtain a feature value. This feature value serves as the sort key for the Quicksort Algorithm, with the aim of making Information Retrieval both comprehensive and accurate. The purpose of this paper is therefore to use the Quicksort Algorithm to increase the speed of Information Retrieval and the position weighting algorithm to improve its matching quality, so as to improve the overall efficiency of Information Retrieval.


Introduction
With the rapid development of Internet technology and the increasingly widespread use of the Internet, more and more information is stored as electronic data. How can the desired information be found in such a huge data store? Information Retrieval technology emerged in response to this demand [1]. Compared with its early, immature forms, this technology has improved greatly and gradually matured. Literature retrieval is an important way for researchers to obtain resource information, and it has become a very important field within Information Retrieval. Scientific literature retrieval helps researchers learn from and summarize the results of their predecessors. It not only promotes the rapid development and use of literature resources but also helps avoid duplicated research [2].
Traditional Information Retrieval techniques generally serve a single function. They either consider only word frequency and ignore the document value indicated by citation counts and download counts, or consider the latter and ignore the former. As a result, they cannot retrieve the documents closest to the user's needs, which degrades the user experience. To address these shortcomings, this paper proposes a combined approach that additionally lets users select more detailed requirements themselves, with the aim of making Information Retrieval both comprehensive and accurate [3]. The paper also uses the Quicksort Algorithm to sort and output the documents. Quicksort is among the fastest sorting methods when the data is large and disordered, which suits today's huge volume of literature and the need for rapid Information Retrieval. The experiments are simulated under Ubuntu with a C++ environment installed, and the results indicate that the approach is correct and can be implemented. The Quicksort Algorithm can improve the Information Retrieval rate independently of the hardware and has real application prospects.

Related Works
One of the core problems of Information Retrieval technology is to retrieve results according to a given matching rule and then use a sorting algorithm to output the retrieval results in a certain order [4]. Retrieval technology has been studied extensively at home and abroad. Broadly, it can be divided into three generations: the first-generation Information Retrieval system based on word frequency, the second-generation system based on links, and the third-generation system based on intelligent sorting [5]. These three generations are used below as a thread for introducing the state of research at home and abroad [6].
The first-generation Information Retrieval system, based on word frequency, ranks documents according to the frequency and position of the retrieved keywords in the document [7]. Its operating principle is that the more often the search terms occur in a document and the more important the positions in which they occur, the greater the correlation between the document and the search terms. The TF-IDF (Term Frequency-Inverse Document Frequency) algorithm handles the relationship between the frequency of a search term and the position where it appears and computes a relevance score for ranking. It is considered one of the most important inventions of this stage [8][9].
Next is the second-generation Information Retrieval system, based on links. Although the PageRank algorithm improved the efficiency of Google's web search system, it determines the importance of a document only by the number of times the document has been cited and ignores the relevance of the document's content to the user's search terms [10][11]. The literature recommended to users is therefore of high value and authority, but it is not necessarily what users need most [12][13].
The third-generation Information Retrieval system aims to solve the problem of the one-dimensional retrieval results of the second-generation system. Intelligent sorting is dedicated to providing personalized services and realizing intelligent retrieval of documents [14][15][16]. What is intelligent retrieval? It is retrieval technology made more user-friendly. Intelligent retrieval technology can analyze keywords related to the retrieved keywords on the current Internet, add semantic retrieval and user feedback functions, integrate these for personalized analysis, and finally select and rank the documents that are most relevant to the user's search terms and best meet the user's needs. In this way, the third-generation Information Retrieval system addresses the problem of one-dimensional and inaccurate retrieval results.

The meaning of Quicksort Algorithm
In 1962, Tony Hoare published a recursive sorting algorithm called Quicksort. The Quicksort Algorithm adopts a divide-and-conquer method. In the average case, its time complexity is O(n log n); that is, on the order of n log n comparisons are required to sort n items.
The rules of the Quicksort Algorithm can be stated as follows. Pick an element from the sequence to be sorted and use it as the "benchmark" (pivot); generally, the first element of the sequence is selected. Rearrange the sequence so that the elements smaller than the benchmark come before it and the larger elements come after it, then apply the same procedure recursively to the two parts.
Advantages of the Quicksort Algorithm: experiments have repeatedly shown that Quicksort tends to have a speed advantage over other algorithms, particularly when the sequence to be sorted is large and disordered. The extremely large number of documents available today is therefore well suited to Quicksort, and under this condition its advantages are more obvious.
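The rules above can be sketched in C++ as follows. This is a minimal illustration that takes the first element as the benchmark, as described in the text; the names `partitionFirst` and `quickSort` are chosen here for illustration and are not taken from the paper's program.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Partition a[low..high] around the first element (the "benchmark").
// Returns the final index of the benchmark.
int partitionFirst(std::vector<int>& a, int low, int high) {
    int pivot = a[low];
    int i = low;  // a[low+1..i] holds the elements smaller than the pivot
    for (int j = low + 1; j <= high; ++j) {
        if (a[j] < pivot) {
            ++i;
            std::swap(a[i], a[j]);
        }
    }
    std::swap(a[low], a[i]);  // place the pivot between the two parts
    return i;
}

// Recursively sort a[low..high] in ascending order.
void quickSort(std::vector<int>& a, int low, int high) {
    if (low >= high) return;  // zero or one element: already sorted
    assert(low >= 0 && high < static_cast<int>(a.size()));
    int p = partitionFirst(a, low, high);
    quickSort(a, low, p - 1);   // elements smaller than the pivot
    quickSort(a, p + 1, high);  // elements not smaller than the pivot
}

// Convenience overload that sorts the whole vector.
void quickSort(std::vector<int>& a) {
    quickSort(a, 0, static_cast<int>(a.size()) - 1);
}
```

Choosing the first element as the benchmark keeps the code short, but it gives the O(n^2) worst case on input that is already sorted; a random or median-of-three benchmark is a common alternative.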

Application of Quicksort Algorithm
In the process of Information Retrieval, we multiply the number of times the search term appears at a certain position of the document by the weight of that position to obtain a sub-eigenvalue, and then add the sub-eigenvalues of all positions of the document to obtain the eigenvalue (feature value) of the document relative to this search term. The eigenvalues of all documents form an unordered sequence of numbers. We then use the Quicksort Algorithm to sort these eigenvalues and output the documents with the largest eigenvalues first to meet the user's retrieval requirements. The Quicksort Algorithm plays an important role in this process. In the average case, its time complexity is O(n log n), which is significantly better than the O(n^2) time complexity of traditional sorting algorithms such as Selection Sort, Bubble (exchange) Sort, and Insertion Sort. The number of documents on the Internet keeps increasing and becoming more complex. When the sequence to be sorted is large and disordered, Quicksort can also outperform other advanced sorting algorithms with O(n log n) time complexity, such as Merge Sort. With the number of documents expanding rapidly, the Quicksort Algorithm therefore has considerable advantages for Information Retrieval and good application prospects.
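The gap between O(n log n) and O(n^2) can be made concrete by counting key comparisons. The sketch below is only an illustration under stated assumptions: a "step" is taken to be one comparison between two keys, the input is pseudo-random data from a fixed seed, and the helper names (`selectionSortCount`, `quickSortCount`, `randomValues`) are hypothetical rather than taken from the paper's program.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Selection sort; returns the number of key comparisons performed.
long long selectionSortCount(std::vector<int>& a) {
    long long comparisons = 0;
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        std::size_t minIdx = i;
        for (std::size_t j = i + 1; j < a.size(); ++j) {
            ++comparisons;
            if (a[j] < a[minIdx]) minIdx = j;
        }
        std::swap(a[i], a[minIdx]);
    }
    return comparisons;
}

// Quicksort with the first element as pivot; adds its key comparisons
// to the caller-supplied counter.
void quickSortCount(std::vector<int>& a, int low, int high, long long& comparisons) {
    if (low >= high) return;
    assert(low >= 0 && high < static_cast<int>(a.size()));
    int pivot = a[low];
    int i = low;
    for (int j = low + 1; j <= high; ++j) {
        ++comparisons;
        if (a[j] < pivot) std::swap(a[++i], a[j]);
    }
    std::swap(a[low], a[i]);
    quickSortCount(a, low, i - 1, comparisons);
    quickSortCount(a, i + 1, high, comparisons);
}

// Hypothetical helper: n pseudo-random keys generated from a fixed seed.
std::vector<int> randomValues(int n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 1000000);
    std::vector<int> v(n);
    for (int& x : v) x = dist(gen);
    return v;
}
```

Selection sort always performs n(n-1)/2 comparisons, which is 499,500 for n = 1000, whereas the count for Quicksort depends on the input order and is far smaller on random data.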

Technical Basis of Information Retrieval Technology
When the user enters a search term in the search box, the search engine searches the document resource database for that term. When it finds a document that matches, it uses a preset algorithm to calculate the degree to which the document matches the search term. The relevance of every relevant document in the literature resource database is calculated in the same way, and the documents are then returned to the user in order of relevance. For ease of understanding, this paper uses word frequency and a position weighting algorithm to calculate the eigenvalue: different weights are assigned to the title, subtitle, abstract, text, references, and so on; the weight of each position is multiplied by the number of matches at that position to obtain a sub-eigenvalue; and the sub-eigenvalues of all positions are added to give the final eigenvalue. The Quicksort Algorithm is then used to sort the eigenvalues, and the documents are output in the sorted order. To better meet the needs of users, we place several priority selection buttons under the search interface. When users care most about how well the search terms match at a certain position of the document, they can click the corresponding button, and the background program increases the weight of that position. In this way, the document resource database can efficiently retrieve documents that match the user's needs.

Quicksort Algorithm Design
Assume that the online literature resource library already exists and that its documents are in random order. The user's input of search terms is simulated: the search terms are treated as the pattern string, and each document in the resource library is treated as a target string. The pattern string is matched against the target string formed by each document, following the principle of the KMP algorithm. If the target string contains a segment equal to the pattern string (a target substring), the match succeeds and the document's eigenvalue is weighted once; otherwise the match fails.

Design and Calculation of Document Matching
A document resource database of 15 documents has been simulated. User input of search terms is also simulated: the search terms are used as the pattern string, the documents to be retrieved are used as target strings, and matching is performed according to the KMP (Knuth-Morris-Pratt) algorithm.
The pattern string is treated as a sliding window that is matched against the target string character by character. The matching process is shown in the following simulation, where the pattern string is the user's search keywords:

First match:
    Target string:   X Y X Y Z X Y Z X Z Y X Y
                     = = !=
    Pattern string:  X Y Z X Z

Second match:
    Target string:   X Y X Y Z X Y Z X Z Y X Y
                         = = = = !=
    Pattern string:      X Y Z X Z

Third match:
    Target string:   X Y X Y Z X Y Z X Z Y X Y
                               = = = = =
    Pattern string:            X Y Z X Z

In this simulation, the third character fails to match in the first attempt. Following the principle of the KMP algorithm, the pattern string then slides forward two characters and comparison resumes. In the second attempt the fifth character fails to match, so the pattern string slides forward three more characters, and in the third attempt all five characters match. Whenever a mismatch occurs, the pattern string slides and compares according to the same principle until it reaches the end of the target string.
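The matching procedure described above can be sketched as a standard KMP search. This is an illustrative implementation, not the paper's own code; the function names `buildFailure` and `kmpSearch` are assumptions introduced here. It returns the start index of every occurrence, so the number of successful matches at a position is the size of the returned vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// fail[i] is the length of the longest proper prefix of pattern[0..i]
// that is also a suffix of pattern[0..i].
std::vector<int> buildFailure(const std::string& pattern) {
    std::vector<int> fail(pattern.size(), 0);
    int k = 0;
    for (std::size_t i = 1; i < pattern.size(); ++i) {
        while (k > 0 && pattern[i] != pattern[k]) k = fail[k - 1];
        if (pattern[i] == pattern[k]) ++k;
        fail[i] = k;
    }
    return fail;
}

// Returns the start index of every occurrence of pattern in target.
std::vector<int> kmpSearch(const std::string& target, const std::string& pattern) {
    std::vector<int> hits;
    if (pattern.empty()) return hits;
    const int m = static_cast<int>(pattern.size());
    std::vector<int> fail = buildFailure(pattern);
    int k = 0;  // number of pattern characters currently matched
    for (std::size_t i = 0; i < target.size(); ++i) {
        assert(k < m);  // k always indexes a valid pattern character here
        // On a mismatch, slide the pattern using the failure table
        // instead of restarting the comparison from scratch.
        while (k > 0 && target[i] != pattern[k]) k = fail[k - 1];
        if (target[i] == pattern[k]) ++k;
        if (k == m) {  // full match ending at index i
            hits.push_back(static_cast<int>(i) - m + 1);
            k = fail[k - 1];  // continue, allowing overlapping matches
        }
    }
    return hits;
}
```

For the strings in the simulation, `kmpSearch("XYXYZXYZXZYXY", "XYZXZ")` finds a single occurrence starting at index 5, which corresponds to the third match.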

Design and Calculation of Document Eigenvalues
How is the degree to which a document matches the user's demand reflected? First, we assume that when a match is found in the document title, a certain weight is added to the document, and this weight is added once for each match. Similarly, when a match is found in the subtitle, in the text, or elsewhere in the document, the weight specified for that position is added once for each match. In addition, the value of the literature itself and the mutual citations between documents are taken into account: each time a document is cited it is marked once, and the corresponding weight is added to increase its relevance. This requires that links have been established between the documents in the database. The more citations a document has, the more authoritative and valuable it is, and the earlier it should be output. Second, the number of downloads of a document also reflects the needs of users. A document is marked once, and weighted, every time it is downloaded. Finally, the eigenvalue of each document is obtained according to the formula.
The above describes how position weighting is used to calculate eigenvalues that represent the relevance of documents in the conventional mode. User needs are considered further here. If the user cares more about how well the terms match in the title when searching documents, the weight of matches in the title is increased to meet this need. Similarly, when the user considers matches in the text more important, the weight of the text is increased accordingly. How is this choice presented? We envisage adding several priority matching buttons to the Information Retrieval interface, each giving priority to the corresponding position, so that users can choose for themselves.
We preset the weight settings as shown in Table 1. The Information Retrieval system is constructed according to the above rules. When the user enters the information to be retrieved in the search box, the program analyzes and calculates the eigenvalue of each document in the resource library. The eigenvalue is calculated as $R = \sum_{i} (w_i + p_i) \cdot m_i$, where $w_i$ is the weighting coefficient of position $i$, $p_i$ is the optional priority weighting added when the user prioritizes that position, and $m_i$ is the number of successful matches at that position. The resulting eigenvalue is taken to represent the retrieval relevance, importance, and user demand of the corresponding document, but at this point the eigenvalues are still unordered. The Quicksort Algorithm is therefore introduced, with the eigenvalue of each document as the sort key, to sort and output the documents in the resource library so that users obtain the better literature resources first. Quicksort is used to improve the efficiency of the system, so that users can retrieve the desired results as quickly as possible.
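The eigenvalue calculation can be sketched as below. The weights in `kBaseWeight` are hypothetical placeholders, since the values of Table 1 are not reproduced here, and the `Position` enumeration and the function name `featureValue` are likewise assumptions made for illustration. Citations and downloads are treated as two further "positions" whose match count is the number of citations or downloads.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Positions and usage signals that contribute to a document's eigenvalue.
enum Position { Title, Subtitle, Abstract, Text, Reference, Citation, Download, PositionCount };

// Hypothetical base weights, one per position (not the values of Table 1).
const std::array<double, PositionCount> kBaseWeight = {{5.0, 4.0, 3.0, 1.0, 1.0, 2.0, 0.5}};

// R = sum over positions of (base weight + priority weight) * match count.
// priority[i] is 0 unless the user has prioritized position i.
double featureValue(const std::array<int, PositionCount>& matches,
                    const std::array<double, PositionCount>& priority) {
    double r = 0.0;
    for (std::size_t i = 0; i < PositionCount; ++i) {
        assert(matches[i] >= 0);  // a match count cannot be negative
        r += (kBaseWeight[i] + priority[i]) * matches[i];
    }
    return r;
}
```

With these placeholder weights, a document with one title match, two abstract matches, three text matches, four citations, and ten downloads has R = 5 + 6 + 3 + 8 + 5 = 27; adding a priority weight of 2 to the title raises it to 29.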

Implementation and Analysis of the Quicksort Algorithm
The working principle of Quicksort is divide and conquer: a large problem is transformed into several small problems that are essentially the same as the original problem but far less complex. The problem is decomposed layer by layer in this way until the large problem is solved. Introducing Quicksort into the sorting step can effectively improve the efficiency of Information Retrieval.
The eigenvalue of each document is used as the key, the output is sorted with the Quicksort Algorithm and with several traditional sorting algorithms, and their operating efficiency is compared. The simulation shows that the Quicksort Algorithm is significantly better in time complexity than the traditional O(n^2) sorting algorithms. The sorted documents are then output to meet the user's retrieval needs. The following is a comparison simulation with a set of eigenvalues.
The eigenvalue of each document is calculated according to the predetermined weighting rule, so that each of the 15 documents in the simulated resource library obtains its eigenvalue, which is recorded on the document as its mark. As shown in Table 2, these 15 eigenvalues are still unordered at this point and cannot yet be provided to users. The Quicksort Algorithm now needs to be executed.
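Sorting the documents by eigenvalue so that the largest comes first could look like the sketch below. The `Document` structure and the function name `sortByFeatureValue` are hypothetical; the paper's own data layout is not shown, and the values used in the usage note are made up rather than taken from Table 2.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical record: a document identifier and its eigenvalue.
struct Document {
    int id;
    double featureValue;
};

// Quicksort docs[low..high] in descending order of featureValue,
// using the first element as the benchmark.
void sortByFeatureValue(std::vector<Document>& docs, int low, int high) {
    if (low >= high) return;
    assert(low >= 0 && high < static_cast<int>(docs.size()));
    double pivot = docs[low].featureValue;
    int i = low;  // docs[low+1..i] holds documents with larger eigenvalues
    for (int j = low + 1; j <= high; ++j) {
        if (docs[j].featureValue > pivot) std::swap(docs[++i], docs[j]);
    }
    std::swap(docs[low], docs[i]);
    sortByFeatureValue(docs, low, i - 1);
    sortByFeatureValue(docs, i + 1, high);
}
```

For example, documents with ids 1 to 4 and eigenvalues 12.0, 30.5, 7.0, and 21.0 end up in the order 2, 4, 1, 3. Quicksort is not a stable sort, so documents with equal eigenvalues may appear in either order.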

The Advantages of the Quicksort Algorithm over Merge Sort Algorithm
The time complexity of Merge Sort is also O(n log n), which is likewise better than that of the traditional sorting algorithms. Comparing it with the Quicksort Algorithm shows directly the advantages of Quicksort over Merge Sort and other sorting algorithms.
1. The Quicksort Algorithm and Merge Sort are first compared in a simulation experiment. Experimental environment: Ubuntu operating system with C and C++ environments configured.
(1) Merge Sort. The result of the run is shown in Fig. 1. The comparison shows that when the data to be sorted is small, the Quicksort Algorithm may be faster than Merge Sort; because only one experimental result is simulated, no conclusion can be drawn yet. We therefore further increase the length and the degree of randomness of the data to demonstrate that, when the data is large enough, the Quicksort Algorithm has a clear advantage over Merge Sort.
Here the previous program is modified slightly: a random function is added to generate random numbers, which are then sorted. Once the array length reaches 700 or more, the number of execution steps is used in place of the running time.

(1) Part of the driver code that calls Merge Sort on random data:

    // Fragment of main(); requires <cstdlib>, <ctime>, and <iostream>.
    // mergeSort and the step counter are defined elsewhere in the program.
    int const n(700);                        // array length: 700, 800, 900, or 1000
    int a[n];
    srand((int)time(NULL));                  // seed the random number generator
    for(int i=0;i<n;i++)
        a[i]=rand();                         // fill the array with random numbers
    mergeSort(a,0,n-1);
    for(int i=0;i<n;++i){
        cout<<a[i]<<" ";
        if((i+1)%10==0) cout<<endl;          // print ten values per line
    }
    cout<<endl;
    cout<<"The number of execution steps is:"<<count<<endl; //count set as a global variable
    return 0;

The results of counting the number of steps performed when the array length is 700, 800, 900, and 1000 are shown in Fig. 3 to Fig. 6.

It can be seen from Fig. 11 that as the data grows, the step count of the Quicksort Algorithm stays close to n log n, whereas the step count measured for Merge Sort is far higher. Merge Sort remains O(n log n) asymptotically, so the gap reflects the constant factors in the counted steps rather than a different order of growth. Since the data is randomly generated, the result can be taken as broadly representative. The experiment therefore indicates that the Quicksort Algorithm outperforms Merge Sort as the number of elements to be sorted increases. The Quicksort Algorithm adapts better to the increasing number of documents and can better improve the efficiency of Information Retrieval.
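The driver code in this section calls a `mergeSort` function and prints a global step counter, neither of which is shown. One possible shape for them is sketched below, purely as an assumption: a step is counted as one comparison between two elements, the counter is named `stepCount` here (it plays the role of the global `count`, renamed to avoid clashing with `std::count`), and `mergeRuns` is a hypothetical helper. A different definition of a step, for example one that also counts element copies, would give different totals.

```cpp
#include <cassert>
#include <vector>

long long stepCount = 0;  // global step counter (one step = one comparison)

// Merge the sorted runs a[low..mid] and a[mid+1..high] into one sorted run.
void mergeRuns(int a[], int low, int mid, int high) {
    assert(low <= mid && mid < high);
    std::vector<int> tmp;
    tmp.reserve(high - low + 1);
    int i = low, j = mid + 1;
    while (i <= mid && j <= high) {
        ++stepCount;  // one comparison between the heads of the two runs
        if (a[i] <= a[j]) tmp.push_back(a[i++]);
        else tmp.push_back(a[j++]);
    }
    while (i <= mid) tmp.push_back(a[i++]);   // copy what is left of the left run
    while (j <= high) tmp.push_back(a[j++]);  // copy what is left of the right run
    for (int k = 0; k < static_cast<int>(tmp.size()); ++k) a[low + k] = tmp[k];
}

// Top-down Merge Sort of a[low..high] in ascending order.
void mergeSort(int a[], int low, int high) {
    if (low >= high) return;
    int mid = low + (high - low) / 2;
    mergeSort(a, low, mid);
    mergeSort(a, mid + 1, high);
    mergeRuns(a, low, mid, high);
}
```

Because the totals depend on what is counted as a step, a comparison between Quicksort and Merge Sort is only meaningful when both counters use the same definition.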

Conclusion
In recent years, the growing number and variety of documents on Information Retrieval platforms and the ever-expanding demands of users have placed higher requirements on the technology and efficiency of Information Retrieval. The design of the Information Retrieval system in this paper draws on the position weighting and user behavior feedback commonly used in current retrieval engines and, in line with the characteristics of Information Retrieval, adds a function that lets users make their own selections. This further improves the overall quality of the retrieved documents in terms of matching degree, value, importance, and fit to user needs. While the retrieval accuracy is improved, the Quicksort Algorithm is introduced to speed up the sorting of document eigenvalues and optimize the performance of the Information Retrieval system, so that documents meeting the user's needs can be found as quickly as possible. The approach therefore has good application prospects.