TY  - EJOU
AU  - Sun, Guang
AU  - Xiang, Huanxin
AU  - Li, Shuanghu

TI  - On Multi-Thread Crawler Optimization for Scalable Text Searching
T2  - Journal on Big Data

PY  - 2019
VL  - 1
IS  - 2
SN  - 2579-0056

AB  - Web crawlers are an important part of modern search engines. As data volumes have exploded, we have entered a “big data era”. Wikipedia, for example, gathers knowledge from all over the world, records news as it happens each day, and offers users a rich database, but the sheer amount of data makes searching it burdensome. Single-threaded crawling can no longer meet the requirements of text crawling. To improve on the performance and versatility of single-threaded crawlers, a high-speed multi-threaded web crawler is designed for crawling hyper-scale online text databases. The crawler uses multiple threads to process web pages in parallel and combines breadth-first and depth-first algorithms to control the crawl. The accompanying project, written in Python, implements this multi-threaded optimization as a method for crawling Wikipedia books, Wikipedia being a hyper-large-scale online text database. The project was inspired by an article about Wikipedia published by the Big Data Digest public account.
KW  - Multi-threading
KW  - text database
KW  - optimization
KW  - breadth-first search
KW  - depth-first search

DO  - 10.32604/jbd.2019.07235