Please use this identifier to cite or link to this item: http://localhost:8081/xmlui/handle/123456789/2183
Title: DESIGN OF IMPROVED AND EFFICIENT INDEXING ALGORITHM FOR TEXT RETRIEVAL
Authors: Shrivastava, Mridul
Keywords: ALGORITHM;DATA MANAGEMENT;DATA INDEXING;ELECTRONICS AND COMPUTER ENGINEERING
Issue Date: 2012
Abstract: Web-scale search engines deal with a volume of data and queries that forces them to use an index partitioned across many machines. Two main methods of partitioning an index for distributed processing have been described in the literature. In document partitioning, each processor node holds the information for a subset of documents, while in term partitioning, each node holds the information for a subset of terms. The major drawback of these approaches is that the redistribution of data during the merge process makes indexing tedious. We therefore present a novel distributed indexing algorithm that uses new data structures to speed up the merge process. Our algorithm also helps maintain load balance, since no special nodes are assigned to the merging process, as is done in previous algorithms. We also present an efficient alternative to the pipelined approach and the ad-hoc non-pipelined approach. Our method combines non-pipelined disk accesses, a heuristic for choosing between pipelined and non-pipelined posting list processing, and an efficient query routing strategy. According to the experimental results, our method provides a higher throughput than the pipelined approach, a shorter latency than the non-pipelined approach, and a significantly improved overall throughput/latency ratio.
URI: http://hdl.handle.net/123456789/2183
Other Identifiers: M.Tech
Research Supervisor/ Guide: Kumar, Padam
Type: M.Tech Dissertation
Appears in Collections: MASTERS' THESES (E & C)

Files in This Item:
File: ECDG21949.pdf
Size: 2.34 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.