Please use this identifier to cite or link to this item: http://localhost:8081/xmlui/handle/123456789/9485
Full metadata record
DC Field | Value | Language
dc.contributor.author | Satankar, Hemlata | -
dc.date.accessioned | 2014-11-19T09:38:15Z | -
dc.date.available | 2014-11-19T09:38:15Z | -
dc.date.issued | 2005 | -
dc.identifier | M.Tech | en_US
dc.identifier.uri | http://hdl.handle.net/123456789/9485 | -
dc.guide | Sarje, A. K. | -
dc.description.abstract | With the unabated growth of data amassed from business, scientific, and engineering disciplines, cluster analysis and other data mining functions play an increasingly important role. They can reveal previously unknown and potentially useful patterns and relations in large databases. One of the most significant challenges in data mining is scalability: handling large databases effectively with linear computational complexity and limited main memory. This dissertation addresses the problem of clustering databases with a large number of data items. The proposed method needs only one database scan. It uses a buffer to hold one subset of the database at a time; clustering is performed on the buffer to find subclusters, which are then combined to obtain the final clusters. Most current algorithms aim to assign every point to a cluster; this method instead focuses on identifying the most 'informative points' and processing them to determine the hidden classes in the given database, which avoids the adverse effect of noise and outliers. The algorithm has been designed so that it is guaranteed to converge to the global minimum. It has been designed to run on large data sets such as those found in web directories and bioinformatics, though it is general and can be applied to any dataset. In this dissertation, experiments were carried out on three real-world gene expression datasets. The results are very promising: in all three datasets, functional categories of genes of good quality were found. This information can be used to analyze a particular disease. | en_US
dc.language.iso | en | en_US
dc.subject | ELECTRONICS AND COMPUTER ENGINEERING | en_US
dc.title | CLUSTERING ALGORITHMS FOR LARGE DATABASES WITH APPLICATION TO MICROARRAY DATA ANALYSIS | en_US
dc.type | M.Tech Dissertation | en_US
dc.accession.number | G12398 | en_US
Appears in Collections: MASTERS' THESES (E & C)
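
The single-scan, buffer-based scheme described in the abstract above can be illustrated with a rough sketch. This is not the dissertation's algorithm: the record does not say which clustering routine is run on each buffer, how 'informative points' are chosen, or how global convergence is obtained. The sketch below assumes plain Lloyd's k-means as a stand-in for both the per-buffer step and the combining step, so it carries no global-minimum guarantee and performs no outlier filtering. The names `weighted_kmeans`, `single_scan_cluster`, `buffer_size`, `sub_k`, and `final_k` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points, weights, k, iters=20, seed=0):
    """Lloyd's k-means with per-point weights (stand-in, local optimum only).

    Returns (centroids, cluster_weights) for the non-empty clusters.
    """
    rng = random.Random(seed)
    k = min(k, len(points))
    centroids = [list(p) for p in rng.sample(list(points), k)]
    totals = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * len(points[0]) for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            # Assign the point to its nearest current centroid.
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            totals[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        for j in range(k):
            if totals[j] > 0:  # an empty cluster keeps its old centroid
                centroids[j] = [s / totals[j] for s in sums[j]]
    kept = [(c, t) for c, t in zip(centroids, totals) if t > 0]
    return [c for c, _ in kept], [t for _, t in kept]


def single_scan_cluster(stream, buffer_size, sub_k, final_k):
    """Read the data once, cluster each full buffer, then merge subclusters."""
    summaries, weights = [], []  # one (centroid, point count) per subcluster
    buffer = []

    def flush():
        if buffer:
            cents, counts = weighted_kmeans(buffer, [1.0] * len(buffer), sub_k)
            summaries.extend(cents)
            weights.extend(counts)
            buffer.clear()  # only the summaries stay in memory

    for point in stream:  # the single pass over the database
        buffer.append(tuple(point))
        if len(buffer) == buffer_size:
            flush()
    flush()  # handle a final, partially filled buffer
    # Combine subclusters by clustering their centroids, weighted by size.
    return weighted_kmeans(summaries, weights, final_k)
```

Calling `single_scan_cluster(rows, buffer_size=1000, sub_k=20, final_k=5)` on any iterable of numeric tuples returns the final centroids together with the number of points each one represents. Memory use is bounded by the buffer plus the subcluster summaries, which is the property the abstract emphasises.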

Files in This Item:
File | Description | Size | Format
ECDG12398.pdf | | 5.99 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.