DATA SANITIZATION APPROACHES FOR SENSITIVE ITEMSET HIDING

Shalini

Please use this identifier to cite or link to this item: http://localhost:8081/jspui/handle/123456789/19378

Full metadata record

DC Field	Value	Language
dc.contributor.author	Shalini	-
dc.date.accessioned	2026-03-02T15:19:58Z	-
dc.date.available	2026-03-02T15:19:58Z	-
dc.date.issued	2022-11	-
dc.identifier.uri	http://localhost:8081/jspui/handle/123456789/19378	-
dc.guide	Toshniwal, Durga	en_US
dc.description.abstract	Collaborative frequent itemset mining is the process of analyzing data shared by various business entities to extract informative patterns. Organizations collect massive amounts of data and analyze the hidden patterns in it to understand market trends. The analysis of global trends helps managers make business-oriented decisions. However, data sharing brings high privacy risks because some patterns may reveal business- or individual-specific sensitive information. Exposure of such patterns can reveal confidential information, which significantly hampers information sharing since people are becoming more reluctant to share their personal data. Therefore, enhanced privacy-preserving data mining (PPDM) techniques are in constant demand for safe and reliable information exchange. PPDM aims to conduct collaborative data analytics without breaching data privacy. In recent times, PPDM has evolved into an important research field. This thesis presents work on sensitive itemset hiding using heuristic and meta-heuristic/evolutionary approaches. Sensitive itemset hiding is the process of sanitizing the data so that all non-sensitive itemsets, and no sensitive itemsets, can be extracted. PPDM provides solutions that perturb the transactions so that sensitive itemsets are masked (e.g., by suppressing the itemsets that generate a confidential rule), but these solutions may accidentally hide a significant number of non-sensitive itemsets during data sanitization. The most common approaches for sensitive itemset suppression are to delete sensitive transactions or victim items, or to add dummy items or transactions to the data, so that the sensitive itemsets become infrequent. Sensitive pattern hiding techniques thus affect the results of data mining models.
Maintaining a trade-off between data privacy and data utility is an NP-hard problem because it requires selecting items for deletion or transactions for modification that incur minimum side effects. Evolutionary approaches prove to be effective for NP-hard problems. Researchers have proposed various algorithms that use evolutionary techniques such as genetic algorithms (GA), particle swarm optimization (PSO) and ant colony optimization (ACO). The evolutionary sensitive pattern hiding (SPH) algorithms mask sensitive patterns through the deletion of sensitive transactions. Failure to mask sensitive patterns and data loss have been the biggest challenges for such algorithms. The performance of evolutionary algorithms degrades further when they are applied to dense datasets. In our first objective, we proposed a victim item deletion based, PSO-inspired evolutionary algorithm named VIDPSO to sanitize dense datasets. In the proposed algorithm, each particle of the population consists of n sub-particles derived from pre-calculated victim items. The proposed algorithm has a high exploration capability to search the solution space for selecting optimal transactions. Experiments conducted on real and synthetic dense datasets show that the VIDPSO algorithm performs better than GA-, PSO- and ACO-based SPH algorithms in terms of hiding failure, with minimal data loss. In our second objective, we proposed two heuristic-based algorithms to mask sensitive high utility itemsets. The economic utilities of itemsets help to evaluate the drivers behind a customer’s purchase decision. From a business perspective, a utility can be the benefit associated with selling a particular item or the usefulness or satisfaction that a customer experiences from a product.
Privacy-preserving utility mining (PPUM) is a branch of PPDM that presents various algorithms intended to hide sensitive high utility itemsets (SHUIs) while maintaining a balance between maximizing utility and preserving privacy. This study considers the identified research gaps in PPUM and proposes two SHUI hiding algorithms, MinMax and Weighted, that differ in their victim item selection approach. Both algorithms consider a sensitive item’s participation in sensitive and non-sensitive itemsets when choosing it as the victim item. Further, this thesis compares three traditional transaction selection approaches for dense and sparse datasets. Experimental analysis showed that the proposed algorithms performed better than the existing algorithms. The MinMax algorithm performs better than the Weighted algorithm for sparse datasets. In dense datasets, there are some cases in which Weighted performs better than MinMax and vice versa, depending on the type of sensitive patterns and the dataset under consideration. Researchers have proposed various methods to hide sensitive frequent itemsets that reduce the support count of the sensitive itemset below a minimum support threshold (MST). Our third research work addresses the question, “Is suppressing a sensitive itemset enough to hide it?” Our work shows that a suppressed sensitive itemset may behave like an outlier among its neighboring itemsets after suppression, indicating that the dataset is likely altered. KL-divergence and χ²-divergence are used to calculate the difference between the expected and actual probability distributions of itemsets in order to observe anomalous behavior. Experimental results on four datasets show that, in many cases, suppressed sensitive itemsets stand out as the most significant outliers, irrespective of the victim item selection method. We propose two heuristic-based defensive approaches that counter this attack and ensure privacy.	en_US
dc.language.iso	en	en_US
dc.publisher	IIT Roorkee	en_US
dc.subject	Frequent Itemset, Sensitive Pattern, Data Privacy, Privacy-Preserving Data Mining, Knowledge Hiding, Evolutionary Approach, Data Sanitization, High Utility Itemsets, Privacy-Preserving Utility Mining	en_US
dc.title	DATA SANITIZATION APPROACHES FOR SENSITIVE ITEMSET HIDING	en_US
dc.type	Thesis	en_US
Appears in Collections:	DOCTORAL THESES (CSE)
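
The abstract describes hiding a sensitive itemset by deleting a victim item from supporting transactions until the itemset's support count falls below the minimum support threshold (MST). The following is a minimal Python sketch of that general idea only, not the thesis's VIDPSO, MinMax or Weighted algorithms. The function names, the victim rule (the least frequent item of the sensitive itemset), the transaction order (shortest supporting transactions first) and the toy data are all assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List, Set


def support(transactions: Iterable[Set[str]], itemset: Iterable[str]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    target: FrozenSet[str] = frozenset(itemset)
    return sum(1 for t in transactions if target <= t)


def hide_itemset(
    transactions: List[Set[str]],
    sensitive: Iterable[str],
    min_support_count: int,
) -> List[Set[str]]:
    """Return a sanitized copy in which `sensitive` is infrequent.

    Hypothetical heuristic (an assumption, not taken from the thesis):
    the victim is the item of the sensitive itemset that occurs in the
    fewest transactions, and it is deleted from the shortest supporting
    transactions first.
    """
    db = [set(t) for t in transactions]  # work on a copy
    target = frozenset(sensitive)

    # Pick the victim item; ties are broken alphabetically.
    victim = min(sorted(target), key=lambda item: sum(item in t for t in db))

    # Indices of supporting transactions, shortest first.
    supporting = sorted(
        (i for i, t in enumerate(db) if target <= t),
        key=lambda i: len(db[i]),
    )

    for i in supporting:
        if support(db, target) < min_support_count:
            break  # the itemset is already below the threshold
        db[i].discard(victim)
    return db


# Toy transaction database (made-up data for illustration).
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b", "d"},
    {"a", "c"},
    {"b", "d"},
]
sanitized = hide_itemset(transactions, {"a", "b"}, min_support_count=2)
print(support(transactions, {"a", "b"}), support(sanitized, {"a", "b"}))
```

Each deletion may also lower the support of non-sensitive itemsets that contain the victim item, which is the side effect the abstract refers to when it discusses the trade-off between privacy and data utility.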

Files in This Item:

File	Description	Size	Format
SHALINI 17911009.pdf		5.02 MB	Adobe PDF
