PRIVACY PRESERVING FREQUENT ITEMSET MINING WITH REDUCED SENSITIVE ITEMSETS FOR BIG DATA

Makkar, Himanshu

URI: http://localhost:8081/xmlui/handle/123456789/15200

Date: 2018-05

Abstract:

Frequent itemset mining is a field of data mining in which frequent itemsets are extracted from a dataset. These itemsets may reveal sensitive patterns. Privacy Preserving Data Mining (PPDM) approaches are used to hide sensitive information in the dataset, but they also reduce its utility. Heuristic-based PPDM approaches remove sensitive patterns from the transactions that contain them, guided by some heuristic. Heuristic-based approaches are simple and require less computation time than border-based and exact approaches. Hence, they have received considerable attention from researchers seeking better heuristics that preserve the utility of the data to a great extent. In this work, we propose two heuristic-based approaches: Removal of Closed Sensitive Itemsets with Maximum Support (MaxRCSI) and Removal of Closed Sensitive Itemsets with Minimum Support (MinRCSI). In these approaches, the sensitive itemsets are reduced to closed sensitive itemsets, and the sanitization process is carried out on the reduced set of closed sensitive itemsets. Experiments performed on real datasets as well as on a benchmark dataset show that the proposed approaches produce sanitized data with substantially better utility than existing approaches. However, these sequential approaches cannot cope with massive amounts of data. The other two proposed approaches, Parallelized Removal of Closed Patterns with Minimum Support (MinPRCP) and Parallelized Removal of Closed Patterns with Maximum Support (MaxPRCP), are parallel implementations of MinRCSI and MaxRCSI on the Spark parallel computing framework. These parallelized approaches are scalable enough to handle large datasets. Experiments performed using benchmark datasets show that MinPRCP and MaxPRCP scale better than MinRCSI, MaxRCSI, and other sequential approaches.
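
The abstract describes heuristic sanitization only in outline. As a minimal sketch under stated assumptions (not the thesis's MaxRCSI/MinRCSI algorithms), the Python code below shows the general shape of item-removal sanitization: count supports, shrink the set of sensitive itemsets, then delete a victim item from supporting transactions until each sensitive itemset falls below the minimum support. Everything here is hypothetical: the function names `support`, `prune_sensitive`, and `sanitize`, the `order` parameter as one possible reading of the Max/Min naming, and the victim-item rule. The reduction step is also a swapped-in technique: instead of the closed-itemset reduction, which the abstract does not define, it simply discards any sensitive itemset that has a sensitive proper subset, relying on the anti-monotonicity of support.

```python
from typing import FrozenSet, List, Set

Itemset = FrozenSet[str]


def support(itemset: Itemset, db: List[Set[str]]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def prune_sensitive(sensitive: List[Itemset]) -> List[Itemset]:
    """Drop each sensitive itemset that has a proper subset among the others.

    Support is anti-monotone: once a subset falls below the threshold, so
    does every superset. Stand-in reduction, NOT the closed-itemset reduction.
    """
    return [s for s in sensitive if not any(o < s for o in sensitive)]


def sanitize(db: List[Set[str]], sensitive: List[Itemset],
             min_sup: int, order: str = "max") -> List[Set[str]]:
    """Hide sensitive itemsets by deleting one victim item from supporting
    transactions until each itemset's support is below min_sup."""
    if min_sup < 1:
        raise ValueError("min_sup must be at least 1")
    db = [set(t) for t in db]  # copy so the caller's data is untouched
    targets = prune_sensitive(sensitive)
    # Hypothetical reading of the Max/Min naming: process targets in
    # descending ("max") or ascending ("min") order of support.
    targets.sort(key=lambda s: support(s, db), reverse=(order == "max"))
    for s in targets:
        # Hypothetical victim rule: the item of s with the highest support,
        # ties broken alphabetically.
        victim = max(sorted(s), key=lambda i: support(frozenset([i]), db))
        # Touch the shortest supporting transactions first.
        supporting = sorted((t for t in db if s <= t), key=len)
        while support(s, db) >= min_sup:
            # Each deletion lowers the support of s by exactly one.
            supporting.pop(0).discard(victim)
    return db


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
sensitive = [frozenset({"a", "b"}), frozenset({"a", "b", "c"})]
clean = sanitize(db, sensitive, min_sup=2)
print(support(frozenset({"a", "b"}), clean))       # 1
print(support(frozenset({"a", "b", "c"}), clean))  # 1
```

In this example only `{a, b}` survives pruning, and hiding it also hides `{a, b, c}`. A fuller implementation would additionally measure side effects on utility, such as frequent non-sensitive itemsets that are lost after sanitization.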