Record Detail

Text

Data Mining for Non-Redundant Big Data Using Dynamic KMEAN Clustering

Saja Taha Ahmed - Personal Name

In the Big Data era, there is growing demand for techniques that can process huge datasets and extract valuable information from them. Duplicate records can seriously degrade data processing and data mining, so a major challenge is detecting as many of them as possible. Data deduplication (or redundancy removal) eliminates redundant data and stores only one copy, promoting single-instance storage. This work proposes using K-Means clustering for big data deduplication. K-Means, a local optimization method, is sensitive to the choice of initial cluster centers: if a poorly placed center is used as the starting point, the algorithm produces more errors and worse clusters. The proposed deduplication solution converts the dataset to numeric form and pre-processes it to extract useful information, which Dynamic K-Mean clustering (DKMEAN) then uses to categorize replicated chunks. The proposed system greatly improves dataset quality and ultimately reduces resource consumption. It outperformed Traditional K-Means (TKMEAN) in the number of detected redundant chunks, accuracy, number of iterations, and efficiency.
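The general idea in the abstract (numeric conversion, clustering, then duplicate detection inside each cluster) can be illustrated with a minimal sketch. This is not the paper's DKMEAN method, whose details the record does not give: it uses plain K-Means (Lloyd's algorithm) with randomly sampled initial centers, and the feature function `to_vector`, the helper `deduplicate`, and the sample records are all hypothetical names and data chosen for illustration.

```python
import random
from collections import defaultdict


def to_vector(record):
    """Hypothetical numeric conversion: length plus letter, digit and space counts."""
    text = record.strip().lower()
    return (
        float(len(text)),
        float(sum(c.isalpha() for c in text)),
        float(sum(c.isdigit() for c in text)),
        float(sum(c.isspace() for c in text)),
    )


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's K-Means; returns (labels, iterations_used)."""
    rng = random.Random(seed)
    # The result depends on these starting centers, which is the
    # sensitivity the abstract describes.
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for iteration in range(1, max_iter + 1):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if iteration > 1 and new_labels == labels:
            return labels, iteration  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, max_iter


def deduplicate(records, k=2):
    """Cluster records, then keep one copy of each normalized record per cluster."""
    vectors = [to_vector(r) for r in records]
    labels, _ = kmeans(vectors, k)
    clusters = defaultdict(list)
    for record, label in zip(records, labels):
        clusters[label].append(record)
    kept = []
    for members in clusters.values():
        seen = set()
        for record in members:
            normalized = record.strip().lower()
            if normalized not in seen:  # first copy in this cluster is kept
                seen.add(normalized)
                kept.append(record)
    return kept


records = ["alice 42", "Alice 42 ", "bob 7", "carol 1234567", "bob 7"]
kept = deduplicate(records, k=2)
```

Records that normalize to the same text map to the same vector and therefore always land in the same cluster, so comparisons only need to happen within a cluster. Changing `seed` changes the initial centers and can change both the clusters and the number of iterations, which is the weakness a better initialization strategy would aim to reduce.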

Availability

No copy data

Detail Information

Series Title	-
Call Number	-
Publisher	International Journal of Computing and Digital Systems : Bahrain, 2023
Collation	006
Language	English
ISBN/ISSN	2210-142X
Classification	NONE
Content Type	-

Media Type	-
Carrier Type	-
Edition	-
Subject(s)	Data Mining; Clustering; Big Data; Data Deduplication; Kmean algorithm
Specific Detail Info	-
Statement of Responsibility	-

Other Information

Accreditation	Scopus Q3

Other version/related

No other version available

File Attachment

Data Mining for Non-Redundant Big Data Using Dynamic KMEAN Clustering
