Record Detail
Text
Data Mining for Non-Redundant Big Data Using Dynamic KMEAN Clustering
In the Big Data era, there is growing demand for techniques that can process huge datasets and extract valuable information from them. Duplicate records can seriously degrade data processing and data mining, so a major challenge is finding as many of them as possible. Data deduplication (or redundancy removal) removes redundant data and stores only one copy, promoting single-instance storage. This work proposes using K-Means clustering for big data deduplication. K-Means clustering, a local optimization approach, is sensitive to the choice of initial cluster centers: if a poorly placed center is used as the starting point, the technique produces more errors and poor clustering results. The proposed deduplication solution converts the dataset to numeric form and pre-processes it to extract useful information, which Dynamic K-Mean clustering (DKMEAN) then uses to categorize replicated chunks. The proposed system greatly improves dataset quality and ultimately reduces resource consumption. It outperformed Traditional K-Means (TKMEAN) in the number of detected redundant chunks, accuracy, number of iterations, and efficiency.
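The general idea in the abstract (numeric conversion, clustering, then removing replicated records) can be illustrated with a minimal Python sketch. This is not the authors' DKMEAN algorithm, whose details are not given in this record. The feature scheme in `to_vector`, the farthest-first seeding in `farthest_first_centers`, and all function names are assumptions chosen for illustration; the seeding is only a simple deterministic stand-in for a more careful choice of starting centers.

```python
import math

def normalize(record):
    # Lowercase and collapse whitespace so trivially different copies compare equal.
    return " ".join(record.lower().split())

def to_vector(record):
    # Hypothetical numeric conversion: (length, digit count, word count).
    text = normalize(record)
    digits = sum(c.isdigit() for c in text)
    return (float(len(text)), float(digits), float(len(text.split())))

def farthest_first_centers(points, k):
    # Deterministic seeding (an assumption, not DKMEAN): start at the first
    # point, then repeatedly add the point farthest from the chosen centers.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    # Plain Lloyd iterations: assign each point to its nearest center,
    # recompute centers as cluster means, stop when assignments are stable.
    centers = farthest_first_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave the center of an empty cluster unchanged
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def deduplicate(records, k=2):
    # Cluster the numeric vectors, then keep the first copy of each
    # normalized record within its cluster and report the rest as duplicates.
    if not records:
        return [], []
    points = [to_vector(r) for r in records]
    labels = kmeans(points, min(k, len(set(points))))
    seen, unique, duplicates = set(), [], []
    for rec, label in zip(records, labels):
        key = (label, normalize(rec))
        if key in seen:
            duplicates.append(rec)
        else:
            seen.add(key)
            unique.append(rec)
    return unique, duplicates

records = [
    "Alice Smith 1990",
    "alice  smith 1990",
    "Bob Jones 1985",
    "A much longer record describing something else entirely 42",
    "Bob Jones 1985",
]
unique, duplicates = deduplicate(records, k=2)
```

Because identical normalized records always map to the same vector, they always land in the same cluster, so comparisons only need to happen within a cluster. In this sketch the exact-match check inside each cluster does the actual duplicate detection; the clustering step only narrows the set of candidates.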
Availability
No copy data
Detail Information
Field | Value |
---|---|
Series Title | - |
Call Number | - |
Publisher | International Journal of Computing and Digital Systems : Bahrain., 2023 |
Collation | 006 |
Language | English |
ISBN/ISSN | 2210-142X |
Classification | NONE |
Content Type | - |
Media Type | - |
Carrier Type | - |
Edition | - |
Subject(s) | |
Specific Detail Info | - |
Statement of Responsibility | - |
Other Information
Field | Value |
---|---|
Accreditation | Scopus Q3 |
Other version/related
No other version available