Image of SIMILAR QUESTIONS IDENTIFICATION ON INDONESIAN LANGUAGE SUBJECTS USING MACHINE LEARNING

Text

SIMILAR QUESTIONS IDENTIFICATION ON INDONESIAN LANGUAGE SUBJECTS USING MACHINE LEARNING



Question similarity is carried out to evaluate similarities between questions in a collection of questions in the question and answer forum and on other platforms. This is done to improve the performance of the question-and-answer forum so that new questions submitted by users can be identified as similar to existing questions in the database. Currently, research related to question similarity is still being carried out on foreign language datasets. The purpose of this research is to identify the similarity of questions in a collection of questions in Indonesian. The method used is Support Vector Machine and IndoBERT. For feature extraction, we evaluate the lexical features and syntax features of each question. For lexical feature extraction, we use the cosine similarity algorithm to calculate the distance between two objects which are represented as vectors. For syntax feature extraction we use the Indonesian part of speech tagger (POS Tag). The dataset used is a collection of questions on Indonesian subjects at the primary and secondary school levels. The results of this study show that the best performance of the Support Vector Machine is obtained from the use of the cosine similarity feature with an accuracy of 85%. While the use of the POS Tag feature or the combination of POS Tag and cosine similarity causes the model to be overfitted and the accuracy decreases to 77%. Meanwhile, for the IndoBERT model, an accuracy of 95% was obtained.


Availability

No copy data


Detail Information

Series Title
-
Call Number
-
Publisher Jurnal Nasional Pendidikan Teknik Informatika (JANAPATI) : Indonesia.,
Collation
005
Language
English
ISBN/ISSN
2089-8673
Classification
NONE
Content Type
-
Media Type
-
Carrier Type
-
Edition
-
Subject(s)
Specific Detail Info
-
Statement of Responsibility

Other Information

Accreditation
-

Other version/related

No other version available


File Attachment



Information


Web Online Public Access Catalog - Use the search options to find documents quickly