

SGAKE: Semantic Graph-based Automatic Keyword Extraction from Hindi Text Documents



Automatic keyword extraction is the automated identification of the terms that best describe the subject of a document. These terms can be single key terms or key phrases that represent the most relevant information the document conveys. Keyword extraction techniques can be statistical, linguistic, machine learning-based, graph-based, or a hybrid of these, and each approach has its own strengths and limitations. This paper focuses on graph-based approaches, which rely on network properties such as degree, structural diversity index, strength, clustering coefficient, neighborhood size, PageRank, closeness, betweenness, eigenvector centrality, and hub and authority scores. In the proposed approach, the graph is constructed from the semantic linkages between the terms in the document, and these linkages are extracted using Hindi WordNet as a background knowledge source. Fourteen different graph measures are then applied to extract the keywords. The experiments are conducted on the Hindi-language Tourism and Health data set. The results of the proposed approach are evaluated and compared with both the state-of-the-art TextRank approach and human-annotated keywords. The results show that the closeness centrality measure yields better precision and recall than the other graph measures when matching human-annotated keywords, while the authority score proves to be a good measure for producing keywords that match TextRank. The experiments demonstrate that the proposed semantic graph-based approach performs better than the state-of-the-art TextRank approach. The paper also explores the correlation between the different graph-theoretic measures using several correlation methods.
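The abstract reports that closeness centrality on the semantic term graph best matched the human-annotated keywords. The sketch below illustrates that ranking-and-evaluation step only, under several assumptions. The semantic links are taken as already extracted; the paper derives them from Hindi WordNet, but the romanized term pairs here are hypothetical toy data. The graph is treated as unweighted and undirected. Closeness uses the Wasserman-Faust scaling for disconnected graphs, a common convention that is not necessarily the paper's variant. The helper names `build_graph`, `closeness_centrality`, `top_k`, and `precision_recall` are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected, unweighted adjacency map from (term, term) links."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def closeness_centrality(graph):
    """Closeness of every term, scaled by reachability (Wasserman-Faust)."""
    n = len(graph)
    scores = {}
    for source in graph:
        # Breadth-first search gives shortest-path hop counts in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        reachable = len(dist) - 1
        total = sum(dist.values())
        if total == 0:
            scores[source] = 0.0  # isolated term: nothing reachable
        else:
            # reachable / total distance, down-weighted when only part of the graph is reachable
            scores[source] = (reachable / total) * (reachable / (n - 1))
    return scores


def top_k(scores, k):
    """Highest-scoring terms; ties are broken alphabetically for determinism."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def precision_recall(extracted, gold):
    """Exact-match precision and recall against a reference keyword set."""
    hits = len(set(extracted) & set(gold))
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical semantic links between romanized Hindi tourism terms;
    # in the paper, links like these come from Hindi WordNet relations.
    links = [
        ("paryatan", "yatra"), ("paryatan", "sthal"), ("paryatan", "hotel"),
        ("yatra", "safar"), ("sthal", "mandir"), ("hotel", "aavas"),
    ]
    scores = closeness_centrality(build_graph(links))
    keywords = top_k(scores, 3)
    print(keywords)  # ['paryatan', 'hotel', 'sthal']
    # Hypothetical human-annotated keywords for the same document.
    gold = {"paryatan", "sthal", "mandir", "yatra"}
    print(precision_recall(keywords, gold))  # (0.6666666666666666, 0.5)
```

Swapping in another of the listed measures, such as degree, PageRank, or authority score, would change only the scoring function. The graph construction, top-k ranking, and precision/recall evaluation would stay the same, which is what makes a side-by-side comparison of many measures on one graph straightforward.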


Availability

No copy data


Detail Information

Series Title
-
Call Number
-
Publisher
International Journal of Computing and Digital Systems: Bahrain
Collation
005
Language
English
ISBN/ISSN
2210-142X
Classification
NONE
Content Type
-
Media Type
-
Carrier Type
-
Edition
-
Subject(s)
Specific Detail Info
-
Statement of Responsibility

Other Information

Accreditation
Scopus Q3

Other version/related

No other version available

