
ClassifyWiki: An Experimental Study on Building Generic-Type Wikipedia Classifiers



This paper introduces ClassifyWiki, a framework that automatically generates Wikipedia-based text classifiers from a small set of positive training articles. ClassifyWiki aims to simplify the task of collecting hundreds or thousands of Wikipedia pages belonging to the same entity class, starting from a positive set that may be as small as 10 pages. The user can define this initial set at any level of granularity (e.g., people, sports people, or even footballers). ClassifyWiki builds on many previous efforts in Wikipedia entity classification to provide a generic framework that works for any entity class, at any level of granularity, with few examples. The framework not only offers a set of pre-built models for many entity classes but also provides a tool, tuned through hundreds of experiments, that generates models for any given set of articles. To test the framework, we manually tagged a data set of 2,500 Wikipedia pages covering 808 unique entity classes at different levels of granularity. ClassifyWiki was tested on 103 different entity classes, with positive sets ranging down to only 5 articles. On our blind set, ClassifyWiki achieved a macro-averaged F1-score of 83% with 96% precision and 74% recall when using 50 or more positive articles.
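The record does not describe ClassifyWiki's actual pipeline, but the general idea of growing a classifier from a handful of positive Wikipedia pages can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration, not the authors' published method: it frames the task as positive-versus-sampled-negative text classification with scikit-learn, and the lists positive_texts and random_wiki_texts are hypothetical placeholders for article plain text.

```python
# Illustrative sketch only -- NOT ClassifyWiki's published pipeline.
# Assumes two hypothetical lists of article plain text:
#   positive_texts    : a few dozen pages of the target entity class (e.g. footballers)
#   random_wiki_texts : randomly sampled Wikipedia pages treated as pseudo-negatives
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_entity_classifier(positive_texts, random_wiki_texts):
    """Train a binary text classifier from a small positive set
    plus randomly sampled Wikipedia pages used as negatives."""
    texts = list(positive_texts) + list(random_wiki_texts)
    labels = [1] * len(positive_texts) + [0] * len(random_wiki_texts)
    model = make_pipeline(
        TfidfVectorizer(lowercase=True, stop_words="english", max_features=50_000),
        LogisticRegression(max_iter=1000, class_weight="balanced"),
    )
    model.fit(texts, labels)
    return model


# Usage (with the placeholder lists defined elsewhere):
#   model = build_entity_classifier(positive_texts, random_wiki_texts)
#   model.predict(["Lionel Messi is an Argentine professional footballer ..."])
```

As a quick consistency check on the reported figures, an F1 computed directly from 96% precision and 74% recall gives 2 x 0.96 x 0.74 / (0.96 + 0.74) ≈ 0.84, in line with the reported 83% macro-average.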


Availability

No copy data


Detail Information

Series Title: -
Call Number: -
Publisher: International Journal of Computing and Digital Systems, Bahrain
Collation: 005
Language: English
ISBN/ISSN: 2210-142X
Classification: NONE
Content Type: -
Media Type: -
Carrier Type: -
Edition: -
Subject(s): -
Specific Detail Info: -
Statement of Responsibility: -
Other Information

Accreditation: Scopus Q3

Other version/related

No other version available


