On discovering concept entities from web sites

Ming Yin; Dion Hoe Lian Goh; Ee Peng Lim

doi:10.1007/11424826_125


Ming Yin*, Dion Hoe Lian Goh, Ee Peng Lim

*Corresponding author for this work

Nanyang Technological University

Research output: Contribution to journal › Conference article › peer-review

Abstract

A web site usually contains a large number of concept entities, each consisting of one or more web pages connected by hyperlinks. In order to discover these concept entities for more expressive web site queries and other applications, the web unit mining problem has been proposed. Web unit mining aims to determine web pages that constitute a concept entity and classify concept entities into categories. Nevertheless, the performance of an existing web unit mining algorithm, iWUM, suffers as it may create more than one web unit (incomplete web units) from a single concept entity. This paper presents a new web unit mining algorithm, kWUM, which incorporates site-specific knowledge to discover and handle incomplete web units by merging them together and assigning correct labels. Experiments show that the overall accuracy has been significantly improved.
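The abstract describes incomplete web units only at a high level: one concept entity ends up split across several web units, which must then be merged and given a correct label. The Python sketch below illustrates that merge-and-relabel idea in general terms. It is not the kWUM algorithm. The `WebUnit` type, the `entity_key` URL-prefix rule standing in for "site-specific knowledge", the union-find grouping and the page-weighted majority vote used for relabelling are all hypothetical choices made for illustration, as are the example URLs and labels.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class WebUnit:
    """A group of hyperlinked pages believed to describe one concept entity."""
    pages: set[str]
    label: str


def entity_key(url: str, depth: int = 2) -> str:
    """Hypothetical site-specific rule: pages of one entity share their first
    `depth` path segments (e.g. /people/alice). Not the rule used by kWUM."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return "/" + "/".join(segments[:depth])


def merge_incomplete_units(units: list[WebUnit]) -> list[WebUnit]:
    """Merge units whose pages share an entity key; relabel each merged unit."""
    parent = list(range(len(units)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(i)] = find(j)

    # Units containing pages with the same key are treated as fragments
    # (incomplete web units) of the same concept entity.
    first_unit_for_key: dict[str, int] = {}
    for idx, unit in enumerate(units):
        for url in unit.pages:
            key = entity_key(url)
            if key in first_unit_for_key:
                union(idx, first_unit_for_key[key])
            else:
                first_unit_for_key[key] = idx

    groups: dict[int, list[WebUnit]] = defaultdict(list)
    for idx, unit in enumerate(units):
        groups[find(idx)].append(unit)

    merged = []
    for members in groups.values():
        pages = set().union(*(u.pages for u in members))
        # Page-weighted majority vote over fragment labels; ties broken
        # alphabetically so the result is deterministic.
        votes = Counter()
        for u in members:
            votes[u.label] += len(u.pages)
        label = min(votes, key=lambda lab: (-votes[lab], lab))
        merged.append(WebUnit(pages, label))
    return merged


# Hypothetical example: Alice's homepage is split into two fragments,
# one of which was mislabelled.
units = [
    WebUnit({"http://www.example.org/people/alice/index.html",
             "http://www.example.org/people/alice/cv.html"}, "faculty"),
    WebUnit({"http://www.example.org/people/alice/publications.html"}, "publication"),
    WebUnit({"http://www.example.org/courses/cs101/index.html"}, "course"),
]
merged = merge_incomplete_units(units)
for unit in merged:
    print(unit.label, sorted(unit.pages))
```

In this toy run the two Alice fragments collapse into one three-page unit labelled "faculty", while the course page stays a unit of its own. A real system would replace the URL-prefix rule with whatever site-specific evidence is actually available.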

Original language	English
Pages (from-to)	1177-1186
Number of pages	10
Journal	Lecture Notes in Computer Science
Volume	3481
Issue number	II
DOIs	https://doi.org/10.1007/11424826_125
Publication status	Published - 2005
Externally published	Yes
Event	International Conference on Computational Science and Its Applications - ICCSA 2005, Singapore. Duration: May 9 2005 → May 12 2005

ASJC Scopus Subject Areas

Theoretical Computer Science
General Computer Science


Cite this

@article{7668318388074379bb85e7f3cfdd14b9,
  title = "On discovering concept entities from web sites",
  abstract = "A web site usually contains a large number of concept entities, each consisting of one or more web pages connected by hyperlinks. In order to discover these concept entities for more expressive web site queries and other applications, the web unit mining problem has been proposed. Web unit mining aims to determine web pages that constitute a concept entity and classify concept entities into categories. Nevertheless, the performance of an existing web unit mining algorithm, iWUM, suffers as it may create more than one web unit (incomplete web units) from a single concept entity. This paper presents a new web unit mining algorithm, kWUM, which incorporates site-specific knowledge to discover and handle incomplete web units by merging them together and assigning correct labels. Experiments show that the overall accuracy has been significantly improved.",
  author = "Yin, Ming and Goh, {Dion Hoe Lian} and Lim, {Ee Peng}",
  year = "2005",
  doi = "10.1007/11424826\_125",
  language = "English",
  volume = "3481",
  pages = "1177--1186",
  journal = "Lecture Notes in Computer Science",
  issn = "0302-9743",
  publisher = "Springer Verlag",
  number = "II",
  note = "International Conference on Computational Science and Its Applications - ICCSA 2005 ; Conference date: 09-05-2005 Through 12-05-2005",
}

TY  - JOUR
T1  - On discovering concept entities from web sites
AU  - Yin, Ming
AU  - Goh, Dion Hoe Lian
AU  - Lim, Ee Peng
PY  - 2005
Y1  - 2005
N2  - A web site usually contains a large number of concept entities, each consisting of one or more web pages connected by hyperlinks. In order to discover these concept entities for more expressive web site queries and other applications, the web unit mining problem has been proposed. Web unit mining aims to determine web pages that constitute a concept entity and classify concept entities into categories. Nevertheless, the performance of an existing web unit mining algorithm, iWUM, suffers as it may create more than one web unit (incomplete web units) from a single concept entity. This paper presents a new web unit mining algorithm, kWUM, which incorporates site-specific knowledge to discover and handle incomplete web units by merging them together and assigning correct labels. Experiments show that the overall accuracy has been significantly improved.
AB  - A web site usually contains a large number of concept entities, each consisting of one or more web pages connected by hyperlinks. In order to discover these concept entities for more expressive web site queries and other applications, the web unit mining problem has been proposed. Web unit mining aims to determine web pages that constitute a concept entity and classify concept entities into categories. Nevertheless, the performance of an existing web unit mining algorithm, iWUM, suffers as it may create more than one web unit (incomplete web units) from a single concept entity. This paper presents a new web unit mining algorithm, kWUM, which incorporates site-specific knowledge to discover and handle incomplete web units by merging them together and assigning correct labels. Experiments show that the overall accuracy has been significantly improved.
UR  - http://www.scopus.com/inward/record.url?scp=24944450427&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=24944450427&partnerID=8YFLogxK
U2  - 10.1007/11424826_125
DO  - 10.1007/11424826_125
M3  - Conference article
AN  - SCOPUS:24944450427
SN  - 0302-9743
VL  - 3481
SP  - 1177
EP  - 1186
JO  - Lecture Notes in Computer Science
JF  - Lecture Notes in Computer Science
IS  - II
T2  - International Conference on Computational Science and Its Applications - ICCSA 2005
Y2  - 9 May 2005 through 12 May 2005
ER  - 
