Design and development of a concept-based multi-document summarization system for research abstracts

Shiyan Ou; Christopher Soo Guan Khoo; Dion H. Goh

doi:10.1177/0165551507084630

Shiyan Ou^*, Christopher Soo Guan Khoo, Dion H. Goh

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

34 Citations (Scopus)

Abstract

This paper describes a new concept-based multi-document summarization system that employs discourse parsing, information extraction and information integration. Dissertation abstracts in the field of sociology were selected as sample documents for this study. The summarization process includes four major steps - (1) parsing dissertation abstracts into five standard sections; (2) extracting research concepts (often operationalized as research variables) and their relationships, the research methods used and the contextual relations from specific sections of the text; (3) integrating similar concepts and relationships across different abstracts; and (4) combining and organizing the different kinds of information using a variable-based framework, and presenting them in an interactive web-based interface. The accuracy of each summarization step was evaluated by comparing the system-generated output against human coding. The user evaluation carried out in the study indicated that the majority of subjects (70%) preferred the concept-based summaries generated using the system to the sentence-based summaries generated using traditional sentence extraction techniques.
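Step (3), integrating similar concepts across abstracts, can be illustrated with a toy sketch. The abstract does not state the similarity criterion the system uses, so the example below assumes, purely for illustration, that two concept phrases are similar when they share the same final word (a rough stand-in for a head noun). The function name `integrate_concepts` and the sample phrases are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def integrate_concepts(abstract_concepts):
    """Group concept phrases from several abstracts by a shared head word.

    Assumption (not from the paper): the head word is approximated by the
    last whitespace-separated token of each phrase, lowercased.
    """
    groups = defaultdict(set)
    for doc_id, concepts in abstract_concepts.items():
        for phrase in concepts:
            tokens = phrase.lower().split()
            if not tokens:
                continue  # skip empty phrases
            groups[tokens[-1]].add((doc_id, phrase))
    # Return plain lists in a stable order so results are easy to inspect.
    return {head: sorted(members) for head, members in groups.items()}


# Hypothetical concepts extracted from two abstracts.
sample = {
    "d1": ["academic achievement", "parental involvement"],
    "d2": ["student achievement"],
}
result = integrate_concepts(sample)
```

Here `result["achievement"]` collects the related phrases from both abstracts, which is the kind of cross-document grouping that the final organizing step could then present under a single variable.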

Original language	English
Pages (from-to)	308-326
Number of pages	19
Journal	Journal of Information Science
Volume	34
Issue number	3
DOIs	https://doi.org/10.1177/0165551507084630
Publication status	Published - Jun 2008
Externally published	Yes

ASJC Scopus Subject Areas

Information Systems
Library and Information Sciences

Keywords

Discourse parsing
Information extraction
Information integration
Multi-document summarization

Cite this

@article{853dd4207d5247ec8612c60f1b9f6c88,

title = "Design and development of a concept-based multi-document summarization system for research abstracts",

abstract = "This paper describes a new concept-based multi-document summarization system that employs discourse parsing, information extraction and information integration. Dissertation abstracts in the field of sociology were selected as sample documents for this study. The summarization process includes four major steps - (1) parsing dissertation abstracts into five standard sections; (2) extracting research concepts (often operationalized as research variables) and their relationships, the research methods used and the contextual relations from specific sections of the text; (3) integrating similar concepts and relationships across different abstracts; and (4) combining and organizing the different kinds of information using a variable-based framework, and presenting them in an interactive web-based interface. The accuracy of each summarization step was evaluated by comparing the system-generated output against human coding. The user evaluation carried out in the study indicated that the majority of subjects (70\%) preferred the concept-based summaries generated using the system to the sentence-based summaries generated using traditional sentence extraction techniques.",

keywords = "Discourse parsing, Information extraction, Information integration, Multi-document summarization",

author = "Shiyan Ou and Khoo, {Christopher Soo Guan} and Goh, {Dion H.}",

year = "2008",

month = jun,

doi = "10.1177/0165551507084630",

language = "English",

volume = "34",

pages = "308--326",

journal = "Journal of Information Science",

issn = "0165-5515",

publisher = "SAGE Publications Ltd",

number = "3",

}

TY - JOUR

T1 - Design and development of a concept-based multi-document summarization system for research abstracts

AU - Ou, Shiyan

AU - Khoo, Christopher Soo Guan

AU - Goh, Dion H.

PY - 2008/6

Y1 - 2008/6

N2 - This paper describes a new concept-based multi-document summarization system that employs discourse parsing, information extraction and information integration. Dissertation abstracts in the field of sociology were selected as sample documents for this study. The summarization process includes four major steps - (1) parsing dissertation abstracts into five standard sections; (2) extracting research concepts (often operationalized as research variables) and their relationships, the research methods used and the contextual relations from specific sections of the text; (3) integrating similar concepts and relationships across different abstracts; and (4) combining and organizing the different kinds of information using a variable-based framework, and presenting them in an interactive web-based interface. The accuracy of each summarization step was evaluated by comparing the system-generated output against human coding. The user evaluation carried out in the study indicated that the majority of subjects (70%) preferred the concept-based summaries generated using the system to the sentence-based summaries generated using traditional sentence extraction techniques.

KW - Discourse parsing

KW - Information extraction

KW - Information integration

KW - Multi-document summarization

UR - http://www.scopus.com/inward/record.url?scp=43449093478&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=43449093478&partnerID=8YFLogxK

U2 - 10.1177/0165551507084630

DO - 10.1177/0165551507084630

M3 - Article

AN - SCOPUS:43449093478

SN - 0165-5515

VL - 34

SP - 308

EP - 326

JO - Journal of Information Science

JF - Journal of Information Science

IS - 3

ER -
