Text Classification to Detect Interpretese in Bidirectional Simultaneous Interpreting: Improved TF*IDF and Stacking

Dan Feng Huang; Dennis Zhiming Tay; Andrew K.F. Cheung

doi:10.1109/ACCESS.2025.3563148

Dan Feng Huang, Dennis Zhiming Tay, Andrew K.F. Cheung^*

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The study uses text classification (TC) to observe “interpretese”, the distinctive linguistic patterns employed by interpreters, in simultaneous interpreting (SI) at United Nations Security Council conferences. A text vectorization method known as TF*IDF is improved with Shannon’s entropy and employed to convert interpreted and non-interpreted target-language speeches into vectors. Subsequently, stacking ensemble learning classifies the dimension-reduced vectors into two labeled categories: interpreted speech and non-interpreted speech. Accurate classifications would support the interpretese hypothesis. To explore the universality of interpretese, this study detects interpretese in bidirectional SI, in which interpreters work from their first into their second language in one direction and from their second into their first language in the other. The results demonstrate successful classifications in both interpreting directions, thereby supporting the interpretese hypothesis. Notably, classification accuracy is higher when the interpreters work into their first language than into their second language, suggesting that interpretese is more pronounced in the former direction and that interpreting direction affects interpreters’ language processing. The classification algorithms differ in their performance on the classification tasks, underscoring the importance of using stacking for ensemble learning to achieve reliable results and justify algorithm selection.
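The abstract does not state how Shannon's entropy enters the TF*IDF weighting, so the following is only a minimal sketch under an assumed scheme: each term's TF*IDF weight is scaled by one minus the normalized entropy of that term's distribution across the two classes, so that terms spread evenly over interpreted and non-interpreted speech are down-weighted. The function names, the smoothed IDF, and the whitespace tokenizer are illustrative assumptions, not the authors' method; dimensionality reduction and stacking are not shown.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    return sum(-p * math.log2(p) for p in probs)


def entropy_weighted_tfidf(docs, labels):
    """Return one {term: weight} dict per document.

    ASSUMED weighting (not taken from the paper):
        weight = tf * idf * (1 - H(term over classes) / log2(n_classes))
    """
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer
    classes = sorted(set(labels))
    n_docs = len(docs)

    # Document frequency per term and raw term counts per class.
    doc_freq = Counter()
    class_counts = {c: Counter() for c in classes}
    for tokens, label in zip(tokenized, labels):
        doc_freq.update(set(tokens))
        class_counts[label].update(tokens)

    # Largest possible entropy for this number of classes.
    max_entropy = math.log2(len(classes)) if len(classes) > 1 else 1.0

    vectors = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        vec = {}
        for term, count in term_counts.items():
            tf = count / len(tokens)
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0  # smoothed IDF
            h = shannon_entropy([class_counts[c][term] for c in classes])
            vec[term] = tf * idf * (1.0 - h / max_entropy)
        vectors.append(vec)
    return vectors
```

Under this assumed scheme, a word used equally often in both classes receives a weight of zero, while a word confined to one class keeps its full TF*IDF weight. The resulting sparse vectors would still need to be reduced in dimension and passed to a stacking ensemble, as the abstract describes.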

Original language	English
Journal	IEEE Access
DOIs	https://doi.org/10.1109/ACCESS.2025.3563148
Publication status	Accepted/In press - 2025
Externally published	Yes

Bibliographical note

Publisher Copyright:
© 2013 IEEE.

ASJC Scopus Subject Areas

General Computer Science
General Materials Science
General Engineering

Keywords

ensemble learning
entropy
interpretese
interpreting directions
text classification
TF*IDF

Cite this

@article{5a2b66e3d66e44cda718d2abe6aa684e,
  title = "Text Classification to Detect Interpretese in Bidirectional Simultaneous Interpreting: Improved TF*IDF and Stacking",
  abstract = "The study uses text classification (TC) to observe “interpretese”, the distinctive linguistic patterns employed by interpreters, in simultaneous interpreting (SI) at United Nations Security Council conferences. A text vectorization method known as TF*IDF is improved with Shannon{\textquoteright}s entropy and employed to convert interpreted and non-interpreted target-language speeches into vectors. Subsequently, stacking ensemble learning classifies the dimension-reduced vectors into two labeled categories: interpreted speech and non-interpreted speech. Accurate classifications would support the interpretese hypothesis. To explore the universality of interpretese, this study detects interpretese in bidirectional SI, in which interpreters work from their first into their second language in one direction and from their second into their first language in the other. The results demonstrate successful classifications in both interpreting directions, thereby supporting the interpretese hypothesis. Notably, classification accuracy is higher when the interpreters work into their first language than into their second language, suggesting that interpretese is more pronounced in the former direction and that interpreting direction affects interpreters{\textquoteright} language processing. The classification algorithms differ in their performance on the classification tasks, underscoring the importance of using stacking for ensemble learning to achieve reliable results and justify algorithm selection.",
  keywords = "ensemble learning, entropy, interpretese, interpreting directions, text classification, TF*IDF",
  author = "Huang, {Dan Feng} and Tay, {Dennis Zhiming} and Cheung, {Andrew K.F.}",
  note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",
  year = "2025",
  doi = "10.1109/ACCESS.2025.3563148",
  language = "English",
  journal = "IEEE Access",
  issn = "2169-3536",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY - JOUR
T1 - Text Classification to Detect Interpretese in Bidirectional Simultaneous Interpreting
T2 - Improved TF*IDF and Stacking
AU - Huang, Dan Feng
AU - Tay, Dennis Zhiming
AU - Cheung, Andrew K.F.
PY - 2025
Y1 - 2025
AB - The study uses text classification (TC) to observe “interpretese”, the distinctive linguistic patterns employed by interpreters, in simultaneous interpreting (SI) at United Nations Security Council conferences. A text vectorization method known as TF*IDF is improved with Shannon’s entropy and employed to convert interpreted and non-interpreted target-language speeches into vectors. Subsequently, stacking ensemble learning classifies the dimension-reduced vectors into two labeled categories: interpreted speech and non-interpreted speech. Accurate classifications would support the interpretese hypothesis. To explore the universality of interpretese, this study detects interpretese in bidirectional SI, in which interpreters work from their first into their second language in one direction and from their second into their first language in the other. The results demonstrate successful classifications in both interpreting directions, thereby supporting the interpretese hypothesis. Notably, classification accuracy is higher when the interpreters work into their first language than into their second language, suggesting that interpretese is more pronounced in the former direction and that interpreting direction affects interpreters’ language processing. The classification algorithms differ in their performance on the classification tasks, underscoring the importance of using stacking for ensemble learning to achieve reliable results and justify algorithm selection.
KW - ensemble learning
KW - entropy
KW - interpretese
KW - interpreting directions
KW - text classification
KW - TF*IDF
UR - http://www.scopus.com/inward/record.url?scp=105003484487&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105003484487&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2025.3563148
DO - 10.1109/ACCESS.2025.3563148
M3 - Article
AN - SCOPUS:105003484487
SN - 2169-3536
JO - IEEE Access
JF - IEEE Access
ER -

Press/Media

Guangdong Polytechnic Normal University Researchers Update Current Data on Engineering (Text Classification to Detect Interpretese in Bidirectional Simultaneous Interpreting: Improved TF-IDF and Stacking)
