Graph-Based Extractive Text Summarization Sentence Scoring Scheme for Big Data Applications

Authors

Bhargav, Shir
Bhattacharya, Pronaya
Bhavsar, Madhuri
Bostani, Ali
Chowdhury, Subrata
Mehbodniya, Abolfazl
Verma, Jai Prakash
Webber, Julian

Issue Date

2023-08-22

Type

Conference Presentations/Proceedings

Abstract

Recent advances in big data and natural language processing (NLP) have created a need for efficient text mining (TM) schemes that can interpret and analyze voluminous textual data. Text summarization (TS) is an essential pillar of recommendation engines. Although abstractive techniques are prevalent in TS, a shift towards graph-based extractive TS (ETS) schemes is becoming apparent. Extractive models, although simpler and less resource-intensive, are key in assessing reviews and feedback on products or services. Nonetheless, current methodologies have not fully resolved concerns surrounding complexity, adaptability, and computational demands. We therefore propose a scheme, GETS, that uses a graph-based model to forge connections among words and sentences through statistical procedures. The structure includes a post-processing stage that performs graph-based sentence clustering. Built on the Apache Spark framework, the scheme is designed for parallel execution, making it adaptable to real-world applications. For evaluation, we selected 500 documents from the WikiHow and Opinosis datasets, categorized them into five classes, and compared results using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) measures ROUGE-1, ROUGE-2, and ROUGE-L. With the clustered approach, the recall scores are 0.3942, 0.0952, and 0.3436 for ROUGE-1, ROUGE-2, and ROUGE-L, respectively. Compared with existing models such as BERTEXT (with 3-gram, 4-gram) and MATCHSUM, our scheme shows notable improvements, substantiating its applicability and effectiveness in real-world scenarios.
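To make the reported recall figures easier to interpret, the following is a minimal sketch of how ROUGE-N recall is commonly computed: the clipped n-gram overlap between a candidate summary and a reference, divided by the number of n-grams in the reference. It is not the evaluation code used by the authors. The function names `ngrams` and `rouge_n_recall`, the lowercasing, the whitespace tokenization, and the example sentences are all assumptions made for illustration; standard ROUGE toolkits may additionally apply stemming or stop-word handling.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram overlap / n-grams in the reference.

    Assumption: simple lowercased whitespace tokenization, no stemming.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each n-gram's credit at the number of times the candidate has it.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Hypothetical example sentences, not taken from WikiHow or Opinosis.
reference = "the battery life is very good"
candidate = "battery life is good"
print(rouge_n_recall(candidate, reference, n=1))
print(rouge_n_recall(candidate, reference, n=2))
```

In this toy example, four of the six reference unigrams and two of the five reference bigrams appear in the candidate, which illustrates why ROUGE-2 recall is typically much lower than ROUGE-1 recall for extractive summaries.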

Publisher

Multidisciplinary Digital Publishing Institute (MDPI)

Volume

14

Issue

9
