Graph-Based Extractive Text Summarization Sentence Scoring Scheme for Big Data Applications
Authors
Bhargav, Shir
Bhattacharya, Pronaya
Bhavsar, Madhuri
Bostani, Ali
Chowdhury, Subrata
Mehbodniya, Abolfazl
Verma, Jai Prakash
Webber, Julian
Issue Date
2023-08-22
Type
Conference Presentations/Proceedings
Abstract
Recent advances in big data and natural language processing (NLP) have created a need for efficient text mining (TM) schemes that can interpret and analyze voluminous textual data. Text summarization (TS) is an essential component of recommendation engines. Despite the prevalent use of abstractive techniques in TS, a shift towards graph-based extractive TS (ETS) schemes is becoming apparent. These extractive models, although simpler and less resource-intensive, are key in assessing reviews and feedback on products or services. Nonetheless, current methodologies have not fully resolved concerns surrounding complexity, adaptability, and computational demands. We therefore propose a scheme, GETS, that uses a graph-based model to form connections among words and sentences through statistical procedures. The scheme includes a post-processing stage that performs graph-based sentence clustering. Built on the Apache Spark framework, the scheme is designed for parallel execution, making it adaptable to real-world applications. For evaluation, we selected 500 documents from the WikiHow and Opinosis datasets, categorized them into five classes, and compared results using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) measures ROUGE-1, ROUGE-2, and ROUGE-L. With the clustered approach, the recall scores are 0.3942, 0.0952, and 0.3436 for ROUGE-1, ROUGE-2, and ROUGE-L, respectively. Compared with existing models such as BERTEXT (with 3-gram, 4-gram) and MATCHSUM, our scheme demonstrated notable improvements, supporting its applicability and effectiveness in real-world scenarios.
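The recall figures quoted in the abstract are ROUGE-1, ROUGE-2, and ROUGE-L scores. As a rough illustration of what those measures count, the following is a minimal sketch, not the evaluation code used by the authors. It assumes lowercased whitespace tokenization and a single reference summary; the function names (`rouge_n_recall`, `rouge_l_recall`, `lcs_length`) and the example sentences are hypothetical and chosen only for this illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) occurring in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram overlap divided by reference n-gram count."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it appears in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """ROUGE-L recall: LCS length divided by reference length."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not ref:
        return 0.0
    return lcs_length(cand, ref) / len(ref)


# Hypothetical example sentences, not taken from WikiHow or Opinosis.
reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
r1 = rouge_n_recall(candidate, reference, n=1)
r2 = rouge_n_recall(candidate, reference, n=2)
rl = rouge_l_recall(candidate, reference)
```

Published ROUGE implementations typically add stemming, sentence splitting, and multi-reference handling, so scores from this sketch would not be directly comparable with the values reported above.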
Publisher
Multidisciplinary Digital Publishing Institute (MDPI)
Volume
14
Issue
9