Open Access iconOpen Access



Educational Videos Subtitles’ Summarization Using Latent Dirichlet Allocation and Length Enhancement

Sarah S. Alrumiah*, Amal A. Al-Shargabi

Department of Information Technology, College of Computer, Qassim University, Buraydah, 51452, Saudi Arabia

* Corresponding Author: Sarah S. Alrumiah. Email: email

(This article belongs to this Special Issue: Machine Learning Applications in Medical, Finance, Education and Cyber Security)

Computers, Materials & Continua 2022, 70(3), 6205-6221.


Nowadays, people use online resources such as educational videos and courses. However, such videos and courses are mostly long and thus, summarizing them will be valuable. The video contents (visual, audio, and subtitles) could be analyzed to generate textual summaries, i.e., notes. Videos’ subtitles contain significant information. Therefore, summarizing subtitles is effective to concentrate on the necessary details. Most of the existing studies used Term Frequency–Inverse Document Frequency (TF-IDF) and Latent Semantic Analysis (LSA) models to create lectures’ summaries. This study takes another approach and applies Latent Dirichlet Allocation (LDA), which proved its effectiveness in document summarization. Specifically, the proposed LDA summarization model follows three phases. The first phase aims to prepare the subtitle file for modelling by performing some preprocessing steps, such as removing stop words. In the second phase, the LDA model is trained on subtitles to generate the keywords list used to extract important sentences. Whereas in the third phase, a summary is generated based on the keywords list. The generated summaries by LDA were lengthy; thus, a length enhancement method has been proposed. For the evaluation, the authors developed manual summaries of the existing “EDUVSUM” educational videos dataset. The authors compared the generated summaries with the manual-generated outlines using two methods, (i) Recall-Oriented Understudy for Gisting Evaluation (ROUGE) and (ii) human evaluation. The performance of LDA-based generated summaries outperforms the summaries generated by TF-IDF and LSA. Besides reducing the summaries’ length, the proposed length enhancement method did improve the summaries’ precision rates. Other domains, such as news videos, can apply the proposed method for video summarization.


Cite This Article

S. S. Alrumiah and A. A. Al-Shargabi, "Educational videos subtitles’ summarization using latent dirichlet allocation and length enhancement," Computers, Materials & Continua, vol. 70, no.3, pp. 6205–6221, 2022.


cc This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
  • 2437


  • 1764


  • 0


Share Link