TY - EJOU AU - Alrumiah, Sarah S. AU - Al-Shargabi, Amal A. TI - Educational Videos Subtitles’ Summarization Using Latent Dirichlet Allocation and Length Enhancement T2 - Computers, Materials \& Continua PY - 2022 VL - 70 IS - 3 SN - 1546-2226 AB - Nowadays, people use online resources such as educational videos and courses. However, such videos and courses are mostly long and thus, summarizing them will be valuable. The video contents (visual, audio, and subtitles) could be analyzed to generate textual summaries, i.e., notes. Videos’ subtitles contain significant information. Therefore, summarizing subtitles is effective to concentrate on the necessary details. Most of the existing studies used Term Frequency–Inverse Document Frequency (TF-IDF) and Latent Semantic Analysis (LSA) models to create lectures’ summaries. This study takes another approach and applies Latent Dirichlet Allocation (LDA), which proved its effectiveness in document summarization. Specifically, the proposed LDA summarization model follows three phases. The first phase aims to prepare the subtitle file for modelling by performing some preprocessing steps, such as removing stop words. In the second phase, the LDA model is trained on subtitles to generate the keywords list used to extract important sentences. Whereas in the third phase, a summary is generated based on the keywords list. The generated summaries by LDA were lengthy; thus, a length enhancement method has been proposed. For the evaluation, the authors developed manual summaries of the existing “EDUVSUM” educational videos dataset. The authors compared the generated summaries with the manual-generated outlines using two methods, (i) Recall-Oriented Understudy for Gisting Evaluation (ROUGE) and (ii) human evaluation. The performance of LDA-based generated summaries outperforms the summaries generated by TF-IDF and LSA. Besides reducing the summaries’ length, the proposed length enhancement method did improve the summaries’ precision rates. Other domains, such as news videos, can apply the proposed method for video summarization. KW - Subtitle summarization; educational videos; topic modelling; LDA; extractive summarization DO - 10.32604/cmc.2022.021780