Topic Analysis in Political Speech Video Transcripts Using the Latent Dirichlet Allocation (LDA) Method
DOI:
https://doi.org/10.32528/justindo.v11i1.4044
Keywords:
Political Speech, Video Transcript, Latent Dirichlet Allocation, Topic Modeling, Speech-to-Text
Abstract
Political speeches are an important medium through which a national leader conveys vision, mission, and policy direction to the public. This study identifies and analyzes the main topics in video transcripts of President Joko Widodo's political speeches from 2014 to 2024 using the Latent Dirichlet Allocation (LDA) method. The data consist of 185 press conference speech videos obtained from the Indonesian Cabinet Secretariat's YouTube channel and converted into text using speech-to-text technology. The dataset is split into 81 videos from 2014–2023, used as training data, and 104 videos from 2024, used as testing data. The analysis comprises text preprocessing, rule-based automatic labeling, LDA model training, and evaluation using coherence score and perplexity. In the training data, Infrastructure and Economy are the dominant topics, reflecting the government's focus on physical development and economic growth. In the 2024 testing data, by contrast, Healthcare is the most dominant topic, followed by Infrastructure, Economy, Education, and Technology. The Infrastructure topic consistently achieves the highest coherence score, 0.85, indicating strong semantic consistency among its constituent terms. This study contributes to understanding the temporal dynamics of political communication and demonstrates the effectiveness of LDA for analyzing political speech data derived from video transcripts.
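The rule-based automatic labeling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the actual rules, so everything here is an assumption: the keyword lexicon `TOPIC_KEYWORDS`, the helper names `tokenize`, `label_transcript`, and `dominant_topics`, and the use of English keywords (the source transcripts would be Indonesian). The sketch assigns each transcript the topic whose keywords occur most often and then counts labels across a collection, which is one plausible way to obtain a "dominant topic" ranking.

```python
import re
from collections import Counter

# Hypothetical keyword lexicon for illustration only; the study's real
# labeling rules are not given in the abstract.
TOPIC_KEYWORDS = {
    "Infrastructure": {"road", "toll", "bridge", "airport", "dam"},
    "Economy": {"investment", "inflation", "export", "growth", "budget"},
    "Healthcare": {"hospital", "vaccine", "health", "doctor", "stunting"},
    "Education": {"school", "teacher", "student", "scholarship", "university"},
    "Technology": {"digital", "internet", "startup", "data", "technology"},
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def label_transcript(text, lexicon=TOPIC_KEYWORDS, default="Other"):
    """Return the topic with the most keyword hits, or `default` if none match."""
    counts = Counter(tokenize(text))
    scores = {
        topic: sum(counts[word] for word in words)
        for topic, words in lexicon.items()
    }
    best_topic, best_score = max(scores.items(), key=lambda item: item[1])
    return best_topic if best_score > 0 else default


def dominant_topics(transcripts):
    """Count assigned labels over a collection, most frequent first."""
    return Counter(label_transcript(t) for t in transcripts).most_common()


if __name__ == "__main__":
    sample = [
        "The new toll road and the bridge will open next year.",
        "We are building a hospital and expanding vaccine access.",
        "Every hospital needs more doctor positions.",
    ]
    print(dominant_topics(sample))
```

Labels produced this way could serve as a reference for interpreting the topics an LDA model learns, but a keyword count ignores context, so ties and ambiguous transcripts would need a more careful rule in practice.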
License
Copyright (c) 2026 Dhea Intan Septiara, Deni Arifianto, Wiwik Suharso

This work is licensed under a Creative Commons Attribution 4.0 International License.



