Analysis of text analytics methods for knowledge extraction from Ukrainian-language social media
DOI:
https://doi.org/10.32347/2411-4049.2026.1.161-170
Keywords:
text analytics, data processing, Coherence Score, F1-score, LSA, NMF, LDA, Top2Vec, BERTopic, OSINT
Abstract
The purpose of the study is to review and systematize current text analytics and natural language processing methods for knowledge extraction from unstructured social media content, with a focus on Ukrainian-language sources.
A comparative analysis of topic modelling methods (LSA, NMF, LDA, HDP, Top2Vec, BERTopic), ontology construction approaches, OSINT data collection tools, and the F1 evaluation metric for named entity recognition tasks was conducted.
Comparative analysis of four topic modelling methods applied to real Twitter datasets demonstrated that BERTopic (coherence score 0.62) outperforms LDA (0.45) and Top2Vec (0.56) for short texts; the NER-UK 2.0 corpus provides a baseline solution for Ukrainian named entity recognition with an F1 score of 0.89.
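As an illustration of what a topic coherence score measures, the following minimal sketch computes one common variant, the average normalized pointwise mutual information (NPMI) over pairs of topic words, from document-level co-occurrence. It is not necessarily the coherence measure behind the values reported above; the function name `npmi_coherence` and the toy documents are assumptions made for this example.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Average NPMI over all pairs of topic words (document co-occurrence)."""
    if len(topic_words) < 2 or not documents:
        raise ValueError("need at least two topic words and one document")
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            scores.append(-1.0)  # words never co-occur: lowest NPMI by convention
        elif p12 == 1.0:
            scores.append(1.0)   # words co-occur in every document: highest NPMI
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Toy tokenized "posts" (hypothetical data, not from the study).
docs = [
    ["front", "army"],
    ["front", "army"],
    ["rain", "weather"],
    ["rain", "weather"],
]
coherent = npmi_coherence(["front", "army"], docs)    # words always appear together
incoherent = npmi_coherence(["front", "rain"], docs)  # words never appear together
```

Values lie in the range from -1 to 1, with higher values indicating that the top words of a topic tend to appear in the same documents.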
On the theoretical side, the study justifies the selection of methods that take into account the temporal dynamics of topics. On the practical side, it proposes a five-block pipeline architecture for knowledge extraction from Ukrainian-language social media.
The originality of the work lies in the adaptation of the Methontology-based approach to ontology generation for short unstructured Ukrainian-language texts.
Further prospects include practical implementation and validation of the proposed pipeline on real Ukrainian social media datasets.
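For readers unfamiliar with the F1 metric mentioned above for named entity recognition, the sketch below shows one common way to compute it: micro-averaged over entities, counting a prediction as correct only when its span and label match the gold annotation exactly. The function name `entity_f1`, the tuple layout, and the sample entities are assumptions for illustration and are not taken from the NER-UK 2.0 evaluation setup.

```python
def entity_f1(gold, predicted):
    """Micro-averaged precision, recall and F1 over exact-match entities."""
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)  # same span and same label
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Entities as (sentence_id, start_token, end_token, label); values are made up.
gold = [(0, 0, 1, "PERS"), (0, 4, 4, "LOC"), (1, 2, 3, "ORG")]
pred = [(0, 0, 1, "PERS"), (0, 4, 4, "ORG"), (1, 2, 3, "ORG")]

precision, recall, f1 = entity_f1(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the second prediction has the right span but the wrong label, so it counts as both a false positive and a false negative.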
References
Salloum, S. A., Al-Emran, M., & Shaalan, K. (2017). Mining social media text: Extracting knowledge from Facebook. International Journal of Computing and Digital Systems, 6(2), 73–81. https://www.researchgate.net/publication/314095118_Mining_Social_Media_Text_Extracting_Knowledge_from_Facebook
Prokipchuk, O., Vysotska, V., Pukach, P., Lytvyn, V., Uhryn, D., Ushenko, Yu., & Hu, Z. (2023). Intelligent analysis of Ukrainian-language tweets for public opinion research based on NLP methods and machine learning technology. International Journal of Modern Education and Computer Science, 15(3), 70–93. https://doi.org/10.5815/ijmecs.2023.03.06
Vysotska, V., Przystupa, K., Kulikov, Yu., Chyrun, S., Ushenko, Yu., Hu, Z., & Uhryn, D. (2025). Recognizing fakes, propaganda and disinformation in Ukrainian content based on NLP and machine-learning technology. International Journal of Computer Network and Information Security, 17(1), 92–127. https://doi.org/10.5815/ijcnis.2025.01.08
Vysotska, V., Mazepa, S., Chyrun, L., Brodyak, O., Shakleina, I., & Schuchmann, V. (2022). NLP tool for extracting relevant information from criminal reports or fakes/propaganda content. Proceedings of the IEEE 17th International Conference on Computer Sciences and Information Technologies (CSIT), 93–98. https://doi.org/10.1109/CSIT56902.2022.10000563
Ozyurt, B., & Akcayol, M. A. (2023). A deep learning-based sentiment analysis approach (MF-CNN-BILSTM) and topic modeling of tweets related to the Ukraine-Russia conflict. Applied Soft Computing, 143, 110404. https://doi.org/10.1016/j.asoc.2023.110404
Liao, H., Wang, C., Gu, Y., & Liu, R. (2025). A text data mining-based digital transformation opinion thematic system for online social media platforms. Systems, 13(3), 159. https://doi.org/10.3390/systems13030159
Chaplynskyi, D., & Romanyshyn, M. (2024). Introducing NER-UK 2.0: A rich corpus of named entities for Ukrainian. Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024, 23–29. https://aclanthology.org/2024.unlp-1.4
Ramamoorthy, T., Kulothungan, V., & Mappillairaju, B. (2024). Topic modeling and social network analysis approach to explore diabetes discourse on Twitter in India. Frontiers in Artificial Intelligence, 7, 1329185. https://doi.org/10.3389/frai.2024.1329185
Egger, R., & Yu, J. (2022). A topic modeling comparison between LDA, NMF, Top2Vec, and BERTopic to demystify Twitter posts. Frontiers in Sociology, 7, 886498. https://doi.org/10.3389/fsoc.2022.886498
Saha, A., & Sindhwani, V. (2012). Learning evolving and emerging topics in social media: A dynamic NMF approach with temporal regularization. Proceedings of the Fifth ACM International Conference on Web Search and Data Mining (WSDM '12), 693–702. https://doi.org/10.1145/2124295.2124376
Newman, D., Lau, J. H., Grieser, K., & Baldwin, T. (2010). Automatic evaluation of topic coherence. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the ACL, 100–108. https://aclanthology.org/N10-1012
Röder, M., Both, A., & Hinneburg, A. (2015). Exploring the space of topic coherence measures. Proceedings of the Eighth ACM International Conference on Web Search and Data Mining (WSDM '15), 399–408. https://doi.org/10.1145/2684822.2685324
Krishnan, A., & Kennedyraj. (2023). Exploring the power of topic modeling techniques: A comparative analysis. arXiv preprint arXiv:2308.11520. https://arxiv.org/abs/2308.11520
Maedche, A., & Staab, S. (2001). Ontology learning from text: A survey. IEEE Intelligent Systems, 16(4), 72–79. https://doi.org/10.1007/3-540-45399-7_30
Ji, S., Pan, S., Cambria, E., Marttinen, P., & Philip, S. Y. (2021). A survey on knowledge graphs: Representation, acquisition, and applications. IEEE Transactions on Neural Networks and Learning Systems, 33(2), 494–514. https://doi.org/10.1109/TNNLS.2021.3070843
Tupayachi, J., Xu, H., Omitaomu, O. A., Camur, M. C., Sharmin, A., & Li, X. (2024). Towards next-generation urban decision support systems through AI-powered construction of scientific ontology using large language models. arXiv preprint arXiv:2405.19255. https://doi.org/10.48550/arXiv.2405.19255
Boutaleb, A., Picault, J., & Grosjean, G. (2024). BERTrend: Neural topic modeling for emerging trends detection. arXiv preprint arXiv:2411.05930. https://arxiv.org/abs/2411.05930
Noy, N. F., Sintek, M., Decker, S., Crubezy, M., Fergerson, R. W., & Musen, M. A. (2001). Creating semantic web contents with Protege-2000. IEEE Intelligent Systems, 16(2), 60–71. https://doi.org/10.1109/5254.920601
Christen, P., Hand, D. J., & Kirielle, N. (2023). A review of the F-measure: Its history, properties, criticism, and alternatives. ACM Computing Surveys, 56(3), 73. https://doi.org/10.1145/3606367
Mühlroth, C., & Grottke, M. (2022). Artificial intelligence in innovation: How to spot emerging trends and technologies. IEEE Transactions on Engineering Management, 69(2), 493–510. https://doi.org/10.1109/TEM.2020.2989214
Terentiev, O. M., Duda, V. O., & Abroskin, Yu. Yu. (2025). Analiz tekstovoi informatsii z metoiu klasteryzatsii ta vyiavlennia hrup ekonomichnykh novyn shchodo auktsioniv Ministerstva finansiv iz zaluchennia zovnishnoho finansuvannia [Analysis of textual information for clustering and identification of groups of economic news on Ministry of Finance auctions for external financing]. Development of Education, Science and Business: Results 2025: Proceedings of the International Scientific and Practical Internet Conference, December 18–19, 2025, 511–513. FOP Marenichenko V.V., Dnipro, Ukraine. ISBN 978-617-8293-60-4. ISSN 2664-4819. http://www.wayscience.com/wp-content/uploads/2025/12/Conference-Proceedings-December-18-19-2025.pdf (in Ukrainian)
License
Copyright (c) 2026 О.М. Терентьєв, Ю.Ю. Аброскін, В.О. Дуда, Т.І. Просянкіна-Жарова

This work is licensed under a Creative Commons Attribution 4.0 International License.
The journal «Environmental safety and natural resources» works under Creative Commons Attribution 4.0 International (CC BY 4.0).