Analysis of text analytics methods for knowledge extraction from Ukrainian-language social media

Authors

  • Oleksandr Terentiev, Doctor of Technical Sciences, Associate Professor, Principal Researcher, Institute of Telecommunications and Global Information Space of the National Academy of Sciences of Ukraine, Kyiv, Ukraine, https://orcid.org/0000-0002-4288-1753
  • Yurii Abroskin, Graduate Student, Institute of Telecommunications and Global Information Space of the National Academy of Sciences of Ukraine, Kyiv, Ukraine, https://orcid.org/0009-0009-9828-5596
  • Volodymyr Duda, Graduate Student, Institute of Telecommunications and Global Information Space of the National Academy of Sciences of Ukraine, Kyiv, Ukraine, https://orcid.org/0009-0002-4278-4635
  • Tetyana Prosyankina-Zharova, Doctor of Technical Sciences, Associate Professor, Senior Researcher, Institute of Telecommunications and Global Information Space of the National Academy of Sciences of Ukraine, Kyiv, Ukraine, https://orcid.org/0000-0002-9623-8771

DOI:

https://doi.org/10.32347/2411-4049.2026.1.161-170

Keywords:

text analytics, data processing, Coherence Score, F1-score, LSA, NMF, LDA, Top2Vec, BERTopic, OSINT

Abstract

The purpose of the study is to review and systematize current text analytics and natural language processing methods for knowledge extraction from unstructured social media content, with a focus on Ukrainian-language sources.
A comparative analysis of topic modelling methods (LSA, NMF, LDA, HDP, Top2Vec, BERTopic), ontology construction approaches, OSINT data collection tools, and the F1 evaluation metric for named entity recognition tasks was conducted.
A comparison of four topic modelling methods on real Twitter datasets showed that, for short texts, BERTopic (coherence score 0.62) outperforms Top2Vec (0.56) and LDA (0.45); the NER-UK 2.0 corpus provides a baseline solution for Ukrainian named entity recognition with an F1 score of 0.89.
From a theoretical standpoint, the study justifies the selection of methods that account for the temporal dynamics of topics. From a practical standpoint, it proposes a five-block pipeline architecture for knowledge extraction from Ukrainian-language social media.
The originality of the work lies in the adaptation of the Methontology-based approach to ontology generation for short unstructured Ukrainian-language texts.
Further prospects include practical implementation and validation of the proposed pipeline on real Ukrainian social media datasets.
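
For readers unfamiliar with the F1 metric mentioned above, the following is a minimal sketch of how an entity-level F1 score can be computed for a named entity recognition task. It is an illustration only, not the authors' evaluation procedure: the span representation (start, end, label), the function name entity_f1, the exact-match criterion, and the toy data are all assumptions introduced here.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start_token, end_token, label).
Span = Tuple[int, int, str]


def entity_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Entity-level F1, counting a prediction as correct only if
    both its boundaries and its label match a gold entity exactly."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        # Covers empty inputs and the case of no correct predictions.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy example (invented data): the second entity has the wrong label.
gold = {(0, 2, "PERS"), (5, 6, "LOC"), (9, 11, "ORG")}
predicted = {(0, 2, "PERS"), (5, 6, "ORG"), (9, 11, "ORG")}

score = entity_f1(gold, predicted)
print(f"F1 = {score:.3f}")
```

In this toy case two of three predictions are correct, so precision and recall are both 2/3 and so is F1. Published scores such as the one quoted for NER-UK 2.0 may use a different matching or averaging scheme, so they are not directly comparable to this sketch.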

Published

2026-04-03

How to Cite

Terentiev, O., Abroskin, Y., Duda, V., & Prosyankina-Zharova, T. (2026). Analysis of text analytics methods for knowledge extraction from Ukrainian-language social media. Environmental Safety and Natural Resources, 57(1), 161–170. https://doi.org/10.32347/2411-4049.2026.1.161-170

Section

Information technology and mathematical modeling