Restoration of missing environmental data in an air quality monitoring system based on a naive Bayes classifier

Authors

  • Dmytro Shevchenko PhD, Assistant, Department of Computer Science, National University of Life and Environmental Sciences of Ukraine, Kyiv, Ukraine https://orcid.org/0009-0001-7736-8263
  • Bella Holub Candidate of Engineering Sciences, Associate Professor, Head of the Department of Computer Science, National University of Life and Environmental Sciences of Ukraine, Kyiv, Ukraine https://orcid.org/0000-0002-1256-6138
  • Taras Trysnyuk Candidate of Technical Sciences, Senior Researcher, Institute of Telecommunications and Global Information Space of NAS of Ukraine, Kyiv, Ukraine https://orcid.org/0000-0002-3672-8242

DOI:

https://doi.org/10.32347/2411-4049.2026.2.262-273

Keywords:

Data Mining, K-Means, data restoration, information and analytical system, intelligent technology, information technologies, data quality, data reliability and validity

Abstract

The article examines an approach to restoring missing environmental data in an air quality monitoring system using a Naive Bayes classifier. The problem is relevant because missing values in observational time series reduce the reliability of air quality index calculations, complicate the interpretation of environmental conditions, and weaken the analytical support for managerial decision-making. The study continues previous research in which cluster analysis was used to identify stable and representative monitoring stations suitable for forming a high-quality training dataset. Unlike approaches that use the entire set of available measurements regardless of their reliability, the proposed method trains the model only on data from selected stations whose time series are more complete, stable, and credible.
The study forms a feature space from concentrations of major pollutants and accompanying meteorological parameters, uses CAQI (Common Air Quality Index) categories as the target variable, and restores missing values by assigning the most probable air quality class. The results show acceptable model quality: the overall classification accuracy reached 0.71, which indicates that the approach is suitable for basic air quality assessment and for further use in intelligent data restoration tasks.
The practical value of the proposed approach lies in its potential integration into environmental information and analytical systems to improve data completeness, enhance the quality of air quality index calculations, and provide more reliable analytical support for decision-making.
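
The workflow outlined in the abstract (train only on selected stations, then assign the most probable air quality class to records with a missing category) can be illustrated with a minimal sketch. It is not the authors' implementation. The sketch assumes scikit-learn's GaussianNB as the Naive Bayes variant, and the feature columns, class thresholds, station identifiers, and synthetic data are all hypothetical stand-ins for the study's dataset.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Synthetic stand-in for monitoring records (illustrative only, not study data).
# Assumed feature columns: PM2.5, PM10, NO2, temperature, relative humidity.
n = 300
pm25 = rng.uniform(0, 120, n)
X = np.column_stack([
    pm25,
    pm25 * 1.5 + rng.normal(0, 5, n),  # PM10 loosely tied to PM2.5
    rng.uniform(5, 150, n),            # NO2
    rng.normal(12, 8, n),              # temperature
    rng.uniform(30, 95, n),            # relative humidity
])

# Hypothetical labelling rule: an air quality class derived from PM2.5 bins
# (illustrative thresholds, chosen only to give the sketch a target variable).
labels = np.array(["very low", "low", "medium", "high", "very high"])
y = labels[np.digitize(pm25, [15, 30, 55, 110])]

# Hypothetical station ids; `reliable` plays the role of the stations
# selected by the earlier cluster analysis.
station_ids = rng.integers(0, 6, n)
reliable = {0, 2, 5}

# Hide about 20% of the class labels to simulate gaps in the record.
missing = rng.random(n) < 0.2
y_obs = y.astype(object)
y_obs[missing] = None

# Train only on labelled rows that come from the reliable stations.
train = np.isin(station_ids, list(reliable)) & ~missing
model = GaussianNB()
model.fit(X[train], y_obs[train].astype(str))

# Restore each missing label with the most probable class under the model.
y_restored = y_obs.copy()
y_restored[missing] = model.predict(X[missing])

# Share of restored labels that match the hidden true labels.
accuracy = float((y_restored[missing].astype(str) == y[missing]).mean())
print(f"accuracy on restored labels: {accuracy:.2f}")
```

The sketch restores only the missing class label and assumes the feature values of those records are available; GaussianNB does not accept missing feature values, so gaps in the inputs themselves would need separate handling. The accuracy it prints depends on the synthetic data and is not comparable to the 0.71 reported in the abstract.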

References

Shevchenko, D. V., Holub, B. L., & Borodkina, I. L. (2026). Station clustering for detecting data instability in an air quality monitoring network. Cybersecurity: Education, Science, Technique, 4(32), 1054–1064.

World Health Organization. (2021). WHO global air quality guidelines: Particulate matter (PM2.5 and PM10), ozone, nitrogen dioxide, sulfur dioxide and carbon monoxide. WHO. https://www.who.int/publications/i/item/9789240034228

European Environment Agency. (2024). Europe’s air quality status 2024 (Briefing No. 06/2024). Publications Office of the European Union. https://doi.org/10.2800/5970

World Health Organization. (2026). Air quality indexes: Key considerations and roadmaps for best practices. WHO. https://www.who.int/publications/i/item/9789289062701

U.S. Environmental Protection Agency. (2026). Air sensor performance targets and testing protocols. EPA. https://www.epa.gov/air-sensor-toolbox/air-sensor-performance-targets-and-testing-protocols

Gbangou, T., Colette, A., Soares, J., & González Ortiz, A. (2023). ETC HE report 2023/8: Long-term trends of air pollutants at European and national level 2005–2021. European Topic Centre on Human Health and the Environment. https://www.eionet.europa.eu/etcs/etc-he/products/etc-he-products/etc-he-reports/etc-he-report-2023-8-long-term-trends-of-air-pollutants-at-european-and-national-level-2005-2021

Ferrer-Cid, P., Paredes Ahumada, J. A., Allka, X., Guerrero Zapata, M., Barceló Ordinas, J. M., & García Vidal, J. (2024). A data-driven framework for air quality sensor networks. IEEE Internet of Things Magazine, 7(1), 128–134. https://doi.org/10.1109/IOTM.001.2300112

Hadeed, S. J., O’Rourke, M. K., Burgess, J. L., Harris, R. B., & Canales, R. A. (2020). Imputation methods for addressing missing data in short-term monitoring of air pollutants. Science of the Total Environment, 730, Article 139140. https://doi.org/10.1016/j.scitotenv.2020.139140

Hua, V., Nguyen, T., Dao, M.-S., Nguyen, H. D., & Nguyen, B. T. (2024). The impact of data imputation on air quality prediction problem. PLoS ONE, 19(9), Article e0306303. https://doi.org/10.1371/journal.pone.0306303

Decorte, T., Mortier, S., Lembrechts, J. J., Meysman, F. J. R., Latré, S., Mannens, E., & Verdonck, T. (2024). Missing value imputation of wireless sensor data for environmental monitoring. Sensors, 24(8), Article 2416. https://doi.org/10.3390/s24082416

Wang, Y., Liu, K., He, Y., Fu, Q., Luo, W., Li, W., Liu, X., Wang, P., & Xiao, S. (2023). Research on missing value imputation to improve the validity of air quality data evaluation on the Qinghai-Tibetan Plateau. Atmosphere, 14(12), Article 1821. https://doi.org/10.3390/atmos14121821

Lee, J., Berkelhammer, M., O’Brien, J., McNicol, G., Vincent, A. E. S., Grover, M., Packman, A. I., Kaludi, B., Cho, A., & Gonzalez-Meler, M. (2026). Imputation of urban environmental sensor data using gated attention bidirectional long short-term memory (GA-BiLSTM): Methods, performance, and implications. Environmental Monitoring and Assessment, 198, Article 262. https://doi.org/10.1007/s10661-026-15112-8

Wei, X., Meng, H., Shao, L., Fu, D., Ma, L., & Zhang, D. (2025). A decomposition-based imputation algorithm for long consecutive missing atmospheric pollution data and its application. Journal of Computational Science, 92, Article 102697. https://doi.org/10.1016/j.jocs.2025.102697

Khotimah, B. K., Miswanto, & Suprajitno, H. (2019). Modeling naïve Bayes imputation classification for missing data. IOP Conference Series: Earth and Environmental Science, 243(1), Article 012111. https://doi.org/10.1088/1755-1315/243/1/012111

Naive Bayes classifier with Scikit-learn tutorial. (n.d.). DataCamp. https://www.datacamp.com/tutorial/naive-bayes-scikit-learn

Shevchenko, D. V., & Holub, B. L. (2025). Monitorynh yakosti povitria v realnomu chasi [Air quality monitoring in real time]. Matematychni mashyny i systemy, (1), 103–112.

Trysnyuk, T., Trysnyuk, V., Okhariev, V., & Shumeiko, A. (2018). Cartographic model of Dniester River basin probable flooding. Series D: Geology and Environmental Engineering, 32(1), 51–55. https://doi.org/10.37193/SBSD.2018.1.07

Zaitsev, S., Vasylenko, V., Trysnyuk, V., & Trysnyuk, T. (2023). Adaptive method for assessing information reliability under uncertainty for 5G and IoT systems. In Proceedings of the 3rd International Workshop on Information Technologies: Theoretical and Applied Problems (CEUR Workshop Proceedings). CEUR-WS. https://ceur-ws.org/Vol-3628/paper2.pdf

Trofymchuk, O. M., Trysnyuk, V. M., Anpilova, E. S., Butenko, O. S., Vyshnyakov, V. Yu., Zagorodnya, S. A., Klymenko, V. I., Krasovska, I. G., Kreta, D. L., Myrontsov, M. L., Okharev, V. O., Popova, M. A., Radchuk, I. V., Trysnyuk, T. V., Shevyakina, N. A., & Shumeiko, V. O. (2022). Geoinformation studies of aquatic ecosystems of Ukraine: Monitoring and forecasting. Suprun V. P. Publishers.

Published

2026-06-18

How to Cite

Shevchenko, D., Holub, B., & Trysnyuk, T. (2026). Restoration of missing environmental data in an air quality monitoring system based on a naive Bayes classifier. Environmental Safety and Natural Resources, 58(2), 262–273. https://doi.org/10.32347/2411-4049.2026.2.262-273

Issue

Section

Information technology and mathematical modeling