Transformation and Normalisation of Timber Diameters with Machine Learning and Python

Kato Samuel Namuene *

Department of Forestry and Wildlife, Faculty of Agriculture and Veterinary Medicine, University of Buea, PMB63 Buea, Cameroon.

Jorelle Nguiffo Lemofouet

University of Natural Resources and Life Sciences (BOKU), Vienna, Austria.

*Author to whom correspondence should be addressed.


Abstract

Diameter at breast height (DBH) is a central variable in forest inventory, timber volume estimation, biomass assessment and compliance evaluation under minimum cutting diameter regulations. This study examined the distributional properties of harvested tropical timber diameters from Forest Management Units in the South West Region of Cameroon using a Python-based analytical workflow. The dataset comprised 373 harvested tree records representing 24 commercial timber species. Descriptive statistics showed a mean diameter of 87.97 cm, standard deviation of 30.19 cm, median of 68.50 cm, minimum of 44.00 cm and maximum of 160.50 cm. The distribution was positively skewed and strongly platykurtic, with visual evidence of bimodality. Formal normality assessment using Shapiro-Wilk, Kolmogorov-Smirnov, Anderson-Darling and D'Agostino-Pearson tests rejected normality for the raw diameter data. Logarithmic, square-root, Box-Cox, Yeo-Johnson and square transformations reduced skewness to different degrees but did not remove the bimodal structure or achieve normality. Five normalisation approaches, including Min-Max scaling, Z-score standardisation, Robust scaling and PowerTransformer variants, rescaled the data but preserved the underlying distributional pattern. Principal Component Analysis showed that the normalised features were highly redundant, with the first component explaining 99.23% of the variance. K-Means clustering separated the diameter data into three size-related groups, whereas Isolation Forest identified 18 anomalous observations, representing 4.8% of the dataset. The findings indicate that multi-species harvested timber diameter data may violate normality assumptions and should be analysed using distribution-aware methods. The workflow provides a reproducible statistical approach for screening diameter distributions in tropical forest management datasets.
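The screening steps summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code and does not use the study data. The bimodal sample is synthetic, and the function names (`make_synthetic_dbh`, `normality_pvalues`, `transform_candidates`, `size_groups_and_anomalies`), the mixture parameters, the random seed and the Isolation Forest `contamination` value of 0.05 are all assumptions made for illustration. The Anderson-Darling test, PCA and the scaling variants are omitted for brevity.

```python
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import PowerTransformer


def make_synthetic_dbh(n_small=220, n_large=153, seed=0):
    """Hypothetical bimodal DBH sample in cm (373 values); NOT the study data."""
    rng = np.random.default_rng(seed)
    small = rng.normal(62.0, 8.0, n_small)    # assumed lower size mode
    large = rng.normal(120.0, 15.0, n_large)  # assumed upper size mode
    # Clip to the diameter range reported in the abstract (44.0 to 160.5 cm).
    return np.clip(np.concatenate([small, large]), 44.0, 160.5)


def normality_pvalues(x):
    """Return p-values from three of the normality tests named in the abstract."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)
    return {
        "shapiro_wilk": float(stats.shapiro(x)[1]),
        # KS against N(0, 1) after standardising; the p-value is only
        # approximate because mean and SD are estimated from the sample.
        "kolmogorov_smirnov": float(stats.kstest(z, "norm")[1]),
        "dagostino_pearson": float(stats.normaltest(x)[1]),
    }


def transform_candidates(x):
    """Apply the transformations listed in the abstract to positive diameters."""
    x = np.asarray(x, dtype=float)
    col = x.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
    return {
        "raw": x,
        "log": np.log(x),
        "sqrt": np.sqrt(x),
        "square": x ** 2,
        "box-cox": PowerTransformer(method="box-cox").fit_transform(col).ravel(),
        "yeo-johnson": PowerTransformer(method="yeo-johnson").fit_transform(col).ravel(),
    }


def size_groups_and_anomalies(x, n_clusters=3, contamination=0.05, seed=0):
    """Cluster diameters into size groups and flag anomalies (assumed settings)."""
    col = np.asarray(x, dtype=float).reshape(-1, 1)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(col)
    # IsolationForest labels anomalies as -1; convert to a boolean mask.
    model = IsolationForest(contamination=contamination, random_state=seed)
    flags = model.fit_predict(col) == -1
    return labels, flags


if __name__ == "__main__":
    dbh = make_synthetic_dbh()
    for name, values in transform_candidates(dbh).items():
        p = normality_pvalues(values)["shapiro_wilk"]
        print(f"{name:12s} skew={stats.skew(values):+.2f}  Shapiro-Wilk p={p:.3g}")
    labels, flags = size_groups_and_anomalies(dbh)
    print("cluster sizes:", np.bincount(labels), "flagged:", int(flags.sum()))
```

On a mixture like this one, monotonic transformations change skewness but cannot merge two separated modes, which is consistent with the abstract's observation that bimodality persisted after transformation. The number of observations flagged by Isolation Forest depends directly on the chosen `contamination` setting, so it should be reported alongside any anomaly count.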

Keywords: Forest management units, tropical timber, diameter distribution, normality testing, Box-Cox transformation, Yeo-Johnson transformation, K-means clustering, principal component analysis, isolation forest, scikit-learn, South West Cameroon, Congo Basin.


How to Cite

Namuene, Kato Samuel, and Jorelle Nguiffo Lemofouet. 2026. “Transformation and Normalisation of Timber Diameters With Machine Learning and Python”. BIONATURE 46 (2):106-31. https://doi.org/10.56557/bn/2026/v46i22130.
