Machine Learning of Subsurface Geological Models from Sparse Site-specific Data and Prior Geological Knowledge for Underground Digital Twin


Student thesis: Doctoral Thesis




Awarding Institution
Award date: 1 Mar 2022


The concept of the smart city originated from a campaign launched by IBM in 2008 and aims to run cities efficiently with the assistance of information and communications technology. Over the past decade, major cities around the world, e.g., Copenhagen, Hong Kong and Hangzhou, have published smart city blueprints aimed at improving the efficiency of city operations. An important step in developing a smart city is the establishment of a digital twin, which is a digital replica, or virtual representation, of two-dimensional (2D) or three-dimensional (3D) physical entities in the real world. An accurate digital twin can help streamline the engineering design, construction, and subsequent performance monitoring and maintenance of infrastructure. However, 2D and 3D modelling of subsurface stratigraphy for an underground digital twin is challenging because site-specific measurements are usually sparse and efficient 2D and 3D spatial interpolation methods for sparse data are lacking.

To address this challenge, a Bayesian supervised learning method is first proposed to interpolate complex subsurface stratigraphic boundaries from sparse site-specific measurements. All valuable prior knowledge of local geology is concisely represented in an ensemble training image, which can be a simple geological profile borrowed from nearby sites with similar geological settings. Soil/rock types at unsampled locations are interpolated by combining the prior geological knowledge reflected in a single training image with site-specific measurements in a supervised learning manner. The proposed method is purely data-driven and does not require specification of any governing parametric function. Subsequently, a smart sampling strategy is proposed for delineation of multi-layered slope stratigraphy and planning of geotechnical boreholes. The data-driven framework can identify the locations of largest interpolation uncertainty, and this uncertainty gradually decreases as the number of boreholes increases. More importantly, the optimal number and locations of boreholes required for slope stability analysis are automatically determined by the proposed method. To further improve the computational efficiency of spatial interpolation, a novel iterative convolution eXtreme Gradient Boosting model (IC-XGBoost) is developed to delineate 2D subsurface geological cross-sections. The algorithm integrates training images with the framework of a convolutional neural network (CNN) and uses readily available image processing tools to adaptively extract stratigraphic connectivity at varied scales from a single training image. IC-XGBoost is also extended to build a 3D subsurface geological model from limited site-specific boreholes and 2D training images, which reflect prior quasi-three-dimensional geological knowledge. The proposed 3D geological modelling method can provide the best estimate of soil/rock types at unsampled locations with quantified uncertainty.
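As an illustration of how locations of largest interpolation uncertainty might be identified, the following is a minimal sketch and not the thesis's actual algorithm. It assumes that a set of equally likely stratigraphic realizations is already available as an integer array of class labels, measures uncertainty by per-cell Shannon entropy, and proposes the horizontal position with the largest summed entropy as the next borehole. The function names `cell_entropy` and `next_borehole_column`, the entropy measure, and the column-sum selection rule are all assumptions made for illustration.

```python
import numpy as np


def cell_entropy(realizations, n_classes):
    """Shannon entropy (in nats) of the soil/rock class at each grid cell.

    realizations: integer array of shape (n_realizations, n_depth, n_x)
    holding class labels 0 .. n_classes - 1 (hypothetical input format).
    """
    realizations = np.asarray(realizations)
    entropy = np.zeros(realizations.shape[1:], dtype=float)
    for k in range(n_classes):
        # Fraction of realizations that assign class k to each cell.
        p = (realizations == k).mean(axis=0)
        nonzero = p > 0
        entropy[nonzero] -= p[nonzero] * np.log(p[nonzero])
    return entropy


def next_borehole_column(realizations, n_classes, sampled_columns=()):
    """Index of the unsampled column with the largest total entropy."""
    column_score = cell_entropy(realizations, n_classes).sum(axis=0)
    # Columns that already have a borehole are excluded from the search.
    sampled = np.asarray(list(sampled_columns), dtype=int)
    column_score[sampled] = -np.inf
    return int(np.argmax(column_score))
```

In such a scheme, the realizations would be regenerated after each new borehole is added, so the entropy map shrinks as data accumulate; a stopping rule (for example, a threshold on the maximum column score) would be a further assumption.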
Practical application of training image-based algorithms requires a suitable training image. In this study, a data-driven method based on edge orientation detection is proposed for selection of the optimal training image. It is demonstrated that edge orientation successfully differentiates soil/rock stratigraphic patterns between different training images, and that the derived edge orientation distribution can be used as a quantitative indicator for selection of the optimal training image.
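The idea of an edge orientation distribution can be sketched as follows. This is an illustrative approximation, not the procedure used in the thesis: it assumes a training image stored as a 2D array of class labels, uses finite-difference gradients from NumPy to locate stratigraphic boundaries, and compares images by half the L1 distance between their normalized orientation histograms. The function names, the bin count, and the distance measure are assumptions.

```python
import numpy as np


def edge_orientation_histogram(image, n_bins=18):
    """Normalized histogram of boundary orientations in [0, pi).

    image: 2D array of soil/rock class labels (rows = depth, columns = x).
    Counts are unweighted, so the numeric coding of the labels does not
    affect how much each boundary cell contributes.
    """
    image = np.asarray(image, dtype=float)
    gy, gx = np.gradient(image)  # derivatives along rows and along columns
    on_edge = np.hypot(gx, gy) > 0
    # The gradient points across a boundary; the boundary itself runs
    # perpendicular to it, hence the pi/2 rotation.
    angle = (np.arctan2(gy[on_edge], gx[on_edge]) + np.pi / 2) % np.pi
    hist, _ = np.histogram(angle, bins=n_bins, range=(0.0, np.pi))
    total = hist.sum()
    return hist / total if total > 0 else hist.astype(float)


def histogram_distance(h1, h2):
    """Half the L1 distance between two normalized histograms (0 to 1)."""
    return 0.5 * np.abs(np.asarray(h1) - np.asarray(h2)).sum()
```

Under these assumptions, a candidate training image whose histogram lies closest to that of a reference pattern (for instance, one inferred from site data) would be preferred. Near junctions of three or more classes the finite-difference gradient direction depends on the label coding, so a dedicated edge detector may be more robust there.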

The training image-based spatial interpolation methods developed in this study are applied to a recent reclamation project in Hong Kong. It is demonstrated that the developed 3D geological modelling methods can efficiently and accurately delineate stratigraphic variations, particularly the spatial distribution of interbedded fine-grained materials, with quantified uncertainty. More importantly, a unified framework accounting for both stratigraphic uncertainty and spatial variability of soil properties is proposed for assessment of reclamation-induced consolidation settlement. The framework effectively generates multiple realizations of geological cross-sections and random field samples of geotechnical properties from limited measurements, and it offers valuable insights into the spatial distribution of the estimated total primary consolidation settlement curves and angular distortion. The 2D and 3D stochastic simulation methods provide an effective visualization tool for underground planning and operation in a smart city system.