Orthophoto segmentation for outcrop detection in the boreal forest
This experiment borrows an artificial neural network architecture originally developed in the field of biomedical imaging and applies it to the processing of orthophotos (orthorectified Earth observation images captured by drones, aircraft or satellites). In geology, and in mineral exploration specifically, orthophotos are commonly used to guide the logistical planning of field work. In this post I show that the U-Net architecture can be trained to automatically map waterbodies and detect areas where the bedrock is outcropping using high-resolution orthophotos. This tool is fast and provides new insights for the preparation of geological mapping traverses and rock/soil sampling campaigns, and for the logistics and interpretation of geophysical surveys. I presented this work in French during the AI tools session at Québec Mines+Énergie 2019 with help from my colleagues Véronique Bouzaglou and Lindsay Hall. The code and documentation can be found on GitHub.
1. Introduction
Outcrops are places where the bedrock is exposed at the Earth’s surface. They act as control points for geological mapping, geophysical survey interpretation and sampling for geochemical analyses. In the boreal forest of the Canadian Shield, the availability of outcrops is relatively low (estimated at 2% of the total surface area in this work). It is therefore important to quickly identify outcropping areas before sending teams into the field, to maximize the time spent studying bedrock while minimizing the difficulty of traverses in the boreal forest.
Orthophoto analysis is a fundamental part of field work planning in mineral exploration, because it provides a high-resolution bird’s eye view of the area of interest. Figure 1 shows part of a larger orthophoto captured over the boreal forest near Red Lake (ON). In this georeferenced image, each pixel corresponds to 1 m² of the Earth’s surface. This orthophoto was captured by a drone-mounted camera equipped with four sensors: red (R), green (G), blue (B) and near infrared (NIR). It would be valuable to develop a tool that can guide geological field work by automatically highlighting where the bedrock appears exposed (move the slider in Figure 1 to reveal the outcrops).
Visual (human-based) inspection and labelling of the images can prove to be a tedious and time-consuming process, especially when the area of interest is large. Methods exist to partially automate this process. In related work, the analysis of remote sensing images often relies on the computation of spectral band ratios to help highlight certain features of the Earth’s surface. For example, the normalized difference vegetation index (NDVI) is commonly used to assess whether areas contain live vegetation. Other common examples include water and bare land indices. Detection methods based on these indices have at least two major drawbacks. First, the user needs to set a threshold to segment an image based on an index value. Although some rules of thumb and heuristics have been proposed in the remote sensing literature, determining these thresholds can involve a lot of trial and error. The ideal threshold value also depends on the area being studied, and could vary across large images. Second, the band ratio methods only consider pixels on an individual basis. Consequently, they do not capture the context and texture of the image like convolutional neural networks do.
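The band ratio approach and its thresholding drawback can be sketched in a few lines of NumPy. The threshold of 0.3 below is an arbitrary illustration, not a recommended heuristic, which is precisely the problem with this family of methods:

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Normalized difference vegetation index, computed pixel by pixel."""
    nir = nir.astype(float)
    red = red.astype(float)
    return (nir - red) / (nir + red + 1e-9)  # epsilon avoids division by zero

def threshold_mask(index: np.ndarray, threshold: float) -> np.ndarray:
    """Binary segmentation by thresholding an index map (pixel-wise only:
    no spatial context or texture is taken into account)."""
    return index > threshold

# Toy 2x2 example: bright-NIR pixels (vegetation) vs. dark-NIR pixels
nir = np.array([[200.0, 190.0], [40.0, 35.0]])
red = np.array([[50.0, 60.0], [45.0, 50.0]])
veg = threshold_mask(ndvi(nir, red), 0.3)  # threshold chosen by trial and error
```

A different scene, or even a different corner of the same large image, may need a different threshold, whereas the neural network learns the decision rule from labelled examples.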
Convolutional neural networks, and particularly the encoder-decoder type, have proven to perform extremely well in image segmentation tasks. This is the case for the U-Net architecture, originally proposed by Ronneberger et al. (2015). The potential to fully automate the outcrop detection process is the main advantage of using a neural network over the band ratio technique or even a human-based visual delineation. In fact, while a human can easily outline the outcrops in a given orthophoto, labelling large areas can be time consuming. Moreover, the quality of the labelling tends to diminish with the number of consecutive hours worked. To give a sense of scale, for this work it took approximately 40 hours to label the training images, and an additional 16 hours to label the testing images.
In this post I will train the U-Net architecture with orthophotos from a relatively small (131 km²) area near Red Lake (ON) to automatically detect the location of bedrock outcrops and to delineate waterbodies in the boreal forest of Canada. I also answer the following three questions by further blind testing the model in the Red Lake area.
- What are the best spectral bands to use in the training images?
- What is the ideal tile size for the training images?
- Does the spatial resolution of the images affect the prediction results?
Finally, I demonstrate that the trained model can make fast predictions on a large (488 km²) unlabelled part of the boreal forest in the Baie-James area (QC), more than 1000 km away from the location of the training images.
2. Model training and validation
U-Net is an artificial neural network that encodes images into lower-dimensional representations and decodes those representations back into segmentation maps. The weights of this network can be optimized using gradient descent by computing a loss function between the predicted and true segmentation maps, given a large number of training examples. The PyTorch implementation of U-Net I used in this work can be found here. I used a cross-entropy loss and the Adam optimizer to train the network. The reader is referred to the original U-Net paper for more details.
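The training step described above can be sketched as follows. A real U-Net would replace the toy two-layer network below, but the input/output contract is the same: 4-band tiles in, per-pixel class logits out. This is an illustrative sketch, not the repository's actual code:

```python
import torch
import torch.nn as nn

# Toy stand-in for U-Net: any model mapping (N, C, H, W) images to
# (N, n_classes, H, W) per-pixel logits trains the same way.
model = nn.Sequential(
    nn.Conv2d(4, 16, kernel_size=3, padding=1),  # 4 input bands (RGB-NIR)
    nn.ReLU(),
    nn.Conv2d(16, 3, kernel_size=3, padding=1),  # 3 classes: outcrop/water/other
)

loss_fn = nn.CrossEntropyLoss()                   # pixel-wise cross-entropy
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One optimization step on a random batch of 128x128 tiles
images = torch.randn(8, 4, 128, 128)              # 8 tiles, 4 bands
labels = torch.randint(0, 3, (8, 128, 128))       # per-pixel class indices

logits = model(images)                            # (8, 3, 128, 128)
loss = loss_fn(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```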
My colleagues and I labelled the 131 km² Red Lake orthophoto in QGIS by drawing shapefiles delineating any visible bedrock outcrop. The freely available Ontario Hydro Network shapefiles were also used to delineate the waterbodies. This yielded three classes for U-Net to learn: (1) outcrops, (2) waterbodies and (3) forest/other.
Large geospatial rasters require some preprocessing. The Red Lake orthophoto was tiled into 6145 non-overlapping 128x128 training images, and an additional 1537 non-overlapping tiles were randomly set aside to validate the model. The model weights were not updated from the validation tiles, but the validation loss was computed to monitor how well the model could generalize to previously unseen examples. Using an initial learning rate of 1e-3, the network learned the segmentation task in approximately 20 training epochs (Figure 2). Past the 20-epoch mark, the training loss was still improving but the validation loss was not, indicating possible overfitting.
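The tiling and train/validation split can be sketched with NumPy. Discarding partial tiles at the image edges is one possible choice among several (padding is another); the 20% hold-out fraction below is illustrative:

```python
import numpy as np

def tile_raster(raster: np.ndarray, tile: int) -> np.ndarray:
    """Split a (H, W, C) raster into non-overlapping (tile, tile, C) patches,
    discarding any partial tiles at the right/bottom edges."""
    h = raster.shape[0] // tile * tile
    w = raster.shape[1] // tile * tile
    trimmed = raster[:h, :w]
    tiles = trimmed.reshape(h // tile, tile, w // tile, tile, -1)
    return tiles.transpose(0, 2, 1, 3, 4).reshape(-1, tile, tile, raster.shape[2])

rng = np.random.default_rng(0)
ortho = rng.random((1000, 1000, 4))        # synthetic 4-band "orthophoto"
tiles = tile_raster(ortho, 128)            # 7 x 7 = 49 full tiles

# Random train/validation split (weights are never updated on validation tiles)
idx = rng.permutation(len(tiles))
n_val = len(tiles) // 5                    # hold out ~20% for validation
val, train = tiles[idx[:n_val]], tiles[idx[n_val:]]
```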
The weights at the 20-epoch mark were used to test the model on a dedicated testing zone located immediately south of the main Red Lake orthophoto. The model’s outputs are membership probabilities for each class: each pixel is assigned a probability, between 0 and 1, of belonging to the outcrop, forest or waterbody class. The probability maps for the three classes were composited into a single RGB image and superimposed on the original image in QGIS. For visualization purposes, I put the outcrop probability in the R channel, forest in the G channel and waterbody in the B channel. An example of segmentation results from the testing area is given in Figure 3.
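Compositing the three probability maps into one RGB image can be sketched as follows (assuming the network outputs raw logits in the channel order outcrop, forest, waterbody):

```python
import numpy as np

def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax over the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def composite_rgb(logits: np.ndarray) -> np.ndarray:
    """Map per-pixel class probabilities to an 8-bit RGB image:
    R = outcrop, G = forest, B = waterbody."""
    probs = softmax(logits, axis=-1)      # (H, W, 3), each pixel sums to 1
    return (probs * 255).astype(np.uint8)

logits = np.random.default_rng(1).normal(size=(128, 128, 3))
rgb = composite_rgb(logits)               # ready to write to a raster band stack
```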
A second example is given in Figure 4 to give a sense of how large the orthophotos and segmentation maps actually are.
3. Further testing
Further testing was done with the baseline model to understand the influence of certain data characteristics on the capability of U-Net to learn the segmentation task. This can be quantified with the Jaccard Index, which is the Intersection over Union (IoU) of true and predicted labels (Figure 5). The following tests aim to answer the questions raised in the introduction.
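The per-class Jaccard index used throughout the following tests reduces to a few array operations:

```python
import numpy as np

def class_iou(y_true: np.ndarray, y_pred: np.ndarray, cls: int) -> float:
    """Jaccard index (intersection over union) for one class label."""
    t = (y_true == cls)
    p = (y_pred == cls)
    union = np.logical_or(t, p).sum()
    if union == 0:
        return 1.0  # class absent from both maps: define the score as perfect
    return float(np.logical_and(t, p).sum() / union)

# Toy 2x3 label maps with classes 0 (other) and 1 (outcrop)
true = np.array([[1, 1, 0], [0, 0, 0]])
pred = np.array([[1, 0, 0], [1, 0, 0]])
iou = class_iou(true, pred, cls=1)  # intersection 1 pixel, union 3 -> 1/3
```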
What are the best spectral bands to use in the training images?
Different sensors capture data in different spectral bands. Depending on the situation, the data could be a 1-band panchromatic image, a conventional 3-band RGB image, or a 4-band RGB-NIR multispectral image. Each spectral band is a different input variable, which can be more or less informative with respect to the target variable (here, the segmentation classes). With domain-specific knowledge, input variables can even be combined to generate new ones (e.g. NDVI). The user must then decide which spectral bands U-Net should learn from. In other words, what combination of bands is the best to detect outcrops? To answer this question, the model was trained and tested on several datasets consisting of various spectral band combinations. The class-specific IoU results were computed for each dataset and are presented in Figure 6.
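Building these band-combination datasets amounts to stacking the chosen channels along a new axis. The combination names below are illustrative, not the repository's actual API:

```python
import numpy as np

def stack_bands(r, g, b, nir, combo: str) -> np.ndarray:
    """Assemble a multi-channel training image from a named band combination."""
    ndvi = (nir - r) / (nir + r + 1e-9)   # vegetation index
    ndwi = (g - nir) / (g + nir + 1e-9)   # water index (McFeeters)
    gray = (r + g + b) / 3.0              # simple monochromatic channel
    channels = {
        "gray": [gray],
        "indices": [ndvi, ndwi],
        "rgb": [r, g, b],
        "rgb-nir": [r, g, b, nir],
        "all": [ndvi, ndwi, r, g, b, nir],
    }[combo]
    return np.stack(channels, axis=-1)

rng = np.random.default_rng(2)
r, g, b, nir = rng.random((4, 128, 128))  # synthetic bands in [0, 1)
x = stack_bands(r, g, b, nir, "all")      # 6-band training image
```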
It is evident from Figure 6 that using only spectral indices such as NDVI and NDWI does not allow the model to learn the segmentation task very well. The best Jaccard indices were obtained with the RGB-NIR, grayscale and RGB images, indicating that the model could be trained with or without the NIR band, and even on monochromatic images. This is good news, because not all drone-mounted cameras are equipped with a NIR sensor. Finally, combining all indices and bands together (NDVI-NDWI-RGB-NIR) in a 6-band image did not significantly improve the outcrop detection IoU.
What is the ideal tile size for the training images?
There is a trade-off between the size of the training tiles and the number of training examples available: the larger the tiles, the fewer non-overlapping tiles can be extracted for training. Several datasets were produced with tile sizes ranging from 8x8 to 1024x1024. The IoU scores obtained with this range of tile sizes are presented in Figure 7.
A maximum IoU plateau was observed for a range of tile sizes between 32x32 and 256x256. Below this range, the tiles fail to capture the texture in the image, and the various classes are not properly segmented. Above this range, there are not enough training examples for the model to learn the task.
Does the spatial resolution of the images affect the prediction results?
Drone-mounted cameras are a great tool for capturing high-resolution orthophotos. However, it is not always possible to fly them in certain areas, and it would be useful if satellite imagery could be used to make predictions instead. Many satellite companies offer spatial resolutions comparable to those of drone-mounted cameras, but it should be noted that the cost per km² increases rapidly with image resolution. To test the model on lower-resolution orthophotos, several testing datasets were produced by downsampling the original 1 m² resolution orthophotos. The IoU obtained with each dataset is shown in Figure 8.
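One simple way to produce these lower-resolution test sets is block averaging, sketched below (the actual downsampling method used for the experiment is not specified in the post, so take this as an assumption):

```python
import numpy as np

def block_average(band: np.ndarray, factor: int) -> np.ndarray:
    """Downsample a (H, W) band by averaging non-overlapping factor x factor
    blocks. H and W are assumed to be divisible by factor."""
    h, w = band.shape
    blocks = band.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

band = np.arange(16, dtype=float).reshape(4, 4)  # synthetic 1 m pixels
coarse = block_average(band, 2)                  # 2x2 output of 2 m pixels
```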
Unsurprisingly, the original images with 1 m² resolution yielded the best segmentation scores. Outcrops being relatively small targets, the IoU for this class decreases rapidly with decreasing spatial resolution. This confirms that high-resolution pictures are necessary to perform outcrop detection, and that predictions made from orthophotos with a pixel size of 5 m² or more will be inaccurate.
4. Application to a new area
The final model was trained in accordance with the previous test results using the Red Lake orthophoto RGB channels, a tiling size of 128x128 pixels and a spatial resolution of 1 m². After training and validation, the model weights were frozen and saved, and later used to generate segmentation maps of the Baie-James area (QC). This 488 km² area can be considered new to the model because it is located approximately 1000 km east of the original training area. Additionally, the Baie-James image was captured from space by the SPOT-6 satellite sensor instead of a drone-mounted camera, and it has a spatial resolution of 1.5 m² instead of 1 m². To validate the predictions in this new area, outcrop observations were compiled from the Système d’information géominière du Québec (SIGÉOM) and superimposed on the map. An example of close-up segmentation results in the Baie-James area is presented in Figure 9 and a larger-scale example is given in Figure 10.
Because the SIGÉOM outcrops are recorded as points (and not as outline polygons like the labels used to train the model), it is not possible to compute an IoU score. The SIGÉOM database also does not contain every outcrop that exists in the field, only those that have been studied by geologists, so a validation score like precision is not appropriate either. Instead, a recall score was used to establish that 21% of the known SIGÉOM outcrops were recovered.
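With point observations, recall reduces to counting how many known points land inside the predicted outcrop mask (the coordinates below are synthetic pixel indices, not real SIGÉOM data):

```python
import numpy as np

def point_recall(mask: np.ndarray, points: list) -> float:
    """Fraction of known outcrop points (row, col) falling inside the predicted
    outcrop mask. Precision is not computed because the point database does not
    record every outcrop that exists in the field."""
    hits = sum(bool(mask[r, c]) for r, c in points)
    return hits / len(points)

mask = np.zeros((100, 100), dtype=bool)
mask[10:20, 10:20] = True                    # one predicted outcrop patch
observations = [(12, 15), (50, 50), (11, 11), (80, 5), (90, 90)]
recall = point_recall(mask, observations)    # 2 of 5 points recovered -> 0.4
```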
Three factors can explain the relatively low recall score in the Baie-James area. First, some outcrops may have a size that is comparable to the spatial resolution of the image, making them impossible to detect. Second, several outcrops recorded in the SIGÉOM database could not be confirmed visually, either because they were hidden under trees or because of GPS recording errors. Third, there are significant differences between the sensors that captured the images (drone in Red Lake, satellite in Baie-James). Nevertheless, the results were judged satisfactory enough to generate heat maps of bedrock outcrop likelihood in preparation for summer field work.
5. Conclusion
By adapting U-Net to large geospatial images, I showed that it is possible to build a simple bedrock outcrop detection tool for geological applications. The outcrop detector is fast: an area of 488 km² was processed in under 5 minutes using a consumer-grade GPU. This tool is designed to help the exploration geologist with:
- Providing targets for geological/structural mapping.
- Minimizing traverses in the dense boreal forest.
- Logistic planning of ground geophysical surveys.
- Interpreting geophysical surveys with insights into the state of the overburden, bedrock and waterbodies.
- Planning rock, soil, or lake sediment sampling.
Neural networks become increasingly good at performing a specific task the more training examples they are given. To improve the outcrop detector while minimizing resources spent on further data labelling, an incremental learning strategy, where new predictions are QA/QC’d by an expert and fed back into the training loop, could be useful. It would also be interesting to add new classes to the model (e.g. swamps, powerline clearings, roads or access trails).
Charles L. Bérubé has a Ph.D. in Mineral Engineering from Polytechnique Montréal with a specialization in applied geophysics. He has collaborated on pan-Canadian research projects aiming to characterize the footprints of mineral systems including hydrothermal gold, porphyry copper and uranium deposits. Currently at GoldSpot Discoveries Corp.