Wildlife Monitoring with UAVs – Artificial Intelligence for Automated Detection of Infrared Signatures
Published in: 39th Scientific-Technical Annual Meeting of the DGPF in Vienna – Publications of the DGPF, Volume 28, 2019
Adrian F. Meyer, Natalie Lack, Denis Jordan
All authors: University of Applied Sciences Northwestern Switzerland, Institute of Geomatics, Hofackerstr. 30, CH-4132 Muttenz
The detection of wild animals is a central monitoring instrument in ecology, hunting, forestry and agriculture. Previous methods are complex, often rely only on indirect evidence, and therefore frequently provide only rough population estimates. The remote-sensing evaluation of UAV surveys over the southern Black Forest and northwestern Switzerland carried out in this work showed that thermal imaging data in particular are suitable for automating wild animal detection. For this purpose, a modern artificial-intelligence method (Faster R-CNN) was developed that learns characteristic features of labeled animal signatures through training. For some animal species (deer, goat, European bison, grazing livestock), extremely robust detection results were achieved in the subsequent application (inferencing). The efficient implementation of the prototype allows real-time analysis of live video feeds under field conditions. With a detection rate of 92.8% per animal, or 88.6% for classification by species, it was shown that the new technology has enormous potential for innovation in the future of wildlife monitoring.
For application areas such as population management, fawn rescue and game damage prevention in ecology, hunting, forestry and agriculture, it is crucial to survey wild animal populations as accurately as possible. Currently, mostly conventional monitoring methods are used, each of which has significant disadvantages (Silveira et al., 2003): counting campaigns with visual confirmation (spotlight searches along forest roads) are enormously labor-intensive; camera trap analyses cover only a small part of the landscape; hunting and wildlife statistics are subject to a strong bias; tracking transmitters are very accurate, but also invasive and complex to deploy.
The Institute of Geomatics (FHNW) has been cooperating since January 2018 with the Wildlife Foundation of the Aargau Hunting Association (Stiftung Wildtiere) to develop a method for wild animal detection using UAVs (Unmanned Aerial Vehicles). The study analyzes to what extent automated remote sensing offers advantages over conventional monitoring by saving time and human resources and by making surveys more accurate and complete (Gonzalez et al., 2016). Central questions addressed by this study are the choice of sensors and carrier systems, the general visibility of animal signatures in infrared aerial images (e.g. robustness against shadows in mixed forest), and the design of a high-performance algorithm for the automated detection and classification of wildlife individuals. One result of this analysis is a prototype designed to enable automated animal detection on aerial image data.
2.1 Data collection
In the spring of 2018, 27 aerial surveys were conducted over seven natural game enclosures with native species in northwestern Switzerland and the southern Black Forest. For each enclosure, approximately 500 RGB images, 500 NIR multispectral images and 5000 TIR thermal images (radiometric thermograms) were generated using the senseFly Albris multicopter or the fixed-wing UAV senseFly eBee to facilitate a technology comparison (see Fig. 1). The recording time (February/March) was chosen so that the heat contrast between animal body and the mostly wooded environment would be as high as possible. At the same time, the foliage-free vegetation was intended to minimize shading.
Fig. 1: Left: senseFly aircraft used: "eBee" (top) and "Albris" (bottom). Right: Typical trajectory of the eBee (blue) over an animal park (green) with the trigger positions for aerial photos (white). (Visualizations: Gillins et al., 2018; Google 2018; senseFly 2018)
With the fixed wing, large areas can easily be covered using interchangeable sensors (RGB, NIR, TIR), including a high-resolution thermal camera (thermoMAP, 640x512 px, max. 22 ha at 15 cm/px GSD and 100 m AGL). Although the multicopter can fly much more flexibly and at lower altitude thanks to its hovering capability, its permanently installed thermal camera has a much lower resolution (80x60 px). The loud rotor noise at low trajectories also interferes much more strongly with animal behavior than the fixed wing does.
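For orientation, the ground coverage of a single frame implied by these figures can be reproduced from sensor resolution and GSD alone. The following is a minimal sketch; the helper name `footprint_ha` is our own and not part of any senseFly tooling:

```python
def footprint_ha(width_px: int, height_px: int, gsd_m: float) -> float:
    """Ground footprint of a single frame in hectares.

    width_px, height_px: sensor resolution in pixels.
    gsd_m: ground sampling distance in metres per pixel.
    """
    return width_px * gsd_m * height_px * gsd_m / 10_000.0

# A single 640x512 px thermogram at the quoted 15 cm/px GSD covers
# 96 m x 76.8 m, i.e. roughly 0.74 ha per frame:
frame_area = footprint_ha(640, 512, 0.15)
```

The quoted maximum of 22 ha per flight thus corresponds to roughly 30 such frames of net coverage, before accounting for the overlap a mapping mission requires.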
The very high-resolution RGB and NIR images (~3 cm/px GSD) are well suited for orthophotomosaic mapping, but often lack sufficient contrast for visually recognizing animal signatures under foliage-free vegetation. Later in the study, this was verified by terrestrial hyperspectral reference measurements (λ: 350-1000 nm) on forest soil, vegetation and animal carcasses.
The thermograms, on the other hand, show high-contrast signatures of individual wild animals (Fig. 2). At the same time, the images are hardly suitable for photogrammetric bundle block adjustment, as the animals usually move too much between two exposures. This prevents sufficient overlap consistency in the relevant image areas, so that processed TIR orthophotomosaics of contiguous habitats often contain no visible signatures. For automated analysis, the thermograms were therefore either processed directly as non-oriented raw data or individually orthorectified by projection onto the DSM.
3.1 Shape of the Thermal Animal Signatures
Visible changes in the appearance of the signatures were first examined systematically by varying the recording parameters. A shallower recording perspective supports animal identification by a human observer (Fig. 2, left): features such as head-torso ratio or extremities are more prominent. Separating individuals from one another, however, is aided by a steeper perspective.
Even in mixed forest that is as foliage-free as possible, dense branches may reduce the contrast of a signature due to convective heat distribution and shielding. However, the shape, extent and basic visibility of the signatures are largely retained (Fig. 2, right).
Fig. 2: Thermograms with the signatures of a herd of fallow deer (six animals; blue 4 °C, red 10 °C). Left: comparison of signatures from six different angles. Middle/right: signatures next to and below a foliage-free ash tree in comparison.
3.2 Strategies for Automated Signature Detection
Several strategies for automated signature detection were implemented iteratively and evaluated for classification accuracy and practical applicability. The classic remote-sensing approach of classifying thermograms with object-based image analysis, e.g. in ERDAS IMAGINE Objective, was rejected: due to the variety of the signatures, this method could not find a feature-describing set of variables that reached a detection precision above 41%. Convolutional Neural Networks (CNNs), on the other hand, have demonstrated exceptional robustness in image classification through automatic feature extraction in recent years (Szegedy et al., 2016). Sections 3.3 and 3.4 describe two CNN approaches that achieve precise animal detection in different ways.
3.3 Raster Segment Classification with dichotomous CNN
A dichotomous ("two-way decision") CNN with a depth of 7 neuron layers (Fig. 3, center) was built with Keras and TensorFlow under Python 3.6. It classifies raster segments of orthorectified thermograms by inferencing into the classes "animal" and "non-animal". The input layer is a 64x64 px matrix, which corresponds to geoprocessed 5x5 m segments at the maximum possible GSD (Fig. 3, left). After about 3 hours of training on desktop hardware, a high classification accuracy of approx. 90% can be achieved for a specific aerial survey (Fig. 3, right). The pre-processing of the thermal data (3D projection onto the DSM, orthophoto generation, geoprocessing), however, is very time-consuming and computationally intensive, and is therefore impractical to automate under field conditions. In time-critical applications such as fawn rescue, classification results should ideally be available during the flight. Inferencing on live raw data would not be subject to these limitations; however, with a raw data resolution of 640x512 px and a 64x64 px input layer, this approach would provide the UAV operator with only a coarse 10x8 detection grid.
Fig. 3: Left: approx. 10'000 5x5 m footprints as input tiles, generated from 45 orthorectified thermograms. Middle: scheme of the dichotomous neural network; neuron layers highlighted in purple. Right: classification result – 71 tiles green: "animal"; remainder red: "non-animal".
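The raster-segment approach amounts to cutting each thermogram into fixed-size tiles and classifying them independently. The following is an illustrative sketch of the tiling step only (the helper `tile_grid` is hypothetical; the actual pre-processing in the study additionally involved DSM projection and orthorectification):

```python
def tile_grid(width_px: int, height_px: int, tile_px: int = 64):
    """Split an image into non-overlapping square tiles.

    Returns the grid dimensions and a list of tile windows as
    (x_offset, y_offset, width, height) tuples; border pixels that do
    not fill a whole tile are discarded.
    """
    cols = width_px // tile_px
    rows = height_px // tile_px
    tiles = [(c * tile_px, r * tile_px, tile_px, tile_px)
             for r in range(rows) for c in range(cols)]
    return cols, rows, tiles

# Applied to the 640x512 px raw thermograms, the 64x64 px input layer
# yields exactly the coarse 10x8 detection grid mentioned above:
cols, rows, tiles = tile_grid(640, 512)
```

Each window would then be cropped, normalized and passed through the dichotomous CNN, so the spatial resolution of the result is fixed by the grid, not by the sensor.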
3.4 Object recognition by means of R-CNN
For live raw data interpretation, Faster Region-based Convolutional Neural Networks (Faster R-CNN) are better suited. Models of this class can localize and classify objects on higher-resolution full images by identifying regions of interest (RoI) through iterative region proposals. In addition, several classes can be trained and recognized at the same time.
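Region-proposal detectors of this kind typically reduce overlapping candidate boxes to one detection per object using greedy non-maximum suppression (NMS). The paper does not show this step explicitly; the following is an illustrative pure-Python sketch of the standard technique:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep the best-scoring box, drop candidates that
    overlap an already-kept box by more than `thresh` IoU.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < thresh for j in keep):
            keep.append(i)
    return keep

# Two near-duplicate proposals on one animal plus a distant second animal:
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
kept = nms(boxes, [0.9, 0.8, 0.7])   # the duplicate proposal is suppressed
```

In practice this is handled inside the detection framework; the sketch only makes the RoI-filtering idea concrete.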
An Inception v2 network is used (see Fig. 4), which mimics the structure of the pyramidal cells in the visual cortex of vertebrates with a depth of 42 neuron layers. Pre-training on approx. 100'000 everyday images (the COCO dataset) allows the edge weights between the neuron layers to be adapted faster and more efficiently to new targets for placing the bounding boxes during the task-specific training. Even with limited hardware, the model is still considered fast and precise (Szegedy et al., 2016).
The implementation was carried out using the TensorFlow Object Detection API, supported by the Nvidia CUDA/cuDNN deep learning framework to parallelize across the GPU shader cores. For the training, a dataset of approx. 600 thermal images with approx. 8'000 animal signatures was manually annotated by drawing approx. 1'800 bounding boxes. After about 12 hours of training (about 100'000 steps), the approx. 50 MB frozen inference graph was exported. A high-performance Python-based prototype applies this trained model to new thermal data via inferencing.
Fig. 4: Schematic structure of the constructed R-CNN (subschema “Inception v2” from Alemi, 2016)
In comparison, object recognition using Faster R-CNN proved to be the superior approach because it can operate on raw data and train multiple classes simultaneously. This architecture was therefore used in the prototype implementation.
If the network is trained only for the general detection of animals (Fig. 5-A), inferencing achieves a very high detection rate of 92.8% per animal. General detection is also relatively robust against quality losses in the input data (e.g. motion blur and shadows), as an animal signature is usually recognized correctly in at least one frame, which suffices for detection.
Fig. 5: Evaluation of the detection results on video feeds simulating a live overflight. Left: table of count statistics (* the fallow deer class – Damwild – contains fallow deer, sika deer and axis deer). The proportion of false-negative detections is not listed, as it is the complement of the detection rate. Right: examples of bounding boxes computed by inference.
Precision is slightly lower in the species classification (Fig. 5-BCD), but far exceeds the success rates of conventional detection methods for fallow deer, red deer and goats. European bison and Scottish Highland cattle also achieve very high precision (detectability >90% per animal, >80% per frame with n≈150 trained bounding boxes). However, these values are not directly comparable with those of wild animals in semi-natural mixed-forest enclosures, since for these classes only data from open pasture landscape was available for both training and inferencing (see pasture in Fig. 5-A, mixed forest in Fig. 5-BCD). The classes wild boar, human and smaller mammals did not achieve sufficient classification precision due to the low number of training samples (n < 60).
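The distinction between per-animal and per-frame detectability can be made concrete by simple counting: an animal counts as detected if its signature is found in at least one frame of the overflight, whereas the per-frame rate counts every missed frame. A hypothetical sketch (not the evaluation code used in the study):

```python
def per_animal_rate(tracks):
    """tracks: one list of per-frame detection flags per animal.
    An animal counts as detected if it is hit in at least one frame."""
    return sum(any(t) for t in tracks) / len(tracks)

def per_frame_rate(tracks):
    """Fraction of all (animal, frame) pairs that were detected."""
    hits = sum(sum(t) for t in tracks)
    total = sum(len(t) for t in tracks)
    return hits / total

# Three animals observed over four frames; the second animal is only
# picked up in a single frame, but still counts as detected overall:
tracks = [[True, True, False, True],
          [False, False, True, False],
          [True, False, True, True]]
animal_rate = per_animal_rate(tracks)   # all three animals found
frame_rate = per_frame_rate(tracks)     # only 7 of 12 frame hits
```

This is why the per-animal figures quoted above (>90%) exceed the per-frame figures (>80%): a single correct frame redeems an otherwise missed animal.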
5 Discussion and Outlook
Combining UAV-based infrared thermography with state-of-the-art deep learning techniques shows potential to increase efficiency and quality in population estimation. The current standard – a laborious spotlight-search process in which many kilometers of forest roads are traversed to map only a small, unknown proportion of the animals – could be complemented by modern pattern recognition methods. The implemented prototype achieves an inferencing performance of about 8 FPS on mobile hardware (a 2016 consumer-grade laptop), making the system efficient enough to be applied to a live video feed during the flight. These promising results show that replacing classical detection methods in certain areas is conceivable in the future.
Another important application is fawn rescue. Fawns hiding in meadows often fall victim to combine harvesters due to their instinct to crouch. Where thermal UAVs are already used today, the process is still largely manual, and training pilots to recognize signatures is complex. The presented software automation can make UAV-based fawn rescue much more accessible in the future.
In game damage prevention, the focus is usually on sounders of wild boar, which retreat into arable crops and are invisible from the outside. With this technology, the animals can be located before major damage occurs. For operational use, both game damage prevention and fawn rescue require additional training data. Once these have been collected and annotated, the existing deep learning network can be further trained by fine-tuning, building on the knowledge already acquired.
Due to rapid progress in UAV technology, it is quite conceivable that smaller multicopters will soon fly more quietly and thus disturb animal behavior less. With lower altitudes and better sensors, these could generate even better thermograms, which in turn would facilitate signature classification by the neural network. It would also be conceivable to identify further individual characteristics of the species already analyzed, such as age and sex, or to extend the analysis to smaller species such as badgers, hares and foxes, as well as rare species such as lynxes and wolves.
Alemi, A., 2016: Improving Inception and Image Classification in Tensorflow. Google AI Blog.
Google, 2018: Google Earth Pro 7.3.1, Aerial Texture: GeoBasis-DE/BKG, 2017-08-07.
Gonzalez, L., Montes, G., Puig, E., Johnson, S., Mengersen, K., Gaston, K., 2016: Unmanned Aerial Vehicles (UAVs) and Artificial Intelligence Revolutionizing Wildlife Monitoring and Conservation. Sensors 16 (1), Article 97.
Gillins, D., Parrish, C., Gillins, M., Simpson, C., 2018: Eyes in the Sky: Bridge Inspections with Unmanned Aerial Vehicles. Oregon Dept. of Transportation, SPR 787 Final Report.
SenseFly, 2018: https://www.sensefly.com/drone/bee-mapping-drone/ (6.5.2018).
Silveira, L., Jacomo, A., Diniz-Filho, J., 2003: Camera trap, line transect census and track surveys: a comparative evaluation. Biological Conservation 114 (3), 351-355.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z., 2016: Rethinking the inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2016, 2818-2826.