1 May 2020 | Geospatial solutions

Can we automate the classification of camera trap data using machine learning?

Vishwas S. Chitale, Tanuja Shrestha, Mir Abdul Matin & Samuel Thomas

A barking deer captured on camera at the Knowledge Park in Godavari. Credit: ICIMOD


The short answer to the titular question is yes. Previous studies have used machine-learning models to classify images from camera traps and speed up the extraction of information that is useful in ecological analyses, including work on issues like human-wildlife conflict (Singh, Carpio, & Jukan, 2018; Tabak et al., 2019; Yousif, Yuan, Kays, & He, 2019). This raises the obvious question, “Why? Are humans not able to do this?” They are. However, machines will outperform us in the long run because of the sheer number of images that camera traps capture in long-term monitoring efforts.

Camera traps are generally programmed to capture images at a set interval or when something moves in front of the camera, so that they record the presence of animals. Let us assume that 1,000 camera traps are deployed at major biodiversity hotspots in a country. Can you imagine the volume of images generated in five years? A person can manually go through the first 1,000 images and identify the species and where and when they were recorded. But as the number of images grows, the scale of the task becomes enormous, and sorting images from even a single camera trap becomes challenging, let alone images from many. In the Serengeti, researchers have used machine-learning models to classify 48 animal species in 3.2 million images from the Snapshot Serengeti data set with considerable accuracy. Manual classification would have taken thousands of crowdsourced volunteers, and using machine-learning models saved more than 8.4 years of manual effort (Norouzzadeh et al., 2018).

Furthermore, Singh, Carpio, and Jukan (2018) scaled up the automated classification of camera trap data as a solution to animal-human cohabitation in conflict zones. The researchers used an image classification model, together with wireless sensor networks, to generate alert signals about animal crossings for passing vehicles and people. The image classification task also helped predict time-varying data traffic and contributed to the dynamic allocation of bandwidth (rate of data transfer) across WiFi access networks. The whole process was found to reduce the average time taken to process the data and generate an alert signal for vehicles and people.

Fig 1: Natural resources in the HKH region (Source: Wester et al. 2019)


The Hindu Kush Himalaya (HKH) is a gigantic data hub for ecological, physical, and hydrological data, among others. The region covers approximately 4.2 million km² and includes parts or all of four global biodiversity hotspots and 330 important bird and biodiversity areas. The resources and ecosystem services from the HKH indirectly benefit an estimated 35% of the world’s population. Some 1.9 billion people living in the hills and mountains and in the river valleys downstream depend on the HKH for water, food, and energy. While the region is known for its protected area network and rich biological diversity, it is also affected by human-wildlife conflict.

A study by Chettri and Gurung (2017) reported human-wildlife conflict in the Kanchenjunga Landscape, which is part of the Himalaya biodiversity hotspot and spans Nepal, Bhutan, and India. The study highlighted issues like human, livestock, and wildlife injury and death; destruction of crops and other infrastructure by animals; and degradation of important habitats. The major mammal species reported to be involved in crop depredation are barking deer, elephant, porcupine, rabbit, sambar deer, sloth bear, wild pig, black bear, civet, hare, jungle rat, macaques, and squirrel. Among birds, the Eurasian jay and white-throated laughingthrush were reported (Chettri and Gurung, 2017). To develop strategies for managing human-wildlife conflict, we need to understand the behavioural and movement patterns of these species.

Fig 2: White-throated laughingthrush (Source: www.clementfrancis.com)


How do machines identify animals?

The simplest explanation of machine learning is that it is the process of making machines capable of doing what humans have been doing – learning from data. In the past, we were seeking answers directly from machines or computers, and now, we are training machines to come up with a set of rules by giving them data and answers so that they can generalize answers from new data. Technically, this is conducted by training machine-learning models, which are designed to solve case-specific problems, such as image classification, object detection, natural language processing, and speech and voice recognition, among other things.

A survey of wildlife at the ICIMOD Knowledge Park in Godavari

ICIMOD established the Godavari Knowledge Park as a demonstration and training centre in 1993. The park is located around 15 km south of Kathmandu, Nepal, in the Pulchowki Watershed Area. It covers approximately 30 hectares with the Godavari Kunda Community Forest to its northeast and the Diyale Community Forest to its southwest. The elevation in the park ranges from 1,510 to 1,780 masl and the slope from 0 to 60 degrees. The climate is warm-temperate and subtropical with a mean annual temperature of 17.2°C. Most of the precipitation is received during the monsoon months with a mean annual rainfall of 2,000 mm. The vegetation in the Park is dominated by mixed deciduous and evergreen broadleaf species mostly naturally regenerated. Natural forests occur on steep slopes, shrub lands on mixed slopes, and shrubs and bushes on the valley floor (Karki & Udas, 2016).

Fig 3: Land use and land cover of the Godavari Landscape

Six camera traps were deployed at various locations in the park to document wildlife presence and movement. The animals captured so far are wild boar, barking deer, Himalayan or masked palm civet, large Indian civet, yellow-throated marten, rhesus macaque, black-naped hare, leopard cat, jungle cat, and common leopard. Of these, barking deer, wild boar, civet, and macaque were also reported in the Kanchenjunga Landscape study.

Therefore, against the backdrop of the aforementioned literature on machine learning and the human-wildlife conflict in the region, we decided to conduct a pilot study to automate the identification of animals from images captured at the Godavari site. Only images from the first round of camera trapping were used for this study. We built deep-learning image classification models, specifically convolutional neural networks (CNNs), with the Keras neural network library. The models classified two data sets drawn from the ten animal species captured on the cameras: the first with the categories ‘Wild Boar’ and ‘Deer’, and the second with ‘Wild Boar’ and ‘Others’. A CNN, to describe it simply, is a machine-learning algorithm that computers use to recognize patterns and features, especially in images. Based on these features, a model learns what the image shows and generates an output. In our case, the output is a label for an image: ‘Wild Boar’, ‘Deer’, or ‘Others’. Transfer learning was implemented with the VGG16 image classification architecture by replacing the fully connected layer that had been trained on ImageNet data. The TensorFlow machine-learning library was used for the task.
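To make the transfer-learning step more concrete, here is a minimal sketch of how a VGG16 base can be reused in Keras with a new classification head. It is not the code used in the pilot: the function name `build_transfer_model`, the input size, the size of the new dense layer, the dropout rate, and the optimizer are all assumptions chosen for illustration.

```python
def build_transfer_model(input_shape=(224, 224, 3), weights="imagenet"):
    """Sketch of a two-class classifier (e.g. 'Wild Boar' vs 'Deer') on a frozen VGG16 base."""
    # Imported inside the function so this file can be loaded without TensorFlow installed.
    import tensorflow as tf

    # include_top=False drops VGG16's original fully connected layers,
    # which were trained for the ImageNet classes.
    base = tf.keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=input_shape
    )
    base.trainable = False  # keep the pretrained convolutional features fixed

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(256, activation="relu")(x)  # assumed head size
    x = tf.keras.layers.Dropout(0.5)(x)                   # assumed dropout rate
    # One sigmoid unit: output near 0 means one class, near 1 the other.
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Only the new head is trained in this sketch; the convolutional base keeps its ImageNet weights. Images would normally be passed through `tf.keras.applications.vgg16.preprocess_input` before being fed to such a model.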

How does machine learning process the data?

First, the data set is divided for training and testing, with 80% allocated for training and the rest for testing. Then we train the models: we provide the training images together with their labels (answers) so that the models learn the features and patterns associated with each label. At the same time, we watch how the model is doing on the testing data, for which it is not given the labels (answers). When the models perform well on both data sets, we stop training and accept the models.
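The 80/20 division can be sketched in a few lines of Python. This is an illustrative example only: the function name `split_train_test`, the fixed random seed, and the file names are assumptions, not details from the pilot.

```python
import random


def split_train_test(items, train_fraction=0.8, seed=42):
    """Shuffle the items and return (training items, testing items)."""
    shuffled = list(items)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names standing in for camera trap images.
images = [f"img_{i:03d}.jpg" for i in range(100)]
train_images, test_images = split_train_test(images)
print(len(train_images), len(test_images))
```

Shuffling before cutting matters for camera trap data, because consecutive frames from one trigger event tend to look alike and would otherwise cluster in one of the two sets.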

Table 1: The model’s learning curves for the two data sets, ‘Wild Boar and Deer’ (Fig 4) and ‘Wild Boar and Others’ (Fig 5)

Fig 4: Error (loss) and classification accuracy for ‘Wild Boar and Deer’ data set

Fig 5: Error (loss) and classification accuracy for ‘Wild Boar and Others’ data set


A model’s learning curve gives us the best idea of how well the model is doing. It comprises loss (error) and classification accuracy metrics for both the training and the testing data. We plot both to check whether the model performs satisfactorily on the data set with which it is trained and on the data set to which it is supposed to generalize. As the training progresses, i.e. as the number of iterations (epochs) increases, the loss decreases and the accuracy increases, provided the model’s complexity is fit for the given data set. The training continues until the model converges, i.e. until there is no further reduction in the loss. We can see the lines flattening in the above curves after a certain iteration, which denotes the model’s convergence. If the gap between the training and testing curves is large, it is generally understood that the model is not performing well on the unseen data (test data). The good news is that we were able to achieve 90% accuracy on our ‘Wild Boar and Deer’ test data set, and around 80% on the ‘Wild Boar and Others’ test data set (Figures 4 and 5).
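The two checks described here, convergence of the loss and the gap between training and testing accuracy, can be expressed as a small sketch. The function names, the `patience` and `min_delta` parameters, and the numbers in the usage lines are hypothetical; they are not values from our experiments.

```python
def first_converged_epoch(losses, patience=3, min_delta=1e-3):
    """Return the epoch index at which the loss has failed to improve by more
    than min_delta for `patience` consecutive epochs, or None if it never does."""
    best = float("inf")
    stale = 0  # epochs in a row without a meaningful improvement
    for epoch, loss in enumerate(losses):
        if best - loss > min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


def accuracy_gap(train_accuracy, test_accuracy):
    """A large positive gap suggests the model does not generalize well."""
    return train_accuracy - test_accuracy


# Hypothetical loss values that flatten out after a few epochs.
example_losses = [1.0, 0.6, 0.4, 0.3, 0.3, 0.3, 0.3, 0.3]
print(first_converged_epoch(example_losses))
print(accuracy_gap(0.95, 0.80))
```

Applying `first_converged_epoch` to the testing loss gives a simple stopping rule that matches the description of halting training once the curves flatten.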

Looking at the curves in this instance, there might be a question: “Why is the yellow line in Figure 5 not doing as well as in Figure 4?” For now, the best explanation we can provide is that the ‘Others’ category in the second data set contained images of all the remaining animal species except wild boar, whereas the first data set contained only two animal species. Additionally, there were fewer images in the ‘Others’ category than in the first data set, which would have lowered the accuracy relative to the model trained on the first data set. Another plausible explanation is the imbalanced number of images per animal species in the second data set, which could have increased ‘noise’ and made learning harder for the model. We plan to investigate this in our future work by increasing the number of images, fine-tuning more parameters, and conducting more controlled experiments.
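One way to quantify the imbalance mentioned above is to count the images per class and derive inverse-frequency class weights. Class weighting is a common mitigation but was not part of this pilot, and the label counts below are made up for illustration. The weight formula, total images divided by (number of classes × images in the class), is the widely used "balanced" heuristic.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Map each class to n_samples / (n_classes * class_count).
    Rare classes receive weights above 1, common classes below 1."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label list: 30 wild boar images and 10 'Others' images.
example_labels = ["Wild Boar"] * 30 + ["Others"] * 10
print(Counter(example_labels))
print(balanced_class_weights(example_labels))
```

Weights of this kind can be passed to a training routine so that errors on the rarer class count for more, which is one of several options alongside collecting more images.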


Cleaning the data and producing quality training data were challenging given the nature of the data. The coverage of animals in the image frames was not uniform, even though the number of image frames was large. Also, features irrelevant to classification, like leaves, branches, and tree trunks, affected the overall task because the model learned more unwanted features than useful ones.

Fig 6: Model is classifying the deer as ‘Boar’ with label 0

Fig 7: Boar images used for training


For example, the tree trunk present in the above images (Fig 7) could be responsible for the model’s output of ‘Boar’, since a considerable number of the boar images used for training had tree trunks in them. As a solution, we plan to adopt a two-step process in our future studies: first run an object detection model so that only the area of the image containing the animal is extracted, and then run the image classification model on that area to classify the species, as shown in Figure 8.

Fig 8: Bounding box acquired from object detection model for Boar and Deer
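The planned two-step process can be outlined as follows. This is a sketch under stated assumptions, not an implementation: the image is represented as a plain list of pixel rows, the bounding box format `(x_min, y_min, x_max, y_max)` is assumed, and `detect` and `classify` are hypothetical stand-ins for an object detection model and an image classification model.

```python
def crop_to_box(image, box):
    """Return the part of the image inside the box.
    image: list of pixel rows; box: (x_min, y_min, x_max, y_max), max values exclusive."""
    height = len(image)
    width = len(image[0]) if height else 0
    x_min, y_min, x_max, y_max = box
    # Clamp the box so a detection that spills over the frame edge still works.
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)
    return [row[x_min:x_max] for row in image[y_min:y_max]]


def classify_in_two_steps(image, detect, classify):
    """Step 1: locate the animal. Step 2: classify only the cropped region."""
    box = detect(image)
    if box is None:  # nothing detected, e.g. an empty frame
        return None
    return classify(crop_to_box(image, box))
```

Because the classifier only sees the cropped region, background elements such as tree trunks have less opportunity to influence the predicted label.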


Future plan

We plan to run multi-class image classification models to identify all the animal species found as we collect more data. From there, we will pursue other avenues of AI and wildlife research that take us closer to addressing human-wildlife conflict, beyond automating the classification of animal species from camera traps. One of these is the real-time monitoring and classification of several animal species. We strongly hope that this pilot will set the tone for machine-learning studies in the region, specifically for wildlife surveys and monitoring.


Machine learning is sure to revolutionize the way wildlife monitoring activities are carried out in future. In the HKH region, most wildlife monitoring efforts are limited to protected areas and their periphery. There is great scope to scale up and locate more such efforts in community institutions, such as community forest user groups, and local governments to generate a more complete picture of the presence and movement of wildlife in the region. Machine learning can help us to quickly and efficiently analyse the vast amount of data that such an exercise will generate. It can also help us analyse crowdsourced images to fill data gaps in areas that are not being systematically monitored. All in all, data-driven approaches like machine learning and artificial intelligence can be a significant tool for achieving better and faster results in areas of ecological research and for managing human-wildlife interactions and conflict.


Chettri, N., & Gurung, J. (2017). Human Wildlife Conflict in the Hindu Kush Himalayas. Retrieved March 17, 2020, from https://www.researchgate.net/publication/320977346_Human_Wildlife_Conflict_in_the_Hindu_Kush_Himalayas_Regional_perspectives

Karki, S., & Udas, E. (2016). Assessment of Forest Carbon Stock and Carbon Sequestration Rates at the ICIMOD Knowledge Park at Godavari. (November).

Norouzzadeh, M. S., Nguyen, A., Kosmala, M., Swanson, A., Palmer, M. S., Packer, C., & Clune, J. (2018). Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proceedings of the National Academy of Sciences of the United States of America, 115(25), E5716–E5725. https://doi.org/10.1073/pnas.1719367115

Singh, S. K., Carpio, F., & Jukan, A. (2018). Improving animal-human cohabitation with machine learning in fiber-wireless networks. Journal of Sensor and Actuator Networks, 7(3), 4–13. https://doi.org/10.3390/jsan7030035

Tabak, M. A., Norouzzadeh, M. S., Wolfson, D. W., Sweeney, S. J., Vercauteren, K. C., Snow, N. P., … Miller, R. S. (2019). Machine learning to classify animal species in camera trap images: Applications in ecology. Methods in Ecology and Evolution, 10(4), 585–590. https://doi.org/10.1111/2041-210X.13120

Wester, P., Mishra, A., Mukherji, A., & Shrestha, A. (2019). The Hindu Kush Himalaya Assessment: Mountains, Climate Change, Sustainability and People. https://doi.org/10.1007/978-3-319-92288-1

Yousif, H., Yuan, J., Kays, R., & He, Z. (2019). Animal Scanner: Software for classifying humans, animals, and empty frames in camera trap images. Ecology and Evolution, 9(4), 1578–1589. https://doi.org/10.1002/ece3.4747


