How we deployed a simple wildlife monitoring system on Google Cloud

Author

Simone Gayed Said

Machine Learning Engineer, Artefact Benelux

At Artefact, we care about positively impacting people, the environment, and the community. That’s why we are committed to partnering with nonprofit organizations that put these values at the core of their vision.
To that end, we collaborated with Smart Parks, a Dutch company that provides advanced sensor solutions to conserve endangered wildlife and manage park areas efficiently.

In this series of posts, we chronicle our journey in designing and building an ML system that makes use of the media captured by Smart Parks’ camera traps. The goal of the project is to ingest the data coming from the camera traps and turn it into insights, such as the presence of people or specific kinds of animals in the images or videos. Park rangers then use this information to better protect the wildlife and to detect possible dangers, such as poachers, sooner.

Introduction

Smart Parks needed a wildlife monitoring system able to accomplish the following tasks:

Ingest the media (images and/or videos) coming from camera traps in a single place
Automatically detect the presence of humans and animals in the media
Access the predictions in EarthRanger, an application used to manage the parks and their wildlife
Monitor the media coming from the camera traps

Our guiding principle was speed. From the start, our single priority was to deploy a bare-bones but fully functioning end-to-end product as soon as possible.

This is the first article in the series. It focuses on the context of the project, a high-level view of the system we designed, and the advantages of our cloud-based solution. In the upcoming articles, we will go into more depth on how to connect camera traps to the Google Cloud Platform and to external endpoints using a tool called Node-RED, and on how to design a simple web app with Streamlit to manage the camera traps placed in the parks.

Let’s get started!

Camera Traps

Before we jump in, let’s quickly review what camera traps are and how they can be used to support animal protection and conservation.

Camera traps are devices that have built-in sensors so that when activity is detected in front of them, a picture or a video is immediately taken. They let park rangers and wildlife biologists see our fellow species without interfering with their normal behavior.

Going around the parks and collecting information in person is a valid technique, but it is expensive and labor-intensive. There is also the risk of running into dangerous wildlife or, even worse, poachers.

While different techniques to collect data come with different tradeoffs, camera traps are an excellent source. The great advantage of camera traps is that they operate continually and silently and they can record very accurate data without disturbing the photographed subject. They can be helpful both in surreptitiously monitoring possible illicit activities and in quantifying the number of different species in an area and determining their behavior and activity patterns.

Google Cloud Platform

To store and manage the camera trap media, we chose a cloud-based solution, specifically the Google Cloud Platform.

Google offers storage solutions such as Google Cloud Storage (object storage with integrated edge caching for unstructured data) and compute solutions such as Cloud Functions (Functions as a Service for running event-driven code). It also offers useful AI APIs, for example:

Cloud Vision API — Image analysis service based on machine learning
Cloud Video Intelligence — Video analysis service based on machine learning

Having all these components in a single unified environment was the ideal solution for us and helped us provide a working solution in a short time.

The Workflow

First, the media are uploaded to a Google Cloud Storage bucket. How exactly this happens will be discussed in the second article of this series. The bucket is organized in folders, one for each camera trap. Once a file is uploaded, a Google Cloud Function is triggered immediately. This function takes care of the following tasks:

Read the uploaded media
Call the Cloud Vision or the Cloud Video Intelligence API to retrieve the predictions
Archive the API responses in another Cloud Storage Bucket
Send the predictions to an endpoint outside GCP
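
The four tasks above could be sketched roughly as follows. This is a minimal illustration, not the code running in production: the bucket name, the endpoint URL, the list of file extensions, and all function names are assumptions made for the example. It also assumes a background Cloud Function triggered by a Cloud Storage upload event, and it only requests label detection. Instead of downloading the file, the sketch passes its gs:// URI to the APIs, which read it directly from the bucket.

```python
import json
import posixpath
import urllib.request

# Hypothetical names, chosen only for this sketch.
ARCHIVE_BUCKET = "camera-traps-predictions"
ENDPOINT_URL = "https://example.org/predictions"

# Assumed media types; the real camera traps may produce other formats.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi"}


def media_kind(object_name):
    """Return "image", "video", or None based on the file extension."""
    ext = posixpath.splitext(object_name)[1].lower()
    if ext in IMAGE_EXTENSIONS:
        return "image"
    if ext in VIDEO_EXTENSIONS:
        return "video"
    return None


def camera_id(object_name):
    """One folder per camera trap, so the first path segment identifies it."""
    return object_name.split("/", 1)[0]


def archive_name(object_name):
    """Object name used for the archived API response."""
    return object_name + ".predictions.json"


def label_image(gcs_uri):
    from google.cloud import vision  # imported lazily; needs google-cloud-vision

    client = vision.ImageAnnotatorClient()
    image = vision.Image(source=vision.ImageSource(image_uri=gcs_uri))
    response = client.label_detection(image=image)
    return [{"label": a.description, "score": a.score}
            for a in response.label_annotations]


def label_video(gcs_uri):
    from google.cloud import videointelligence  # needs google-cloud-videointelligence

    client = videointelligence.VideoIntelligenceServiceClient()
    operation = client.annotate_video(
        request={"features": [videointelligence.Feature.LABEL_DETECTION],
                 "input_uri": gcs_uri}
    )
    result = operation.result(timeout=300)  # video annotation is long-running
    annotations = result.annotation_results[0].segment_label_annotations
    return [{"label": a.entity.description, "score": a.segments[0].confidence}
            for a in annotations]


def on_media_uploaded(event, context):
    """Entry point: runs when a new object is finalized in the input bucket."""
    bucket, name = event["bucket"], event["name"]
    kind = media_kind(name)
    if kind is None:
        return  # not a media file this sketch knows how to handle

    # 1-2. Point the right API at the uploaded file and retrieve predictions.
    uri = f"gs://{bucket}/{name}"
    labels = label_image(uri) if kind == "image" else label_video(uri)
    payload = json.dumps({"camera": camera_id(name), "media": uri, "labels": labels})

    # 3. Archive the predictions in another bucket.
    from google.cloud import storage  # needs google-cloud-storage

    storage.Client().bucket(ARCHIVE_BUCKET).blob(archive_name(name)).upload_from_string(
        payload, content_type="application/json"
    )

    # 4. Send the predictions to an endpoint outside GCP.
    request = urllib.request.Request(
        ENDPOINT_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(request, timeout=30).close()
```

The Google client libraries are imported inside the functions that need them, so the routing helpers can be tried locally without any cloud dependency installed.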

This architecture provides multiple advantages:

Scalability: Because it runs on Cloud Functions, the solution scales automatically with the number of requests (i.e., the number of media files uploaded to the input Cloud Storage bucket at the same time)
Cheap and durable storage: Storing unstructured data in Google Cloud Storage is inexpensive (just $0.026 per GB-month for the Standard storage tier). It is also designed for 99.999999999% (11 nines) annual durability of objects
Automation: Using all these services together gives us a fully automated pipeline that needs no human intervention. From data ingestion to the retrieval of predictions, everything runs automatically as soon as a new media file is uploaded
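
To give a feeling for the storage cost, here is a small back-of-the-envelope calculation based on the $0.026 per GB-month quoted above. The number of files and the average file size are made-up figures for illustration, not measurements from the project, and the sketch ignores operation and network charges.

```python
PRICE_PER_GB_MONTH = 0.026  # Standard storage tier price quoted in the article


def monthly_storage_cost(num_files, avg_file_mb, price_per_gb_month=PRICE_PER_GB_MONTH):
    """Estimate the monthly cost (USD) of keeping the given files in the bucket."""
    stored_gb = num_files * avg_file_mb / 1024  # treating 1 GB as 1024 MB
    return stored_gb * price_per_gb_month


# Hypothetical volume: 100,000 images of about 2 MB each.
cost = monthly_storage_cost(num_files=100_000, avg_file_mb=2)
print(f"{cost:.2f} USD per month")
```

Under these assumptions, roughly 195 GB of images would cost about 5 USD per month to keep.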

Cloud Vision and Cloud Video Intelligence APIs

The use of Machine Learning, and Computer Vision in particular, to automatically identify people and animals in images or videos has advanced significantly in recent years, and wildlife researchers now widely consider it a “game-changer”. Let’s take a closer look at the APIs we used.

Vision API and Video Intelligence API offer powerful pre-trained machine learning models through REST and RPC APIs. The first one is meant to work with images whereas the second one, as the name suggests, with videos. Both of them are capable of automatically recognizing a vast number of objects, places, and actions.
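
As an illustration of the REST flavor, the sketch below builds the JSON body of a request to the Vision API's images:annotate method for an image stored in Cloud Storage. The bucket and file names are placeholders, and the helper name is our own. Actually sending the request also requires authentication (for example an OAuth access token), which is left out here.

```python
import json

# REST endpoint of the Vision API annotate method.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(gcs_uri, feature_types=("LABEL_DETECTION",), max_results=10):
    """Build the JSON body asking for one or more features on a single image."""
    return {
        "requests": [
            {
                "image": {"source": {"gcsImageUri": gcs_uri}},
                "features": [
                    {"type": feature, "maxResults": max_results}
                    for feature in feature_types
                ],
            }
        ]
    }


# Placeholder URI; one request can combine several features.
body = build_annotate_request(
    "gs://camera-traps-input/trap-01/IMG_0001.jpg",
    ("LABEL_DETECTION", "OBJECT_LOCALIZATION", "FACE_DETECTION"),
)
print(json.dumps(body, indent=2))
```

The client libraries wrap the same request, so in practice the choice between REST and the libraries is mostly a matter of convenience.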

For this project we focused mainly on these 3 features provided by the APIs:

Label detection: To get an idea of the entities (e.g., animals, people, vehicles) present in the media. Based on these labels, it is possible to create rules that trigger an alarm when a specific set of entities is present
Object detection/tracking: To get a more precise idea of where the detected animals or people are located in the media. Unlike label detection, this feature also returns the bounding boxes of the detections
Face/person detection: To get more information about the detected people, for example to estimate their emotions or describe their clothing. This additional information could then be used to distinguish poachers from other people in the park
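
To make the first two features more concrete, here is a small sketch of what could be done with their output. The watchlist, the score threshold, and the function names are assumptions for the example, not rules used in the project. The second helper relies on the fact that the Vision API returns object locations as normalized vertices between 0 and 1, which have to be scaled by the image size to obtain pixel coordinates.

```python
# Hypothetical set of entities that should raise an alarm.
WATCHLIST = {"person", "vehicle"}


def alert_entities(labels, watchlist=WATCHLIST, min_score=0.7):
    """Return the watched entities found among (description, score) labels.

    An empty list means no alarm should be raised.
    """
    return sorted(
        {description.lower() for description, score in labels
         if description.lower() in watchlist and score >= min_score}
    )


def to_pixel_box(normalized_vertices, width, height):
    """Convert normalized vertices to a (left, top, right, bottom) pixel box.

    The JSON response omits coordinates equal to 0, hence the defaults.
    """
    xs = [vertex.get("x", 0.0) * width for vertex in normalized_vertices]
    ys = [vertex.get("y", 0.0) * height for vertex in normalized_vertices]
    return (round(min(xs)), round(min(ys)), round(max(xs)), round(max(ys)))


# Made-up predictions for a single image.
labels = [("Elephant", 0.98), ("Person", 0.91), ("Vehicle", 0.40)]
print(alert_entities(labels))

vertices = [{"x": 0.25, "y": 0.5}, {"x": 0.75, "y": 0.5},
            {"x": 0.75, "y": 1.0}, {"x": 0.25, "y": 1.0}]
print(to_pixel_box(vertices, width=1920, height=1080))
```

In this example only the person passes the threshold, so a rule built on the watchlist would raise an alarm for that image and ignore the low-confidence vehicle.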

You can try the Vision API yourself by uploading an image to the demo on the Cloud Vision product page.

The trail ahead

The work so far is a foundation for the exciting and impactful journey that lies ahead. With basic tooling in place, we will soon be able to create a lot of value, not just for Smart Parks but also for wildlife conservation and beyond!

The next steps will involve these broad areas of work:

Model experimentation: So far we have only experimented with APIs and pre-trained models. In the future, it would be interesting to build a dataset of images and videos collected by the camera traps, label it either manually or with the system we have just presented, and then use it to train custom Computer Vision models that achieve better accuracy
Use case implementation: Having a fully automated solution in place allows us to focus on developing targeted use cases, that is, on working out how to exploit the retrieved information to make an impact and help the rangers and volunteers protect the wildlife of the parks
Edge AI: For the moment, the execution speed of our prediction loop is satisfactory for our use case (a few minutes). We still have areas of improvement to move closer to a real-time solution. Edge AI, with a model deployed and running closer to the actual camera trap hardware, is an option that would help to avoid round trips to the cloud

In this first article, we discussed how we built a fully automated, scalable pipeline in Google Cloud that ingests media and uses Machine Learning APIs to extract insights from them. It provides a solid baseline that is easy and fast to implement for any kind of project that involves consuming media and analyzing it with machine learning.

Thank you for reading, and see you in the next articles of the series, where we will explain in more detail how the presented architecture is connected to the camera traps and walk through the web app designed to manage them. Stay tuned!

Special thanks to Maël Deschamps for his help in reviewing this post’s content, and to Tim van Dam from Smart Parks for his support during the project. You rock!

This article was initially published on Medium.com.