AI Case Study
The U.S. National Oceanic and Atmospheric Administration acoustically detects humpback whales using a convolutional neural network
The U.S. National Oceanic and Atmospheric Administration (NOAA), in collaboration with Google as part of the latter's AI for Social Good program, has developed algorithms to analyse underwater recordings and identify humpback whale calls. A deep neural network was trained on 15 years' worth of underwater recordings from the Pacific to automatically identify the whales' 'songs'. The resulting detections can tell scientists about humpback whale presence, seasonality, daily calling behavior, and population structure. The ultimate goal is to help protect the endangered, but recovering, species.
Public And Social Sector
"To better protect those animals, the first step is to know where they are and when, so that we can mitigate the risks they face—whether that's putting the right marine protected areas in place or giving warnings to vessels. Since most whales and dolphins spend very little time at the surface of the water, visually finding and counting them is very difficult. This is why NOAA’s Pacific Islands Fisheries Science Center, responsible for monitoring populations of whales and other marine mammals in U.S. Pacific waters, relies instead on listening using underwater audio recorders.
NOAA has been using High-frequency Acoustic Recording Packages (HARPs) to record underwater audio at 12 different sites in the Pacific Ocean, some starting as early as 2005. They have accumulated over 170,000 hours of underwater audio recordings. It would take over 19 years for someone to listen to all of it, working 24 hours a day!
To help tackle this problem, we teamed up with NOAA to train a deep neural network that automatically identifies which whale species are calling in these very long underwater recordings, starting with humpback whales. The effort fits into our AI for Social Good program, applying the latest in machine learning to the world’s biggest social, humanitarian and environmental challenges.
The problem of picking out humpback whale songs underwater is particularly difficult to solve for several reasons. Underwater noise conditions can vary: for example, the presence of rain or boat noises can confuse a machine learning model. The distance between a recorder and the whales can cause the calls to be very faint. Finally, humpback whale calls are particularly difficult to classify because they are not stereotyped like blue or fin whale calls—instead, humpbacks produce complex songs and a variety of vocalizations that change over time.
We decided to leverage Google’s existing work on large-scale sound classification and train a humpback whale classifier on NOAA’s partially annotated underwater data set. We started by turning the underwater audio data into a visual representation of the sound called a spectrogram, and then showed our algorithm many example spectrograms that were labeled with the correct species name. The more examples we can show it, the better our algorithm gets at automatically identifying those sounds."
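The spectrogram step described in the quote above can be illustrated with a small sketch. This is not NOAA's or Google's pipeline: the frame length, hop size, Hann window, sample rate, and synthetic test tone are all assumptions chosen for illustration, and the naive DFT stands in for the FFT routine a real system would use.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return one list of power values (per frequency bin) for each time frame.

    frame_len and hop are illustrative defaults, not values from the source.
    """
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        powers = []
        for k in range(n_bins):
            # Naive discrete Fourier transform of the windowed frame.
            coeff = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            powers.append(abs(coeff) ** 2)
        frames.append(powers)
    return frames


# Hypothetical input: a pure 500 Hz tone sampled at 4000 Hz.
sample_rate = 4000
tone_hz = 500
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

spec = spectrogram(signal)
# Locate the strongest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
print(len(spec), "frames;", "peak at", peak_hz, "Hz")
```

Each row of `spec` is one time slice and each column one frequency band, so the array can be rendered as an image and passed to an image classifier. A real recording would show the time-varying patterns of whale song rather than a single steady line.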
"In the future, we plan to use our classifier to help NOAA better understand humpback whales by identifying changes in breeding location or migration paths, changes in relative abundance (which can be related to human activity), changes in song over the years and differences in song between populations. This could also help directly protect whales by advising vessels to modify their routes when a lot of whales are present in a certain area. Such work is already being done for right whales, which are easier to monitor because of their relatively simple sounds."
"Supervised Learning: Optimizing an Image Model for Humpback Detection
We made the common choice of treating audio event detection as an image classification problem, where the image is a spectrogram — a histogram of sound power plotted on time-frequency axes.
This is a good representation for an image classifier, whose goal is to discriminate, because the different spectra (frequency decompositions) and time variations thereof (which are characteristic of distinct sound types) are represented in the spectrogram as visually dissimilar patterns. For the image model itself, we used ResNet-50, a convolutional neural network architecture typically used for image classification that has shown success at classifying non-speech audio. This is a supervised learning setup, where only manually labeled data could be used for training (0.2% of the entire dataset — in the next section, we describe an approach that makes use of the unlabeled data.)
To learn the distance function, we used a method described in "Unsupervised Learning of Semantic Audio Representations", based on the idea that closeness in time is related to closeness in meaning. It randomly samples triplets, where each triplet is defined to consist of an anchor, a positive, and a negative. The positive and the anchor are sampled so that they start around the same time. An example of a triplet in our application would be a humpback unit (anchor), a probable repeat of the same unit by the same whale (positive) and background noise from some other month (negative). Passing the 3 samples through the ResNet (with tied weights) represents them as 3 vectors. Minimizing a loss that forces the anchor-negative distance to exceed the anchor-positive distance by a margin learns a distance function faithful to semantic similarity." (ai.googleblog.com)
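The margin-based loss described at the end of the quote above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the Euclidean distance, the margin value of 1.0, and the 2-D embedding vectors are all assumptions made for the example. In the described setup the three vectors would come from passing the anchor, positive, and negative through the same ResNet.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss.

    The loss is zero once the anchor-negative distance exceeds the
    anchor-positive distance by at least `margin`; otherwise it grows
    with the size of the violation.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical embeddings, standing in for network outputs.
anchor = [0.0, 0.0]          # e.g. a humpback unit
positive = [0.0, 0.5]        # e.g. a probable repeat of that unit
far_negative = [3.0, 4.0]    # e.g. background noise, well separated
near_negative = [0.0, 1.0]   # a negative that sits too close to the anchor

loss_satisfied = triplet_loss(anchor, positive, far_negative)
loss_violated = triplet_loss(anchor, positive, near_negative)
print(loss_satisfied, loss_violated)
```

Training would minimize this loss over many sampled triplets. Gradients flow only from triplets that violate the margin, which pushes the network to place sounds that are close in time nearer to each other than to unrelated sounds.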
"Like us, whales sing. But unlike us, their songs can travel hundreds of miles underwater. Those songs potentially help them find a partner, communicate and migrate around the world. But what if we could use these songs and machine learning to better protect them?
Despite decades of being protected against whaling, 15 species of whales are still listed under the Endangered Species Act. Even species that are successfully recovering—such as humpback whales—suffer from threats like entanglement in fishing gear and collisions with vessels, which are among the leading causes of non-natural deaths for whales."