Have you ever wished you could understand what your dog is trying to say? Researchers at the University of Michigan are developing AI tools that can identify whether a dog’s bark conveys playfulness or aggression.
These AI models can also determine other information from animal vocalizations, such as age, breed, and sex. In collaboration with Mexico’s National Institute of Astrophysics, Optics, and Electronics (INAOE) in Puebla, the study found that AI models originally trained on human speech could be adapted to understand animal communication. The results were presented at the Joint International Conference on Computational Linguistics, Language Resources, and Evaluation.
“By using speech processing models initially trained on human speech, our research opens a new window into how we can leverage what we have built so far in speech processing to start understanding the nuances of dog barks,” said Rada Mihalcea, the Janice M. Jenkins Collegiate Professor of Computer Science and Engineering and director of U-M’s AI Laboratory. “There is so much we don’t yet know about the animals that share this world with us. Advances in AI can revolutionize our understanding of animal communication, and our findings suggest that we may not have to start from scratch.”
One of the main challenges in developing AI models to analyze animal vocalizations is the lack of publicly available data. While there are numerous resources for recording human speech, collecting data from animals is more difficult. “Animal vocalizations are logistically much harder to solicit and record,” said Artem Abzaliev, lead author and U-M doctoral student in computer science and engineering. “They must be passively recorded in the wild or, in the case of domestic pets, with the permission of owners.”
Due to the scarcity of usable data, techniques for analyzing dog vocalizations have been difficult to develop. The researchers overcame this challenge by repurposing an existing model originally designed to analyze human speech. This approach allowed them to use robust models that form the backbone of various voice-enabled technologies, such as voice-to-text and language translation. These models can distinguish nuances in human speech, like tone, pitch, and accent, and convert this information into a format that a computer can use to identify words and recognize speakers.
“These models can learn and encode the incredibly complex patterns of human language and speech,” Abzaliev said. “We wanted to see if we could leverage this ability to discern and interpret dog barks.”
The researchers used a dataset of dog vocalizations recorded from 74 dogs of varying breeds, ages, and sexes in various contexts. Humberto PĂ©rez-Espinosa, a collaborator at INAOE, led the team that collected the dataset. Abzaliev then used these recordings to modify a machine-learning model, specifically a speech representation model called Wav2Vec2, which was initially trained on human speech data.
With this model, the researchers generated embeddings of the acoustic data collected from the dogs and used those representations to classify the barks. They found that Wav2Vec2 not only succeeded at four classification tasks but also outperformed other models trained specifically on dog bark data, reaching accuracies of up to 70%.
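The transfer-learning recipe described above, reusing a frozen pretrained speech encoder, pooling its per-frame embeddings into one vector per recording, and training a small classifier head on top, can be sketched schematically. Note the heavy assumptions: the fixed random projection below is only a stand-in for a real pretrained encoder such as Wav2Vec2, and the synthetic "barks", frame size, and logistic-regression head are illustrative choices, not the study's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Schematic stand-in for a pretrained speech encoder (e.g. Wav2Vec2).
# A real encoder maps raw audio to a sequence of learned frame embeddings;
# here a fixed random projection of short frames plays that role, kept
# frozen exactly as pretrained weights would be during transfer.
FRAME, EMB = 400, 64
W_enc = rng.normal(size=(FRAME, EMB))  # "pretrained" weights, never updated


def encode(waveform: np.ndarray) -> np.ndarray:
    """Frame the waveform, project each frame, mean-pool to one embedding."""
    n = len(waveform) // FRAME
    frames = waveform[: n * FRAME].reshape(n, FRAME)
    return np.tanh(frames @ W_enc).mean(axis=0)


def make_bark(label: int) -> np.ndarray:
    """Toy synthetic 'bark': class 0 is low-pitched, class 1 high-pitched."""
    t = np.arange(FRAME * 10) / 16_000  # 16 kHz sample rate
    freq = 280 if label == 0 else 920
    return np.sin(2 * np.pi * freq * t) + 0.1 * rng.normal(size=t.size)


# Build a tiny labeled dataset of pooled embeddings.
X = np.stack([encode(make_bark(y)) for y in (0, 1) * 40])
y = np.array([0, 1] * 40)

# Linear classifier head trained on top of the frozen embeddings
# (plain logistic regression via gradient descent).
w, b = np.zeros(EMB), 0.0
for _ in range(300):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    grad = p - y
    w -= 0.1 * X.T @ grad / len(y)
    b -= 0.1 * grad.mean()

acc = ((1 / (1 + np.exp(-(X @ w + b))) > 0.5) == y).mean()
print(f"training accuracy: {acc:.2f}")
```

The design point this sketch tries to convey is the one the researchers exploited: because the encoder already turns raw audio into informative representations, only a small task-specific head needs to be trained on the scarce animal data.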
“This is the first time that techniques optimized for human speech have been built upon to help with the decoding of animal communication,” Mihalcea said. “Our results show that the sounds and patterns derived from human speech can serve as a foundation for analyzing and understanding the acoustic patterns of other sounds, such as animal vocalizations.”
In addition to establishing human speech models as a useful tool in analyzing animal communication, this research has important implications for animal welfare. Understanding the nuances of dog vocalizations could greatly improve how humans interpret and respond to the emotional and physical needs of dogs, thereby enhancing their care and preventing potentially dangerous situations, the researchers said.