While artificial intelligence has made huge strides in recent years, there’s actually a lot of human intervention going on behind the scenes; most AI relies on training materials—usually massive troves of labeled data painstakingly annotated by humans—to learn what a sunset looks like or to differentiate puppies from kittens.
But for AI to take the next step, it needs to figure some things out on its own. That’s why Pedro Morgado, who joined the Department of Electrical and Computer Engineering in August 2022 as an assistant professor, is working on ways to teach artificial intelligence to understand unlabeled visual and audio information.
“I really believe that this area, called self-supervised learning, or learning by just looking at the data itself without asking humans for annotations, is the future of computer vision,” he says. “We’ll never be able to tell the computer everything it needs to know. That’s why I’m really interested in pushing this area.”
When Morgado began his undergraduate degree at Instituto Superior Técnico, now part of the University of Lisbon in Portugal, he thought he wanted to be an aerospace engineer. But after a year of tinkering, he realized he was much more interested in the software and electronics side of things. He moved into electrical and computer engineering, eventually completing a master’s degree in machine learning at the same school, focusing on using computer vision to diagnose Alzheimer’s disease from medical images.
He stayed on as a staff scientist for a couple of years before moving to the University of California-San Diego, where he studied computer vision and machine learning for his PhD. “From the beginning, we were very much into figuring out how we can build the same AI systems, but with much less human supervision so you can scale things up more easily and don’t have this bottleneck of asking humans for all the annotations.”
Instead, in self-supervised machine learning, the AI learns through observation and association, much as humans do. It learns by trying to predict the future, what sound the object it’s currently seeing should make, or how that object should feel to the touch. Morgado, who was also a postdoctoral researcher at Carnegie Mellon University in Pittsburgh in 2021 and 2022, is applying similar self-supervision concepts to audiovisual and multimodal AI.
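To make that idea concrete, the sketch below shows one common form of cross-modal self-supervision: a contrastive objective that pulls the visual embedding of a clip toward the embedding of its own soundtrack and pushes it away from the soundtracks of other clips in the batch. The encoders, dimensions, and names here are illustrative stand-ins for this general technique, not Morgado’s actual models.

```python
# A minimal sketch of audio-visual contrastive self-supervised learning.
# All encoders, dimensions, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalEncoder(nn.Module):
    """Toy encoders that map each modality into a shared embedding space."""
    def __init__(self, video_dim=512, audio_dim=128, embed_dim=256):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, embed_dim)  # stand-in for a video network
        self.audio_proj = nn.Linear(audio_dim, embed_dim)  # stand-in for an audio network

    def forward(self, video_feats, audio_feats):
        v = F.normalize(self.video_proj(video_feats), dim=-1)
        a = F.normalize(self.audio_proj(audio_feats), dim=-1)
        return v, a

def audio_visual_contrastive_loss(v, a, temperature=0.07):
    """InfoNCE loss: each clip's visual embedding should match the sound
    recorded with it and mismatch the sounds of the other clips."""
    logits = v @ a.t() / temperature      # pairwise video-audio similarities
    targets = torch.arange(v.size(0))     # matching pairs lie on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Usage: features from the same clips supervise each other, with no labels.
model = CrossModalEncoder()
video_feats = torch.randn(8, 512)  # e.g., pooled frame features for 8 clips
audio_feats = torch.randn(8, 128)  # e.g., spectrogram features for the same clips
v, a = model(video_feats, audio_feats)
loss = audio_visual_contrastive_loss(v, a)
loss.backward()
```

Because the matching video and audio come from the same recording, the pairing itself supplies the training signal; no human annotation is involved.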
At UW-Madison, Morgado plans to continue both the computer vision and audiovisual research thrusts, and believes the technologies have a lot of potential applications, including reading medical images as well as aiding self-driving vehicles and robots that interact with people. “Self-supervised learning is basically just a backbone technology that will enable a deep understanding of various sensory inputs,” he says. “Your phone can already recognize some objects, and there are already products based on this type of technology. But we’re still just scratching the surface. Advanced real-world applications will become possible as more reliable and complete perception systems are developed.”
Morgado says UW-Madison will be a great place to pursue this research. “One reason I chose Madison is because I saw a vision across different departments on building this area of computer vision and machine learning. I was inspired by that, and thought it would be great to be a part of,” he says.