AI Drone Learns to Detect Brawls
A drone surveillance system is trained to watch for humans stabbing or punching each other
Drones armed with computer vision software could enable new forms of automated skyborne surveillance to watch for violence below. One glimpse of that future comes from UK and Indian researchers who demonstrated a drone surveillance system that can automatically detect small groups of people fighting each other.
The seed idea for researchers to develop such a drone surveillance system was first planted in the wake of the Boston Marathon bombing that killed three and injured hundreds in 2013. That first attempt petered out. It was not until the Manchester Arena bombing that killed 23 and wounded 139, including many children leaving an Ariana Grande concert, that the researchers made some progress. This time, they harnessed a form of the popular artificial intelligence technique known as deep learning.
"This time we were able to do a relatively better job, because the software was able to run in realtime and does a relatively good job of detecting violent individuals," says Amarjot Singh, a Ph.D. student in deep learning at the University of Cambridge.
The drone surveillance system developed by Singh and his colleagues remains far from ready for primetime. But their work demonstrates one possibility of combining deep learning's pattern-recognition capabilities with relatively inexpensive commercial drones and the growing availability of cloud computing services. More details appear in a 3 June 2018 paper that was uploaded to the preprint server arXiv and will appear in the IEEE Computer Vision and Pattern Recognition (CVPR) Workshops 2018.
A key part of this demonstration involved training deep learning algorithms to recognize violent actions by detecting various combinations of body and limb poses in video footage. To create a training dataset, the researchers enlisted 25 interns to gather in an open area and mimic violent actions in five categories (punching, kicking, strangling, stabbing, and shooting) while being filmed by a Parrot AR drone at heights ranging from 2 meters to 8 meters.
But that wasn't all. The research team also needed to sit down and manually mark 18 coordinates on each person's body in the video frames. That would have quickly become a labor-intensive and exhausting process for the 10,000 or 20,000 images normally needed to train deep learning algorithms. The researchers wanted to cut down the necessary training data to just 2,000 annotated images that included about 5,000 individuals performing violent actions.
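To give a concrete sense of what one such labeled training frame might contain, here is a minimal sketch. The field names, the dataclass layout, and the altitude field are illustrative assumptions rather than the authors' actual annotation format; only the 18 body coordinates per person and the action categories come from the description above.

```python
# Illustrative sketch of one annotated training frame for the pose dataset.
# The structure and field names are assumptions for illustration only.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PersonAnnotation:
    keypoints: List[Tuple[float, float]]  # 18 (x, y) body coordinates, in pixels
    label: str                            # e.g. "punching", "kicking", "neutral"

@dataclass
class AnnotatedFrame:
    image_path: str
    drone_altitude_m: float               # filming height, roughly 2 to 8 meters
    people: List[PersonAnnotation]

# Example: one frame with a single annotated person (dummy coordinates)
frame = AnnotatedFrame(
    image_path="frames/flight_03/frame_0142.jpg",
    drone_altitude_m=4.0,
    people=[PersonAnnotation(keypoints=[(0.0, 0.0)] * 18, label="punching")],
)
```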
A conventional deep learning neural network automatically learns patterns by filtering data through its many layers of artificial neurons from end to end, a process that can yield good predictive accuracy if you have enough computing resources and training data on hand. Singh's workaround came from his Cambridge University research, which has focused on more streamlined and efficient forms of deep learning capable of running with fewer computing resources and less training data.
Singh replaced some of the first neural network layers at the front end with fixed parameters and used supervised learning toward the back end. This move effectively replaced part of the deep learning process with human engineering input, based on what Singh, the human designer, thought would work best for training the neural network to recognize different human body poses. That could mean a tradeoff in overall accuracy, but it enabled the resulting ScatterNet Hybrid Deep Learning (SHDL) network to learn more quickly with less data and less available computing power.
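The general pattern, a frozen, hand-designed front end feeding a small trainable back end, can be sketched in a few lines of PyTorch. This is not the authors' SHDL code: the layer sizes, and the use of plain frozen convolutions in place of the ScatterNet's fixed front-end filters, are assumptions made purely for illustration.

```python
# Minimal sketch of a "hybrid" network: a frozen front end plus a trainable,
# supervised back end. Not the authors' SHDL implementation.
import torch
import torch.nn as nn

class HybridPoseNet(nn.Module):
    def __init__(self, num_keypoints: int = 18):
        super().__init__()
        # Front end: layers whose parameters are fixed rather than learned.
        self.front_end = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        for p in self.front_end.parameters():
            p.requires_grad = False  # frozen: stands in for the hand-engineered part

        # Back end: the only part trained (supervised) on annotated poses.
        self.back_end = nn.Sequential(
            nn.AdaptiveAvgPool2d((8, 8)),
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 256),
            nn.ReLU(),
            nn.Linear(256, num_keypoints * 2),  # (x, y) per keypoint
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.back_end(self.front_end(x))

# Only back-end parameters reach the optimizer, which is what lets a network
# like this train with less data and compute than a fully end-to-end model.
model = HybridPoseNet()
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
```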
The overall drone surveillance system relies upon the SHDL network along with two other standard algorithms. The first, called a feature pyramid network, is a common component of object recognition systems and performs the first task of detecting humans in video images. The second, called a support vector machine, uses the SHDL network's body pose estimations to categorize each person as either violent or nonviolent.
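Read as a pipeline, that division of labor looks roughly like the sketch below. The detection and pose-estimation functions are hypothetical stand-ins for the feature pyramid network and the SHDL network, and the training data are random placeholders; only the final support-vector-machine stage uses a real library (scikit-learn).

```python
# Sketch of the three-stage pipeline: detect people -> estimate poses ->
# classify each pose as violent or nonviolent. Stand-in components only.
import numpy as np
from sklearn.svm import SVC

def detect_people(frame: np.ndarray) -> list:
    """Stand-in for the feature pyramid network: returns person bounding boxes."""
    return [(0, 0, 64, 128)]  # dummy box for illustration

def estimate_pose(frame: np.ndarray, box: tuple) -> np.ndarray:
    """Stand-in for the SHDL network: returns 18 (x, y) body keypoints, flattened."""
    return np.zeros(36)

# Train the final stage on pose vectors labeled violent (1) or nonviolent (0).
# Random placeholder data here; the real system uses the annotated drone footage.
rng = np.random.default_rng(0)
train_poses = rng.normal(size=(200, 36))
train_labels = rng.integers(0, 2, size=200)
classifier = SVC(kernel="rbf").fit(train_poses, train_labels)

# Per-frame inference: detect -> estimate pose -> classify each person.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
for box in detect_people(frame):
    pose = estimate_pose(frame, box)
    label = classifier.predict(pose.reshape(1, -1))[0]
    print("violent" if label == 1 else "nonviolent", box)
```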
Initial test results suggest that the drone surveillance system can indeed work in real time by having the Parrot drone offload the heavy-duty data crunching to Amazon's cloud service. Singh's colleagues at the Indian Institute of Science Bangalore and the National Institute of Technology Warangal handled the drone part of the system.
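The offloading pattern itself is straightforward: a drone-side client ships each frame to a remote inference service instead of running the neural networks onboard. The endpoint URL, payload format, and response fields in this sketch are hypothetical; the article says only that the heavy computation runs on Amazon's cloud service.

```python
# Hedged sketch of drone-to-cloud offloading: encode a frame and POST it to a
# remote inference endpoint. The URL and payload layout are placeholders.
import cv2        # pip install opencv-python
import requests

CLOUD_ENDPOINT = "https://example-inference-endpoint.invalid/analyze"  # placeholder

def send_frame_for_analysis(frame) -> dict:
    """Encode a video frame as JPEG and POST it to the cloud service."""
    ok, jpeg = cv2.imencode(".jpg", frame)
    if not ok:
        raise RuntimeError("frame encoding failed")
    response = requests.post(
        CLOUD_ENDPOINT,
        files={"frame": ("frame.jpg", jpeg.tobytes(), "image/jpeg")},
        timeout=2.0,  # keep latency bounded for near-real-time use
    )
    response.raise_for_status()
    # Hypothetical response: per-person boxes plus violent/nonviolent labels.
    return response.json()
```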
But it's still early days as far as accuracy goes. The drone surveillance system's accuracy steadily declines from 94 percent (its success rate in recognizing one violent individual) as more and more violent individuals fill the images. That drop-off in accuracy may come from the difficulty of recognizing larger numbers of people spread out at varying distances from the drone camera, Singh says. It may also come from miscategorizations of people's poses.
The fact that the system's accuracy drops with more people in the video frame raises the question of how accurate it will be in analyzing large crowds. Performing real-time analyses of many more people than just the 25 interns could strain the system and require even more cloud computing resources and bandwidth.
Furthermore, the initial training dataset based on the simulated brawl among interns may not exactly reflect all the real-life violence that takes place in large crowd riots or terrorist attacks. That means the drone surveillance system's accuracy in recognizing real-world violence is anyone's guess until the researchers can test it on such video footage. "People punch in many different ways," Singh says. "There's not one or two ways of doing it."
Still, the researchers are pressing forward. They're in the process of securing permission from Indian officials to try out their system at two upcoming music festivals. Such real-world tests could help them figure out the limits of the current drone surveillance system's capabilities, given the expected presence of thousands of people in densely packed crowds. (One student attendee was stabbed at one of the festivals last year.)
Singh is continuing to develop the deep learning models and may incorporate crowd modelling. He also anticipates expanding the system's object recognition capabilities so it can spot a person carrying a gun or a bag. For example, a real-time surveillance system capable of tracking suspicious patterns among people carrying bags might have proven useful in the Boston Marathon bombing case.
There is even the possibility that Singh and his colleagues will move away from trying to have the system recognize specific acts of violence, such as stabbing or kicking, and instead focus on recognizing possible violence in general. Singh wants to see whether that approach might yield a more practical implementation of the system down the line.
If the drone surveillance system's accuracy does improve to the point of being commercially viable, Singh still envisions humans being in the loop to check out any suspicious activity or possible outbreaks of violence that the drone highlights. The automated surveillance system would help narrow down the places a human security guard should look, so that human eyes and judgment can quickly take over and properly assess the situation.
“The system is not going to go on its own and fly and kill people,” Singh says.