Keeping 2 billion Android devices safe with machine learning
Posted by Sai Deep Tetali, Software Engineer, Google Play Protect
[Cross-posted from the Android Developers Blog]
At Google I/O 2017, we introduced Google Play Protect, our comprehensive set of security services for Android. While the name is new, the smarts powering Play Protect have protected Android users for years.
Google Play Protect’s suite of mobile threat protections is built into more than 2 billion Android devices, automatically taking action in the background. We’re constantly updating these protections so you don’t have to think about security: it just happens. We’ve made these protections even smarter by adding machine learning to Google Play Protect.
Security at scale
Google Play Protect provides in-the-moment protection from potentially harmful apps (PHAs), but Google’s protections start earlier.
Before they’re published in Google Play, all apps are rigorously analyzed by our security systems and Android security experts. Thanks to this process, Android devices that only download apps from Google Play are 9 times less likely to get a PHA than devices that download apps from other sources.
After you install an app, Google Play Protect continues its quest to keep your device safe by regularly scanning it to make sure all apps are behaving properly. If it finds an app that is misbehaving, Google Play Protect either notifies you or simply removes the harmful app to keep your device safe.
Our systems scan over 50 billion apps every day. To stay on the cutting edge of security, we look for new risks in a variety of ways, such as identifying specific code paths that signify bad behavior, investigating behavior patterns to link related bad apps, and having our security experts review possible PHAs.
In 2016, we added machine learning as a new detection mechanism and it soon became a critical part of our systems and tools.
Training our machines
In the most basic terms, machine learning means training a computer algorithm to recognize a behavior. To train the algorithm, we give it hundreds of thousands of examples of that behavior.
In the case of Google Play Protect, we are developing algorithms that learn which apps are “potentially harmful” and which are “safe.” To learn about PHAs, the machine learning algorithms analyze our entire catalog of applications. They then combine hundreds of signals with anonymized data, comparing app behavior across the Android ecosystem to find PHAs. They look for behavior common to PHAs, such as apps that attempt to interact with other apps on the device, access or share your personal data, download something without your knowledge, connect to phishing websites, or bypass built-in security features.
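To make the idea of learning from labeled examples a little more concrete, here is a minimal sketch, assuming each app can be summarized as a handful of yes/no behavioral signals like the ones listed above. The signal names, the toy training data, and the choice of plain logistic regression trained by gradient descent are all illustrative assumptions; they are not Play Protect’s actual features or model.

```python
import math

# Hypothetical yes/no behavioral signals, loosely modeled on the behaviors
# described above. These names are invented for illustration.
SIGNALS = [
    "interacts_with_other_apps",
    "accesses_personal_data",
    "silent_download",
    "contacts_phishing_site",
    "bypasses_security_features",
]


def to_vector(behaviors):
    """Encode a set of observed behaviors as a 0/1 feature vector."""
    return [1.0 if s in behaviors else 0.0 for s in SIGNALS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=500, lr=0.5):
    """Fit logistic-regression weights with batch gradient descent.

    examples: list of (feature_vector, label) pairs; label 1 = PHA, 0 = safe.
    """
    n = len(SIGNALS)
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in examples:
            # Prediction error for this example (predicted prob - label).
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i in range(n):
                grad_w[i] += err * x[i]
            grad_b += err
        m = len(examples)
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


def harm_score(model, behaviors):
    """Score in (0, 1); higher means more likely to be potentially harmful."""
    weights, bias = model
    x = to_vector(behaviors)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy labeled examples (invented, not real Play data).
training_data = [
    (to_vector({"silent_download", "contacts_phishing_site"}), 1),
    (to_vector({"accesses_personal_data", "bypasses_security_features"}), 1),
    (to_vector({"interacts_with_other_apps", "silent_download"}), 1),
    (to_vector({"contacts_phishing_site", "bypasses_security_features"}), 1),
    (to_vector(set()), 0),
    (to_vector({"accesses_personal_data"}), 0),
    (to_vector({"interacts_with_other_apps"}), 0),
    (to_vector(set()), 0),
]

model = train(training_data)
print(harm_score(model, {"silent_download", "contacts_phishing_site"}))
print(harm_score(model, set()))
```

Note that in this toy data a single signal such as accessing personal data is not enough on its own; the model learns that certain combinations are what separate harmful apps from safe ones, which is why comparing many signals together matters.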
When we find apps that exhibit similar malicious behavior, we group them into families. Visualizing these PHA families helps us uncover apps that share similarities with known bad apps but have so far stayed under our radar.
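Grouping apps into families is essentially a clustering problem. One simple way to sketch it, assuming each app is summarized as a set of observed behaviors, is to measure Jaccard similarity between behavior sets and merge any two apps above a threshold into the same family (single-linkage clustering with union-find). A new, unlabeled app can then be matched against those families. The behavior labels, app names, and 0.5 threshold are hypothetical, and this is not a description of how Play Protect actually forms families.

```python
from itertools import combinations


def jaccard(a, b):
    """Similarity of two behavior sets: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_into_families(apps, threshold=0.5):
    """Single-linkage grouping: any two apps whose behavior sets are at least
    `threshold` similar end up in the same family (via union-find)."""
    parent = {name: name for name in apps}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(apps, 2):
        if jaccard(apps[a], apps[b]) >= threshold:
            parent[find(a)] = find(b)

    families = {}
    for name in apps:
        families.setdefault(find(name), set()).add(name)
    return list(families.values())


def closest_family(behaviors, apps, families, threshold=0.5):
    """Return the family whose best-matching member is most similar to an
    unlabeled app, or None if no member reaches the threshold."""
    best, best_sim = None, threshold
    for family in families:
        sim = max(jaccard(behaviors, apps[m]) for m in family)
        if sim >= best_sim:
            best, best_sim = family, sim
    return best


# Hypothetical known PHAs and their observed behaviors.
known_phas = {
    "app_a": {"sms_premium", "hide_icon", "silent_download"},
    "app_b": {"sms_premium", "hide_icon"},
    "app_c": {"phishing_url", "overlay_login"},
    "app_d": {"phishing_url", "overlay_login", "read_contacts"},
}
families = group_into_families(known_phas)
suspect = {"sms_premium", "hide_icon", "read_contacts"}
print(sorted(closest_family(suspect, known_phas, families)))
```

Here the suspect app shares most of its behavior with the app_a/app_b family, so it would be surfaced for a closer look even though it matches no known PHA exactly.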
After we identify a new PHA, we confirm our findings with expert security reviews. If the app in question is a PHA, Google Play Protect takes action on the app and then we feed information about that PHA back into our algorithms to help find more PHAs.
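The review-and-feedback cycle described above can be sketched as a loop: flag apps that resemble confirmed PHAs, send them for review, and add each confirmed PHA to the reference set so later rounds can catch apps that resemble it. In this sketch, detection is a simple similarity match rather than a trained model, and `expert_review` is a hypothetical stand-in for human security review; all names and data are invented for illustration.

```python
def jaccard(a, b):
    """Similarity of two behavior sets: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def flag_candidates(unreviewed, confirmed_phas, threshold=0.5):
    """Flag unreviewed apps whose behavior resembles any confirmed PHA."""
    return [
        name
        for name, behaviors in unreviewed.items()
        if any(jaccard(behaviors, pha) >= threshold for pha in confirmed_phas.values())
    ]


def review_loop(unreviewed, confirmed_phas, expert_review, rounds=3):
    """Repeat flag -> expert review -> feed confirmed PHAs back in.

    expert_review is a hypothetical stand-in for human review: it takes an
    app name and returns True if the app is confirmed to be harmful.
    """
    unreviewed = dict(unreviewed)
    confirmed = dict(confirmed_phas)
    for _ in range(rounds):
        flagged = flag_candidates(unreviewed, confirmed)
        if not flagged:
            break  # nothing left resembles a known PHA
        for name in flagged:
            behaviors = unreviewed.pop(name)  # reviewed apps leave the queue
            if expert_review(name):
                confirmed[name] = behaviors  # feedback: new reference example
    return confirmed


# Invented example data.
seed_phas = {"seed": {"sms_premium", "hide_icon"}}
queue = {
    "x1": {"sms_premium", "hide_icon", "read_contacts"},
    "x2": {"read_contacts", "hide_icon", "overlay_login"},
    "x3": {"camera"},
    "x4": {"sms_premium", "hide_icon", "wallpaper"},
}
result = review_loop(queue, seed_phas, expert_review=lambda name: name != "x4")
print(sorted(result))
```

In this run, x2 is not similar enough to the seed PHA to be flagged at first; it is only caught in the second round, after x1 has been confirmed and fed back in. x4 is flagged but the simulated reviewer clears it, so it never becomes a reference example.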
Doubling down on security
So far, our machine learning systems have successfully detected 60.3% of the malware identified by Google Play Protect in 2017.
In 2018, we’re devoting a massive amount of computing power and talent to create, maintain and improve these machine learning algorithms. We’re constantly leveraging artificial intelligence and our highly skilled researchers and engineers from all across Google to find new ways to keep Android devices safe and secure. In addition to our talented team, we work with the foremost security experts and researchers from around the world. These researchers contribute even more data and insights to keep Google Play Protect on the cutting edge of mobile security.
To check out Google Play Protect, open the Google Play app and tap Play Protect in the left panel.
Acknowledgements: This work was developed in joint collaboration with Google Play Protect, Safe Browsing and Play Abuse teams with contributions from Andrew Ahn, Hrishikesh Aradhye, Daniel Bali, Hongji Bao, Yajie Hu, Arthur Kaiser, Elena Kovakina, Salvador Mandujano, Melinda Miller, Rahul Mishra, Damien Octeau, Sebastian Porst, Chuangang Ren, Monirul Sharif, Sri Somanchi, Sai Deep Tetali, Zhikun Wang, and Mo Yu.