Facebook open sources reinforcement learning platform Horizon
Work on the framework began two and a half years ago, and it has been used internally at Facebook for the past year, Facebook engineer and Horizon project lead Jason Gauci told VentureBeat in a phone interview.
Horizon is built for deploying AI at scale, for companies or research teams whose training jobs may require thousands of CPUs or GPUs and billions of observations. However, because it uses Apache Spark for preprocessing and PyTorch for training, Horizon can also be deployed on a single computer.
Product teams at Facebook have used Horizon for things like M Suggestions, a service that can recommend translations, Spotify songs, Food Network recipes, and a myriad of other things based on words used in conversations on Facebook Messenger.
It’s also been used to determine the bitrate of Facebook 360 videos, and to personalize when the Facebook app chooses to send users notifications.
Reinforcement learning uses rewards to drive the activity of agents to reach a desired goal.
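To make the idea of reward-driven learning concrete, here is a minimal sketch of tabular Q-learning, a textbook reinforcement learning method. It is not taken from Horizon; the corridor environment, the function names, and every parameter value are assumptions chosen purely for illustration.

```python
import random


def train_q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor (illustrative only).

    States are 0..n_states-1. The agent starts at state 0 and earns a
    reward of 1.0 only when it steps into the last state.
    Actions: 0 = move left, 1 = move right.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Explore occasionally (or on ties); otherwise act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Nudge the estimate toward reward + discounted future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best-known action for every non-goal state."""
    return [0 if left > right else 1 for left, right in q[:-1]]


policy = greedy_policy(train_q_learning())
```

The only signal the agent ever receives is the reward at the end of the corridor, yet after training the greedy policy moves right from every state, because the reward's value has propagated backward through the table.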
Facebook chose to open source Horizon to advance reinforcement learning and unsupervised learning methods, both among novice practitioners and students and among large research projects that, like Facebook, need thousands of machines to train AI systems.
“I do think reinforcement learning (RL) is kind of the next frontier when it comes to industry wide, widespread adoption when it comes to machine learning, so we wanted to open source this to really provide a good platform for people all around the Bay and all around the world to start using RL,” Gauci said.
Facebook is no stranger to open-source tools for the training or deployment of AI.
Version 1.0 of the popular deep learning framework PyTorch was released earlier this month with integrations for Google Cloud, AWS, and Azure Machine Learning. There are also Caffe2 and ParlAI, a platform for training AI models. Research from Facebook AI Research is also open sourced.
In addition to PyTorch and Apache Spark, Horizon uses TensorBoardX for training visualizations and ONNX for serving AI models after training.
Unlike other forms of reinforcement learning at large organizations that may operate live, Horizon trains AI systems offline.
Horizon applies a technique known as counterfactual policy evaluation to evaluate the offline performance of an AI system to determine if alternative approaches may improve performance before going live.
“We can counterfactually look at these alternative actions and say oh maybe this alternative action was better in this circumstance,” he said. “So using this we can as opposed to like a lot of RL where they kind of train online and the model’s always changing, we train offline and we have a stage where we evaluate the model, and we come up with some confidence on the model’s performance and then engineers can choose to sort of deploy that model or not, and the Horizon platform open sources all of that and makes it all available.”
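One common way to evaluate a policy from logged data alone is inverse propensity scoring (IPS). The sketch below is an illustration of that general technique under stated assumptions, not a description of Horizon's own evaluators, which the article does not detail. The log format, the function names, and the notification data are all hypothetical.

```python
def ips_estimate(logged, target_policy):
    """Estimate the average reward a candidate policy would have earned.

    logged: list of (context, action, reward, logging_prob) tuples, where
        logging_prob is the probability the deployed policy assigned to
        the action it actually took.
    target_policy: function (context, action) -> probability that the
        candidate policy would take that action.
    """
    if not logged:
        raise ValueError("need at least one logged interaction")
    total = 0.0
    for context, action, reward, logging_prob in logged:
        # Reweight each logged reward by how much more (or less) likely
        # the candidate policy is to take the same action.
        weight = target_policy(context, action) / logging_prob
        total += weight * reward
    return total / len(logged)


# Hypothetical log: the deployed policy chose "send" or "skip" at random
# (probability 0.5 each), and reward 1.0 means the user responded well.
logged = [
    ("morning", "send", 1.0, 0.5),
    ("morning", "skip", 0.0, 0.5),
    ("evening", "send", 0.0, 0.5),
    ("evening", "skip", 1.0, 0.5),
]


def always_send(context, action):
    return 1.0 if action == "send" else 0.0


def send_in_morning_only(context, action):
    preferred = "send" if context == "morning" else "skip"
    return 1.0 if action == preferred else 0.0


always_send_value = ips_estimate(logged, always_send)
morning_only_value = ips_estimate(logged, send_in_morning_only)
```

On this toy log the always-send policy is estimated at 0.5 average reward and the morning-only policy at 1.0, so an engineer could prefer the second candidate without ever running either one live. The estimate is only trustworthy when the logging policy gave every relevant action a nonzero probability.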
Horizon is also made to normalize the training of large datasets, a commonly encountered issue with reinforcement learning, Gauci said. The platform comes with step-by-step instructions so it can be used by anyone with basic computer science knowledge, not just researchers or experts at companies like Facebook.
“Anyone who has any kind of basic Unix experience can generate a dataset and train a model and see how it works and that’s one of the things there’s sort of an educational aspect to this, we want to get a lot of people kind of excited about the field,” he said.