IDG Contributor Network: In-memory computing: enabling continuous learning for the digital enterprise | Computing
Businesses across a range of industries are turning to machine learning and deep-learning-powered artificial intelligence applications to drive their digital transformation and omnichannel customer experience initiatives. These tools can help generate new revenue, improve efficiency, increase customer satisfaction, and deliver business insight. According to IDC, spending on AI and machine learning will reach $57.6 billion by 2021, up from $12 billion in 2017. According to Deloitte Global, machine learning pilot programs and implementations will double in 2018 compared to 2017 and double again by 2020.
However, to optimize many of these use cases, the underlying machine learning and deep learning models need to be updated in real time as new data is added to the system. Consider the following real-time continuous learning use cases:
- Financial risk mitigation. To stop new loan scams from spreading, a bank must continuously update its machine learning model of what indicates a possible fraud attempt, using real-time data on new loan applications. To reduce the risk associated with approving new credit cards during the checkout process, a credit card company must continuously update its fraud model so that it can score the likelihood of a fraudulent transaction in real time.
- Recommendation engines. To obtain the maximum ROI from recommendations, an e-commerce platform must continuously update its machine learning recommendation model so it can combine historical data, such as webpage visits and purchase patterns, with current webpage activity, referral information, product inventory, newly available products, and emerging purchase patterns. A continuously updated model produces more targeted and effective recommendations.
- Information security. Predictive analytics for network and data security requires a constant comparison between “normal” activity and possible threat activity. In large networks, what constitutes normal activity can change frequently (e.g., as new types of network devices, endpoints, and protocols appear), so the underlying machine learning model must be continuously retrained.
In each of these cases, a traditional database deployment makes it impossible to update a machine learning model in real time. Such a deployment uses separate online transactional processing (OLTP) and online analytical processing (OLAP) databases, with an extract, transform, load (ETL) process that moves the operational data to the OLAP system before model training. The delays inherent in the ETL process mean the model is always trained on data that is already out of date.
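To make the contrast with batch retraining concrete, here is a minimal sketch of a model that updates itself with every incoming record instead of waiting for an ETL cycle. It is an illustration only, not the method of any particular platform: the `OnlineLogisticModel` class, the two features (normalized transaction amount and account newness), and the synthetic labeling rule are all assumptions made for this example.

```python
import math
import random


class OnlineLogisticModel:
    """Hypothetical fraud scorer: logistic regression updated one record at a time."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def score(self, features):
        """Return the estimated probability that the record is fraudulent."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, label):
        """Apply one stochastic gradient step using a single labeled record."""
        error = self.score(features) - label
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error


def synthetic_stream(n_records, seed=0):
    """Yield made-up (features, label) pairs standing in for live transactions."""
    rng = random.Random(seed)
    for _ in range(n_records):
        amount = rng.random()       # assumed feature: normalized amount
        new_account = rng.random()  # assumed feature: account newness
        label = 1 if amount + new_account > 1.2 else 0  # assumed fraud rule
        yield [amount, new_account], label


model = OnlineLogisticModel(n_features=2)
for features, label in synthetic_stream(5000):
    # The model can score a transaction the moment it arrives, then learn
    # from it once the outcome is known, with no batch export in between.
    model.update(features, label)

high_risk = model.score([0.95, 0.95])
low_risk = model.score([0.05, 0.05])
print(f"high-risk score: {high_risk:.3f}, low-risk score: {low_risk:.3f}")
```

Because each update touches only the current record, the model reflects new fraud patterns as soon as labeled examples arrive. A production system would distribute this work across the cluster, which this single-process sketch does not attempt.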
Although artificial intelligence use cases are currently not as common as machine learning use cases, the challenge of creating a continuous learning system will be the same. From voice recognition systems to self-driving cars, deep learning systems will need to be equipped with a real-time continuous learning capability for optimal performance.
Achieving continuous learning with in-memory computing
Today’s in-memory computing platforms are deployed on a cluster of servers that can be on-premises, in the cloud, or in a hybrid environment. The platforms leverage the cluster’s total available memory and CPU power to accelerate data processing while providing horizontal scalability, high availability, and ACID transactions with distributed SQL. When implemented as an in-memory data grid, the platform can be easily inserted between the application and data layers of existing applications. In-memory databases are also available for new applications or when initiating a complete rearchitecting of an existing application. The in-memory computing platform also includes streaming analytics to manage the complexity around dataflow and event processing. This allows users to query active data without impacting transactional performance. This design also reduces infrastructure costs by eliminating the need to maintain separate OLTP and OLAP systems.
The latest in-memory computing platforms include two capabilities essential for large-scale continuous learning systems:
- Integrated machine learning and deep learning libraries that have been optimized for massively parallel processing. This allows the system to train machine learning or deep learning models against the distributed data residing on all the nodes in the cluster. A continuous learning framework allows the machine learning or deep learning model to continuously update as new data is added without impacting performance. The integrated machine learning and deep learning libraries used in a continuous learning mode are a key building block of what Gartner calls in-process HTAP (hybrid transactional/analytical processing).
- A memory-centric architecture that keeps the entire, fully operational data set on a distributed ACID and ANSI-99 SQL-compliant disk store (using spinning disks, solid state drives [SSDs], 3D XPoint, or other storage-class memory technologies). Only a user-defined subset of the data is maintained in the in-memory data grid or in-memory database, which allows organizations to balance application performance against infrastructure costs. This architecture also supports fast recovery following a reboot because the data on disk can be accessed and processed immediately, without waiting for all data to be loaded into memory.
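The memory-centric idea described above can be sketched in a few lines. This is a single-process toy, not a distributed platform: the `MemoryCentricStore` class is hypothetical, SQLite stands in for the durable disk store, and a least-recently-used policy stands in for the user-defined rule that decides which subset stays in memory.

```python
import os
import sqlite3
import tempfile
from collections import OrderedDict


class MemoryCentricStore:
    """Hypothetical sketch: full data set on disk, bounded hot subset in memory."""

    def __init__(self, path, memory_capacity):
        self.disk = sqlite3.connect(path)  # SQLite stands in for the disk store
        self.disk.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)"
        )
        self.memory = OrderedDict()  # hot subset, ordered by recency of use
        self.memory_capacity = memory_capacity

    def _cache(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.memory_capacity:
            self.memory.popitem(last=False)  # evict the least recently used entry

    def put(self, key, value):
        with self.disk:  # the connection context manager commits on success
            self.disk.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
            )
        self._cache(key, value)

    def get(self, key):
        if key in self.memory:  # served from memory
            self.memory.move_to_end(key)
            return self.memory[key]
        row = self.disk.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        self._cache(key, row[0])  # warm the memory tier on demand
        return row[0]

    def close(self):
        self.disk.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "store.db")

    store = MemoryCentricStore(path, memory_capacity=2)
    for i in range(5):
        store.put(f"account-{i}", f"balance-{i * 100}")
    in_memory_after_writes = len(store.memory)  # only the hot subset is kept
    store.close()

    # Simulated reboot: the memory tier starts empty, yet reads work at once
    # because every record is already on disk.
    restarted = MemoryCentricStore(path, memory_capacity=2)
    in_memory_after_restart = len(restarted.memory)
    value_after_restart = restarted.get("account-0")
    restarted.close()
```

Raising `memory_capacity` trades infrastructure cost for performance, which is the balance the architecture is meant to expose. The restart path shows why recovery is fast: nothing has to be preloaded before the first read is served.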
With the growing need for continuous learning systems, enterprises must begin assessing now how they can implement these capabilities cost-effectively and without major disruption. Fortunately, mature open source platforms, falling in-memory software and infrastructure costs, and growing third-party expertise in in-memory computing mean organizations can begin charting their future today.