With GDPR acting as a catalyst, governments everywhere are addressing the export of personal data outside their borders, along with the intent and mechanism of data collection. This has abruptly changed the data architecture deployment landscape, especially for large enterprises that were consolidating data (data lake initiatives) and building a Center of Excellence data science team to gain strategic advantage in their journey toward a “data-driven organization”. The reality for most multi-national organizations will be a hub-and-spoke data architecture. With data scientist skills already in high demand, replicating the workforce at each site is not feasible. Further, with multi-site data collection, standardizing on common processes and tools is operationally challenging. The problem is magnified further by multiple tools and multi-cloud deployments.
Let’s consider the example of one of the leading retail and commercial banks in the region; I was recently engaged to provide consulting on their digital transformation project. They have centralised operations across multiple countries in Asia Pacific, a strong data science practice, and a third-generation data lake deployment. Regulations have suddenly forced them to stop collecting data from local partners and remote subsidiary locations. Their models run on multiple frameworks and libraries such as SparkML, H2O, scikit-learn, TensorFlow, Anaconda, and SPSS. With these models embedded in day-to-day processes, they face challenges in replicating and managing those operations remotely while still leveraging the scale they have built. A huge set of data is now spread across multiple sites, each owned by an entity they may or may not control and cannot fully trust, whether for shared-operations or competitive reasons.
IBM Remote Machine Learning Deployment provides a solution to this situation. The technology is embedded in our data science platforms, such as IBM Cloud Private for Data and Watson Studio. Through multiple containers, it provides a single deployment instance with all the top ML frameworks and popular open-source libraries for building models in a collaborative environment. For deployment, it pushes a virtualized environment (container plus libraries) to a remote machine (both online and offline), providing an optimized process and the ability to score data close to where it resides, without copying or moving the data. The vision is to provide instrumentation and mechanisms to ship remote metrics to a central repository so that the enterprise can monitor its models centrally, no matter where they are deployed. Here is a video demo that explains what we have in beta.
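The core pattern, stripped of any platform specifics, can be sketched in a few lines: the model runs where the data lives, raw records never leave the site, and only aggregate metrics are serialized for shipment to a central repository. This is a minimal illustrative sketch, not IBM's actual API; the `RemoteScorer` class, the site identifier, and the toy lambda model are all hypothetical stand-ins for a container-packaged model.

```python
import json
import statistics
from datetime import datetime, timezone

class RemoteScorer:
    """Hypothetical sketch: score data at the remote site, retain only metrics."""

    def __init__(self, site_id, model):
        self.site_id = site_id
        self.model = model      # any callable: record -> score
        self.scores = []

    def score_batch(self, records):
        # Raw records are consumed here and never transmitted;
        # only the resulting scores are kept locally.
        batch = [self.model(r) for r in records]
        self.scores.extend(batch)
        return batch

    def metrics_payload(self):
        # Aggregate metrics are all that ship to the central repository.
        return json.dumps({
            "site": self.site_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "n_scored": len(self.scores),
            "mean_score": statistics.mean(self.scores),
        })

# Toy model standing in for one built with SparkML, H2O, scikit-learn, etc.
scorer = RemoteScorer("apac-subsidiary-01", model=lambda r: r["amount"] * 0.1)
scorer.score_batch([{"amount": 100}, {"amount": 300}])
payload = json.loads(scorer.metrics_payload())
print(payload["n_scored"], payload["mean_score"])  # 2 20.0
```

In a real deployment the model and scorer would be packaged together in a container pushed to the remote machine, and the payload would be queued for transmission whenever the site is online, which is how central monitoring remains possible even for intermittently connected subsidiaries.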
While this approach may seem like an extension of edge analytics, in this case the edge need not be just IoT devices; it can be any machine at a subsidiary in another location. Nor is it only about real-time or large distributed data; the goal is to monetize your ML models and data science capabilities.