Using Blockchain to Build Trust into Federated Analytics
- June 2, 2018
In a recent blog post on CIO.com, Patricia Florissi, Ph.D., Dell EMC’s vice president and global CTO for sales and a distinguished engineer, talked about how the combination of blockchain technology and federated analytics enables organizations to analyze distributed data with trust, transparency and traceability. I invite you to read more from Dr. Florissi in the blog below, “Using Blockchain to Build Trust into Federated Analytics.”
With the rise of the Internet of Things and the explosive growth in data, organizations are increasingly taking computing and analytics to the data, rather than moving the data to a central location for processing and analysis. To support this shift, Dell EMC has incubated the concept of federated analytics and started developing the World Wide Herd (WWH) platform to make it feasible.
So, what is federated analytics? Dell Technologies defines federated analytics as “the analysis of dispersed data in-place, as close as possible to the data source, while these local results are shared, fused and further analyzed along their path to other locations, enabling higher order learning at scale.” Federated analytics is very much the way of the future.
This brings us to a challenge at the heart of federated analytics. When you decentralize data analytics, you have to take steps to ensure that you can verify the integrity of all the participants and all the data sources in the analytics process. In particular, you need to have the same assurances of trust, transparency and traceability — the “Three Ts” of data analytics — that you would have in a centralized world.
How do you get there? The answer is to bring together the unique capabilities of federated analytics and blockchain technology, which adds a distributed ledger to the federated analytics solution (see sidebar). This is the approach Dell Technologies uses in our WWH.
With WWH, the federated analytics network coexists with a blockchain network. Just as it does in the world of bitcoin, the blockchain provides a distributed digital ledger of transactions among entities in the federated analytics network, along with cryptography to prevent unauthorized changes to data and the transparency to enable all participants in the analytics process to see and verify everything that happens in the blockchain. This gives everyone the assurance that no unauthorized sources are participating in computations or using data in malicious ways.
Let’s drill down a bit more. In the case of WWH, the blockchain acts as a distributed ledger that records all the details related to each of the computations completed in the federated analytics network. WWH hooks into the blockchain via APIs. These APIs pair up WWH nodes with blockchain nodes. Each computation becomes an entry in a blockchain ledger. The blockchain ledger captures information on things like the ID of the federated analytics node requesting the analytics, a reference to the data resources used in doing the computation, when and where the analytics were run, the results of the analytics, and a unique hash of the actual content of the data used in the analytics.
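To make the ledger entry described above more concrete, here is a minimal Python sketch of what one record might look like. It is an illustration only, not the WWH API: the names `LedgerEntry`, `record_computation` and `GENESIS_HASH`, the field names, and the sample values are all assumptions made for this example. A real blockchain would also add consensus, signatures and replication across nodes, none of which is shown here.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import List

# Hypothetical placeholder used as the "previous hash" of the very first entry.
GENESIS_HASH = "0" * 64


@dataclass(frozen=True)
class LedgerEntry:
    """One computation, with the fields the article lists (names are illustrative)."""
    node_id: str    # ID of the federated analytics node requesting the analytics
    data_ref: str   # reference to the data resources used
    ran_at: str     # when the analytics were run (ISO 8601 timestamp)
    location: str   # where the analytics were run
    result: str     # result of the analytics
    data_hash: str  # hash of the actual content of the data used
    prev_hash: str  # hash of the previous entry, which links entries into a chain

    def entry_hash(self) -> str:
        # Serialize with sorted keys so the same entry always hashes the same way.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def record_computation(ledger: List[LedgerEntry], node_id: str, data_ref: str,
                       ran_at: str, location: str, result: str,
                       data: bytes) -> LedgerEntry:
    """Append one computation to the ledger, linked to the entry before it."""
    prev_hash = ledger[-1].entry_hash() if ledger else GENESIS_HASH
    entry = LedgerEntry(
        node_id=node_id,
        data_ref=data_ref,
        ran_at=ran_at,
        location=location,
        result=result,
        data_hash=hashlib.sha256(data).hexdigest(),
        prev_hash=prev_hash,
    )
    ledger.append(entry)
    return entry


# Example usage with made-up values.
ledger: List[LedgerEntry] = []
first = record_computation(ledger, "node-a", "dataset/alpha",
                           "2018-06-02T10:00:00Z", "site-1",
                           "mean=7.2", b"raw local data A")
second = record_computation(ledger, "node-b", "dataset/beta",
                            "2018-06-02T10:05:00Z", "site-2",
                            "mean=3.1", b"raw local data B")
```

Only the hash of the data goes into the entry, not the data itself, so in this sketch the ledger can attest to what was analyzed without the raw data leaving its location.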
In other words, the blockchain provides a parallel way of logging information and ensuring the integrity of the data and of the participants in the federated analytics network. And this brings us back to the “Three Ts” of federated analytics. Blockchain provides mechanisms for ensuring that you can trust your federated analytics peers, that you have transparency that allows you to see everything that happens to your data to achieve a particular analytics result, and that you have traceability into your sources, via the ability to replay tasks captured in the blockchain.
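As a rough illustration of how a hash-linked log can support transparency and replay, the sketch below checks that a small ledger has not been altered and re-runs a recorded task against the data it claims to have used. It is written under stated assumptions: the helper names (`append_entry`, `chain_is_intact`, `replay_matches`), the dictionary layout, and the use of `len` as a stand-in for an analytics function are hypothetical and are not part of WWH or of any specific blockchain product.

```python
import hashlib
import json
from typing import Callable, Dict, List

GENESIS_HASH = "0" * 64  # hypothetical "previous hash" for the first entry


def entry_hash(entry: Dict) -> str:
    """Hash an entry deterministically (sorted keys give a stable serialization)."""
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(ledger: List[Dict], node_id: str, data: bytes,
                 compute: Callable[[bytes], int]) -> Dict:
    """Run a computation and log its result, the data hash, and the chain link."""
    entry = {
        "node_id": node_id,
        "result": compute(data),
        "data_hash": hashlib.sha256(data).hexdigest(),
        "prev_hash": entry_hash(ledger[-1]) if ledger else GENESIS_HASH,
    }
    ledger.append(entry)
    return entry


def chain_is_intact(ledger: List[Dict]) -> bool:
    """Check that every entry points at the hash of the entry before it."""
    expected = GENESIS_HASH
    for entry in ledger:
        if entry["prev_hash"] != expected:
            return False
        expected = entry_hash(entry)
    return True


def replay_matches(entry: Dict, data: bytes,
                   compute: Callable[[bytes], int]) -> bool:
    """Replay a logged task: same data content and same result as recorded?"""
    same_data = hashlib.sha256(data).hexdigest() == entry["data_hash"]
    return same_data and compute(data) == entry["result"]


# Example usage: `len` stands in for a real analytics function.
ledger: List[Dict] = []
append_entry(ledger, "node-a", b"abc", len)
append_entry(ledger, "node-b", b"hello", len)
```

In this simplified form, changing an earlier entry breaks the link stored in the entry that follows it. The most recent entry has no successor yet, so a real system would rely on replication and consensus among peers to protect it.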
To make this story more tangible, consider, for example, the case of autonomous driving using cloud robotics. Local computations happen in the cars as well as in private and public clouds, and a tremendous amount of these local analytics results are shared through vehicle-to-vehicle and vehicle-to-cloud communications. In the unfortunate event of an accident, litigation procedures will demand full traceability of how the calculations that led to the accident were achieved. All nodes in the federated analytics, including vehicles and clouds, will need to demonstrate the source of the data, the content of the data used, the analytics performed, the local results achieved and the nodes with which they shared results. This is an ideal use case for the Dell Technologies WWH that couples federated analytics with blockchain.
As I noted in an earlier series of posts on CIO.com, federated analytics overcomes some of the challenges inherent in centralized analysis of distributed data. These challenges include the need to analyze data at scale in near real-time before the raw data has time to be transmitted from its collection area to a central location, the existence of data in hard-to-reach places, government regulations that restrict the movement of data, the distribution of data over multiple endpoints and clouds, and bandwidth constraints that make it difficult to move data.
With federated analytics, data can be analyzed locally, and only the local results are shared — the data itself stays put. This approach enables higher-order learning at scale while conserving bandwidth, accelerating time to insight, and preserving privacy, as the individual data points used to calculate the local results cannot be reverse-engineered from the local results themselves.
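The idea of sharing only local results can be sketched with a very simple statistic. In the Python example below, each site reduces its own values to a sum and a count, and only those two numbers are fused into a global mean. The function names `local_summary` and `fuse_means` and the sample values are assumptions chosen for illustration. The mean is just one convenient statistic, and the sketch does not implement any formal privacy guarantee.

```python
from typing import Iterable, List, Tuple


def local_summary(values: Iterable[float]) -> Tuple[float, int]:
    """Computed where the data lives: reduce raw values to (sum, count)."""
    items = list(values)
    return (sum(items), len(items))


def fuse_means(summaries: List[Tuple[float, int]]) -> float:
    """Combine the local summaries into a global mean without seeing raw values."""
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    if count == 0:
        raise ValueError("no data points in any local summary")
    return total / count


# Example usage: two sites with made-up values; only the summaries are shared.
site_a = [1.0, 2.0, 3.0]
site_b = [10.0, 20.0]
summaries = [local_summary(site_a), local_summary(site_b)]
global_mean = fuse_means(summaries)
```

The fused value equals the mean that a central analysis of all five values would produce, while each site transmits two numbers instead of its full data set.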
At the end of the day, federated analytics solves the problem of analyzing data that can’t be moved while preserving its privacy. Blockchain solves the problem of ensuring that the data can be trusted, that the analytics performed and the data used are transparent, and that all distributed and parallel execution flows can be traced and repeated.
By integrating federated analytics and blockchain, analytics at a worldwide scale can be achieved in a fully decentralized, peer-to-peer collaboration mode, without including a third party to coordinate the process. Everything is trusted, transparent and traceable, and it all comes together in the Dell Technologies World Wide Herd.
Patricia Florissi, Ph.D., is vice president and global CTO for sales and a distinguished engineer for Dell EMC. Twitter: @florissidelltec