Building the First Data Confidence Fabric
October 29, 2019
I recently worked with a team of Dell Technologies specialists to finish building the first-ever Data Confidence Fabric (DCF for short). Today, Dell Technologies announced that our prototype code will be contributed to the Linux Foundation to seed Project Alvarium.
I’d like to share some of the history behind the seeding of Project Alvarium and how our team at Dell Technologies came to initiate the overall effort.
For several years, the CTO of the Dell Technologies Edge and IoT business unit has been touting a vision of data monetization. However, it’s hard to monetize untrusted Edge and IoT data. As he likes to say, “It’s midnight. Do you know where your data has been?”
Enterprise storage systems have delivered trusted data to applications for a long time. We started our initial investigation wondering if these same trust principles could be applied to Edge and IoT ecosystems. Recent developments in data valuation, distributed ledgers, and data marketplaces made it practical to bring these pieces together.
We observed that as Edge and IoT data and applications travel toward each other, they cross multiple boundaries such as networks, trust zones, stakeholders, organizations, firewalls, and geographies. We realized that to make this work, no single entity can own the trust; after all, imagine if one company owned the internet. Instead, an open framework must be created in which trust can be inserted and confidence scores calculated. This would enable applications not only to analyze data but also to calculate confidence scores that reflect how credible the data is. It became evident to us that it was time to write some code.
1st Level of Trust
We started with the EdgeX Foundry chair of the Core Working Group, Trevor Conn. Trevor wrote the first-ever Data Confidence Fabric software in Go, the same programming language EdgeX is written in. His Data Confidence Fabric software registered with EdgeX as a client and began processing simulated device data. The initial confidence score for this data was “0” (no trust was inserted).
Dell Technologies then hired three computer science interns from Texas A&M to deploy EdgeX and the Data Confidence Fabric software on a Dell Gateway 3000 with a Trusted Platform Module (TPM) chip. Suffice it to say, the keyboards were smoking hot and the Mountain Dew was flowing freely. The first level of trust insertion used the TPM chip to “sign” simulated data. Then we modified EdgeX to validate the signature by using the TPM’s public key.
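To make the sign-then-validate step concrete, here is a minimal sketch in Go. It is not the prototype's code: a software Ed25519 key stands in for the TPM (a real TPM keeps the private key inside the chip and is reached through a TPM library), and the names SignedReading, newSoftwareKey, signReading, and verifyReading are hypothetical.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// SignedReading pairs a raw sensor payload with its signature.
// Hypothetical type, used only for this illustration.
type SignedReading struct {
	Payload   []byte
	Signature []byte
}

// newSoftwareKey creates a software key pair. ASSUMPTION: this stands in
// for the TPM, where the private key would never leave the chip.
func newSoftwareKey() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	return pub, priv
}

// signReading plays the role of the device side: it signs the payload.
func signReading(priv ed25519.PrivateKey, payload []byte) SignedReading {
	return SignedReading{Payload: payload, Signature: ed25519.Sign(priv, payload)}
}

// verifyReading plays the role of the consumer side: it checks the
// signature using only the public key.
func verifyReading(pub ed25519.PublicKey, r SignedReading) bool {
	return ed25519.Verify(pub, r.Payload, r.Signature)
}

func main() {
	pub, priv := newSoftwareKey()
	r := signReading(priv, []byte(`{"device":"sim-01","temp":21.5}`))
	fmt.Println("valid:", verifyReading(pub, r))

	r.Payload[0] ^= 0xFF // simulate tampering in transit
	fmt.Println("valid after tampering:", verifyReading(pub, r))
}
```

The point of the sketch is the split of responsibilities: only the signer holds the private key, while anything downstream can validate a reading with the public key alone.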
2nd Level of Trust
EdgeX was then adjusted to support N-S-E-W authentication by using VMware’s open-source Lightwave technology. The second level of trust insertion occurred when EdgeX rejected all requests for data except for those coming from the Data Confidence Fabric software.
3rd Level of Trust
The Data Confidence Fabric software invoked Dell Boomi software to gather provenance and append this metadata to the sensor reading. This third level of trust insertion gives an application increased confidence in the history of the data.
4th Level of Trust
The Data Confidence Fabric software then stored the data locally using IPFS (an immutable, open-source storage system). This fourth level of trust insertion gives an application confidence that the data/provenance has not been tampered with. It also has the additional benefit of enabling analytics to access data closer to the source.
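The tamper-evidence described here comes from content addressing: data is stored under an identifier derived from its own hash, so any change to the bytes no longer matches the identifier. The sketch below illustrates the idea with an in-memory map and a hex-encoded SHA-256 digest. It does not use the IPFS API, real IPFS identifiers (CIDs) have a richer format, and contentStore and its methods are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentStore is a toy in-memory content-addressed store.
// ASSUMPTION: a simplified stand-in for IPFS, for illustration only.
type contentStore struct {
	blocks map[string][]byte
}

func newContentStore() *contentStore {
	return &contentStore{blocks: make(map[string][]byte)}
}

// address derives the identifier from the content itself.
func address(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// put stores a private copy of the data and returns its address.
func (s *contentStore) put(data []byte) string {
	id := address(data)
	s.blocks[id] = append([]byte(nil), data...)
	return id
}

// get returns the data only if it still hashes to the requested address.
func (s *contentStore) get(id string) ([]byte, bool) {
	data, ok := s.blocks[id]
	if !ok || address(data) != id {
		return nil, false
	}
	return data, true
}

func main() {
	store := newContentStore()
	id := store.put([]byte(`{"temp":21.5,"provenance":"gateway-1"}`))
	_, ok := store.get(id)
	fmt.Println("intact:", ok)

	store.blocks[id][0] ^= 0xFF // simulate tampering with the stored bytes
	_, ok = store.get(id)
	fmt.Println("intact after tampering:", ok)
}
```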
5th Level of Trust
The Data Confidence Fabric software then registered the data into VMware’s blockchain (based on the open-source Project Concord consensus algorithm). The ledger entry created by this fifth level of trust insertion contains the pointer to the data, as well as the confidence history and score.
Creating a Trust Score
How was the score calculated? For the sake of demonstration, the contributions of the trust insertions were simply added together, aiming for a “Perfect 10”.
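As a rough illustration of additive scoring, the Go sketch below sums the points contributed by each trust insertion. The post does not say how many points each level was worth; the two points per level used here is purely an assumption chosen so that five levels add up to 10, and trustInsertion and confidenceScore are hypothetical names.

```go
package main

import "fmt"

// trustInsertion records one level of trust and the points it contributes.
// ASSUMPTION: the per-level point values are invented for this example.
type trustInsertion struct {
	Name   string
	Points int
}

// confidenceScore adds up the points of every insertion that was applied.
func confidenceScore(applied []trustInsertion) int {
	total := 0
	for _, t := range applied {
		total += t.Points
	}
	return total
}

func main() {
	levels := []trustInsertion{
		{Name: "TPM signature", Points: 2},
		{Name: "authenticated access", Points: 2},
		{Name: "provenance metadata", Points: 2},
		{Name: "immutable storage", Points: 2},
		{Name: "ledger registration", Points: 2},
	}
	fmt.Println("no trust inserted:", confidenceScore(nil))     // 0
	fmt.Println("first three levels:", confidenceScore(levels[:3])) // 6
	fmt.Println("all five levels:", confidenceScore(levels))    // 10
}
```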
Our first Data Confidence Fabric uses a configuration file, but going forward, the industry can create a dynamic framework in which trust insertion components register themselves and are inserted on the fly. We believe there is no single DCF; rather, each organization decides what works for them, and confidence scores are generated by open algorithms that take different factors into consideration.
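One way such a dynamic framework could look is a registry that trust insertion components add themselves to at run time. The Go sketch below is speculative and is not Project Alvarium's API: the Inserter interface, Registry type, and fixedInserter component are all hypothetical, and the additive total simply mirrors the demonstration scoring described above.

```go
package main

import "fmt"

// Annotation records one trust insertion applied to a reading.
type Annotation struct {
	Component string
	Points    int
}

// Reading is a hypothetical sensor reading that carries its confidence history.
type Reading struct {
	Payload string
	History []Annotation
}

// Inserter is a hypothetical interface a trust insertion component implements.
type Inserter interface {
	Name() string
	Insert(r *Reading) (points int, err error)
}

// Registry holds components in the order they registered.
type Registry struct {
	order []Inserter
	seen  map[string]bool
}

func NewRegistry() *Registry {
	return &Registry{seen: make(map[string]bool)}
}

// Register adds a component at run time, rejecting duplicate names.
func (g *Registry) Register(i Inserter) error {
	if g.seen[i.Name()] {
		return fmt.Errorf("inserter %q already registered", i.Name())
	}
	g.seen[i.Name()] = true
	g.order = append(g.order, i)
	return nil
}

// Apply runs every registered component and returns the additive score.
// A component that fails contributes nothing and leaves no annotation.
func (g *Registry) Apply(r *Reading) int {
	total := 0
	for _, i := range g.order {
		points, err := i.Insert(r)
		if err != nil {
			continue
		}
		r.History = append(r.History, Annotation{Component: i.Name(), Points: points})
		total += points
	}
	return total
}

// fixedInserter is a placeholder component that always awards the same points.
type fixedInserter struct {
	name   string
	points int
}

func (f fixedInserter) Name() string                 { return f.name }
func (f fixedInserter) Insert(*Reading) (int, error) { return f.points, nil }

func main() {
	reg := NewRegistry()
	_ = reg.Register(fixedInserter{name: "signature", points: 2})
	_ = reg.Register(fixedInserter{name: "provenance", points: 2})

	r := &Reading{Payload: "temp=21.5"}
	fmt.Println("score:", reg.Apply(r))
	fmt.Println("history:", r.History)
}
```

Because each organization would register its own set of components, two fabrics built on the same framework could produce different scores for the same reading, which matches the idea that there is no single DCF.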
I mentioned before that Dell Boomi software played a big role in this Data Confidence Fabric and I wanted to share some thoughts on the project from Dell Boomi’s CTO, Michael J. Morton. According to Michael, “The concept of a trust fabric will increasingly become critical in order to make reliable and non-damaging business decisions due to the ever-increasing volume and velocity of Edge data, as well as the increasing risk of tainted data going undetected. In order to securely collect the metadata that is used in producing confidence scores, the Dell Boomi integration platform-as-a-service was used to demonstrate how to accomplish this necessity, as well as a technology option of the loosely-coupled Project Alvarium framework.”
In closing, I’d like to say that coding the first Data Confidence Fabric was a fulfilling experience. We strived to use open source technologies whenever and wherever possible, but we also demonstrated that all vendors can benefit from Project Alvarium in that trust fabrics can be built from a mix of open source and commercial technologies.