TL;DR: CipherCore introduces a new way of accessing and using data in which the data remains confidential, allowing robust and secure collaboration between many data owners without disclosing their data to each other.
In a nutshell, CipherCore is a secure computation engine written in Rust (with a Python wrapper) that forms the foundation of the CipherMode secure data sharing platform that operates over encrypted data without decrypting it.
The efficiency and generality of CipherCore allow us, among other things, to train a neural network to predict time to failure on the NASA Turbofan Jet Engine dataset (which consists of tens of thousands of training data points) in under 5 minutes without access to the plaintext training data. By a very conservative estimate, the best available implementation of Homomorphic Encryption would be at least 5000x slower.
Check out CipherCore on GitHub and if you like what you see, consider giving it a star!
Some quick links:
- Install CipherCore
- CipherCore video hands-on tutorial: private set intersection
- CipherCore Slack
At CipherMode, we are building a secure data sharing platform that enables analytics (think joins) and machine learning (think training decision trees or neural networks) over datasets that are distributed between parties that are not willing to trust each other, or any third party, with their data. But how can we run any computation if we can't bring all the necessary data into one place and aren't allowed to leak any information about that data? Isn't this obviously impossible?
It turns out it is possible if one is willing to use modern cryptography, specifically Secure Multi-Party Computation (SMPC). SMPC allows several parties to jointly compute any function of their inputs in such a way that no information about the inputs (other than what can be inferred from the output) leaks to any other party. This is incredibly powerful: the function computed can be anything from "train this machine learning model" to "run this SQL query".
You can think of SMPC as a "black box", where the parties contribute their data and after that the black box gives back the result of a computation. Normally, such a black box is achieved using a trusted third party, but with SMPC there is no need for it.
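To make the "black box" idea a bit more concrete, here is a minimal sketch of additive secret sharing, one of the basic building blocks behind many SMPC protocols. It computes the sum of several private numbers without any single party seeing another party's input. The function names and the modulus are illustrative assumptions, all parties are simulated in one process, and this is not how CipherCore itself is invoked.

```python
import secrets

MODULUS = 2**64  # illustrative choice: all arithmetic is done modulo 2^64


def share(value, n_parties=3):
    """Split `value` into n additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine shares; any strict subset of them looks uniformly random."""
    return sum(shares) % MODULUS


def secure_sum(private_inputs):
    n = len(private_inputs)
    # Party i splits its input and sends the j-th share to party j.
    dealt = [share(x, n) for x in private_inputs]
    # Party j locally adds up the shares it received (one from each party).
    partial_sums = [sum(dealt[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Only the partial sums are published; together they reveal just the total.
    return reconstruct(partial_sums)


salaries = [52_000, 61_500, 48_250]  # one private input per party
total = secure_sum(salaries)
print(total)
```

Each individual share is a uniformly random number, so the parties learn nothing beyond the final total, which is exactly the guarantee described above for this particular function.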
SMPC is a very active area of academic research with many exciting developments: see this GitHub repo for an overview and pointers. However, so far it has not had mainstream, mass adoption by organizations despite providing a particularly clean, general and powerful solution for private data collaboration. There are two main reasons for this:
- SMPC introduces computational overhead;
- Systems based on SMPC are typically not very convenient to use.
At CipherMode, we are actively working on mitigating these two barriers. A part of these efforts is CipherCore: an SMPC engine that we decided to share with the community.
CipherCore: Everything is a Computation Graph
When we were building CipherCore, we spent a lot of time thinking about what level of abstraction would allow us, at the same time, to:
- Achieve great efficiency and as little computational overhead as possible;
- Make the framework as user-friendly and easy to use as possible for a wide range of applications (including analytics and machine learning);
- Make it as protocol-agnostic as possible: there are many different SMPC protocols depending on the number of parties and a particular threat model, and we'd like to have a general solution that allows for an easy plug-in of a new protocol.
It turns out that a good intermediate representation that satisfies all of the above properties is a computation graph. Computation graphs are ubiquitous in machine learning, massive data processing, databases, etc.
In CipherCore, any user-defined computation is a computation graph. For example, if you want to multiply two matrices of sizes 10x20 and 20x30 securely using SMPC, you can write:
And this code produces the following computation graph:
Not only is the original computation represented as a graph: the secure protocol obtained by applying SMPC is also simply a graph (albeit a larger one). For instance, if we "compile" the above graph using the three-party ABY3 SMPC protocol, we get the following:
The new graph has functionality identical to the original user-defined computation (we still multiply two matrices), but this time we do it in a way that the first matrix is provided by "party 0", the second matrix is provided by "party 1", and the product is revealed to "party 2", and that's the only piece of information about the inputs anyone learns along the way.
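For intuition about what such a three-party protocol does under the hood, here is a simplified sketch of matrix multiplication over replicated secret shares, the kind of sharing that ABY3-style protocols build on. It is an illustration under several assumptions: the parties are simulated in a single process, only the semi-honest setting is considered, the helper names are hypothetical, and the zero-sharing is sampled directly rather than derived from pre-shared keys. It is not the graph that CipherCore generates.

```python
import secrets

MOD = 2**64  # illustrative ring: integers modulo 2^64


def rand_matrix(rows, cols):
    return [[secrets.randbelow(MOD) for _ in range(cols)] for _ in range(rows)]


def add(a, b):
    return [[(x + y) % MOD for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    return [[(x - y) % MOD for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def matmul(a, b):
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) % MOD for col in cols_b] for row in a]


def share(m):
    """Split matrix m into three additive shares with s0 + s1 + s2 = m (mod MOD)."""
    rows, cols = len(m), len(m[0])
    s0, s1 = rand_matrix(rows, cols), rand_matrix(rows, cols)
    return [s0, s1, sub(sub(m, s0), s1)]


def secure_matmul(a, b):
    # Party 0 shares A and party 1 shares B; party i ends up holding the
    # pair of shares with indices i and i + 1 (mod 3) of each matrix.
    a_sh, b_sh = share(a), share(b)
    rows, cols = len(a), len(b[0])
    # Random matrices that sum to zero, used to mask each local result.
    r = [rand_matrix(rows, cols) for _ in range(3)]
    zero = [sub(r[i], r[(i + 1) % 3]) for i in range(3)]
    z = []
    for i in range(3):
        j = (i + 1) % 3
        # Party i multiplies only the shares it holds, then masks the result.
        local = add(add(matmul(a_sh[i], b_sh[i]), matmul(a_sh[i], b_sh[j])),
                    matmul(a_sh[j], b_sh[i]))
        z.append(add(local, zero[i]))
    # The three masked results are sent to party 2, which adds them up.
    return add(add(z[0], z[1]), z[2])


a = [[(3 * i + j) % 7 for j in range(20)] for i in range(10)]  # 10x20, party 0
b = [[(i + 2 * j) % 5 for j in range(30)] for i in range(20)]  # 20x30, party 1
product = secure_matmul(a, b)  # 10x30, revealed to party 2
```

The three local products together cover all nine cross terms of (A0 + A1 + A2)(B0 + B1 + B2), which is why their sum equals the true product while each individual message looks random.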
The consequences of the fact that, in CipherCore, "everything is a computation graph" are the following:
- The process of compiling a user-defined computation into a secure protocol can be represented as a multi-step pipeline (we currently have seven compilation steps), where each step has global access to the current graph and can work with it as a whole. This allows for nice optimizations (for instance, we can systematically trade between the size and the "depth" of the graph) and results in overall great efficiency;
- Graphs are easy to visualize (at least, until they become way too big), and this gives an additional level of confidence in the correctness of the implementation;
- During the execution of a graph, we have quite a bit of freedom in memory allocation and the order of computation, which, again, results in improved performance;
- From the user perspective, creating computation graphs is fairly straightforward, similar to what one would do when creating a machine learning model in any popular ML framework (such as TensorFlow or PyTorch);
- Related to the previous item, it becomes particularly easy to translate computation graphs produced by an ML framework (e.g., in the form of ONNX graphs) or a query language (e.g., LINQ) into a form ready for consumption by CipherCore.
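The pipeline idea from the first item can be sketched with a toy graph representation. The code below is an assumption-laden illustration, not CipherCore's data structures: a graph is a list of nodes in topological order whose last node is the output, and the two passes (merging duplicate nodes and removing dead nodes) are hypothetical stand-ins for the actual compilation steps. What it shows is that each pass receives the whole graph and returns a new one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    op: str           # e.g. "input", "add", "mul" (illustrative names)
    args: tuple = ()  # indices of earlier nodes in the graph


def depth(graph):
    """Length of the longest path from an input to the output node."""
    d = []
    for node in graph:
        d.append(0 if not node.args else 1 + max(d[a] for a in node.args))
    return d[-1]


def merge_duplicates(graph):
    """Reuse an earlier node when the same operation on the same arguments recurs."""
    seen, remap, out = {}, {}, []
    last = len(graph) - 1
    for i, node in enumerate(graph):
        new = Node(node.op, tuple(remap[a] for a in node.args))
        # Inputs are never merged, and the output node always stays last.
        if new.op != "input" and new in seen and i != last:
            remap[i] = seen[new]
        else:
            remap[i] = seen[new] = len(out)
            out.append(new)
    return out


def remove_dead_nodes(graph):
    """Drop non-input nodes that the output does not depend on."""
    live = {len(graph) - 1}
    for i in range(len(graph) - 1, -1, -1):
        if i in live:
            live.update(graph[i].args)
    remap, out = {}, []
    for i, node in enumerate(graph):
        if i in live or node.op == "input":
            remap[i] = len(out)
            out.append(Node(node.op, tuple(remap[a] for a in node.args)))
    return out


def compile_graph(graph, passes=(merge_duplicates, remove_dead_nodes)):
    for run_pass in passes:  # every pass sees, and returns, the whole graph
        graph = run_pass(graph)
    return graph


graph = [
    Node("input"), Node("input"),
    Node("mul", (0, 1)),
    Node("mul", (0, 1)),  # duplicate of the previous node
    Node("add", (0, 1)),  # never used by the output
    Node("add", (2, 3)),  # output
]
compiled = compile_graph(graph)
print(len(graph), "->", len(compiled), "nodes, depth", depth(compiled))
```

Because every pass has the same graph-in, graph-out shape, adding a further optimization in this sketch amounts to appending one more function to the tuple of passes.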
To build a sufficiently generic yet efficient secure computation engine, we also need to think about what basic set of types and operations we would like to support.
After some iterations, we arrived at the following instruction set:
- For types, we support scalars (bits and integers), multi-dimensional scalar arrays (akin to NumPy), vectors and tuples (think lists and tuples in Python);
- At the lowest level, we support a carefully chosen list of 29 operations that we tried to make as compatible with NumPy as possible;
- More complicated operations -- comparisons, non-linearities needed for machine learning, sorting, conditional selection, etc. -- are implemented in CipherCore using the notion of a custom operation, which gets mapped into a graph of basic operations during compilation.
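As a rough illustration of how a custom operation can be mapped into basic operations, the sketch below lowers a conditional selection into subtraction, multiplication, and addition using the identity select(c, a, b) = b + c * (a - b) for a bit c. The graph encoding, the operation names, and the `lower` helper are all hypothetical; CipherCore's actual custom operations and instruction names may differ.

```python
# Basic operations the toy "engine" knows how to execute (names are illustrative).
BASIC_OPS = {
    "add": lambda x, y: x + y,
    "subtract": lambda x, y: x - y,
    "multiply": lambda x, y: x * y,
}


def expand_select(emit, c, a, b):
    """select(c, a, b) = b + c * (a - b): yields a if c == 1 and b if c == 0."""
    diff = emit("subtract", a, b)
    scaled = emit("multiply", c, diff)
    return emit("add", b, scaled)


# Custom operations are defined by how they expand into basic ones.
CUSTOM_OPS = {"select": expand_select}


def lower(graph):
    """Rewrite a graph so that it contains only inputs and basic operations."""
    out, remap = [], {}

    def emit(op, *args):
        out.append((op, args))
        return len(out) - 1

    for i, (op, args) in enumerate(graph):
        new_args = tuple(remap[a] for a in args)
        if op in CUSTOM_OPS:
            remap[i] = CUSTOM_OPS[op](emit, *new_args)
        else:
            remap[i] = emit(op, *new_args)
    return out


def evaluate(graph, inputs):
    """Run a lowered graph in the clear; the last node is the output."""
    values, inputs = [], iter(inputs)
    for op, args in graph:
        if op == "input":
            values.append(next(inputs))
        else:
            values.append(BASIC_OPS[op](*(values[a] for a in args)))
    return values[-1]


# Nodes are (operation, argument indices); inputs are c, a, b in that order.
graph = [("input", ()), ("input", ()), ("input", ()), ("select", (0, 1, 2))]
lowered = lower(graph)
print(lowered)
print(evaluate(lowered, [1, 10, 20]), evaluate(lowered, [0, 10, 20]))
```

The point of the design is that the engine only ever needs to execute (and secure) the small basic set; anything expressible as a graph of those operations comes for free.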
The Layer Above CipherCore
CipherCore can be used directly, but it might still be quite low-level for everyday use. So, using CipherCore as a foundation, we at CipherMode are building two higher-level frameworks:
- SecureAI, which enables privacy-preserving training and inference of machine learning models. On a technical level, this is done by compiling ONNX graphs into CipherCore graphs with various optimizations.
- For secure analytics (think join + compute), we are developing a query language that is easy to use and that allows compilation into CipherCore graphs.
We are planning to blog about both of these directions in detail soon (will we be able to fine-tune a transformer on encrypted data?). Stay tuned!
If you are interested in digging deeper and getting some hands-on experience with CipherCore, we provide a few resources: