OpenMined


GSoC'20 is approaching its finish line, so it is time to sum up what has been completed! In this blog post, I will share the project I worked on during Google Summer of Code '20 @ OpenMined.

PyZKP is a Python wrapper for libsnark, an open-source zero-knowledge proof library. libsnark provides a set of zkSNARK schemes, a cryptographic method for proving and verifying, in zero knowledge, the integrity of computations.

Today, a lot of useful technologies are coming out. Often these technologies need your data, which may be private or sensitive.

Privacy-enhancing technology aims to let you have both, so that you can benefit from modern technology without having to give up your data. One excellent example of a privacy-enhancing technology is the zero-knowledge proof: a protocol that lets a prover, let's say me, prove a statement about a secret to you without actually giving up that secret. So I can attest to you that I know a secret, and something about the secret, without revealing the secret itself.

Zero-knowledge proofs have a vital role to play in the future of verified machine learning prediction. However, no deep learning framework can perform the verified computation of neural networks using ZKPs. Since Python is the language of choice for machine learning engineers and data scientists, it became essential to make a zero-knowledge proof library available to them.

So, in this project I port an existing ZKP library, libsnark, into PyTorch operations. libsnark is written in C++, which is generally much faster than Python for heavy number crunching, so instead of reimplementing it we wrap the C++ library and let Python code act as an interface to it. We can then call C++ functions using Python syntax: the actual processing happens in C++ behind the scenes, and the result is returned as a Python object.

Zk-SNARK

[The zk-SNARK Protocol]

Let's talk about PyZKP:

PyZKP is a Python wrapper for libsnark, an open-source zero-knowledge proof library. libsnark provides a set of zkSNARK schemes, a cryptographic method for proving and verifying, in zero knowledge, the integrity of computations.

Zero-Knowledge Succinct Non-interactive Arguments of Knowledge (zk-SNARKs, a type of non-interactive ZKP) are zero-knowledge because they reveal nothing to the verifier beyond the truth of the statement, succinct because the proof is small and can be verified quickly, non-interactive because no repeated interaction is required between prover and verifier, and arguments of knowledge because they are computationally sound proofs that the prover actually knows the witness.

In a zero-knowledge “Proof of Knowledge” the prover can convince the verifier not only that a number with some property exists (for example, a pre-image of a given hash), but that they in fact know such a number, again without revealing any information about the number. The difference between “Proof” and “Argument” is quite technical and we don’t get into it here.
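To make the idea of a proof of knowledge concrete, here is a minimal sketch of a much simpler protocol than a zk-SNARK: a Schnorr proof of knowledge of a discrete logarithm, made non-interactive with the Fiat-Shamir heuristic. It is not part of PyZKP or libsnark. The group parameters are toy values chosen for readability (an assumption, and far too small to be secure), and the function names are made up for this illustration.

```python
import hashlib
import secrets

# Toy group parameters, far too small to be secure: G generates a
# subgroup of prime order Q inside the integers modulo P.
P, Q, G = 23, 11, 2


def challenge(public_key, commitment):
    """Derive the challenge by hashing the transcript (Fiat-Shamir)."""
    data = f"{G}|{public_key}|{commitment}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(secret):
    """Return (public_key, commitment, response) for a known secret."""
    public_key = pow(G, secret, P)
    nonce = secrets.randbelow(Q)           # fresh randomness hides the secret
    commitment = pow(G, nonce, P)
    c = challenge(public_key, commitment)
    response = (nonce + c * secret) % Q
    return public_key, commitment, response


def verify(public_key, commitment, response):
    """Check G^response == commitment * public_key^c (mod P)."""
    c = challenge(public_key, commitment)
    return pow(G, response, P) == (commitment * pow(public_key, c, P)) % P


secret = 7                                  # known only to the prover
public_key, commitment, response = prove(secret)
print(verify(public_key, commitment, response))             # honest proof
print(verify(public_key, commitment, (response + 1) % Q))   # tampered proof
```

The verifier only ever sees the public key, the commitment, and the response, never the secret itself. The honest proof is accepted and the tampered one is rejected.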

libsnark is a library that provides a programming framework for zk-SNARKs. It also includes two gadget libraries, gadgetlib1 and gadgetlib2, which already implement building blocks for several common NP statements, e.g. knowing the pre-image of a SHA-256 output. gadgetlib1 contains more useful functions, while gadgetlib2 is essentially a fancier refactoring of the first one.

PyZKP includes the following general-purpose proof systems:
1. A preprocessing zkSNARK for the NP-complete language "R1CS" (Rank-1 Constraint Systems), which is a language that is similar to arithmetic circuit satisfiability.
2. A preprocessing SNARK for a language of arithmetic circuits, "BACS" (Bilinear Arithmetic Circuit Satisfiability). This simplifies the writing of NP statements when the additional flexibility of R1CS is not needed. Internally, it reduces to R1CS.
3. A preprocessing SNARK for the language "USCS" (Unitary-Square Constraint Systems).
4. A preprocessing SNARK for a language of Boolean circuits, "TBCS" (Two-input Boolean Circuit Satisfiability). Internally, it reduces to USCS. This is much more efficient than going through R1CS.
5. A simulation-extractable preprocessing SNARK for R1CS. For arithmetic circuits.
6. ADSNARK, a preprocessing SNARK for proving statements on authenticated data.
7. Proof-Carrying Data (PCD). This uses the recursive composition of SNARKs.

The preprocessing zkSNARK (ppzkSNARK) supports proving/verifying membership in a specific NP-complete language: R1CS (rank-1 constraint systems). An instance of the language is specified by a set of equations over a prime field F, where each equation has the form <A, (1,X)> * <B, (1,X)> = <C, (1,X)>. Here A, B, C are vectors over F, X is a vector of variables, and <.,.> denotes the inner product.
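As a small illustration of the equation above, the following sketch checks whether an assignment of X satisfies a set of R1CS constraints. It is plain Python, not PyZKP code. The modulus 97 and the single example constraint are assumptions picked to keep the numbers readable.

```python
# Prime modulus of the toy field F (an assumption; real systems use
# primes that are hundreds of bits long).
P = 97


def inner(coeffs, z):
    """Inner product <coeffs, z> over F."""
    return sum(c * v for c, v in zip(coeffs, z)) % P


def is_satisfied(constraints, x):
    """Check every <A,(1,X)> * <B,(1,X)> = <C,(1,X)> for assignment x."""
    z = [1] + list(x)                       # the vector (1, X)
    return all(
        (inner(a, z) * inner(b, z)) % P == inner(c, z)
        for a, b, c in constraints
    )


# Variables X = (x1, x2, x3); one constraint encoding (x1 + 2) * x2 = x3.
constraints = [
    ([2, 1, 0, 0],    # A: 2*1 + x1
     [0, 0, 1, 0],    # B: x2
     [0, 0, 0, 1]),   # C: x3
]

print(is_satisfied(constraints, [3, 4, 20]))   # (3 + 2) * 4 = 20
print(is_satisfied(constraints, [3, 4, 21]))   # wrong product
```

The leading 1 in (1, X) is what lets a constraint include constant terms, such as the 2 in the vector A here.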

There are three functions which describe the basic workflow of most zk-SNARK protocols:
1. Given an R1CS (rank-1 constraint system), referred to as the example, the function r1cs_gg_ppzksnark_generator generates a keypair: a proving key for the prover and a verification key for the verifier.

2. The prover takes the proving key and, together with the inputs of the example R1CS, builds a proof with r1cs_gg_ppzksnark_prover. The inputs include both public values (primary_input, also known to the verifier) and private “witness” values (auxiliary_input, not revealed to the verifier).

3. Using the public inputs and the verification key, the verifier checks the proof with r1cs_gg_ppzksnark_verifier_strong_IC, which should return true if the prover indeed generated the proof from a satisfying witness.
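To show what the split between primary_input and auxiliary_input looks like in practice, here is a sketch for the statement "I know x such that x**3 + x + 5 == out". It does no cryptography and does not call PyZKP or libsnark. The helper names make_inputs and r1cs_satisfied, the variable ordering, and the modulus are all assumptions made for this illustration. In a real zk-SNARK this check runs only on the prover's side, and the verifier sees just primary_input and the proof.

```python
P = 2**31 - 1   # toy prime modulus (assumption)


def dot(row, z):
    """Inner product of a coefficient row with the assignment z, mod P."""
    return sum(c * v for c, v in zip(row, z)) % P


def r1cs_satisfied(constraints, primary_input, auxiliary_input):
    """Plain (non-zero-knowledge) check of an R1CS assignment."""
    z = [1] + list(primary_input) + list(auxiliary_input)
    return all(dot(a, z) * dot(b, z) % P == dot(c, z) for a, b, c in constraints)


# Variable order in z: (1, out, x, x2, x3, t)
constraints = [
    ([0, 0, 1, 0, 0, 0], [0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0]),  # x * x = x2
    ([0, 0, 0, 1, 0, 0], [0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 0]),  # x2 * x = x3
    ([0, 0, 1, 0, 1, 0], [1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1]),  # (x + x3) * 1 = t
    ([5, 0, 0, 0, 0, 1], [1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0]),  # (5 + t) * 1 = out
]


def make_inputs(x):
    """Prover-side witness generation: compute every intermediate value."""
    x2 = x * x % P
    x3 = x2 * x % P
    t = (x3 + x) % P
    out = (t + 5) % P
    primary_input = [out]              # public: shared with the verifier
    auxiliary_input = [x, x2, x3, t]   # private witness: stays with the prover
    return primary_input, auxiliary_input


primary_input, auxiliary_input = make_inputs(3)
print(primary_input)
print(r1cs_satisfied(constraints, primary_input, auxiliary_input))
```

With x = 3 the public output is 35. The secret x and the intermediate values live only in auxiliary_input.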

PyZKP follows the directory structure of libsnark [Link]

Code Contribution during GSoC

Commits/ Pull requests | Issues Resolved

Challenges/Learnings

Choice of binding framework: I first tried SWIG, but ran into problems with C++ templates. Since libsnark is implemented using templates, I used pybind11 instead. pybind11 uses C++ templates with compile-time introspection to generate wrapper code, whereas SWIG uses a custom C++ parser. pybind11 is similar in syntax and use to Boost.Python, but more modern and actively maintained.

Choice of build system: I had two options, Bazel and CMake. Bazel speeds up builds and tests by rebuilding only what is necessary. However, after some experimentation it was pretty clear that, because libsnark itself uses the CMake build system, CMake would serve better than Bazel.

Future Work

There are still a lot of features left to add to PyZKP. I will do my best to keep working on them and will enhance the functionality of the PyZKP modules in upcoming versions. Post-GSoC, I will also work on creating examples and use cases for this library, and on writing blog posts so that users understand it and PySyft users can use it to generate, evaluate, and verify tensor computations. I will also provide full-time support on Slack to users interested in this.

To conclude, most of the tasks were accomplished in the required volume and on time, and the original goals were met, so I consider GSoC completed, but I will continue contributing to the project. I hope the community will appreciate my work and contributions :)

Acknowledgements

I would really like to thank my mentor José Benardi de Souza Nunes for helping me throughout the project; PyDP Team Lead Chinmay Shah for his valuable input during initial issues as well as throughout the project; the PyDP Team, the Mentorship Team, the OpenMined Community, and OpenMined Leader Andrew Trask; and all the GSoC'20 students I connected with on the WhatsApp/Telegram groups and the #gsoc IRC channel.