GSoC'20 Project Summary
GSoC'20 is approaching its finish line. Time to sum up what has been completed! In this blog post, I will share the project I worked on during Google Summer of Code '20 @ OpenMined.
PyZKP is a Python wrapper for the open-source zero-knowledge proof library libsnark. The library provides a set of zkSNARK schemes, a cryptographic method for proving and verifying, in zero knowledge, the integrity of computations.
Today, a lot of useful technologies are coming out. Often these technologies need your data, which may be private or sensitive.
Privacy-enhancing technology aims to let you have both, so that you can benefit from modern technology without having to give up your data. One excellent example of a privacy-enhancing technology is the zero-knowledge proof. These are protocols that let a prover, let's say me, prove to you a statement about a secret without actually giving up that secret. So I can attest to you that I know a secret, and something about the secret, without revealing that secret.
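To make the idea concrete, here is a minimal sketch of a classic interactive zero-knowledge protocol, Schnorr identification, in which a prover shows knowledge of a secret exponent x behind a public value y = G^x mod P. This is not a zkSNARK and it is not part of PyZKP or libsnark. The function names and the tiny group parameters are assumptions chosen purely for illustration and are far too small to be secure.

```python
import secrets

# Toy group parameters (assumed for illustration, NOT secure):
# P = 2*Q + 1 with P and Q prime, and G generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4

def keygen():
    """Return (secret x, public y = G^x mod P)."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)

def commit():
    """Prover, step 1: pick a random nonce r and send t = G^r mod P."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)

def challenge():
    """Verifier: send a random non-zero challenge c."""
    return secrets.randbelow(Q - 1) + 1

def respond(x, r, c):
    """Prover, step 2: send s = r + c*x mod Q; the random r masks x."""
    return (r + c * x) % Q

def verify(y, t, c, s):
    """Verifier accepts iff G^s == t * y^c (mod P)."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P

x, y = keygen()          # only the prover ever sees x
r, t = commit()
c = challenge()
s = respond(x, r, c)
print(verify(y, t, c, s))  # an honest prover is accepted
```

The verifier only ever sees y, t, c and s. Because r is fresh and uniformly random, s on its own does not reveal x, yet the check G^s == t * y^c can only be passed reliably by someone who knows x.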
Zero-knowledge proofs have a vital role to play in the future of verified machine learning prediction. However, no deep learning framework can currently perform verified computation of neural networks using ZKPs. Since Python is the language of choice of machine learning engineers and data scientists, it became essential to make a zero-knowledge proof library available to them.
So, in this project I port an existing ZKP library, libsnark, into PyTorch operations. C++ is generally much faster than Python and handles heavy number crunching efficiently, so we wrapped the C++ library to get that performance from Python code, which acts as an interface to the C++ code. We can then call C++ functions using Python syntax: the actual processing happens in C++ behind the scenes, and the result is returned as a Python object.
Let's talk about PyZKP:
PyZKP is a Python wrapper for the open-source zero-knowledge proof library libsnark. The library provides a set of zkSNARK schemes, a cryptographic method for proving and verifying, in zero knowledge, the integrity of computations.
Zero-Knowledge Succinct Non-interactive Arguments of Knowledge (zk-SNARKs, a type of non-interactive ZKP) are zero-knowledge because they reveal nothing to the verifier beyond the truth of the statement, succinct because the proof is short and can be verified quickly, non-interactive because no repeated interaction is required between prover and verifier, and arguments of knowledge because the proof is sound: a prover who does not know a valid witness cannot produce a convincing proof, except with negligible probability.
In a zero-knowledge “Proof of Knowledge” the prover can convince the verifier not only that a number satisfying the statement exists, but that they in fact know such a number, again without revealing any information about the number. The difference between “Proof” and “Argument” is quite technical and we don’t get into it here.
Libsnark is a library that provides a programming framework for zk-SNARKs (Zero-Knowledge Succinct Non-interactive Arguments of Knowledge). It also includes two libraries, gadgetlib1 and gadgetlib2, that already implement gadgets for several common problems, e.g. finding the pre-image of a SHA-256 output. Gadgetlib1 contains more useful functions, while gadgetlib2 is simply a fancier refactoring of the first one.
PyZKP includes the following general-purpose proof systems:
1. A preprocessing zkSNARK for the NP-complete language "R1CS" (Rank-1 Constraint Systems), which is a language that is similar to arithmetic circuit satisfiability.
2. A preprocessing SNARK for a language of arithmetic circuits, "BACS" (Bilinear Arithmetic Circuit Satisfiability). This simplifies the writing of NP statements when the additional flexibility of R1CS is not needed. Internally, it reduces to R1CS.
3. A preprocessing SNARK for the language "USCS" (Unitary-Square Constraint Systems).
4. A preprocessing SNARK for a language of Boolean circuits, "TBCS" (Two-input Boolean Circuit Satisfiability). Internally, it reduces to USCS. This is much more efficient than going through R1CS.
5. A simulation-extractable preprocessing SNARK for R1CS.
6. ADSNARK, preprocessing SNARKs for proving statements on authenticated data.
7. Proof-Carrying Data (PCD). This uses the recursive composition of SNARKs.
The ppzkSNARK supports proving/verifying membership in a specific NP-complete language: R1CS (rank-1 constraint systems). An instance of the language is specified by a set of equations over a prime field F, and each equation has the form <A, (1,X)> * <B, (1,X)> = <C, (1,X)>, where A, B, C are vectors over F, X is a vector of variables, and <·,·> denotes the inner product.
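As a rough illustration of this equation form, the sketch below checks whether an assignment X satisfies a small R1CS in plain Python. It does not use PyZKP or libsnark. The modulus, the helper names and the example statement x^3 + x + 5 = out are assumptions made for the example.

```python
# Hypothetical small prime modulus standing in for the field F.
PRIME = 2**31 - 1

def dot(u, v):
    """Inner product <u, v> over the prime field."""
    return sum(a * b for a, b in zip(u, v)) % PRIME

def is_satisfied(constraints, X):
    """Check <A,(1,X)> * <B,(1,X)> == <C,(1,X)> for every (A, B, C)."""
    z = [1] + list(X)
    return all(dot(A, z) * dot(B, z) % PRIME == dot(C, z)
               for A, B, C in constraints)

# Statement: x^3 + x + 5 = out, flattened into four rank-1 constraints.
# Column order of (1, X): (1, x, out, sym1, y, sym2)
CONSTRAINTS = [
    # x * x = sym1
    ([0, 1, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0]),
    # sym1 * x = y
    ([0, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0]),
    # (x + y) * 1 = sym2
    ([0, 1, 0, 0, 1, 0], [1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1]),
    # (5 + sym2) * 1 = out
    ([5, 0, 0, 0, 0, 1], [1, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0]),
]

# X = (x, out, sym1, y, sym2) for x = 3: 3^3 + 3 + 5 = 35
assignment = [3, 35, 9, 27, 30]
print(is_satisfied(CONSTRAINTS, assignment))
```

In a zkSNARK setting, out would typically be the public part of X and x together with the intermediate values the private witness. The proof convinces the verifier that such a satisfying X exists and is known to the prover, without the verifier running this check on the witness itself.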
There are three functions which describe the basic workflow of most zk-SNARK protocols:
1. Given an R1CS (rank-1 constraint system), referred to here as the example, the function r1cs_gg_ppzksnark_generator generates a key pair: a proving key for the prover and a verification key for the verifier.
2. The prover takes the proving key and, together with the inputs of the example R1CS, builds a proof with r1cs_gg_ppzksnark_prover. The inputs include both public values (primary_input, also known to the verifier) and private “witness” values (auxiliary_input, not revealed to the verifier).
3. Using the public inputs and the verification key, the verifier checks the proof with r1cs_gg_ppzksnark_verifier_strong_IC, which should return true if the prover indeed supplied a satisfying witness.
PyZKP follows the directory structure of libsnark [Link]
Code Contribution during GSoC
Commits / Pull requests | Issues Resolved
Challenges/Learnings
Choice of binding framework: I first tried SWIG, but ran into problems with C++ templates. Since libsnark is implemented using templates, I used pybind11 instead. pybind11 uses C++ templates with compile-time introspection to generate wrapper code, whereas SWIG uses a custom C++ parser. pybind11 is similar in syntax and use to Boost.Python, but more modern and actively maintained.
Choice of build system: I had two options, Bazel and CMake. Bazel speeds up builds and tests by rebuilding only what is necessary. However, after some experimentation it was pretty clear that, because libsnark itself uses the CMake build system, CMake would serve the project better than Bazel.
Future Work
There are still a lot of features left to add to PyZKP, and I will do my best to keep working on it. I will enhance the functionality of the PyZKP modules in upcoming versions. Post-GSoC, I will also work on creating examples and use cases for the library and write blog posts to help users understand it, so that users of PySyft can use it to generate, evaluate, and verify tensor computations. I will also support users interested in the library on Slack.
To conclude, most of the tasks were accomplished in the required scope and on time, and the original goals were met, so I consider GSoC completed, but I will continue contributing to the project. I hope the community will appreciate my work and contributions :)
Acknowledgements
I would really like to thank my mentor José Benardi de Souza Nunes for helping me throughout the project, PyDP Team Lead Chinmay Shah for his valuable input during the initial issues as well as throughout the project, the PyDP Team, the Mentorship Team, the OpenMined Community, OpenMined Leader Andrew Trask, and all the GSoC'20 students I connected with in the WhatsApp/Telegram groups and on the #gsoc IRC channel.