Detailed technical explanation of zero-knowledge proof implementation method: take zk-SNARK as an example

Detailed technical explanation of zero-knowledge proof implementation method: take zk-SNARK as an example

Loading

How does zk-SNARK realize zero-knowledge proof? Detailed interpretation from a technical point of view.

Original Title: “A Thorough Understanding of Zero-Knowledge Proof and Its Implementation Method: Analysis of zk-SNARK”
Written by: Li Hua Thanks: Dongze, even
Reference: Maksym Petkus, Why and How zk-SNARK Works: Definitive Explanation

We have seen zk-SNARK more and more frequently, and we will encounter it more and more.

Everyone knows that it is a way to achieve zero-knowledge proof, and also knows that it is a weapon for expansion and privacy, but what is zero-knowledge proof, and how does zk-SNARK realize zero-knowledge proof? This is the question this article is trying to answer.

Reading this article requires maintaining a high degree of concentration and proper thinking; but this article tries to be free of ambiguity and obscurity, so if you stay focused, you can understand how the zero-knowledge proof is done through zk-SNARK after reading it. Instead of staying at the stage where you only know the concept.

So now, let’s start.

Detailed technical explanation of zero-knowledge proof implementation method: Take zk-SNARK as an example

The first stage: prepare for the use of zk-SNARK

Convert the need for zero-knowledge proof into a mathematical operation circuit

The insurance institution needs to know whether a user’s physical examination data meets the standard. The user only wants the insurance institution to know whether the data meets the standard, but does not disclose the specific data to them. How to do it

Assuming that the private data of this physical examination is x, the value of x between 0 and 7 (including 0 and 7) is considered to be up to standard, otherwise it is not up to standard. The first thing to do is to convert this requirement into a calculable problem. That is, there is a program whose input is x. When x is between 0 and 7, it outputs “true”, and when x is not in the range, it outputs ” false”.

The calculation process of the program can be expressed as the verification process of the following set of equations. After inputting x, if all 4 equations are satisfied, that is, the calculation results on the left side of the equation are all 0, it can be proved that x is between 0 and 7, then output “true”, otherwise output “false”. The 4 equations are:

 b0 * (b0-1) = 0 

b1 * (b1-1) = 0

b2 * (b2-1) = 0

2^0 * b0 + 2^1 * b1+2^2 * b2 – x = 0

Note: The values ​​of b0, b1, and b2 are determined by x, which are also inputs; the first 3 equations constrain the values ​​of b0, b1, and b2 to be 0 or 1; the last equation verifies that the value of x is between 0-7.

Such a set of equations can be expressed as a mathematical operation circuit. Take b0 * (b0-1) as an example. The circuit of this part is shown in the figure below.

Detailed technical explanation of zero-knowledge proof implementation method: Take zk-SNARK as an example

The calculation process of the entire program can be abstractly expressed as the following figure:

Detailed technical explanation of zero-knowledge proof implementation method: Take zk-SNARK as an example

In this way, the problem of zero-knowledge proof is transformed into “input + mathematical operation circuit used for proof + output “; if the output is true, the input can be believed to be true. In the example of this article, if the output is 0, you can believe that the user’s physical examination data meets the standard.

Convert mathematical operation circuits to polynomials

After transforming the requirement of zero-knowledge proof into a mathematical operation circuit, we can separate the two roles of verifier and prover. When the verifier is not the prover, it creates an opportunity to hide the input (original data).

Then there is only one question: Why does the verifier believe that this “output” is the “input” calculated by the “mathematics circuit used for proof”? The easiest way is for the verifier to look at the prover’s input and calculations, but then the input cannot be hidden.

Therefore, we have to perform conversion on the basis of the mathematical operation circuit, and bind the input, the mathematical operation circuit used for proof, and the output together. That is to say, we can verify whether the “output” is ” “Input” is calculated by “Mathematical Operation Circuit for Proof”.

In this way, under the premise that the output is true, as long as the verification passes, the prover can believe that the input is true.

In order to complete this binding, the mathematical operation circuit needs to be converted into a polynomial.

(The following is the mathematical conversion process, if you are not interested, you can skip to the paragraph below the next picture)

First, the mathematical operation circuit must be converted into a first-order constraint system (RICS).

Vitalik introduced this conversion process in detail in the article “Quadratic Arithmetic Programs: from Zero to Hero”. This article omits the detailed description and will directly quote the example used by Vitalik.

This is a mathematical process. We only need to know that this conversion is equivalent. If you want to know the details, you can read Vitalik’s article.

 Vitalik’s equations are:




x * x-sym_1 = 0  


sym_1 * x – y = 0

(y + x) * 1-sym_2 = 0

(sym_2 + 5) * 1-~out = 0

Taking the first equation x * x-sym_1 = 0 as an example, the form after it is converted to R1CS is as follows:

 a = [0, 1, 0, 0, 0, 0]  


b = [0, 1, 0, 0, 0, 0]

c = [0, 0, 0, 1, 0, 0]

s = [~one,x,~out,sym_1,y,sym_2]

R1CS is a sequence consisting of the above three vectors (a, b, c). The solution vector is s and satisfies s. a * sb-sc = 0, which is also called a constraint. You can try to calculate, you will find that sa * sb-sc = x * x-sym_1, they are equivalent.

Vitalik’s example contains 4 flattened equations, and each equation corresponds to a RICS constraint, so there are a total of 4 constraints.

The next step is to convert R1CS to a polynomial. There is still no need to understand how to convert, just know that the conversion is equivalent. The conversion process is as follows:

There are a total of 4 RICS constraints, each of which contains a vector a, so there are a total of 4 vectors a. They are:

 a = [0, 1, 0, 0, 0, 0]  


a = [0, 0, 0, 1, 0, 0]

a = [0, 1, 0, 0, 1, 0]

a = [5, 0, 0, 0, 0, 1]

Through Lagrangian interpolation, these 4 vectors can be converted into 6 polynomials (because each vector contains 6 values), they are:

 –5+ 9.166* x- 5* x^2+0.833* x^3  


8- 11.333 * x + 5 * x^2-0.666 * x^3

0 + 0 * x + 0 * x^2 + 0 * x^3

–6 + 9.5 * x-4 * x^2 + 0.5 * x^3

4-7 * x + 3.5 * x^2-0.5 * x^3

–1+ 1.833 * x- 1 * x^2+ 0.166 * x^3

You can try to substitute x = 1 into these 6 polynomial calculations, and you will get the 6 values ​​[0, 1, 0, 0, 0, 0], and you will find that they are exactly the vector a under the first constraint; x is equal to 2, 3, and 4 respectively. Bring in these 6 polynomials, and you will get the vector a under the other 3 constraints.

The vector a corresponds to 6 polynomials, the same is true for the vector b and the vector c, so there will be 18 polynomials in total. When x=1, the 18 solutions of these 18 polynomials are the first constraint; when x is equal to 2, 3, and 4, the second, third, and fourth constraints are obtained respectively. The polynomial is equivalent to RICS.

At this point, the conversion from mathematical operation circuit to polynomial is completed:

  • Through RICS conversion, the proof of multiple original equations is transformed into multiple sa * sb-sc = 0;

  • Through polynomial conversion, the proof of multiple sa * sb-sc = 0 is transformed into a proof that when x=1, x=2, x=3, x=4, A(x) * B(x) – C(x) = 0, where A(x) = sa, B(x) = sb, C(x) = sc. As shown below.

The number in the leftmost row of the figure [1, 3, 35, 9, 27, 30] is the solution vector s in Vitalik’s example. A1(x) is the value of the first polynomial of a at x, for example, A1(1)=0; A1(x)~A6(x) form the vector a of the xth constraint, for example, when x=1, a=[0, 1, 0, 0, 0, 0]; others are the same.

Detailed technical explanation of zero-knowledge proof implementation method: Take zk-SNARK as an example

(If you skip the conversion, you can continue reading from here)

We seem to be taking a cumbersome step to turn the originally simple proof into a complex proof. Why? If you observe carefully, you will find that a magical thing has happened:

Originally what the verifier needs to verify is: the input passes through the mathematical operation circuit used as the proof, and the output is true; but now the verifier needs to verify that when x=1, x=2, x=3, x=4, A (x) * B(x)–C(x) = 0.

A(x) * B(x)–C(x) contains the understanding vector s, which binds the input, output, and circuit together; verify that A(x) * B(x)–C(x) = 0, is to verify whether the input and output meet the mathematical operation circuit used as a proof.

This solves the question raised at the beginning of this section: Why does the verifier believe that the “output” is the “input” calculated by the “mathematical circuit used for proof”.

In this way, the verifier does not need to pay attention to the specific input and calculation process, but only needs to verify whether A(x) * B(x)-C(x) = 0 itself holds. This gives us a real opportunity to hide input.

You can breathe a sigh of relief here. Although we haven’t started entering zk-SNARK yet, we have already completed the hardest part. It doesn’t matter if the conversion part is not understood clearly, you just need to know:

To achieve zero-knowledge proof through zk-SNARK, in fact, with the help of zk-SNARK, to verify that when x = 1, 2, 3,…, n, A(x) * B(x)–C(x ) = 0, where n is the number of RICS constraints; and different zero-knowledge proof problems differ only in their A(x), B(x), and C(x).

Convert multiple verifications into one verification

For a zero-knowledge proof problem containing n constraints, it is necessary to verify whether A(x) * B(x)–C(x) is zero or not. Can we turn n verifications into one verification? can.

  • Define t(x) = (x-1) * (x-2) * …… * (xn), t(x) must be equal to zero at x=1, x=2,……, x=n;

  • Then if there is a polynomial h(x) such that A(x) * B(x)–C(x) = t(x) * h(x) holds, it means A(x) * B(x)– The polynomial C(x) can be divisible by t(x), so this polynomial must be equal to zero at x=1, x=2,…, x=n.

In other words, verifying A(x) * B(x)–C(x) = t(x) * h(x) can complete the verification of all RICS constraints at one time; and once this verification is passed, the input can be trusted Is true.

Now we can finally hand over the baton to zk-SNARK, and its job is to help verify A(x) * B(x)–C(x) = t(x) * h(x).

What is the verification process? It’s simple. The verifier chooses a random number x to initiate the challenge, and the prover proves that at this x, A(x) * B(x)–C(x) = t(x) * h(x).

Unconsciously, a key and interesting thing happened:

The prover originally needs to prove that when x=1, x=2,…, x=n, A(x) * B(x)–C(x) = 0, this is a kind of inference proof, just like we Solving math problems requires step-by-step derivation and calculation. In this proof process, it is difficult to hide the knowledge of solving the problem.

But now, the reasoning proof has become an interactive proof: the verifier challenges at a random point, and the prover responds to the challenge with a solution at this point; the prover needs “knowledge” to calculate the solution at the random point, but this The solution itself does not reveal “knowledge”.

And this is the premise for the establishment of zero-knowledge proof, or in other words, if there is no proof method of interactive proof, the field of zero-knowledge proof would not exist.

Only one question remains: Why can we determine whether two polynomials A(x) * B(x)–C(x) and t(x) * h(x) are equal by verifying a point on the polynomial?

This is determined by the characteristics of the polynomial. “The calculation result of a polynomial at any point can be regarded as the expression of its unique identity.” (Quoted from Maksym)

It is more intuitive to illustrate with graphics. The figure below shows two polynomials, the blue curve is f(x) = x^3– 6 * x^2+ 11 * x– 6, the green curve is f(x) = x^3– 6 * x^2+ 10 * x-5, it is not difficult to find that the two polynomials are only slightly different, but their curves are very different.

Detailed technical explanation of zero-knowledge proof implementation method: Take zk-SNARK as an example

Just as there are no two identical leaves in the world, and there are no two different curves that overlap in a certain area. They will only intersect at a limited point. For two polynomials of degree d (the largest exponent of x), the number of points they intersect will not exceed d.

In the application of this article, there is a polynomial A(x) * B(x)–C(x), a polynomial t(x) * h(x), if they are not equal, they will only be at most d points Intersect, that is, the solutions at d points are the same.

This means that only when the random challenge point x selected by the verifier is exactly one of these d points, misjudgment may occur, that is, when A(x) * B(x)–C(x) and t( x) * h(x) If they are not the same, mistakenly think they are the same.

The probability of misjudgment exists, but we don’t need to worry about it. Comparing d points with the value range of x, the probability of this event is extremely low; combined with the subsequent cryptographic methods used in the zk-SNARK protocol, this event It can be considered almost impossible.

So now, after all the preparatory work is completed, this article enters the next stage. The abstract summary of the first stage is: the problem of zero-knowledge proof is to verify that the calculation satisfies multiple constraints, and polynomials can verify multiple constraints at one point.

The second stage: use zk-SNARK to complete the zero-knowledge proof

Thanks to the interactive proof and the characteristics of polynomials, we can verify whether the solution of the polynomial A(x) * B(x)–C(x) at the challenge point x is equal to the solution of t(x) * h(x) To verify whether A(x) * B(x)–C(x) = t(x) * h(x) is true, this method can hide A(x) * B(x)–C(x) this The coefficient of the polynomial.

The coefficient of the polynomial is the “knowledge” in the zero-knowledge proof. The most obvious, this coefficient contains the secret input (solution vector s) that you want to hide.

Using zk-SNARK to complete the zero-knowledge proof is to complete the work of verifier giving random points to challenge and the prover to give solutions on random points to complete the proof work. This is actually the one between the verifier and the prover that do not trust each other. In an offensive and defensive battle, their weapon is cryptography_.

Verifier encryption challenge points

The analysis needs to be verified A(x) * B(x)–C(x) = t(x) * h(x):

  • t(x) is a polynomial known to both verifier and prover, t(x) = (x-1) * (x-2) * …… * (xn);

  • A(x) * B(x)-C(x) is only known to the prover, and its coefficient contains knowledge;

  • h(x) is only known by the prover. It is calculated by dividing A(x) * B(x) – C(x) by t(x). h(x) cannot be known by the verifier, because it can be calculated by t(x) ) And h(x) to calculate A(x) * B(x) – C(x).

For the sake of clarity, first write A(x) * B(x)–C(x) as p(x), and simplify the problem to be proved to prove p(x) = t(x) * h(x).

The proof process consists of the following 3 steps:

  • verifier chooses a random challenge point x, assuming x = r;

  • After the prover receives r, it calculates p(r) and h(r), and gives these two values ​​to the verifier;

  • verifier calculates t(r), and then calculates t(r) * h(r), and judges whether t(r) * h(r) = p(r) is established, if so, the verification is passed.

This is actually the basic workflow of zk-SNARK, but there is a problem with doing so directly:

The prover knows t(x). If r is given to him, he can calculate t(r), then he can construct a pair of false p(x) and h(x) so that p(r) = t(r) * h(r), the result is that although he does not know the real p(x), he can fool the verifier.

The solution to this problem is to encrypt r, so that the prover can calculate p(r) and h(r) in a certain form, but cannot construct a false p(r) that satisfies the verification requirements through t(r) in this form. x) and h(x).

Observing the proving process of the prover, he needs to calculate the two polynomials p(x) and h(x), but no matter which one, the form is: c0+ c1*x^1+ ……+ cn*x^n; prover I know all the coefficients of c0, c1, etc.

If the verifier gives x to the prover, assuming x = s, the prover can complete all calculations; but if the verifier gives the encrypted s to the prover, the prover cannot complete the calculation because he cannot calculate s^2 with the encrypted s ,…, s^n.

So when the verifier determines the random point s, it needs to give all the encrypted s, s^2,…, s^n to the prover, so that the prover can complete the calculation.

In practical applications, verifier is to give E(s), E(s^2),…, E(s^n) to the prover, where E(s) is the encrypted value of s, E(s) = g ^s (mod n), this is a homomorphic encryption method.

A property of homomorphic encryption is that the solution calculated after encryption is the same as the solution and encrypted after calculation. In other words:

  • c0+ c1*E(s)+ ……+ cn E(s^n) = g^(c0+ c1 s^1+ ……+ cn*s^n);

  • That is, p(E(s)) = E(p(s)) = g^p(s), h(E(s)) = E(h(s)) = g^h(s);

  • Then, the prover can obtain g^p(s) and g^h(s) by calculating p(E(s)) and h(E(s)).

Here you only need to know that this kind of calculation is indeed feasible. If you want to understand the working principle of the cryptography part, you can read my previous article, ” An article to understand the simple logic behind the zero-knowledge proof “.

So now, the certification process has become the following 3 steps:

  • verifier chooses a random challenge point x, assuming x = s, and then gives the verifier a set of encrypted values ​​of s;

  • After the prover receives it, it calculates g^p(s) and g^h(s), and gives these two values ​​to the verifier;

  • verifier calculates t(s), then calculates (g^h(s))^t(s), and judges whether (g^h(s))^t(s) = g^p(s) is true, if so , The verification is passed, because it means h(s) * t(s) = p(s).

By encrypting the random point s, the prover can be prevented from cheating, but there is a loophole in the prover, that is, he does not use the challenge point s given by the verifier to calculate h(s) and t(s), but uses other values ​​to calculate and Cheating verifier.

So the verifier also needs a method that can prove that the value given by the prover is indeed calculated with s.

This article will not expand this part specifically. Simply put, in addition to choosing a random number s, the verifier also chooses a random number α. The solution given by the prover needs to maintain the relationship between s and α. If this relationship is destroyed, It proves that he did not use s in the calculation.

This method is called exponential knowledge hypothesis. If you want to understand the specific content, you can read Maksym’s article.

After verifier completed his defensive battle, it was the prover’s turn.

prover encrypted calculation result

Although the prover only gives the verifier the solution of the polynomial at the challenge point x, if the value range of the polynomial coefficient is small, it is possible for the verifier to calculate the polynomial coefficient from the solution through brute force, that is, knowledge. Therefore, the prover needs to decrypt the encryption.

The certification process becomes the following 3 steps:

  • verifier selects a random challenge point x, assuming x = s, and a random number α, and then sends a set of encrypted values ​​of s and a set of encrypted values ​​of α * s to the prover;

  • After the prover receives it, calculate g^p(s), g^h(s), g^(α*p(s)); choose a random number δ to decrypt the encryption and become (g^p(s))^ δ, (g^h(s))^δ, (g^(α*p(s)))^δ; Then give these 3 encrypted values ​​to the verifier;

  • verifier calculates t(s), then calculates ((g^h(s))^δ)^t(s), and judges ((g^h(s))^δ)^t(s) = (g^ Whether p(s))^δ holds, if it holds, it means h(s) * t(s) = p(s);

  • At the same time, the verifier also needs to verify whether ((g^p(s))^δ)^α = (g^(α*p(s)))^δ holds, if so, prove the solution given by the prover It is calculated with s.

In this way, the prover has completed his own offensive and defensive battle.

From interactive to non-interactive

The N of zk-SNARK refers to Non-Interactive, that is, non-interactive, but it is not difficult to find that in the above proof process, the verifier and the prover are required to interact, and we all know that the interactive proof itself is a zero-knowledge proof. premise. What does non-interaction mean?

Quite simply, the so-called non-interaction is just that the verifier selects the random number to be completed by the “trusted setting”, that is, a trusted third party selects a random challenge point x before the certification starts.

This change has no effect on the prover, because what he needs is always a set of encrypted values ​​related to x, regardless of whether these encrypted values ​​come from the verifier or a trusted third party. But this change has an impact on the verifier, because he knew x and could use x to calculate t(x), but now he doesn’t know it.

So from interactive to non-interactive, the most important change is to give t(x) to verifier in the trusted setting so that it can complete the verification work.

t(x) needs to be encrypted, because the prover can use the value of t(x) to cheat; but the encrypted t(x) must be used for calculation, because the verifier needs to calculate h(x) * t(x) . This is the difficulty of this part: h(x) and t(x) are both encrypted values, but the homomorphic encryption method used before does not support the multiplication of two encrypted values.

zk-SNARK uses pairing operations to solve this problem. The formula is e(g^a, g^b)= a^g * b^g=(a * b)^g, where g^a and g^b Is the encrypted value. The pairing operation can map two encrypted values ​​to their product.

The trusted setting uses this method to give t(x) to the verifier. Therefore, the proof process becomes the following 3 steps:

  • Trusted setting: A trusted third party chooses a random challenge point x, assuming x = s, and a random number α, and then gives a set of encrypted values ​​of s and a set of encrypted values ​​of α * s to the prover; then put t(s The encrypted value g^t(s) of) and the encrypted value g^α of α are given to verifier;

  • After the prover receives it, calculate g^p(s), g^h(s), g^(α*p(s)); choose a random number δ to decrypt the encryption and become (g^p(s))^ δ, (g^h(s))^δ, (g^(α*p(s)))^δ; Then give these 3 encrypted values ​​to the verifier;

  • verifier calculates and judges whether e(g^p, g) = e(g^t(s), g^h) is true. If it is true, it means h(s) * t(s) = p(s); g^p, g^h, g^p’ are the abbreviations of the three encrypted values ​​provided by the prover.

  • At the same time, the verifier also needs to verify whether e(g^p′, g) = e(g^p, g^α) is true. If it is true, prove that the solution given by the prover is calculated with s.

At this point, we have basically completed the construction of the zk-SNARK protocol, which can prove polynomials without revealing polynomial coefficients, that is, realize zero-knowledge proofs.

In addition, the encrypted value prepared for the prover and verifier in the trusted setting stage is called CRS (common reference string). The random number used to generate CRS is to be destroyed. They can be used for cheating; the trusted setting needs to be able to Trust a third party, and what mechanism should be used to select a trusted third party is an issue.

Trusted settings affect the versatility of zk-SNARK, so zk-SNARKs that do not require trusted settings have also been developed. It is of great significance but not difficult to understand: it is just another way to provide random challenge points x.

No matter how many zk-SNARKs there are, what we need to know is: zero-knowledge proof is an interactive proof system, and an important job for it, which is also related to security and resources, is how to provide Random challenge points.

Restore p(x) to A(x) * B(x)–C(x)

The polynomial that the zk-SNARK protocol needs to prove is A(x) * B(x)–C(x). If p(x) is reduced to A(x) * B(x)–C(x), compared to The main difference of the agreement at p(x) is:

  • The prover needs to provide the encrypted values ​​of A(s), B(s), and C(s) respectively;

  • verifier needs to verify A(s) * B(s) = h(s) * t(s)+ C(s);

  • When constraining the calculation of the prover (for example, it must be calculated with s), there are three different αs corresponding to A(s), B(s), and C(s); when the prover encrypts the calculation result, it is necessary There are 3 different δ encryptions A(s), B(s), C(s) respectively.

In the specific zk-SNARK protocol, some designs will be used to improve the protocol. This article will not discuss them one by one. The journey of zk-SNARK is all over here.

At the end of the journey, there are a few notes:

1. zk-SNARK has different combinations of implementation methods. This article is mainly based on the Pinocchio protocol as a clue; cryptography covers a wide range, and the article will inevitably have omissions and improper understanding. Please don’t believe it. In-depth research requires paper reference.
2. This article uses the framework of Maksym’s article in the construction of the zk-SNARK protocol part (part 2); Maksym’s article introduces zk-SNARK in an orderly manner, but it may still be useful for readers who do not have relevant background knowledge A certain degree of difficulty in understanding, which is why I wrote this article.

So, if you read this, thank you; if you are moved by the beauty of mathematics after knowing the entire realization process of zero-knowledge proof, then I am so happy.

reference:

1. Toze; On Zero Knowledge Proof: Background and Origin

2. Vitalik Buterin; ” Quadratic Arithmetic Programs: from Zero to Hero “,

3. Maksym Petkus; “Why and How zk-SNARK Works: Definitive Explanation”, https://arxiv.org/pdf/1906.07221.pdf ;

Chinese version: Learn zk-SNARK from scratch (1)-The properties and proofs of polynomials , translation: even@安比LAB, proofreading: valuka@安比LAB

4. Li Hua; understand the simple logic behind zero-knowledge proof in one article

Source link: mp.weixin.qq.com