The easiest way to cheat on your math homework is to find the answers somewhere, such as the internet or an answer key. The best way for a teacher to prevent this kind of cheating is to issue the widely dreaded requirement to “show your work.”
Showing the work usually reveals whether that work is legitimate and original, but it’s a burden: the teacher must now review every student’s every line of work. This may be a burden some teachers are willing to accept, but what if, instead of the answer to a math problem, one were trying to prove the veracity of satellite imagery (which could have been intercepted and faked)? Or the proper manufacture and assembly of parts for a nuclear weapon (which could have been subtly sabotaged)? Or the output from a computer network that controls critical infrastructure (which could have been hacked)? Or the accuracy of scientific experiments, data, and analysis (which could have been biased)?
In such cases, evidence of tampering, or some other manner of inauthenticity, may be expertly hidden in one small corner of the greater whole. And even if a “show your work” approach were sufficient in theory, finding one incongruous detail in the endless terabytes needed to prove the validity of such complex information would be a nearly hopeless task. Moreover, it would require actually showing—sharing, transmitting, publishing—“your work,” thus putting sensitive information at even greater risk.
Is it possible to prove, even to yourself, that you know some piece of information without personally witnessing its entire detailed history? Is it possible to prove to someone else that you know some piece of information without ever revealing the information itself—or, indeed, without revealing anything at all, other than the fact that you know it?
Trust me—or better yet, don’t
The solution to this rather esoteric challenge is something called a “zero-knowledge proof” (ZKP). It’s a technique that cryptographically encodes a small, robust, random sample of what “showing your work” would entail—evidence of a spot check, of sorts, but without including any of the actual work—together with a clever mathematical manipulation designed to ensure that any data tampering, no matter how small or well hidden, is overwhelmingly likely to get picked up.
“The math of it has already been worked out,” says Michael Dixon, a scientist in Los Alamos’s Advanced Research in Cyber Systems group. “The biggest challenges today are scalability and integration for real-world application, which require really understanding precisely what you are trying to prove in the first place.”
Dixon likes to boil it down to its fundamentals.
“Remember those ‘Where’s Waldo?’ books?” Dixon asks. “The ones that show intricate cartoon scenes with tons of characters, and you’re supposed to find one particular character called Waldo? Well, suppose five kids are looking for Waldo, and it’s a race—who will find him first? Who will find him second? But even if you manage to find him in that complicated illustrated jumble, you can’t just say you found him; you have to prove it somehow. Yet how can you prove it to the other kids without giving away his location (say, by pointing) or any information that might help them find him (such as ‘upper-left side of the page’ or ‘near the circus elephant’)?”
Zero-knowledge proofs are trustless yet trustworthy.
There are multiple ways. One way, for example, is to create an answer key: write down Waldo’s location and share it only after everyone has found him. This type of solution works if the information loses its sensitivity or importance after a short amount of time—once the game ends, in this case. There may be a limited set of real-world situations in which that sort of solution—proving some information was previously known by revealing it later—would suffice, but they are the exception, not the rule.
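The answer-key idea has a standard cryptographic counterpart: a commitment scheme, in which you publish a "sealed envelope" now and open it later. A minimal sketch, assuming a simple hash-plus-salt commitment (the location string here is invented for illustration):

```python
import hashlib
import secrets

def commit(location: str) -> tuple[str, str]:
    # Bind to the secret now without revealing it: hash the location
    # together with a random salt. The digest is the "sealed envelope."
    salt = secrets.token_hex(16)
    digest = hashlib.sha256((salt + location).encode()).hexdigest()
    return digest, salt

def reveal(digest: str, salt: str, claimed_location: str) -> bool:
    # After the game ends, anyone can check the claim against the
    # commitment that was published earlier.
    check = hashlib.sha256((salt + claimed_location).encode()).hexdigest()
    return check == digest

digest, salt = commit("page 12, row 3, column 7")
assert reveal(digest, salt, "page 12, row 3, column 7")       # honest claim
assert not reveal(digest, salt, "page 12, row 1, column 1")   # false claim
```

The salt matters: without it, the other kids could simply hash every plausible location and compare digests, recovering the secret before the game ends.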
Perhaps a better way would be to involve a third party, someone all the kids trust—a parent maybe—and then, one by one, whisper Waldo’s location into the parent’s ear. Each time, the parent affirms that a particular kid has found Waldo. But deep in the information age, when the task at hand is not a game and the data to be verified are more complicated or sensitive than the location of a cartoon character in a striped shirt and a beanie hat, finding a suitable “parent” to trust is not always a simple matter. It turns out that the legitimate authorities we rely on for this task now may not be entirely trustworthy, despite their best intentions. In recent years, hacking attacks have seen vast quantities of sensitive personal data stolen from large retail companies, a major credit agency, and even a key federal office dedicated to national security.
“We want to shift the nature of the task,” says Dixon. “Instead of vetting particular organizations to verify their trustworthiness—federal agencies, banks, defense contractors, and so on—we assume everyone is equal. We don’t want to have to trust anyone—or suspect anyone, for that matter.” He explains that it’s a little like using a notary, only without the notary. A software-based “stamp” of authenticity proves that the information is valid without revealing the information itself, just as a notary stamp proves a document was legally signed by the proper people without attaching their photo IDs. The difference lies in not having to license and trust a human notary, who could be impersonated, bribed, or simply distracted. It’s trustless yet trustworthy.
It’s not what you know, it’s that you know
A proof could take the form of an entire “show your work” explanation. (“I started looking for Waldo in the upper-left corner and worked from left to right in one-inch-tall rows, pausing every time I saw a pattern of red and white; when the first row turned up nothing, I started on the second row…”) It could also require an interactive interrogation in which the party requiring proof poses a series of ancillary challenge questions intended to check if the prover really did the work (“What kind of animals did you notice in your search for Waldo? Were there characters wearing red-and-white stripes who were not Waldo? Were there any children in the lower half of the scene?”). But this extra information is undesirable. It requires the transmission and examination of large amounts of data or back-and-forth interaction (either of which may be impossible if the information being discussed is classified, for example).
A better solution is based on an important result in computational complexity theory known as the PCP theorem (“probabilistically checkable proof”). The theorem essentially states that if you have a mathematical proof of a particular length, it is possible to reconfigure that proof into a longer one that can be verified, with great accuracy, using only a random sampling of a very small subset of that longer-form proof. That’s the spot check. The resulting cryptographic proof is something called a zk-SNARK: a zero-knowledge succinct non-interactive argument of knowledge. Here, the terms “succinct” and “non-interactive” rule out the cumbersome varieties of ZKP. Succinct means small in absolute file size: typically just a few hundred bytes, roughly one-tenth the data capacity of a typical QR code. Non-interactive means one-way communication: the zk-SNARK is provided, and that’s that; no interrogation is necessary.
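A zk-SNARK itself is far beyond a few lines of code, but the underlying spot-check intuition, that a handful of random samples catches tampering with overwhelming probability, can be illustrated by Freivalds' classic randomized check for matrix multiplication. It verifies a claimed product without redoing the full computation; the matrices and trial count below are illustrative:

```python
import random

def freivalds_check(A, B, C, trials=20):
    # Probabilistically check whether A @ B == C without redoing the
    # full multiplication: multiply both sides by a random 0/1 vector r
    # and compare A(Br) with Cr. A wrong C slips past a single trial
    # with probability at most 1/2, so `trials` rounds drive the chance
    # of missing tampering below 2**-trials.
    n = len(A)
    for _ in range(trials):
        r = [random.randint(0, 1) for _ in range(n)]
        Br = [sum(B[i][j] * r[j] for j in range(n)) for i in range(n)]
        ABr = [sum(A[i][j] * Br[j] for j in range(n)) for i in range(n)]
        Cr = [sum(C[i][j] * r[j] for j in range(n)) for i in range(n)]
        if ABr != Cr:
            return False   # caught a discrepancy: C is not A @ B
    return True            # overwhelmingly likely that C = A @ B

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C_good = [[19, 22], [43, 50]]
C_bad = [[19, 22], [43, 51]]   # one subtly tampered entry
assert freivalds_check(A, B, C_good)
assert not freivalds_check(A, B, C_bad)
```

Each spot check is cheap, yet even a single altered entry is exposed almost surely, which is the same asymmetry the PCP theorem exploits at much greater scale.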
It’s difficult to overstate the public good that will ultimately come from this technology.
“In practice, the zk-SNARK is a form of encrypted data that hides where the proof was spot-checked,” Dixon says. “If an adversary intercepted it and managed to break the encryption, then he or she could create fake proofs—not ideal, obviously—but nothing more is at stake. It really is zero-knowledge and contains no sensitive content. It simply proves that the sensitive content—whatever it is and wherever it resides—is legit.”
With zk-SNARKs, a complicated data product can be passed along between numerous parties—even explicitly distrusted parties—before it reaches its final recipient, and yet that recipient can be assured of the data’s veracity in the end. Dixon refers to this as a “chain of provenance”: at each stage of working with a data product, a new zk-SNARK combines the verification associated with the current stage with the previous zk-SNARK, which verified all prior stages, confirming that the information was processed properly by all the appropriate computational procedures and not touched by anything else.
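A real chain of provenance folds each stage into a recursively composed zk-SNARK, which hides the underlying records. As a simplified stand-in that shows only the chaining structure, not the zero-knowledge part, here is a hash chain with stage names invented for illustration:

```python
import hashlib
import json

def attest(previous_attestation: str, stage: str, output_digest: str) -> str:
    # Fold the current stage's record into everything that came before.
    # (A real chain of provenance would fold in a zk-SNARK rather than
    # a bare hash, so the records themselves would stay hidden.)
    record = json.dumps({"prev": previous_attestation,
                         "stage": stage,
                         "output": output_digest}, sort_keys=True)
    return hashlib.sha256(record.encode()).hexdigest()

chain = "genesis"
for stage, digest in [("sensor-read", "d41d8c"),
                      ("analysis", "9e107d"),
                      ("report", "e4d909")]:
    chain = attest(chain, stage, digest)

# Tampering with any earlier stage changes every attestation after it.
tampered = attest("genesis", "sensor-read", "ffffff")
assert tampered != attest("genesis", "sensor-read", "d41d8c")
```

Because each attestation commits to its predecessor, the final value vouches for the entire history: a recipient checking only the last link implicitly checks them all.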
For example, one possible application of this technology is to verify remotely that a uranium enrichment facility is producing uranium suitable for power production. To do this, sensors at the facility might collect data that carries a cryptographic signature from the sensor itself. The signed sensor data could then be combined with enrichment claims made by the facility and fed into an algorithm that combines results from different sensors in order to confirm that they correspond to the specified enrichment level. That algorithm could output a zk-SNARK to be presented to international regulators. At every point, a single zk-SNARK is generated and passed forward, confirming everything that happened prior: the sensor data is valid, the correct algorithm analyzed that exact data, and no one tampered with the results. The final zk-SNARK doesn’t contain any sensitive information—no actual sensor data, no algorithms, not even the enrichment levels themselves, necessarily; rather, it simply attests that the policy-compliant enrichment-level claims made by the facility operator are confirmed (or not).
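The signed-sensor step can be sketched with a keyed hash (HMAC). This is an illustrative simplification: a real sensor would use an asymmetric signature so that verifiers never hold the signing key, and the key, sensor name, and field names below are invented:

```python
import hashlib
import hmac
import json

SENSOR_KEY = b"per-sensor secret key"   # hypothetical; provisioned at manufacture

def sign_reading(reading: dict) -> dict:
    # The sensor attaches a keyed MAC so that downstream parties can
    # detect any alteration of the reading after it left the sensor.
    payload = json.dumps(reading, sort_keys=True).encode()
    tag = hmac.new(SENSOR_KEY, payload, hashlib.sha256).hexdigest()
    return {"reading": reading, "tag": tag}

def verify_reading(signed: dict) -> bool:
    payload = json.dumps(signed["reading"], sort_keys=True).encode()
    expected = hmac.new(SENSOR_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["tag"])

signed = sign_reading({"sensor": "cascade-7", "enrichment_pct": 4.5})
assert verify_reading(signed)
signed["reading"]["enrichment_pct"] = 90.0   # tampering with the data
assert not verify_reading(signed)
```

In the scenario above, the verification algorithm would consume such signed readings and emit a zk-SNARK attesting that valid, untampered sensor data supported the facility's enrichment claims.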
Private information, public security
From a national security perspective, ZKPs have immediate value. The U.S. military, for example, needs to verify that the complex defense systems it purchases from large numbers of private-sector contractors and subcontractors have been built precisely according to approved specifications. They must have properly sourced parts and be correctly assembled, carefully transported, and faithfully deployed—all succinctly and convincingly verified at every step of the way without having to take the whole thing apart or, in the case of nuclear weapons, perform a weapons test, which is constrained by international treaty. (Similarly, complex civilian products involving foreign or domestic private-sector subcontractors can be reliably tracked and verified by sharing only zk-SNARKs and thereby not sharing any proprietary information.)
The military needs to convince not only itself, but also its adversaries, that it has robust and reliable defense capabilities. There may be times, for example, when the United States wishes to outwardly demonstrate and prove that it has the ability to defend itself against a particular class of cyber weapon, such as one designed to disrupt electrical power or other infrastructure systems. In such a case, it is important to trustlessly prove that the defensive capability is real—without revealing how it works or how it can be circumvented.
In addition, numerous important applications today, whether for defense or private industry, require machine learning: computers or vast networks of computers that learn how to do a task via experience and goal-seeking, just as human beings learn tasks. Want to teach a computer how to drive a car? Show it thousands of photos containing intersections, street signs, other vehicles, and pedestrians; show it thousands of videos of merging and parking, what to do and what not to do. Then let it attempt to replicate good driving in a simulation until it masters the task. Want to teach a computer to recognize faces? Show it millions of faces from every possible angle and in all sorts of lighting. Then allow the algorithm to self-adjust with a set of parameters that leads to correct identifications.
Dixon and his student, Zachary DeStefano, who is entering a doctoral program in computer security, set out to secure output from machine-learning systems called neural networks. Neural networks are complex algorithms that use many nodes, each with parameters that must be learned through experience, to simulate processing by neurons. Dixon and DeStefano’s solution, unsurprisingly, involves ZKPs.
“The most powerful neural-network models are based on datasets much larger than what any one organization can house or efficiently collect,” says DeStefano. “We live in an era of cloud computing, and neural-network computations are often subcontracted out. To validate critical results—or just to make sure you got what you paid for—you need a succinct, non-interactive way to prove that the correct neural network, set up exactly as you specified, processed the inputs you provided and no others.”
Last year, Dixon and DeStefano wrote a software program to do exactly that. It produces a zk-SNARK to validate both the inputs and the specific network execution for a system trained to recognize handwritten numbers. The program confirmed proper processing without supplying a massively cumbersome log of all the scribbles in the dataset, of every single value of every variable the neural network adjusted, of what outputs the countless iterations produced, or of how the network learned to improve its accuracy.
But with neural networks, there’s another concern: their training. A ZKP is needed to account for the authenticity of the training set originally supplied to the neural network, which, often, the customer has no part in. Dixon and DeStefano are working on that now.
For example, imagine if, at the outset of the COVID-19 pandemic, a neural network were set up to analyze medical data from millions of patients from all over the country to determine if their lab tests and scans indicated COVID-19 and to assess which treatments proved most effective on which patients. The trouble is, medical data are confidential. How could the neural network be trained with huge quantities of valid patient data if the data can’t be shared outside of the hospitals, labs, imaging centers, and doctors’ offices where they were obtained? Conventional encryption is a limited approach, since whoever trains the network must ultimately decrypt, and therefore see, the raw patient records; a better approach involves transmitting only the ZKP-verified information necessary to train the neural network and not—true to zero-knowledge principles—the actual patient data.
Any data tampering, no matter how small or well hidden, is overwhelmingly likely to show up.
“We’re working to help a distributed system for crowd-sourced training data audit itself,” Dixon says. “The idea is to delegate neural-network training to the edge nodes, such as hospitals, and require them to furnish proofs that they did their jobs correctly.” The neural network still seeks to improve its parameters through iteration, but instead of asking edge nodes for patient data explicitly, it only asks in what direction and by how much to shift the current-iteration parameters to achieve an improvement. Earlier research proved that the edge nodes can perform that computation, and Dixon and DeStefano are now demonstrating that zk-SNARKs can verify that the edge nodes did (or didn’t do) exactly what they were supposed to. The result is a perfectly reliable neural network computation and a hugely valuable medical resource constructed from enormous quantities of verified patient data that were never actually shared. And by virtue of the zk-SNARKs, the system automatically recognizes and eliminates training information derived from faulty patient data, whether the error comes from a mistake or deliberate sabotage.
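The share-only-the-update idea is the core of what is commonly called federated learning: each edge node reports a parameter adjustment, never its raw data. A minimal sketch for a one-parameter linear model, with the hospital datasets, learning rate, and function names invented for illustration:

```python
# Each "hospital" computes the gradient of a squared-error loss on its
# own patients and shares only that gradient -- in what direction and
# by how much to shift the parameter -- never the patient records.

def local_gradient(w, patients):
    # d/dw of the mean squared error for a one-parameter model y = w * x
    n = len(patients)
    return sum(2 * (w * x - y) * x for x, y in patients) / n

def federated_step(w, hospitals, lr=0.01):
    # The central server averages the direction-and-magnitude reports.
    grads = [local_gradient(w, p) for p in hospitals]
    return w - lr * sum(grads) / len(grads)

hospital_a = [(1.0, 2.1), (2.0, 3.9)]   # (feature, label) pairs, slope ~2
hospital_b = [(3.0, 6.2), (4.0, 7.8)]

w = 0.0
for _ in range(500):
    w = federated_step(w, [hospital_a, hospital_b])
print(round(w, 2))  # → 1.99, near the true slope of about 2
```

In Dixon and DeStefano's scheme, each such gradient report would arrive with a zk-SNARK proving it was computed correctly from genuine records, which is what lets the system discard contributions derived from faulty or sabotaged data.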
“With many commercial technologies, there’s this classic tradeoff between security and privacy,” says Dixon. “But with ZKP technology, it’s no longer either-or. We can have the best of both worlds.”
Eventually, Dixon hopes to see specific hardware designed to produce inherently self-verified data for neural-network training and general computation. It will ultimately be possible to use quantum effects to verify electronic recordings of all kinds (such as from x-rays or blood tests, in the COVID-diagnostic example). The data would be quantum mechanically guaranteed and cryptographically certified upon creation, with subsequent zk-SNARKs generated as needed to verify the proper processing and the absence of tampering.
Better informed with zero knowledge
“Where we are now—verifying inputs and execution, securing data sources and supply chains—is just the beginning,” says Dixon. “In a world concerned with massive-scale misinformation, it’s difficult to overstate the public good that will ultimately come from this technology.”
For example, Dixon sees many opportunities to improve the election security infrastructure that supports American democracy. Since 2016, the threat of external interference in American elections has been thrust into the national consciousness. And a major channel for such activity, misleading information shared on social media, is already problematic and about to get much harder to control with the proliferation of deliberately deceptive but visually convincing videos known as deepfakes. These might show prominent politicians saying embarrassing things they never actually said, yet in terms of video quality, they are virtually impossible to distinguish from actual recordings.
But at its heart, the deepfakes problem is really a chain-of-provenance problem, which is exactly what ZKP technology addresses. A video file’s history needs to be verified—when, where, and by whom was the video recorded? What level of video editing has taken place since—color adjustments only, or splicing in additional footage? It’s unreasonable to track all this information in detail for every video on the internet, but rather than “showing your work,” a zk-SNARK would be sufficient to verify that the video’s veracity has been confirmed. Web browsers, perhaps, could implement a content filter based on valid ZKP stamps (or at least display a warning message when the stamps are missing or invalid). And if all of that sounds overly technical, not to worry: Dixon is always ready and willing to break it down.
“Our democracy is vulnerable to disinformation shared online,” he says. “But hopefully soon, a straightforward mathematical operation, carried in a tiny computer file, will counter that vulnerability.”
Digital information, national security, financial transactions, critical supply chains, facial recognition, scientific data, infrastructure controls, and democracy itself: across the board, zero-knowledge proofs can make us stronger.
Now you know. LDRD
Blockchain without the blockchain
In the civilian realm, perhaps the best-known example of a verification system without a central trusted authority is cryptocurrency, such as Bitcoin. Cryptocurrencies are designed to be exceptionally secure against the vulnerabilities of trust-based systems: bank errors or insolvency, identity theft, inflation from “printing money,” or having assets frozen because of austerity measures or suspicion of a crime. Such security, however, comes at a price.
Different cryptocurrencies use different protocols, but they usually involve a blockchain: a list of every transaction ever processed since the cryptocurrency was invented. Maintaining the blockchain across a trustless network of computers has its challenges. In some cases, a copy of the entire blockchain is located, unencrypted, on every computer in the network—a clear strike against the privacy of transactions. And adding new transactions to the existing blockchain might require each computer in the network to grind away on some deliberately difficult math problem in order to prove the validity of its work, with new transactions affirmed only when the overall network of computers achieves a consensus answer. This represents a tremendous increase in system-wide information-processing requirements relative to a traditional trust-based system. Without ZKPs, blockchains have a steep tradeoff between verifiability and privacy.
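The "deliberately difficult math problem" in systems like Bitcoin is hash-based proof of work: grind through candidate values until the block's hash begins with enough zeros, a result that is costly to find but nearly free to verify. A toy sketch, with the difficulty and transaction string invented for illustration:

```python
import hashlib

def proof_of_work(block_data: str, difficulty: int = 4) -> int:
    # Try nonce after nonce until the hash of (data + nonce) starts
    # with `difficulty` hex zeros. Expected work grows as 16**difficulty.
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce
        nonce += 1

def verify(block_data: str, nonce: int, difficulty: int = 4) -> bool:
    # Verification is a single hash, no matter how much work was spent.
    digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

nonce = proof_of_work("alice pays bob 5")
assert verify("alice pays bob 5", nonce)
assert not verify("alice pays bob 500", nonce)  # altering the data voids the work
```

The asymmetry between producing and checking the nonce is what secures the ledger, but it is also exactly the system-wide processing burden described above, which zk-SNARK-based designs aim to avoid.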
In principle, however, a system using zk-SNARKs—and several are in various stages of development—eliminates these kinds of difficulties. The system is still trustless, but to become so, it does not sacrifice privacy, amplify information processing, or require consensus across a global network. Instead, each new transaction or piece of information is convolved with the previous zk-SNARK to create a new zk-SNARK, and that tiny file, a minuscule fraction of the size of a typical selfie, affirms that everything is on the up and up.