Reproducible, verifiable, verified builds (vgo)

This article is a translation of "Reproducible, Verifiable, Verified Builds", part 5 of the Go & Versioning series. Copyright belongs to the original author.

Once Go developers and tools share a vocabulary for package versions, it is relatively straightforward to add support in the toolchain for reproducible, verifiable, and verified builds. In fact, the basics are already in the vgo prototype.

Since people sometimes disagree about the exact definitions of these terms, let's establish some basic terminology. In this article:

    • A reproducible build is one that, when repeated, produces the same result.
    • A verifiable build is one that records enough information to describe precisely how to repeat it.
    • A verified build is one that checks that it is using the expected source code.

Vgo delivers reproducible builds by default. The resulting binaries are verifiable because they record the exact versions of the source code that went into the build. And you can configure your repository so that users rebuilding your software verify that their builds match yours, using cryptographic hashes, no matter how they obtain the dependencies.

Reproducible builds

At the very least, we want to make sure that when you build my program, the build system decides to use the same versions of the code. Minimal version selection delivers this property by default. The go.mod file alone is enough to determine which module versions should be used for the build (assuming the dependencies are available), and that decision is stable even as new versions of a module are introduced into the ecosystem. This differs from most other systems, which adopt new versions automatically and need to be constrained to yield reproducible builds. I covered this in the minimal version selection article, but it is an important, subtle detail, so I'll give a short reprise here.

To make this concrete, let's look at a few real packages from Cargo, Rust's package manager. To be clear, I am not picking on Cargo. I think Cargo is an example of the current state of the art in package managers, and there is much to learn from it. If we can make Go package management as smooth as Cargo's, I'll be happy. But I also think it is worth exploring whether we would benefit from choosing a different default when it comes to version selection.

Cargo prefers maximum versions in the following sense. As I write this article, the latest toml on crates.io is 0.4.5. It lists a dependency on serde 1.0 or later; the latest serde is 1.0.27. If you start a new project and add a dependency on toml 0.4.1 or later, Cargo has a choice to make. According to the constraints, any of 0.4.1, 0.4.2, 0.4.3, 0.4.4, or 0.4.5 would be acceptable. All other things being equal, Cargo prefers the newest acceptable version, 0.4.5. Likewise, any serde from 1.0.0 through 1.0.27 is acceptable, and Cargo chooses 1.0.27. These choices change as new versions are introduced. If serde 1.0.28 is released tonight and I add toml 0.4.5 to a project tomorrow, I will get 1.0.28 instead of 1.0.27. As described so far, Cargo's builds are not repeatable. Cargo's (entirely reasonable) answer to this problem is to have not just a constraint file (the manifest, Cargo.toml) but also a list of the exact artifacts to use in the build (the lock file, Cargo.lock). The lock file stops future upgrades; once it is written, your build stays on serde 1.0.27 even when 1.0.28 is released.

In contrast, minimal version selection prefers the minimum allowed version, which is the exact version requested by some go.mod in the project. That answer does not change as new versions are added. Given the same choices as in the Cargo example, vgo would select toml 0.4.1 (what you requested) and then serde 1.0 (what toml requested). Those choices are stable, without a lock file. This is what I mean when I say that vgo's builds are reproducible by default.
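The contrast between the two defaults can be sketched in a few lines of Go. This is only an illustration under stated assumptions: the function names are hypothetical, the published serde list is abbreviated, and the code models a single requirement rather than the full minimal version selection algorithm, which also reconciles requirements coming from several modules.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parse splits "1.0.27" into numeric parts; missing parts count as 0,
// so "1.0" is treated as 1.0.0. Pre-release tags are not handled.
func parse(v string) [3]int {
	var out [3]int
	for i, p := range strings.SplitN(v, ".", 3) {
		out[i], _ = strconv.Atoi(p)
	}
	return out
}

// less reports whether version a sorts before version b.
func less(a, b string) bool {
	pa, pb := parse(a), parse(b)
	for i := range pa {
		if pa[i] != pb[i] {
			return pa[i] < pb[i]
		}
	}
	return false
}

// newestAcceptable models the Cargo-style default: the highest published
// version that is at least the requested minimum.
func newestAcceptable(published []string, min string) string {
	sorted := append([]string(nil), published...)
	sort.Slice(sorted, func(i, j int) bool { return less(sorted[i], sorted[j]) })
	best := ""
	for _, v := range sorted {
		if !less(v, min) {
			best = v
		}
	}
	return best
}

// minimumRequested models the vgo default: the answer is the requested
// version itself, so the published list is never consulted.
func minimumRequested(_ []string, min string) string { return min }

func main() {
	today := []string{"1.0.0", "1.0.26", "1.0.27"} // abbreviated list of serde releases
	tomorrow := append(append([]string(nil), today...), "1.0.28")

	// The newest-acceptable answer moves when a release appears.
	fmt.Println(newestAcceptable(today, "1.0.0"), newestAcceptable(tomorrow, "1.0.0"))
	// The minimum-requested answer does not.
	fmt.Println(minimumRequested(today, "1.0.0"), minimumRequested(tomorrow, "1.0.0"))
}
```

The first line printed differs between the two days, which is the instability a lock file exists to suppress; the second line is the same on both days without any extra file.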

Verifiable builds

Go binaries have long included a string indicating the version of Go they were built with. Last year I wrote a tool, rsc.io/goversion, that fetches that information from a given executable or tree of executables. For example, on my Ubuntu Linux laptop, I can see which system utilities are implemented in Go:

    $ go get -u rsc.io/goversion
    $ goversion /usr/bin
    /usr/bin/containerd go1.8.3
    /usr/bin/containerd-shim go1.8.3
    /usr/bin/ctr go1.8.3
    /usr/bin/go go1.8.3
    /usr/bin/gofmt go1.8.3
    /usr/bin/kbfsfuse go1.8.3
    /usr/bin/kbnm go1.8.3
    /usr/bin/keybase go1.8.3
    /usr/bin/snap go1.8.3
    /usr/bin/snapctl go1.8.3
    $

Now that the vgo prototype understands module versions, it includes that information in the final binary too, and the new goversion -m flag prints it back out. Using our "hello, world" program from the tour:

    $ go get -u rsc.io/goversion
    $ goversion ./hello
    ./hello go1.10
    $ goversion -m hello
    ./hello go1.10
        path  github.com/you/hello
        mod   github.com/you/hello  (devel)
        dep   golang.org/x/text     v0.0.0-20170915032832-14c0d48ead0c
        dep   rsc.io/quote          v1.5.2
        dep   rsc.io/sampler        v1.3.0
    $

The main module, github.com/you/hello, has no version information because it is the local development copy, not a specific version that we downloaded. But if we instead build a command directly from a versioned module, the listing reports versions for all modules:

    $ vgo build -o hello2 rsc.io/hello
    vgo: resolving import "rsc.io/hello"
    vgo: finding rsc.io/hello (latest)
    vgo: adding rsc.io/hello v1.0.0
    vgo: finding rsc.io/hello v1.0.0
    vgo: finding rsc.io/quote v1.5.1
    vgo: downloading rsc.io/hello v1.0.0
    $ goversion -m ./hello2
    ./hello2 go1.10
        path  rsc.io/hello
        mod   rsc.io/hello       v1.0.0
        dep   golang.org/x/text  v0.0.0-20170915032832-14c0d48ead0c
        dep   rsc.io/quote       v1.5.2
        dep   rsc.io/sampler     v1.3.0
    $

When we integrate versions into the main Go toolchain, we will add APIs to access a binary's own module information from within the running binary, just as [runtime.Version](https://golang.org/pkg/runtime/#Version) provides access to the more limited Go version information.

For the purpose of attempting to reconstruct the binary, the information listed by goversion -m suffices: put the versions into a go.mod file and build the target named on the path line. But if the result is not the same binary, you might want ways to narrow down what is different. What changed?

When vgo downloads each module, it computes a hash of the file tree corresponding to that module. That hash is also recorded in the binary, alongside the version information, and goversion -mh prints it:

    $ goversion -mh ./hello
    hello go1.10
        path  github.com/you/hello
        mod   github.com/you/hello  (devel)
        dep   golang.org/x/text     v0.0.0-20170915032832-14c0d48ead0c  h1:qgOY6WgZOaTkIIMiVjBQcw93ERBE4m30iBm00nkL0i8=
        dep   rsc.io/quote          v1.5.2                              h1:w5fcysjrx7yqtD/aO+QwRjYZOKnaM9Uh2b40tElTs3Y=
        dep   rsc.io/sampler        v1.3.1                              h1:F0c3J2nQCdk9ODsNhU3sElnvPIxM/xV1c/qZuAeZmac=
    $ goversion -mh ./hello2
    hello go1.10
        path  rsc.io/hello
        mod   rsc.io/hello       v1.0.0                              h1:CDmhdOARcor1WuRUvmE46PK91ahrSoEJqiCbf7FA56U=
        dep   golang.org/x/text  v0.0.0-20170915032832-14c0d48ead0c  h1:qgOY6WgZOaTkIIMiVjBQcw93ERBE4m30iBm00nkL0i8=
        dep   rsc.io/quote       v1.5.2                              h1:w5fcysjrx7yqtD/aO+QwRjYZOKnaM9Uh2b40tElTs3Y=
        dep   rsc.io/sampler     v1.3.0                              h1:7uVkIFmeBqHfdjD+gZwtXXI+RODJ2Wc4O7MPEh/QiW4=
    $

The h1: prefix indicates which hash is being reported. Today, there is only "hash 1", a SHA-256 hash of a list of the files and the SHA-256 hashes of their contents. If we need to move to a new hash later, the prefix will help us tell old hashes from new ones.

I must stress that these hashes are self-reported by the build system. If someone gives you a binary with certain hashes in its build information, there is no guarantee that they are accurate. They are very useful information supporting a later verification, not a signature that can be trusted by itself.
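A hash of the kind described above ("a hash of a list of files and the hashes of their contents") could be computed along the following lines. This is a minimal sketch, not the prototype's code: the hash1 name, the in-memory map standing in for a file tree, the file contents, and the exact per-file line format are all assumptions made for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"sort"
)

// hash1 hashes a file tree given as path -> contents. It writes one line per
// file (hex SHA-256 of the contents, then the path), in sorted path order,
// into an outer SHA-256, and returns the digest base64-encoded after "h1:".
func hash1(files map[string][]byte) string {
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths) // fixed order so the result does not depend on map iteration

	outer := sha256.New()
	for _, p := range paths {
		fmt.Fprintf(outer, "%x  %s\n", sha256.Sum256(files[p]), p)
	}
	return "h1:" + base64.StdEncoding.EncodeToString(outer.Sum(nil))
}

func main() {
	// Hypothetical contents; these are not the real files of rsc.io/quote.
	tree := map[string][]byte{
		"rsc.io/quote@v1.5.2/go.mod":   []byte("module rsc.io/quote\n"),
		"rsc.io/quote@v1.5.2/quote.go": []byte("package quote\n"),
	}
	fmt.Println(hash1(tree))

	// Appending a single newline to one file yields a different hash.
	tree["rsc.io/quote@v1.5.2/quote.go"] = append(tree["rsc.io/quote@v1.5.2/quote.go"], '\n')
	fmt.Println(hash1(tree))
}
```

Hashing a sorted listing rather than an archive keeps the result independent of zip metadata such as timestamps or entry order, which is one plausible reason for the two-level design.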

Verified builds

An author distributing a program in source form might want to let users verify that they are building it with exactly the expected dependencies. We know vgo will make the same decisions about which versions of dependencies to use, but there is still the problem of mapping a version like v1.5.2 to an actual source tree. What if the author of v1.5.2 changes the tag to point at a different file tree? What if a malicious middlebox intercepts the download request and delivers a different zip file? What if the user has accidentally edited the source files in the local copy of v1.5.2? The vgo prototype supports this kind of verification too.

The final form may be somewhat different, but if you create a file named go.modverify next to go.mod, builds will keep that file up to date with known hashes for specific versions of modules:

    $ echo >go.modverify
    $ vgo build
    $ tcat go.modverify  # go get rsc.io/tcat, or use cat
    golang.org/x/text  v0.0.0-20170915032832-14c0d48ead0c  h1:qgOY6WgZOaTkIIMiVjBQcw93ERBE4m30iBm00nkL0i8=
    rsc.io/quote       v1.5.2                              h1:w5fcysjrx7yqtD/aO+QwRjYZOKnaM9Uh2b40tElTs3Y=
    rsc.io/sampler     v1.3.0                              h1:7uVkIFmeBqHfdjD+gZwtXXI+RODJ2Wc4O7MPEh/QiW4=
    $

The go.modverify file is a log of the hashes of all versions ever encountered: lines are only added, never removed. If we update rsc.io/sampler to v1.3.1, the log will contain hashes for both versions:

    $ vgo get rsc.io/sampler@v1.3.1
    $ tcat go.modverify
    golang.org/x/text  v0.0.0-20170915032832-14c0d48ead0c  h1:qgOY6WgZOaTkIIMiVjBQcw93ERBE4m30iBm00nkL0i8=
    rsc.io/quote       v1.5.2                              h1:w5fcysjrx7yqtD/aO+QwRjYZOKnaM9Uh2b40tElTs3Y=
    rsc.io/sampler     v1.3.0                              h1:7uVkIFmeBqHfdjD+gZwtXXI+RODJ2Wc4O7MPEh/QiW4=
    rsc.io/sampler     v1.3.1                              h1:F0c3J2nQCdk9ODsNhU3sElnvPIxM/xV1c/qZuAeZmac=
    $

When go.modverify exists, vgo checks that all downloaded modules used in a given build are consistent with the entries already in the file. For example, if we change the first character of the rsc.io/quote hash from w to v:

    $ vgo build
    vgo: verifying rsc.io/quote v1.5.2: module hash mismatch
        downloaded:   h1:w5fcysjrx7yqtD/aO+QwRjYZOKnaM9Uh2b40tElTs3Y=
        go.modverify: h1:v5fcysjrx7yqtD/aO+QwRjYZOKnaM9Uh2b40tElTs3Y=
    $

Or suppose we fix that line but then modify the v1.3.0 hash. Now our build succeeds, because the build does not use v1.3.0, so its line is (correctly) ignored. But if we try to downgrade to v1.3.0, build verification fails:

    $ vgo build
    $ vgo get rsc.io/sampler@v1.3.0
    vgo: verifying rsc.io/sampler v1.3.0: module hash mismatch
        downloaded:   h1:7uVkIFmeBqHfdjD+gZwtXXI+RODJ2Wc4O7MPEh/QiW4=
        go.modverify: h1:8uVkIFmeBqHfdjD+gZwtXXI+RODJ2Wc4O7MPEh/QiW4=
    $
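The behavior shown in these sessions (an append-only log, a check limited to the modules a build actually uses, and a failure on mismatch) can be modeled with a short Go sketch. The module type and the parseLog and verify functions are hypothetical names, and the real prototype may differ in every detail; the sketch only mirrors the observable behavior described above.

```go
package main

import (
	"fmt"
	"strings"
)

// module identifies one module version, for example rsc.io/quote v1.5.2.
type module struct{ path, version string }

// parseLog reads go.modverify-style lines: path, version, hash, separated by spaces.
func parseLog(text string) map[module]string {
	log := make(map[module]string)
	for _, line := range strings.Split(text, "\n") {
		f := strings.Fields(line)
		if len(f) == 3 {
			log[module{f[0], f[1]}] = f[2]
		}
	}
	return log
}

// verify checks only the modules used by this build. A module with no entry
// is added to the log; an entry that disagrees with the download is an error.
// Entries for versions the build does not use are never consulted.
func verify(log map[module]string, used map[module]string) error {
	for m, downloaded := range used {
		recorded, ok := log[m]
		if !ok {
			log[m] = downloaded
			continue
		}
		if recorded != downloaded {
			return fmt.Errorf("verifying %s %s: module hash mismatch\n\tdownloaded:   %s\n\tgo.modverify: %s",
				m.path, m.version, downloaded, recorded)
		}
	}
	return nil
}

func main() {
	// The v1.3.0 line is deliberately corrupted (8 instead of 7), as in the text.
	log := parseLog(`
rsc.io/quote    v1.5.2  h1:w5fcysjrx7yqtD/aO+QwRjYZOKnaM9Uh2b40tElTs3Y=
rsc.io/sampler  v1.3.0  h1:8uVkIFmeBqHfdjD+gZwtXXI+RODJ2Wc4O7MPEh/QiW4=
`)
	build := map[module]string{
		{"rsc.io/quote", "v1.5.2"}:   "h1:w5fcysjrx7yqtD/aO+QwRjYZOKnaM9Uh2b40tElTs3Y=",
		{"rsc.io/sampler", "v1.3.1"}: "h1:F0c3J2nQCdk9ODsNhU3sElnvPIxM/xV1c/qZuAeZmac=",
	}
	fmt.Println(verify(log, build)) // succeeds: the corrupted v1.3.0 line is not used

	build = map[module]string{
		{"rsc.io/sampler", "v1.3.0"}: "h1:7uVkIFmeBqHfdjD+gZwtXXI+RODJ2Wc4O7MPEh/QiW4=",
	}
	fmt.Println(verify(log, build)) // fails: the downgrade now consults the corrupted line
}
```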

Developers who want to ensure that others rebuild their program with exactly the same sources they used can store a go.modverify in their repository. Then others building from the same repository automatically get verified builds. For now, only the go.modverify in the top-level module of the build applies. But note that go.modverify lists all dependencies, including indirect dependencies, so the whole build is verified.

The go.modverify feature helps detect unexpected mismatches between dependencies downloaded on different machines. It compares the hashes in go.modverify against the hashes computed and saved when each module was downloaded. It is also useful to check that downloaded modules have not changed on the local machine since they were downloaded. This is less about security against attacks and more about avoiding mistakes. For example, because source file paths appear in stack traces, it is common to open those files when debugging. If you accidentally (or, I suppose, intentionally) modify a file during the debugging session, it would be nice to be able to detect that later. The vgo verify command does this:

    $ go get -u golang.org/x/vgo  # fixed a bug, sorry! :-)
    $ vgo verify
    all modules verified
    $

If a source file changes, vgo verify notices:

    $ echo >>$GOPATH/src/v/rsc.io/quote@v1.5.2/quote.go
    $ vgo verify
    rsc.io/quote v1.5.2: dir has been modified (/Users/rsc/src/v/rsc.io/quote@v1.5.2)
    $

If we restore the file, all is well:

    $ gofmt -w $GOPATH/src/v/rsc.io/quote@v1.5.2/quote.go
    $ vgo verify
    all modules verified
    $

If a cached zip file is modified after download, vgo verify notices that too, although I can't plausibly explain how that might happen:

    $ zip $GOPATH/src/v/cache/rsc.io/quote/@v/v1.5.2.zip /etc/resolv.conf
      adding: etc/resolv.conf (deflated 36%)
    $ vgo verify
    rsc.io/quote v1.5.2: zip has been modified (/Users/rsc/src/v/cache/rsc.io/quote/@v/v1.5.2.zip)
    $

Because vgo keeps the original zip file after unpacking it, if vgo verify determines that only one of the zip file and the directory tree has been modified, it could even print a diff of the two.
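The decision behind those messages can be pictured as a comparison of three hashes: the one saved at download time, one recomputed from the cached zip, and one recomputed from the unpacked directory. The sketch below is an assumption about the logic, not the prototype's code; the diagnose name and the wording of the first and last results are hypothetical.

```go
package main

import "fmt"

// diagnose compares the hash recorded at download time with hashes recomputed
// from the cached zip file and from the unpacked directory tree.
func diagnose(recorded, zipHash, dirHash string) string {
	zipOK, dirOK := zipHash == recorded, dirHash == recorded
	switch {
	case zipOK && dirOK:
		return "verified"
	case zipOK:
		return "dir has been modified" // the zip is intact, so a diff against it is possible
	case dirOK:
		return "zip has been modified" // the directory is intact, so a diff against it is possible
	default:
		return "zip and dir have both been modified"
	}
}

func main() {
	// Placeholder hash strings; real values would be h1: hashes.
	fmt.Println(diagnose("h1:a", "h1:a", "h1:b"))
}
```

When exactly one copy still matches the recorded hash, that copy can serve as the trusted side of a diff, which is the case the paragraph above singles out.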

What's next?

This is already implemented in vgo. You can try it out and use it. As with the rest of vgo, feedback about what doesn't work right (or works well) is appreciated.

The functionality presented here is more the start of something than a finished feature. A cryptographic hash of the file tree is a building block. The go.modverify file built on top of it checks that developers all build a particular module with precisely the same dependencies, but there is no verification when downloading a new version of a module (unless someone else has already added it to go.modverify), nor is there any sharing of expected hashes between modules.

The exact details of how to fix those two shortcomings are not obvious. It may make sense to allow some kind of cryptographic signature of the file tree, and to verify that an upgrade finds a version signed with the same key as the previous version. Or it may make sense to adopt an approach along the lines of The Update Framework (TUF), although using its network protocols directly is not practical. Or, instead of per-repository go.modverify logs, it might make sense to establish some kind of shared global log, a bit like Certificate Transparency, or to use a public identity server like Upspin. There are many avenues we might explore, but all this is getting a little ahead of ourselves. For now, our focus is on successfully integrating versioning into the go command.
