The directory created by `T.TempDir` is automatically removed when the
test and all its subtests complete.
Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
When deleting files of previous layers, the whiteout files
were not added to the tar file in a consistent order.
This change adds a stable sorting to the whiteout files and
adds unit tests to check for stable sorting.
During a snapshot, when a file changed and not its parent directories,
the parent directories weren't added to the layer. This is inconsistent
with Docker's behavior which always add parent directories to the layer.
In some edge-cases, it could lead to problems with docker considering
that parent directories where owned by root in forthcoming layers
although they shouldn't (see #1163).
Also, Docker seems to be POSIX compliant regarding the name of
directories in the archive, which always have a slash appended. This
commit also fixes this.
Fixes#1163
When using cache, the rootfs may not have been extracted. This prevents uname/gname from resolving
as there is no /etc/password or /etc/group. This makes this layer unnecessarily differ from
a cached layer which does contain this information. Omitting these should be consistent with Docker's
behavior.
filesToAdd is sorted in TakeSnapshotFS, but not here. This makes ordering unpredictable within the layer's tarball,
causing the SHA to differ even if layer contents haven't changed
When a Dockerfile command requires using the TakeSnapshotFS function,
the resulting layer has a random ordering of files. This causes the
layer to have a non-deterministic hash defeating the reproducible flag.
Issue #710 appears to document this issue as well.
To fix, always sort the list of files to be added in scanFullFilesystem.
This avoids trying to sort the file list during execution, and takes
almost no time to complete.
* Add parent directories of adding files
* Add integration Dockerfile to test parent directory permissions
* Remove unnecessary helper method
* Use a file on the internet for integration Dockerfile
When building Docker images, layers were previously stored in memory.
This caused obvious issues when manipulating large layers, which could
cause Kaniko to crash.
This will return a string representaiton of the current filesystem to be
used with caching.
Whenever a file is explictly added (via ADD or COPY), it will be stored
in "added" in the LayeredMap. The file will map to a hash created by
CacheHasher (which doesn't take into account mtime, since that will be
different with every build, making the cache useless)
Key() will returns a sha of the added files which will be used in
determining the overall cache key for a command.
This test had previously (before #231) been making a change to a file in
the kaniko dir, then checking that it isn't being snapshotted. This was
to test the whitelisting logic, which makes sure that changes to /kaniko
aren't included in images. However the test creates a temporary dir, so
the kaniko dir is actually in /tmp/<some temp dir>/kaniko, and
in #231 the logic was simplified to no longer have a special case for
tests. The test continued to pass because `MaybeAdd` noticed that the
kaniko file wasn't changing, and didn't add it. After changing this to
always add the files, it revealed that this was left behind by accident.
I also opened #307 to add integration test coverage for this logic.
I also marked `CheckErrorAndDeepEqual` as a helper function so that when
it fails, the line number reported is where that was called.
Kaniko uses mtime (as well as file contents and other attributes) to
determine if files have changed. COPY and ADD commands should _always_
update the mtime, because they actually overwrite the files. However it
turns out that the mtime can lag, so kaniko would sometimes add a new
layer when using COPY or ADD on a file, and sometimes would not. This
leads to a non-deterministic number of layers.
To fix this, we have updated the kaniko commands to be more
authoritative in declaring when they have changed a file (e.g. WORKDIR
will now only create the directory when it doesn't exist) and we will
trust those files and _always_ add them, instead of only adding them if
they haven't changed.
It is possible for RUN commands to also change the filesystem, in which
case kaniko has no choice but to look at the filesystem to determine
what has changed. For this case we have added a call to `sync` however
we still cannot guarantee that sometimes the mtime will not lag, causing the
number of layers to be non-deterministic. However when I tried to cause
this behaviour with the RUN command, I couldn't.
This changes the snapshotting logic a bit; before this change, the last
command of the last stage in a Dockerfile would always scan the whole
file system and ignore the files returned by the kaniko command. Instead
we will now trust those files and assume that the snapshotting
performed by previous commands will be adequate.
Docker itself seems to rely on the storage driver to determine when
files have changed and so doesn't have to deal with these problems
directly.
An alternative implementation would use `inotify` to track which files
have changed. However that would mean watching every file in the
filesystem, and adding new watches as files are added. Not only is there
a limit on the number of files that can be watched, but according to the
man pages a) this can take a significant amount of time b) there is
complication around when events arrive (e.g. by the time they arrive,
the files may have changed) and lastly c) events can be lost, which
would mean we'd run into this non-deterministic behaviour again anyway.
Fixes#251