Commit Graph

117 Commits

Author SHA1 Message Date
Cole Wippern 2755ae4470 Final cachekey for stage
Store the last cachekey generated for each stage
If the base image for a stage is present in the map of digest
and cachekeys use the retrieved cachekey instead of the base image
digest in the compositecache
2019-11-27 14:40:05 -08:00
Victor Noel db12a77e6c Fix #776 2019-10-03 17:53:14 +02:00
Tejal Desai 469fdaa50d test 2019-09-13 11:49:30 -07:00
Tejal Desai f0e571839d add unit tests 2019-09-13 11:21:43 -07:00
Prashant 0158cbf70c Setting PATH for empty image as well 2019-09-04 17:23:59 +05:30
Prashant 17d1059ec4 Setting PATH to default PATH if PATH is missing 2019-09-04 16:44:55 +05:30
Prashant 2c44539151 Setting PATH 2019-08-30 13:33:36 +05:30
Deniz Zoeteman c425f02866 Reverted not including build args in cache key 2019-08-16 15:09:52 +02:00
Nao YONASHIRO 75fdad7319 chore: fix typo 2019-05-17 03:17:08 +09:00
Johannes 'fish' Ziemke 8c732f6f52 Fix kaniko caching (#639)
* Revert "Change cache key calculation to be more reproducible. (#525)"

This reverts commit 1ffae47fdd.

* Add logging of composition key back

* Do not include build args in cache key

This should be save, given that the commands will have the args included
when the cache key gets built.
2019-05-10 09:57:03 -05:00
dlorenc c8fabdf6e4
Fix arg handling for multi-stage images in COPY instructions. (#621) 2019-03-22 12:24:43 -05:00
dlorenc 246cc92a33
Optimize file copying and stage saving between stages. (#605)
This change calculates the exact files and directories needed between
stages used in the COPY command. Instead of saving the entire
stage as a tarball, we now save only the necessary files.
2019-03-13 07:47:28 -07:00
dlorenc 378a3f9573
Look for manifests in the local cache next to the full images. (#570)
Calculating a manifest from a v1.tarball is very expensive. We can
store those locally as well, and use them if they exist.

This should eventually be replaced with oci layout support once that exists
in ggcr.
2019-02-19 13:54:41 -06:00
dlorenc 8179c47f0d
Refactor the build loop to fetch stagebuilders earlier. (#558)
This will help with optimizations.
2019-02-12 20:43:22 -06:00
dlorenc 9047ccf7cc
This fixes a bug in the interaction between volumes in base images (#555)
and our snapshot optimizations.

If a previous base image has a volume, the directory is added to the
list of files to snapshot. That directory may not actually exist in the image.
2019-02-08 14:40:37 -06:00
dlorenc c2514305ef
Fix a bug in snapshotting with multi-stage images. (#546)
We previously had an optimization that would skip snapshotting mutli-stage images
when in an intermediate stage, until the very end.

This conflicted with another optimization to avoid snapshotting when no files had changed.
2019-01-30 13:57:02 -06:00
Daisuke Taniwaki f8f59ea4c6 Add insecure-registry and tls-skip-verify-registry flags (#537) 2019-01-29 13:29:47 -06:00
dlorenc 1ffae47fdd
Change cache key calculation to be more reproducible. (#525)
Before we were using the full image digest, but that contains a timestamp. Now
we only use the layers themselves and the image config (env vars, etc.).

Also fix a bug in unpacking the layers themselves. mtimes can change during unpacking,
so set them all once at the end.
2019-01-23 13:46:12 -06:00
dlorenc fcd1976d3b
Make the Digest calculation faster for locally-cached images. (#534)
Right now when we find a v1.Tarball in the local disk cache, we
recompute the digest. This is very expensive and redundant, because
we store tarballs by their digest and use that as a key to look them up.
2019-01-22 13:28:21 -06:00
dlorenc 170e0a2d94
Add a lot more timing data. (#518) 2019-01-10 13:27:55 -07:00
dlorenc 2255837142
Tighten up the timing around Dockerfile commands. (#514)
Right now this timing also includes the snapshot time.
2019-01-09 10:34:23 -08:00
dlorenc 5f6fbfe74f
Add support for timing data in JSON format. (#510) 2019-01-08 17:24:47 -08:00
MMeent e3bb8bc71a Adds COPY --from=previous stage name/number validation (#491)
* Adds COPY --from=previous stage name/number validation

This fixes an issue in which COPY --from=previous-stage-name would try to download docker image previous-stage-name instead of checking that previous-stage-name could be a named stage.

* Fix linting issues

goimports is implemented as 'gofmt + extras', so this should fix import warnings as well.

* Fix linting issues

Fixes linting issues introduced in the merge

* Fix linting issues.
2019-01-02 11:42:36 -06:00
dlorenc 8ced0930f4
Add more benchmarks (#487) 2018-12-17 10:23:40 -06:00
Priya Wadhwa a34ba5c233 Fixed merge conflict 2018-12-11 13:53:19 -08:00
Priya Wadhwa 7fd164deab Only parse .dockerignore once 2018-12-11 13:31:51 -08:00
Andrew Rynhard 01329d5ac1 Fix intermediate layer caching (#474)
* Fix intermediate layer caching

* Move the if statement into the ShouldTakeSnapshot function.

Also add some unit tests.
2018-12-10 11:34:06 -08:00
Sharif Elgamal 7f9ea39bf7
Avoid the cachedImage/remoteImage call loop (#483)
* Avoid the cachedImage/remoteImage call loop

* missed one function

* fix unit tests

* proper bool comparison
2018-12-10 10:11:05 -08:00
dlorenc 7611ea7a1d
Add support for COPY --from=<an unrelated image>. (#479)
Right now kaniko only supports COPY --from=<another stage>.
This commit adds support for the case where the referenced image is a remote image
in a registry that has not been used as a stage yet in the build.
2018-12-06 12:44:03 -06:00
Sharif Elgamal 7cde036f44
Add benchmarking code (#448)
* adding benchmarking code

* enable writing to file

* fix build

* time more stuff

* adding benchmarking to integration tests

* compare docker and kaniko times in integration tests

* Switch to setting benchmark file with an env var

* close file at the right time

* fix integration test with environment variables

* fix integration tests

* Adding benchmarking documentation to DEVELOPEMENT.md

* human readable benchmarking steps
2018-11-28 11:54:12 -08:00
Priya Wadhwa 5df363a0f6 Check if command is nil before optimizing
MAINTAINER returns nil since it's deprecated, so we should make sure we
don't add to the list of commands to optimize.
2018-11-13 10:12:03 -08:00
dlorenc 8408c53aa8
Improve cache layer uploads. (#443)
This change only uploads layers that were created from cache misses on RUN commands.
It also improves the cache-checking logic to handle this case.
Finally, it makes cache layer uploads happen in parallel with the rest of the build, logging
a warning if any fail.
2018-11-12 16:22:04 -06:00
dlorenc 063663e17b
Skip unpacking the base FS if there are no run commands (or only cached ones). (#440)
This is the final part of an optimization that I've been refactoring towards for awhile.
If the Dockerfile consists of no RUN commands, or cached RUN commands, followed by metadata-only
operations, we can skip downloading and unpacking the base image.
2018-11-12 12:51:45 -06:00
dlorenc 58b607b4d0
Fix caching for multi-step builds. (#441)
This change fixes that by properly "replaying" the Dockerfile and mutating the config when
calculating cache keys. Previously we were looking at the wrong cache key for each command
when there was more than one.
2018-11-09 12:28:18 -06:00
Sharif Elgamal 224b7e2b41
parse arg commands at the top of dockerfiles (#404)
* parse arg commands at the top of dockerfiles

* fix pointer reference bug and remove debugging

* fixing tests

* account for meta args with no value

* don't take fs snapshot if / is the only changed path

* move metaArgs inside KanikoStage

* removing unused property

* check for any directory instead of just /

* remove unnecessary check
2018-11-06 15:27:09 -08:00
dlorenc fc43e218f0
Buffer layers to disk as they are created. (#428)
When building Docker images, layers were previously stored in memory.
This caused obvious issues when manipulating large layers, which could
cause Kaniko to crash.
2018-11-06 09:26:54 -06:00
dlorenc 52a6ce6685
More cache cleanups: (#397)
- move the layer cache to an interface
- refactor the DockerCommand implementations to support Cached and non-cached implementations.
2018-11-01 09:11:21 -07:00
dlorenc 5ac29a9773
Use only the necessary files in the cache keys. (#387) 2018-10-15 08:56:34 -05:00
Sharif Elgamal effac9dfc3
Persistent volume caching for base images (#383)
* comments

* initial commit for persisent volume caching

* cache warmer works

* general cleanup

* adding some debugging

* adding missing files

* Fixing up cache retrieval and cleanup

* fix tests

* removing auth since we only cache public images

* simplifying the caching logic

* fixing logic

* adding volume cache to integration tests. remove auth from cache warmer image.

* add building warmer to integration-test

* move sample yaml files to examples dir

* small test fix
2018-10-11 13:38:05 -07:00
dlorenc 9a0e29c441
Refactor the build loop. (#385)
This change refactors the build loop a bit to make cache optimization easier in the future. Some notable changes:

The special casing around volume snapshots is removed. Every volume is added to the snapshotFiles list for every command that will snapshot anyway.
Snapshot saving was extracted to a sub-function
The decision on whether or not to snapshot was extracted
2018-10-09 12:15:17 -05:00
dlorenc 734ffe65ce
Rework cache key generation a bit. (#375)
* Rework cache key generation a bit.

Cache keys are now based on the previous commands, rather than the previous state
of the filesystem.

* Refactor command interface a bit, only cache the context for commands that use it.
2018-10-03 16:16:12 -05:00
priyawadhwa 8f0d257134
Merge pull request #334 from peter-evans/fix-volume-cmd
Fix handling of the volume directive
2018-10-01 14:49:33 -07:00
Jason Hall 5a0c9b2a13 Update go-containerregistry dep and remove unnecessary Options 2018-10-01 14:11:26 -04:00
Vincent Behar 49ab8e4979
Add a new flag to cleanup the filesystem at the end
Currently, kaniko can only build a single image per container run, because the filesystem is full of the content of the first image.
When running kaniko in Jenkins, where we need to start the container "doing nothing" first (using the debug kaniko container), and then exec /kaniko/executor, this is a limitation because it means that if we want to build multiple images, we need to start multiple containers - see https://groups.google.com/forum/#!topic/kaniko-users/_7LivHdMdy0 for more details

A solution to fix this issue is to add a new flag to cleanup the filesystem at the end - the same way it is done between stages when building a multi-stages image. This way, the same (debug) container can be used to build multiple images.
2018-09-28 10:25:33 +02:00
peter-evans b1e28ddb4f Fix handling of volume directive 2018-09-28 11:16:25 +09:00
priyawadhwa 1a13c81be8
Merge pull request #348 from priyawadhwa/entrypoint
Review config for cmd/entrypoint after building a stage
2018-09-26 21:26:42 +01:00
Priya Wadhwa e2ca1152f4 Rename flags and default caching to false
Rename --use-cache to --cache, and --cache to --cache-repo to clarify
what the flags are used for. Default caching to false.
2018-09-24 13:18:42 -07:00
Priya Wadhwa 1e1c98229c Merged master, fixed merge conflict 2018-09-17 11:12:29 +01:00
Priya Wadhwa cd1b957e43 Address code review comments; review unnecessary error check 2018-09-17 11:11:51 +01:00
Priya Wadhwa c216fbf91b Add layer caching to kaniko
To add layer caching to kaniko, I added two flags: --cache and
--use-cache.

If --use-cache is set, then the cache will be used, and if --cache is
specified then that repo will be used to store cached layers. If --cache
isn't set, a cache will be inferred from the destination provided.

Currently, caching only works for RUN commands. Before executing the
command, kaniko checks if the cached layer exists. If it does, it pulls
it and extracts it. It then adds those files to the snapshotter and
append a layer to the config history.  If the cached layer does not exist, kaniko executes the command and
pushes the newly created layer to the cache.

All cached layers are tagged with a stable key, which is built based off
of:

1. The base image digest
2. The current state of the filesystem
3. The current command being run
4. The current config file (to account for metadata changes)

I also added two integration tests to make sure caching works

1. Dockerfile_test_cache runs 'date', which should be exactly the same
the second time the image is built
2. Dockerfile_test_cache_install makes sure apt-get install can be
reproduced
2018-09-13 18:32:53 -07:00
Priya Wadhwa 7a6dfb6d8b Removed incorrect FS extraction from earlier merge with master, and fixed linting errors 2018-09-12 17:10:03 -07:00
Priya Wadhwa da6f099820 Merge branch 'master' of github.com:GoogleContainerTools/kaniko into entrypoint 2018-09-12 16:45:25 -07:00
Priya Wadhwa ee9aa954ac merged master, fixed conflicts 2018-09-12 16:43:32 -07:00
Priya Wadhwa bf72328611 Addressed code review comment, removed stuttering variable names 2018-09-12 16:36:53 -07:00
Priya Wadhwa d923d5ef02 Fix integration test 2018-09-11 10:07:54 -07:00
Tejal Desai 06defa6552
Merge pull request #337 from priyawadhwa/hasher
Add Key() to LayeredMap and Snapshotter
2018-09-11 09:29:50 -07:00
Priya Wadhwa 5d2d2829d0 Review config for cmd/entrypoint after building a stage
As mentioned in #346, if only ENTRYPOINT is set in a stage then any
CMD inherited from a parent should be cleared.

If both entrypoint and cmd are set then nothing should change.

I added a function and unit test to review the config file after building a stage
which clears out config.Cmd if ENTRYPOINT was declared but CMD wasn't.

I also added an integration test to make sure this works, which should
be tested by the preexisting container-diff --metadata test.
2018-09-10 18:15:51 -07:00
Priya Wadhwa d9022dd7de Refactor build into stageBuilder type
Refactoring builds by stage will make it easier to generate cache keys
for layers, since the stageBuilder type will contain everything required
to generate the key:

1. Base image with digest
2. Config file
3. Snapshotter (which will provide a key for the filesystem)
4. The current command (which will be passed in)
2018-09-07 17:17:32 -07:00
priyawadhwa 4dc34343b6
Merge pull request #320 from priyawadhwa/stages
Added a KanikoStage type for each stage of a Dockerfile
2018-09-07 16:19:40 -07:00
Priya Wadhwa 13accbaf32 Add Key() to LayeredMap and Snapshotter
This will return a string representaiton of the current filesystem to be
used with caching.

Whenever a file is explictly added (via ADD or COPY), it will be stored
in "added" in the LayeredMap. The file will map to a hash created by
CacheHasher (which doesn't take into account mtime, since that will be
different with every build, making the cache useless)

Key() will returns a sha of the added files which will be used in
determining the overall cache key for a command.
2018-09-04 13:42:33 -07:00
Priya Wadhwa 4f3ab61b96 Add CacheCommand to DockerCommand interface
CacheCommand returns true if the command should be cached. Currently,
it's only true for RUN but can be added to ADD/COPY later on (these are
different since the contents of files for ADD/COPY need to be included
in the cache key generation).

I also changed CreatedBy to String so that we can log each command
before cache extraction or regular execution takes place.
2018-09-04 13:16:05 -07:00
Priya Wadhwa 1db7fc2a61 Rebased 2018-08-30 10:16:08 -07:00
Priya Wadhwa 3dddb82bed Updated created by time for built image
Should fix #312
2018-08-29 16:56:53 -07:00
Priya Wadhwa 935d322f1d Rebased on master 2018-08-27 14:18:24 -07:00
Priya Wadhwa 64a0b1d75f Added a KanikoStage type for each stage of a Dockerfile
I added a KanikoStage to hold each stage of the Dockerfile along with
information about each stage that would be useful later on.

The new KanikoStage type holds the stage itself, along with some
additional information:

1. FinalStage -- whether the current stage is the final stage
2. BaseImageStoredLocally/BaseImageIndex -- whether the base image for
this stage is stored locally, and if so what the index of the base image
is
3. SaveStage -- whether this stage needs to be saved for use in a future
stage

This is the first part of a larger refactor for building stages, which
will later make it easier to add layer caching.
2018-08-27 14:15:04 -07:00
Christie Wilson 607af5f7a6 Always snapshot files in COPY and RUN commands
Kaniko uses mtime (as well as file contents and other attributes) to
determine if files have changed. COPY and ADD commands should _always_
update the mtime, because they actually overwrite the files. However it
turns out that the mtime can lag, so kaniko would sometimes add a new
layer when using COPY or ADD on a file, and sometimes would not. This
leads to a non-deterministic number of layers.

To fix this, we have updated the kaniko commands to be more
authoritative in declaring when they have changed a file (e.g. WORKDIR
will now only create the directory when it doesn't exist) and we will
trust those files and _always_ add them, instead of only adding them if
they haven't changed.

It is possible for RUN commands to also change the filesystem, in which
case kaniko has no choice but to look at the filesystem to determine
what has changed. For this case we have added a call to `sync` however
we still cannot guarantee that sometimes the mtime will not lag, causing the
number of layers to be non-deterministic. However when I tried to cause
this behaviour with the RUN command, I couldn't.

This changes the snapshotting logic a bit; before this change, the last
command of the last stage in a Dockerfile would always scan the whole
file system and ignore the files returned by the kaniko command. Instead
we will now trust those files and assume that the snapshotting
performed by previous commands will be adequate.

Docker itself seems to rely on the storage driver to determine when
files have changed and so doesn't have to deal with these problems
directly.

An alternative implementation would use `inotify` to track which files
have changed. However that would mean watching every file in the
filesystem, and adding new watches as files are added. Not only is there
a limit on the number of files that can be watched, but according to the
man pages a) this can take a significant amount of time b) there is
complication around when events arrive (e.g. by the time they arrive,
the files may have changed) and lastly c) events can be lost, which
would mean we'd run into this non-deterministic behaviour again anyway.

Fixes #251
2018-08-23 18:23:39 -07:00
Priya Wadhwa cfa822f178 Refactor command line arguments and the executor
In this refactor I:

1. Created KanikoOptions to make it easier to pass around arguments
passed in through the command line
2. Reorganized executor.go by putting the logic for pushing the image in
a new file push.go
3. Made some error messages clearer
4. Fixed a mistake in the README for pushing to AWS
5. Marked the --bucket flag as hidden since we want people to use
--context instead, and marked an aws flag as hidden which is set in a
vendored directorya
2018-08-23 13:30:36 -07:00