From 9557ef6b15e9441f752f8b8660ac0f932b2fd18b Mon Sep 17 00:00:00 2001 From: cvgw Date: Wed, 12 Feb 2020 16:19:36 -0800 Subject: [PATCH 1/6] design proposal 01: filesystem resolution --- .../filesystem-resolution-proposal-01.md | 110 ++++++++++++++++++ 1 file changed, 110 insertions(+) create mode 100644 docs/design_proposals/filesystem-resolution-proposal-01.md diff --git a/docs/design_proposals/filesystem-resolution-proposal-01.md b/docs/design_proposals/filesystem-resolution-proposal-01.md new file mode 100644 index 000000000..3ab3f46c2 --- /dev/null +++ b/docs/design_proposals/filesystem-resolution-proposal-01.md @@ -0,0 +1,110 @@ +# Filesystem Resolution 01 + +* Author(s): cgwippern@google.com +* Reviewers: +* Date: 2020-02-12 +* Status: [Reviewed/Cancelled/Under implementation/Complete] + +Here is a brief explanation of the Statuses + +1. Reviewed: The proposal PR has been accepted, merged and ready for + implementation. +2. Under implementation: An accepted proposal is being implemented by actual work. + Note: The design might change in this phase based on issues during + implementation. +3. Cancelled: During or before implementation the proposal was cancelled. + It could be due to: + * other features added which made the current design proposal obsolete. + * No longer a priority. +4. Complete: This feature/change is implemented. + +## Background + +Kaniko builds Docker image layers as overlay filesystem layers; specifically it +creates a tar file which contains the entire content of a given layer in the +overlay filesystem. Each overlay layer corresponds to one image layer. + +Overlay filesystems should only contain the objects changed in each layer; +meaning that if only one file changes between some layer A and some B, layer B +would only contain a single file (the one that changed). + +To accomplish this, Kaniko walks the entire filesystem to discover every object. +Some of these objects may actually be a symlink to another object in the +filesystem; in these cases we must consider both the link and the target object. + +Kaniko also maintains a set of whitelisted (aka ignored) filepaths. Any object +which matches one of these filepaths should be ignored by kaniko. + +This results in a 3 dimensional search space + +* changed relative to previous layer +* symlink +* whitelisted + +Kaniko must also track which objects are referred to by multiple stages; this +functionality is out of scope for this proposal. + +This search space is currently managed in an inconsistent and somewhat ad-hoc +way; code that manages the various search dimensions is spread out and +duplicated. There are also a number of poorly handled edge cases which continue +to cause bugs. + +The search space dimensions cannot be reduced or substituted. + +Currently there are a number of bugs around symlinks incorrectly resolved, +whitelists not respected, and unchanged files added to layers. + +## Design + +Resolution of the filesystem and the objects it contains should be handled +consistently and with a single API. + +* Callers of this API should not be concerned with the type of object at a given filepath (e.g. symlink or not). +* Callers of this API should not be concerned with whether a given path is whitelisted. +* This API should return a set of filepaths which should be added to the layer + without further processing. + +The API should take a limited set of arguments +* The root path in the filesystem +* The whitelist + +The API should return only two arguments +* A set of filepaths which should be added to the layer without further + processing +* error + +### Open Issues/Questions + +\ + +Resolution: Not Yet Resolved + +## Implementation plan + +* Write the new API +* Write tests for the new API +* Integrate the new API into existing code + +## Integration test plan + +Add integration tests to the existing suite which cover the known bugs + +## Notes + +Given some path `/usr/lib/foo` which is a link to `/etc/foo/` + +And `/etc/foo` contains `/etc/foo/bar.txt` + +Adding a link `/usr/lib/foo/bar.txt` => `/etc/foo/bar.txt` will break the image + +In a linux shell this raises an error +``` +$ ls /usr/lib/bar +=> /usr/lib/bar/foo.txt +$ ln -s /usr/lib/bar barlink +$ ln -s /usr/lib/bar/foo.txt barlink/foo.txt +=> ERROR +``` + +Given some path `/usr/foo/bar` which is a link to `/dev/null`, and `/dev` is +whitelisted `/dev/null` should not be added to the image. From 002642ef113ac8bc7f1a159e32bcd746b99bbad8 Mon Sep 17 00:00:00 2001 From: cvgw Date: Thu, 13 Feb 2020 17:54:43 -0800 Subject: [PATCH 2/6] proposal 01 updates --- .../filesystem-resolution-proposal-01.md | 41 +++++++++++++++++-- 1 file changed, 37 insertions(+), 4 deletions(-) diff --git a/docs/design_proposals/filesystem-resolution-proposal-01.md b/docs/design_proposals/filesystem-resolution-proposal-01.md index 3ab3f46c2..09aa55e61 100644 --- a/docs/design_proposals/filesystem-resolution-proposal-01.md +++ b/docs/design_proposals/filesystem-resolution-proposal-01.md @@ -46,7 +46,7 @@ functionality is out of scope for this proposal. This search space is currently managed in an inconsistent and somewhat ad-hoc way; code that manages the various search dimensions is spread out and -duplicated. There are also a number of poorly handled edge cases which continue +duplicated. There are also a number of edge cases which continue to cause bugs. The search space dimensions cannot be reduced or substituted. @@ -56,8 +56,32 @@ whitelists not respected, and unchanged files added to layers. ## Design -Resolution of the filesystem and the objects it contains should be handled -consistently and with a single API. +During snapshotting resolution of the filesystem and the objects it contains should be handled +consistently and with a pair of APIs. + +### First API +The first API will generate a list of filepaths to resolve + +The API should take a limited set of arguments +* A filesystem root + +The API should return only two arguments +* A set of filepaths to resolve +* error + +The API will walk the filesystem similar to how Snapshotting of the full +filesystem is currently handled. + +It will optimize the walk by skipping any whitelisted directories. + +It will not resolve symlinks. + +### Second API +The second API will resolve filepaths to be added to the layer + +Callers of the first API can pass the output to the second API + +Some callers of the second API will not use the first API (e.g. Copy commands) * Callers of this API should not be concerned with the type of object at a given filepath (e.g. symlink or not). * Callers of this API should not be concerned with whether a given path is whitelisted. @@ -65,7 +89,7 @@ consistently and with a single API. without further processing. The API should take a limited set of arguments -* The root path in the filesystem +* A list of absolute filepaths to scan * The whitelist The API should return only two arguments @@ -73,10 +97,19 @@ The API should return only two arguments processing * error +The API will iterate over the set of filepaths and for each item +* check whether it is whitelisted; if it is, skip it +* check whether it is a symlink; if it is, resolve the link ancestor and the + target, add the link ancestor to the output, check whether the target is whitelisted and if + not add the target to the output + ### Open Issues/Questions \ +Given some link `/foo/link/bar` whose target is a whitelisted path such as +`/var/run`, should `/foo/link/bar` be added to the layer? + Resolution: Not Yet Resolved ## Implementation plan From c7148eba3a49db3d9b94435de6d6eea47e9a1675 Mon Sep 17 00:00:00 2001 From: cvgw Date: Fri, 14 Feb 2020 08:40:26 -0800 Subject: [PATCH 3/6] proposal 01 add more questions --- .../filesystem-resolution-proposal-01.md | 24 +++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/docs/design_proposals/filesystem-resolution-proposal-01.md b/docs/design_proposals/filesystem-resolution-proposal-01.md index 09aa55e61..9dabfd43d 100644 --- a/docs/design_proposals/filesystem-resolution-proposal-01.md +++ b/docs/design_proposals/filesystem-resolution-proposal-01.md @@ -112,6 +112,30 @@ Given some link `/foo/link/bar` whose target is a whitelisted path such as Resolution: Not Yet Resolved +\ + +According to [this comment](https://github.com/GoogleContainerTools/kaniko/blob/1e9f525509d4e6a066a6e07ab9afbef69b3a3b2c/pkg/snapshot/snapshot.go#L193) +the ancestor directories (parent, grandparent, etc) must also be added to the +layer to preserve the permissions on those directories. This brings into +question whether any filtering needs to happen on these ancestors. IIUC the +current whitelist logic it is possible for `/some/dir` to be whitelisted but not +`/some/dir/containing-a-file.txt`. If filtering needs to be applied to these +ancestors does it make most sense to handle this within the proposed filtering +API? + +Resolution: Not Yet Resolved + +\ + +The proposal currently states that the list of files returned from the API +should be immediately added to the layer, but this would imply that diff'ing +existing files, finding newly created files, and handling deleted files would +have already been done. It may be advantageous to handle these outside of the +API in order to reduce scope and complexity. If these are handled outside of the +API how can we decouple and encapsulate these two functions? + +Resolution: Not Yet Resolved + ## Implementation plan * Write the new API From e84b91125f808d128065b86fd0192089f4e9219d Mon Sep 17 00:00:00 2001 From: cvgw Date: Mon, 17 Feb 2020 11:23:32 -0800 Subject: [PATCH 4/6] proposal updates --- .../filesystem-resolution-proposal-01.md | 55 +++++++------------ 1 file changed, 19 insertions(+), 36 deletions(-) diff --git a/docs/design_proposals/filesystem-resolution-proposal-01.md b/docs/design_proposals/filesystem-resolution-proposal-01.md index 9dabfd43d..76508a24a 100644 --- a/docs/design_proposals/filesystem-resolution-proposal-01.md +++ b/docs/design_proposals/filesystem-resolution-proposal-01.md @@ -56,52 +56,31 @@ whitelists not respected, and unchanged files added to layers. ## Design -During snapshotting resolution of the filesystem and the objects it contains should be handled -consistently and with a pair of APIs. - -### First API -The first API will generate a list of filepaths to resolve - -The API should take a limited set of arguments -* A filesystem root - -The API should return only two arguments -* A set of filepaths to resolve -* error - -The API will walk the filesystem similar to how Snapshotting of the full -filesystem is currently handled. - -It will optimize the walk by skipping any whitelisted directories. - -It will not resolve symlinks. - -### Second API -The second API will resolve filepaths to be added to the layer - -Callers of the first API can pass the output to the second API - -Some callers of the second API will not use the first API (e.g. Copy commands) +During snapshotting, filepaths should be resolved using a consitent API which +takes into account both symlinks and whitelist. * Callers of this API should not be concerned with the type of object at a given filepath (e.g. symlink or not). * Callers of this API should not be concerned with whether a given path is whitelisted. -* This API should return a set of filepaths which should be added to the layer - without further processing. +* This API should return a set of filepaths which can be checked for changes + without further link resolution or whitelist checking. The API should take a limited set of arguments * A list of absolute filepaths to scan * The whitelist The API should return only two arguments -* A set of filepaths which should be added to the layer without further - processing -* error +* A set of filepaths +* error or nil The API will iterate over the set of filepaths and for each item * check whether it is whitelisted; if it is, skip it -* check whether it is a symlink; if it is, resolve the link ancestor and the - target, add the link ancestor to the output, check whether the target is whitelisted and if - not add the target to the output +* check whether it is a symlink + * if it is a symlink + * resolve the link ancestor (nearest ancestor which is a symlink) and the + target + * add the link ancestor to the output + * check whether the target is whitelisted and if + not add the target to the output ### Open Issues/Questions @@ -110,7 +89,9 @@ The API will iterate over the set of filepaths and for each item Given some link `/foo/link/bar` whose target is a whitelisted path such as `/var/run`, should `/foo/link/bar` be added to the layer? -Resolution: Not Yet Resolved +Resolution: Resolved + +Yes, it should be added. \ @@ -134,7 +115,9 @@ have already been done. It may be advantageous to handle these outside of the API in order to reduce scope and complexity. If these are handled outside of the API how can we decouple and encapsulate these two functions? -Resolution: Not Yet Resolved +Resolution: Resolved + +The API will not handle file diffing or whiteouts. ## Implementation plan From 84047d4bcd21734d3f508484db2a40dbaf919c47 Mon Sep 17 00:00:00 2001 From: cvgw Date: Mon, 17 Feb 2020 11:27:19 -0800 Subject: [PATCH 5/6] add example signature for API --- docs/design_proposals/filesystem-resolution-proposal-01.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/docs/design_proposals/filesystem-resolution-proposal-01.md b/docs/design_proposals/filesystem-resolution-proposal-01.md index 76508a24a..741bd59c7 100644 --- a/docs/design_proposals/filesystem-resolution-proposal-01.md +++ b/docs/design_proposals/filesystem-resolution-proposal-01.md @@ -72,6 +72,11 @@ The API should return only two arguments * A set of filepaths * error or nil +The signature of the API should look similar to +``` + ResolveFilePaths(inputPaths []string, whitelist []WhitelistEntry) (resolvedPaths []string, err error) +``` + The API will iterate over the set of filepaths and for each item * check whether it is whitelisted; if it is, skip it * check whether it is a symlink From d31d2bc74c884ff5d7932b1f6e7209c6fc63cc5e Mon Sep 17 00:00:00 2001 From: cvgw Date: Mon, 17 Feb 2020 13:32:28 -0800 Subject: [PATCH 6/6] update about filesWithParentDirs --- .../filesystem-resolution-proposal-01.md | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/docs/design_proposals/filesystem-resolution-proposal-01.md b/docs/design_proposals/filesystem-resolution-proposal-01.md index 741bd59c7..205c2a882 100644 --- a/docs/design_proposals/filesystem-resolution-proposal-01.md +++ b/docs/design_proposals/filesystem-resolution-proposal-01.md @@ -87,6 +87,11 @@ The API will iterate over the set of filepaths and for each item * check whether the target is whitelisted and if not add the target to the output +All ancestors of each filepath will also be added to the list, but the previous +checks will not be applied to the ancestors. This maintains the current behavior +which we believe is needed to maintain correct permissions on the ancestor +directories. + ### Open Issues/Questions \ @@ -109,7 +114,9 @@ current whitelist logic it is possible for `/some/dir` to be whitelisted but not ancestors does it make most sense to handle this within the proposed filtering API? -Resolution: Not Yet Resolved +Resolution: Resolved + +Yes, this should be handled in the API \