This repository was archived by the owner on Mar 27, 2024. It is now read-only.

Layered analysis for single version packages #248

davidcassany
Contributor

This PR implements the interfaces to run package differs on layers. For the diff command, it compares the packages of the two images layer by layer. For the analyze command, it diffs the packages of each layer against the previous one (layers without package changes are omitted).

Fixes #246

This commit implements the interfaces to run package differs
on layers. For the diff command, it compares the packages of the two
images layer by layer. For the analyze command, it diffs the packages of
each layer against the previous one (layers without package changes are
omitted).

Signed-off-by: David Cassany <[email protected]>
This commit implements layer analysis for the single-version package analyzer for
images based on the apt package manager
Contributor

@nkubala nkubala left a comment

@davidcassany thanks for this PR! A few comments, in general this looks right though. Can you post some example output as well?

return analysis, err
}

func (a AptLayerAnalyzer) getPackages(image pkgutil.Image) ([]map[string]util.PackageInfo, error) {
Contributor

Looks like this shares a lot of code with AptAnalyzer.getPackages() in apt_diff.go. Could you pull this out into a shared method between the two implementations?

Contributor Author

Sure, you are right.

}

if len(image1.Layers) != len(image2.Layers) {
logrus.Infof("%s and %s have different number of layers, please consider using container-diff analyze to view the contents of each image in each layer", image1.Source, image2.Source)
Contributor

So what does the output look like for this case? We should maybe consider either a) not running the diff at all, or b) trying to do something smart to figure out which layers "match up" between the images. I could also be convinced to just run it as normal, but I'm not sure how useful the output will be.

Contributor Author

I agree that this diff is likely to be useless unless you know in advance that the images are somehow comparable and built using an equivalent procedure (same number of layers, layers organized in the same way, etc.). That is pretty unlikely to happen, I agree. I included it for completeness and because this is the approach followed in the layer diff.

I understand the concerns; IMHO this is the same issue raised in #239. As I already suggested in that issue, rather than trying to find some smart algorithm to choose the layers to diff, it could be easier to just let the user choose which layers to diff and decide which diff is actually meaningful.

I am also open to just not performing this diff at all, as you suggest in a).

Contributor

I think we should just go ahead and not run it here, the output will likely just be meaningless so I'd rather not add functionality that can be potentially confusing. I do like the idea of letting the user choose the specific layers to diff as an alternative though, I'll comment on the other issue.

Contributor Author

OK, I fully agree. For now I'll modify it to not run the diff here; in the context of #239 we would still have time to discuss whether it makes sense to run this diff.

}

pkgDiffs = append(pkgDiffs, pkgDiff)
}
Contributor

Each layer with modified packages includes a complete list of packages in its package database. Thus we diff the current layer with the previous one.

Could you explain this a bit more? Are you saying that each layer with modified packages contains all packages installed in the previous layer, so diffing those two gives you only the packages modified between layer i and layer i-1?

I think this block would also be a little more clear if it looked something like

for i := range pack {
	if len(pack[i]) == 0 {
		continue
	}
	if i == 0 {
		pkgDiffs = append(pkgDiffs, util.GetMapDiff(make(map[string]util.PackageInfo), pack[i]))
	} else {
		pkgDiffs = append(pkgDiffs, util.GetMapDiff(pack[i-1], pack[i]))
	}
}

Contributor Author

Right, I haven't coded it as you suggest because we can't expect to have different packages in each layer. Imagine there are some layers that do not modify the package database, because they configure files or run some scripts or whatever; for those layers the result of getPackages is an empty map. I don't want to diff those empty maps with any other layer that actually modifies the package database, because the result would be the same as treating every package (regardless of the layer in which it was added) as a new addition.

So what I do here is compare only the layers that actually include a modified package database. That means the diff is not with the immediately previous layer, but with the previous layer that actually includes a package database.
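
To make the idea concrete, here is a minimal, self-contained sketch of diffing each layer against the last earlier layer that carried a package database. It is not the code from this PR: `PackageInfo`, `LayerDiff`, and `diffAgainstLastPackageDB` are hypothetical simplifications, and the real `util.PackageInfo` and `util.GetMapDiff` have different shapes.

```go
package main

import "fmt"

// PackageInfo is a simplified, hypothetical stand-in for util.PackageInfo.
type PackageInfo struct {
	Version string
}

// LayerDiff is a hypothetical result type used only in this sketch.
type LayerDiff struct {
	Layer   int
	Added   map[string]PackageInfo
	Removed map[string]PackageInfo
}

// diffAgainstLastPackageDB diffs every layer that carries a package database
// against the most recent earlier layer that also carried one. Layers with an
// empty package map are skipped and never become the baseline.
func diffAgainstLastPackageDB(layers []map[string]PackageInfo) []LayerDiff {
	var diffs []LayerDiff
	prev := map[string]PackageInfo{} // empty baseline before the first package db
	for i, cur := range layers {
		if len(cur) == 0 {
			continue // this layer did not touch the package database
		}
		d := LayerDiff{
			Layer:   i,
			Added:   map[string]PackageInfo{},
			Removed: map[string]PackageInfo{},
		}
		for name, info := range cur {
			if _, ok := prev[name]; !ok {
				d.Added[name] = info
			}
		}
		for name, info := range prev {
			if _, ok := cur[name]; !ok {
				d.Removed[name] = info
			}
		}
		diffs = append(diffs, d)
		prev = cur // only layers with a package db update the baseline
	}
	return diffs
}

func main() {
	layers := []map[string]PackageInfo{
		{"bash": {Version: "4.3"}},
		{}, // e.g. a layer that only copies configuration files
		{"bash": {Version: "4.3"}, "curl": {Version: "7.38"}},
	}
	for _, d := range diffAgainstLastPackageDB(layers) {
		fmt.Printf("layer %d: %d added, %d removed\n", d.Layer, len(d.Added), len(d.Removed))
	}
}
```

With this input, layer 2 is compared against layer 0 rather than the empty layer 1, so only curl counts as an addition there. Version changes are left out of the sketch to keep it short.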

Contributor

Ahh ok, that makes sense now. Could you just update your comment slightly to explain that it's not the previous layer, but the previous layer that contains a package db?

cmd/root.go Outdated
@@ -268,7 +268,7 @@ func getExtractPathForName(name string) (string, error) {

 func includeLayers() bool {
 	for _, t := range types {
-		if t == "layer" {
+		if strings.Contains(t, "layer") {
Contributor

I think we should keep this the way it was, this won't match up with the logic to retrieve the analyzers from the map in checkIfValidAnalyzer(). I'm guessing you did this so we could allow --types layers? We could add that as a valid string as well, but this would also match --types asdfasdfbnlayer which we probably shouldn't do.

Contributor Author

Well, I included it to distinguish the analyzers that require layer extraction from the ones that don't. I noticed the layers were only extracted for the layer analyzer, so I thought that enabling layer extraction for any analyzer with layer in its name could be a generic way to tell them apart with a simple string test. Obviously that assumes that any analyzer requiring extracted layers is named using the .*layer.* pattern. At the time of writing I had the layer, aptlayer and rpmlayer analyzers in mind. After all, checkIfValidAnalyzer runs first, so by the time includeLayers runs we can rely on having an already valid list of types to check whether one of them requires layer extraction.

It would also be fine with me to use a static list of types that require layer extraction.
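
A static list could look roughly like the sketch below. It is illustrative only: the `layerAnalyzers` set is an assumption built from the analyzer names mentioned above, and `includeLayers` takes the types as a parameter here for self-containment, whereas the real function in cmd/root.go reads them from a package-level variable.

```go
package main

import "fmt"

// layerAnalyzers lists the analyzer types that need extracted layers.
// The names come from this discussion; the set itself is an assumption.
var layerAnalyzers = map[string]bool{
	"layer":    true,
	"aptlayer": true,
	"rpmlayer": true,
}

// includeLayers reports whether any requested type needs layer extraction.
// Exact matching means a string such as "asdfasdfbnlayer" is not accepted.
func includeLayers(types []string) bool {
	for _, t := range types {
		if layerAnalyzers[t] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(includeLayers([]string{"apt", "aptlayer"}))
	fmt.Println(includeLayers([]string{"asdfasdfbnlayer"}))
}
```

Compared with `strings.Contains`, an exact lookup does not depend on checkIfValidAnalyzer having run first, at the cost of updating the set whenever a new layer-based analyzer is added.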

@davidcassany
Contributor Author

@nkubala this is the result of container-diff analyze daemon://gcr.io/gcp-runtimes/multi-modified:latest --type aptlayer. I have cut some package lists to shorten the output, just to show you what it looks like.

-----AptLayer-----

For Layer 0:
Packages added in this layer:
NAME                           VERSION                         SIZE
-acl                           2.2.52-2                        258K
-adduser                       3.113+nmu3                      1M
-apt                           1.0.9.8.4                       3.1M
-base-files                    8+deb8u9                        413K
-base-passwd                   3.5.37                          185K
-bash                          4.3-11+deb8u1                   4.9M
-bsdutils                      1:2.25.2-6                      181K
-coreutils                     8.23-4                          13.9M
-dash                          0.5.7-4+b1                      191K
-debconf                       1.5.56+deb8u1                   614K
(...)

For Layer 1:
Deleted packages from previous layers: None

Packages added in this layer:
NAME                        VERSION                                 SIZE
-ca-certificates            20141019+deb8u3                         367K
-curl                       7.38.0-4+deb8u5                         325K
-libcurl3                   7.38.0-4+deb8u5                         586K
-libffi6                    3.1-2+deb8u1                            43K
-libgmp10                   2:6.0.0+dfsg-6                          556K
-libgnutls-deb0-28          3.3.8-6+deb8u7                          1.8M
-libgssapi-krb5-2           1.12.1+dfsg-19+deb8u2                   393K
-libhogweed2                2.7.1-5+deb8u2                          223K
-libicu52                   52.1-8+deb8u5                           26.7M
-libidn11                   1.29-1+deb8u2                           319K
-libk5crypto3               1.12.1+dfsg-19+deb8u2                   281K
-libkeyutils1               1.5.9-5+b1                              55K
-libkrb5-3                  1.12.1+dfsg-19+deb8u2                   973K
-libkrb5support0            1.12.1+dfsg-19+deb8u2                   137K
-libldap-2.4-2              2.4.40+dfsg-1+deb8u3                    471K
-libnettle4                 2.7.1-5+deb8u2                          331K
-libp11-kit0                0.20.7-1                                299K
-libpsl0                    0.5.1-1                                 510K
-librtmp1                   2.4+20150115.gita107cef-1+deb8u1        160K
-libsasl2-2                 2.1.26.dfsg1-13+deb8u1                  171K
-libsasl2-modules-db        2.1.26.dfsg1-13+deb8u1                  82K
-libssh2-1                  1.4.3-4.1+deb8u1                        229K
-libssl1.0.0                1.0.1t-1+deb8u6                         3M
-libtasn1-6                 4.2-3+deb8u3                            131K
-openssl                    1.0.1t-1+deb8u6                         1.1M
-wget                       1.16-1+deb8u2                           1.7M

Version differences: None

For Layer 2:
Deleted packages from previous layers: None

Packages added in this layer:
NAME                           VERSION                         SIZE
-acl                           2.2.52-2                        258K
-adduser                       3.113+nmu3                      1M
-apt                           1.0.9.8.4                       3.1M
-base-files                    8+deb8u9                        413K
-base-passwd                   3.5.37                          185K
-bash                          4.3-11+deb8u1                   4.9M
-bsdutils                      1:2.25.2-6                      181K
-bzr                           2.6.0+bzr6595-6                 100K
-coreutils                     8.23-4                          13.9M
-dash                          0.5.7-4+b1                      191K
-debconf                       1.5.56+deb8u1                   614K
-debconf-i18n                  1.5.56+deb8u1                   1.1M
(...)

For Layer 3:
Deleted packages from previous layers: None

Packages added in this layer:
NAME                                 VERSION                                 SIZE
-autoconf                            2.69-8                                  1.9M
-automake                            1:1.14.1-4+deb8u1                       1.7M
-autotools-dev                       20140911.1                              129K
-binutils                            2.25-5+deb8u1                           20.1M
-bzip2                               1.0.6-7+b3                              119K
-ca-certificates                     20141019+deb8u3                         367K
-comerr-dev                          2.1-1.42.12-2+b1                        82K
-cpp                                 4:4.9.2-2                               65K
-cpp-4.9                             4.9.2-10                                16M
(...)

For Layer 4:
Deleted packages from previous layers: None

Packages added in this layer:
NAME                           VERSION                         SIZE
-acl                           2.2.52-2                        258K
-adduser                       3.113+nmu3                      1M
-apt                           1.0.9.8.4                       3.1M
-base-files                    8+deb8u9                        413K
-base-passwd                   3.5.37                          185K
-bash                          4.3-11+deb8u1                   4.9M
-bsdutils                      1:2.25.2-6                      181K
(...)

For Layer 5:
Deleted packages from previous layers: None

Packages added in this layer:
NAME                                 VERSION                                 SIZE
-autoconf                            2.69-8                                  1.9M
-automake                            1:1.14.1-4+deb8u1                       1.7M
-autotools-dev                       20140911.1                              129K
-binutils                            2.25-5+deb8u1                           20.1M
-bzip2                               1.0.6-7+b3                              119K
(...)

For Layer 6: No package changes 

For Layer 7: No package changes 

For Layer 8:
Deleted packages from previous layers: None

Packages added in this layer:
NAME                                VERSION                         SIZE
-acl                                2.2.52-2                        258K
-adduser                            3.113+nmu3                      1M
-apt                                1.0.9.8.4                       3.1M
-base-files                         8+deb8u9                        413K
-base-passwd                        3.5.37                          185K
-bash                               4.3-11+deb8u1                   4.9M
-bsdutils                           1:2.25.2-6                      181K
(...)

* Add a shared method in getPackages for AptAnalyzer and AptLayerAnalyzer
* Explicit validation of the type flag in includeLayers method
@nkubala
Contributor

nkubala commented Jul 20, 2018

@davidcassany output looks good! Just a few small comments then LGTM

Remove Diff implementation of single version packages on layers, print
a not supported warning message instead and raise error.
@nkubala nkubala merged commit 509496e into GoogleContainerTools:master Jul 23, 2018