-
Notifications
You must be signed in to change notification settings - Fork 236
Layered analysis for single version packages #248
Layered analysis for single version packages #248
Conversation
This commit implements the interfaces to perform package differs on layers. For diff command it compares the packages of each image on each layer, one by one. For analyze command it diffs the packages of each layer within the previous one (layers without packages changes are omitted). Signed-off-by: David Cassany <[email protected]>
This commit implements layer analysis for single version package analyzer for images based on apt package manager
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidcassany thanks for this PR! A few comments, in general this looks right though. Can you post some example output as well?
return analysis, err | ||
} | ||
|
||
func (a AptLayerAnalyzer) getPackages(image pkgutil.Image) ([]map[string]util.PackageInfo, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this shares a lot of code with AptAnalyzer.getPackages()
in apt_diff.go
. Could you pull this out into a shared method between the two implementations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure you are right.
differs/package_differs.go
Outdated
} | ||
|
||
if len(image1.Layers) != len(image2.Layers) { | ||
logrus.Infof("%s and %s have different number of layers, please consider using container-diff analyze to view the contents of each image in each layer", image1.Source, image2.Source) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what does the output look like for this case? We should maybe consider either a) not running the diff at all, or b) trying to do something smart to figure out which layers "match up" between the images. I could also be convinced to just run it as normal, but I'm not sure how useful the output will be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this diff is likely to be useless unless you know in advance the images are somehow comparable and they are build using an equivalent procedure (same number of layers, layers organized in the same way, etc.). This is pretty unlikely to happen, agree. I included it for completeness and because this is the approach that is followed in the layer
diff.
I understand the concerns, IMHO this is the same issue exposed in #239. As I already suggested in the mentioned issue, I believe that probably rather than trying find out some smart algorithm to choose the layers to diff it could be easier to just let user choose which layers to diff and let user determine which diff is actually meaningful or meaningless.
I am also opened to just not perform this diff at all as you suggest in a).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just go ahead and not run it here, the output will likely just be meaningless so I'd rather not add functionality that can be potentially confusing. I do like the idea of letting the user choose the specific layers to diff as an alternative though, I'll comment on the other issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I fully agree. So for now I'll modify it to not run the diff here and in case #239 we would be still on time to discuss again if it makes sense to run this diff or not.
} | ||
|
||
pkgDiffs = append(pkgDiffs, pkgDiff) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each layer with modified packages includes a complete list of packages in its package database. Thus we diff the current layer with the previous one.
Could you explain this a bit more? Are you saying that each layer with modified packages contains all packages installed in the previous layer, so diffing those two gives you only the packages modified between layer i
and layer i-1
?
I think this block would also be a little more clear if it looked something like
for i := range pack {
if len(pack[i] == 0) {
continue
}
if i == 0 {
pkgDiffs = append(pkgDiffs, util.GetMapDiff(make(map[string]util.PackageInfo), pack[i]))
} else {
pkgDiffs = append(pkgDiffs, util.GetMapDiff(pack[i-1], pack[i]))
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I haven't coded it as you suggest because we can't expect to have different packages in each layer. Image there are some layers that do not modify the package database, because they configure files or run some scripts or whatever; for those layers the reuslt of the getPackages is an empty map. I don't want to diff those empty maps with any other layer that actually modifies the package database, because the result will be the same as treating any package (regardless the layer in which they were appended) as a new addition.
So what I do here to compare only layers which actually include a modified package database. That means the diff is not with the immediate previous layer, but with the previous layer that actually includes some package database.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh ok, that makes sense now. Could you just update your comment slightly to explain that it's not the previous layer, but the previous layer that contains a package db?
cmd/root.go
Outdated
@@ -268,7 +268,7 @@ func getExtractPathForName(name string) (string, error) { | |||
|
|||
func includeLayers() bool { | |||
for _, t := range types { | |||
if t == "layer" { | |||
if strings.Contains(t, "layer") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should keep this the way it was, this won't match up with the logic to retrieve the analyzers from the map in checkIfValidAnalyzer()
. I'm guessing you did this so we could allow --types layers
? We could add that as a valid string as well, but this would also match --types asdfasdfbnlayer
which we probably shouldn't do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I included it to distinguish the Analyzers that require layer extraction from the ones that don't require it. I noticed the layers were only extracted for the layer
analyzer, thus I though that enabling the layer extraction for any analyzer including layer
in its name could be a generic way to distinguish them from the rest in a simple string test. Obviously that assumes that any analyzer requiring extracted layers is named using the .*layer.*
pattern. At the time of writing it I was having in mind layer
, aptlayer
and rpmlayer
analyzers. After all checkIfValidAnalyzer
runs first, thus at the time of running includeLayers
we can realy on having an already valid list of types to check if there is one requiring layers extraction.
It would be also fine for me to use a static list of types that do require layer extraction.
@nkubala this is the result of
|
* Add a shared method in getPackages for AptAnalyzer and AptLayerAnalyzer * Explicit validation of the type flag in includeLayers method
@davidcassany output looks good! Just a few small comments then LGTM |
Remove Diff implementation of single version packages on layers, print a not supported warning message instead and raise error.
This PR implements the interfaces to perform package differs on layers. For diff command it compares the packages of each image on each layer, one by one. For analyze command it diffs the packages of each layer within the previous one (layers without packages changes are omitted).
Fixes #246