-
Notifications
You must be signed in to change notification settings - Fork 236
Add rpmlayer differ #252
Add rpmlayer differ #252
Conversation
This commit adds the rpmlayer differ that allows to perform analysis at rpm package level on each layer. Lists new, deleted and modified (different version) packages on each layer. Signed-off-by: David Cassany <[email protected]>
cmd/root.go
Outdated
@@ -59,7 +59,7 @@ const ( | |||
RemotePrefix = "remote://" | |||
) | |||
|
|||
var layerAnalyzers = [...]string{"layer", "aptlayer"} | |||
var layerAnalyzers = [...]string{"layer", "aptlayer", "rpmlayer"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since you're in here, would you mind making these strings constants somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will include the constants somewhere
|
||
// rpmDataFromLayerFS runs a local rpm binary, if any, to query the layer | ||
// rpmdb and returns an array of maps of installed packages. | ||
func rpmDataFromLayerFS(image pkgutil.Image) ([]map[string]util.PackageInfo, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic in this function is really similar to rpmDataFromImageFS
. Could you pull out some of the shared code between them and reuse it in both functions? Apart from iterating over layers here I think they're almost identical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also thinking about it, I can try to move some part to another function an see how it looks like. I think only lines 435->444 are clear candidates to be moved to another function.
} | ||
|
||
packages, err := rpmDataFromLayerFS(image) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we care about the error here? This fallback pattern makes sense to me here, but I feel like we might want to be logging this error (I know we're not doing it in RPMAnalyzer.getPackages()
either). At the very least we should add something to the log message saying something like Couldn't retrieve RPM data from extracted filesystem; running query in container
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, lets include some better logs
// Append layers one by one to an empty image and query rpm | ||
// database on each iteration | ||
for _, layer := range layers { | ||
tmpImage, err = mutate.AppendLayers(tmpImage, layer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're trying to retrieve a map of packages installed in each layer here, right? I'm wondering if we actually want to be appending each layer onto the previous layers before running rpmDataFromContainer()
, and instead create a single-layer image for each layer. Won't we be getting package information for the combination of all layers before the one we're actually examining here?
e.g. if I have layer A
with package foo==1.0.0
, and layer B
with package bar==1.0.0
, my maps here will look like
A
: {
foo
: 1.0.0
},
B
: {
foo
: 1.0.0
, <-- don't want this here right?
bar
: 1.0.0
,
}
To get around this you could just create a new random.Image
for each layer, append it, and then make the rpmDataFromContainer()
call. Does this make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To get around this you could just create a new random.Image for each layer, append it, and then make the rpmDataFromContainer() call. Does this make sense?
This is the same situation we have in APT packages, the package database reports all the installed packages, thus in each layer we will get always the list of all packages included in current and previous layers. To work around this there is the method singleVersionLayerAnalysis
container-diff/differs/package_differs.go
Lines 118 to 141 in c255953
func singleVersionLayerAnalysis(image pkgutil.Image, analyzer SingleVersionPackageLayerAnalyzer) (*util.SingleVersionPackageLayerAnalyzeResult, error) { | |
pack, err := analyzer.getPackages(image) | |
if err != nil { | |
return &util.SingleVersionPackageLayerAnalyzeResult{}, err | |
} | |
var pkgDiffs []util.PackageDiff | |
// Each layer with modified packages includes a complete list of packages | |
// in its package database. Thus we diff the current layer with the | |
// previous one that contains a package database. Layers that do not | |
// include a package database are omitted. | |
preInd := -1 | |
for i := range pack { | |
var pkgDiff util.PackageDiff | |
if preInd < 0 && len(pack[i]) > 0 { | |
pkgDiff = util.GetMapDiff(make(map[string]util.PackageInfo), pack[i]) | |
preInd = i | |
} else if preInd >= 0 && len(pack[i]) > 0 { | |
pkgDiff = util.GetMapDiff(pack[preInd], pack[i]) | |
preInd = i | |
} | |
pkgDiffs = append(pkgDiffs, pkgDiff) | |
} |
which basically performs a diff between layers.
Also I am appending the layers one by one because in this case the rpm
binary used to parse the database is the one included inside the image, thus at least the layer including this binary (most probably the base layer) should be also part of the container to run. In order to make things simpler without having to include some kind of logic to guess which layers should be appended and which aren't needed I opted to simply stack them one by one. Note that this is the fallback in case the host does not have the RPM tool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh right, ok I think I follow that. Thanks for the explanation!
* Refactored RPMLayerAnalyzer for better code reuse * Updated differs.go to initialize Analyzers map with constant keys * moved layerAnalyzers vector from root.go to differs.go Signed-off-by: David Cassany <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the cleanup, and for contributing this!
This commit adds the rpmlayer differ that allows to perform analysis
at rpm package level on each layer. Lists new, deleted and modified
(different version) packages on each layer.
Signed-off-by: David Cassany [email protected]