Skip to content
This repository was archived by the owner on Mar 27, 2024. It is now read-only.

Add rpmlayer differ #252

Merged
merged 2 commits into from
Aug 10, 2018

Conversation

davidcassany
Copy link
Contributor

This commit adds the rpmlayer differ that allows to perform analysis
at rpm package level on each layer. Lists new, deleted and modified
(different version) packages on each layer.

Signed-off-by: David Cassany [email protected]

This commit adds the rpmlayer differ that allows to perform analysis
at rpm package level on each layer. Lists new, deleted and modified
(different version) packages on each layer.

Signed-off-by: David Cassany <[email protected]>
cmd/root.go Outdated
@@ -59,7 +59,7 @@ const (
RemotePrefix = "remote://"
)

var layerAnalyzers = [...]string{"layer", "aptlayer"}
var layerAnalyzers = [...]string{"layer", "aptlayer", "rpmlayer"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since you're in here, would you mind making these strings constants somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will include the constants somewhere


// rpmDataFromLayerFS runs a local rpm binary, if any, to query the layer
// rpmdb and returns an array of maps of installed packages.
func rpmDataFromLayerFS(image pkgutil.Image) ([]map[string]util.PackageInfo, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic in this function is really similar to rpmDataFromImageFS. Could you pull out some of the shared code between them and reuse it in both functions? Apart from iterating over layers here I think they're almost identical.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also thinking about it, I can try to move some part to another function an see how it looks like. I think only lines 435->444 are clear candidates to be moved to another function.

}

packages, err := rpmDataFromLayerFS(image)
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care about the error here? This fallback pattern makes sense to me here, but I feel like we might want to be logging this error (I know we're not doing it in RPMAnalyzer.getPackages() either). At the very least we should add something to the log message saying something like Couldn't retrieve RPM data from extracted filesystem; running query in container.

Copy link
Contributor Author

@davidcassany davidcassany Jul 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, lets include some better logs

// Append layers one by one to an empty image and query rpm
// database on each iteration
for _, layer := range layers {
tmpImage, err = mutate.AppendLayers(tmpImage, layer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're trying to retrieve a map of packages installed in each layer here, right? I'm wondering if we actually want to be appending each layer onto the previous layers before running rpmDataFromContainer(), and instead create a single-layer image for each layer. Won't we be getting package information for the combination of all layers before the one we're actually examining here?

e.g. if I have layer A with package foo==1.0.0, and layer B with package bar==1.0.0, my maps here will look like
A: {
foo: 1.0.0
},

B: {
foo: 1.0.0, <-- don't want this here right?
bar: 1.0.0,
}

To get around this you could just create a new random.Image for each layer, append it, and then make the rpmDataFromContainer() call. Does this make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get around this you could just create a new random.Image for each layer, append it, and then make the rpmDataFromContainer() call. Does this make sense?

This is the same situation we have in APT packages, the package database reports all the installed packages, thus in each layer we will get always the list of all packages included in current and previous layers. To work around this there is the method singleVersionLayerAnalysis

func singleVersionLayerAnalysis(image pkgutil.Image, analyzer SingleVersionPackageLayerAnalyzer) (*util.SingleVersionPackageLayerAnalyzeResult, error) {
pack, err := analyzer.getPackages(image)
if err != nil {
return &util.SingleVersionPackageLayerAnalyzeResult{}, err
}
var pkgDiffs []util.PackageDiff
// Each layer with modified packages includes a complete list of packages
// in its package database. Thus we diff the current layer with the
// previous one that contains a package database. Layers that do not
// include a package database are omitted.
preInd := -1
for i := range pack {
var pkgDiff util.PackageDiff
if preInd < 0 && len(pack[i]) > 0 {
pkgDiff = util.GetMapDiff(make(map[string]util.PackageInfo), pack[i])
preInd = i
} else if preInd >= 0 && len(pack[i]) > 0 {
pkgDiff = util.GetMapDiff(pack[preInd], pack[i])
preInd = i
}
pkgDiffs = append(pkgDiffs, pkgDiff)
}

which basically performs a diff between layers.

Also I am appending the layers one by one because in this case the rpm binary used to parse the database is the one included inside the image, thus at least the layer including this binary (most probably the base layer) should be also part of the container to run. In order to make things simpler without having to include some kind of logic to guess which layers should be appended and which aren't needed I opted to simply stack them one by one. Note that this is the fallback in case the host does not have the RPM tool.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh right, ok I think I follow that. Thanks for the explanation!

* Refactored RPMLayerAnalyzer for better code reuse
* Updated differs.go to initialize Analyzers map with constant keys
* moved layerAnalyzers vector from root.go to differs.go

Signed-off-by: David Cassany <[email protected]>
Copy link
Contributor

@nkubala nkubala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the cleanup, and for contributing this!

@nkubala nkubala merged commit 750860d into GoogleContainerTools:master Aug 10, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants