Add rpmlayer differ #252

davidcassany · 2018-07-27T13:03:48Z

This commit adds the rpmlayer differ, which performs analysis at the
RPM package level for each layer. It lists new, deleted, and modified
(different version) packages in each layer.

Signed-off-by: David Cassany [email protected]


nkubala · 2018-07-30T17:29:19Z

cmd/root.go

@@ -59,7 +59,7 @@ const (
 	RemotePrefix = "remote://"
 )

-var layerAnalyzers = [...]string{"layer", "aptlayer"}
+var layerAnalyzers = [...]string{"layer", "aptlayer", "rpmlayer"}


since you're in here, would you mind making these strings constants somewhere?

I will include the constants somewhere
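A minimal sketch of what the requested change could look like. The constant identifiers and the isLayerAnalyzer helper are hypothetical; only the three string values come from the diff above.

```go
package main

import "fmt"

// Hypothetical constant names; only the string values appear in the PR.
const (
	layerAnalyzerName    = "layer"
	aptLayerAnalyzerName = "aptlayer"
	rpmLayerAnalyzerName = "rpmlayer"
)

// layerAnalyzers lists the analyzers that work per layer, built from the
// constants instead of repeated string literals.
var layerAnalyzers = [...]string{
	layerAnalyzerName,
	aptLayerAnalyzerName,
	rpmLayerAnalyzerName,
}

// isLayerAnalyzer reports whether name is one of the layer-level analyzers.
// This helper is illustrative and not part of the PR.
func isLayerAnalyzer(name string) bool {
	for _, a := range layerAnalyzers {
		if a == name {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isLayerAnalyzer(rpmLayerAnalyzerName))
	fmt.Println(isLayerAnalyzer("rpm"))
}
```

Keeping the names in one place means a typo in an analyzer name becomes a compile error rather than a silent mismatch between the command-line check and the analyzer map.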

nkubala · 2018-07-30T17:34:41Z

differs/rpm_diff.go

+
+// rpmDataFromLayerFS runs a local rpm binary, if any, to query the layer
+// rpmdb and returns an array of maps of installed packages.
+func rpmDataFromLayerFS(image pkgutil.Image) ([]map[string]util.PackageInfo, error) {


The logic in this function is really similar to rpmDataFromImageFS. Could you pull out some of the shared code between them and reuse it in both functions? Apart from iterating over layers here I think they're almost identical.

I was thinking about that too. I can try moving some of it into a separate function and see how it looks. I think only lines 435->444 are clear candidates for extraction.

nkubala · 2018-07-30T17:36:53Z

differs/rpm_diff.go

+	}
+
+	packages, err := rpmDataFromLayerFS(image)
+	if err != nil {


Do we care about the error here? This fallback pattern makes sense to me here, but I feel like we might want to be logging this error (I know we're not doing it in RPMAnalyzer.getPackages() either). At the very least we should add something to the log message saying something like Couldn't retrieve RPM data from extracted filesystem; running query in container.

Agreed, let's add some better logs.
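The fallback-with-logging pattern discussed here could look roughly like the sketch below. It assumes simplified types (a plain string map instead of util.PackageInfo), hypothetical function names, and the standard library logger rather than whatever logger the project uses; only the log message text is taken from the review comment.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// packageMap is a simplified stand-in for map[string]util.PackageInfo.
type packageMap map[string]string

// queryFunc is a hypothetical signature for anything that returns one
// package map per layer.
type queryFunc func() ([]packageMap, error)

// errNoRPM simulates a host that has no rpm binary installed.
var errNoRPM = errors.New("rpm binary not found on host")

// packagesWithFallback tries the primary query first. If it fails, the
// error is logged instead of being dropped, and the fallback is used.
func packagesWithFallback(fromFS, fromContainer queryFunc) ([]packageMap, error) {
	pkgs, err := fromFS()
	if err == nil {
		return pkgs, nil
	}
	log.Printf("Couldn't retrieve RPM data from extracted filesystem; running query in container: %v", err)
	return fromContainer()
}

func main() {
	fromFS := func() ([]packageMap, error) { return nil, errNoRPM }
	fromContainer := func() ([]packageMap, error) {
		return []packageMap{{"foo": "1.0.0"}}, nil
	}
	pkgs, err := packagesWithFallback(fromFS, fromContainer)
	fmt.Println(pkgs, err)
}
```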

nkubala · 2018-07-30T17:45:29Z

differs/rpm_diff.go

+	// Append layers one by one to an empty image and query rpm
+	// database on each iteration
+	for _, layer := range layers {
+		tmpImage, err = mutate.AppendLayers(tmpImage, layer)


We're trying to retrieve a map of the packages installed in each layer here, right? I'm wondering whether we actually want to append each layer onto the previous layers before running rpmDataFromContainer(), or whether we should instead create a single-layer image for each layer. Won't we be getting package information for the combination of the layer we're examining and all the layers before it?

e.g. if I have layer A with package foo==1.0.0, and layer B with package bar==1.0.0, my maps here will look like
A: {
    foo: 1.0.0
},

B: {
    foo: 1.0.0, <-- don't want this here right?
    bar: 1.0.0,
}

To get around this you could just create a new random.Image for each layer, append it, and then make the rpmDataFromContainer() call. Does this make sense?


This is the same situation we have with APT packages: the package database reports all installed packages, so for each layer we always get the list of all packages included in the current and previous layers. To work around this, there is the singleVersionLayerAnalysis function:

container-diff/differs/package_differs.go

Lines 118 to 141 in c255953

func singleVersionLayerAnalysis(image pkgutil.Image, analyzer SingleVersionPackageLayerAnalyzer) (*util.SingleVersionPackageLayerAnalyzeResult, error) {
	pack, err := analyzer.getPackages(image)
	if err != nil {
		return &util.SingleVersionPackageLayerAnalyzeResult{}, err
	}
	var pkgDiffs []util.PackageDiff

	// Each layer with modified packages includes a complete list of packages
	// in its package database. Thus we diff the current layer with the
	// previous one that contains a package database. Layers that do not
	// include a package database are omitted.
	preInd := -1
	for i := range pack {
		var pkgDiff util.PackageDiff
		if preInd < 0 && len(pack[i]) > 0 {
			pkgDiff = util.GetMapDiff(make(map[string]util.PackageInfo), pack[i])
			preInd = i
		} else if preInd >= 0 && len(pack[i]) > 0 {
			pkgDiff = util.GetMapDiff(pack[preInd], pack[i])
			preInd = i
		}

		pkgDiffs = append(pkgDiffs, pkgDiff)
	}

which basically performs a diff between layers.
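To make the idea concrete, here is a self-contained sketch of the same diffing approach applied to the foo/bar example from the review comment. It is a simplification under stated assumptions, not the project's code: snapshot is a plain string map standing in for map[string]util.PackageInfo, and diffSnapshots is a hypothetical replacement for util.GetMapDiff.

```go
package main

import "fmt"

// snapshot maps package name to version. It stands in for the complete
// package list that the package database reports at a given layer.
type snapshot map[string]string

// layerDiff holds what changed between two snapshots (hypothetical type).
type layerDiff struct {
	Added    map[string]string
	Removed  map[string]string
	Modified map[string][2]string // name -> {old version, new version}
}

// diffSnapshots compares two cumulative package lists.
func diffSnapshots(prev, cur snapshot) layerDiff {
	d := layerDiff{
		Added:    map[string]string{},
		Removed:  map[string]string{},
		Modified: map[string][2]string{},
	}
	for name, v := range cur {
		old, ok := prev[name]
		switch {
		case !ok:
			d.Added[name] = v
		case old != v:
			d.Modified[name] = [2]string{old, v}
		}
	}
	for name, v := range prev {
		if _, ok := cur[name]; !ok {
			d.Removed[name] = v
		}
	}
	return d
}

// diffLayers diffs each non-empty snapshot against the previous non-empty
// one. Layers without package data get a zero-value diff.
func diffLayers(snaps []snapshot) []layerDiff {
	diffs := make([]layerDiff, 0, len(snaps))
	prev := -1
	for i, s := range snaps {
		var d layerDiff
		if len(s) > 0 {
			base := snapshot{}
			if prev >= 0 {
				base = snaps[prev]
			}
			d = diffSnapshots(base, s)
			prev = i
		}
		diffs = append(diffs, d)
	}
	return diffs
}

func main() {
	snaps := []snapshot{
		{"foo": "1.0.0"},                 // layer A
		{},                               // a layer with no package database
		{"foo": "1.0.0", "bar": "1.0.0"}, // layer B, cumulative list
	}
	for i, d := range diffLayers(snaps) {
		fmt.Printf("layer %d: added=%v removed=%v modified=%v\n", i, d.Added, d.Removed, d.Modified)
	}
}
```

With this input, foo is attributed only to the first layer and bar only to the last one, even though the last snapshot lists both packages.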

Also, I am appending the layers one by one because in this case the rpm binary used to parse the database is the one included in the image, so at least the layer containing that binary (most probably the base layer) must also be part of the container we run. To keep things simple and avoid logic that guesses which layers need to be appended and which do not, I opted to simply stack them one by one. Note that this is the fallback for when the host does not have the RPM tool.

Ahh right, ok I think I follow that. Thanks for the explanation!

* Refactored RPMLayerAnalyzer for better code reuse
* Updated differs.go to initialize Analyzers map with constant keys
* moved layerAnalyzers vector from root.go to differs.go

Signed-off-by: David Cassany <[email protected]>

nkubala

Thanks for the cleanup, and for contributing this!

Add rpmlayer differ

23d0231


nkubala reviewed Jul 30, 2018


Improving coding style

6751a8c


nkubala approved these changes Aug 10, 2018


nkubala merged commit 750860d into GoogleContainerTools:master Aug 10, 2018



