WIP: Semi-concrete IR interpreter #44660
I haven't delved into the algorithm, but naively this is how I expected constprop would work. Sounds like a terrific enhancement, thanks for tackling this.
(force-pushed 569a495 to a377b15)
I read this for a while today and I have some refactor changes planned, but gonna run PkgEval on it as is to begin more formally investigating tomorrow. @nanosoldier
Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
(force-pushed a377b15 to 5905386)
@nanosoldier
Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
This is consistent with what we would do for a non-constant return. This helps the compiler dump less code onto LLVM in situations where there is a big basic block that our constprop can see is dead. That said, for that particular situation, it might be better to have a constant propagation pass at the IR level that runs after inlining. We can revisit that after #44660.
Superseded by #44803.
Background
In #43852, the compiler learned how to make use of the codegen'd version of a function in order to speed up constant propagation by 1000x, assuming two conditions are fulfilled: roughly, that the compiler can prove the necessary effects (`:consistent`, `:effect_free`, `:terminates_globally`; see below), and that all arguments at the call site are known constants.
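The effect preconditions behind consteval eligibility can also be asserted manually via `Base.@assume_effects` (added alongside the effects work, Julia ≥ 1.8). A minimal illustrative sketch, not taken from this PR:

```julia
# Illustrative: assert the effects that make constant-argument calls
# eligible for fast concrete evaluation. The annotation is a promise
# by the programmer; the compiler can also prove these on its own.
Base.@assume_effects :consistent :effect_free :terminates_globally function fact(n::Int)
    n <= 1 ? 1 : n * fact(n - 1)
end

# A call like fact(5) with a constant argument can then be folded by
# concretely executing the compiled code, rather than re-running inference.
fact(5)
```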
This mechanism works well, but it leaves some very sharp performance edges in inference. For example, adding an unused argument to an otherwise consteval-eligible function makes the compiler fall back to inference-based constant propagation, incurring the 1000x penalty even though the argument is entirely unused.
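As a hypothetical illustration of such a sharp edge (the function names here are made up):

```julia
# Eligible for fast concrete evaluation: pure, consistent, terminating.
double(x::Int) = 2x

# Semantically identical result, but if a call site passes a non-constant
# value for `unused`, concrete evaluation cannot run (it needs every
# argument to be a known constant), so constant propagation falls back to
# the much slower inference-based path, even though the result does not
# depend on `unused` at all.
double_with_noise(x::Int, unused) = 2x
```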
This PR attempts to address this by adding a new middle-of-the-road fallback that sits between the super-fast concrete constant evaluation from #43852 and the original inference-based constant propagation.
This PR
The core idea is to use inference's existing abstract interpretation lattice, but rather than performing abstract interpretation over untyped source, to perform it over the typed/optimized Julia IR that the non-constant inference pass already produced.
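To make the idea concrete, here is a heavily simplified, hypothetical sketch (not this PR's code) of evaluating constant arguments directly over the optimized IR returned by `code_typed`. It handles straight-line code only; the real implementation performs lattice-based abstract interpretation, handles control flow, and checks effects for legality:

```julia
# Hypothetical sketch: interpret optimized (typed, slot-free) IR with
# concrete argument values, instead of re-inferring the untyped source.
function naive_ir_interp(f, argvals...)
    ci, _rt = only(code_typed(f, Tuple{map(typeof, argvals)...}))
    env = Dict{Int,Any}()  # SSA id -> computed value
    resolve(x) = x isa Core.SSAValue ? env[x.id] :
                 x isa Core.Argument ? (x.n == 1 ? f : argvals[x.n - 1]) :
                 x isa GlobalRef     ? getfield(x.mod, x.name) :
                 x isa QuoteNode     ? x.value : x
    for (i, stmt) in enumerate(ci.code)
        if stmt isa Core.ReturnNode
            return resolve(stmt.val)
        elseif stmt isa Expr && stmt.head === :call
            env[i] = resolve(stmt.args[1])(map(resolve, stmt.args[2:end])...)
        elseif stmt isa Expr && stmt.head === :invoke
            # args[1] is the MethodInstance; args[2] the callee, rest the arguments
            env[i] = resolve(stmt.args[2])(map(resolve, stmt.args[3:end])...)
        else
            env[i] = resolve(stmt)  # literals and other simple statements
        end
    end
end
```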
Now, this is not legal to do in general, but the `:consistent` and `:terminates_globally` effects from #43852 do ensure legality (in the current implementation we additionally require `:effect_free` for consistency with consteval, but that could easily be relaxed).

Why is this a good idea
There are several benefits to performing abstract interpretation over optimized IR rather than untyped source:
- We do not need to perform any method lookups, and can thus also avoid inference's relatively complicated tracking of potential recursion. Method lookups were performed during the non-constant inference pass, and the `:terminates_globally` effect ensures the absence of recursion (at least when proven by the compiler; the property I documented for manual annotation is slightly weaker than what we need, so we may need to split out these effects).
- Optimized IR is likely to be smaller (in terms of the total number of instructions that need to be visited for a given set of input arguments), because some optimization/simplification/DCE has already been performed.
- Optimized IR does not have any slots in it, so we avoid #43999-like problems around inference memory usage.
- The same optimized IR can be re-used many times for different sets of constants.
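The "smaller" and "slot-free" points can be observed with the standard reflection tools; an illustrative comparison (exact statement counts vary across Julia versions):

```julia
function demo(x)
    y = x + 1   # `y` and `z` are slot assignments in lowered code
    z = y * 2
    return z - 2
end

lowered = only(code_lowered(demo)).code               # untyped, slot-based
typed   = first(only(code_typed(demo, (Int,)))).code  # optimized, SSA-form

# After simplification/DCE the optimized IR is no larger here, and it is
# in SSA form: each statement defines at most one single-assignment value,
# so no per-slot state needs to be tracked while interpreting it.
@assert length(typed) <= length(lowered)
```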
Threats to validity
Now, there are also some challenges. For one, it is not guaranteed that this will have the same precision as the inference-based constant propagation. I've done some testing here, and while precision is decent, there are some cases where it is worse than the inference-based constant propagation:
- In the current implementation, we never perform any method matching, instead relying on the methods annotated in `:invoke` statements. This mostly works, but there are certain situations (e.g. the cases that #44512, "optimizer: inline abstract union-split callsite", is supposed to fix) where we currently cannot produce an `:invoke`, even though inference was able to compute a full method match. My preferred solution is to extend the semantics of `:invoke` so that an `:invoke` node can always be generated when the full method match set is known (even if codegen does not make use of it). However, I haven't really seen this be much of a problem in my testing.
- There are certain branching patterns that would require the insertion of additional Pi nodes in order to avoid over-approximating the lattice elements. This is one of the reasons we currently don't do regular inference on SSA IR (the other being debuggability of lowered code). We've been thinking about some solutions here, but this particular case is somewhat simpler than the general one, because 1) the branching patterns that inference can prove `:consistent` are limited anyway, and 2) regular inference may already have inserted the requisite PiNodes, in which case there isn't a problem. In particular, in my testing I haven't actually seen any precision regressions due to this issue.

Benchmark results
The benchmarks for this are looking very attractive. On my ongoing inference torture test (https://gist.github.com/Keno/5587fbfe89bff96c863c8eeb1477defa), this gives an additional 2x improvement in inference time on top of the improvements already merged on master and those pending in #44447 and #44494. I should note that this is total inference time. By the nature of the benchmark, it basically performs inference and constant propagation once for each specialized call site. The 2x improvement here essentially comes down to the constant-propagation portion of that time dropping down to noise level.
The performance is even better on the real-world benchmark that motivated https://gist.github.com/Keno/5587fbfe89bff96c863c8eeb1477defa. There, constant propagation is performed multiple times with different constants for each non-constant inference, so the performance improvement is proportionally larger, to the point where inference time no longer dominates that benchmark (down to about 30s, from several hours in early January, before all the recent compile-time performance improvements).
Current status
This passes tests for me locally, but I'm convinced that issues will show up during PkgEval. The re-use of the abstract interpreter pieces is also quite janky and should be factored better.