Skip to content

Mention likely multi-threading issues on BVAR notebook #550

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
emanuele opened this issue May 23, 2023 · 12 comments
Open

Mention likely multi-threading issues on BVAR notebook #550

emanuele opened this issue May 23, 2023 · 12 comments
Labels
question Further information is requested

Comments

@emanuele
Copy link

emanuele commented May 23, 2023

Describe the issue:

When running the Bayesian Vector Autoregressive Models — PyMC example gallery 1 on x86_64 (tested on Ubuntu 20.04 + fresh install of PyMC via official installation instructions), the sampling time explodes to tens of hours instead of the expected few minutes. I can reproduce the issue on multiple machines.

Differently, on arm64 (M1 Macbook) the issue does not occur, and the sampling time is a few minutes as expected.

Below is a minimal example, extracted from the notebook above, that consistently shows the issue on x86_64.

The critical line is l.113, where pm.sample() has default values and the NUTS sampler is auto-assigned.

By manually specifying pm.sample(cores=1, ... or using a non-default NUTS sampler (nuts_sampler="blackjax", or nuts_sampler="numpyro"), the issue disappears on x86_64.

Related discussion on Discord here.

Screenshot of the issue:
Screen Shot 2023-05-22 at 2 10 44 PM

Reproduceable code example:

import numpy as np
import pandas as pd
import pymc as pm


RANDOM_SEED = 8927
rng = np.random.default_rng(RANDOM_SEED)

def simulate_var(
    intercepts, coefs_yy, coefs_xy, coefs_xx, coefs_yx, noises=(1, 1), *, warmup=100, steps=200
):
    draws_y = np.zeros(warmup + steps)
    draws_x = np.zeros(warmup + steps)
    draws_y[:2] = intercepts[0]
    draws_x[:2] = intercepts[1]
    for step in range(2, warmup + steps):
        draws_y[step] = (
            intercepts[0]
            + coefs_yy[0] * draws_y[step - 1]
            + coefs_yy[1] * draws_y[step - 2]
            + coefs_xy[0] * draws_x[step - 1]
            + coefs_xy[1] * draws_x[step - 2]
            + rng.normal(0, noises[0])
        )
        draws_x[step] = (
            intercepts[1]
            + coefs_xx[0] * draws_x[step - 1]
            + coefs_xx[1] * draws_x[step - 2]
            + coefs_yx[0] * draws_y[step - 1]
            + coefs_yx[1] * draws_y[step - 2]
            + rng.normal(0, noises[1])
        )
    return draws_y[warmup:], draws_x[warmup:]


### Define a helper function that will construct our autoregressive step for the marginal contribution of each lagged
### term in each of the respective time series equations
def calc_ar_step(lag_coefs, n_eqs, n_lags, df):
    ars = []
    for j in range(n_eqs):
        ar = pm.math.sum(
            [
                pm.math.sum(lag_coefs[j, i] * df.values[n_lags - (i + 1) : -(i + 1)], axis=-1)
                for i in range(n_lags)
            ],
            axis=0,
        )
        ars.append(ar)
    beta = pm.math.stack(ars, axis=-1)

    return beta


### Make the model in such a way that it can handle different specifications of the likelihood term
### and can be run for simple prior predictive checks. This latter functionality is important for debugging of
### shape handling issues. Building a VAR model involves quite a few moving parts and it is handy to
### inspect the shape implied in the prior predictive checks.
def make_model(n_lags, n_eqs, df, priors, mv_norm=True, prior_checks=True):
    coords = {
        "lags": np.arange(n_lags) + 1,
        "equations": df.columns.tolist(),
        "cross_vars": df.columns.tolist(),
        "time": [x for x in df.index[n_lags:]],
    }

    with pm.Model(coords=coords) as model:
        lag_coefs = pm.Normal(
            "lag_coefs",
            mu=priors["lag_coefs"]["mu"],
            sigma=priors["lag_coefs"]["sigma"],
            dims=["equations", "lags", "cross_vars"],
        )
        alpha = pm.Normal(
            "alpha", mu=priors["alpha"]["mu"], sigma=priors["alpha"]["sigma"], dims=("equations",)
        )
        data_obs = pm.Data("data_obs", df.values[n_lags:], dims=["time", "equations"], mutable=True)

        betaX = calc_ar_step(lag_coefs, n_eqs, n_lags, df)
        betaX = pm.Deterministic(
            "betaX",
            betaX,
            dims=[
                "time",
            ],
        )
        mean = alpha + betaX

        if mv_norm:
            n = df.shape[1]
            ## Under the hood the LKJ prior will retain the correlation matrix too.
            noise_chol, _, _ = pm.LKJCholeskyCov(
                "noise_chol",
                eta=priors["noise_chol"]["eta"],
                n=n,
                sd_dist=pm.HalfNormal.dist(sigma=priors["noise_chol"]["sigma"]),
            )
            obs = pm.MvNormal(
                "obs", mu=mean, chol=noise_chol, observed=data_obs, dims=["time", "equations"]
            )
        else:
            ## This is an alternative likelihood that can recover sensible estimates of the coefficients
            ## But lacks the multivariate correlation between the timeseries.
            sigma = pm.HalfNormal("noise", sigma=priors["noise"]["sigma"], dims=["equations"])
            obs = pm.Normal(
                "obs", mu=mean, sigma=sigma, observed=data_obs, dims=["time", "equations"]
            )

        if prior_checks:
            idata = pm.sample_prior_predictive()
            return model, idata
        else:
            idata = pm.sample_prior_predictive()
            idata.extend(pm.sample(draws=2000, random_seed=130))
            pm.sample_posterior_predictive(idata, extend_inferencedata=True, random_seed=rng)
    return model, idata


if __name__=="__main__":

    
    var_y, var_x = simulate_var(
        intercepts=(18, 8),
        coefs_yy=(-0.8, 0),
        coefs_xy=(0.9, 0),
        coefs_xx=(1.3, -0.7),
        coefs_yx=(-0.1, 0.3),
    )
    df = pd.DataFrame({"x": var_x, "y": var_y})
    df.head()
    
    n_lags = 2
    n_eqs = 2
    priors = {
        "lag_coefs": {"mu": 0.3, "sigma": 1},
        "alpha": {"mu": 15, "sigma": 5},
        "noise_chol": {"eta": 1, "sigma": 1},
        "noise": {"sigma": 1},
    }
    model, idata_fake_data = make_model(n_lags, n_eqs, df, priors, prior_checks=False)

Error message:

No error message, just the code hanging (almost) forever.

PyMC version information:

Fresh install of PyMC via official installation instructions.

%load_ext watermark
%watermark -n -u -v -iv -w -p pytensor,aeppl,xarray
----
Last updated: Tue May 23 2023

Python implementation: CPython
Python version       : 3.11.3
IPython version      : 8.13.2

pytensor: 2.11.2
aeppl   : not installed
xarray  : 2023.5.0

pymc     : 5.3.0
watermark: 2.3.1
pandas   : 2.0.1
numpy    : 1.24.3

Watermark: 2.3.1

Context for the issue:

  1. This example in the PyMC documentation becomes unusable for a fair share of users.
  2. The reason behind the issue may affect PyMC users in general, irrespective of the example.
@welcome
Copy link

welcome bot commented May 23, 2023

Welcome Banner
🎉 Welcome to PyMC! 🎉 We're really excited to have your input into the project! 💖

If you haven't done so already, please make sure you check out our Contributing Guidelines and Code of Conduct.

@emanuele emanuele changed the title BUG: Unexpected extremely low sampling rate on x86_64 (Ubuntu 20.04) BUG: Unexpected extremely low sampling rate on x86_64 (Linux) May 24, 2023
@ricardoV94
Copy link
Member

ricardoV94 commented May 24, 2023

I can reproduce in my (4x2) core CPU. I think it boils down to BLAS/ LAPACK multithreading.
By default it saturates all the cores.

Setting the MKL threads to 1 fixes it for me:

%env MKL_NUM_THREADS=1
%env OPENBLAS_NUM_THREADS=1

Or setting the number of threads to 2 and reducing the number of chains to 2/3 is also fine.

More context for this can be found in:
https://discourse.pymc.io/t/regarding-the-use-of-multiple-cores/4249/3
https://discourse.pymc.io/t/nuts-uses-all-cores/909/10

Maybe it's useful to add a warning message at the top of that notebook?

Part of this may be helped by fixing pymc-devs/pymc#6717

@ricardoV94 ricardoV94 added the question Further information is requested label May 24, 2023
@emanuele
Copy link
Author

Good points @ricardoV94 . But why this happens only on Linux? Why specifically with this problem?

Anedoctically, it is the first time I see multiprocessing/multithreading being a problem on Linux and not on Windows or MacOS :D

@ricardoV94
Copy link
Member

ricardoV94 commented May 24, 2023

My guess is that neither of those are actually triggering multi-threading. For instance, I think OpenMP (which could be another indirect source of multi-threading) is not installed by default on MacOs. Not sure without having a machine to try those out.

The way Python multi-processes are created is also different across those 3 operating systems IIRC, that could also matter.

@ricardoV94 ricardoV94 transferred this issue from pymc-devs/pymc Jun 13, 2023
@ricardoV94 ricardoV94 changed the title BUG: Unexpected extremely low sampling rate on x86_64 (Linux) Mention multi-threading issues on BVAR notebook Jun 13, 2023
@ricardoV94 ricardoV94 changed the title Mention multi-threading issues on BVAR notebook Mention likely multi-threading issues on BVAR notebook Jun 13, 2023
@ricardoV94
Copy link
Member

ricardoV94 commented Jul 7, 2023

Another user afflicted with this: https://discourse.pymc.io/t/bayesian-var-multivariate-slow-performance/12463/2?u=ricardov94

I think we should change the flags explicitly on the pymc example and have a warning message on why we are doing this. Not everyone needs it, but many users do. We can mention users can try to turn it off and see if it still samples fast.

@OriolAbril
Copy link
Member

Updating the flags sounds great

@emanuele
Copy link
Author

What is the suggested way to update the flags just in the case of Linux? A simple cell at the beginning of the notebook like

%env MKL_NUM_THREADS=1
%env OPENBLAS_NUM_THREADS=1

is not conditional on the operating system. Is checking via sys.platform the way to go? Or PyMC has a more structured way?

@ricardoV94
Copy link
Member

I think those flags are fine in most OSes?

@ricardoV94
Copy link
Member

It would also be worth testing on the lastest PyMC (5.6.0) release as we managed to remove some useless Blas operations on MvNormal models. The problem may not exist anymore. We would need someone who had problems before to test it out.

@emanuele
Copy link
Author

emanuele commented Jul 12, 2023

Just tested on Linux with PyMC v5.6.0 with the code above without setting the flags and, unfortunately, the problem still persists. Maybe the sampling time diverges less dramatically: sampling reaches 10% in 3 minutes and all cores are completely filled (mostly in red color with htop, which means kernel time, so not what we want here). But after setting the flags as above, 10% of sampling requires <30 seconds.

%load_ext watermark
%watermark -n -u -v -iv -w -p pytensor,aeppl,xarray

gives:

Last updated: Wed Jul 12 2023

Python implementation: CPython
Python version       : 3.9.7
IPython version      : 8.14.0

pytensor: 2.12.3
aeppl   : not installed
xarray  : 2023.6.0

pymc  : 5.6.0
numpy : 1.25.1
pandas: 2.0.3

Watermark: 2.4.3

@emanuele
Copy link
Author

I think those flags are fine in most OSes?

I really have no idea :)

@ricardoV94
Copy link
Member

Yes I think they work in other OSes, and won't hurt if not

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants