Bayesian Linear Regression

Now that we have a trivial package running, we can do something more interesting. We will start with a simple example of Bayesian linear regression, using a bike sharing data set [2] that will be provided in a suitable format.

Model

The model is given by: $$ \begin{align} \sigma^2 &\sim \mathrm{InverseGamma}(3, 4/10) \\ \boldsymbol{\beta} \mid \sigma^2 &\sim \mathrm{Gaussian}(0, I\sigma^2) \\ y_n \mid \boldsymbol{\beta}, \sigma^2 &\sim \mathrm{Gaussian}(\mathbf{x}_n^{\top}\boldsymbol{\beta}, \sigma^2) \end{align} $$ The parameters of the model are the noise variance $\sigma^2$ and vector of coefficients $\boldsymbol{\beta}$. The data consists of observations $y_n$ and explanatory variables $\mathbf{x}_n$ for $n=1,\ldots,N$.
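
Because the prior on $(\boldsymbol{\beta}, \sigma^2)$ is conjugate to the likelihood (a normal-inverse-gamma prior), the posterior is available in closed form. For reference, writing $\mathbf{y} = (y_1,\ldots,y_N)^{\top}$ and letting $X$ denote the matrix whose rows are the $\mathbf{x}_n^{\top}$, the standard conjugate update gives: $$ \begin{align} \boldsymbol{\beta} \mid \sigma^2, \mathbf{y} &\sim \mathrm{Gaussian}(\boldsymbol{\mu}, \Sigma\sigma^2) \\ \sigma^2 \mid \mathbf{y} &\sim \mathrm{InverseGamma}\left(3 + \tfrac{N}{2},\; \tfrac{4}{10} + \tfrac{1}{2}\left(\mathbf{y}^{\top}\mathbf{y} - \boldsymbol{\mu}^{\top}\Sigma^{-1}\boldsymbol{\mu}\right)\right) \end{align} $$ with $\Sigma = (I + X^{\top}X)^{-1}$ and $\boldsymbol{\mu} = \Sigma X^{\top}\mathbf{y}$. This is the analytical solution referred to later in this section, which Birch can recover automatically through delayed sampling.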

Implementation

To specify a model in Birch, we create a class that inherits from Model. Use your preferred text editor to create a file src/LinearRegressionModel.birch and enter the following contents:

/**
 * Bayesian linear regression model with conjugate normal-inverse-gamma
 * prior.
 */
class LinearRegressionModel < Model {
  /**
   * Explanatory variables.
   */
  X:Real[_,_];

  /**
   * Regression coefficients.
   */
  β:Random<Real[_]>;

  /**
   * Observation variance.
   */
  σ2:Random<Real>;

  /**
   * Observations.
   */
  y:Random<Real[_]>;

  function simulate() {
    let N <- rows(X);
    let P <- columns(X);
    if N > 0 && P > 0 {
      σ2 ~ InverseGamma(3.0, 0.4);
      β ~ Gaussian(vector(0.0, P), identity(P), σ2);
      y ~ Gaussian(X*β, σ2);
    }
  }
}

The explanatory variables $X$ (the $\mathbf{x}_n$, as a matrix), observations $y$ (the $y_n$, as a vector) and parameters $\boldsymbol{\beta}$ and $\sigma^2$ have been declared as member variables of the class. Variables in Birch are typed. We see here a few different types:

  • Real is a double-precision floating point number.
  • Real[_] is a vector of Real.
  • Real[_,_] is a matrix of Real.
  • Random<Type> declares a Random object of given Type. Such randoms—as we refer to them [1]—enable the important features of automatic marginalization, automatic conjugacy, and automatic differentiation. The first two of these are implemented in Birch using an algorithm called delayed sampling [3]. It is particularly useful in this example, as it enables an analytical solution to the problem.

Tip

You can use Greek letters in Birch code. To enter them, you may need to install a separate keyboard in your operating system, or copy and paste from a character map.

The simulate() member function implements the model:

  • The if statement is merely defensive programming: it skips the model for the degenerate situation of no explanatory variables, or no data points.
  • The ~ operator attaches a distribution (a Distribution object) to a random.
  • The let keyword declares a variable, where the type of the variable is deduced from its initial value. An equivalent way to declare N, for example, would be N:Integer <- rows(X), but the let syntax often looks tidier; both forms are shown in the short sketch below.
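
The following sketch shows the two equivalent forms (only one of the two declarations would appear in the actual function):

let N <- rows(X);      // type of N deduced from the initial value
N:Integer <- rows(X);  // the same declaration with an explicit type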

The basic model is now implemented. It is worth building and running at this stage as a check. Build, as usual, with:

birch build

and run with:

birch sample --model LinearRegressionModel

Running will not do anything interesting at this stage—for that we need some data!

Data

We will use a data set from the Capital Bikeshare system in Washington D.C. for the years 2011 to 2012. The aim is to use weather and holiday information to predict the total number of bike hires on any given day [2].

The data set has been preprocessed to one-hot encode categorical variables: the season, for example, a four-category variable, becomes four indicator variables, so that a day in the first season is encoded as 1, 0, 0, 0. These conversions make it reasonable to attempt a linear regression. Each data point represents one day. The observation is the logarithm of the total number of bike hires on that day.

Download the data set to the input/ directory. The file is in JSON format. Birch supports both JSON and YAML file formats. You can view and edit these files by hand with a text editor, or for larger files, write programs to generate and manipulate them.

Tip

If you want to stay on the command line, check out jq for working with JSON files.

For now, have a look at the contents of the file in a text editor, web browser, or on the command line (less input/bike_share.json and hit q when you’ve seen enough). It contains two variables: a matrix X and a vector y. We need to read these into the model.
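
As a rough sketch, and assuming the matrix X is stored as an array of rows, the structure of the file is along these lines (the values here are placeholders rather than entries from the actual data set):

{
  "X": [
    [1.0, 0.0, 0.0, ...],
    [0.0, 1.0, 0.0, ...],
    ...
  ],
  "y": [8.1, 7.9, ...]
}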

Recall that, in defining the LinearRegressionModel class, we overrode the simulate() member function of the Model class. The Model class also has two other member functions called read() and write() that we can implement to read and write data.

Add the following two member functions after the simulate() member function in the LinearRegressionModel class:

    function read(buffer:Buffer) {
      X <-? buffer.getRealMatrix("X");
      y <-? buffer.getRealVector("y");
    }

    function write(buffer:Buffer) {
      buffer.set("β", β);
      buffer.set("σ2", σ2);
    }

The read() member function reads from the input file into the member variables X and y. The strings "X" and "y" name the elements in the input file. The write() member function writes the parameters to the output file.

The Buffer class provides the interface for easily reading and writing these. Its get-style member functions (e.g. the getRealMatrix() and getRealVector() used above) return optionals. An optional is just a variable that may or may not have a value. Here, if the requested element exists in the file, it has a value, otherwise it does not. The <-? operator is a special assignment operator that assigns a value to the variable on the left only if the optional on the right actually has a value.
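
To make the optional explicit, the read of X could equivalently be written in two steps (an illustrative sketch only; the one-line form above is tidier):

    let x <- buffer.getRealMatrix("X");  // x is an optional, which may be empty
    X <-? x;                             // assigned only if "X" exists in the file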

Inference

Now rebuild:

birch build

and run:

birch sample \
    --model LinearRegressionModel \
    --input input/bike_share.json \
    --output output/linear_regression.json

This run command specifies the model class to use, the input file from which to read, and the output file to which to write.

View the output with:

cat output/linear_regression.json

You will see a single sample drawn from the posterior distribution.

Tip

We have already noted that this particular example has an analytical solution. We can, in fact, output this solution if preferred. To do so, update the write() member function as follows, then re-build and re-run:

    function write(buffer:Buffer) {
      if β.hasDistribution() {
        buffer.set("β", β.getDistribution());
      } else {
        buffer.set("β", β);
        buffer.set("σ2", σ2);
      }
    }

Tip

Debug mode is used by default. This mode enables all error checking and disables all optimizations to assist with debugging. Debug mode is recommended when developing and testing code. When you are happy that your code is working correctly, you can use release mode instead, which takes longer to build but runs much faster (often several times faster). Release mode is enabled by adding the option --enable-release when calling birch:

birch build --enable-release
birch sample --enable-release ...

  1. We avoid referring to them as random variables, as that is a precise mathematical term, although they do represent this concept within the program. A more accurate description might be a random variate that has been drawn from some distribution, although they are not exactly this either. We stick with randoms.

  2. H. Fanaee-T & J. Gama (2014). Event labeling combining ensemble detectors and background knowledge. Progress in Artificial Intelligence. 2:113-127. 

  3. L.M. Murray, D. Lundén, J. Kudlicka, D. Broman and T.B. Schön (2018). Delayed Sampling and Automatic Rao–Blackwellization of Probabilistic Programs. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS) 2018, Lanzarote, Spain.