Bayesian Linear Regression
Now that we have a trivial package running, we can do something more interesting. We will start with a simple example of Bayesian linear regression, using a bike sharing data set^{2} that will be provided in a suitable format.
Model
The model is given by: $$ \begin{align} \sigma^2 &\sim \mathrm{InverseGamma}(3, 4/10) \\ \boldsymbol{\beta} \mid \sigma^2 &\sim \mathrm{Gaussian}(0, I\sigma^2) \\ y_n \mid \boldsymbol{\beta}, \sigma^2 &\sim \mathrm{Gaussian}(\mathbf{x}_n^{\top}\boldsymbol{\beta}, \sigma^2) \end{align} $$ The parameters of the model are the noise variance $\sigma^2$ and vector of coefficients $\boldsymbol{\beta}$. The data consists of observations $y_n$ and explanatory variables $\mathbf{x}_n$ for $n=1,\ldots,N$.
Implementation
To specify a model in Birch, we create a class that inherits from Model. Use your preferred text editor to create a file src/LinearRegressionModel.birch
and enter the following contents:
/**
* Bayesian linear regression model with conjugate normalinversegamma
* prior.
*/
class LinearRegressionModel < Model {
/**
* Explanatory variables.
*/
X:Real[_,_];
/**
* Regression coefficients.
*/
β:Random<Real[_]>;
/**
* Observation variance.
*/
σ2:Random<Real>;
/**
* Observations.
*/
y:Random<Real[_]>;
function simulate() {
let N < rows(X);
let P < columns(X);
if N > 0 && P > 0 {
σ2 ~ InverseGamma(3.0, 0.4);
β ~ Gaussian(vector(0.0, P), identity(P), σ2);
y ~ Gaussian(X*β, σ2);
}
}
}
The explanatory variables $X$ (the $\mathbf{x}_n$, as a matrix), observations $y$ ($y_n$, as a vector) and parameters $\beta$ and $\sigma^2$ have been declared as member variables of the class. Variables in Birch are typed. We see here a few different types:
Real
is a doubleprecision floating point number.Real[_]
is a vector ofReal
.Real[_,_]
is a matrix ofReal
.Random<Type>
declares a Random object of givenType
. Such randoms—as we refer to them—^{1}enable the important features of automatic marginalization, automatic conjugacy, and automatic differentiation. The former two of these are implemented in Birch using an algorithm called delayed sampling^{3}. It is particularly useful in this example, as it enables an analytical solution to the problem.
Tip
You can use Greek letters in Birch code. To enter them, you may need to install a separate keyboard in your operating system, or copy and paste from a character map.
The simulate()
member function implements the model:
 The
if
statement is merely defensive programming: it skips the model for the degenerate situation of no explanatory variables, or no data points.  The
~
operator attaches a distribution (a Distribution object) to a random.  The
let
keyword declares a variable, where the type of the variable is deduced from its initial value. An equivalent way to declareN
, for example, would beN:Integer < rows(X)
, but thelet
syntax often looks tidier.
The basic model is now implemented. It is worth building and running at this stage as a check. Build, as usual, with:
birch build
and run with:
birch sample model LinearRegressionModel
Running will not do anything interesting at this stage—for that we need some data!
Data
We will use a data set from the Capital Bikeshare system in Washington D.C. for the years 2011 to 2012. The aim is to use weather and holiday information to predict the total number of bike hires on any given day^{2}.
The data set has been preprocessed to onthot encode categorical variables, e.g. the season, a fourcategory variable, becomes four indicator variables. These conversions make it reasonable to attempt a linear regression. Each data point represents one day. The observation is of the logarithm of the total number of bike hires on that day.
Download the data set to the input/
directory. The file is in JSON format. Birch supports both JSON and YAML file formats. You can view and edit these files by hand with a text editor, or for larger files, write programs to generate and manipulate them.
Tip
If you’re wanting to stay on the command line, check out jq for working with JSON files.
For now, have a look at the contents of the file in a text editor, web browser, or on the command line (less input/bike_share.json
and hit q
when you’ve seen enough). It contains two variables: a matrix X
and a vector y
. We need to read these into the model.
Recall that, in defining the LinearRegressionModel
class, we overrode the simulate()
member function of the Model
class. The Model
class also has two other member functions called read()
and write()
that we can implement to read and write data.
Add the following two member functions after the simulate()
member function in the LinearRegressionModel
class:
function read(buffer:Buffer) {
X <? buffer.getRealMatrix("X");
y <? buffer.getRealVector("y");
}
function write(buffer:Buffer) {
buffer.set("β", β);
buffer.set("σ2", σ2);
}
The read()
member function reads from the input file into the member variables X
and y
. The strings "X"
and "y"
name the elements in the input file. The write()
member function writes the parameters to the output file.
The Buffer class provides the interface for easily reading and writing these. Its get
style member functions (e.g. the getReadMatrix()
and getRealVector()
used above) return optionals. An optional is just a variable that may or may not have a value. Here, if the requested element exists in the file, it has a value, otherwise it does not. The <?
is a special assignment operator that assigns a value to the variable on the left only if the optional on the right actually has a value.
Inference
Now rebuild:
birch build
and run:
birch sample \
model LinearRegressionModel \
input input/bike_share.json \
output output/linear_regression.json
This run command specifies the model class to use, the input file from which to read, and the output file to which to write.
View the output with:
cat output/linear_regression.json
You will see a single sample drawn from the posterior distribution.
Tip
We have already noted that this particular example has an analytical solution. We can, in fact, output this solution if preferred. To do so, update the write()
member function as follows, then rebuild and rerun:
function write(buffer:Buffer) {
if β.hasDistribution() {
buffer.set("β", β.getDistribution());
} else {
buffer.set("β", β);
buffer.set("σ2", σ2);
}
}
Tip
Debug mode is used by default. This mode enables all error checking and disables all optimizations to assist with debugging. Debug mode is recommended when developing and testing code. When you are happy that your code is working correctly, you can use release mode instead, which takes longer to build but runs much faster (several times so). Release mode is enabled by adding the option enablerelease
when calling birch
:
birch build enablerelease
birch sample enablerelease ...

We avoid referring to them as random variables, as that is a precise mathematical term, although they do represent this concept within the program. A more accurate description might be a random variate that has been drawn from some distribution, although they are not exactly this either. We stick with randoms. ↩

H. FanaeeT & J. Gama (2014). Event labeling combining ensemble detectors and background knowledge. Progress in Artificial Intelligence. 2:113127. ↩↩

L.M. Murray, D. Lundén, J. Kudlicka, D. Broman and T.B. Schön (2018). Delayed Sampling and Automatic Rao–Blackwellization of Probabilistic Programs. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS) 2018, Lanzarote, Spain. ↩