Interpretable Machine Learning
Some notes on the small book Interpretable Machine Learning by Christoph Molnar.
Introduction and overview
In his book, Christoph Molnar gives an overview of the concept of model interpretability and surveys several ways to help ML engineers design more interpretable machine learning models, i.e. enable humans to understand the reasons why the ML model outputs what it outputs instead of any of the other possible answers. The motivation for model interpretability is quite immediate: it is often observed that results from a recommender system such as YouTube video recommendations are somewhat obscure, and the larger the role machine learning algorithms take in our lives, the more important it will be that we are able to understand the "whys" of the ML models' decisions.
Efforts are being made by many of the major actors, for example Amazon's recommendation systems providing a set of articles that are commonly bought together, or the "customers who bought what you just bought also often purchased this other article" feature and such. However, this is far from widespread, and it also has an important impact for the many companies or labs that usually focus on ML model performance and trust the output of their model almost blindly. Note that this is not only applicable to ML; it is also often the case for sufficiently complicated models and algorithms such as optimization (among which discrete optimization is probably a great candidate), where we are unable to track or get an intuition of why the result is what it is and not something completely different.
The consequences of bad model interpretability are numerous, but I see one very important point, which is the failure to functionally debug the model. If you trust your model/algorithm blindly, it becomes very hard to see whether the model behaves as it should. For example, imagine that you have a discrete optimization problem, which is NP-hard and which you solve using a custom algorithm. Since the problem is too complex, it is nearly impossible (or too long) to know the optimal solution for a large instance, and you are left with trusting your algorithm or only confronting it with the known optimum on very small instances that may not be representative of larger instances, and certainly do not constitute a proof. In this kind of situation, it becomes very important to have other ways of checking the result of your algorithm, its consistency, why the result on one instance is much lower than on other similar instances, etc.
The same applies to machine learning, with an even more critical factor: ML models are much more obscure in their internal mechanisms. Indeed, unlike instruction-based algorithms, which are easily probed and for which each step bears meaning, the internal steps of machine learning models bear no intrinsic meaning and are only the result of training the model on data.
Model interpretability
What the author defines as interpretability is the capacity of a model/solution to justify its output. First he mentions that there are models that are intrinsically interpretable, such as linear regressions, decision trees and the like, where the internal parameter values already offer a sufficiently good explanation of the output. For example, for a linear regression, the weights tell us how important each feature is compared to the others.
But for more sophisticated models, interpretability is obtained through more indirect or approximate methods, among which:
- Partial Dependence Plots (WIP)
- Individual Conditional Expectations (WIP)
- Accumulated Local Effects (WIP)
- Feature Interaction (WIP)
- Feature Importance (WIP)
- Global Surrogate, which consists of approximating the output of the model with a simple, interpretable machine learning model (see the sketch after this list).
- Local Surrogate, which follows the same idea as the global surrogate but only in the neighborhood of a single instance.
- Shapley Values, inspired by game theory, where each feature's contribution to a prediction is estimated as its average marginal contribution over feature "coalitions", using similar instances in the dataset to fill in the absent features.
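As an illustration of the global surrogate idea, here is a minimal sketch assuming scikit-learn: an interpretable decision tree is trained on the predictions of an opaque model (here a random forest standing in for any black box), and its fidelity to the black box is measured with R². The data is synthetic and purely illustrative.

```python
# Minimal global-surrogate sketch (assumes scikit-learn; the black box and data
# are made up for illustration).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor, export_text
from sklearn.metrics import r2_score

rng = np.random.RandomState(0)
X = rng.normal(size=(500, 4))
y = 2 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=500)

black_box = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# The surrogate is trained on the black box's *predictions*, not on y.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# How faithfully does the surrogate mimic the black box? (R² between the two.)
print("surrogate fidelity:", r2_score(black_box.predict(X), surrogate.predict(X)))
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(4)]))
```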
Interpretability evaluation
Interpretability is not easily evaluated, since there is no standard quantification of interpretability and since it depends largely on the social and technological context of the recipient of the explanations. Obviously the explanation of the model output will not be the same for a field expert and for the end customer of a recommender system.
Several evaluation criteria are outlined in the book, first as general criteria for an explanation method, then for the individual explanations produced by such methods:
- Expressive power: how expressive the provided explanations are (do they use natural language, if-then-else type conditions...)
- Translucency: how much the explanation relies on the model's internals (for example, an explanation consisting of the weights of a linear model is very translucent)
- Portability: how tied the explanation method is to a specific model, i.e. whether it can be ported to other models without modification (in the last example, the weights are not a portable explanation)
- Algorithmic complexity
Individual explanations:
- Accuracy: how well the explanation fits the data
- Fidelity: how well the explanation fits the model output
- Consistency: how robust to a model change is the explanation (closely related to the method portability)
- Stability (Always desirable): how robust to some small perturbations is the explanation
- Comprehensibility
- Certainty: how sure / confident the explanation is (does it reflect the certainty of the model?)
- Degree of Importance: feature importance for explaining the data
- Novelty of the data: how exceptional is an instance compared to the known dataset
- Representativeness: range covered by the explanation
Human friendly explanations
For a model to be easily interpretable by humans, the author argues that the explanations should follow several rules:
- Explanations have to be selected: a good practice is to refrain from providing every possible bit of explanatory data and to limit the explanation to the one to three most important pieces.
- Explanations should be contrastive: humans usually prefer explanations that compare the instance with similar ones that have a different output: "why is this instance so special that it doesn't produce the same result as intuitively similar instances?"
- Explanations should be tailored to the audience they are addressed to
- Explanations should focus on abnormal data whenever possible: the most important piece of explanation is not always the most numerically significant but maybe the one which bears the highest abnormality.
- Explanations should be as truthful as possible, i.e. they should remain applicable to other instances and not be easily invalidated
- If no abnormality can be found, a general and probable explanation is also preferable to many pieces of unexceptional explanations.
Interpretable Models - Linear Regression
Linear regressions are simple models, which force the output to be a linear combination of the inputs. This means that the model is additive and thus easily explainable. The weights are the explanation.
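A minimal sketch of reading the weights off a fitted linear regression, assuming scikit-learn (the data and feature names are made up for illustration):

```python
# Minimal sketch: the fitted weights *are* the explanation of a linear model.
# Assumes scikit-learn; data and feature names are made up.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

model = LinearRegression().fit(X, y)
for name, weight in zip(["temperature", "humidity", "wind"], model.coef_):
    print(f"{name:12s} weight = {weight:+.2f}")  # +1 unit on this feature shifts the prediction by `weight`
print(f"intercept = {model.intercept_:+.2f}")
```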
Assumptions used to compute the confidence intervals:
- Linearity: forced by linear regression
- Normality: the target outcome given the features is assumed to follow a normal distribution.
- Homoscedasticity (constant variance): the variance of the error terms is assumed to be constant, which is generally not verified (variance often increases for large values)
- Independence: assumption that each instance is independent of any other, often not verified when you have several repeated measurements. If this is not the case, you need specific linear regression models such as mixed effects models or GEE (generalized estimating equations).
- Fixed features: inputs are considered exact and without measurement errors (always wrong, but it would be highly impractical otherwise)
- Absence of multicollinearity: when two features are strongly correlated, it blurs the importance of both (the weights could go either way, and the model would be just as good with only one of them)
Interpretation
Interpretation depends on the type of feature (a made-up worked example follows the list):
- numerical: an increase of the feature by one unit --> the outcome changes by the feature's weight
- binary/categorical: presence/selection of the category (versus the reference) --> the outcome changes by the category's weight
- intercept: if the features are standardized and the binary/categorical reference is encoded as 0 --> the intercept is the predicted outcome when all features are at their mean / reference
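A made-up worked example: suppose a house-price model has a weight of 2.5 on the numerical feature "rooms" and a weight of −3 on the binary feature "basement" (both numbers hypothetical). Adding one room raises the prediction by 2.5, and having a basement (versus the reference "no basement") lowers it by 3, all other features held fixed.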
R-squared
Another important measurement is the R-squared, which tells how much of the total variance of the target outcome is explained by the model.
- Higher R-squared is better.
- R² = 1 − SSE / SST, where SSE is the sum of squared errors and SST the total sum of squares (the variance of the data around its mean).
- R² increases mechanically with the number of features, so it is better to use the adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where p is the number of features and n the number of instances (see the snippet after this list).
- low adjusted R² --> not interpretable because it does not explain much of the variance.
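A small sketch of both formulas, assuming only numpy (y_true, y_pred and p are placeholders):

```python
# Sketch of R² and adjusted R², following the formulas above.
# Assumes numpy; y_true / y_pred are placeholder arrays, p is the number of features.
import numpy as np

def r_squared(y_true, y_pred):
    sse = np.sum((y_true - y_pred) ** 2)           # sum of squared errors
    sst = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares (data variance)
    return 1.0 - sse / sst

def adjusted_r_squared(y_true, y_pred, p):
    n = len(y_true)
    r2 = r_squared(y_true, y_pred)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```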
Feature importance
Importance of a feature in LR can be measured by the absolute value of its t-statistic, which is the estimated weight divided by its standard error (the standard error being the standard deviation of the weight estimate).
--> it is possible to plot, for each feature (like a facet plot), the y = intercept + weight * x curve with the standard error shown, which highlights how the ground truths are distributed around the predictions.
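statsmodels exposes these quantities directly, so a minimal sketch could look like the following (toy data, made-up coefficients; an illustration, not the book's code):

```python
# Sketch: feature importance in linear regression via |t-statistic| = |weight| / SE(weight).
# Assumes statsmodels; the data is randomly generated for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

results = sm.OLS(y, sm.add_constant(X)).fit()
print(results.params)   # weight estimates (first entry is the intercept)
print(results.bse)      # standard errors of the estimates
print(results.tvalues)  # t-statistics = params / bse; larger |t| = more important feature
```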
Visual interpretation
Weight plots (https://christophm.github.io/interpretable-ml-book/limo.html#weight-plot)
Weight plots show for each feature the weight estimate and the standard error.
- A low standard error indicates a reliable weight estimate
- A high weight estimate means a high influence on the outcome
- Scaling the features makes the weight estimates more comparable
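A rough sketch of such a weight plot, assuming matplotlib and statsmodels (the toy data repeats the previous snippet):

```python
# Rough weight-plot sketch: point = weight estimate, horizontal bar = ± standard error.
# Assumes statsmodels and matplotlib; the toy data/fit repeats the previous snippet.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)
results = sm.OLS(y, sm.add_constant(X)).fit()

names = ["intercept", "x0", "x1", "x2"]
plt.errorbar(results.params, range(len(names)), xerr=results.bse, fmt="o", capsize=3)
plt.yticks(range(len(names)), names)
plt.axvline(0, color="grey", linestyle="--")  # weights near 0 have little influence
plt.xlabel("weight estimate (± standard error)")
plt.tight_layout()
plt.show()
```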
Effect plots
Box plots for each feature. Only effects are represented. Effect = weight * value.
- vertical line = median effect
- box = effects between the 25% and 75% quantiles
- horizontal lines (whiskers) = span of ±1.5 × interquartile range (IQR)
- dots = outliers
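A minimal effect-plot sketch, assuming scikit-learn and matplotlib (data and feature names are made up; effect = weight * value per instance):

```python
# Minimal effect-plot sketch: one box plot per feature of effect = weight * value.
# Assumes scikit-learn and matplotlib; data and feature names are made up.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)
model = LinearRegression().fit(X, y)

effects = X * model.coef_  # per-instance effect of each feature
plt.boxplot(effects, vert=False)
plt.yticks([1, 2, 3], ["temperature", "humidity", "wind"])
plt.xlabel("feature effect (weight * value)")
plt.tight_layout()
plt.show()
```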
Explain individual predictions
- Position the individual feature effects of the instance on an effect plot: this makes it possible to see how and why the outcome was decided (in particular, outlier effects are interesting)
Encoding Categorical Features
Two encodings are presented:
- Treatment coding: N − 1 dummy features for N categories (one-hot encoding with a reference category)
- Effect coding: each category is compared to the overall mean; again only N − 1 categories are encoded, and the reference category is coded −1 in every column (see the sketch below)
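A small sketch contrasting the two encodings, assuming pandas (the categories A, B, C are made up, with A as the reference):

```python
# Sketch of the two encodings for a 3-category feature (made-up categories A, B, C;
# A is the reference). Assumes pandas only.
import pandas as pd

cities = pd.Series(["A", "B", "C", "B"], name="city")

# Treatment (dummy) coding: N-1 one-hot columns, the reference category is all zeros.
treatment = pd.get_dummies(cities, drop_first=True, dtype=int)
print(treatment)

# Effect (sum) coding: same N-1 columns, but the reference rows are coded -1 everywhere,
# so each weight is read as a deviation from the overall mean instead of from A.
effect = treatment.copy()
effect.loc[cities == "A", :] = -1
print(effect)
```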
Do linear models create good explanations?
"linear models do not create the best explanations"
- Contrastive, but the reference instance is a data point where all numerical features are 0 and all categorical features are at their reference category (usually a meaningless point). If all features are mean-centered and categorical features are effect coded, then the reference instance is the point where all features are at their mean.
- Selectivity can be achieved in LR by using fewer features or by training sparse linear models, but by default explanations are not selective.
- Truthfulness: yes as long as the Linear Model is appropriate (aR² high).
- Linearity makes the explanation more general and simple
Sparse Linear Models
Regularization makes the model more frugal.
- By adding lambda * (norm of the weights) to the minimization objective, models with many non-zero weights are penalized; with the L1 norm (Lasso) weights are driven exactly to zero. The higher the lambda, the fewer features enter the model (see the sketch after this list).
- Usually, lambda is tuned by cross-validation
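A minimal sparse-model sketch, assuming scikit-learn (LassoCV tunes the penalty, called alpha there, by cross-validation; the data is synthetic with only 3 informative features out of 20):

```python
# Minimal sparse linear model sketch: L1 regularization (Lasso) drives most weights
# to exactly zero; LassoCV picks the penalty strength (alpha ~ lambda) by cross-validation.
# Assumes scikit-learn; data is randomly generated for illustration.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.RandomState(0)
X = rng.normal(size=(300, 20))
y = 2 * X[:, 0] - 3 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=300)

lasso = LassoCV(cv=5, random_state=0).fit(X, y)
print("chosen alpha:", lasso.alpha_)
print("non-zero weights:", np.sum(lasso.coef_ != 0), "out of", X.shape[1])
```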
Some quotations
A change in a feature by one unit changes the odds ratio (multiplicative) by a factor of exp(βj). We could also interpret it this way: A change in xj by one unit increases the log odds ratio by the value of the corresponding weight.
These are the interpretations for the logistic regression model with different feature types:
- Numerical feature: If you increase the value of feature xj by one unit, the estimated odds change by a factor of exp(βj)
- Binary categorical feature: One of the two values of the feature is the reference category (in some languages, the one encoded in 0). Changing the feature xj from the reference category to the other category changes the estimated odds by a factor of exp(βj).
- Categorical feature with more than two categories: One solution to deal with multiple categories is one-hot-encoding, meaning that each category has its own column. You only need L-1 columns for a categorical feature with L categories, otherwise it is over-parameterized. The L-th category is then the reference category. You can use any other encoding that can be used in linear regression. The interpretation for each category then is equivalent to the interpretation of binary features.
- Intercept β0: When all numerical features are zero and the categorical features are at the reference category, the estimated odds are exp(β0). The interpretation of the intercept weight is usually not relevant.
Another disadvantage of the logistic regression model is that the interpretation is more difficult because the interpretation of the weights is multiplicative and not additive.
On the good side, the logistic regression model is not only a classification model, but also gives you probabilities. This is a big advantage over models that can only provide the final classification. Knowing that an instance has a 99% probability for a class compared to 51% makes a big difference.
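To make the multiplicative interpretation concrete, here is a small sketch of reading odds ratios off a fitted logistic regression, assuming scikit-learn (data and feature names are made up):

```python
# Sketch: interpreting logistic-regression weights as odds ratios via exp(beta_j).
# Assumes scikit-learn; data and feature names are made up for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.normal(size=(500, 3))
y = (1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=500) > 0).astype(int)

clf = LogisticRegression().fit(X, y)
for name, beta in zip(["age", "smoker", "bmi"], clf.coef_[0]):
    print(f"{name:8s} beta = {beta:+.2f}  odds ratio = {np.exp(beta):.2f}")

# The model also returns class probabilities, not just labels.
print(clf.predict_proba(X[:2]))
```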