As a counterargument: it's precisely because of high-dimensional statistical learning that interpretability is a valuable trait. Yes, the power of modern ML is that it can handle situations that the designers did not explicitly design for--but this doesn't necessarily mean that it handles them well. For example, if your loan approval is decided by an AI and it rejects you, you want to know why you were not approved. You'd want the reason your application was not granted to be something reasonable (like a poor credit history) and not something like "the particular combination of inputs triggered some weird path and rejected you offhand." Another example is machine vision for self-driving cars. You want the car to understand what a stop sign is and not just react to the color, otherwise the first pedestrian in a red jacket will bring the car to a screeching halt. Even if you had no red jackets in your training set (or not enough for the misclassifications to show up in your error rate), interpretability lets you verify the model works as intended.
It's dangerous to treat this sort of model as a black box, as the details of how the model makes a decision are as important as the output; otherwise, how could it be trusted?
-
This topic is the subject of my thesis, so I am currently steeped in it. Let me know if I can answer any more questions!
While I accept your counterargument (esp. regarding credit scores, autonomous vehicles and the like) I think interpretability in deep models is only meaningful if you have accurate, objective labels. But how often is this the case? For example, when people refer to concepts, they often rely on some established culture, a common understanding, rather than things you can measure. Let's take the simple example of classifying spoiled fruit at the grocery store. You can train a ConvNet and it will probably learn to recognize some visual traits of "spoiledness", but how objective can it really be, given that humans don't always agree what "spoiled" really means? In other words, fruit is spoiled only when a large enough social group says it is spoiled. So if that is your label, then the model can only reflect the shared understanding of "spoiledness" in the given social group. Then interpretability can help you check if the model is looking at plausible things (e.g. the fruit, not the background). However, this won't really tell you "why" the model thinks this fruit is spoiled. You can draw an interesting analogy between this and ensemble models, where the "social group" of the models in the ensemble forms the shared culture.
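To make the "is it looking at the fruit or the background" check concrete, here is a minimal gradient-saliency sketch in PyTorch. The names `model` (a trained classifier) and `image` (a preprocessed input tensor) are placeholders, not something from the article.

    import torch

    # Minimal gradient-saliency sketch (assumes `model` is a trained
    # torchvision-style classifier and `image` is a normalized 3xHxW tensor).
    model.eval()
    x = image.unsqueeze(0).requires_grad_(True)   # add a batch dimension

    logits = model(x)
    score = logits[0, logits.argmax()]            # score of the predicted class
    score.backward()

    # Pixel-wise saliency: how strongly each pixel influences the prediction.
    saliency = x.grad.abs().max(dim=1)[0].squeeze()   # HxW map

    # If the bright regions fall on the background rather than the fruit,
    # the model is probably keying on the wrong thing.

As you say, this won't tell you why the model considers the fruit spoiled, but it will at least catch the "looking at the background" failure mode.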
Also, an interesting interpretability approach that the article did not mention is SHAP [0].
Regarding accurate labels: a lack of appropriate or sufficient labeling will undermine your model regardless of how powerful it is (interpretable or not). This is where other benefits of model interpretation come in: you can spot potential errors in your model's training, which tells you that you need to re-evaluate your base assumptions about the data.
SHAP is cool technology! It looks like it builds on LIME and similar methods to fit a hyperplane against the model surface. I'm not surprised the article didn't mention it though, as it's a bit in the weeds for an overview piece.
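For anyone curious, the basic usage is only a few lines. A rough sketch, assuming an already-fitted tree model `model` (regression or binary classification) and its training DataFrame `X`:

    import shap

    # Sketch: explain a tree-based model's predictions with SHAP.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)

    # Global view: which features drive predictions across the dataset.
    shap.summary_plot(shap_values, X)

    # Local view: additive per-feature contributions for one prediction.
    shap.force_plot(explainer.expected_value, shap_values[0, :], X.iloc[0, :])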
You could say the same thing about a person’s reasoning. If someone were to think a fruit is spoiled, does that say more about the fruit or about their understanding of spoiledness as you put it?
That argument falls apart as you leave the simplest elements of a problem. Sure, stop signs are important, but that’s such a basic aspect of driving as to be nearly irrelevant. All self-driving AIs are going to get really good at identifying stop signs very early in development; it’s stuff like mirages that designers may never consider that make or break such systems.
The easiest elements for humans to interpret are therefore the least important as they’re going to have plentiful training data and people checking for failures. Interpretability is therefore only really useful for toy problems that don’t actually need machine learning.
So, the ethics argument for ML explainability may not be particularly strong (although, once the regulators arrive that isn't going to matter), but the practical argument is extremely strong.
Any explanation method will give you more insight into how your model, features and data interact. This will allow you to improve the model, and also to avoid insane features that are either capturing information from the future or not worth the processing effort.
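As a concrete example of catching those "information from the future" features, here is a hedged sketch using scikit-learn's permutation importance. The names `model`, `X_val` (a held-out DataFrame) and `y_val` are placeholders.

    from sklearn.inspection import permutation_importance

    # Sketch: rank features by how much shuffling each one hurts the
    # validation score. Assumes `model` is fitted and X_val is a DataFrame.
    result = permutation_importance(model, X_val, y_val,
                                    n_repeats=10, random_state=0)

    for name, imp in sorted(zip(X_val.columns, result.importances_mean),
                            key=lambda t: -t[1]):
        print(f"{name}: {imp:.4f}")

    # One feature dwarfing everything else is often a red flag: it may be
    # leaking the label (e.g. a timestamp or a post-outcome field).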
Maybe it's that I came to data science from a theory-driven science point, but I don't really understand why you wouldn't want to be able to interpret your model.
> Maybe it's that I came to data science from a theory-driven science point, but I don't really understand why you wouldn't want to be able to interpret your model.
Because it limits your model to doing things you could understand, which thus makes it less powerful. In other words it’s a useful property only so far as you get it for free. A self driving car AI being understandable isn’t worth it killing more people in the field.
> There's no proof that uninterpretable models perform better
It’s a vastly larger solution space. So it’s really the reverse that would be surprising.
> In some sense, it's possible to interpret any model, the effort required just varies.
Models can be of arbitrarily large sizes, to the point where people really can’t understand them. How do you go about dissecting a 2-layer NN with 10^40 nodes?
> Models can be of arbitrarily large sizes, to the point where people really can’t understand them. How do you go about dissecting a 2-layer NN with 10^40 nodes?
I'd probably take the output of the first layer, and cluster it.
It would very much depend on what this NN was attempting to do.
Like, all of the work in this field does suggest that people value this feature, and it's incredibly useful for debugging, which is generally pretty hard in ML/statistics.
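A rough sketch of that "cluster the first layer's output" idea, assuming some way to get the hidden activations as an array (here a placeholder function `first_layer`):

    import numpy as np
    from sklearn.cluster import KMeans

    # Sketch: cluster the first-layer activations and eyeball what each
    # cluster responds to. `first_layer(X)` is a placeholder; in practice
    # it could be a forward hook or a truncated copy of the network.
    activations = np.asarray(first_layer(X))        # n_samples x n_hidden

    kmeans = KMeans(n_clusters=20, n_init=10, random_state=0).fit(activations)

    # Inspect a few inputs per cluster to see what each group of hidden
    # representations seems to care about.
    for c in range(kmeans.n_clusters):
        idx = np.where(kmeans.labels_ == c)[0][:5]
        print(f"cluster {c}: example rows {idx.tolist()}")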
> It’s a vastly larger solution space. So it’s really the reverse that would be surprising.
Imagine a world in which linear models returned the entire matrix by observation rather than the coefficients. People would argue that it was uninterpretable, but it's a problem of tools.
I actually think that if you can instrument a model appropriately, then you can definitely build an interpretation layer on top of it. Clearly that doesn't make the model perform worse.
Even if you can't instrument it, you can run thousands of experiments changing one feature at a time and then estimate the impact of this feature on the model. Granted, that's not practical on many problems, but neither were deep NN's a decade ago.
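To sketch what that one-feature-at-a-time probing might look like (essentially hand-rolled partial dependence; `model`, the DataFrame `X`, and the feature name are placeholders):

    import numpy as np

    # Sketch: hold everything else fixed, sweep one feature across a grid,
    # and record how the model's average prediction moves.
    def feature_response(model, X, feature, n_points=20):
        grid = np.linspace(X[feature].min(), X[feature].max(), n_points)
        responses = []
        for value in grid:
            X_mod = X.copy()
            X_mod[feature] = value          # perturb just this one feature
            responses.append(model.predict(X_mod).mean())
        return grid, np.array(responses)

    grid, resp = feature_response(model, X, "credit_history_length")
    # A flat curve suggests the feature barely matters; a jagged one
    # suggests the model is doing something worth a closer look.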
Do you have any material you can recommend as an introduction to interpretation of NN systems, or how to design NN-based ML to be good for interpretation?
My background is more classic ML and computer vision (feature design, geometry, photogrammetry), and I've been very hesitant to use NNs in a number of situations precisely because of things like auditability and being able to fix bugs in critical-for-customer edge cases.
R. Guidotti et al. [0] wrote a good literature survey on black-box explainers; it contains a summary table of the current state of the art on page 20.
In terms of designing NN-based ML, the above paper has some info, and this paper by S. Teso [1] is a good place to start looking further (though it is focused on XAL). SENNs are cool, but ultimately most inherently interpretable models come down to classic ML (decision trees, linear/logistic regression, etc.), which is limiting compared to the power of NNs. Post-hoc explanations are basically the only option (especially for DNNs).
The loan example comes up a lot but I'm not sure why. We already use machine learning to evaluate credit ratings in the US. It is not always a fair system but it is standard practice and nobody asks why they are turned down for a loan or demands to know how the system works (this information is proprietary and banks would claim their algorithms are a trade secret since a better algorithm gives them an edge on pricing loans).
>>You'd want the reason your application was not granted to be something reasonable (like a poor credit history) and not something like "the particular combination of inputs triggered some weird path and rejected you offhand."
This is a low-dimensional bias. If there is increased risk of default from high A & B & C & D, but not from high A or high B or high C or high D alone, then the combination of parameters is what matters, even if it is not easy to explain. Typically in a high-dimensional space most of the volume is far from the axes, so it is unlikely that things will line up along some preconceived set of inputs. As it is, 'poor credit history' is in fact an index that amalgamates a large number of different parameters, so I'm not sure if that really explains why the loan was rejected or simply gives a simple name for a complicated thing.
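To make the A & B & C & D point concrete, here is a small synthetic illustration (made-up data, not a claim about real credit models): each factor on its own is only weakly correlated with default, yet the four-way conjunction determines it exactly.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Four independent risk factors, each "high" about half the time.
    A, B, C, D = (rng.random(n) for _ in range(4))

    # Default happens only when ALL four are high at once.
    default = ((A > 0.5) & (B > 0.5) & (C > 0.5) & (D > 0.5)).astype(float)

    # Each single factor shows only a modest correlation (~0.2) with default,
    # even though the conjunction explains the outcome perfectly.
    for name, x in zip("ABCD", (A, B, C, D)):
        print(name, "corr with default:", round(np.corrcoef(x, default)[0, 1], 3))

Any honest "explanation" of a model trained on data like this has to talk about the combination, not a single axis.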
In general, yes, it is good to thoroughly debug any ML algorithm and make sure that it is doing roughly what you think it is doing. A lot of the time this process can be quite complicated and relies on a lot of intuition and heuristics. While thoroughly testing an ML solution is certainly best practice, I'm not sure having a highly skilled researcher conduct an in-depth mathematical analysis of an algorithm would really make it 'interpretable'.
I have experience developing automated underwriting models in the insurance industry. As the models become more sophisticated and adopt machine learning, heavier scrutiny is coming from regulators.
For good reason, guardrails are required not only to protect against direct discrimination, but also to prevent the use of proxies. For example, we can't include race in health and life underwriting decisions, but since zip code is highly predictive of race, that attribute must also be excluded.
I'm not familiar with banking regulations, but I imagine similar policies apply. In these cases, being able to demonstrate that a model isn't discriminating is not only ethically important, but in many cases legally required.
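One rough way to screen for proxies like the zip-code example (a sketch, not a regulatory procedure; `X_candidate` and `race` are placeholder arrays): check how well the candidate features can predict the protected attribute at all.

    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Sketch of a proxy check: if the candidate features (e.g. one-hot
    # encoded zip code in `X_candidate`) predict the protected attribute
    # `race` far better than the base rate, they can act as a proxy for it
    # even when the protected attribute itself is excluded from the model.
    proxy_probe = LogisticRegression(max_iter=1000)
    scores = cross_val_score(proxy_probe, X_candidate, race, cv=5)

    print("protected-attribute predictability:", scores.mean())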
While you as a person may not be able to get details of the algorithms and methods used, the regulators get exhaustive documentation about every aspect of them.
With GDPR, explainability is going to become a requirement, but it will probably take some test cases before this happens.
I’m pretty sure that despite the fact that statistical models are used in credit scores, those models are subject to regulatory review. For example, I’m pretty sure a “black, therefore, no loan” algorithm is illegal, regardless of whether that parameter was learned or programmed. It’s a huge problem in some of these models that latent racism can bubble up into a putatively unbiased algorithm. In some industries, including finance, I think that can expose you to some regulatory risk, hence explainability being quite helpful.
> This is a low dimensional bias. If there is increased risk of default from high A & B & C & D but not high A or high B or high C or high D, then the combination or parameters is what matters even if it is not easy to explain.
However, even if we expect that a non-obvious combination of parameters will matter, we usually expect the prediction surface to be at least a little bit smooth in various ways: monotonic or curved instead of jagged, with small changes in input causing only small changes in output, and so on. Not just to make it easier to understand, but also because the kinds of processes we study tend to behave that way.
For regions of high density, machine learning does exactly what you say it does: generate high-quality predictions or categorizations, even if the particular path that led there is non-obvious or weird. But training these models is generally not sensitive to how they handle unusual combinations of inputs, so for those edge cases all bets are off on predictive quality. A very simple case is polynomial regression, which can be tuned to fit the training data perfectly but which, outside the training set, might oscillate wildly or go to infinity -- and this isn't really the result of overfitting, it's just what polynomials do.
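A tiny illustration of that polynomial behavior, using nothing but NumPy:

    import numpy as np

    # Sketch: a high-degree polynomial fit that looks fine inside the
    # training range and blows up just outside it.
    rng = np.random.default_rng(0)
    x_train = np.linspace(0, 1, 15)
    y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(15)

    coeffs = np.polyfit(x_train, y_train, deg=12)   # nearly interpolates the data

    print("inside the training range: ", np.polyval(coeffs, 0.5))
    print("outside the training range:", np.polyval(coeffs, 1.5))
    # The first value stays near the data; the second is typically far
    # from anything a sine curve would ever produce.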
The loan example comes up because of a long history of explicit discrimination in the US against Black borrowers (https://en.wikipedia.org/wiki/Redlining). Because of that history, people are justifiably skeptical when someone says: "I know that this industry discriminated against you in the past, but trust me, this new and completely opaque system is TOTALLY not going to discriminate against you anymore." Especially when the new systems have demonstrable racial bias (https://www.realtor.com/news/trends/black-communities-higher... among many many other sources).
What if an interpretable model is worse at telling stop signs from jackets than an uninterpretable model? Should we use the worse model because we value interpretability?
This is the type of hypothetical that kills the discussion, though.
If the model is interpretable, you have a high chance of knowing why it does or does not tell a stop sign from a jacket. If it is not, you only know that in your test/validation set, it can do the job.
Even the tasks that machine learning clearly excels at are currently in a state where all good uses have a human supervisor at some level. Recognizing faces, for example. For my personal library, I absolutely have to disambiguate the recognized faces of my kids as they get older in all of the products I've used.
If we value interpretability for the particular model, e.g. as in the loan example, or where by law you have to make sure race was not a consideration, I'd say yes. In places where interpretability has no additional value, then of course no.
But it of course depends on the exact value trade-off, which any model designer already has to consider.
Yes, because in the interpretable model, the fix can also let you check robustness against flags and mailboxes at the same time. Not everything is correctly captured in the test dataset, so we need levels of abstraction that let us be more general.
You can use an uninterpretable model in conjunction with a post-hoc explainer—and in fact, this is most often how explainers are used. This gives you the best of both worlds: powerful models and auditability for their decisions.
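As a sketch of that pattern, here is roughly what a post-hoc explanation looks like with LIME on a black-box tabular classifier. The names `model`, `X_train`, and `feature_names` are assumed to exist, and the class names are made up.

    from lime.lime_tabular import LimeTabularExplainer

    # Sketch of the post-hoc pattern: the black-box model stays untouched,
    # and the explainer fits a simple local surrogate around one prediction.
    explainer = LimeTabularExplainer(X_train,
                                     feature_names=feature_names,
                                     class_names=["denied", "approved"],
                                     discretize_continuous=True)

    explanation = explainer.explain_instance(X_train[0],
                                             model.predict_proba,
                                             num_features=5)

    # The top local reasons for this single decision, in plain terms.
    print(explanation.as_list())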