Explaining a model that uses standardized features

Standardizing features is a common preprocessing step in many ML pipelines. When explaining a model that uses standardized features, it is usually preferable to express the explanations in terms of the original input features rather than their standardized versions. This notebook shows how to do that using the following property: any univariate transformation applied to the model's inputs leaves the model's Shapley values unchanged (note that multivariate transformations such as a PCA decomposition do change the Shapley values, so this trick does not apply to them).

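To make the invariance claim above concrete, here is a minimal, self-contained sketch on synthetic toy data (all names such as `X_toy` are hypothetical and not part of this notebook's workflow): fitting the same linear model on raw and on standardized copies of the data should yield numerically identical SHAP values.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

import shap

# toy data with features on very different scales
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(500, 3)) * [1.0, 10.0, 100.0]
y_toy = X_toy @ [3.0, -2.0, 0.5] + rng.normal(size=500)
X_toy_std = StandardScaler().fit_transform(X_toy)

# explain a linear model fit on the raw features and one fit on the standardized features
phi_raw = shap.explainers.Linear(LinearRegression().fit(X_toy, y_toy), X_toy)(X_toy).values
phi_std = shap.explainers.Linear(LinearRegression().fit(X_toy_std, y_toy), X_toy_std)(X_toy_std).values

# the attributions should match, since standardization is a univariate (affine) transform
print(np.allclose(phi_raw, phi_std))  # expected: True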
Build a linear model that uses standardized features

[1]:
import sklearn.linear_model
import sklearn.preprocessing

import shap

# get standardized data
X, y = shap.datasets.california()
scaler = sklearn.preprocessing.StandardScaler()
scaler.fit(X)
X_std = scaler.transform(X)

# train the linear model
model = sklearn.linear_model.LinearRegression().fit(X_std, y)

# explain the model's predictions using SHAP
explainer = shap.explainers.Linear(model, X_std)
shap_values = explainer(X_std)

# visualize the model's dependence on the first feature
shap.plots.scatter(shap_values[:, 0])
[Figure: scatter plot of the first feature's SHAP values against its standardized values]

Convert the explanations back to the original feature space

[2]:
# we add back the feature names stripped by the StandardScaler
for i, c in enumerate(X.columns):
    shap_values.feature_names[i] = c

# we convert back to the original data
# (note we can do this because X_std is a set of univariate transformations of X)
shap_values.data = X.values

# visualize the model's dependence on the first feature again, now in the original feature space
shap.plots.scatter(shap_values[:, 0])
[Figure: scatter plot of the first feature's SHAP values against its original (unstandardized) values]
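Because `StandardScaler.inverse_transform` exactly undoes the per-feature standardization, the data conversion above can equivalently be written in terms of the fitted scaler. This is just an alternative sketch using the objects already defined in this notebook:

# equivalent to `shap_values.data = X.values`: invert the univariate standardization
shap_values.data = scaler.inverse_transform(X_std)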

Have an idea for more helpful examples? Pull requests that add to this documentation notebook are encouraged!