shap.utils.hclust

shap.utils.hclust(X: _ArrayLike, y: _ArrayLike | None = None, linkage: Literal['single', 'complete', 'average'] = 'single', metric: str = 'auto', random_state: int | np.random.RandomState = 0) → np.ndarray

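Fit a hierarchical clustering model for the features X relative to the target variable y.

For more information on clustering methods, see scipy.cluster.hierarchy.linkage().

For more information on scipy distance metrics, see scipy.spatial.distance.pdist().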
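
As a rough illustration of how the two scipy functions referenced above fit together, the sketch below computes pairwise distances between features and feeds them to a linkage routine. It uses only scipy and numpy, assumes features are the columns of X, and is not a reproduction of what hclust does internally. The synthetic data and the names `X`, `dist`, and `Z` are made up for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))  # 50 samples, 4 features (synthetic data)
# Make feature 1 nearly collinear with feature 0 so they should merge first.
X[:, 1] = 2.0 * X[:, 0] + 0.01 * rng.normal(size=50)

# pdist compares rows, so transpose to compare features (columns of X).
dist = pdist(X.T, metric="cosine")  # condensed vector with 4 * 3 / 2 = 6 entries

# Build the hierarchy from the condensed distances with single linkage.
Z = linkage(dist, method="single")
print(Z.shape)  # (n_features - 1, 4)
```

Each row of `Z` records one merge, so four features produce three rows.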

Parameters:

X: 2d-array-like

Features to cluster.

y: array-like or None

Target variable.

linkage: str

Defines the method used to calculate the distance between clusters. Must be one of "single", "complete", or "average".

metric: str

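Scipy distance metric or "xgboost_distances_r2".

If "xgboost_distances_r2", estimate redundancy distances between the features X with respect to the target variable y using shap.utils.xgboost_distances_r2().
Otherwise, calculate distances between features using the given distance metric.
If "auto" (default), use "xgboost_distances_r2" if a target variable is provided, otherwise use the "cosine" distance metric.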

random_state: int or np.random.RandomState

Numpy random state, defaults to 0.
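
The selection rule described for the metric parameter can be restated as a small function. This is a minimal sketch of the documented behavior only; `resolve_metric` is a hypothetical helper written for this illustration and is not part of shap.

```python
def resolve_metric(metric="auto", y=None):
    """Hypothetical helper restating the documented "auto" rule."""
    if metric == "auto":
        # With a target, use the XGBoost-based redundancy distance;
        # without one, fall back to cosine distance.
        return "xgboost_distances_r2" if y is not None else "cosine"
    # Any other value is passed through unchanged.
    return metric


print(resolve_metric("auto", y=None))       # cosine
print(resolve_metric("auto", y=[0, 1, 1]))  # xgboost_distances_r2
print(resolve_metric("euclidean", y=None))  # euclidean
```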

Returns:

clustering: np.ndarray: The hierarchical clustering encoded as a linkage matrix.
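
To show how a linkage matrix of this kind can be read, the sketch below builds one with scipy from hand-picked distances and then cuts it into flat clusters. The distance values are hypothetical and chosen so the merges are easy to follow. A matrix returned by hclust should be usable the same way, assuming it follows the scipy linkage format that the description above refers to.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical condensed distances between 4 features, in pdist order:
# (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
dist = np.array([0.1, 0.9, 0.8, 0.85, 0.95, 0.2])
Z = linkage(dist, method="single")

# Each row is one merge: [cluster_a, cluster_b, merge_distance, n_leaves].
# Indices 0..3 are the original features; merged clusters get new ids
# starting at 4, in the order they are created.
for a, b, d, n in Z:
    print(int(a), int(b), float(d), int(n))

# Cut the tree at distance 0.5: features 0 and 1 share a cluster,
# features 2 and 3 share another.
labels = fcluster(Z, t=0.5, criterion="distance")
print(labels)
```

Here the first two rows merge the close pairs (0, 1) and (2, 3), and the last row joins the two resulting clusters into one containing all four features.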