Text to Text Explanation: Abstractive Summarization Example

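This notebook demonstrates how to generate model explanations for a text-to-text scenario using a pretrained transformer model. Below, we walk through generating explanations for the pretrained distilbart model (https://hugging-face.cn/sshleifer/distilbart-xsum-12-6) on the Extreme Summarization (XSum) dataset provided by Hugging Face.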

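The first example needs only the model and tokenizer: we use the model's decoder to produce the log odds of the output tokens being explained. In the second example, we show how to generate explanations for a model exposed as an API or function (text in, text out). In that case, we need a text similarity model to approximate the log odds. The underlying explainer used to compute the SHAP values is the Partition explainer.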
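To make "log odds of the output tokens" concrete, here is a minimal NumPy sketch. It assumes we already have one row of decoder logits per output step; the function name `target_log_odds` and the toy inputs are hypothetical and are not part of the SHAP API.

```python
import numpy as np


def target_log_odds(logits, target_ids):
    """Return the log odds of each target token under its row of logits.

    logits: array of shape (num_steps, vocab_size), one row per output step.
    target_ids: integer array of shape (num_steps,), the token chosen at each step.
    """
    logits = np.asarray(logits, dtype=float)
    target_ids = np.asarray(target_ids)
    # Numerically stable softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    # Probability assigned to the token that was actually generated at each step.
    p = probs[np.arange(len(target_ids)), target_ids]
    # Log odds: log(p / (1 - p)).
    return np.log(p) - np.log1p(-p)


# Toy example: two output steps over a hypothetical 3-token vocabulary.
toy_logits = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
toy_targets = np.array([1, 0])
print(target_log_odds(toy_logits, toy_targets))
```

In the first step the target has probability 1/3, so its log odds are log(1/2); a more confident step gives a larger value. Each output token gets its own such score, and it is this per-token score that the attributions decompose.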

[1]:
import numpy as np
import torch
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

import shap

Load model and tokenizer

[2]:
tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-xsum-12-6")
model = AutoModelForSeq2SeqLM.from_pretrained("sshleifer/distilbart-xsum-12-6").cuda()

Load data

[3]:
dataset = load_dataset("xsum", split="train")
Using custom data configuration default
Reusing dataset xsum (/home/slundberg/.cache/huggingface/datasets/xsum/default/1.2.0/f9abaabb5e2b2a1e765c25417264722d31877b34ec34b437c53242f6e5c30d6d)
[4]:
# slice inputs from dataset to run model inference on
s = dataset["document"][0:1]

Create an explainer object

[5]:
explainer = shap.Explainer(model, tokenizer)

Compute SHAP values

[6]:
shap_values = explainer(s)
floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  /pytorch/aten/src/ATen/native/BinaryOps.cpp:467.)
Partition explainer: 2it [00:19,  9.52s/it]

Visualize the SHAP explanations

[7]:
shap.plots.text(shap_values)


[Interactive SHAP text plot for sample [0]; rendered output omitted.]
Output: New Welsh Rugby Union chairman Gareth Davies says a £3.3m players' fund should be used to keep stars in Wales.
Selected output token: "New". Base value: -6.11448. f_New(inputs): -1.21992.
Largest positive input attributions:
  1.057  succeeding deposed David Pickering following governing body elections.
  0.92   Former Wales and British and Irish Lions fly-half Davies became
  0.827  WRU chairman on Tuesday 21 October,
Largest negative input attribution:
  -0.092 Davies re-iterated his stance, saying keeping players such as Scarlets full-back Liam Williams and Ospreys flanker Justin Tipuric in Wales should take precedence.
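SHAP attributions are additive: the base value plus the sum of the input attributions should reproduce the model output for the selected output token. The sketch below checks this with the numbers reported by the first text plot for the output token "New" (32 input chunks, listed in input order). The values are transcribed by hand from the rounded plot labels, so only approximate agreement is expected; the tolerance is an assumption chosen to cover that rounding.

```python
import numpy as np

# Values reported by the first text plot for the output token "New".
base_value = -6.11448
f_inputs = -1.21992

# Per-chunk attributions, in input order, rounded as displayed in the plot.
attributions = np.array([
    0.078, 0.099, 0.137, 0.338, -0.016, 0.013, 0.92, 0.827,
    1.057, 0.337, 0.295, 0.306, -0.019, 0.042, 0.045, 0.169,
    0.009, -0.05, -0.006, -0.092, 0.116, -0.062, -0.031, 0.06,
    0.006, 0.178, 0.134, -0.014, 0.013, -0.012, -0.015, 0.029,
])

reconstructed = base_value + attributions.sum()
gap = abs(reconstructed - f_inputs)
print(f"base + sum(attributions) = {reconstructed:.3f}, f(inputs) = {f_inputs:.3f}")

# Each label is rounded to 3 decimals, so allow a small accumulated error.
assert gap < 0.02
```

With the full-precision values stored in shap_values, the same check would be expected to hold far more tightly.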

API

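Below, we demonstrate how to generate explanations for a model exposed as an API or function. Because this is a model-agnostic case, we use a text similarity model to approximate the log odds of generating the output text, and those log odds are then used to compute the SHAP explanations.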

[8]:
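# Define the text-to-text function to explain:
# an array of input strings in, an array of generated summaries out
def f(x):
    # tokenize the batch and move it to the GPU, where the model lives
    inputs = tokenizer(x.tolist(), return_tensors="pt", padding=True).to("cuda")
    with torch.no_grad():
        out = model.generate(**inputs)
    # decode the generated token ids back into text
    sentence = [tokenizer.decode(g, skip_special_tokens=True) for g in out]
    return np.array(sentence)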
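The function above follows a simple contract: it receives a NumPy array of strings and returns a NumPy array of strings of the same length. As a rough illustration of that contract only, here is a hypothetical stand-in, `f_stub`, that needs no model or GPU and simply returns the first sentence of each input. It is not a summarizer and is not part of SHAP; it only mirrors the input and output shape of f.

```python
import numpy as np


def f_stub(x):
    """Hypothetical stand-in for a text-to-text API.

    Takes a NumPy array of strings and returns a NumPy array holding
    the first sentence of each input.
    """
    outputs = []
    for text in x.tolist():
        # Keep everything up to and including the first period.
        first, sep, _ = text.partition(".")
        outputs.append((first + sep).strip())
    return np.array(outputs)


batch = np.array(["Davies became WRU chairman. He succeeded David Pickering."])
print(f_stub(batch))
```

A stub like this can be handy for checking the surrounding plumbing before swapping in the real model call.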

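For the model-agnostic case, we wrap the model to be explained with the shap.models.TeacherForcing class and define the text similarity model and tokenizer. The TeacherForcing class uses the similarity model to approximate the log odds of generating the output text produced by the model (here, the function f).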

We also need to define a Text masker, passing mask_token="..." and collapse_mask_token=True, which hints to the algorithm to use text infilling while masking.
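As a rough illustration of what collapsing the mask token means, the sketch below replaces hidden tokens with "..." and merges each run of adjacent masks into a single one. The helper `mask_and_collapse` and the whitespace tokenization are assumptions made for this example; this is not the implementation used by shap.maskers.Text.

```python
def mask_and_collapse(tokens, keep, mask_token="...", collapse=True):
    """Replace hidden tokens with mask_token, optionally merging adjacent masks."""
    if len(tokens) != len(keep):
        raise ValueError("tokens and keep must have the same length")
    out = []
    for token, visible in zip(tokens, keep):
        if visible:
            out.append(token)
        elif collapse and out and out[-1] == mask_token:
            # A run of hidden tokens is represented by a single mask token.
            continue
        else:
            out.append(mask_token)
    return " ".join(out)


tokens = ["Davies", "became", "WRU", "chairman", "on", "Tuesday"]
keep = [True, False, False, True, False, True]
print(mask_and_collapse(tokens, keep))                  # Davies ... chairman ... Tuesday
print(mask_and_collapse(tokens, keep, collapse=False))  # Davies ... ... chairman ... Tuesday
```

The collapsed form reads like a text with gaps to fill in, which is the shape of input that text infilling expects.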

[9]:
# wrap the function with the TeacherForcing class
teacher_forcing_model = shap.models.TeacherForcing(
    f, similarity_model=model, similarity_tokenizer=tokenizer, device=model.device
)
# create a Text masker
masker = shap.maskers.Text(tokenizer, mask_token="...", collapse_mask_token=True)

Create an explainer object using the wrapped model and the Text masker

[10]:
explainer_model_agnostic = shap.Explainer(teacher_forcing_model, masker)

Compute SHAP values

[11]:
shap_values_model_agnostic = explainer_model_agnostic(s)
Partition explainer: 2it [00:34, 17.39s/it]

Visualize the SHAP explanations

[12]:
shap.plots.text(shap_values_model_agnostic)


[Interactive SHAP text plot for sample [0]; rendered output omitted.]
Output: New Welsh Rugby Union chairman Gareth Davies says a £3.3m players' fund should be used to keep stars in Wales.
Selected output token: "New". Base value: -6.38956. f_New(inputs): -3.19536.
Largest positive input attributions:
  0.332  Centre Roberts and flanker Lydiate joined Racing ahead of the 2013-14 season while scrum-half Phillips moved there in December 2013 after being dismissed for disciplinary reasons by former club Bayonne.
  0.263  succeeding deposed David Pickering following governing body elections.
  0.249  Former Wales and British and Irish Lions fly-half Davies became
Largest negative input attribution:
  -0.081 Wales coach Warren Gatland has said: "We haven't instigated contact with the players.

Have an idea for more helpful examples? Pull requests that add to this documentation notebook are encouraged!