4.2 损失函数

本章代码：
https://github.com/zhangxiann/PyTorch_Practice/blob/master/lesson4/loss_function_1.py
https://github.com/zhangxiann/PyTorch_Practice/blob/master/lesson4/loss_function_1.py

这篇文章主要介绍了损失函数的概念，以及 PyTorch 中提供的常用损失函数。

损失函数

损失函数是衡量模型输出与真实标签之间的差异。我们还经常听到代价函数和目标函数，它们之间差异如下：

损失函数(Loss Function)是计算一个样本的模型输出与真实标签的差异
Loss $=f\left(y^{\wedge}, y\right)$
代价函数(Cost Function)是计算整个样本集的模型输出与真实标签的差异，是所有样本损失函数的平均值
$\cos t=\frac{1}{N} \sum_{i}^{N} f\left(y_{i}^{\wedge}, y_{i}\right)$
目标函数(Objective Function)就是代价函数加上正则项

在 PyTorch 中的损失函数也是继承于nn.Module，所以损失函数也可以看作网络层。

在逻辑回归的实验中，我使用了交叉熵损失函数loss_fn = nn.BCELoss()，$BCELoss$的继承关系：nn.BCELoss() -> _WeightedLoss -> _Loss -> Module。在计算具体的损失时loss = loss_fn(y_pred.squeeze(), train_y)，这里实际上在 Loss 中进行一次前向传播，最终调用BCELoss()的forward()函数F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)。

下面介绍 PyTorch 提供的损失函数。注意在所有的损失函数中，size_average和reduce参数都不再使用。

nn.CrossEntropyLoss

nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')

功能：把nn.LogSoftmax()和nn.NLLLoss()结合，计算交叉熵。nn.LogSoftmax()的作用是把输出值归一化到了 [0,1] 之间。

主要参数：

weight：各类别的 loss 设置权值
ignore_index：忽略某个类别的 loss 计算
reduction：计算模式，可以为 none(逐个元素计算)，sum(所有元素求和，返回标量)，mean(加权平均，返回标量)

下面介绍熵的一些基本概念

自信息：$\mathrm{I}(x)=-\log [p(x)]$
信息熵就是求自信息的期望：$\mathrm{H}(\mathrm{P})=E_{x \sim p}[I(x)]=-\sum_{i}^{N} P\left(x_{i}\right) \log P\left(x_{i}\right)$
相对熵，也被称为 KL 散度，用于衡量两个分布的相似性(距离)：$\boldsymbol{D}{K L}(\boldsymbol{P}, \boldsymbol{Q})=\boldsymbol{E}{\boldsymbol{x} \sim p}\left[\log \frac{\boldsymbol{P}(\boldsymbol{x})}{Q(\boldsymbol{x})}\right]$。其中$P(X)$是真实分布，$Q(X)$是拟合的分布
交叉熵：$\mathrm{H}(\boldsymbol{P}, \boldsymbol{Q})=-\sum_{i=1}^{N} \boldsymbol{P}\left(\boldsymbol{x}_{i}\right) \log \boldsymbol{Q}\left(\boldsymbol{x}_{i}\right)$

相对熵展开可得：

$\begin{aligned} \boldsymbol{D}{K L}(\boldsymbol{P}, \boldsymbol{Q}) &=\boldsymbol{E}{\boldsymbol{x} \sim p}\left[\log \frac{P(x)}{Q(\boldsymbol{x})}\right] \ &=\boldsymbol{E}{\boldsymbol{x} \sim p}[\log P(\boldsymbol{x})-\log Q(\boldsymbol{x})] \ &=\sum{i=1}^{N} P\left(x_{i}\right)\left[\log P\left(\boldsymbol{x}{i}\right)-\log Q\left(\boldsymbol{x}{i}\right)\right] \ &=\sum_{i=1}^{N} P\left(\boldsymbol{x}{i}\right) \log P\left(\boldsymbol{x}{i}\right)-\sum_{i=1}^{N} P\left(\boldsymbol{x}_{i}\right) \log \boldsymbol{Q}\left(\boldsymbol{x}_{i}\right) \ &= H(P,Q) -H(P) \end{aligned}$

所以交叉熵 = 信息熵 + 相对熵，即$\mathrm{H}(\boldsymbol{P}, \boldsymbol{Q})=\boldsymbol{D}{K \boldsymbol{L}}(\boldsymbol{P}, \boldsymbol{Q})+\mathrm{H}(\boldsymbol{P})$，又由于信息熵$H(P)$是固定的，因此优化交叉熵$H(P,Q)$等价于优化相对熵$D{KL}(P,Q)$。

所以对于每一个样本的 Loss 计算公式为：

$\mathrm{H}(\boldsymbol{P}, \boldsymbol{Q})=-\sum_{i=1}^{N} \boldsymbol{P}\left(\boldsymbol{x}{\boldsymbol{i}}\right) \log Q\left(\boldsymbol{x}{\boldsymbol{i}}\right) = logQ(x_{i})$，因为$N=1$，$P(x_{i})=1$。

所以$\operatorname{loss}(x, \text { class })=-\log \left(\frac{\exp (x[\text { class }])}{\sum_{j} \exp (x[j])}\right)=-x[\text { class }]+\log \left(\sum_{j} \exp (x[j])\right)$。

如果了类别的权重，则$\operatorname{loss}(x, \text { class })=\operatorname{weight}[\text { class }]\left(-x[\text { class }]+\log \left(\sum_{j} \exp (x[j])\right)\right)$。

下面设有 3 个样本做 2 分类。inputs 的形状为 $3 \times 2$，表示每个样本有两个神经元输出两个分类。target 的形状为 $3 \times 1$，注意类别从 0 开始，类型为torch.long。

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

# fake data
inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

# def loss function
loss_f_none = nn.CrossEntropyLoss(weight=None, reduction='none')
loss_f_sum = nn.CrossEntropyLoss(weight=None, reduction='sum')
loss_f_mean = nn.CrossEntropyLoss(weight=None, reduction='mean')

# forward
loss_none = loss_f_none(inputs, target)
loss_sum = loss_f_sum(inputs, target)
loss_mean = loss_f_mean(inputs, target)

# view
print("Cross Entropy Loss:\n ", loss_none, loss_sum, loss_mean)

输出为：

Cross Entropy Loss:
  tensor([1.3133, 0.1269, 0.1269]) tensor(1.5671) tensor(0.5224)

我们根据单个样本的 loss 计算公式$\operatorname{loss}(x, \text { class })=-\log \left(\frac{\exp (x[\text { class }])}{\sum_{j} \exp (x[j])}\right)=-x[\text { class }]+\log \left(\sum_{j} \exp (x[j])\right)$，可以使用以下代码来手动计算第一个样本的损失

idx = 0

input_1 = inputs.detach().numpy()[idx]      # [1, 2]
target_1 = target.numpy()[idx]              # [0]

# 第一项
x_class = input_1[target_1]

# 第二项
sigma_exp_x = np.sum(list(map(np.exp, input_1)))
log_sigma_exp_x = np.log(sigma_exp_x)

# 输出loss
loss_1 = -x_class + log_sigma_exp_x

print("第一个样本loss为: ", loss_1)

结果为：1.3132617

下面继续看带有类别权重的损失计算，首先设置类别的权重向量weights = torch.tensor([1, 2], dtype=torch.float)，向量的元素个数等于类别的数量，然后在定义损失函数时把weight参数传进去。

输出为：

weights:  tensor([1., 2.])
tensor([1.3133, 0.2539, 0.2539]) tensor(1.8210) tensor(0.3642)

权值总和为：$1+2+2=5$，所以加权平均的 loss 为：$1.8210\div5=0.3642$，通过手动计算的方式代码如下：

weights = torch.tensor([1, 2], dtype=torch.float)
weights_all = np.sum(list(map(lambda x: weights.numpy()[x], target.numpy())))  # [0, 1, 1]  # [1 2 2]
mean = 0
loss_f_none = nn.CrossEntropyLoss(reduction='none')
loss_none = loss_f_none(inputs, target)
loss_sep = loss_none.detach().numpy()
for i in range(target.shape[0]):

x_class = target.numpy()[i]
tmp = loss_sep[i] * (weights.numpy()[x_class] / weights_all)
mean += tmp

print(mean)

结果为 0.3641947731375694

nn.NLLLoss

nn.NLLLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')

功能：实现负对数似然函数中的符号功能

主要参数：

weight：各类别的 loss 权值设置
ignore_index：忽略某个类别
reduction：计算模式，，可以为 none(逐个元素计算)，sum(所有元素求和，返回标量)，mean(加权平均，返回标量)

每个样本的 loss 公式为：$l_{n}=-w_{y_{n}} x_{n, y_{n}}$。还是使用上面的例子，第一个样本的输出为 [1,2]，类别为 0，则第一个样本的 loss 为 -1；第一个样本的输出为 [1,3]，类别为 1，则第一个样本的 loss 为 -3。

代码如下：

weights = torch.tensor([1, 1], dtype=torch.float)

loss_f_none_w = nn.NLLLoss(weight=weights, reduction='none')
loss_f_sum = nn.NLLLoss(weight=weights, reduction='sum')
loss_f_mean = nn.NLLLoss(weight=weights, reduction='mean')

# forward
loss_none_w = loss_f_none_w(inputs, target)
loss_sum = loss_f_sum(inputs, target)
loss_mean = loss_f_mean(inputs, target)

# view
print("\nweights: ", weights)
print("NLL Loss", loss_none_w, loss_sum, loss_mean)

输出如下：

weights:  tensor([1., 1.])
NLL Loss tensor([-1., -3., -3.]) tensor(-7.) tensor(-2.3333)

nn.BCELoss

nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean')

功能：计算二分类的交叉熵。需要注意的是：输出值区间为 [0,1]。

主要参数：

weight：各类别的 loss 权值设置
ignore_index：忽略某个类别
reduction：计算模式，，可以为 none(逐个元素计算)，sum(所有元素求和，返回标量)，mean(加权平均，返回标量)

计算公式为：$l_{n}=-w_{n}\left[y_{n} \cdot \log x_{n}+\left(1-y_{n}\right) \cdot \log \left(1-x_{n}\right)\right]$

使用这个函数有两个不同的地方：

预测的标签需要经过 sigmoid 变换到 [0,1] 之间。
真实的标签需要转换为 one hot 向量，类型为torch.float。

代码如下：

inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

target_bce = target

# itarget
inputs = torch.sigmoid(inputs)

weights = torch.tensor([1, 1], dtype=torch.float)

loss_f_none_w = nn.BCELoss(weight=weights, reduction='none')
loss_f_sum = nn.BCELoss(weight=weights, reduction='sum')
loss_f_mean = nn.BCELoss(weight=weights, reduction='mean')

# forward
loss_none_w = loss_f_none_w(inputs, target_bce)
loss_sum = loss_f_sum(inputs, target_bce)
loss_mean = loss_f_mean(inputs, target_bce)

# view
print("\nweights: ", weights)
print("BCE Loss", loss_none_w, loss_sum, loss_mean)

结果为：

BCE Loss tensor([[0.3133, 2.1269],
        [0.1269, 2.1269],
        [3.0486, 0.0181],
        [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)

第一个 loss 为 0，3133，手动计算的代码如下：

x_i = inputs.detach().numpy()[idx, idx]
y_i = target.numpy()[idx, idx]              #

# loss
# l_i = -[ y_i * np.log(x_i) + (1-y_i) * np.log(1-y_i) ]      # np.log(0) = nan
l_i = -y_i * np.log(x_i) if y_i else -(1-y_i) * np.log(1-x_i)

nn.BCEWithLogitsLoss

nn.BCEWithLogitsLoss(weight=None, size_average=None, reduce=None, reduction='mean', pos_weight=None)

功能：结合 sigmoid 与二分类交叉熵。需要注意的是，网络最后的输出不用经过 sigmoid 函数。这个 loss 出现的原因是有时网络模型最后一层输出不希望是归一化到 [0,1] 之间，但是在计算 loss 时又需要归一化到 [0,1] 之间。

主要参数：

weight：各类别的 loss 权值设置
pos_weight：设置样本类别对应的神经元的输出的 loss 权值
ignore_index：忽略某个类别
reduction：计算模式，，可以为 none(逐个元素计算)，sum(所有元素求和，返回标量)，mean(加权平均，返回标量)

代码如下：

inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

target_bce = target

# itarget
# inputs = torch.sigmoid(inputs)

weights = torch.tensor([1], dtype=torch.float)
pos_w = torch.tensor([3], dtype=torch.float)        # 3

loss_f_none_w = nn.BCEWithLogitsLoss(weight=weights, reduction='none', pos_weight=pos_w)
loss_f_sum = nn.BCEWithLogitsLoss(weight=weights, reduction='sum', pos_weight=pos_w)
loss_f_mean = nn.BCEWithLogitsLoss(weight=weights, reduction='mean', pos_weight=pos_w)

# forward
loss_none_w = loss_f_none_w(inputs, target_bce)
loss_sum = loss_f_sum(inputs, target_bce)
loss_mean = loss_f_mean(inputs, target_bce)

# view
print("\npos_weights: ", pos_w)
print(loss_none_w, loss_sum, loss_mean)

输出为

pos_weights:  tensor([3.])
tensor([[0.9398, 2.1269],
        [0.3808, 2.1269],
        [3.0486, 0.0544],
        [4.0181, 0.0201]]) tensor(12.7158) tensor(1.5895)

与 BCELoss 进行对比

BCE Loss tensor([[0.3133, 2.1269],
        [0.1269, 2.1269],
        [3.0486, 0.0181],
        [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)

可以看到，样本类别对应的神经元的输出的 loss 都增加了 3 倍。

nn.L1Loss

nn.L1Loss(size_average=None, reduce=None, reduction='mean')

功能：计算 inputs 与 target 之差的绝对值

主要参数：

reduction：计算模式，，可以为 none(逐个元素计算)，sum(所有元素求和，返回标量)，mean(加权平均，返回标量)

公式：$l_{n}=\left|x_{n}-y_{n}\right|$

nn.MSELoss

功能：计算 inputs 与 target 之差的平方

公式：$l_{n}=\left(x_{n}-y_{n}\right)^{2}$

主要参数：

reduction：计算模式，，可以为 none(逐个元素计算)，sum(所有元素求和，返回标量)，mean(加权平均，返回标量)

代码如下：

inputs = torch.ones((2, 2))
target = torch.ones((2, 2)) * 3

loss_f = nn.L1Loss(reduction='none')
loss = loss_f(inputs, target)

print("input:{}\ntarget:{}\nL1 loss:{}".format(inputs, target, loss))

# ------------------------------------------------- 6 MSE loss ----------------------------------------------

loss_f_mse = nn.MSELoss(reduction='none')
loss_mse = loss_f_mse(inputs, target)

print("MSE loss:{}".format(loss_mse))

输出如下：

input:tensor([[1., 1.],
        [1., 1.]])
target:tensor([[3., 3.],
        [3., 3.]])
L1 loss:tensor([[2., 2.],
        [2., 2.]])
MSE loss:tensor([[4., 4.],
        [4., 4.]])

nn.SmoothL1Loss

nn.SmoothL1Loss(size_average=None, reduce=None, reduction='mean')

功能：平滑的 L1Loss

公式：$z_{i}=\left{\begin{array}{ll}0.5\left(x_{i}-y_{i}\right)^{2}, & \text { if }\left|x_{i}-y_{i}\right|<1 \ \left|x_{i}-y_{i}\right|-0.5, & \text { otherwise }\end{array}\right.$

下图中橙色曲线是 L1Loss，蓝色曲线是 Smooth L1Loss

主要参数：

reduction：计算模式，，可以为 none(逐个元素计算)，sum(所有元素求和，返回标量)，mean(加权平均，返回标量)

nn.PoissonNLLLoss

nn.PoissonNLLLoss(log_input=True, full=False, size_average=None, eps=1e-08, reduce=None, reduction='mean')

功能：泊松分布的负对数似然损失函数

主要参数：

log_input：输入是否为对数形式，决定计算公式
- 当 log_input = True，表示输入数据已经是经过对数运算之后的，loss(input, target) = exp(input) - target * input
- 当 log_input = False，，表示输入数据还没有取对数，loss(input, target) = input - target * log(input+eps)
full：计算所有 loss，默认为 loss
eps：修正项，避免 log(input) 为 nan

代码如下：

inputs = torch.randn((2, 2))
target = torch.randn((2, 2))

loss_f = nn.PoissonNLLLoss(log_input=True, full=False, reduction='none')
loss = loss_f(inputs, target)
print("input:{}\ntarget:{}\nPoisson NLL loss:{}".format(inputs, target, loss))

输出如下：

input:tensor([[0.6614, 0.2669],
        [0.0617, 0.6213]])
target:tensor([[-0.4519, -0.1661],
        [-1.5228,  0.3817]])
Poisson NLL loss:tensor([[2.2363, 1.3503],
        [1.1575, 1.6242]])

手动计算第一个 loss 的代码如下：

idx = 0

loss_1 = torch.exp(inputs[idx, idx]) - target[idx, idx]*inputs[idx, idx]

print("第一个元素loss:", loss_1)

结果为：2.2363

nn.KLDivLoss

nn.KLDivLoss(size_average=None, reduce=None, reduction='mean')

功能：计算 KLD(divergence)，KL 散度，相对熵

注意事项：需要提前将输入计算 log-probabilities，如通过nn.logsoftmax()

主要参数：

reduction：计算模式，，可以为 none(逐个元素计算)，sum(所有元素求和，返回标量)，mean(加权平均，返回标量)，batchmean(batchsize 维度求平均值)

公式：$\begin{aligned} D_{K L}(P | Q)=E_{x-p}\left[\log \frac{P(x)}{Q(x)}\right] &=E_{x-p}[\log P(x)-\log Q(x)] =\sum_{i=1}^{N} P\left(x_{i}\right)\left(\log P\left(x_{i}\right)-\log Q\left(x_{i}\right)\right) \end{aligned}$

对于每个样本来说，计算公式如下，其中$y_{n}$是真实值$P(x)$，$x_{n}$是经过对数运算之后的预测值$logQ(x)$。

$l_{n}=y_{n} \cdot\left(\log y_{n}-x_{n}\right)$

代码如下：

inputs = torch.tensor([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
inputs_log = torch.log(inputs)
target = torch.tensor([[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]], dtype=torch.float)

loss_f_none = nn.KLDivLoss(reduction='none')
loss_f_mean = nn.KLDivLoss(reduction='mean')
loss_f_bs_mean = nn.KLDivLoss(reduction='batchmean')

loss_none = loss_f_none(inputs, target)
loss_mean = loss_f_mean(inputs, target)
loss_bs_mean = loss_f_bs_mean(inputs, target)

print("loss_none:\n{}\nloss_mean:\n{}\nloss_bs_mean:\n{}".format(loss_none, loss_mean, loss_bs_mean))

输出如下：

loss_none:
tensor([[-0.5448, -0.1648, -0.1598],
        [-0.2503, -0.4597, -0.4219]])
loss_mean:
-0.3335360586643219
loss_bs_mean:
-1.000608205795288

手动计算第一个 loss 的代码为：

idx = 0
loss_1 = target[idx, idx] * (torch.log(target[idx, idx]) - inputs[idx, idx])
print("第一个元素loss:", loss_1)

结果为：-0.5448。

nn.MarginRankingLoss

nn.MarginRankingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')

功能：计算两个向量之间的相似度，用于排序任务

特别说明：该方法计算两组数据之间的差异，返回一个$n \times n$ 的 loss 矩阵

主要参数：

margin：边界值，$x_{1}$与$x_{2}$之间的差异值
reduction：计算模式，，可以为 none(逐个元素计算)，sum(所有元素求和，返回标量)，mean(加权平均，返回标量)

计算公式：$\operatorname{loss}(x, y)=\max (0,-y *(x 1-x 2)+\operatorname{margin})$，$y$的取值有 +1 和 -1。

当 $y=1$时，希望$x_{1} > x_{2}$，当$x_{1} > x_{2}$，不产生 loss
当 $y=-1$时，希望$x_{1} < x_{2}$，当$x_{1} < x_{2}$，不产生 loss

代码如下：

x1 = torch.tensor([[1], [2], [3]], dtype=torch.float)
x2 = torch.tensor([[2], [2], [2]], dtype=torch.float)

target = torch.tensor([1, 1, -1], dtype=torch.float)

loss_f_none = nn.MarginRankingLoss(margin=0, reduction='none')

loss = loss_f_none(x1, x2, target)

print(loss)

输出为：

tensor([[1., 1., 0.],
        [0., 0., 0.],
        [0., 0., 1.]])

第一行表示$x_{1}$中的第一个元素分别与$x_{2}$中的 3 个元素计算 loss，以此类推。

nn.MultiLabelMarginLoss

nn.MultiLabelMarginLoss(size_average=None, reduce=None, reduction='mean')

功能：多标签边界损失函数

举例：4 分类任务，样本 x 属于 0 类和 3 类，那么标签为 [0, 3, -1, -1]，

主要参数：

reduction：计算模式，，可以为 none(逐个元素计算)，sum(所有元素求和，返回标量)，mean(加权平均，返回标量)

计算公式：$\operatorname{loss}(x, y)=\sum_{i j} \frac{\max (0,1-(x[y[j]]-x[i]))}{x \cdot \operatorname{size}(0)}$，表示每个真实类别的神经元输出减去其他神经元的输出。

代码如下：

x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
y = torch.tensor([[0, 3, -1, -1]], dtype=torch.long)

loss_f = nn.MultiLabelMarginLoss(reduction='none')

loss = loss_f(x, y)

print(loss)

输出为：

0.8500

手动计算如下：

x = x[0]
item_1 = (1-(x[0] - x[1])) + (1 - (x[0] - x[2]))    # [0]
item_2 = (1-(x[3] - x[1])) + (1 - (x[3] - x[2]))    # [3]

loss_h = (item_1 + item_2) / x.shape[0]

print(loss_h)

nn.SoftMarginLoss

nn.SoftMarginLoss(size_average=None, reduce=None, reduction='mean')

功能：计算二分类的 logistic 损失

主要参数：

reduction：计算模式，，可以为 none(逐个元素计算)，sum(所有元素求和，返回标量)，mean(加权平均，返回标量)

计算公式：$\operatorname{loss}(x, y)=\sum_{i} \frac{\log (1+\exp (-y[i] * x[i]))}{\text { x.nelement } 0}$

代码如下：

inputs = torch.tensor([[0.3, 0.7], [0.5, 0.5]])
target = torch.tensor([[-1, 1], [1, -1]], dtype=torch.float)

loss_f = nn.SoftMarginLoss(reduction='none')

loss = loss_f(inputs, target)

print("SoftMargin: ", loss)

输出如下：

SoftMargin:  tensor([[0.8544, 0.4032],
        [0.4741, 0.9741]])

手动计算第一个 loss 的代码如下：

idx = 0

inputs_i = inputs[idx, idx]
target_i = target[idx, idx]

loss_h = np.log(1 + np.exp(-target_i * inputs_i))

print(loss_h)

结果为：0.8544

nn.MultiLabelSoftMarginLoss

nn.MultiLabelSoftMarginLoss(weight=None, size_average=None, reduce=None, reduction='mean')

功能：SoftMarginLoss 的多标签版本

主要参数：

weight：各类别的 loss 权值设置
reduction：计算模式，，可以为 none(逐个元素计算)，sum(所有元素求和，返回标量)，mean(加权平均，返回标量)

计算公式：$\operatorname{loss}(x, y)=-\frac{1}{C} \sum_{i} y[i] \log \left((1+\exp (-x[i]))^{-1}\right)+(1-y[i]) * \log \left(\frac{\exp (-x[i])}{(1+\exp (-x[i]))}\right)$

代码如下

inputs = torch.tensor([[0.3, 0.7, 0.8]])
target = torch.tensor([[0, 1, 1]], dtype=torch.float)

loss_f = nn.MultiLabelSoftMarginLoss(reduction='none')

loss = loss_f(inputs, target)

print("MultiLabel SoftMargin: ", loss)

输出为：

MultiLabel SoftMargin:  tensor([0.5429])

手动计算的代码如下：

x = torch.tensor([[0.1, 0.2, 0.7], [0.2, 0.5, 0.3]])
y = torch.tensor([1, 2], dtype=torch.long)

loss_f = nn.MultiMarginLoss(reduction='none')

loss = loss_f(x, y)

print("Multi Margin Loss: ", loss)

nn.MultiMarginLoss

nn.MultiMarginLoss(p=1, margin=1.0, weight=None, size_average=None, reduce=None, reduction='mean')

功能：计算多分类的折页损失

主要参数：

p：可以选择 1 或 2
weight：各类别的 loss 权值设置
margin：边界值
reduction：计算模式，，可以为 none(逐个元素计算)，sum(所有元素求和，返回标量)，mean(加权平均，返回标量)

计算公式：$\operatorname{loss}(x, y)=\frac{\left.\sum_{i} \max (0, \operatorname{margin}-x[y]+x[i])\right)^{p}}{\quad \text { x.size }(0)}$，其中 y 表示真实标签对应的神经元输出，x 表示其他神经元的输出。

代码如下：

x = torch.tensor([[0.1, 0.2, 0.7], [0.2, 0.5, 0.3]])
y = torch.tensor([1, 2], dtype=torch.long)

loss_f = nn.MultiMarginLoss(reduction='none')

loss = loss_f(x, y)

print("Multi Margin Loss: ", loss)

输出如下：

Multi Margin Loss:  tensor([0.8000, 0.7000])

手动计算第一个 loss 的代码如下：

x = x[0]
margin = 1

i_0 = margin - (x[1] - x[0])
# i_1 = margin - (x[1] - x[1])
i_2 = margin - (x[1] - x[2])

loss_h = (i_0 + i_2) / x.shape[0]

print(loss_h)

输出为：0.8000

nn.TripletMarginLoss

nn.TripletMarginLoss(margin=1.0, p=2.0, eps=1e-06, swap=False, size_average=None, reduce=None, reduction='mean')

功能：计算三元组损失，人脸验证中常用

主要参数：

p：范数的阶，默认为 2
margin：边界值
reduction：计算模式，，可以为 none(逐个元素计算)，sum(所有元素求和，返回标量)，mean(加权平均，返回标量)

计算公式：$L(a, p, n)=\max \left{d\left(a_{i}, p_{i}\right)-d\left(a_{i}, n_{i}\right)+\text { margin, } 0\right}$，$d\left(x_{i}, y_{i}\right)=\left|\mathbf{x}{i}-\mathbf{y}{i}\right|{p}$，其中$d(a{i}, p_{i})$表示正样本对之间的距离(距离计算公式与 p 有关)，$d(a_{i}, n_{i})$表示负样本对之间的距离。表示正样本对之间的距离比负样本对之间的距离小 margin，就没有了 loss。

代码如下：

anchor = torch.tensor([[1.]])
pos = torch.tensor([[2.]])
neg = torch.tensor([[0.5]])

loss_f = nn.TripletMarginLoss(margin=1.0, p=1)

loss = loss_f(anchor, pos, neg)

print("Triplet Margin Loss", loss)

输出如下：

Triplet Margin Loss tensor(1.5000)

手动计算的代码如下：

margin = 1
a, p, n = anchor[0], pos[0], neg[0]

d_ap = torch.abs(a-p)
d_an = torch.abs(a-n)

loss = d_ap - d_an + margin

print(loss)

nn.HingeEmbeddingLoss

nn.HingeEmbeddingLoss(margin=1.0, size_average=None, reduce=None, reduction='mean')

功能：计算两个输入的相似性，常用于非线性 embedding 和半监督学习

特别注意：输入 x 应该为两个输入之差的绝对值

主要参数：

margin：边界值
reduction：计算模式，，可以为 none(逐个元素计算)，sum(所有元素求和，返回标量)，mean(加权平均，返回标量)

计算公式：$l_{n}=\left{\begin{array}{ll}x_{n}, & \text { if } y_{n}=1 \ \max \left{0, \Delta-x_{n}\right}, & \text { if } y_{n}=-1\end{array}\right.$

代码如下：

inputs = torch.tensor([[1., 0.8, 0.5]])
target = torch.tensor([[1, 1, -1]])

loss_f = nn.HingeEmbeddingLoss(margin=1, reduction='none')

loss = loss_f(inputs, target)

print("Hinge Embedding Loss", loss)

输出为：

Hinge Embedding Loss tensor([[1.0000, 0.8000, 0.5000]])

手动计算第三个 loss 的代码如下：

margin = 1.
loss = max(0, margin - inputs.numpy()[0, 2])

print(loss)

结果为 0.5

nn.CosineEmbeddingLoss

torch.nn.CosineEmbeddingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')

功能：采用余弦相似度计算两个输入的相似性

主要参数：

margin：边界值，可取值 [-1, 1]，推荐为 [0, 0.5]
reduction：计算模式，，可以为 none(逐个元素计算)，sum(所有元素求和，返回标量)，mean(加权平均，返回标量)

计算公式：$\operatorname{loss}(x, y)=\left{\begin{array}{ll}1-\cos \left(x_{1}, x_{2}\right), & \text { if } y=1 \ \max \left(0, \cos \left(x_{1}, x_{2}\right)-\operatorname{margin}\right), & \text { if } y=-1\end{array}\right.$

其中$\cos (\theta)=\frac{A \cdot B}{|A||B|}=\frac{\sum_{i=1}^{n} A_{i} \times B_{i}}{\sqrt{\sum_{i=1}^{n}\left(A_{i}\right)^{2}} \times \sqrt{\sum_{i=1}^{n}\left(B_{i}\right)^{2}}}$

代码如下：

x1 = torch.tensor([[0.3, 0.5, 0.7], [0.3, 0.5, 0.7]])
x2 = torch.tensor([[0.1, 0.3, 0.5], [0.1, 0.3, 0.5]])

target = torch.tensor([[1, -1]], dtype=torch.float)

loss_f = nn.CosineEmbeddingLoss(margin=0., reduction='none')

loss = loss_f(x1, x2, target)

print("Cosine Embedding Loss", loss)

输出如下：

Cosine Embedding Loss tensor([[0.0167, 0.9833]])

手动计算第一个样本的 loss 的代码为：

margin = 0.

def cosine(a, b):
numerator = torch.dot(a, b)
denominator = torch.norm(a, 2) * torch.norm(b, 2)
return float(numerator/denominator)

l_1 = 1 - (cosine(x1[0], x2[0]))

l_2 = max(0, cosine(x1[0], x2[0]))

print(l_1, l_2)

结果为：0.016662120819091797 0.9833378791809082

nn.CTCLoss

nn.CTCLoss(blank=0, reduction='mean', zero_infinity=False)

功能：计算 CTC 损失，解决时序类数据的分类，全称为 Connectionist Temporal Classification

主要参数：

blank：blank label
zero_infinity：无穷大的值或梯度置 0
reduction：计算模式，，可以为 none(逐个元素计算)，sum(所有元素求和，返回标量)，mean(加权平均，返回标量)

对时序方面研究比较少，不展开讲了。

参考资料

深度之眼 PyTorch 框架班

如果你觉得这篇文章对你有帮助，不妨点个赞，让我有更多动力写出好文章。

我的文章会首发在公众号上，欢迎扫码关注我的公众号张贤同学。

上一页4.1 权值初始化下一页4.3 优化器

最后更新于5年前

hashtag损失函数

hashtagnn.CrossEntropyLoss

hashtagnn.NLLLoss

hashtagnn.BCELoss

hashtagnn.BCEWithLogitsLoss

hashtagnn.L1Loss

hashtagnn.MSELoss

hashtagnn.SmoothL1Loss

hashtagnn.PoissonNLLLoss

hashtagnn.KLDivLoss

hashtagnn.MarginRankingLoss

hashtagnn.MultiLabelMarginLoss

hashtagnn.SoftMarginLoss

hashtagnn.MultiLabelSoftMarginLoss

hashtagnn.MultiMarginLoss

hashtagnn.TripletMarginLoss

hashtagnn.HingeEmbeddingLoss

hashtagnn.CosineEmbeddingLoss

hashtagnn.CTCLoss

损失函数

nn.CrossEntropyLoss

nn.NLLLoss

nn.BCELoss

nn.BCEWithLogitsLoss

nn.L1Loss

nn.MSELoss

nn.SmoothL1Loss

nn.PoissonNLLLoss

nn.KLDivLoss

nn.MarginRankingLoss

nn.MultiLabelMarginLoss

nn.SoftMarginLoss

nn.MultiLabelSoftMarginLoss

nn.MultiMarginLoss

nn.TripletMarginLoss

nn.HingeEmbeddingLoss

nn.CosineEmbeddingLoss

nn.CTCLoss