Metric in Semantic Segmentation

Dice Coefficient

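Definition: the Dice coefficient is a set-similarity measure, commonly used to quantify how similar two samples are; its value lies in [0, 1]. In segmentation, 1 means a perfect segmentation and 0 the worst possible one. (It also helps with class imbalance.)
Formula:
$Dice = \frac{2 \times \left | X \cap Y \right |}{\left | X \right | + \left | Y \right | } = \frac{2 \times \text{correctly predicted pixels}}{\text{ground-truth pixels} + \text{predicted pixels}} \qquad\qquad \text{where } X \text{ is the label and } Y \text{ is the prediction}$
Here $\left | X \cap Y \right |$ is the intersection of X and Y (multiply pixel by pixel, then sum), and $\left | X \right |$ and $\left | Y \right |$ are the numbers of elements in each (sum the pixels, or their squares). In practice a smoothing term is usually added so that the denominator cannot be 0.
Dice loss = 1 - Dice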
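
To make the formula concrete, here is a minimal NumPy sketch on two tiny binary masks. The function name `dice_score`, its `squared` switch and the example masks are illustrative assumptions, not part of the implementations that follow.

```python
import numpy as np


def dice_score(label, pred, smooth=1.0, squared=False):
    """Dice coefficient between two binary masks of the same shape (illustrative helper)."""
    label = np.asarray(label, dtype=np.float64).ravel()
    pred = np.asarray(pred, dtype=np.float64).ravel()
    # |X ∩ Y|: element-wise product, then sum
    intersection = (label * pred).sum()
    # |X| + |Y|: plain sums, or sums of squares when squared=True
    if squared:
        sizes = (label ** 2).sum() + (pred ** 2).sum()
    else:
        sizes = label.sum() + pred.sum()
    # smooth keeps the denominator away from 0 when both masks are empty
    return (2.0 * intersection + smooth) / (sizes + smooth)


# Hypothetical 2 x 3 masks: 3 foreground pixels each, 2 of them overlapping
label = [[1, 1, 0],
         [0, 1, 0]]
pred = [[1, 0, 0],
        [0, 1, 1]]

dice = dice_score(label, pred, smooth=0.0)   # 2 * 2 / (3 + 3)
dice_smoothed = dice_score(label, pred)      # (4 + 1) / (6 + 1)
dice_loss_value = 1.0 - dice_smoothed
print(dice, dice_smoothed, dice_loss_value)
```

With `smooth=1.0`, two empty masks score exactly 1, which is the behaviour the smoothing term is meant to provide.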

Implementation 1 (simple):

# H*W masks, 2-D only; handle multiple classes and multiple batches separately
def dice_coeff(pred, target):
    smooth = 1.  # keeps the denominator away from 0
    num = pred.size(0)
    m1 = pred.view(num, -1)  # flatten
    m2 = target.view(num, -1)  # flatten
    intersection = (m1 * m2).sum()  # intersection
    return (2. * intersection + smooth) / (m1.sum() + m2.sum() + smooth)

Implementation 2 (standard):

import torch
from torch import Tensor

# H*W
def dice_coeff(input: Tensor, target: Tensor, reduce_batch_first: bool = False, epsilon=1e-6):
    # Average of Dice coefficient for all batches, or for a single mask
    assert input.size() == target.size()
    if input.dim() == 2 and reduce_batch_first:
        raise ValueError('Dice: asked to reduce batch but got tensor '
                         f'without batch dimension (shape {input.shape})')

    if input.dim() == 2 or reduce_batch_first:
        # torch.dot: multiply element-wise and sum to one scalar; this is the intersection (numerator)
        inter = torch.dot(input.reshape(-1), target.reshape(-1))
        # denominator: |X| + |Y|, the sum of the two set sizes (not the union)
        sets_sum = torch.sum(input) + torch.sum(target)
        if sets_sum.item() == 0:
            sets_sum = 2 * inter
        return (2 * inter + epsilon) / (sets_sum + epsilon)
    else:
        # compute and average metric for each batch element
        dice = 0
        for i in range(input.shape[0]):
            dice += dice_coeff(input[i, ...], target[i, ...], epsilon=epsilon)
        return dice / input.shape[0]

def multiclass_dice_coeff(input: Tensor, target: Tensor,
                          reduce_batch_first: bool = False, epsilon=1e-6):
    # Average of Dice coefficient for all classes
    assert input.size() == target.size()
    dice = 0
    for channel in range(input.shape[1]):
        dice += dice_coeff(input[:, channel, ...], target[:, channel, ...],
                           reduce_batch_first, epsilon)

    return dice / input.shape[1]

def dice_loss(input: Tensor, target: Tensor, multiclass: bool = False):
    # When calling this with a multi-class ground truth, one-hot encode it first
    # target and input both have shape [B, C, H, W]
    # Dice loss (objective to minimize) between 0 and 1
    assert input.size() == target.size()
    fn = multiclass_dice_coeff if multiclass else dice_coeff
    return 1 - fn(input, target, reduce_batch_first=True)
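
The one-hot requirement mentioned in `dice_loss` can be sketched as follows. This is a NumPy illustration rather than the PyTorch code above; the helper names `one_hot_bchw` and `mean_dice` and the tiny label maps are assumptions made for the example. In PyTorch the same layout is commonly obtained with `torch.nn.functional.one_hot(...)` followed by a permute to channel-first order.

```python
import numpy as np


def one_hot_bchw(labels, num_classes):
    """Turn integer labels [B, H, W] into one-hot masks [B, C, H, W] (illustrative helper)."""
    labels = np.asarray(labels)
    encoded = np.eye(num_classes, dtype=np.float32)[labels]  # [B, H, W, C]
    return np.transpose(encoded, (0, 3, 1, 2))               # channel-first


def mean_dice(pred, target, epsilon=1e-6):
    """Average Dice over channels; each channel is reduced over the whole batch."""
    scores = []
    for c in range(pred.shape[1]):
        inter = (pred[:, c] * target[:, c]).sum()
        sets_sum = pred[:, c].sum() + target[:, c].sum()
        scores.append((2 * inter + epsilon) / (sets_sum + epsilon))
    return float(np.mean(scores))


# Hypothetical label maps with B=1, H=2, W=2 and 3 classes
true_labels = np.array([[[0, 1], [2, 1]]])
pred_labels = np.array([[[0, 1], [2, 2]]])

target = one_hot_bchw(true_labels, num_classes=3)
pred = one_hot_bchw(pred_labels, num_classes=3)
loss = 1.0 - mean_dice(pred, target)
print(target.shape, loss)
```

The per-class Dice values here are 1, 2/3 and 2/3, so the loss is close to 2/9.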

Mean Intersection over Union

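mIoU: Mean Intersection over Union, the standard metric for semantic segmentation. It is the ratio of intersection to union, computed per class and then averaged over all classes.
Background terms:
- TP (true positive): correct prediction; predicted positive, actually positive
- FP (false positive): wrong prediction; predicted positive, actually negative
- FN (false negative): wrong prediction; predicted negative, actually positive
- TN (true negative): correct prediction; predicted negative, actually negative (unrelated to the class in question, so it is not part of the union)
Computing mIoU: intuitively, take the ratio of the intersection of two circles (the orange region) to their union (red + orange + yellow). Ideally the two circles coincide and the ratio is 1.
$mIoU = \frac{1}{k+1} \sum_{i=0}^{k} \frac{TP_i}{FN_i+FP_i+TP_i}$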
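
As a worked instance of the formula, the following sketch counts TP, FP and FN per class on a handful of made-up labels (the arrays `y_true` and `y_pred` are hypothetical) and averages the per-class IoU.

```python
import numpy as np

y_true = np.array([0, 0, 1, 1, 2, 2])  # hypothetical ground-truth labels
y_pred = np.array([0, 1, 1, 1, 2, 0])  # hypothetical predicted labels
num_classes = 3

ious = []
for c in range(num_classes):
    tp = np.sum((y_pred == c) & (y_true == c))  # predicted c, truly c
    fp = np.sum((y_pred == c) & (y_true != c))  # predicted c, truly something else
    fn = np.sum((y_pred != c) & (y_true == c))  # truly c, predicted something else
    # Every class appears in this example, so tp + fp + fn is never 0 here
    ious.append(tp / (tp + fp + fn))

miou = float(np.mean(ious))
print(ious, miou)
```

The per-class values are 1/3, 2/3 and 1/2, which average to 0.5. TN never enters the computation.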

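Computation:

First build the confusion matrix: a K-class problem produces a K * K confusion matrix.

Suppose there are 150 samples, 50 in each of classes 1, 2 and 3. The confusion matrix obtained after classification (figure not reproduced here) is read as follows:

Each row sum is the number of true samples of that class; each column sum is the number of samples predicted as that class.

The first row says that 43 samples belonging to class 1 were correctly predicted as class 1, and 2 samples belonging to class 1 were wrongly predicted as class 2.

Then compute mIoU:

Per-class IoU = diagonal value / (row sum + column sum - diagonal value); mIoU is the mean of the per-class IoUs.

Confusion matrix: the sum of the diagonal entries is the number of correctly classified pixels (pred agrees with target); the sum of all off-diagonal entries is the number of misclassified pixels.

The element at position (x, y) of the confusion matrix is the number of pixels in the image whose true class is x and whose predicted class is y.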
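
The row/column rule can be checked on a small matrix. In the sketch below only the first-row entries 43 and 2 come from the text; every other number is made up so that each row sums to 50.

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows = true class, columns = predicted class.
# Only the first-row entries 43 and 2 come from the text; the rest are illustrative.
hist = np.array([[43, 2, 5],
                 [3, 45, 2],
                 [1, 4, 45]])

tp = np.diag(hist)              # correctly classified samples per class
true_count = hist.sum(axis=1)   # row sums: samples that truly belong to each class
pred_count = hist.sum(axis=0)   # column sums: samples predicted as each class

iou = tp / (true_count + pred_count - tp)  # per-class IoU
miou = iou.mean()
pixel_acc = tp.sum() / hist.sum()          # diagonal sum over all entries
print(iou, miou, pixel_acc)
```

Subtracting the diagonal once is needed because each true positive is counted in both its row sum and its column sum.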

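Code:

import numpy as np

# Both functions are methods of an evaluator class that stores self.num_classes
# and self.hist (a num_classes x num_classes confusion matrix, initially zeros).

# 1. Build the confusion matrix
def _fast_hist(self, label_pred, label_true):
    # Keep only pixels whose true label is a valid class index
    # (this drops out-of-range labels such as an ignore value)
    mask = (label_true >= 0) & (label_true < self.num_classes)
    # np.bincount counts how often each value in 0 .. n**2 - 1 occurs;
    # reshaped to (n, n): rows = true class, columns = predicted class
    hist = np.bincount(
        self.num_classes * label_true[mask].astype(int) + label_pred[mask].astype(int),
        minlength=self.num_classes ** 2,
    ).reshape(self.num_classes, self.num_classes)
    return hist

# 2. Compute mIoU from the confusion matrix
# Inputs: predictions and ground truth, each of shape [batch_size, H, W]
# Semantic segmentation assigns a label to every pixel
def evaluate(self, predictions, gts):
    for lp, lt in zip(predictions, gts):
        assert len(lp.flatten()) == len(lt.flatten())
        self.hist += self._fast_hist(lp.flatten(), lt.flatten())
    # per-class IoU
    iou = np.diag(self.hist) / (self.hist.sum(axis=1) + self.hist.sum(axis=0) - np.diag(self.hist))
    # mean over classes, skipping classes whose union is empty (NaN)
    miou = np.nanmean(iou)
    return miou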
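
The `np.bincount` trick is easier to follow on a few pixels: each (true, pred) pair is mapped to the single index `num_classes * true + pred`, which is exactly the flattened position of cell (true, pred) in the matrix. The label arrays below, including the out-of-range value 255, are hypothetical.

```python
import numpy as np

num_classes = 3
label_true = np.array([0, 0, 1, 2, 2, 255])  # 255 stands in for an out-of-range label
label_pred = np.array([0, 1, 1, 2, 0, 1])

# Drop pixels whose true label is not a valid class index
mask = (label_true >= 0) & (label_true < num_classes)

# Flattened cell index: row = true class, column = predicted class
index = num_classes * label_true[mask] + label_pred[mask]

# Count each index, then fold the length-9 vector back into a 3 x 3 matrix
hist = np.bincount(index, minlength=num_classes ** 2).reshape(num_classes, num_classes)
print(index)
print(hist)
```

The five valid pixels land in cells (0,0), (0,1), (1,1), (2,2) and (2,0); the pixel labelled 255 is not counted.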

Commonly used mIoU code:

# Inputs: pred and target of shape [B, H, W] (torch tensors)
# Approach 1: the one I find easier to follow
def iou_mean(pred, target, n_classes=1):
    # n_classes: the number of classes in your dataset, not including background
    # pred and target are label masks, not probability maps
    ious = []  # per-class IoU
    iousSum = 0
    pred = pred.view(-1)
    target = target.view(-1)
    # Ignore IoU for background class ("0")
    for cls in range(1, n_classes + 1):
        pred_inds = pred == cls
        target_inds = target == cls
        # Cast to long to prevent overflows
        intersection = pred_inds[target_inds].long().sum().item()
        union = (pred_inds.long().sum().item()
                 + target_inds.long().sum().item() - intersection)
        if union == 0:
            # Class absent from both prediction and ground truth: recorded as NaN in ious.
            # Note that it still counts in the n_classes denominator below.
            ious.append(float('nan'))
        else:
            ious.append(float(intersection) / float(union))
            iousSum += float(intersection) / float(union)

    return iousSum / n_classes  # mIoU
      
# Approach 2 (NumPy arrays)
# 'K' classes, output and target sizes are N or N * L or N * H * W, each value in range 0 to K - 1.
import numpy as np

# epsilon was undefined in the original snippet; the default 1e-10 is an assumption
def intersectionAndUnion(output, target, K, ignore_index=255, epsilon=1e-10):
    assert output.ndim in [1, 2, 3]
    assert output.shape == target.shape
    output = output.reshape(output.size).copy()
    target = target.reshape(target.size)
    # Pixels to ignore get the same out-of-range value in both arrays
    output[np.where(target == ignore_index)[0]] = ignore_index
    intersection = output[np.where(output == target)[0]]
    area_intersection, _ = np.histogram(intersection, bins=np.arange(K + 1))
    area_output, _ = np.histogram(output, bins=np.arange(K + 1))
    area_target, _ = np.histogram(target, bins=np.arange(K + 1))
    area_union = area_output + area_target - area_intersection

    # Array of per-class IoU; epsilon guards against an empty union (such classes get IoU 0)
    ious = area_intersection / (area_union + epsilon)
    mIoU = np.nanmean(ious)  # mIoU
    return ious, mIoU
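
The histogram-based counting in the second approach can be traced on a few pixels. The arrays below are hypothetical; the point of the sketch is that a pixel marked with `ignore_index` falls outside the bins `[0, K]` and so never contributes to any area, assuming `ignore_index` is larger than `K`.

```python
import numpy as np

K = 3
ignore_index = 255
target = np.array([0, 1, 1, 2, 255, 2])  # hypothetical ground truth, one ignored pixel
output = np.array([0, 1, 2, 2, 1, 0])    # hypothetical prediction

output = output.copy()
output[target == ignore_index] = ignore_index  # ignored pixels get the out-of-range value
matched = output[output == target]             # labels of correctly predicted pixels

bins = np.arange(K + 1)                        # edges 0, 1, 2, 3: one bin per class
area_intersection, _ = np.histogram(matched, bins=bins)
area_output, _ = np.histogram(output, bins=bins)
area_target, _ = np.histogram(target, bins=bins)
area_union = area_output + area_target - area_intersection

ious = area_intersection / area_union          # every class is present here, so no 0 / 0
print(area_intersection, area_union, ious)
```

For data where a class can be missing from both arrays, the division needs a guard such as a small epsilon in the denominator or a NaN-aware mean.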

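Relationship between Dice and IoU:

In Dice, $\left | X \cap Y \right |$ is TP; taking $X$ as the ground truth, $\left | X \right |$ is FN+TP; taking $Y$ as the predicted mask, $\left | Y \right |$ is TP+FP:

$Dice = \frac{2TP}{2TP + FP + FN}, \qquad IoU = \frac{TP}{TP + FP + FN}$

This gives:

$Dice = \frac{2 \times IoU}{1 + IoU}$

From the graph of this function over [0, 1], one can see that:
- IoU and Dice are 0 together and 1 together; this is easy to understand, since these are the cases where the prediction is entirely wrong or entirely correct
- For the same prediction, Dice is never lower than IoU, and it is strictly higher whenever 0 < IoU < 1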
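
Both observations can be checked numerically. Writing Dice as 2TP / (2TP + FP + FN) and IoU as TP / (TP + FP + FN) implies Dice = 2 IoU / (1 + IoU); the sketch below evaluates both on a few made-up (TP, FP, FN) triples. The function names and the counts are illustrative assumptions.

```python
def dice_from_counts(tp, fp, fn):
    """Dice written in terms of TP, FP, FN."""
    return 2 * tp / (2 * tp + fp + fn)


def iou_from_counts(tp, fp, fn):
    """IoU written in terms of TP, FP, FN."""
    return tp / (tp + fp + fn)


# Hypothetical counts: all wrong, mostly wrong, mostly right, all right
cases = [(0, 3, 2), (1, 3, 2), (5, 1, 1), (7, 0, 0)]

rows = []
for tp, fp, fn in cases:
    dice = dice_from_counts(tp, fp, fn)
    iou = iou_from_counts(tp, fp, fn)
    via_iou = 2 * iou / (1 + iou)  # Dice recovered from IoU alone
    rows.append((iou, dice, via_iou))
    print(f"IoU={iou:.4f}  Dice={dice:.4f}  2*IoU/(1+IoU)={via_iou:.4f}")
```

The two ways of computing Dice agree in every row, the end points coincide at 0 and 1, and in between Dice sits above IoU.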