In 2014, GoogLeNet won the ImageNet Challenge (Szegedy et al., 2015), using a structure that combined the strengths of NiN (Lin et al., 2013), repeated blocks (Simonyan and Zisserman, 2014), and a cocktail of convolution kernels. It was arguably also the first network to exhibit a clear distinction among the stem (data ingest), body (data processing), and head (prediction) of a CNN. This design pattern has persisted in deep network design ever since: the stem is given by the first two or three convolutions that operate on the image and extract low-level features from the underlying image. It is followed by a body of convolutional blocks. Finally, the head maps the features obtained so far to the classification, segmentation, detection, or tracking problem at hand.
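To make this stem-body-head pattern concrete, the following is a minimal sketch in PyTorch. The layer choices and channel sizes are purely illustrative placeholders, not GoogLeNet's actual configuration:

import torch
from torch import nn

# Hypothetical stem-body-head skeleton (illustrative sizes only)
stem = nn.Sequential(  # data ingest: the first convolutions on the raw image
    nn.LazyConv2d(64, kernel_size=7, stride=2, padding=3), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
body = nn.Sequential(  # data processing: a stack of convolutional blocks
    nn.LazyConv2d(128, kernel_size=3, padding=1), nn.ReLU(),
    nn.LazyConv2d(256, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten())
head = nn.LazyLinear(10)  # prediction: map the features to, e.g., 10 classes
net = nn.Sequential(stem, body, head)
print(net(torch.randn(1, 3, 96, 96)).shape)  # torch.Size([1, 10])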
The key contribution of GoogLeNet was the design of the network body. It solved the problem of selecting convolution kernels in an ingenious way: while other works tried to identify which single convolution, ranging from 1×1 to 11×11, would be best, it simply concatenated multi-branch convolutions. In what follows we introduce a slightly simplified version of GoogLeNet: the original design included a number of tricks for stabilizing training through intermediate loss functions applied to multiple layers of the network. They are no longer necessary due to the availability of improved training algorithms.
import torch
from torch import nn
from torch.nn import functional as F
from d2l import torch as d2l
8.4.1. Inception Blocks
The basic convolutional block in GoogLeNet is called an Inception block, named after the meme "we need to go deeper" from the movie Inception.
As depicted in Fig. 8.4.1, the Inception block consists of four parallel branches. The first three branches use convolutional layers with window sizes of 1×1, 3×3, and 5×5 to extract information from different spatial sizes. The middle two branches also add a 1×1 convolution of the input to reduce the number of channels, lowering the model's complexity. The fourth branch uses a 3×3 max-pooling layer, followed by a 1×1 convolutional layer to change the number of channels. All four branches use appropriate padding so that the input and output have the same height and width. Finally, the outputs of the branches are concatenated along the channel dimension to form the output of the block. The commonly tuned hyperparameters of the Inception block are the number of output channels per layer, i.e., how to allocate capacity among convolutions of different sizes.
class Inception(nn.Module):
    # c1--c4 are the number of output channels for each branch
    def __init__(self, c1, c2, c3, c4, **kwargs):
        super(Inception, self).__init__(**kwargs)
        # Branch 1
        self.b1_1 = nn.LazyConv2d(c1, kernel_size=1)
        # Branch 2
        self.b2_1 = nn.LazyConv2d(c2[0], kernel_size=1)
        self.b2_2 = nn.LazyConv2d(c2[1], kernel_size=3, padding=1)
        # Branch 3
        self.b3_1 = nn.LazyConv2d(c3[0], kernel_size=1)
        self.b3_2 = nn.LazyConv2d(c3[1], kernel_size=5, padding=2)
        # Branch 4
        self.b4_1 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
        self.b4_2 = nn.LazyConv2d(c4, kernel_size=1)

    def forward(self, x):
        b1 = F.relu(self.b1_1(x))
        b2 = F.relu(self.b2_2(F.relu(self.b2_1(x))))
        b3 = F.relu(self.b3_2(F.relu(self.b3_1(x))))
        b4 = F.relu(self.b4_2(self.b4_1(x)))
        return torch.cat((b1, b2, b3, b4), dim=1)
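As a quick sanity check of the block just defined, we can instantiate it with illustrative channel allocations (the specific numbers below are chosen only for demonstration) and verify that the spatial size is preserved while the number of output channels equals c1 + c2[1] + c3[1] + c4:

blk = Inception(64, (96, 128), (16, 32), 32)
X = torch.randn(1, 192, 28, 28)  # batch of 1, 192 channels, 28x28 feature map
print(blk(X).shape)  # torch.Size([1, 256, 28, 28]); 256 = 64 + 128 + 32 + 32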
# The same Inception block implemented with MXNet Gluon; it needs its own imports.
from mxnet import np, npx
from mxnet.gluon import nn
npx.set_np()

class Inception(nn.Block):
    # c1--c4 are the number of output channels for each branch
    def __init__(self, c1, c2, c3, c4, **kwargs):
        super(Inception, self).__init__(**kwargs)
        # Branch 1
        self.b1_1 = nn.Conv2D(c1, kernel_size=1, activation='relu')
        # Branch 2
        self.b2_1 = nn.Conv2D(c2[0], kernel_size=1, activation='relu')
        self.b2_2 = nn.Conv2D(c2[1], kernel_size=3, padding=1,
                              activation='relu')
        # Branch 3
        self.b3_1 = nn.Conv2D(c3[0], kernel_size=1, activation='relu')
        self.b3_2 = nn.Conv2D(c3[1], kernel_size=5, padding=2,
                              activation='relu')
        # Branch 4
        self.b4_1 = nn.MaxPool2D(pool_size=3, strides=1, padding=1)
        self.b4_2 = nn.Conv2D(c4, kernel_size=1, activation='relu')

    def forward(self, x):
        b1 = self.b1_1(x)
        b2 = self.b2_2(self.b2_1(x))
        b3 = self.b3_2(self.b3_1(x))
        b4 = self.b4_2(self.b4_1(x))
        return np.concatenate((b1, b2, b3, b4), axis=1)
# The same Inception block implemented with JAX/Flax; it needs its own imports.
from jax import numpy as jnp
from flax import linen as nn

class Inception(nn.Module):
    # `c1`--`c4` are the number of output channels for each branch
    c1: int
    c2: tuple
    c3: tuple
    c4: int

    def setup(self):
        # Branch 1
        self.b1_1 = nn.Conv(self.c1, kernel_size=(1, 1))
        # Branch 2
        self.b2_1 = nn.Conv(self.c2[0], kernel_size=(1, 1))
        self.b2_2 = nn.Conv(self.c2[1], kernel_size=(3, 3), padding='same')
        # Branch 3
        self.b3_1 = nn.Conv(self.c3[0], kernel_size=(1, 1))
        self.b3_2 = nn.Conv(self.c3[1], kernel_size=(5, 5), padding='same')
        # Branch 4
        self.b4_1 = lambda x: nn.max_pool(x, window_shape=(3, 3),
                                          strides=(1, 1), padding='same')
        self.b4_2 = nn.Conv(self.c4, kernel_size=(1, 1))

    def __call__(self, x):
        b1 = nn.relu(self.b1_1(x))
        b2 = nn.relu(self.b2_2(nn.relu(self.b2_1(x))))
        b3 = nn.relu(self.b3_2(nn.relu(self.b3_1(x))))
        b4 = nn.relu(self.b4_2(self.b4_1(x)))
        return jnp.concatenate((b1, b2, b3, b4), axis=-1)