
PyTorch Tutorial 13.6: Concise Implementation for Multiple GPUs

2023-06-05 | pdf | 0.19 MB | free

Description

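Implementing parallelism from scratch for every new model is no fun. Moreover, there is significant benefit in optimizing synchronization tools for high performance. In the following we show how to do this using the high-level APIs of deep learning frameworks. The mathematics and the algorithms are the same as in Section 13.5. Unsurprisingly, you will need at least two GPUs to run the code in this section.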

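# PyTorch version
import torch
from torch import nn
from d2l import torch as d2l

# MXNet version (an alternative to the PyTorch imports above; running both
# would rebind the names nn and d2l)
from mxnet import autograd, gluon, init, np, npx
from mxnet.gluon import nn
from d2l import mxnet as d2l

npx.set_np()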

13.6.1. A Toy Network

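Let's use a slightly more meaningful network than the LeNet from Section 13.5 that is still sufficiently easy and quick to train. We pick a ResNet-18 variant (He et al., 2016). Since the input images are tiny, we modify it slightly. In particular, the difference from Section 8.6 is that we use a smaller convolution kernel, stride, and padding at the beginning. Moreover, we remove the max-pooling layer.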

# PyTorch version
#@save
def resnet18(num_classes, in_channels=1):
  """A slightly modified ResNet-18 model."""
  def resnet_block(in_channels, out_channels, num_residuals,
           first_block=False):
    blk = []
    for i in range(num_residuals):
      if i == 0 and not first_block:
        blk.append(d2l.Residual(out_channels, use_1x1conv=True,
                    strides=2))
      else:
        blk.append(d2l.Residual(out_channels))
    return nn.Sequential(*blk)

  # This model uses a smaller convolution kernel, stride, and padding and
  # removes the max-pooling layer
  net = nn.Sequential(
    nn.Conv2d(in_channels, 64, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU())
  net.add_module("resnet_block1", resnet_block(64, 64, 2, first_block=True))
  net.add_module("resnet_block2", resnet_block(64, 128, 2))
  net.add_module("resnet_block3", resnet_block(128, 256, 2))
  net.add_module("resnet_block4", resnet_block(256, 512, 2))
  net.add_module("global_avg_pool", nn.AdaptiveAvgPool2d((1,1)))
  net.add_module("fc", nn.Sequential(nn.Flatten(),
                    nn.Linear(512, num_classes)))
  return net

# MXNet version
#@save
def resnet18(num_classes):
  """A slightly modified ResNet-18 model."""
  def resnet_block(num_channels, num_residuals, first_block=False):
    blk = nn.Sequential()
    for i in range(num_residuals):
      if i == 0 and not first_block:
        blk.add(d2l.Residual(
          num_channels, use_1x1conv=True, strides=2))
      else:
        blk.add(d2l.Residual(num_channels))
    return blk

  net = nn.Sequential()
  # This model uses a smaller convolution kernel, stride, and padding and
  # removes the max-pooling layer
  net.add(nn.Conv2D(64, kernel_size=3, strides=1, padding=1),
      nn.BatchNorm(), nn.Activation('relu'))
  net.add(resnet_block(64, 2, first_block=True),
      resnet_block(128, 2),
      resnet_block(256, 2),
      resnet_block(512, 2))
  net.add(nn.GlobalAvgPool2D(), nn.Dense(num_classes))
  return net
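
To make the effect of the smaller stem concrete, the following framework-free sketch traces the spatial width of a 28×28 input through the downsampling layers. It assumes the standard ResNet-18 stem (a 7×7 convolution with stride 2 and padding 3, followed by 3×3 max-pooling with stride 2 and padding 1) and assumes that each stride-2 residual stage begins with a 3×3 convolution with padding 1. The helpers `conv_out` and `trace` are hypothetical names introduced only for this illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Output width of a convolution or pooling layer (floor convention)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace(size, layers):
    """Apply (kernel, stride, padding) layers in order and record each width."""
    sizes = [size]
    for kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


# Three stride-2 stages (resnet_block2 to resnet_block4), each assumed to
# start with a 3x3 convolution, stride 2, padding 1.
downsampling = [(3, 2, 1)] * 3

# Stem used in this section: 3x3 convolution, stride 1, padding 1, no pooling.
modified = trace(28, [(3, 1, 1)] + downsampling)

# Assumed standard stem: 7x7 convolution (stride 2, padding 3), then
# 3x3 max-pooling (stride 2, padding 1).
standard = trace(28, [(7, 2, 3), (3, 2, 1)] + downsampling)

print(modified)  # [28, 28, 14, 7, 4]
print(standard)  # [28, 14, 7, 4, 2, 1]
```

Under these assumptions the standard stem would shrink a 28×28 image to a single pixel before the global average pooling, whereas the modified stem still leaves a 4×4 feature map.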

13.6.2. Network Initialization

We will initialize the network inside the training loop. For a refresher on initialization methods, see Section 5.4.

net = resnet18(10)
# Get a list of GPUs
devices = d2l.try_all_gpus()
# We will initialize the network inside the training loop

In MXNet, the initialize function allows us to initialize parameters on a device of our choice. For a refresher on initialization methods, see Section 5.4. What is particularly convenient is that it also allows us to initialize the network on multiple devices simultaneously. Let's see how this works in practice.

net = resnet18(10)
# Get a list of GPUs
devices = d2l.try_all_gpus()
# Initialize all the parameters of the network
net.initialize(init=init.Normal(sigma=0.01), ctx=devices)

Using the split_and_load function introduced in Section 13.5 we can divide a minibatch of data and copy portions to the list of devices provided by the devices variable. The network instance automatically uses the appropriate GPU to compute the value of the forward propagation. Here we generate 4 observations and split them over the GPUs.

x = np.random.uniform(size=(4, 1, 28, 28))
x_shards = gluon.utils.split_and_load(x, devices)
net(x_shards[0]), net(x_shards[1])
[08:00:43] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:97: Running performance tests to find the best convolution algorithm, this can take a while... (set the environment variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
(array([[ 2.2610207e-06, 2.2045981e-06, -5.4046786e-06, 1.2869955e-06,
     5.1373163e-06, -3.8297967e-06, 1.4339059e-07, 5.4683451e-06,
     -2.8279192e-06, -3.9651104e-06],
    [ 2.0698672e-06, 2.0084667e-06, -5.6382510e-06, 1.0498458e-06,
     5.5506434e-06, -4.1065491e-06, 6.0830087e-07, 5.4521784e-06,
     -3.7365021e-06, -4.1891640e-06]], ctx=gpu(0)),
 array([[ 2.4629783e-06, 2.6015525e-06, -5.4362617e-06, 1.2938218e-06,
     5.6387889e-06, -4.1360108e-06, 3.5758853e-07, 5.5125256e-06,
     -3.1957325e-06, -4.2976326e-06],
    [ 1.9431673e-06, 2.2600434e-06, -5.2698201e-06, 1.4807417e-06,
     5.4830934e-06, -3.9678889e-06, 7.5751018e-08, 5.6764356e-06,
     -3.2530229e-06, -4.0943951e-06]], ctx=gpu(1)))
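
The splitting step can be illustrated without any framework. The sketch below is not the implementation of split_and_load; it only mirrors the idea of cutting a minibatch into equal contiguous shards, one per device. The function `split_evenly` and the device names are hypothetical placeholders, and insisting on an even split is a simplification chosen for this example.

```python
def split_evenly(samples, num_devices):
    """Split a list of samples into num_devices equal, contiguous shards."""
    if len(samples) % num_devices != 0:
        raise ValueError("batch size must be divisible by the number of devices")
    shard_size = len(samples) // num_devices
    return [samples[i * shard_size:(i + 1) * shard_size]
            for i in range(num_devices)]


batch = ["x0", "x1", "x2", "x3"]  # 4 observations, as in the text
devices = ["gpu(0)", "gpu(1)"]    # placeholder names, not real device contexts
shards = split_evenly(batch, len(devices))

for device, shard in zip(devices, shards):
    print(device, shard)
```

With 4 observations and 2 devices, each device receives 2 observations, which matches the two rows in each output array above.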

Once data passes through the network, the corresponding parameters are initialized on the device the data passed through. This means that initialization happens on a per-device basis. Since we picked GPU 0 and GPU 1 for initialization, the network is initialized only there, and not on the CPU. In fact, the parameters do not even exist on the CPU. We can verify this by printing out the parameters and observing any errors that might arise.

weight = net[0].params.get('weight')

try:
  weight.data()
except RuntimeError:
  print('not initialized on cpu')
weight.data(devices[0])[0], weight.data(devices[1])[0]
not initialized on cpu
(array([[[ 0.01382882, -0.01183044, 0.01417865],
     [-0.00319718, 0.00439528, 0.02562625],
     [-0.00835081, 0.01387452, -0.01035946]]], ctx=gpu(0)),
 array([[[ 0.01382882, -0.01183044, 0.01417865],
     [-0.00319718, 0.00439528, 0.02562625],
     [-0.00835081, 0.01387452, -0.01035946]]], ctx=gpu(1)))

Next, let’s replace the code to evaluate the accuracy by one that works in parallel across multiple devices. This serves as a replacement of the evaluate_accuracy_gpu function from Section 7.6. The main difference is that we split a minibatch before invoking the network. All else is essentially identical.

#@save
def evaluate_accuracy_gpus(net, data_iter, split_f=d2l.split_batch):
  """Compute the accuracy for a model on a dataset using multiple GPUs."""
  # Query the list of devices
  devices = list(net.collect_params().values())[0].list_ctx()
  # No. of correct predictions, no. of predictions
  metric = d2l.Accumulator(2)
  for features, labels in data_iter:
    X_shards, y_shards = split_f(features, labels, devices)
    # Run in parallel
    pred_shards = [net(X_shard) for X_shard in X_shards]
    metric.add(sum(float(d2l.accuracy(pred_shard, y_shard)) for
            pred_shard, y_shard in zip(
              pred_shards, y_shards)), labels.size)
  return metric[0] / metric[1]
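
As a sanity check on this approach, the plain-Python sketch below shows that counting correct predictions per shard and then dividing by the total number of labels gives the same accuracy as evaluating the whole minibatch at once. The helpers `count_correct` and `sharded_accuracy` and the toy predictions are made up for illustration; they stand in for d2l.accuracy and the accumulator used above.

```python
def count_correct(predictions, labels):
    """Number of positions where the predicted class equals the label."""
    return sum(int(p == y) for p, y in zip(predictions, labels))


def sharded_accuracy(pred_shards, label_shards):
    """Sum correct counts over shards, then divide by the total label count."""
    correct = sum(count_correct(p, y)
                  for p, y in zip(pred_shards, label_shards))
    total = sum(len(y) for y in label_shards)
    return correct / total


# Toy predicted classes and labels for a minibatch of 6 examples
preds = [3, 1, 4, 1, 5, 9]
labels = [3, 1, 4, 2, 5, 8]

# Split the minibatch into two shards, as if evaluated on two devices
pred_shards = [preds[:3], preds[3:]]
label_shards = [labels[:3], labels[3:]]

full = count_correct(preds, labels) / len(labels)
sharded = sharded_accuracy(pred_shards, label_shards)
print(full, sharded)
```

Averaging per-shard accuracies instead would only agree with the full-batch value when all shards have the same size, which is why the counts are summed before dividing.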

13.6.3. Training

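As before, the training code needs to perform several basic functions for efficient parallelism:

  • Network parameters need to be initialized across all devices.
  • While iterating over the dataset, minibatches are divided across all devices.
  • We compute the loss and its gradient in parallel across devices.
  • Gradients are aggregated and parameters are updated accordingly.

In the end, we compute the accuracy (again in parallel) to report the final performance of the network. The training routine is quite similar to the implementations in previous chapters, except that we need to split and aggregate data.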

def train(net, num_gpus, batch_size, lr):
  train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
  devices = [d2l.try_gpu(i) for i in range(num_gpus)]
  def init_weights(module):
    if type(module) in [nn.Linear, nn.Conv2d
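
The split, compute, aggregate, and update steps listed above can be sketched without any GPU or framework. The example below is a toy illustration and not the training routine of this section: it fits a one-parameter model y = w·x with squared loss, and a Python loop stands in for the devices. The names `shard_gradient` and `data_parallel_step` are hypothetical.

```python
def shard_gradient(w, xs, ys):
    """Sum over one shard of d/dw of 0.5 * (w * x - y) ** 2."""
    return sum((w * x - y) * x for x, y in zip(xs, ys))


def data_parallel_step(w, xs, ys, num_devices, lr):
    """One SGD step with the minibatch split across num_devices workers."""
    shard = len(xs) // num_devices  # assumes an evenly divisible batch
    grads = []
    for d in range(num_devices):  # each iteration stands in for one device
        lo, hi = d * shard, (d + 1) * shard
        grads.append(shard_gradient(w, xs[lo:hi], ys[lo:hi]))
    total_grad = sum(grads)  # aggregation across devices
    return w - lr * total_grad / len(xs)  # update with the averaged gradient


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2 * x

w_parallel = data_parallel_step(0.0, xs, ys, num_devices=2, lr=0.1)
w_single = data_parallel_step(0.0, xs, ys, num_devices=1, lr=0.1)
print(w_parallel, w_single)
```

Because the gradient of a summed loss is the sum of the per-example gradients, the two-worker update agrees with the single-worker update up to floating-point rounding. This is why splitting the minibatch does not change the mathematics of training.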
