PyTorch Tutorial

Basic element: Tensor

Tensor is the basic data structure in PyTorch. It is very similar to NumPy's ndarray. We use tensors to encode a model's inputs and outputs, as well as its parameters.

Creation operations

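import torch

data = torch.tensor([[1, 2], [3, 4]])  # example input tensor
x_ones = torch.ones_like(data)  # all-ones tensor with the same shape as data
x_rand = torch.rand_like(data, dtype=torch.float)  # random values in [0, 1); needs a floating-point dtype

shape = [1, 2]  # explicit shape
rand_tensor = torch.rand(shape)
zeros_tensor = torch.zeros(shape)
ones_tensor = torch.ones(shape)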

Tensor attributes

tensor = torch.rand(3, 4)
tensor.shape # Shape of tensor
tensor.dtype # Datatype of tensor
tensor.device # Device that tensor is stored on

Tensor operations

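# Indexing and slicing work the same way as in NumPy
# Joining: concatenate along dimension 1
t1 = torch.cat([tensor, tensor, tensor], dim=1)
# Arithmetic operations

# Matrix multiplication between tensors; y1, y2 and y3 hold the same values
y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)
y3 = torch.empty_like(y1)  # out= needs a pre-allocated tensor
torch.matmul(tensor, tensor.T, out=y3)

# Element-wise product between tensors; z1, z2 and z3 hold the same values
z1 = tensor * tensor
z2 = tensor.mul(tensor)
z3 = torch.empty_like(tensor)
torch.mul(tensor, tensor, out=z3)

# Single-element tensors can be converted to a Python number with item()
agg = tensor.sum()
agg_item = agg.item()

# In-place operations store the result in the operand itself; they have a _ suffix
tensor.add_(5)
tensor.copy_(torch.zeros_like(tensor))
tensor.t_()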

Bridge with NumPy

# Tensor to NumPy (for a CPU tensor, the array shares memory with the tensor)
n = tensor.numpy()
# NumPy to Tensor (also shares memory with the array)
tensor = torch.from_numpy(n)
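For CPU tensors, tensor.numpy() and torch.from_numpy() share the underlying memory, so an in-place change on one side shows up on the other. A minimal sketch (variable names are illustrative):

```python
import numpy as np
import torch

t = torch.ones(3)
n = t.numpy()            # shares memory with t (CPU tensors only)
t.add_(1)                # in-place change on the tensor ...
print(n)                 # ... is visible in the array: [2. 2. 2.]

a = np.zeros(3)
b = torch.from_numpy(a)  # shares memory with a (dtype float64 is kept)
np.add(a, 5, out=a)      # in-place change on the array ...
print(b)                 # ... is visible in the tensor
```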

Data processing

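PyTorch provides two data primitives: torch.utils.data.Dataset and torch.utils.data.DataLoader. Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset so that the samples can be accessed easily (in batches, shuffled, and so on).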

Datasets

The torchvision.datasets module contains many real-world image datasets such as CIFAR and COCO (see the torchvision documentation for the full list).

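All datasets are subclasses of torch.utils.data.Dataset, i.e. they implement the __getitem__ and __len__ methods. Hence they can all be passed to a torch.utils.data.DataLoader, which can load multiple samples in parallel using torch.multiprocessing worker processes (multiprocessing, not multithreading).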

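All datasets have a similar API. They share two common arguments, transform and target_transform, which transform the input and the target respectively. You can also create your own datasets using the provided base classes.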

Loading built-in datasets

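The ImageNet ILSVRC2012 dataset is used as the example here. PyTorch does not download ImageNet for you: you have to download the dataset yourself and place it under the root path (here ./dataset). For datasets that PyTorch can download, download=True downloads the dataset into the root path.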

from torchvision import datasets
from torchvision.transforms import ToTensor

# training_data = datasets.ImageNet(root="dataset", split="train", download=True, transform=ToTensor())
"""
download=True raises an error because, as mentioned above, PyTorch does not download ImageNet.
The relevant torchvision source (in the version used here) reads:

if download is True:
    msg = ("The dataset is no longer publicly accessible. You need to "
           "download the archives externally and place them in the root "
           "directory.")
    raise RuntimeError(msg)
elif download is False:
    msg = ("The use of the download flag is deprecated, since the dataset "
           "is no longer publicly accessible.")
    warnings.warn(msg, RuntimeWarning)
"""
# ImageNet selects the subset with split="train" / split="val"; it has no train= argument
training_data = datasets.ImageNet(root="dataset", split="train", transform=ToTensor())
test_data = datasets.ImageNet(root="dataset", split="val", transform=ToTensor())

Creating a custom dataset

An excerpt of the Dataset class:

class Dataset(object):
    """An abstract class representing a Dataset.

    All other datasets should subclass it. All subclasses should override ``__len__``,
    that provides the size of the dataset, and ``__getitem__``, supporting integer
    indexing in range from 0 to len(self) exclusive.
    """
    def __getitem__(self, index):
        raise NotImplementedError

    def __len__(self):
        raise NotImplementedError

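A subclass of Dataset must override __getitem__ and __len__; otherwise calling them raises NotImplementedError.

__getitem__ returns the sample (e.g. an image and its label) at a given index, and __len__ returns the size of the dataset.

To create a custom dataset, first subclass Dataset, then prepare the data in __init__: read the files, assign labels to the images, split the dataset, and so on.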

Example:

import torch
from torch.utils.data import Dataset

class PCIEDataset(Dataset):
    def __init__(self, root, size, mode):
        super(PCIEDataset, self).__init__()
        self.root = root
        self.size = size

        # load_images is a user-supplied helper (not shown) that reads the files under root
        self.images, self.labels = load_images(self.root)
        # Any preprocessing can go here; this example only reshapes each image to self.size
        # (reshape changes the array layout, it does not resample the image like a real resize)
        self.images = [x.reshape(self.size) for x in self.images]

        # Split the dataset: the first 60% for training, the remaining 40% for validation
        n_train = int(0.6 * len(self.images))
        if mode == 'train':
            self.images = self.images[:n_train]
            self.labels = self.labels[:n_train]
        elif mode == 'val':
            self.images = self.images[n_train:]
            self.labels = self.labels[n_train:]

    def __getitem__(self, index):
        img, label = self.images[index], self.labels[index]
        return torch.tensor(img), torch.tensor(label)

    def __len__(self):
        return len(self.images)
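Because PCIEDataset depends on a load_images helper that is not shown, here is a self-contained sketch of the same pattern that also follows the transform/target_transform convention described earlier. InMemoryDataset and the random data are hypothetical, chosen only so the example runs:

```python
import torch
from torch.utils.data import Dataset

class InMemoryDataset(Dataset):
    """Hypothetical example: wraps pre-loaded tensors as a Dataset."""

    def __init__(self, images, labels, transform=None, target_transform=None):
        assert len(images) == len(labels)
        self.images = images
        self.labels = labels
        self.transform = transform                # applied to each input
        self.target_transform = target_transform  # applied to each label

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        img, label = self.images[index], self.labels[index]
        if self.transform is not None:
            img = self.transform(img)
        if self.target_transform is not None:
            label = self.target_transform(label)
        return img, label

images = torch.rand(10, 1, 28, 28)    # 10 random "images"
labels = torch.randint(0, 10, (10,))  # 10 random class labels
ds = InMemoryDataset(images, labels, transform=lambda x: x * 2)
img, label = ds[3]
print(len(ds), img.shape)  # 10 torch.Size([1, 28, 28])
```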

DataLoader

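A Dataset retrieves one sample and its label at a time. When training a model, we usually want to pass samples in minibatches, reshuffle the data at every epoch to reduce overfitting, and use Python's multiprocessing to speed up data retrieval.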

The DataLoader signature (main arguments):

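class torch.utils.data.DataLoader(
    dataset,                                # dataset to load the data from
    batch_size=1,                           # how many samples per batch
    shuffle=False,                          # reshuffle the data at every epoch
    sampler=None,                           # strategy for drawing sample indices from the dataset
    batch_sampler=None,                     # like sampler, but returns a batch of indices at a time
    num_workers=0,                          # number of subprocesses for loading; 0 loads in the main process
    collate_fn=<function default_collate>,  # merges a list of samples (and labels) into a batch
    pin_memory=False,                       # if True, batches go to page-locked (pinned) memory, which speeds up copies to the GPU
    drop_last=False                         # if True, drop the last incomplete batch when the dataset size is not
                                            # divisible by batch_size; if False, the last batch is smaller
)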

Example:

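from torch.utils.data import DataLoader

# dataset: any Dataset instance, e.g. training_data above; epochs: number of training epochs
train_loader = DataLoader(dataset, batch_size=64, shuffle=True, pin_memory=True)

for epoch in range(epochs):
    for inputs, labels in train_loader:
        ...  # training step goes here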
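A small sketch of how batch_size and drop_last interact, assuming a toy TensorDataset of 10 samples (hypothetical data), so that 10 is not divisible by the batch size of 4:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 10 samples with 3 features each, plus one label per sample
dataset = TensorDataset(torch.arange(30.).reshape(10, 3), torch.arange(10))

loader = DataLoader(dataset, batch_size=4, shuffle=True)
print([x.shape[0] for x, _ in loader])  # [4, 4, 2]: the last batch is smaller

loader = DataLoader(dataset, batch_size=4, shuffle=True, drop_last=True)
print([x.shape[0] for x, _ in loader])  # [4, 4]: the incomplete batch is dropped
```

With shuffle=True the order changes every epoch, but each sample is still seen exactly once per epoch.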

Transforms

Most data has to be processed before it can be used for training, so PyTorch provides built-in functions for processing images and other data.

All classes mentioned below live in torchvision.transforms, e.g. torchvision.transforms.Compose.

Cropping

CenterCrop

Crops the given PIL Image at the center to the given size. size can be a sequence ([height, width]) or an integer; if it is an integer, the crop is square.

CenterCrop([h, w])
CenterCrop(size)

RandomCrop

Crops the image at a random location to the given size. Parameters:

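size: a sequence or an integer.
padding: how many pixels to pad. An int pads all four borders by that amount; a sequence of two numbers pads left/right by the first and top/bottom by the second; a sequence of four numbers pads the left, top, right and bottom borders respectively.
fill: the fill value (only used when padding_mode is 'constant'). An int fills all channels with that value; an [r, g, b] tuple gives the fill value for the R, G and B channels respectively.
padding_mode: one of four padding modes:
- constant: pads with a constant value.
- edge: pads with the last value at the edge of the image.
- reflect: mirrors the image without repeating the edge value, e.g. padding [1, 2, 3, 4] by 1 on each side gives [2, 1, 2, 3, 4, 3].
- symmetric: mirrors the image and repeats the edge value, e.g. padding [1, 2, 3, 4] by 1 on each side gives [1, 1, 2, 3, 4, 4].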
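To make the padding modes concrete, here is a pure-Python sketch on a 1-D list. pad_1d is a hypothetical helper that only illustrates the semantics; it is not how torchvision implements padding:

```python
def pad_1d(values, n, mode="constant", fill=0):
    """Pad a list by n elements on each side (illustrative; assumes 1 <= n < len(values))."""
    if mode == "constant":
        left, right = [fill] * n, [fill] * n
    elif mode == "edge":         # repeat the edge value
        left, right = [values[0]] * n, [values[-1]] * n
    elif mode == "reflect":      # mirror around the edge element, edge not repeated
        left, right = values[n:0:-1], values[-2:-n - 2:-1]
    elif mode == "symmetric":    # mirror around the border itself, edge repeated
        left, right = values[n - 1::-1], values[:-n - 1:-1]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return left + values + right

for mode in ("constant", "edge", "reflect", "symmetric"):
    print(mode, pad_1d([1, 2, 3, 4], 2, mode))
```

With n = 2 the difference between reflect ([3, 2, 1, 2, 3, 4, 3, 2]) and symmetric ([2, 1, 1, 2, 3, 4, 4, 3]) is easier to see than with n = 1.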

RandomCrop(size, padding=None, pad_if_needed=False, fill=0, padding_mode='constant')

RandomResizedCrop

Crops a random area of the image with a random aspect ratio, then resizes the crop to the given size.

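size: a sequence or an integer. An int produces a square (size, size) output; only in TorchScript mode is a single int unsupported, in which case use a length-1 sequence [size, ].
scale: tuple of floats; range of the crop area relative to the original image area, e.g. 0.08 to 1.0.
ratio: tuple of floats; range of the random aspect ratio of the crop, e.g. 0.75 to 1.33.
interpolation: interpolation method, bilinear by default, defined by torchvision.transforms.InterpolationMode. For Tensor inputs only InterpolationMode.NEAREST, InterpolationMode.BILINEAR and InterpolationMode.BICUBIC are supported.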

RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(3./4., 4./3.), interpolation=InterpolationMode.BILINEAR)

FiveCrop

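Crops the given image into its four corners and the central crop. If the image is a torch.Tensor, it is expected to have shape [..., H, W], where ... means any number of leading dimensions.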

size: a sequence or an integer.

FiveCrop(size)

TenCrop

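Crops the given image into its four corners and the central crop, plus the flipped versions of these five crops (horizontal flip by default), giving ten images. If the image is a torch.Tensor, it is expected to have shape [..., H, W], where ... means any number of leading dimensions.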

size: a sequence or an integer.
vertical_flip: whether to flip vertically instead of horizontally. True flips vertically, False flips horizontally.

TenCrop(size, vertical_flip=False)

Flipping and rotation

RandomHorizontalFlip

Horizontally flips the image with probability p. If the image is a torch.Tensor, it is expected to have shape [..., H, W], where ... means any number of leading dimensions.

RandomHorizontalFlip(p=0.5)

RandomVerticalFlip

Vertically flips the image with probability p. If the image is a torch.Tensor, it is expected to have shape [..., H, W], where ... means any number of leading dimensions.

RandomVerticalFlip(p=0.5)

RandomRotation

Rotates the image by a random angle. If the image is a torch.Tensor, it is expected to have shape [..., H, W], where ... means any number of leading dimensions.

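degrees: range of rotation angles. If degrees is a sequence $(min, max)$, the angle is drawn from that range; if it is a single number, the range is $(-degrees, +degrees)$.
interpolation: interpolation method, NEAREST by default, defined by torchvision.transforms.InterpolationMode. For Tensor inputs only InterpolationMode.NEAREST and InterpolationMode.BILINEAR are supported.
expand: if True, enlarges the output so that it holds the entire rotated image; if False, the output has the same size as the input.
fill: value used for the area outside the rotated image, 0 by default.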

RandomRotation(degrees, interpolation=InterpolationMode.NEAREST, expand=False, center=None, fill=0)

Image transforms

Resize

Resizes the image to the given size.

size: (sequence or int) desired output size. If size is a sequence (h, w), the output is resized to exactly that size. If size is an int, the smaller edge of the image is matched to it, i.e. if height > width the image is rescaled to (size * height / width, size).
interpolation: interpolation method, BILINEAR by default, defined by torchvision.transforms.InterpolationMode. For Tensor inputs only InterpolationMode.NEAREST, InterpolationMode.BILINEAR and InterpolationMode.BICUBIC are supported.

Resize(size, interpolation=InterpolationMode.BILINEAR)

Normalize

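Normalizes a Tensor image with a per-channel mean and standard deviation.

This transform does not support PIL images.

Given the means (mean[1], ..., mean[n]) and standard deviations (std[1], ..., std[n]) of n channels, each channel is normalized as output[channel] = (input[channel] - mean[channel]) / std[channel].

mean: the mean of each channel.
std: the standard deviation of each channel.
inplace: whether to perform the operation in place.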

Normalize(mean, std, inplace=False)

ToTensor

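Converts a PIL Image or numpy.ndarray to a tensor.

Converts a PIL Image or numpy.ndarray of shape $(H\times W\times C)$ with values in [0, 255] to a torch.FloatTensor of shape $(C\times H\times W)$ with values in [0.0, 1.0].

The scaling to [0.0, 1.0] applies when the PIL Image has one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) or when the numpy.ndarray has dtype = np.uint8; in other cases the tensor is returned without scaling.

ToTensor()
tensor = ToTensor()(pic)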

ToPILImage

Converts a tensor or an ndarray to a PIL Image.

Converts a Tensor of shape $(C\times H\times W)$ to an $(H\times W\times C)$ PIL Image.

For the available modes, see the color spaces (modes) in the Pillow documentation.

mode: color space of the PIL Image. If mode=None, it is inferred from the input data:
- 4 input channels: mode=RGBA
- 3 input channels: mode=RGB
- 2 input channels: mode=LA
- 1 input channel: the mode is determined by the data type (int, float, short).

convert = ToPILImage(mode=None)
output = convert(pic)

Pad

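Pads the given image on all sides with the given padding value.

If the image is a Tensor, it is expected to have shape $[\dots, H, W]$, where $\dots$ means at most 2 leading dimensions for the reflect and symmetric modes, at most 3 leading dimensions for the edge mode, and any number of leading dimensions for the constant mode.

padding: how many pixels to pad. An int pads all four borders by that amount; a sequence of two numbers pads left/right by the first and top/bottom by the second; a sequence of four numbers pads the left, top, right and bottom borders respectively.
fill: the fill value (only used when padding_mode is 'constant'). An int fills all channels with that value; an [r, g, b] tuple gives the fill value for the R, G and B channels respectively.
padding_mode: one of four padding modes:
- constant: pads with a constant value.
- edge: pads with the last value at the edge of the image.
- reflect: mirrors the image without repeating the edge value, e.g. padding [1, 2, 3, 4] by 1 on each side gives [2, 1, 2, 3, 4, 3].
- symmetric: mirrors the image and repeats the edge value, e.g. padding [1, 2, 3, 4] by 1 on each side gives [1, 1, 2, 3, 4, 4].

Pad(padding, fill=0, padding_mode='constant')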

ColorJitter

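Randomly changes the brightness, contrast, saturation and hue of an image.

If the image is a Tensor, it is expected to have shape $[\dots, 3, H, W]$, where $\dots$ means any number of leading dimensions.

brightness: float or (min, max) tuple; how much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness] or from [min, max]. Should be non-negative.
contrast: float or (min, max) tuple; how much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast] or from [min, max]. Should be non-negative.
saturation: float or (min, max) tuple; how much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation] or from [min, max]. Should be non-negative.
hue: float or (min, max) tuple; how much to jitter hue. hue_factor is chosen uniformly from [-hue, hue] or from [min, max]. Should satisfy 0 <= hue <= 0.5 or -0.5 <= min <= max <= 0.5.

ColorJitter(brightness=0, contrast=0, saturation=0, hue=0)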

Grayscale

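Converts the image to grayscale.

If the image is a Tensor, it is expected to have shape $[\dots,3,H,W]$, where $\dots$ means any number of leading dimensions.

num_output_channels: (1 or 3) number of channels of the output image.
- num_output_channels == 1: returns a single-channel image.
- num_output_channels == 3: returns a 3-channel image with r == g == b.

Grayscale(num_output_channels=1)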

RandomGrayscale

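Randomly converts the image to grayscale with probability p.

If the image is a Tensor, it is expected to have shape $[\dots,3,H,W]$, where $\dots$ means any number of leading dimensions.

If the input image has a single channel, the output is single-channel; if it has 3 channels, the output is a 3-channel grayscale image with r == g == b.

RandomGrayscale(p=0.1)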

LinearTransformation

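Transforms a tensor image with a square transformation matrix and a mean_vector computed offline; this can be used for whitening.

Given transformation_matrix and mean_vector, the Tensor is flattened, mean_vector is subtracted from it, the dot product with transformation_matrix is computed, and the result is reshaped back to the original shape.

transformation_matrix: Tensor of shape $[D\times D]$, with $D = C\times H\times W$.
mean_vector: Tensor of shape $[D]$, with $D=C\times H\times W$.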

LinearTransformation(transformation_matrix, mean_vector)
'''
Applications:
	whitening transformation: Suppose X is a column vector zero-centered data.
	Then compute the data covariance matrix [D x D] with torch.mm(X.t(), X),
	perform SVD on this matrix and pass it as transformation_matrix.
'''

RandomAffine

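Random affine transformation of the image, keeping the center invariant.

If the image is a Tensor, it is expected to have shape $[\dots,H,W]$, where $\dots$ means any number of leading dimensions.

degrees: range of rotation angles. If degrees is a number, the range is (-degrees, +degrees); if it is a tuple, the range is (min, max). Set degrees=0 to deactivate rotation.
translate: tuple of maximum absolute fractions for horizontal and vertical translation. For translate=(a, b), the horizontal shift is sampled in -img_width * a < dx < img_width * a and the vertical shift in -img_height * b < dy < img_height * b. No translation by default.
scale: scaling factor interval. For (a, b), the scale is sampled from a <= scale <= b. The original scale is kept by default.
shear: range of shear angles. If shear is a number, a shear parallel to the x axis in the range (-shear, +shear) is applied. If shear is a sequence of 2 values, a shear parallel to the x axis in the range (shear[0], shear[1]) is applied. If shear is a sequence of 4 values, an x-axis shear in (shear[0], shear[1]) and a y-axis shear in (shear[2], shear[3]) are applied. No shear by default.
interpolation: interpolation method, NEAREST by default, defined by torchvision.transforms.InterpolationMode. For Tensor inputs InterpolationMode.NEAREST and InterpolationMode.BILINEAR are supported.
fill: value used for the area outside the transformed image, 0 by default.

RandomAffine(degrees, translate=None, scale=None, shear=None, interpolation=InterpolationMode.NEAREST, fill=0)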

Composing transforms

Compose

Composes several transforms together.

transforms.Compose([
    transforms.CenterCrop(10),
    transforms.ToTensor(),
])

RandomChoice

Applies a single transform chosen at random from a list.

transforms.RandomChoice([
    transforms.CenterCrop(10),
    transforms.ToTensor(),
])

RandomApply

Randomly applies a list of transforms with probability p.

RandomApply(transforms, p=0.5)

RandomOrder

Applies a list of transforms in a random order.

transforms.RandomOrder([
    transforms.CenterCrop(10),
    transforms.ToTensor(),
])

Lambda Transforms

Not covered yet; to be added.

Building the model

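A neural network consists of layers/modules that perform operations on data. The torch.nn namespace provides all the building blocks you need to build your own neural network. Every module in PyTorch subclasses nn.Module. A neural network is itself a module that consists of other modules (layers); this nested structure makes it easy to build and manage complex architectures.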

Setting the training device

device = 'cuda' if torch.cuda.is_available() else 'cpu'

Defining the network class

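We define our own neural network by subclassing nn.Module and initialize the network layers in __init__. Every nn.Module subclass implements the operations on the input data in its forward method.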

import torch.nn as nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()  # (N, 1, 28, 28) images -> (N, 784) vectors
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        output = self.linear_relu_stack(x)
        return output

# Move the model to the selected device
model = NeuralNetwork().to(device)
# Print the model structure
print(model)

Model layers

Containers

torch.nn.Module is the base class for all networks; our models should subclass it as well.

Adding and iterating over modules

add_module(name, module)

Adds a child module to the current module. The added module can be accessed as an attribute with the given name.

import torch.nn as nn
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.add_module("conv", nn.Conv2d(10, 20, 4))
        # self.conv = nn.Conv2d(10, 20, 4) is equivalent to the add_module call above
model = Model()
print(model.conv)

modules()

Returns an iterator over all modules in the network, including the module itself.

import torch.nn as nn
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.add_module("conv", nn.Conv2d(10, 20, 4))
        self.add_module("conv1", nn.Conv2d(20 ,10, 4))
model = Model()

for module in model.modules():
    print(module)

children()

Returns an iterator over the immediate child modules.

named_children()

Returns an iterator over the immediate child modules, yielding both the name of each module and the module itself.

for name, module in model.named_children():
    if name in ['conv4', 'conv5']:
        print(module)

Device management

cpu()

Moves all model parameters and buffers to the CPU.
cuda(device=None)

Moves all model parameters and buffers to the GPU (optionally to the given device).

Module data types

double()

Casts all floating-point parameters and buffers to double.
float()

Casts all floating-point parameters and buffers to float.
half()

Casts all floating-point parameters and buffers to half.

Viewing and loading the state dict

state_dict()

Returns a dictionary containing the whole state of the module.

Both parameters and persistent buffers are included; the keys are the corresponding parameter and buffer names.

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv2 = nn.Linear(1, 2)                          # submodule: its weight and bias are included
        self.vari = torch.rand([1])                           # plain tensor attribute (formerly a Variable): NOT included
        self.par = nn.Parameter(torch.rand([1]))              # parameter: included
        self.register_buffer("buffer", torch.randn([2, 3]))   # persistent buffer: included

model = Model()
print(model.state_dict().keys())

load_state_dict(state_dict)

Copies the parameters and buffers from state_dict into this module and its descendants. With the default strict=True, the keys in state_dict must exactly match the keys returned by model.state_dict().

Forward pass, training mode and gradients

forward(*input)

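Defines the computation performed at every call. Every subclass should override it.
train(mode=True)

Sets the module in training mode. This only affects certain modules, such as Dropout and BatchNorm.
zero_grad()

Resets the gradients of all model parameters in the module (to zero, or to None when set_to_none=True, which is the default since PyTorch 2.0).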

Containers: Sequential, ModuleList, ParameterList

torch.nn.Sequential(*args)

A sequential container. Modules are added to it in the order they are passed in; an OrderedDict of modules can be passed instead.

from collections import OrderedDict
import torch.nn as nn

# Example of using Sequential
model = nn.Sequential(
          nn.Conv2d(1,20,5),
          nn.ReLU(),
          nn.Conv2d(20,64,5),
          nn.ReLU()
        )
# Example of using Sequential with OrderedDict
model = nn.Sequential(OrderedDict([
          ('conv1', nn.Conv2d(1,20,5)),
          ('relu1', nn.ReLU()),
          ('conv2', nn.Conv2d(20,64,5)),
          ('relu2', nn.ReLU())
        ]))

torch.nn.ModuleList(modules=None)

Holds submodules in a list.

A ModuleList can be indexed like a regular Python list, and the modules it contains are properly registered and visible to all Module methods.

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)])

    def forward(self, x):
        # ModuleList can act as an iterable, or be indexed using ints
        for i, l in enumerate(self.linears):
            x = self.linears[i // 2](x) + l(x)
        return x

append(module)

Appends a module to the end of the list, like list.append().
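The registration point matters in practice: layers stored in a plain Python list are not registered, so they do not show up in parameters() or state_dict() (and would be missed by an optimizer or by .to(device)). A small sketch with two hypothetical classes:

```python
import torch.nn as nn

class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(4, 4), nn.Linear(4, 2)])

class WithPlainList(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(4, 4), nn.Linear(4, 2)]  # NOT registered as submodules

good, bad = WithModuleList(), WithPlainList()
print(len(list(good.parameters())))  # 4: weight and bias of both layers
print(len(list(bad.parameters())))   # 0: the layers are invisible to the module
```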

torch.nn.ParameterList(parameters=None)

Holds parameters in a list.

A ParameterList can be indexed like a regular Python list, and the parameters it contains are properly registered and visible to all Module methods.

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.params = nn.ParameterList([nn.Parameter(torch.randn(10, 10)) for i in range(10)])

    def forward(self, x):
        # ParameterList can act as an iterable, or be indexed using ints
        for i, p in enumerate(self.params):
            x = self.params[i // 2].mm(x) + p.mm(x)
        return x

append(parameter)

Appends a parameter to the end of the list, like list.append().

Convolution layers

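N-dimensional convolution

Applies a convolution over an input signal.

$N$ is the batch size, $C$ the number of channels, and $L$ the length of the signal sequence.

1D, 2D and 3D convolution layers are available: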

torch.nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
- Input: $(N,C_{in},L_{in})$
- Output: $(N,C_{out},L_{out})$
- $L_{out}$ is given by:
  
  $$L_{out}=\left\lfloor\frac{L_{in}+2\times\text{padding}-\text{dilation}\times(\text{kernel\_size}-1)-1}{\text{stride}}+1\right\rfloor$$
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
- Input: $(N,C_{in},H_{in},W_{in})$
- Output: $(N,C_{out},H_{out},W_{out})$
- $H_{out}$ and $W_{out}$ are given by:
  
  $$H_{out}=\left\lfloor\frac{H_{in}+2\times\text{padding}[0]-\text{dilation}[0]\times(\text{kernel\_size}[0]-1)-1}{\text{stride}[0]}+1\right\rfloor$$
  
  $$W_{out}=\left\lfloor\frac{W_{in}+2\times\text{padding}[1]-\text{dilation}[1]\times(\text{kernel\_size}[1]-1)-1}{\text{stride}[1]}+1\right\rfloor$$
torch.nn.Conv3d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
- Input: $(N,C_{in},D_{in},H_{in},W_{in})$
- Output: $(N,C_{out},D_{out},H_{out},W_{out})$
- $D_{out}$, $H_{out}$ and $W_{out}$ are given by:
  
  $$D_{out}=\left\lfloor\frac{D_{in}+2\times\text{padding}[0]-\text{dilation}[0]\times(\text{kernel\_size}[0]-1)-1}{\text{stride}[0]}+1\right\rfloor$$
  
  $$H_{out}=\left\lfloor\frac{H_{in}+2\times\text{padding}[1]-\text{dilation}[1]\times(\text{kernel\_size}[1]-1)-1}{\text{stride}[1]}+1\right\rfloor$$
  
  $$W_{out}=\left\lfloor\frac{W_{in}+2\times\text{padding}[2]-\text{dilation}[2]\times(\text{kernel\_size}[2]-1)-1}{\text{stride}[2]}+1\right\rfloor$$

The output is computed as:

$$ \text{out}(N_i, C_{out_j})=\text{bias}(C_{out_j})+\sum_{k=0}^{C_{in}-1}\text{weight}(C_{out_j},k)\bigotimes\text{input}(N_i,k) $$

where $\bigotimes$ is the cross-correlation operator.

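Parameters:

in_channels(int) – number of channels in the input signal
out_channels(int) – number of channels produced by the convolution
kernel_size(int or tuple) - size of the convolution kernel
stride(int or tuple, optional) - stride of the convolution
padding(int or tuple, optional) - number of zero-padding layers added to each side of the input
dilation(int or tuple, optional) – spacing between kernel elements
groups(int, optional) – controls the connections between input and output channels
- groups=1: every output channel is computed from all input channels
- groups=2: the input channels are split into two halves, each half is convolved separately, and the results are concatenated
- groups=in_channels: each input channel is convolved with its own set of filters, of size out_channels/in_channels
bias(bool, optional) - if bias=True, adds a learnable bias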
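A quick check of the output-size formula, with conv_out as a hypothetical helper that implements it for one spatial dimension. The last lines also show how groups changes the weight shape to (out_channels, in_channels / groups, kH, kW):

```python
import math
import torch
import torch.nn as nn

def conv_out(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension, per the Conv formula."""
    return math.floor((size + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)

conv = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)
x = torch.randn(1, 3, 32, 32)                # (N, C_in, H_in, W_in)
y = conv(x)
print(y.shape)                               # torch.Size([1, 8, 16, 16])
print(conv_out(32, 3, stride=2, padding=1))  # 16

# groups=8: each of the 8 input channels gets its own 16 / 8 = 2 filters
dw = nn.Conv2d(8, 16, kernel_size=3, groups=8)
print(dw.weight.shape)                       # torch.Size([16, 1, 3, 3])
```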

Transposed convolution

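Applies a transposed convolution, commonly used for upsampling. It can be seen as the gradient of the corresponding convolution with respect to its input, and is also known as a fractionally-strided convolution or (somewhat inaccurately) a deconvolution, since it does not compute a true inverse of the convolution. It is not the same as a dilated (atrous) convolution.

$N$ is the batch size, $C$ the number of channels, and $L$ the length of the signal sequence.

1D, 2D and 3D transposed convolution layers are available: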

torch.nn.ConvTranspose1d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1)
- Input: $(N,C_{in},L_{in})$
- Output: $(N,C_{out},L_{out})$
- $L_{out}$ is given by:
  
  $$L_{out}=(L_{in}-1)\times\text{stride}-2\times\text{padding}+\text{dilation}\times(\text{kernel\_size}-1)+\text{output\_padding}+1$$
torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1)
- Input: $(N,C_{in},H_{in},W_{in})$
- Output: $(N,C_{out},H_{out},W_{out})$
- $H_{out}$ and $W_{out}$ are given by:
  
  $$H_{out}=(H_{in}-1)\times\text{stride}[0]-2\times\text{padding}[0]+\text{dilation}[0]\times(\text{kernel\_size}[0]-1)+\text{output\_padding}[0]+1$$
  
  $$W_{out}=(W_{in}-1)\times\text{stride}[1]-2\times\text{padding}[1]+\text{dilation}[1]\times(\text{kernel\_size}[1]-1)+\text{output\_padding}[1]+1$$
torch.nn.ConvTranspose3d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1)
- Input: $(N,C_{in},D_{in},H_{in},W_{in})$
- Output: $(N,C_{out},D_{out},H_{out},W_{out})$
- $D_{out}$, $H_{out}$ and $W_{out}$ are given by:
  
  $$D_{out}=(D_{in}-1)\times\text{stride}[0]-2\times\text{padding}[0]+\text{dilation}[0]\times(\text{kernel\_size}[0]-1)+\text{output\_padding}[0]+1$$
  
  $$H_{out}=(H_{in}-1)\times\text{stride}[1]-2\times\text{padding}[1]+\text{dilation}[1]\times(\text{kernel\_size}[1]-1)+\text{output\_padding}[1]+1$$
  
  $$W_{out}=(W_{in}-1)\times\text{stride}[2]-2\times\text{padding}[2]+\text{dilation}[2]\times(\text{kernel\_size}[2]-1)+\text{output\_padding}[2]+1$$
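As a sketch of the transposed-convolution formula, the following pairs a stride-2 convolution with a ConvTranspose2d that maps the 16x16 feature map back to 32x32 (conv_transpose_out is a hypothetical helper). Note that this restores the shape only, not the original values:

```python
import torch
import torch.nn as nn

def conv_transpose_out(size, kernel_size, stride=1, padding=0, output_padding=0, dilation=1):
    """Output length along one spatial dimension, per the ConvTranspose formula."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1

down = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)                           # 32 -> 16
up = nn.ConvTranspose2d(8, 3, kernel_size=3, stride=2, padding=1, output_padding=1)  # 16 -> 32

x = torch.randn(1, 3, 32, 32)
h = down(x)
print(h.shape, up(h).shape)  # torch.Size([1, 8, 16, 16]) torch.Size([1, 3, 32, 32])
```

output_padding=1 resolves the ambiguity that a stride-2 convolution maps both 31 and 32 to 16.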

Pooling layers

Maxpool

$N$ is the batch size and $C$ the number of channels.

1D, 2D and 3D max pooling layers are available:

torch.nn.MaxPool1d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
- Input: $(N,C,L_{in})$
- Output: $(N,C,L_{out})$ (pooling does not change the number of channels)
- $L_{out}$ is given by:
  
  $$L_{out}=\left\lfloor\frac{L_{in}+2\times\text{padding}-\text{dilation}\times(\text{kernel\_size}-1)-1}{\text{stride}}+1\right\rfloor$$
torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
- Input: $(N,C,H_{in},W_{in})$
- Output: $(N,C,H_{out},W_{out})$
- $H_{out}$ and $W_{out}$ are given by:
  
  $$H_{out}=\left\lfloor\frac{H_{in}+2\times\text{padding}[0]-\text{dilation}[0]\times(\text{kernel\_size}[0]-1)-1}{\text{stride}[0]}+1\right\rfloor$$
  
  $$W_{out}=\left\lfloor\frac{W_{in}+2\times\text{padding}[1]-\text{dilation}[1]\times(\text{kernel\_size}[1]-1)-1}{\text{stride}[1]}+1\right\rfloor$$
torch.nn.MaxPool3d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
- Input: $(N,C,D_{in},H_{in},W_{in})$
- Output: $(N,C,D_{out},H_{out},W_{out})$
- $D_{out}$, $H_{out}$ and $W_{out}$ are given by:
  
  $$D_{out}=\left\lfloor\frac{D_{in}+2\times\text{padding}[0]-\text{dilation}[0]\times(\text{kernel\_size}[0]-1)-1}{\text{stride}[0]}+1\right\rfloor$$
  
  $$H_{out}=\left\lfloor\frac{H_{in}+2\times\text{padding}[1]-\text{dilation}[1]\times(\text{kernel\_size}[1]-1)-1}{\text{stride}[1]}+1\right\rfloor$$
  
  $$W_{out}=\left\lfloor\frac{W_{in}+2\times\text{padding}[2]-\text{dilation}[2]\times(\text{kernel\_size}[2]-1)-1}{\text{stride}[2]}+1\right\rfloor$$

MaxUnpool

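A partial inverse of MaxPool. It is not a full inverse, because the non-maximal values are lost during max pooling. MaxUnpool takes the output of MaxPool, including the indices of the maxima (MaxPool with return_indices=True), and computes a partial inverse in which all non-maximal values are set to zero.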

torch.nn.MaxUnpool1d(kernel_size, stride=None, padding=0)
torch.nn.MaxUnpool2d(kernel_size, stride=None, padding=0)
torch.nn.MaxUnpool3d(kernel_size, stride=None, padding=0)
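A minimal MaxPool/MaxUnpool round trip on a 4x4 input; MaxPool2d must be created with return_indices=True so that MaxUnpool2d knows where each maximum came from:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)  # indices are needed by MaxUnpool
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.arange(16.).reshape(1, 1, 4, 4)   # (N, C, H, W) with values 0..15
y, indices = pool(x)                        # y: the maximum of each 2x2 window
restored = unpool(y, indices)               # maxima put back in place, zeros elsewhere
print(y)
print(restored)
```

Here the window maxima are 5, 7, 13 and 15, so restored keeps those four values at their original positions and is zero everywhere else.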

AvgPool

Average pooling.

torch.nn.AvgPool1d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True)
torch.nn.AvgPool2d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True)
torch.nn.AvgPool3d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True)

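Parameters (shared by MaxPool and AvgPool unless noted):

kernel_size(int or tuple) - size of the pooling window
stride(int or tuple, optional) - stride of the pooling window. Default: kernel_size
padding(int or tuple, optional) - amount of implicit padding added to each side of the input (zeros for AvgPool, negative infinity for MaxPool)
dilation(int or tuple, optional) – spacing between the elements of the window (MaxPool only)
ceil_mode - if True, use ceil instead of the default floor when computing the output size
count_include_pad - if True, include the zero padding in the averaging calculation (AvgPool only)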

Non-linear activations

torch.nn.ReLU(inplace=False)

Expression: $\text{ReLU}(x)=\max(0, x)$
torch.nn.Sigmoid()

Expression: $f(x)=1/(1+e^{-x})$
torch.nn.Tanh()

Expression: $f(x)=\frac{\exp(x)-\exp(-x)}{\exp(x)+\exp(-x)}$
torch.nn.Softmax(dim=None)

Expression: $f(x_i)=\frac{\exp(x_i)}{\sum_j \exp(x_j)}$, computed along dimension dim
torch.nn.LeakyReLU(negative_slope=0.01, inplace=False)

Expression: $f(x) = \max(0, x) + \text{negative\_slope} \times \min(0, x)$
- negative_slope: controls the slope for negative inputs, 0.01 by default
torch.nn.ELU(alpha=1.0, inplace=False)

Expression: $f(x) = \max(0,x) + \min(0, \alpha(e^x - 1))$
torch.nn.Threshold(threshold, value, inplace=False)

Expression: $y = x$ if $x > \text{threshold}$, otherwise $y = \text{value}$
- threshold: the threshold
- value: the value that replaces inputs not greater than the threshold
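The formulas above can be checked numerically against the modules. This sketch evaluates a few activations on a small hand-picked tensor and builds manual reference values for comparison; note that Softmax needs dim to know which axis should sum to 1:

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

leaky = nn.LeakyReLU(negative_slope=0.01)(x)
elu = nn.ELU(alpha=1.0)(x)
thr = nn.Threshold(threshold=0.1, value=-1.0)(x)
soft = nn.Softmax(dim=0)(x)  # dim selects the axis that sums to 1

# Manual versions of the same formulas, for comparison
leaky_ref = torch.clamp(x, min=0) + 0.01 * torch.clamp(x, max=0)
elu_ref = torch.clamp(x, min=0) + torch.clamp(1.0 * (torch.exp(x) - 1), max=0)
thr_ref = torch.where(x > 0.1, x, torch.full_like(x, -1.0))
print(thr)  # only 0.5 and 2.0 exceed the threshold; everything else becomes -1
```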

Normalization layers

BatchNorm

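Applies Batch Normalization over a mini-batch of inputs:

$$y=\frac{x-\mathrm{E}[x]}{\sqrt{\mathrm{Var}[x]+\epsilon}}\times\gamma+\beta$$

The mean and standard deviation are computed per dimension over each mini-batch. $\gamma$ and $\beta$ are learnable parameter vectors of size C, where C is the number of features (channels) of the input.

During training, the layer computes the mean and variance of each input batch and keeps running estimates of them with a moving average; the default momentum is 0.1.

During evaluation (eval mode), these running estimates are used to normalize the data instead of the batch statistics.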

torch.nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True)
torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True)
torch.nn.BatchNorm3d(num_features, eps=1e-05, momentum=0.1, affine=True)

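Parameters:

num_features: number of features C of the expected input.
eps: value added to the denominator for numerical stability (so that it never approaches 0). Default: 1e-5.
momentum: the momentum used for the running mean and running variance. Default: 0.1.
affine: a boolean; when True, the layer has learnable affine parameters ($\gamma$ and $\beta$).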
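A sketch that reproduces one training-mode forward pass of BatchNorm1d by hand. It assumes freshly initialized γ = 1 and β = 0; the batch statistics use the biased variance, while the running variance is updated with the unbiased one:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 3)                 # (N, C): batch of 8 samples, 3 features
bn = nn.BatchNorm1d(num_features=3)   # gamma (weight) starts at 1, beta (bias) at 0
bn.train()                            # training mode: normalize with batch statistics
y = bn(x)

mean = x.mean(dim=0)
var = x.var(dim=0, unbiased=False)    # normalization uses the biased variance
y_ref = (x - mean) / torch.sqrt(var + bn.eps) * bn.weight + bn.bias

# Running statistics start at mean 0 / variance 1 and move with momentum 0.1
running_mean_ref = (1 - bn.momentum) * torch.zeros(3) + bn.momentum * mean
running_var_ref = (1 - bn.momentum) * torch.ones(3) + bn.momentum * x.var(dim=0, unbiased=True)
```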

Recurrent layers

RNN

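torch.nn.RNN(*args, **kwargs)

Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.

For each element of the input sequence, each layer computes $$ h_t=\tanh(W_{ih}x_t+b_{ih}+W_{hh}h_{t-1}+b_{hh}) $$ where $h_t$ is the hidden state at time $t$, and $x_t$ is the hidden state of the previous layer at time $t$ (or the input at time $t$ for the first layer). If nonlinearity='relu', ReLU is used instead of tanh as the activation function.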

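Parameters:

input_size – number of features in the input x.
hidden_size – number of features in the hidden state.
num_layers – number of stacked RNN layers.
nonlinearity – the non-linearity to use, 'tanh' or 'relu'. Default: 'tanh'.
bias – if False, the layer does not use the bias weights $b_{ih}$ and $b_{hh}$. Default: True.
batch_first – if True, the input and output tensors have shape [batch_size, time_step, feature] (this does not apply to the hidden state).
dropout – if non-zero, adds a dropout layer on the outputs of each RNN layer except the last one.
bidirectional – if True, the RNN becomes bidirectional. Default: False.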
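To connect the RNN formula with the module's parameters, this sketch computes one time step by hand using the weight_ih_l0, bias_ih_l0, weight_hh_l0 and bias_hh_l0 attributes of nn.RNN, starting from the default zero hidden state (sizes are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=4, hidden_size=3, num_layers=1, batch_first=True)
x = torch.randn(2, 1, 4)   # (batch, time_step=1, feature)
out, h_n = rnn(x)          # h_0 defaults to zeros

# The formula for t = 1 with h_0 = 0 (the W_hh term vanishes but b_hh remains)
h0 = torch.zeros(2, 3)
h1 = torch.tanh(x[:, 0] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
                + h0 @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)
print(out.shape, h_n.shape)  # torch.Size([2, 1, 3]) torch.Size([1, 2, 3])
```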

LSTM

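torch.nn.LSTM(*args, **kwargs)

Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.

For each element of the input sequence, each LSTM layer computes:
$$
\begin{aligned}
i_t &= \mathrm{sigmoid}(W_{ii}x_t+b_{ii}+W_{hi}h_{t-1}+b_{hi}) \\
f_t &= \mathrm{sigmoid}(W_{if}x_t+b_{if}+W_{hf}h_{t-1}+b_{hf}) \\
o_t &= \mathrm{sigmoid}(W_{io}x_t+b_{io}+W_{ho}h_{t-1}+b_{ho}) \\
g_t &= \tanh(W_{ig}x_t+b_{ig}+W_{hg}h_{t-1}+b_{hg}) \\
c_t &= f_t\odot c_{t-1}+i_t\odot g_t \\
h_t &= o_t\odot \tanh(c_t)
\end{aligned}
$$

$h_t$ is the hidden state at time $t$, $c_t$ the cell state at time $t$, and $x_t$ the hidden state of the previous layer at time $t$ (or the input at time $t$ for the first layer). $i_t, f_t, g_t, o_t$ are the input, forget, cell and output gates, respectively.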

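Parameters:

input_size – number of features in the input.
hidden_size – number of features in the hidden state.
num_layers – number of stacked layers (not to be confused with the number of time steps).
bias – if False, the LSTM does not use $b_{ih}, b_{hh}$. Default: True.
batch_first – if True, the input and output tensors have shape (batch, seq, feature).
dropout – if non-zero, adds dropout on the outputs of each LSTM layer except the last one.
bidirectional – if True, the LSTM becomes bidirectional. Default: False.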
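A shape check for a 2-layer bidirectional LSTM with batch_first=True. The sizes are arbitrary; the point is that batch_first affects the input and output but not h_n and c_n:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True, bidirectional=True)
x = torch.randn(3, 5, 10)        # (batch, seq, feature) because batch_first=True
output, (h_n, c_n) = lstm(x)     # h_0 and c_0 default to zeros

print(output.shape)  # torch.Size([3, 5, 40]): last layer, both directions concatenated
print(h_n.shape)     # torch.Size([4, 3, 20]): (num_layers * 2 directions, batch, hidden)
print(c_n.shape)     # torch.Size([4, 3, 20]): batch_first does not apply to h_n / c_n
```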

Transformer layers

nn.Transformer

A Transformer model based on the paper "Attention Is All You Need".

torch.nn.Transformer(d_model=512, nhead=8, num_encoder_layers=6, num_decoder_layers=6, dim_feedforward=2048, dropout=0.1, activation=<function relu>, custom_encoder=None, custom_decoder=None, layer_norm_eps=1e-05, batch_first=False, norm_first=False, device=None, dtype=None)

Parameters:

d_model – the number of expected features in the encoder/decoder inputs (default=512).
nhead – the number of heads in the multiheadattention models (default=8).
num_encoder_layers – the number of sub-encoder-layers in the encoder (default=6).
num_decoder_layers – the number of sub-decoder-layers in the decoder (default=6).
dim_feedforward – the dimension of the feedforward network model (default=2048).
dropout – the dropout value (default=0.1).
activation – the activation function of encoder/decoder intermediate layer, can be a string (“relu” or “gelu”) or a unary callable. Default: relu
custom_encoder – custom encoder (default=None).
custom_decoder – custom decoder (default=None).
layer_norm_eps – the eps value in layer normalization components (default=1e-5).
batch_first – If True, then the input and output tensors are provided as (batch, seq, feature). Default: False (seq, batch, feature).
norm_first – if True, encoder and decoder layers will perform LayerNorms before other attention and feedforward operations, otherwise after. Default: False (after).

nn.TransformerEncoder

TransformerEncoder is a stack of N encoder layers.

torch.nn.TransformerEncoder(encoder_layer, num_layers, norm=None)

Parameters:

encoder_layer – an instance of the TransformerEncoderLayer() class (required).
num_layers – the number of sub-encoder-layers in the encoder (required).
norm – the layer normalization component (optional).
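A minimal sketch of stacking encoder layers with nn.TransformerEncoderLayer and nn.TransformerEncoder; the sizes and the padding mask are arbitrary, chosen for illustration:

```python
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, dim_feedforward=32, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)  # layer is deep-copied num_layers times

x = torch.randn(2, 7, 16)  # (batch, seq, d_model) because batch_first=True
# True marks padded positions that attention should ignore (here: the last 2 tokens of sample 1)
padding_mask = torch.zeros(2, 7, dtype=torch.bool)
padding_mask[1, 5:] = True
y = encoder(x, src_key_padding_mask=padding_mask)
print(y.shape)  # torch.Size([2, 7, 16]): same shape as the input
```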

nn.TransformerDecoder

TransformerDecoder is a stack of N decoder layers.

torch.nn.TransformerDecoder(decoder_layer, num_layers, norm=None)

Parameters:

decoder_layer – an instance of the TransformerDecoderLayer() class (required).
num_layers – the number of sub-decoder-layers in the decoder (required).
norm – the layer normalization component (optional).

Linear layers

Linear

torch.nn.Linear(in_features, out_features, bias=True)

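Applies a linear transformation to the input: $y=Ax+b$.

Parameters:

in_features - size of each input sample
out_features - size of each output sample
bias - if set to False, the layer does not learn an additive bias. Default: True

Shapes:

Input: $(N,\text{in\_features})$
Output: $(N,\text{out\_features})$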

Bilinear

torch.nn.Bilinear(in1_features, in2_features, out_features, bias=True)

Applies a bilinear transformation to the input: $y=x_1^TAx_2+b$.

Parameters:

in1_features – size of each first input sample
in2_features – size of each second input sample
out_features – size of each output sample
bias – If set to False, the layer will not learn an additive bias. Default: True
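A sketch verifying both formulas against the modules. PyTorch stores the matrix A of Linear as weight with shape (out_features, in_features), so the computation is x A^T + b; for Bilinear, weight has shape (out_features, in1_features, in2_features). The sizes are arbitrary:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
linear = nn.Linear(5, 3)
x = torch.randn(4, 5)                     # (N, in_features)
y_ref = x @ linear.weight.T + linear.bias  # weight shape: (3, 5)

bilinear = nn.Bilinear(5, 6, 2)           # weight shape: (out, in1, in2) = (2, 5, 6)
x1, x2 = torch.randn(4, 5), torch.randn(4, 6)
# y[n, k] = x1[n]^T A_k x2[n] + b[k]
z_ref = torch.einsum("ni,kij,nj->nk", x1, bilinear.weight, x2) + bilinear.bias
```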

Dropout layers

N-dimensional Dropout

torch.nn.Dropout(p=0.5, inplace=False)
torch.nn.Dropout2d(p=0.5, inplace=False)
torch.nn.Dropout3d(p=0.5, inplace=False)

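The output has the same shape as the input. Dropout accepts input of any shape, while Dropout2d expects $(N, C, H, W)$ and Dropout3d $(N, C, D, H, W)$ input (the same layouts as the 2D and 3D convolution layers); these two zero out entire channels.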
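A sketch of the train/eval difference: in training mode the surviving elements are scaled by 1/(1 - p), in evaluation mode Dropout does nothing, and Dropout2d drops whole channels:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()   # training mode: zero elements with probability p ...
y = drop(x)    # ... and scale the survivors by 1 / (1 - p) = 2
drop.eval()    # evaluation mode: dropout is the identity
z = drop(x)

# Dropout2d zeroes whole channels of an (N, C, H, W) input
drop2d = nn.Dropout2d(p=0.5)          # modules start in training mode
w = drop2d(torch.ones(2, 8, 4, 4))
```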

Training

Optimization

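Each iteration of the optimization loop is called an epoch.

Each epoch consists of two main parts:

The Train Loop - iterate over the training dataset and try to converge to optimal parameters.
The Validation/Test Loop - iterate over the test dataset to check whether model performance is improving.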

Loss Function

Common loss functions include nn.MSELoss (mean squared error) for regression tasks and nn.NLLLoss (negative log likelihood) for classification tasks. nn.CrossEntropyLoss combines nn.LogSoftmax and nn.NLLLoss into a single cross-entropy loss.

Autograd

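torch.autograd provides classes and functions that implement automatic differentiation of arbitrary scalar-valued functions, and requires only minimal changes to existing code. Older PyTorch versions required wrapping tensors in Variable objects; since PyTorch 0.4, Variable is deprecated and tensors support autograd directly.

Automatic differentiation is enabled by setting requires_grad=True when creating a tensor, or later with x.requires_grad_(True).

The gradient function of a result z can be inspected with z.grad_fn.

In a model, calling loss.backward() computes the gradients in the backward pass.

To disable gradient tracking, run the operations inside a with torch.no_grad(): block, or call detach() on the tensor.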
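A minimal autograd sketch covering requires_grad, grad_fn, backward(), torch.no_grad() and detach() on the toy function y = x^2 + 3x (chosen for illustration):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)  # track operations on x
y = x ** 2 + 3 * x                         # y = x^2 + 3x
print(y.grad_fn)                           # the node that created y (an AddBackward0 object)
y.backward()                               # dy/dx = 2x + 3
print(x.grad)                              # tensor(7.)

with torch.no_grad():                      # no graph is recorded inside this block
    z = x * 2
d = x.detach()                             # shares data with x but is cut from the graph
```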

Optimizer

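torch.optim is a package implementing various optimization algorithms. The most commonly used methods are already supported, and the interface is general enough that more sophisticated ones can be easily integrated in the future.

To construct an Optimizer, you give it an iterable containing the parameters to optimize (tensors, e.g. model.parameters()). Then you can specify optimizer-specific options such as the learning rate and weight decay.

Commonly used optimizers include SGD and Adam.

Every Optimizer has step() and zero_grad() methods: step() performs a single optimization step, and zero_grad() clears the previously computed gradients. A typical training step looks like this: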

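optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
optimizer.zero_grad()                       # clear old gradients
loss_fn(model(inputs), target).backward()   # loss_fn: a loss such as nn.MSELoss(); computes new gradients
optimizer.step()                            # update the parameters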
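The training step above, written out as a complete toy loop: a single nn.Linear fitted to the hypothetical data y = 2x + 1 with MSE loss and SGD with momentum:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy regression data (hypothetical): y = 2x + 1
inputs = torch.linspace(-1, 1, 50).unsqueeze(1)  # (50, 1)
targets = 2 * inputs + 1

model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

for step in range(200):
    optimizer.zero_grad()                   # clear gradients from the previous step
    loss = loss_fn(model(inputs), targets)
    loss.backward()                         # compute new gradients
    optimizer.step()                        # update the parameters

print(model.weight.item(), model.bias.item())  # close to 2 and 1
```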

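To adjust the learning rate during training with a learning rate scheduler, the typical procedure is:

scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)  # any lr_scheduler works here
for epoch in range(100):
    train(...)      # one training epoch (user-defined)
    validate(...)   # evaluation on the validation set (user-defined)
    scheduler.step()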

Optimization algorithms

SGD

torch.optim.SGD(params, lr=<required parameter>, momentum=0, dampening=0, weight_decay=0, nesterov=False)

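params (iterable) – iterable of parameters to optimize, or dicts defining parameter groups
lr (float) – learning rate
momentum (float, optional) – momentum factor (default: 0)
weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)
dampening (float, optional) – dampening for momentum (default: 0)
nesterov (bool, optional) – enables Nesterov momentum (default: False)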

Adam

torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)

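params (iterable) – iterable of parameters to optimize, or dicts defining parameter groups
lr (float, optional) – learning rate (default: 1e-3)
betas (Tuple[float, float], optional) – coefficients used for computing running averages of the gradient and its square (default: 0.9, 0.999)
eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)
weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)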

lr_scheduler

Cosine annealing

torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1, verbose=False)

Parameters:

optimizer (Optimizer) – Wrapped optimizer.
T_max (int) – Maximum number of iterations.
eta_min (float) – Minimum learning rate. Default: 0.
last_epoch (int) – The index of last epoch. Default: -1.
verbose (bool) – If True, prints a message to stdout for each update. Default: False.
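A sketch comparing CosineAnnealingLR with its closed form $\eta_t = \eta_{min} + (\eta_{max} - \eta_{min})(1 + \cos(\pi t / T_{max}))/2$ over T_max = 10 epochs; the single dummy parameter stands in for a real model:

```python
import math
import torch

param = torch.nn.Parameter(torch.zeros(1))  # dummy parameter in place of a model
optimizer = torch.optim.SGD([param], lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.001)

lrs = []
for epoch in range(10):
    optimizer.step()    # a real training epoch would go here
    scheduler.step()    # update the learning rate once per epoch
    lrs.append(scheduler.get_last_lr()[0])

# Closed form for t = 1..10, with eta_max = 0.1 (the initial learning rate)
expected = [0.001 + (0.1 - 0.001) * (1 + math.cos(math.pi * t / 10)) / 2 for t in range(1, 11)]
```

After T_max epochs the learning rate reaches eta_min.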

Saving and loading models

Saving model weights

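import torch
import torchvision.models as models

# Save the weights via state_dict()
model = models.vgg16(pretrained=True)  # newer torchvision versions use weights=... instead of pretrained=True
torch.save(model.state_dict(), 'model_weights.pth')

# Load the weights via load_state_dict() into a model with the same architecture
model = models.vgg16()
model.load_state_dict(torch.load('model_weights.pth'))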
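A self-contained sketch of the state_dict round trip, using a small nn.Sequential instead of VGG16 (to avoid downloading weights) and an in-memory io.BytesIO buffer in place of 'model_weights.pth':

```python
import io
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

buffer = io.BytesIO()  # in-memory "file"; a path such as 'model_weights.pth' works the same way
torch.save(model.state_dict(), buffer)
buffer.seek(0)         # rewind before reading

restored = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))  # same architecture, fresh weights
restored.load_state_dict(torch.load(buffer))
restored.eval()        # switch Dropout/BatchNorm (if any) to inference behaviour before evaluating
```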

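Saving the entire model

Saves the model structure together with its weights, which produces a larger file.

# Save
torch.save(model, 'model.pth')
# Load; since PyTorch 2.6 torch.load defaults to weights_only=True, so a whole model
# needs torch.load('model.pth', weights_only=False) there
model = torch.load('model.pth')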


https://pandintelli.github.io/2022/01/13/Pytorch-Tutorial/

Author: Pand

Published: January 13, 2022
