Neural Network Python

Data format conversion

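transforms.ToTensor()(img)  # torchvision.transforms: PIL image or NumPy array (H, W, C) -> PyTorch Tensor (C, H, W); uint8 pixel values [0, 255] -> floats in [0.0, 1.0]
features.cpu().numpy()  # PyTorch Tensor -> NumPy ndarray; a tensor on the GPU must be moved with .cpu() first (and .detach() first if it requires grad)
features.tolist()  # Tensor of any number of dimensions -> Python list/int/float
features.item()  # single-element Tensor -> Python int/float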
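
The conversions above can be tried end to end with a small sketch. It assumes a made-up 2x3 array named pixels and scales it by hand instead of calling torchvision's ToTensor, so that only NumPy and PyTorch are needed.

```python
import numpy as np
import torch

# Hypothetical 2x3 "image" with uint8 pixel values in [0, 255].
pixels = np.array([[0, 128, 255], [64, 32, 16]], dtype=np.uint8)

# NumPy -> Tensor, then scale to [0.0, 1.0] by hand.
features = torch.from_numpy(pixels).float() / 255.0

# Tensor -> NumPy; .cpu() is a no-op for a tensor that is already on the CPU.
as_array = features.cpu().numpy()

# Tensor -> nested Python list, and single-element Tensor -> Python float.
as_list = features.tolist()
max_value = features.max().item()

print(type(as_array).__name__, as_array.shape)  # ndarray (2, 3)
print(max_value)                                # 1.0
```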

Data dimension conversion

Tensor

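storage: the underlying one-dimensional memory buffer that holds the elements
Shape: the size along each dimension, e.g. (2, 3, 2)
Stride: how many elements to step through storage to advance by one along each dimension, e.g. (6, 2, 1) for a contiguous tensor of shape (2, 3, 2)
contiguous: whether the elements sit in memory in row-major order without gaps
offset: the position in storage of the tensor's first element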
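
These attributes can be inspected directly. The sketch below uses a small example tensor (the names x and second are only for illustration) and shows how indexing changes the offset without copying data.

```python
import torch

x = torch.arange(12).reshape(2, 3, 2)
print(x.shape)             # torch.Size([2, 3, 2])
print(x.stride())          # (6, 2, 1)
print(x.storage_offset())  # 0
print(x.is_contiguous())   # True

# Indexing returns a view into the same storage, starting 6 elements in.
second = x[1]
print(second.stride())          # (2, 1)
print(second.storage_offset())  # 6
```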

PyTorch tensor operations

Most PyTorch image models (ResNet, for example) expect input of shape [batch_size, channels, height, width], i.e. 4 dimensions.

.squeeze() & .unsqueeze()

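Changes the shape

.squeeze(dim=None)  # removes every dimension of size 1; if dim is given, removes only that dimension, and only when its size is 1
.unsqueeze(dim)  # inserts a new dimension of size 1 at the given position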
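
A short sketch of both calls, assuming a hypothetical single image of shape [3, 224, 224] that needs a batch dimension before being fed to a model:

```python
import torch

img = torch.zeros(3, 224, 224)   # hypothetical single image: [C, H, W]
batch = img.unsqueeze(0)         # add a batch dimension -> [1, 3, 224, 224]
restored = batch.squeeze(0)      # remove it again       -> [3, 224, 224]

t = torch.zeros(1, 5, 1)
all_squeezed = t.squeeze()       # every size-1 dimension removed -> [5]
unchanged = t.squeeze(1)         # dim 1 has size 5, so nothing is removed

print(batch.shape, restored.shape, all_squeezed.shape, unchanged.shape)
```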

.view()

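Returns a view of the original memory; only shape and stride change.
The tensor's strides must be compatible with the new shape (a contiguous tensor always qualifies); otherwise call .contiguous() first, which copies the data into a new contiguous tensor.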

pos_embed.shape # torch.Size([1, 14, 14, 768])
pos_embed_new = pos_embed.view(1, -1, 768)
pos_embed.shape # torch.Size([1, 14, 14, 768])
pos_embed_new.shape # torch.Size([1, 196, 768])
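
The contiguity requirement can be seen on a transposed tensor. This is a minimal sketch with a made-up 2x3 tensor; the variable names are illustrative only.

```python
import torch

x = torch.arange(6).reshape(2, 3)
t = x.t()                        # shape (3, 2), strides (1, 3): non-contiguous

try:
    t.view(6)                    # strides are incompatible with a flat view
    view_failed = False
except RuntimeError:
    view_failed = True

flat = t.contiguous().view(6)    # copy into contiguous memory, then view
print(view_failed)               # True
print(flat.tolist())             # [0, 3, 1, 4, 2, 5]
```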

.reshape() = .contiguous().view()

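If the tensor is already contiguous, reshape() allocates no new memory; it returns a view, changing only the shape and strides.
If the tensor is not contiguous, reshape() copies the data into new memory and then sets the shape and strides.
reshape only requires that the total number of elements stays the same.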

pos_embed.shape # torch.Size([1, 196, 768])
pos_embed_new = pos_embed.reshape(1, 14, 14, 768)
pos_embed_new.shape # torch.Size([1, 14, 14, 768])
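
Whether reshape() returned a view or a copy can be checked by comparing data pointers. A minimal sketch, using a small example tensor rather than the position embedding above:

```python
import torch

x = torch.arange(6).reshape(2, 3)

# Contiguous input: reshape returns a view on the same memory.
same = x.reshape(3, 2)
shares_memory = same.data_ptr() == x.data_ptr()

# Non-contiguous input (a transpose): reshape has to copy.
t = x.t()
copied = t.reshape(6)
made_copy = copied.data_ptr() != x.data_ptr()

print(shares_memory, made_copy)  # True True
```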

.flatten()

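Flattens a tensor into a 1-D vector, in row-major (C) order by default; start_dim and end_dim restrict the flattening to a range of dimensions.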

x = torch.arange(12).reshape(2, 3, 2) # torch.Size([2, 3, 2])
y = x.flatten() # torch.Size([12])
z = x.flatten(start_dim=1) # torch.Size([2, 6])

torch.equal(x.flatten(), x.reshape(-1)) # True: the two are equivalent

.permute(dim0, dim1, dim2, …)

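Reorders the dimensions of a tensor.
Changes the shape and strides without moving data, so the result is usually non-contiguous (is_contiguous() returns False).

pos_embed = pos_embed.reshape(1, w0, h0, dim).permute(0, 3, 1, 2)  # -> [1, dim, w0, h0]
pos_embed = pos_embed.contiguous()  # permute returns a non-contiguous tensor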
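
The effect of permute on strides can be checked on a small tensor. This sketch uses an arbitrary shape (2, 3, 4) chosen for readability, not the position embedding from the snippet above.

```python
import torch

x = torch.arange(24).reshape(2, 3, 4)
print(x.stride())            # (12, 4, 1)

p = x.permute(2, 0, 1)       # new dim order: old dim 2, old dim 0, old dim 1
print(p.shape)               # torch.Size([4, 2, 3])
print(p.stride())            # (1, 12, 4): same strides, reordered
print(p.is_contiguous())     # False

c = p.contiguous()           # copies into row-major layout
print(c.stride())            # (6, 3, 1)
print(torch.equal(p, c))     # True: same values, different layout
```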

.transpose(dim0,dim1)

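Swaps exactly two dimensions of a tensor.
Changes the shape and strides without moving data, so the result is usually non-contiguous (is_contiguous() returns False).

q, k, v = qkv[0] * self.scale, qkv[1], qkv[2]
attn = q @ k.transpose(-2, -1)  # swap the last two dimensions (-2 is the second-to-last, -1 is the last)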

.contiguous()

Makes the memory layout contiguous; if the tensor is already contiguous, no new memory is allocated.

y = x.transpose(0,1).contiguous()

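torch.stack(tensors, dim=0): stack along a new dimension

Joins several tensors along a new dimension, so the result has one more dimension than the inputs.
Allocates new memory; the stacked tensor is contiguous.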

a = torch.tensor([1, 2]) #(2,)
b = torch.tensor([3, 4])
c = torch.tensor([5, 6])

torch.stack([a, b, c], dim=0) # (3,2) tensor([[1, 2],[3, 4],[5, 6]])
torch.stack([a, b, c], dim=1) # (2,3) tensor([[1, 3, 5],[2, 4, 6]]) the new dimension is inserted at index 1

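torch.cat(tensors, dim=0): concatenate along an existing dimension

Joins tensors along an existing dimension; no new dimension is added.
Returns a new tensor with its own memory, which is normally contiguous whether or not the inputs were.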

a = torch.ones(2,3)
b = torch.zeros(2,3)
y = torch.cat([a,b], dim=0) # (4,3)
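
To compare the two side by side, here is a small sketch on the same pair of inputs. It also shows that stack behaves like cat applied to inputs that were first given a new size-1 dimension.

```python
import torch

a = torch.ones(2, 3)
b = torch.zeros(2, 3)

stacked = torch.stack([a, b], dim=0)   # new leading dimension -> (2, 2, 3)
joined0 = torch.cat([a, b], dim=0)     # rows appended         -> (4, 3)
joined1 = torch.cat([a, b], dim=1)     # columns appended      -> (2, 6)

# stack == unsqueeze each input, then cat along the new dimension
same = torch.equal(stacked, torch.cat([a.unsqueeze(0), b.unsqueeze(0)], dim=0))

print(stacked.shape, joined0.shape, joined1.shape, same)
```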

.expand(x.shape[0], -1, -1)

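x.shape[0] replaces the first dimension (which must be 1 in the original), so the size-1 data is repeated to match the batch size.
-1 keeps a dimension at its current size (here the second and third dimensions stay unchanged).
No data is copied (memory is shared); the elements are only repeated logically.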
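
A typical use is prepending a class token to a batch of patch embeddings. The sketch below assumes hypothetical shapes (batch of 8, 196 patches, width 768) and a tensor named cls_token; none of these come from a specific model.

```python
import torch

cls_token = torch.zeros(1, 1, 768)     # hypothetical token, shape (1, 1, 768)
x = torch.zeros(8, 196, 768)           # hypothetical batch of patch embeddings

cls_tokens = cls_token.expand(x.shape[0], -1, -1)   # (8, 1, 768)
print(cls_tokens.shape)
print(cls_tokens.stride()[0])                        # 0: the batch dim reuses the same data
print(cls_tokens.data_ptr() == cls_token.data_ptr()) # True: no copy was made

out = torch.cat((cls_tokens, x), dim=1)              # (8, 197, 768)
print(out.shape)
```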

slice / narrow()

The result may be non-contiguous (is_contiguous() can become False), and the storage offset changes.

x = torch.arange(12).reshape(3,4)
y1 = x[:, 1:3]          # slice x[start:end] (3,2)
y2 = x[:, 0]            # slice (3,)
y3 = x.narrow(1, 1, 2)  # narrow x.narrow(dim, start, length)(3,2)
y4 = x[x > 8]           # boolean mask: the result is a 1-D tensor, tensor([9, 10, 11])
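
The difference between a slice (a view) and boolean-mask indexing (a copy) is easy to verify. A minimal sketch on the same 3x4 tensor:

```python
import torch

x = torch.arange(12).reshape(3, 4)

y1 = x[:, 1:3]                      # view: columns 1 and 2
print(y1.is_contiguous())           # False
print(y1.storage_offset())          # 1
print(torch.equal(y1, x.narrow(1, 1, 2)))  # True

y4 = x[x > 8]                       # boolean mask returns a copy
y4[0] = -1
print(x[2, 1].item())               # 9: the original is untouched

y1[0, 0] = 100                      # writing through the view changes x
print(x[0, 1].item())               # 100
```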

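Tensor broadcasting

Automatically expands tensor dimensions logically so that an operation's operands are compatible. Two dimensions can be broadcast only if they are equal or one of them is 1.
No memory is copied; the operation just treats the tensor as if it had the expanded shape (a dimension expanded with .expand() gets stride 0).
.expand() performs the broadcast explicitly.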

x = [[0],
     [1],
     [2]]   shape (3,1)
y = [[1,1,1,1], [1,1,1,1], [1,1,1,1]]

x broadcast → (3,4)
[[0,0,0,0],
 [1,1,1,1],
 [2,2,2,2]]
z = x + y → [[1,1,1,1], [2,2,2,2], [3,3,3,3]]
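
The same example as runnable code, a small sketch that also checks that the implicit broadcast matches an explicit .expand():

```python
import torch

x = torch.tensor([[0], [1], [2]])          # shape (3, 1)
y = torch.ones(3, 4, dtype=torch.int64)    # shape (3, 4)

z = x + y                                  # x is broadcast to (3, 4)
print(z.tolist())   # [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]]

explicit = x.expand(3, 4) + y              # the same thing, spelled out
print(torch.equal(z, explicit))            # True
```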

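Automatic differentiation (autograd)

PyTorch automatically tracks tensor operations in order to compute gradients.
When a tensor is created with requires_grad=True, e.g. x = torch.randn(1, requires_grad=True), the operations applied to it are recorded to build a computation graph.
.backward() computes the gradients automatically.

Notes:
1. New tensors produced by view / slice / narrow and similar operations stay in the computation graph.
2. Broadcasting is recorded as well.
3. .detach() cuts a tensor off from the computation graph.
4. Autograd works on both GPU and CPU.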
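
A small sketch of these points, using a made-up three-element tensor: the slice stays in the graph, so elements outside the slice get a zero gradient, and .detach() yields a tensor that no longer tracks gradients.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# The slice is tracked: y = x[0]^2 + x[1]^2
y = (x[:2] ** 2).sum()
y.backward()
print(x.grad.tolist())      # [2.0, 4.0, 0.0]

d = x.detach()              # same data, no graph
print(d.requires_grad)      # False
```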

For ndarray

# Convert one-hot encoded labels y into class-index labels
y = np.argmax(y, axis=1) # index of the maximum in each row, i.e. the class label as an integer
# example:
y_train = np.array([
    [0, 0, 1],   # class 2
    [1, 0, 0],   # class 0
    [0, 1, 0],   # class 1
])

y_train = np.argmax(y_train, axis=1) # array([2, 0, 1])

ML model input

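One-hot label support

Model category	Model / framework	Accepts one-hot labels	Correct label format
Classical machine learning models	`LogisticRegression` (sklearn)	❌ No	Integer class labels (e.g. 0, 1, 2)
	`SVM` / `SVC`	❌ No	Integer class labels
	`RandomForestClassifier`	❌ No	Integer class labels
	`KNeighborsClassifier`	❌ No	Integer class labels
	`DecisionTreeClassifier`	❌ No	Integer class labels
	`GaussianNB`	❌ No	Integer class labels
Neural network frameworks	PyTorch (custom classification network)	❌ Usually not	Integer labels, with CrossEntropyLoss
	TensorFlow / Keras + SparseCategoricalCrossentropy	❌ No	Integer labels (e.g. 0, 1, 2)
	TensorFlow / Keras + CategoricalCrossentropy	✅ Yes	One-hot labels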

Framework	Common loss function	Required label format
PyTorch	`CrossEntropyLoss`	Integer labels (`[1, 2, 0, ...]`)
Keras	`SparseCategoricalCrossentropy`	Integer labels
Scikit-learn	All classifiers	Integer labels
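
For the PyTorch row, a minimal sketch of turning one-hot labels into the integer labels that CrossEntropyLoss expects. The logits and labels are made-up values for three samples and three classes.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical model outputs (logits) for 3 samples and 3 classes.
logits = torch.tensor([[0.1, 0.2, 3.0],
                       [2.0, 0.1, 0.3],
                       [0.2, 2.5, 0.1]])

one_hot = np.array([[0, 0, 1],
                    [1, 0, 0],
                    [0, 1, 0]])

# One-hot -> class indices, as an int64 tensor.
labels = torch.from_numpy(np.argmax(one_hot, axis=1)).long()
print(labels.tolist())   # [2, 0, 1]

loss = nn.CrossEntropyLoss()(logits, labels)   # scalar tensor
print(loss.ndim)         # 0
```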

Model training

What .detach() does

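features = encoder(x)          # encoder forward pass produces the features
output = task(features)        # the task head predicts from the features
loss = loss_fn(output, target) # compute the loss
loss.backward()                # backpropagation; by default gradients reach both the encoder and task parameters

# But with
features = encoder(x).detach() # cut the connection to the encoder
output = task(features)
loss = loss_fn(output, target)
loss.backward()
# gradients reach only the task parameters; the encoder parameters get no gradient and so stay unchanged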
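
This can be confirmed by inspecting .grad after the backward pass. The sketch below stands in two small nn.Linear layers for encoder and task and uses random data; the layer sizes and the MSE loss are assumptions made only for the illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
encoder = nn.Linear(4, 3)   # stand-in for the real encoder
task = nn.Linear(3, 2)      # stand-in for the task head
loss_fn = nn.MSELoss()

x = torch.randn(5, 4)
target = torch.randn(5, 2)

features = encoder(x).detach()          # block gradients to the encoder
loss = loss_fn(task(features), target)
loss.backward()

print(encoder.weight.grad is None)      # True: nothing flowed back
print(task.weight.grad is not None)     # True: the head received gradients
```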

Hydra

Configuration composition

Hydra lets you combine several YAML config files into one configuration, which is called config composition. Organizing configs as modules this way makes them easier to maintain and reuse.

# config.yaml
defaults:
  - model: resnet
  - dataset: imagenet

# config/model/resnet.yaml
hidden_size: 256
layers: 50

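Overriding config from the command line

Hydra lets you override config values on the command line, without touching the code or the config files.

python train.py model.hidden_size=512 dataset=mnist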

Adding a new config group dynamically

# Hydra treats this argument as a config override: the leading + adds a config group entry named experiment that is not declared in defaults:
python demo_forcefield.py \
    +experiment=downstream_task/forcefield/gelsight_dino

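Multirun

With the -m option, Hydra runs every combination of the given config values automatically, which is useful for hyperparameter search and similar tasks.

python train.py -m model.hidden_size=128,256,512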
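
With several comma-separated overrides, the default sweeper launches one job per combination (the Cartesian product). The helper below, expand_sweep, is a hypothetical pure-Python sketch of that expansion, not Hydra code; the second override dataset=mnist,imagenet is added only to show more than one axis.

```python
from itertools import product


def expand_sweep(overrides):
    """Expand 'key=a,b,c' strings into one override list per job."""
    keys, choices = [], []
    for item in overrides:
        key, _, values = item.partition("=")
        keys.append(key)
        choices.append(values.split(","))
    return [
        [f"{k}={v}" for k, v in zip(keys, combo)]
        for combo in product(*choices)
    ]


jobs = expand_sweep(["model.hidden_size=128,256,512", "dataset=mnist,imagenet"])
print(len(jobs))   # 6
print(jobs[0])     # ['model.hidden_size=128', 'dataset=mnist']
```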

💡 Usage example

import hydra
from omegaconf import DictConfig

# config_name selects the primary config file (default.yaml) inside config_path; without it no primary config file is loaded.
@hydra.main(version_base="1.3", config_path="config", config_name="default")
def my_app(cfg: DictConfig):
    print(cfg)

if __name__ == "__main__":
    my_app()

Config directory layout:
config/
├── default.yaml
├── model/
│   ├── resnet.yaml
│   └── vgg.yaml
└── dataset/
    ├── mnist.yaml
    └── imagenet.yaml