## Preface

The step-by-step walkthrough of the Transformer source code is in the previous post, [Tranformer源码解读(Pytorch版本)](https://zoe.red/2024/367.html); this post is the follow-up hands-on example.

Sequence-copy task: for any input sequence `seq`, output everything from the second token onward, i.e. `seq[1:]`.

---

## Synthetic Data

Special-token convention for the vocabulary: three special tokens are reserved.

* 0: `<pad>`, used to crudely simulate sentences of varying length
* 1: `<s>`, start of sentence
* 2: `</s>`, end of sentence

The `Dataset` returns sequences padded to a fixed length `sentence_length`; to simulate sentences of varying length, at most one third of the tail of each sequence may be replaced by padding tokens.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class RandomSentenceDataset(Dataset):
    """Synthetic sentences for batched training and evaluation."""

    def __init__(self, vocab, dataset_length, sentence_length, pad_rng=0, random_seed=0):
        """
        Args:
            vocab: vocabulary size
            dataset_length: number of samples in the dataset
            sentence_length: (padded) sentence length
            pad_rng (int, optional): maximum number of random trailing pad tokens;
                unused here, the range is derived as sentence_length // 3 inside
                generate_sentence. Defaults to 0.
        """
        super().__init__()
        self.vocab = vocab
        self.dataset_length = dataset_length
        self.sentence_length = sentence_length
        # self.pad_rng = pad_rng
        # init random seed
        np.random.seed(random_seed)

    @staticmethod
    def generate_sentence(low, high, sen_len):
        # token values lie in [low, high-1]
        sentence = torch.from_numpy(np.random.randint(low, high, size=(sen_len,)))
        # randomly pad the tail of the sentence
        pad_rng = sen_len // 3
        pad_cnt = np.random.randint(-pad_rng, pad_rng)
        if pad_cnt < 0:
            sentence[pad_cnt:] = 0     # trailing <pad> tokens
            sentence[pad_cnt - 1] = 2  # </s> right before the padding
        else:
            # leave the sentence unpadded; half the time, end it with </s>
            if np.random.rand() > 0.5:
                sentence[-1] = 2
        # src = sentence.clone()
        src = sentence
        tgt = sentence.clone()
        tgt[0] = 1  # replace the first token with <s>; tgt is the sequence to predict
        return src, tgt

    def __getitem__(self, index):
        # low=3: ids 0/1/2 are reserved for <pad>/<s>/</s>,
        # crudely simulating sentences of varying length
        src, tgt = self.generate_sentence(low=3, high=self.vocab, sen_len=self.sentence_length)
        return src, tgt

    def __len__(self):
        return self.dataset_length
```
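A quick way to see what the dataset produces (a sketch; `vocab=22` and `sentence_length=12` mirror the config logged below, `dataset_length=8` is an arbitrary choice):

```python
# Sketch: inspect one synthetic (src, tgt) pair.
dataset = RandomSentenceDataset(vocab=22, dataset_length=8, sentence_length=12)
src, tgt = dataset[0]
print(src)  # ids in [3, 21]; trailing 0s are <pad>, 2 marks </s> when present
print(tgt)  # identical to src except tgt[0] == 1 (<s>)
```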
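The `Batch` class below relies on a `subsequent_mask` helper that this post does not reprint. For reference, this is the standard implementation from the Annotated Transformer code covered in the previous post:

```python
def subsequent_mask(size):
    "Mask out subsequent (future) positions for decoder self-attention."
    attn_shape = (1, size, size)
    mask = torch.triu(torch.ones(attn_shape), diagonal=1).type(torch.uint8)
    return mask == 0  # True where a position is allowed to attend
```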
An example of building the `DataLoader`:

```python
from torch.utils.data import DataLoader


class Batch:
    """Object for holding a batch of data with mask during training.

    Conventions used by the example dataset:
    * the special <pad>/<unk> token (padding / ignored positions) is 0;
      the corresponding positions in the generated masks are also 0
    * the special <s> (start of sentence) token is 1, </s> (end of sentence) is 2
    """

    def __init__(self, src, tgt=None, pad=0):
        self.src = src
        self.src_mask = (src != pad).unsqueeze(-2)  # [B, Sentence] => [B, 1, Sentence]
        if tgt is not None:
            self.tgt = tgt[:, :-1]   # target tokens [0, n-1]: the decoder input
            self.tgt_y = tgt[:, 1:]  # target tokens [1, n]: ground truth for the predictions
            self.tgt_mask = self.make_std_mask(self.tgt, pad)
            self.ntokens = (self.tgt_y != pad).data.sum()

    @staticmethod
    def make_std_mask(tgt, pad):
        "Create a mask to hide padding and future words."
        tgt_mask = (tgt != pad).unsqueeze(-2)  # [B, Sentence-1] => [B, 1, Sentence-1]
        # [B, 1, Sentence-1] & [1, Sentence-1, Sentence-1] => [B, Sentence-1, Sentence-1]
        # both operands are broadcast to [B, Sentence-1, Sentence-1] before the & is applied
        tgt_mask = tgt_mask & subsequent_mask(tgt.size(-1)).type_as(tgt_mask.data)
        return tgt_mask


def collate_fn(batch):
    """batch is a list holding batch_size return values of the dataset's __getitem__."""
    tensor_src = torch.stack([_[0] for _ in batch], dim=0)
    tensor_tgt = torch.stack([_[1] for _ in batch], dim=0)
    batch = Batch(tensor_src, tensor_tgt, pad=0)
    return batch


train_dataset = RandomSentenceDataset(
    vocab=configs.vocab,
    dataset_length=configs.sentence_count_train,
    sentence_length=configs.sentence_lenght,
    random_seed=epoch_index  # set by the enclosing training loop: fresh data every epoch
)
train_data_loader = DataLoader(
    train_dataset,
    batch_size=configs.batch_size,
    shuffle=True,
    num_workers=configs.num_workers,
    collate_fn=collate_fn,
    drop_last=True,
)
```

---

## Training and Inference

*Training loss and optimizer*

```python
criterion = LabelSmoothing(size=configs.vocab, padding_idx=0, smoothing=0.0)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.98),
                             eps=1e-9, weight_decay=1e-5)
```

*Learning-rate policy:*

* lower the learning rate by one order of magnitude every 10 epochs (see the sketch below)
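The post states this schedule only in prose; a `StepLR` scheduler is one assumed equivalent (not necessarily the author's actual code):

```python
from torch.optim.lr_scheduler import StepLR

# Assumed implementation of the stated policy: multiply the lr by 0.1 every
# 10 epochs (0.001 for epochs 1-10, then 0.0001, matching the log below).
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)

# called once per epoch, after that epoch's optimizer updates:
# scheduler.step()
```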
*Full code link:*

* GitHub link: to be added...

*Output log*

```
CPU info:  physical cores: 12, threads: 24, utilization: 0.5%
           model: AMD Ryzen 9 3900X 12-Core Processor
RAM info:  total: 64,237.3 MB, free: 60,682.3 MB, used: 2,833.6 MB, current process: 369.7 MB
=========================== GPU count: 2 ===========================
driver: v550.54.15
GPU-0 Model: NVIDIA GeForce RTX 4060 Ti; VRAM: total=16380.0 MB, free=16054.9 MB, used=325.1 MB
GPU-1 Model: NVIDIA GeForce RTX 4060 Ti; VRAM: total=16380.0 MB, free=16067.6 MB, used=312.4 MB
PyTorch version: 2.2.2+cu121
CUDA available: True
CUDA version: 12.1
cuDNN version: 8902
***************************** global parameters *****************************
epochs_train: 12
interval_eval: 1
check_point_dir: ./checkpoints_with_special_tags_3
batch_size: 256
sentence_lenght: 12
vocab: 22
embed_dim: 512
N: 2
sentence_count_train: 100000
sentence_count_eval: 20000
num_workers: 4
gpu_ids: [0, 1]
model parameter device: cuda:0
************************* epoch=01/12 **************************
train with lr 0.001...
24/03/29 21:38:09 Epoch Step: 001/390, Loss: 4.189613 Tokens per Sec: 2,963.1
24/03/29 21:38:11 Epoch Step: 051/390, Loss: 2.123400 Tokens per Sec: 64,198.5
24/03/29 21:38:13 Epoch Step: 101/390, Loss: 0.207219 Tokens per Sec: 67,031.0
24/03/29 21:38:15 Epoch Step: 151/390, Loss: 0.055500 Tokens per Sec: 67,079.0
24/03/29 21:38:16 Epoch Step: 201/390, Loss: 0.038424 Tokens per Sec: 66,888.3
24/03/29 21:38:18 Epoch Step: 251/390, Loss: 0.038157 Tokens per Sec: 66,952.9
24/03/29 21:38:20 Epoch Step: 301/390, Loss: 0.066127 Tokens per Sec: 66,932.4
24/03/29 21:38:22 Epoch Step: 351/390, Loss: 0.018545 Tokens per Sec: 67,074.1
24/03/29 21:38:24 Epoch Step: 390/390, Loss: 0.073214 Tokens per Sec: 66,672.6
function run_epoch execution time: 15.5 s.
train mean loss: 0.5736670
valid...
24/03/29 21:38:24 Epoch Step: 001/79, Loss: 0.009082 Tokens per Sec: 18,144.9
24/03/29 21:38:24 Epoch Step: 051/79, Loss: 0.007587 Tokens per Sec: 153,905.2
24/03/29 21:38:25 Epoch Step: 079/79, Loss: 0.001371 Tokens per Sec: 167,321.4
function run_epoch execution time: 1.4 s.
valid mean loss: 0.0051434
...
************************* epoch=12/12 **************************
train with lr 0.0001...
24/03/29 21:41:12 Epoch Step: 001/390, Loss: 0.000022 Tokens per Sec: 14,651.4
24/03/29 21:41:13 Epoch Step: 051/390, Loss: 0.000023 Tokens per Sec: 66,913.9
24/03/29 21:41:15 Epoch Step: 101/390, Loss: 0.000020 Tokens per Sec: 67,015.1
24/03/29 21:41:17 Epoch Step: 151/390, Loss: 0.000031 Tokens per Sec: 66,855.2
24/03/29 21:41:19 Epoch Step: 201/390, Loss: 0.000021 Tokens per Sec: 66,921.5
24/03/29 21:41:21 Epoch Step: 251/390, Loss: 0.000028 Tokens per Sec: 64,088.3
24/03/29 21:41:23 Epoch Step: 301/390, Loss: 0.000021 Tokens per Sec: 66,921.2
24/03/29 21:41:25 Epoch Step: 351/390, Loss: 0.000024 Tokens per Sec: 66,863.1
24/03/29 21:41:26 Epoch Step: 390/390, Loss: 0.000020 Tokens per Sec: 67,185.8
function run_epoch execution time: 14.8 s.
train mean loss: 0.0000770
valid...
24/03/29 21:41:26 Epoch Step: 001/79, Loss: 0.000001 Tokens per Sec: 20,032.5
24/03/29 21:41:27 Epoch Step: 051/79, Loss: 0.000001 Tokens per Sec: 169,981.0
24/03/29 21:41:27 Epoch Step: 079/79, Loss: 0.000001 Tokens per Sec: 168,051.6
function run_epoch execution time: 1.3 s.
valid mean loss: 0.0000007

# inference
loaded checkpoint for init: epoch=12, eval_loss=0.0000007
model parameter device: cuda:0
---------------------------reproduce validation-set evaluation----------------------------
24/03/29 21:43:24 Epoch Step: 001/79, Loss: 0.000001 Tokens per Sec: 2,678.5
24/03/29 21:43:25 Epoch Step: 051/79, Loss: 0.000001 Tokens per Sec: 173,034.8
24/03/29 21:43:26 Epoch Step: 079/79, Loss: 0.000001 Tokens per Sec: 167,445.7
function run_epoch execution time: 2.1 s.
valid mean loss: 0.0000007
-------------------------Batch_Index=0--------------------------
sentence_0 src =tensor([19, 3, 5, 15, 11, 20, 7, 15, 13, 21, 2, 0])
sentence_0 pre =tensor([ 1, 3, 5, 15, 11, 20, 7, 15, 13, 21, 2, 15])
sentence_1 src =tensor([ 5, 8, 12, 15, 16, 3, 4, 3, 10, 21, 13, 2])
sentence_1 pre =tensor([ 1, 8, 12, 15, 16, 3, 4, 3, 10, 21, 13, 2])
sentence_2 src =tensor([ 7, 9, 18, 5, 19, 9, 13, 9, 19, 7, 18, 13])
sentence_2 pre =tensor([ 1, 9, 18, 5, 19, 9, 13, 9, 19, 7, 18, 13])
sentence_3 src =tensor([ 5, 10, 19, 6, 13, 10, 13, 12, 20, 2, 0, 0])
sentence_3 pre =tensor([ 1, 10, 19, 6, 13, 10, 13, 12, 20, 2, 2, 6])
sentence_4 src =tensor([ 4, 11, 21, 16, 16, 10, 21, 5, 17, 2, 0, 0])
sentence_4 pre =tensor([ 1, 11, 21, 16, 16, 10, 21, 5, 17, 2, 2, 16])
sentence_5 src =tensor([ 5, 19, 20, 5, 4, 8, 12, 8, 13, 11, 7, 2])
sentence_5 pre =tensor([ 1, 19, 20, 5, 4, 8, 12, 8, 13, 11, 7, 2])
sentence_6 src =tensor([12, 19, 16, 13, 14, 16, 21, 9, 18, 2, 0, 0])
sentence_6 pre =tensor([ 1, 19, 16, 13, 14, 16, 21, 9, 18, 2, 2, 13])
sentence_7 src =tensor([ 9, 10, 20, 18, 16, 16, 10, 10, 4, 18, 2, 0])
sentence_7 pre =tensor([ 1, 10, 20, 18, 16, 16, 10, 10, 4, 18, 2, 18])
```

Note the sentence-boundary tokens: 2 marks the end of a sentence and 1 marks the start. Positions after the first 2 (`</s>`) correspond to `<pad>` in the target and are excluded from the loss, so the model's predictions there are effectively unconstrained (e.g. the trailing 15 in `sentence_0 pre`).

![Greedy decoding: sequence-copy inference example](https://zoe.red/usr/uploads/2024/03/4056394574.png)
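The figure above is produced by greedy decoding. The post does not reprint the decoding routine; below is a minimal sketch in the style of the Annotated Transformer, assuming the model exposes `encode`, `decode`, and `generator` as in the previous post:

```python
def greedy_decode(model, src, src_mask, max_len, start_symbol):
    """Autoregressively decode, always taking the arg-max token (greedy)."""
    memory = model.encode(src, src_mask)
    # seed the decoder input with the start symbol (<s>, id 1 in this post)
    ys = torch.ones(1, 1).fill_(start_symbol).type_as(src.data)
    for _ in range(max_len - 1):
        out = model.decode(memory, src_mask, ys,
                           subsequent_mask(ys.size(1)).type_as(src.data))
        prob = model.generator(out[:, -1])     # distribution over the vocabulary
        _, next_word = torch.max(prob, dim=1)  # greedy choice
        ys = torch.cat(
            [ys, torch.ones(1, 1).type_as(src.data).fill_(next_word.item())], dim=1
        )
    return ys
```

Called with `start_symbol=1`, this is consistent with the `pre` rows above, which all begin with 1 and then copy `src[1:]`.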