Contents

- Speculative sampling
- Module analysis
  - Imports
  - Model initialization
  - The speculative sampling function
  - Draft stage
  - Verify stage
  - Verifying the draft tokens
  - Generating the remaining part
  - Output
  - Example usage
- EAGLE

Source: for a more detailed explanation, see the article "EAGLE投机采样" (EAGLE speculative sampling).

## Speculative sampling
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Initialize the models and tokenizers
draft_model_name = "google/gemma-2b"                 # a lightweight draft model
target_model_name = "meta-llama/Llama-2-7b-chat-hf"  # a stronger target model
draft_model = AutoModelForCausalLM.from_pretrained(draft_model_name, device_map="auto")
target_model = AutoModelForCausalLM.from_pretrained(target_model_name, device_map="auto")
draft_tokenizer = AutoTokenizer.from_pretrained(draft_model_name)
target_tokenizer = AutoTokenizer.from_pretrained(target_model_name)


def speculative_decode(prompt, draft_model, target_model, draft_tokenizer, target_tokenizer,
                       num_draft_tokens=5, acceptance_threshold=0.8):
    """A simple speculative-sampling implementation.

    Args:
        prompt: input prompt (string)
        draft_model: the draft model
        target_model: the target model
        draft_tokenizer: tokenizer of the draft model
        target_tokenizer: tokenizer of the target model
        num_draft_tokens: number of tokens generated by the draft model
        acceptance_threshold: probability threshold for accepting a draft token

    Returns:
        the generated text (string)
    """
    # 1. Draft stage
    draft_input = draft_tokenizer(prompt, return_tensors="pt").to(draft_model.device)
    draft_output = draft_model.generate(**draft_input, max_new_tokens=num_draft_tokens)
    draft_tokens = draft_output[:, draft_input["input_ids"].shape[-1]:]
    draft_text = draft_tokenizer.batch_decode(draft_tokens, skip_special_tokens=True)[0]

    # 2. Verify stage
    target_input = target_tokenizer(prompt, return_tensors="pt").to(target_model.device)
    target_logits = target_model(**target_input).logits
    initial_target_token = torch.argmax(target_logits[:, -1, :], dim=-1)  # prediction for the first token
    accepted_tokens = [initial_target_token.item()]  # accepted tokens, seeded with that first token
    rejected_indices = []

    # Verify the tokens proposed by the draft model one by one
    for i in range(num_draft_tokens):
        # Build the context: the prompt plus the tokens accepted so far
        context_tokens = target_tokenizer(prompt, return_tensors="pt").to(target_model.device)["input_ids"]
        context_tokens = torch.cat(
            [context_tokens,
             torch.tensor([[accepted_tokens[j] for j in range(len(accepted_tokens))]],
                          device=target_model.device)],
            dim=1)
        target_logits = target_model(**{"input_ids": context_tokens}).logits  # note: context_tokens is the input
        target_probs = torch.softmax(target_logits[:, -1, :], dim=-1)
        target_prob_for_draft_token = target_probs[0, draft_tokens[0, i]].item()

        if target_prob_for_draft_token > acceptance_threshold:
            accepted_tokens.append(draft_tokens[0, i].item())
        else:
            rejected_indices.append(i)
            break  # stop verifying as soon as a token is rejected

    # 3. Generate the remaining part (if a draft token was rejected and drafts were left unverified)
    if rejected_indices:
        # Let the target model generate a new token at the rejected position
        context_tokens = target_tokenizer(prompt, return_tensors="pt").to(target_model.device)["input_ids"]
        context_tokens = torch.cat(
            [context_tokens,
             torch.tensor([[accepted_tokens[j] for j in range(len(accepted_tokens))]],
                          device=target_model.device)],
            dim=1)
        target_output = target_model.generate(**{"input_ids": context_tokens}, max_new_tokens=1)  # one new token
        new_token = target_output[:, context_tokens.shape[-1]:]
        accepted_tokens.append(new_token[0, 0].item())  # append the newly generated token

    # Decode all accepted tokens back into text
    generated_text = target_tokenizer.batch_decode([accepted_tokens], skip_special_tokens=True)[0]
    return generated_text


# Example usage
prompt = "The capital of France is"
generated_text = speculative_decode(prompt, draft_model, target_model, draft_tokenizer, target_tokenizer)
print(f"Generated text: {generated_text}")
```
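For reference, the walkthrough above accepts a draft token whenever the target model assigns it a probability above a fixed threshold. The speculative-sampling papers instead use a probabilistic accept/reject rule that provably preserves the target model's output distribution: accept the draft token with probability min(1, p_target / p_draft), and on rejection resample from the normalized residual max(0, p_target - p_draft). The sketch below is my own illustration of that rule for a single position, assuming both distributions are over the same vocabulary; it is not part of the original article's code.

```python
import torch

def accept_or_resample(p_target: torch.Tensor, p_draft: torch.Tensor, draft_token: int):
    """Standard speculative-sampling accept/reject step for one position.

    p_target, p_draft: 1-D probability vectors over a shared vocabulary.
    Returns (accepted, token).
    """
    # Accept the draft token with probability min(1, p_target / p_draft).
    ratio = (p_target[draft_token] / p_draft[draft_token].clamp_min(1e-12)).item()
    if torch.rand(1).item() < min(1.0, ratio):
        return True, draft_token
    # On rejection, resample from the residual distribution max(0, p_target - p_draft).
    residual = (p_target - p_draft).clamp_min(0.0)
    residual = residual / residual.sum()
    return False, torch.multinomial(residual, 1).item()
```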
## Module analysis

### Imports

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
```

- `torch`: the PyTorch deep-learning framework, used here for tensor computation.
- `AutoModelForCausalLM`: the Hugging Face Transformers class that automatically loads a causal language model.
- `AutoTokenizer`: the Hugging Face Transformers class that automatically loads the matching tokenizer.

### Model initialization

```python
draft_model_name = "google/gemma-2b"
target_model_name = "meta-llama/Llama-2-7b-chat-hf"
draft_model = AutoModelForCausalLM.from_pretrained(draft_model_name, device_map="auto")
target_model = AutoModelForCausalLM.from_pretrained(target_model_name, device_map="auto")
draft_tokenizer = AutoTokenizer.from_pretrained(draft_model_name)
target_tokenizer = AutoTokenizer.from_pretrained(target_model_name)
```

- `draft_model_name` and `target_model_name` give the Hugging Face model names of the lightweight draft model and the stronger target model.
- `AutoModelForCausalLM.from_pretrained` loads the pretrained language models; `device_map="auto"` places them on the available devices (GPU/CPU) automatically.
- `AutoTokenizer.from_pretrained` loads the tokenizer that belongs to each model.

### The speculative sampling function

```python
def speculative_decode(prompt, draft_model, target_model, draft_tokenizer, target_tokenizer,
                       num_draft_tokens=5, acceptance_threshold=0.8):
```

Parameters:

- `prompt`: the input text prompt.
- `draft_model` and `target_model`: the draft model and the target model.
- `draft_tokenizer` and `target_tokenizer`: their tokenizers.
- `num_draft_tokens`: how many tokens the draft model proposes.
- `acceptance_threshold`: the probability threshold for accepting a draft token.

### Draft stage

```python
draft_input = draft_tokenizer(prompt, return_tensors="pt").to(draft_model.device)
draft_output = draft_model.generate(**draft_input, max_new_tokens=num_draft_tokens)
draft_tokens = draft_output[:, draft_input["input_ids"].shape[-1]:]
draft_text = draft_tokenizer.batch_decode(draft_tokens, skip_special_tokens=True)[0]
```

- `draft_tokenizer` converts the input text into model-input tensors.
- `draft_model.generate` produces the requested number of tokens.
- `draft_tokens` keeps only the newly generated tokens, excluding the input.
- `draft_text` decodes the generated tokens back into text.

### Verify stage

```python
target_input = target_tokenizer(prompt, return_tensors="pt").to(target_model.device)
target_logits = target_model(**target_input).logits
initial_target_token = torch.argmax(target_logits[:, -1, :], dim=-1)
accepted_tokens = [initial_target_token.item()]
rejected_indices = []
```

- `target_tokenizer` converts the input text into the target model's input tensors.
- Calling `target_model` returns the output logits.
- `initial_target_token` is the target model's prediction for the first token.
- `accepted_tokens` initializes the list of accepted tokens.
- `rejected_indices` stores the indices of rejected tokens.

### Verifying the draft tokens

```python
for i in range(num_draft_tokens):
    context_tokens = target_tokenizer(prompt, return_tensors="pt").to(target_model.device)["input_ids"]
    context_tokens = torch.cat(
        [context_tokens,
         torch.tensor([[accepted_tokens[j] for j in range(len(accepted_tokens))]],
                      device=target_model.device)],
        dim=1)
    target_logits = target_model(**{"input_ids": context_tokens}).logits
    target_probs = torch.softmax(target_logits[:, -1, :], dim=-1)
    target_prob_for_draft_token = target_probs[0, draft_tokens[0, i]].item()
    if target_prob_for_draft_token > acceptance_threshold:
        accepted_tokens.append(draft_tokens[0, i].item())
    else:
        rejected_indices.append(i)
        break
```

- `context_tokens` builds a context containing the prompt and the tokens accepted so far.
- The target model computes prediction probabilities for that context.
- `target_prob_for_draft_token` is the probability the target model assigns to the current draft token.
- If that probability exceeds the threshold, the token is accepted; otherwise it is rejected and verification stops.
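The loop above re-tokenizes the prompt and runs the target model once per draft token. That keeps the walkthrough simple, but it gives up the main speedup of speculative decoding: the target model can score all draft tokens in a single forward pass, because the logits at position t already give its distribution for position t + 1. The sketch below is my own illustration of that idea (not part of the original article), assuming the draft and target models share a tokenizer and using the same greedy threshold rule as above.

```python
import torch

def verify_in_one_pass(target_model, prompt_ids, draft_ids, acceptance_threshold=0.8):
    """Score all draft tokens with a single target-model forward pass.

    prompt_ids: (1, prompt_len) token ids of the prompt.
    draft_ids:  (1, k) token ids proposed by the draft model.
    Returns the list of accepted draft token ids (greedy threshold rule).
    """
    prompt_ids = prompt_ids.to(target_model.device)
    draft_ids = draft_ids.to(target_model.device)
    # Feed the prompt and all draft tokens at once; logits[:, t] predicts token t + 1.
    input_ids = torch.cat([prompt_ids, draft_ids], dim=1)
    with torch.no_grad():
        logits = target_model(input_ids=input_ids).logits
    probs = torch.softmax(logits, dim=-1)

    accepted = []
    prompt_len = prompt_ids.shape[1]
    for i in range(draft_ids.shape[1]):
        # Probability the target model assigns to the i-th draft token,
        # conditioned on the prompt and the preceding draft tokens.
        p = probs[0, prompt_len - 1 + i, draft_ids[0, i]].item()
        if p > acceptance_threshold:
            accepted.append(draft_ids[0, i].item())
        else:
            break
    return accepted
```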
### Generating the remaining part

```python
if rejected_indices:
    context_tokens = target_tokenizer(prompt, return_tensors="pt").to(target_model.device)["input_ids"]
    context_tokens = torch.cat(
        [context_tokens,
         torch.tensor([[accepted_tokens[j] for j in range(len(accepted_tokens))]],
                      device=target_model.device)],
        dim=1)
    target_output = target_model.generate(**{"input_ids": context_tokens}, max_new_tokens=1)
    new_token = target_output[:, context_tokens.shape[-1]:]
    accepted_tokens.append(new_token[0, 0].item())
```

- If a draft token was rejected, the target model generates a replacement token.
- `target_model.generate` produces that one new token.
- `new_token` extracts it and appends it to the list of accepted tokens.

### Output

```python
generated_text = target_tokenizer.batch_decode([accepted_tokens], skip_special_tokens=True)[0]
return generated_text
```

- `batch_decode` turns the accepted tokens back into text, which the function returns.

### Example usage

```python
prompt = "The capital of France is"
generated_text = speculative_decode(prompt, draft_model, target_model, draft_tokenizer, target_tokenizer)
print(f"Generated text: {generated_text}")
```

- Provide a prompt, call the speculative sampling function, and print the generated text.
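One practical caveat before moving on: the verification step indexes the target model's probability vector with token ids produced by the draft tokenizer (`target_probs[0, draft_tokens[0, i]]`). That only makes sense when the two models share a vocabulary, and google/gemma-2b and meta-llama/Llama-2-7b-chat-hf do not (the EAGLE example below sidesteps this by using the same model for both roles). A hypothetical workaround, shown purely as a sketch of my own, is to round-trip the draft text through the target tokenizer; note that the number of tokens may change.

```python
def draft_ids_in_target_vocab(draft_tokens, draft_tokenizer, target_tokenizer):
    """Re-encode the draft model's output so its ids live in the target vocabulary.

    draft_tokens: (1, k) ids from the draft tokenizer.
    Returns a (1, m) tensor of target-tokenizer ids; m may differ from k.
    """
    text = draft_tokenizer.batch_decode(draft_tokens, skip_special_tokens=True)[0]
    return target_tokenizer(text, return_tensors="pt", add_special_tokens=False)["input_ids"]
```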
## EAGLE

Below is a simplified, illustrative variant: a small `FeaturePredictor` estimates perplexity from hidden states, and that estimate is blended with the target model's probability when deciding whether to accept a draft token.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from transformers import AutoModelForCausalLM, AutoTokenizer
import numpy as np


# Assume perplexity is the feature and keep the feature-prediction model simple
class FeaturePredictor(nn.Module):
    def __init__(self, hidden_size):
        super(FeaturePredictor, self).__init__()
        self.linear = nn.Linear(hidden_size, 1)  # predicts perplexity

    def forward(self, hidden_states):
        # hidden_states: (batch_size, sequence_length, hidden_size)
        perplexity = self.linear(hidden_states).squeeze(-1)  # (batch_size, sequence_length)
        return perplexity


def train_feature_predictor(target_model, tokenizer, feature_predictor, num_epochs=3, learning_rate=1e-4):
    """Train the feature predictor on the target model's hidden states and perplexity.

    Args:
        target_model: the target model
        tokenizer: tokenizer of the target model
        feature_predictor: the feature-prediction model
        num_epochs: number of training epochs
        learning_rate: learning rate
    """
    optimizer = optim.Adam(feature_predictor.parameters(), lr=learning_rate)
    criterion = nn.MSELoss()

    # A few sentences as training data
    texts = [
        "The quick brown fox jumps over the lazy dog.",
        "The capital of France is Paris.",
        "Machine learning is a fascinating field.",
        "Coding is fun and challenging.",
    ]

    for epoch in range(num_epochs):
        for text in texts:
            inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True).to(target_model.device)
            with torch.no_grad():
                outputs = target_model(**inputs, output_hidden_states=True)
                hidden_states = outputs.hidden_states[-1]  # last-layer hidden states

                # Compute the perplexity (used as the ground truth)
                logits = outputs.logits
                shift_logits = logits[:, :-1, :].contiguous()
                shift_labels = inputs["input_ids"][:, 1:].contiguous()
                loss_fct = nn.CrossEntropyLoss()
                loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
                perplexity = torch.exp(loss)

            # Predict the perplexity from the hidden states (all positions except the last)
            predicted_perplexity = feature_predictor(hidden_states[:, :-1, :])

            # Compute the loss and update the predictor
            loss = criterion(predicted_perplexity, torch.full_like(predicted_perplexity, perplexity.item()))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        print(f"Epoch {epoch + 1}, Loss: {loss.item()}")


def eagle_speculative_decode(prompt, draft_model, target_model, draft_tokenizer, target_tokenizer,
                             feature_predictor, num_draft_tokens=5, acceptance_threshold=0.8, alpha=0.5):
    """EAGLE-style speculative sampling.

    Args:
        prompt: input prompt (string)
        draft_model: the draft model
        target_model: the target model
        draft_tokenizer: tokenizer of the draft model
        target_tokenizer: tokenizer of the target model
        feature_predictor: the feature-prediction model
        num_draft_tokens: number of tokens generated by the draft model
        acceptance_threshold: probability threshold for accepting a draft token
        alpha: weight balancing the target-model probability and the feature prediction

    Returns:
        the generated text (string)
    """
    # 1. Draft stage
    draft_input = draft_tokenizer(prompt, return_tensors="pt").to(draft_model.device)
    draft_output = draft_model.generate(**draft_input, max_new_tokens=num_draft_tokens,
                                        output_hidden_states=True, return_dict_in_generate=True)
    draft_tokens = draft_output.sequences[:, draft_input["input_ids"].shape[-1]:]
    draft_text = draft_tokenizer.batch_decode(draft_tokens, skip_special_tokens=True)[0]
    # Collect the draft model's last-layer hidden state for each generated token
    # (generate returns one tuple of per-layer hidden states per decoding step)
    draft_hidden_states = torch.cat(
        [step[-1][:, -1:, :] for step in draft_output.hidden_states], dim=1)

    # 2. Feature prediction (perplexity) for each draft token
    predicted_perplexities = feature_predictor(draft_hidden_states).detach().cpu().numpy()

    # 3. Verify stage
    target_input = target_tokenizer(prompt, return_tensors="pt").to(target_model.device)
    target_logits = target_model(**target_input).logits
    initial_target_token = torch.argmax(target_logits[:, -1, :], dim=-1)  # prediction for the first token
    accepted_tokens = [initial_target_token.item()]  # accepted tokens, seeded with that first token
    rejected_indices = []

    # Verify the tokens proposed by the draft model one by one
    for i in range(num_draft_tokens):
        # Build the context: the prompt plus the tokens accepted so far
        context_tokens = target_tokenizer(prompt, return_tensors="pt").to(target_model.device)["input_ids"]
        context_tokens = torch.cat(
            [context_tokens,
             torch.tensor([[accepted_tokens[j] for j in range(len(accepted_tokens))]],
                          device=target_model.device)],
            dim=1)
        target_logits = target_model(**{"input_ids": context_tokens}).logits  # note: context_tokens is the input
        target_probs = torch.softmax(target_logits[:, -1, :], dim=-1)
        target_prob_for_draft_token = target_probs[0, draft_tokens[0, i]].item()

        # 4. Adjust the acceptance score (simplified: use the predicted perplexity directly);
        # the target model prefers low perplexity, so lower perplexity should be easier to accept
        feature_prob = 1.0 - np.clip(predicted_perplexities[0, i] / 10.0, 0.0, 1.0)  # map perplexity to a 0-1 score

        # Weighted average of the two scores
        adjusted_prob = alpha * target_prob_for_draft_token + (1 - alpha) * feature_prob

        if adjusted_prob > acceptance_threshold:
            accepted_tokens.append(draft_tokens[0, i].item())
        else:
            rejected_indices.append(i)
            break  # stop verifying as soon as a token is rejected

    # 5. Generate the remaining part (if a draft token was rejected)
    if rejected_indices:
        # Let the target model generate a new token at the rejected position
        context_tokens = target_tokenizer(prompt, return_tensors="pt").to(target_model.device)["input_ids"]
        context_tokens = torch.cat(
            [context_tokens,
             torch.tensor([[accepted_tokens[j] for j in range(len(accepted_tokens))]],
                          device=target_model.device)],
            dim=1)
        target_output = target_model.generate(**{"input_ids": context_tokens}, max_new_tokens=1)  # one new token
        new_token = target_output[:, context_tokens.shape[-1]:]
        accepted_tokens.append(new_token[0, 0].item())  # append the newly generated token

    # Decode all accepted tokens back into text
    generated_text = target_tokenizer.batch_decode([accepted_tokens], skip_special_tokens=True)[0]
    return generated_text


# Initialize the models and tokenizers (simplified here: the same model plays both roles)
model_name = "google/gemma-2b"
draft_model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
target_model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
draft_tokenizer = AutoTokenizer.from_pretrained(model_name)
target_tokenizer = AutoTokenizer.from_pretrained(model_name)

# Initialize the feature-prediction model
hidden_size = draft_model.config.hidden_size
feature_predictor = FeaturePredictor(hidden_size).to(draft_model.device)

# Train the feature predictor
train_feature_predictor(target_model, target_tokenizer, feature_predictor)

# Example usage
prompt = "The capital of France is"
generated_text = eagle_speculative_decode(prompt, draft_model, target_model,
                                          draft_tokenizer, target_tokenizer, feature_predictor)
print(f"Generated text (EAGLE): {generated_text}")
```
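As a rough closing check, the walkthrough can be timed against plain `generate`. This is only a sketch of my own: the verification loops above call the target model once per draft token, so this unoptimized illustration should not be expected to show a real speedup.

```python
import time

prompt = "The capital of France is"

start = time.perf_counter()
eagle_text = eagle_speculative_decode(prompt, draft_model, target_model,
                                      draft_tokenizer, target_tokenizer, feature_predictor)
eagle_seconds = time.perf_counter() - start

inputs = target_tokenizer(prompt, return_tensors="pt").to(target_model.device)
start = time.perf_counter()
baseline_ids = target_model.generate(**inputs, max_new_tokens=6)
baseline_seconds = time.perf_counter() - start
baseline_text = target_tokenizer.decode(
    baseline_ids[0, inputs["input_ids"].shape[-1]:], skip_special_tokens=True)

print(f"EAGLE-style: {eagle_seconds:.2f}s -> {eagle_text!r}")
print(f"baseline:    {baseline_seconds:.2f}s -> {baseline_text!r}")
```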