Running BERT on a Raspberry Pi: implement BERT in NumPy, train the model with Hugging Face or PyTorch, save the parameters in NumPy format, then load them and run inference with NumPy alone
I have previously implemented an MLP, a CNN and an LSTM in NumPy. This time the target is a bigger model, BERT, again in pure NumPy. The main point is that inference can then run on a Raspberry Pi, or on any other board where PyTorch cannot be installed.
The model used here is one I picked more or less at random on Hugging Face: a news classifier with 7 output classes.
Here is the list of model parameters. The exact list is not important, but note that the model already takes 400+ MB on disk.
bert.embeddings.word_embeddings.weight        torch.Size([21128, 768])
bert.embeddings.position_embeddings.weight    torch.Size([512, 768])
bert.embeddings.token_type_embeddings.weight  torch.Size([2, 768])
bert.embeddings.LayerNorm.weight              torch.Size([768])
bert.embeddings.LayerNorm.bias                torch.Size([768])

(each of the 12 encoder layers, bert.encoder.layer.0 through bert.encoder.layer.11, repeats the same set of shapes, shown here once with N in place of the layer index)

bert.encoder.layer.N.attention.self.query.weight        torch.Size([768, 768])
bert.encoder.layer.N.attention.self.query.bias          torch.Size([768])
bert.encoder.layer.N.attention.self.key.weight          torch.Size([768, 768])
bert.encoder.layer.N.attention.self.key.bias            torch.Size([768])
bert.encoder.layer.N.attention.self.value.weight        torch.Size([768, 768])
bert.encoder.layer.N.attention.self.value.bias          torch.Size([768])
bert.encoder.layer.N.attention.output.dense.weight      torch.Size([768, 768])
bert.encoder.layer.N.attention.output.dense.bias        torch.Size([768])
bert.encoder.layer.N.attention.output.LayerNorm.weight  torch.Size([768])
bert.encoder.layer.N.attention.output.LayerNorm.bias    torch.Size([768])
bert.encoder.layer.N.intermediate.dense.weight          torch.Size([3072, 768])
bert.encoder.layer.N.intermediate.dense.bias            torch.Size([3072])
bert.encoder.layer.N.output.dense.weight                torch.Size([768, 3072])
bert.encoder.layer.N.output.dense.bias                  torch.Size([768])
bert.encoder.layer.N.output.LayerNorm.weight            torch.Size([768])
bert.encoder.layer.N.output.LayerNorm.bias              torch.Size([768])

bert.pooler.dense.weight   torch.Size([768, 768])
bert.pooler.dense.bias     torch.Size([768])
classifier.weight          torch.Size([7, 768])
classifier.bias            torch.Size([7])
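As a rough sanity check on that 400+ MB figure, the parameters in the exported file can simply be counted. A small sketch, assuming the export script further down in this post has already produced bert_model_params.npz:

import numpy as np

# count the parameters stored in the exported .npz file
params = np.load("bert_model_params.npz")
total = sum(int(np.prod(params[name].shape)) for name in params.files)

# BERT-base with a 21128-token vocabulary has roughly 100M parameters,
# which is on the order of 400 MB when stored as float32
print("total parameters:", total)
print("approx. size as float32: %.0f MB" % (total * 4 / 1024 / 1024))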
Getting the NumPy BERT to work took two days of stepping into pitfalls, comparing my implementation step by step against the Hugging Face source code. It was genuinely hard.
Below is the BERT implementation in NumPy. The scores differ very slightly from Hugging Face, probably because the model is large and small numerical errors accumulate in the saved parameters.
Reading the code below really helps in understanding the BERT structure directly; every detail is simple and in the right place. I even impressed myself digging into this.
import numpy as np


def word_embedding(input_ids, word_embeddings):
    return word_embeddings[input_ids]


def position_embedding(position_ids, position_embeddings):
    return position_embeddings[position_ids]


def token_type_embedding(token_type_ids, token_type_embeddings):
    return token_type_embeddings[token_type_ids]


def softmax(x, axis=None):
    # subtract the max for numerical stability
    # e_x = np.exp(x).astype(np.float32)
    e_x = np.exp(x - np.max(x, axis=axis, keepdims=True))
    sum_ex = np.sum(e_x, axis=axis, keepdims=True).astype(np.float32)
    return e_x / sum_ex


def scaled_dot_product_attention(Q, K, V, mask=None):
    d_k = Q.shape[-1]
    scores = np.matmul(Q, K.transpose(0, 2, 1)) / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, scores, np.full_like(scores, -np.inf))
    attention_weights = softmax(scores, axis=-1)
    # print(attention_weights)
    # print(np.sum(attention_weights, axis=-1))
    output = np.matmul(attention_weights, V)
    return output, attention_weights


def multihead_attention(input, num_heads, W_Q, B_Q, W_K, B_K, W_V, B_V, W_O, B_O):
    q = np.matmul(input, W_Q.T) + B_Q
    k = np.matmul(input, W_K.T) + B_K
    v = np.matmul(input, W_V.T) + B_V
    # split q, k, v into num_heads heads
    q = np.split(q, num_heads, axis=-1)
    k = np.split(k, num_heads, axis=-1)
    v = np.split(v, num_heads, axis=-1)
    outputs = []
    for q_, k_, v_ in zip(q, k, v):
        output, attention_weights = scaled_dot_product_attention(q_, k_, v_)
        outputs.append(output)
    outputs = np.concatenate(outputs, axis=-1)
    outputs = np.matmul(outputs, W_O.T) + B_O
    return outputs


def layer_normalization(x, weight, bias, eps=1e-12):
    mean = np.mean(x, axis=-1, keepdims=True)
    variance = np.var(x, axis=-1, keepdims=True)
    std = np.sqrt(variance + eps)
    normalized_x = (x - mean) / std
    output = weight * normalized_x + bias
    return output


def feed_forward_layer(inputs, weight, bias, activation="relu"):
    linear_output = np.matmul(inputs, weight) + bias
    if activation == "relu":
        activated_output = np.maximum(0, linear_output)  # ReLU
    elif activation == "gelu":
        # GELU (tanh approximation)
        activated_output = 0.5 * linear_output * (1 + np.tanh(np.sqrt(2 / np.pi) * (linear_output + 0.044715 * np.power(linear_output, 3))))
    elif activation == "tanh":
        activated_output = np.tanh(linear_output)
    else:
        activated_output = linear_output  # no activation
    return activated_output


def residual_connection(inputs, residual):
    # residual connection
    residual_output = inputs + residual
    return residual_output


def tokenize_sentence(sentence, vocab_file="vocab.txt"):
    with open(vocab_file, "r", encoding="utf-8") as f:
        vocab = f.readlines()
    vocab = [i.strip() for i in vocab]
    # print(len(vocab))
    # character-level tokenization, with [CLS] at the start and [SEP] at the end
    tokenized_sentence = ["[CLS]"] + list(sentence) + ["[SEP]"]
    token_ids = [vocab.index(token) for token in tokenized_sentence]
    return token_ids


# load the saved model parameters
model_data = np.load("bert_model_params.npz")
word_embeddings = model_data["bert.embeddings.word_embeddings.weight"]
position_embeddings = model_data["bert.embeddings.position_embeddings.weight"]
token_type_embeddings = model_data["bert.embeddings.token_type_embeddings.weight"]


def model_input(sentence):
    token_ids = tokenize_sentence(sentence)
    input_ids = np.array(token_ids)  # input token ids
    word_embedded = word_embedding(input_ids, word_embeddings)
    position_ids = np.array(range(len(input_ids)))  # position ids
    # position embedding matrix, shape (max_position, embedding_size)
    position_embedded = position_embedding(position_ids, position_embeddings)
    token_type_ids = np.array([0] * len(input_ids))  # segment (token type) ids
    # token type embedding matrix, shape (num_token_types, embedding_size)
    token_type_embedded = token_type_embedding(token_type_ids, token_type_embeddings)
    embedding_output = np.expand_dims(word_embedded + position_embedded + token_type_embedded, axis=0)
    return embedding_output


def bert(input, num_heads):
    ebd_LayerNorm_weight = model_data["bert.embeddings.LayerNorm.weight"]
    ebd_LayerNorm_bias = model_data["bert.embeddings.LayerNorm.bias"]
    input = layer_normalization(input, ebd_LayerNorm_weight, ebd_LayerNorm_bias)  # matches the Hugging Face output at this point
    for i in range(12):
        # weights of the multi-head self-attention block of layer i
        W_Q = model_data["bert.encoder.layer.{}.attention.self.query.weight".format(i)]
        B_Q = model_data["bert.encoder.layer.{}.attention.self.query.bias".format(i)]
        W_K = model_data["bert.encoder.layer.{}.attention.self.key.weight".format(i)]
        B_K = model_data["bert.encoder.layer.{}.attention.self.key.bias".format(i)]
        W_V = model_data["bert.encoder.layer.{}.attention.self.value.weight".format(i)]
        B_V = model_data["bert.encoder.layer.{}.attention.self.value.bias".format(i)]
        W_O = model_data["bert.encoder.layer.{}.attention.output.dense.weight".format(i)]
        B_O = model_data["bert.encoder.layer.{}.attention.output.dense.bias".format(i)]
        attention_output_LayerNorm_weight = model_data["bert.encoder.layer.{}.attention.output.LayerNorm.weight".format(i)]
        attention_output_LayerNorm_bias = model_data["bert.encoder.layer.{}.attention.output.LayerNorm.bias".format(i)]
        intermediate_weight = model_data["bert.encoder.layer.{}.intermediate.dense.weight".format(i)]
        intermediate_bias = model_data["bert.encoder.layer.{}.intermediate.dense.bias".format(i)]
        dense_weight = model_data["bert.encoder.layer.{}.output.dense.weight".format(i)]
        dense_bias = model_data["bert.encoder.layer.{}.output.dense.bias".format(i)]
        output_LayerNorm_weight = model_data["bert.encoder.layer.{}.output.LayerNorm.weight".format(i)]
        output_LayerNorm_bias = model_data["bert.encoder.layer.{}.output.LayerNorm.bias".format(i)]

        # self-attention + residual + LayerNorm
        output = multihead_attention(input, num_heads, W_Q, B_Q, W_K, B_K, W_V, B_V, W_O, B_O)
        output = residual_connection(input, output)
        output1 = layer_normalization(output, attention_output_LayerNorm_weight, attention_output_LayerNorm_bias)  # matches the Hugging Face output at this point

        # feed-forward (GELU then linear) + residual + LayerNorm
        output = feed_forward_layer(output1, intermediate_weight.T, intermediate_bias, activation="gelu")
        output = feed_forward_layer(output, dense_weight.T, dense_bias, activation="")
        output = residual_connection(output1, output)
        output2 = layer_normalization(output, output_LayerNorm_weight, output_LayerNorm_bias)  # matches

        input = output2

    # pooler: dense + tanh; here it is applied to every position, and the
    # [CLS] position is selected later in __main__
    bert_pooler_dense_weight = model_data["bert.pooler.dense.weight"]
    bert_pooler_dense_bias = model_data["bert.pooler.dense.bias"]
    output = feed_forward_layer(output2, bert_pooler_dense_weight.T, bert_pooler_dense_bias, activation="tanh")  # matches
    return output


# for i in model_data:
#     # print(i)
#     print(i, model_data[i].shape)

id2label = {0: "mainland China politics", 1: "Hong Kong - Macau politics", 2: "International news",
            3: "financial news", 4: "culture", 5: "entertainment", 6: "sports"}
classifier_weight = model_data["classifier.weight"]
classifier_bias = model_data["classifier.bias"]

if __name__ == "__main__":
    sentences = ["马拉松比赛", "香港有群众游行示威", "党中央决定制定爱国教育法", "俄罗斯和欧美对抗",
                 "人民币汇率贬值", "端午节吃粽子", "大妈们跳广场舞"]
    while True:
        # example usage: classify each test sentence
        for sentence in sentences:
            # print(model_input(sentence).shape)
            output = bert(model_input(sentence), num_heads=12)
            # print(output)
            output = feed_forward_layer(output[:, 0, :], classifier_weight.T, classifier_bias, activation="")
            # print(output)
            output = softmax(output, axis=-1)
            label_id = np.argmax(output, axis=-1)
            label_score = output[0][label_id]
            print("sentence:", sentence, "\tlabels:", id2label[label_id[0]], "\tscore:", label_score)
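One detail worth calling out from the code above: PyTorch's nn.Linear stores its weight with shape (out_features, in_features), which is why every weight matrix pulled out of model_data is transposed with .T before the matmul. A minimal standalone sketch of that convention (PyTorch is only needed on the development machine for this check, not on the Pi; the toy sizes here are arbitrary):

import numpy as np
import torch

# nn.Linear(in_features=4, out_features=3) stores its weight with shape (3, 4)
lin = torch.nn.Linear(4, 3)
x = torch.randn(2, 4)

W = lin.weight.detach().numpy()  # shape (3, 4), so it must be transposed for x @ W
b = lin.bias.detach().numpy()

np_out = np.matmul(x.numpy(), W.T) + b      # same convention as feed_forward_layer above
torch_out = lin(x).detach().numpy()

print(np.allclose(np_out, torch_out, atol=1e-6))  # expected: True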
The model itself is a trained model someone else published on Hugging Face: a RoBERTa model for 7-class news classification. The script below loads it and saves its parameters in NumPy format so that the code above can load them.
import numpy as np
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model = AutoModelForSequenceClassification.from_pretrained("uer/roberta-base-finetuned-chinanews-chinese")
tokenizer = AutoTokenizer.from_pretrained("uer/roberta-base-finetuned-chinanews-chinese")

text_classification = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
print(text_classification("马拉松决赛"))

# print(model)
# print the weight shapes of the BERT model
for name, param in model.named_parameters():
    print(name, param.data.shape)

# save the model parameters in NumPy format
model_params = {name: param.data.cpu().numpy() for name, param in model.named_parameters()}
np.savez("bert_model_params.npz", **model_params)
# model_params
Comparing the two results:
Hugging Face: [{"label": "sports", "score": 0.9929242134094238}]
NumPy: sports [0.9928773]