Global Report: Using the Evaluation Metrics Tool

2023-06-27 23:10:41 Source: 博客园 (Cnblogs)

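Evaluating a trained model requires evaluation metrics such as accuracy, precision, recall, and the F1 score. Different task types call for different metrics, and HuggingFace provides a unified tool for loading and computing them.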
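To make the four metrics named above concrete, here is a minimal pure-Python sketch for a binary classification task. The function names `confusion_counts` and `binary_metrics` and the toy labels are assumptions made for illustration; this is not how the HuggingFace tool is implemented.

```python
def confusion_counts(predictions, references, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = fn = tn = 0
    for pred, ref in zip(predictions, references):
        if pred == positive and ref == positive:
            tp += 1  # predicted positive, actually positive
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif ref == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def binary_metrics(predictions, references, positive=1):
    """Return accuracy, precision, recall and F1 as a dict (hypothetical helper)."""
    tp, fp, fn, tn = confusion_counts(predictions, references, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels invented for this example.
    references = [1, 0, 1, 1, 0, 0]
    predictions = [1, 0, 0, 1, 1, 0]
    print(binary_metrics(predictions, references))
```

Returning a dict keyed by metric name mirrors the shape of the results shown later in this article, which makes the two easy to compare.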

1. List the available metrics

The list_metrics() function lists the available evaluation metrics:

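def list_metric_test():
    # Chapter 4 / List the available evaluation metrics
    from datasets import list_metrics

    metrics_list = list_metrics()
    # Print the number of metrics and the first five names
    print(len(metrics_list), metrics_list[:5])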

The output is as follows:

157 ["accuracy", "bertscore", "bleu", "bleurt", "brier_score"]

At the time of writing, 157 evaluation metrics are available; the output shows the first five.

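2. Load a metric

Use load_metric() to load an evaluation metric. Note that some metrics are designed to be used with a specific dataset; the example here uses the mrpc subset of the glue dataset: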

def load_metric_test():
    # Chapter 4 / Load evaluation metrics
    from datasets import load_metric

    metric = load_metric(path="accuracy")  # load the accuracy metric
    print(metric)

    # Chapter 4 / Load a metric tied to a dataset
    metric = load_metric(path="glue", config_name="mrpc")  # metric for the mrpc subset of glue
    print(metric)

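3. Get a metric's usage instructions

A metric's inputs_description attribute describes how to use it. The following example prints that description and then computes the metric: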

def load_metric_description_test():
    # Chapter 4 / Load a metric and read its usage instructions
    from datasets import load_metric

    glue_metric = load_metric("glue", "mrpc")  # metric for the mrpc subset of glue
    print(glue_metric.inputs_description)

    references = [0, 1]
    predictions = [0, 1]
    results = glue_metric.compute(predictions=predictions, references=references)
    print(results)  # {"accuracy": 1.0, "f1": 1.0}

The output is as follows:

Compute GLUE evaluation metric associated to each GLUE dataset.
Args:
    predictions: list of predictions to score.
        Each translation should be tokenized into a list of tokens.
    references: list of lists of references for each translation.
        Each reference should be tokenized into a list of tokens.
Returns: depending on the GLUE subset, one or several of:
    "accuracy": Accuracy
    "f1": F1 score
    "pearson": Pearson Correlation
    "spearmanr": Spearman Correlation
    "matthews_correlation": Matthew Correlation
Examples:

    >>> glue_metric = datasets.load_metric("glue", "sst2")  # "sst2" or any of ["mnli", "mnli_mismatched", "mnli_matched", "qnli", "rte", "wnli", "hans"]
    >>> references = [0, 1]
    >>> predictions = [0, 1]
    >>> results = glue_metric.compute(predictions=predictions, references=references)
    >>> print(results)
    {"accuracy": 1.0}

    >>> glue_metric = datasets.load_metric("glue", "mrpc")  # "mrpc" or "qqp"
    >>> references = [0, 1]
    >>> predictions = [0, 1]
    >>> results = glue_metric.compute(predictions=predictions, references=references)
    >>> print(results)
    {"accuracy": 1.0, "f1": 1.0}

    >>> glue_metric = datasets.load_metric("glue", "stsb")
    >>> references = [0., 1., 2., 3., 4., 5.]
    >>> predictions = [0., 1., 2., 3., 4., 5.]
    >>> results = glue_metric.compute(predictions=predictions, references=references)
    >>> print({"pearson": round(results["pearson"], 2), "spearmanr": round(results["spearmanr"], 2)})
    {"pearson": 1.0, "spearmanr": 1.0}

    >>> glue_metric = datasets.load_metric("glue", "cola")
    >>> references = [0, 1]
    >>> predictions = [0, 1]
    >>> results = glue_metric.compute(predictions=predictions, references=references)
    >>> print(results)
    {"matthews_correlation": 1.0}

{"accuracy": 1.0, "f1": 1.0}

The output first shows the metric's usage description, followed by the computed accuracy and f1 values.
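The usage description also mentions Pearson correlation (for the stsb subset) and Matthews correlation (for the cola subset). As a rough illustration of what those two numbers measure, the sketch below computes them in pure Python on the same toy inputs as the docstring examples. The function names `pearson` and `matthews_correlation` are hypothetical helpers written for this article, not the library's own implementation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    # Return 0.0 when either input is constant (correlation undefined).
    return cov / denom if denom else 0.0


def matthews_correlation(predictions, references, positive=1):
    """Matthews correlation coefficient for binary labels."""
    tp = fp = fn = tn = 0
    for pred, ref in zip(predictions, references):
        if pred == positive and ref == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif ref == positive:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when any row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    scores = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    print({"pearson": round(pearson(scores, scores), 2)})
    print({"matthews_correlation": round(matthews_correlation([0, 1], [0, 1]), 2)})
```

Both functions return 0.0 when the denominator is zero; that is a simplifying choice made for this sketch, and other implementations may handle that case differently.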
