Getting Started with BERT (bert-as-service)
Official repository
https://github.com/hanxiao/bert-as-service
Activate your conda environment first, for example: source activate py36_test
1. Server installation
pip install tensorflow==1.15
pip install bert-serving-server # server
pip install bert-serving-client # client, independent of `bert-serving-server`
# Start the service
bert-serving-start -model_dir /data/chengzhong/chinese_L-12_H-768_A-12 -num_worker=4 -http_port 8125
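If you would rather launch the server from a Python script than from the shell, the same command can be wrapped with subprocess. This is only a minimal sketch: the helper names build_start_command and start_server are made up for illustration, and it assumes bert-serving-start is on the PATH of the active environment.

```python
import subprocess


def build_start_command(model_dir, num_worker=4, http_port=8125):
    """Return the argument list for the bert-serving-start command shown above.

    Only the flags already used in this post are included.
    """
    return [
        "bert-serving-start",
        "-model_dir", model_dir,
        "-num_worker", str(num_worker),
        "-http_port", str(http_port),
    ]


def start_server(model_dir, **kwargs):
    """Launch the server as a child process and return the Popen handle.

    Assumes bert-serving-start is installed and on PATH (hypothetical helper).
    """
    return subprocess.Popen(build_start_command(model_dir, **kwargs))


# Example use (not executed here):
#   proc = start_server("/data/chengzhong/chinese_L-12_H-768_A-12")
#   ...
#   proc.terminate()
```

Keeping the argument list in its own function makes it easy to print or log the exact command before starting the process.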
2. Client
pip install bert-serving-client
from bert_serving.client import BertClient

bc = BertClient('172.xx.xx.xx')
# encode() blocks until the server replies and returns one vector per sentence
vecs = bc.encode(['First do it', 'then do it right', 'then do it better'])
print(vecs)
bc.close()
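Because the server was started with HTTP port 8125 open, you can also send requests over HTTP (the server side needs the optional HTTP dependencies, installed with pip install -U bert-serving-server[http]):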
curl -X POST http://172.19.80.61:8125/encode \
  -H 'content-type: application/json' \
  -d '{"id": 123,"texts": ["hello world"], "is_tokenized": false}'
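The same HTTP call can be made from Python with only the standard library. The sketch below mirrors the curl request above; the function names build_encode_request and encode_over_http are hypothetical helpers, and the shape of the JSON reply is not assumed here, so inspect the returned dictionary to find where the vectors sit.

```python
import json
import urllib.request


def build_encode_request(base_url, texts, request_id=123, is_tokenized=False):
    """Build a POST request for the /encode endpoint with the curl example's payload."""
    payload = {"id": request_id, "texts": list(texts), "is_tokenized": is_tokenized}
    return urllib.request.Request(
        base_url.rstrip("/") + "/encode",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def encode_over_http(base_url, texts, timeout=10.0):
    """Send the request and return the decoded JSON reply as a dict."""
    req = build_encode_request(base_url, texts)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example use (requires a running server, not executed here):
#   reply = encode_over_http("http://172.19.80.61:8125", ["hello world"])
#   print(reply.keys())
```

Splitting request construction from sending means the payload can be checked without a live server.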
3. Persisting vectors and the query interface
The following script encodes the FAQ questions, writes each question and its vector to vec.txt, then answers similarity queries interactively.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Han Xiao <artex.xh@gmail.com> <https://hanxiao.github.io>

# NOTE: First install bert-as-service via
# $
# $ pip install bert-serving-server
# $ pip install bert-serving-client
# $

# simple similarity search on FAQ

import numpy as np
from bert_serving.client import BertClient
from termcolor import colored

prefix_q = '##### **Q:** '
topk = 5

# Load the FAQ questions (lines starting with prefix_q) from README.md.
with open('README.md') as fp:
    questions = [v.replace(prefix_q, '').strip() for v in fp if v.strip() and v.startswith(prefix_q)]
print('%d questions loaded, avg. len of %d' % (len(questions), np.mean([len(d.split()) for d in questions])))

with BertClient('172.19.80.61') as bc:
    doc_vecs = bc.encode(questions)
    print(doc_vecs[0])

    # Persist one "question<TAB>v1 v2 ..." record per line, written once each.
    with open('vec.txt', 'w', encoding='utf-8') as f:
        for question, vec in zip(questions, doc_vecs):
            f.write(question + '\t' + ' '.join(map(str, vec)) + '\n')

    # The query loop stays inside the with block so bc is still connected.
    while True:
        query = input(colored('your question: ', 'green'))
        query_vec = bc.encode([query])[0]
        # compute normalized dot product as score
        score = np.sum(query_vec * doc_vecs, axis=1) / np.linalg.norm(doc_vecs, axis=1)
        topk_idx = np.argsort(score)[::-1][:topk]
        print('top %d questions similar to "%s"' % (topk, colored(query, 'green')))
        for idx in topk_idx:
            print('> %s\t%s' % (colored('%.1f' % score[idx], 'cyan'), colored(questions[idx], 'yellow')))
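Once the vectors are on disk, they can be reloaded and searched without re-encoding the questions. The sketch below is illustrative only: it assumes each line of vec.txt holds a question, a tab, and the vector components separated by spaces, and the helper names parse_vec_line, load_vectors and rank are hypothetical. It uses the same score as the script above, the dot product divided by the norm of the document vector, written in plain Python so it runs without NumPy.

```python
import math


def parse_vec_line(line):
    """Split one 'question<TAB>v1 v2 ...' record into (question, [floats])."""
    question, _, raw = line.rstrip("\n").partition("\t")
    return question, [float(x) for x in raw.split()]


def load_vectors(path):
    """Read every non-empty record from the persisted file (assumed format above)."""
    with open(path, encoding="utf-8") as fp:
        return [parse_vec_line(line) for line in fp if line.strip()]


def rank(query_vec, records, topk=5):
    """Score each record by dot(query, doc) / ||doc|| and return the topk, best first."""
    scored = []
    for question, vec in records:
        norm = math.sqrt(sum(v * v for v in vec))
        dot = sum(q * v for q, v in zip(query_vec, vec))
        scored.append((dot / norm if norm else 0.0, question))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:topk]


# Example use (query_vec would come from bc.encode([query])[0]):
#   records = load_vectors("vec.txt")
#   for score, question in rank(query_vec, records):
#       print(score, question)
```

For a large FAQ it would be more practical to stack the loaded vectors into a NumPy array and reuse the vectorized scoring from the script.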