Getting Started with BERT

Official repository

https://github.com/hanxiao/bert-as-service

Activate a conda environment first, for example: source activate py36_test

1. Server installation

pip install tensorflow==1.15
pip install bert-serving-server # server
pip install bert-serving-client # client, independent of `bert-serving-server`

# Start the service

bert-serving-start -model_dir /data/chengzhong/chinese_L-12_H-768_A-12 -num_worker=4 -http_port 8125

2. Client

pip install bert-serving-client

from bert_serving.client import BertClient
bc = BertClient('172.xx.xx.xx')
# encode() blocks until the server replies and returns one vector per sentence
vecs = bc.encode(['First do it', 'then do it right', 'then do it better'])
print(vecs)
bc.close()

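Because the service was started with -http_port 8125, it also accepts HTTP requests (the server side needs the HTTP extras for this: pip install -U bert-serving-server[http]):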

curl -X POST http://172.19.80.61:8125/encode \
  -H 'content-type: application/json' \
  -d '{"id": 123,"texts": ["hello world"], "is_tokenized": false}'
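The same request can be issued from Python without extra dependencies. The following is a minimal sketch using only the standard library; the helper names `build_encode_request` and `encode_over_http` are hypothetical, and the URL simply reuses the host and port from the curl example. The exact layout of the JSON reply is not shown here, so the sketch returns the decoded reply as a plain dict for inspection.

```python
import json
import urllib.request

# Assumption: same host and port as the curl example above.
ENCODE_URL = 'http://172.19.80.61:8125/encode'


def build_encode_request(texts, request_id=123, url=ENCODE_URL):
    """Build the POST request that mirrors the curl call (nothing is sent here)."""
    payload = {'id': request_id, 'texts': list(texts), 'is_tokenized': False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def encode_over_http(texts, timeout=10):
    """Send the request and return the decoded JSON reply as a dict."""
    request = build_encode_request(texts)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode('utf-8'))
```

With the server running, `encode_over_http(['hello world'])` would send the same body as the curl command; printing the keys of the returned dict is an easy way to see how the reply is structured.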

3. Persisting vectors and the query interface

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Han Xiao <artex.xh@gmail.com> <https://hanxiao.github.io>

# NOTE: First install bert-as-service via
# $
# $ pip install bert-serving-server
# $ pip install bert-serving-client
# $

# simple similarity search on FAQ

import numpy as np
from bert_serving.client import BertClient
from termcolor import colored

prefix_q = '##### **Q:** '
topk = 5

with open('README.md') as fp:
    questions = [v.replace(prefix_q, '').strip() for v in fp if v.strip() and v.startswith(prefix_q)]
    print('%d questions loaded, avg. len of %d' % (len(questions), np.mean([len(d.split()) for d in questions])))

with BertClient('172.19.80.61') as bc:
    doc_vecs = bc.encode(questions)
    print(doc_vecs[0])

    # persist one record per line: question, a tab, then space-separated floats
    # (str() on a NumPy array wraps across lines and would break this format)
    with open('vec.txt', 'w', encoding='utf-8') as f:
        for question, vec in zip(questions, doc_vecs):
            f.write(question + '\t' + ' '.join(map(str, vec)) + '\n')

    while True:
        query = input(colored('your question: ', 'green'))
        query_vec = bc.encode([query])[0]
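        # score = dot product divided by each question vector's norm; the query
        # norm is the same for every question, so the ranking matches cosine similarity
        score = np.sum(query_vec * doc_vecs, axis=1) / np.linalg.norm(doc_vecs, axis=1)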
        topk_idx = np.argsort(score)[::-1][:topk]
        print('top %d questions similar to "%s"' % (topk, colored(query, 'green')))
        for idx in topk_idx:
            print('> %s\t%s' % (colored('%.1f' % score[idx], 'cyan'), colored(questions[idx], 'yellow')))
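Once vectors are stored on disk, they can be reloaded and searched without re-encoding every question. Below is a minimal sketch of that round trip in plain Python, so it runs without a BERT server. The helper names `save_vectors`, `load_vectors`, `cosine`, and `top_k` are hypothetical, and the file layout (text, a tab, then space-separated floats, one record per line) is an assumption of this sketch. Texts are assumed to contain no newline characters.

```python
import math


def save_vectors(path, texts, vectors):
    """Write one record per line: text, a tab, then space-separated floats."""
    with open(path, 'w', encoding='utf-8') as f:
        for text, vec in zip(texts, vectors):
            f.write(text + '\t' + ' '.join(repr(float(x)) for x in vec) + '\n')


def load_vectors(path):
    """Read a file written by save_vectors back into two parallel lists."""
    texts, vectors = [], []
    with open(path, encoding='utf-8') as f:
        for line in f:
            # split on the last tab, so a tab inside the text does no harm
            text, _, values = line.rstrip('\n').rpartition('\t')
            texts.append(text)
            vectors.append([float(v) for v in values.split()])
    return texts, vectors


def cosine(a, b):
    """Cosine similarity: dot product divided by both vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, texts, vectors, k=5):
    """Return the k (score, text) pairs most similar to query_vec."""
    scored = [(cosine(query_vec, vec), text) for text, vec in zip(texts, vectors)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

Because the functions only iterate over rows and values, they should accept the NumPy array returned by `bc.encode` as well as plain lists. For a large FAQ, a vectorized NumPy version of `top_k` would be considerably faster.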

Author: 程忠, 2021-12-06 16:01:41
