@billpku
billpku / NER with BERT in Action- train model
Created July 30, 2019 15:05
NER with BERT in Action
# It's highly recommended to download the BERT pretrained model first, then save it to a local file
# Use the cased version for better NER performance
model_file_address = 'data/bert-base-cased'
# from_pretrained() loads both the config and the weights
model = BertForTokenClassification.from_pretrained(model_file_address, num_labels=len(tag2idx))
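The `num_labels` argument comes from `tag2idx`, the label-to-index mapping built earlier in the pipeline. The gist does not show how it is constructed; a minimal sketch under the assumption of a BIO tag scheme (the tag names here are illustrative, not from the gist):

```python
# Hypothetical BIO-style NER tag set; the gist's actual tags are not shown
tags = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG", "X", "[CLS]", "[SEP]"]

# Map each tag to a unique integer index, plus the reverse lookup for decoding predictions
tag2idx = {tag: idx for idx, tag in enumerate(tags)}
idx2tag = {idx: tag for tag, idx in tag2idx.items()}

print(len(tag2idx))  # this count is what gets passed as num_labels
```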
# Since each example is a single sentence, all segment ids are set to 0
segment_ids = [[0] * len(input_id) for input_id in input_ids]
segment_ids[0];
# Mask is 1 for real tokens and 0 for pad tokens (placeholders for empty positions)
attention_masks = [[int(i>0) for i in ii] for ii in input_ids]
attention_masks[0];
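The mask comprehension relies on token id 0 being the pad id, which holds for BERT vocabularies where `[PAD]` sits at index 0. A self-contained sketch of the same logic with made-up ids:

```python
# Example padded id sequences (ids are illustrative, not from a real BERT vocab)
input_ids = [
    [101, 2054, 2003, 102, 0, 0],  # 4 real tokens followed by 2 pads
    [101, 7592, 102, 0, 0, 0],     # 3 real tokens followed by 3 pads
]

# 1 marks a real token, 0 marks padding, mirroring the comprehension above
attention_masks = [[int(i > 0) for i in ii] for ii in input_ids]
print(attention_masks)
```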
from keras.preprocessing.sequence import pad_sequences
# Convert text tokens into ids, then pad/truncate each sequence to max_len
input_ids = pad_sequences([tokenizer.convert_tokens_to_ids(txt) for txt in tokenized_texts],
                          maxlen=max_len, dtype="long", truncating="post", padding="post")
print(input_ids[0])
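With `truncating="post"` and `padding="post"`, `pad_sequences` cuts overlong sequences at the end and appends pad values at the end. A pure-Python sketch of that behavior (the actual Keras call returns a NumPy array rather than nested lists):

```python
def pad_post(seqs, maxlen, value=0):
    """Truncate at the end and pad at the end, like Keras pad_sequences
    with truncating='post', padding='post'."""
    out = []
    for seq in seqs:
        trimmed = seq[:maxlen]                                    # post-truncation
        out.append(trimmed + [value] * (maxlen - len(trimmed)))   # post-padding
    return out

print(pad_post([[5, 6, 7], [1, 2, 3, 4, 5, 6]], maxlen=4))
# → [[5, 6, 7, 0], [1, 2, 3, 4]]
```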