billpku


Focused on deep learning and NLP; develops with Java, Python, LangChain, TensorFlow, and PyTorch

billpku / NER with BERT in Action- train model

Created July 30, 2019 15:05

NER with BERT in Action

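	# Assumption: the `transformers` package; older setups imported this class from `pytorch_transformers`
	from transformers import BertForTokenClassification

	# Download the pretrained BERT model first and save it to a local directory
	# Use the cased version for better performance
	model_file_address = 'data/bert-base-cased'

	# from_pretrained() loads both the config and the weights;
	# num_labels sets the size of the token classification head
	model = BertForTokenClassification.from_pretrained(model_file_address, num_labels=len(tag2idx))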
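
The `num_labels=len(tag2idx)` argument assumes `tag2idx` maps every NER tag to an integer index. A minimal sketch of such a mapping follows; the tag names are hypothetical, and the real set would come from the training data.

```python
# Hypothetical tag set, for illustration only; the real tags come from the dataset.
tags = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]

# Map each tag to an index, and build the reverse map for decoding predictions.
tag2idx = {tag: idx for idx, tag in enumerate(tags)}
idx2tag = {idx: tag for tag, idx in tag2idx.items()}

# This is the value that would be passed as num_labels.
num_labels = len(tag2idx)
print(num_labels)
```

The reverse map `idx2tag` is not needed to build the model, but it is handy later for turning predicted indices back into tag names.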

billpku / NER with BERT in Action- set embedding3

Created July 30, 2019 15:02


	# Each input is a single sentence, so every segment id is 0
	segment_ids = [[0] * len(input_id) for input_id in input_ids]
	print(segment_ids[0])
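
To make the shape of the result concrete, here is a small sketch on made-up data. The ids in `example_input_ids` are hypothetical and only stand in for the padded `input_ids`.

```python
# Hypothetical padded token ids for two sentences (illustration only).
example_input_ids = [
    [101, 7592, 102, 0, 0],
    [101, 2023, 2003, 1037, 102],
]

# One segment id per token position; all zeros because each input is one sentence.
example_segment_ids = [[0] * len(row) for row in example_input_ids]

print(example_segment_ids[0])
```

The segment ids have the same shape as the token ids, so the two lists line up position by position.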

billpku / NER with BERT in Action- set embedding2

Created July 30, 2019 15:00


	# Real tokens get mask 1; padding tokens (placeholders that fill the empty positions) get 0
	attention_masks = [[int(i > 0) for i in ii] for ii in input_ids]
	print(attention_masks[0])
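
The `i > 0` test relies on padding positions holding id 0. A minimal sketch of the same rule on made-up ids, with the padding id named explicitly (`PAD_ID` and the example ids are assumptions for illustration):

```python
# Assumption: padding positions are filled with id 0.
PAD_ID = 0

# Hypothetical padded token ids (illustration only).
example_input_ids = [
    [101, 7592, 102, 0, 0],
    [101, 2023, 2003, 1037, 102],
]

# 1 for real tokens, 0 for padding.
example_masks = [[int(token_id != PAD_ID) for token_id in row] for row in example_input_ids]

print(example_masks)
```

Comparing against a named padding id rather than `> 0` gives the same result here and keeps working if the padding id is ever changed.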

billpku / NER with BERT in Action- set embedding

Created July 30, 2019 14:59


	from keras.preprocessing.sequence import pad_sequences

	# Convert each token list to ids, then pad or truncate every sequence at the end to max_len
	input_ids = pad_sequences([tokenizer.convert_tokens_to_ids(txt) for txt in tokenized_texts],
	                          maxlen=max_len, dtype="long", truncating="post", padding="post")
	print(input_ids[0])
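
As a rough illustration of what `truncating="post"` and `padding="post"` do, here is a minimal pure-Python sketch. The helper `pad_post` and the example ids are hypothetical stand-ins, not part of Keras, and the sketch assumes a padding value of 0.

```python
def pad_post(sequences, max_len, pad_value=0):
    """Cut each sequence to max_len from the end, then right-pad with pad_value."""
    padded = []
    for seq in sequences:
        clipped = list(seq[:max_len])  # "post" truncation drops trailing items
        padded.append(clipped + [pad_value] * (max_len - len(clipped)))
    return padded


# Hypothetical token ids: one short and one long sequence.
example_ids = [[101, 7592, 102], [101, 2023, 2003, 1037, 2146, 6251, 102]]
padded_ids = pad_post(example_ids, max_len=5)
print(padded_ids)  # [[101, 7592, 102, 0, 0], [101, 2023, 2003, 1037, 2146]]
```

Note that post-truncation drops whatever sits at the end of a long sequence, including any final special token, so `max_len` should be chosen with the longest expected sentence in mind.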