Character, Word, or Both? Revisiting the Segmentation Granularity for Chinese Pre-trained Language Models
Paper: [arXiv:2303.10893](https://arxiv.org/abs/2303.10893)
How to use EricLiang98/MigBERT-base with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="EricLiang98/MigBERT-base")

# Or load the tokenizer and model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("EricLiang98/MigBERT-base")
model = AutoModelForMaskedLM.from_pretrained("EricLiang98/MigBERT-base")
```

Please use XLMRoberta-related classes (e.g. `XLMRobertaTokenizer`, `XLMRobertaForMaskedLM`) to load this model.
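Under the hood, the fill-mask pipeline takes the model's logits at the masked position, applies a softmax, and returns the top-k candidate tokens by probability. A minimal, library-free sketch of that ranking step (the function name `top_k_fill_mask` and the toy logits/vocabulary are illustrative assumptions, not part of this model card):

```python
import math

def top_k_fill_mask(logits, vocab, k=2):
    """Rank candidate tokens for a masked position by softmax probability.

    Mirrors the ranking the fill-mask pipeline performs on the model's
    logits at the mask position; logits and vocab here are toy values.
    """
    # Numerically stable softmax over the mask position's logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pair each token with its probability and keep the k most likely
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Toy vocabulary and logits standing in for a real model output
vocab = ["北京", "上海", "猫", "天气"]
logits = [4.1, 3.7, -1.2, 0.5]
print(top_k_fill_mask(logits, vocab, k=2))
```

With a real checkpoint, `pipe("北京是中国的<mask>。")` returns the same kind of ranked (token, score) list, computed from the actual logits.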
Code: https://github.com/xnliang98/MigBERT
If you find our resource or paper useful, please consider including the following citation in your paper.
```bibtex
@misc{liang2023character,
      title={Character, Word, or Both? Revisiting the Segmentation Granularity for Chinese Pre-trained Language Models},
      author={Xinnian Liang and Zefan Zhou and Hui Huang and Shuangzhi Wu and Tong Xiao and Muyun Yang and Zhoujun Li and Chao Bian},
      year={2023},
      eprint={2303.10893},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```