Masked Language Model (MLM)
https://www.cnblogs.com/anai/p/11645953.html
BERT paper
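The BERT paper's MLM objective corrupts 15% of input tokens before prediction: of the selected positions, 80% are replaced with `[MASK]`, 10% with a random token, and 10% are left unchanged. A minimal sketch of that 80/10/10 scheme, assuming a toy integer vocabulary (the `MASK_ID`, `VOCAB_SIZE`, and `-100` ignore-label values here are illustrative, not tied to any particular library):

```python
import random

MASK_ID = 103        # placeholder [MASK] id (assumption: toy vocab)
VOCAB_SIZE = 30522   # illustrative vocab size

def mlm_mask(tokens, mask_prob=0.15, rng=None):
    """BERT-style masking: select ~15% of positions; of those,
    80% become [MASK], 10% a random token, 10% stay unchanged.
    Returns (corrupted_tokens, labels); labels are -100 at
    positions the model is not asked to predict."""
    rng = rng or random.Random(0)
    corrupted, labels = list(tokens), [-100] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok            # predict the original token here
            r = rng.random()
            if r < 0.8:
                corrupted[i] = MASK_ID
            elif r < 0.9:
                corrupted[i] = rng.randrange(VOCAB_SIZE)
            # else: leave the token unchanged (remaining 10%)
    return corrupted, labels
```

The unchanged 10% matters: without it, the model would only ever see `[MASK]` or noise at predicted positions, and the train/inference mismatch noted in the paper would be worse.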
From Language Model to Seq2Seq: In the Transformer, It All Comes Down to the Mask
https://zhuanlan.zhihu.com/p/69106080
Deep Learning -> NLP -> Improving Language Understanding by Generative Pre-Training
https://zhuanlan.zhihu.com/p/44121378
https://zhuanlan.zhihu.com/p/32544778
https://blog.csdn.net/qq_33876194/article/details/98943383
https://zhuanlan.zhihu.com/p/93061413
Implementation of the Mask mechanism in the Transformer source code
GPT Explained (paper + TensorFlow implementation)
BERT Source Code Analysis (Part III)
BERT Series (3): Source Code Walkthrough of Pre-training
https://www.jianshu.com/p/22e462f01d8c
https://www.jianshu.com/p/ff43575ab2b0
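The mask-mechanism articles above center on two masks: a padding mask (hide pad tokens from attention) and a causal mask (position i may only attend to j <= i, which is what turns a Transformer into a language model or Seq2Seq decoder). A minimal NumPy sketch of both, using the common additive-masking trick of setting disallowed logits to a large negative value before softmax (function names here are my own, not from any of the linked repos):

```python
import numpy as np

def padding_mask(token_ids, pad_id=0):
    """1.0 at real tokens, 0.0 at padding. Shape (batch, 1, 1, seq)
    so it broadcasts over heads and query positions."""
    ids = np.asarray(token_ids)
    return (ids != pad_id).astype(np.float32)[:, None, None, :]

def causal_mask(seq_len):
    """Lower-triangular matrix: position i attends only to j <= i."""
    return np.tril(np.ones((seq_len, seq_len), dtype=np.float32))

def masked_softmax(scores, mask):
    """Additive masking: push disallowed logits to -1e9 so their
    attention weight is ~0 after softmax."""
    scores = np.where(mask > 0, scores, -1e9)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

BERT only needs the padding mask (every token sees every other token), while GPT combines the causal mask with padding; the "Transformer 如戏,全靠 Mask" article's point is that this one matrix is the only difference between the two training regimes.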