[MobileBERT](https://huggingface.co/papers/2004.02984) is a lightweight and efficient variant of BERT, specifically designed for resource-limited devices such as mobile phones. It retains BERT's architecture but significantly reduces model size and inference latency while maintaining strong performance on NLP tasks. MobileBERT achieves this through a bottleneck structure and carefully balanced self-attention and feedforward networks. The model is trained by knowledge transfer from a large BERT model with an inverted bottleneck structure.
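The bottleneck design is visible directly in the model's configuration. As a quick, minimal sketch (assuming only that the `google/mobilebert-uncased` checkpoint and its `MobileBertConfig` are available on the Hub), you can load the config and inspect the bottleneck-related hyperparameters:

```py
from transformers import AutoConfig

# Load the MobileBERT configuration from the Hub.
config = AutoConfig.from_pretrained("google/mobilebert-uncased")

# Bottleneck-related hyperparameters defined by MobileBertConfig.
print(config.hidden_size)               # transformer block width
print(config.use_bottleneck)            # whether bottleneck layers are enabled
print(config.intra_bottleneck_size)     # width inside each bottleneck
print(config.num_feedforward_networks)  # stacked feedforward networks per layer
```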
You can find the original MobileBERT checkpoint under the [Google](https://huggingface.co/google/mobilebert-uncased) organization.

> [!TIP]
> Click on the MobileBERT models in the right sidebar for more examples of how to apply MobileBERT to different language tasks.

The example below demonstrates how to predict the `[MASK]` token with [`Pipeline`], [`AutoModel`], and from the command line.

<hfoptions id="usage">
<hfoption id="Pipeline">
```py
import torch
from transformers import pipeline

pipeline = pipeline(
    task="fill-mask",
    model="google/mobilebert-uncased",
    torch_dtype=torch.float16,
    device=0
)
pipeline("The capital of France is [MASK].")
```

</hfoption>
<hfoption id="AutoModel">
```py
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "google/mobilebert-uncased",
)
model = AutoModelForMaskedLM.from_pretrained(
    "google/mobilebert-uncased",
    torch_dtype=torch.float16,
    device_map="auto",
)
inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt").to("cuda")
# run the model and decode the highest-scoring token at the [MASK] position
with torch.no_grad():
    outputs = model(**inputs)
    predictions = outputs.logits

masked_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)
print(f"The predicted token is: {predicted_token}")
```
</hfoption>
<hfoption id="transformers-cli">
```bash
echo -e "The capital of France is [MASK]." | transformers-cli run --task fill-mask --model google/mobilebert-uncased --device 0
```
</hfoption>
</hfoptions>

This model was contributed by [vshampor](https://huggingface.co/vshampor). The original code can be found [here](https://github.com/google-research/google-research/tree/master/mobilebert).

## Usage tips

- MobileBERT is a model with absolute position embeddings, so it's usually advised to pad the inputs on the right rather than the left (see the padding sketch after this list).
- MobileBERT is similar to BERT and therefore relies on the masked language modeling (MLM) objective. It is efficient at predicting masked tokens and at NLU in general, but is not optimal for text generation. Models trained with a causal language modeling (CLM) objective are better in that regard.
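The right-padding advice above can be applied directly through the tokenizer. This is a minimal sketch, not part of the original card; the example sentences and the explicit `padding_side` assignment are illustrative assumptions:

```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/mobilebert-uncased")
tokenizer.padding_side = "right"  # pad on the right, as recommended above

# Batch two sentences of different lengths; the shorter one is padded on the right.
batch = tokenizer(
    ["The capital of France is [MASK].", "MobileBERT runs well on [MASK] devices."],
    padding=True,
    return_tensors="pt",
)
print(batch["input_ids"].shape)
print(batch["attention_mask"])  # trailing zeros mark the right-side padding
```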