Commit bb9a155

2023-04-20-distilbert_base_uncased_mnli_en (#13761)
* Add model 2023-04-20-distilbert_base_uncased_mnli_en
* Add model 2023-04-20-distilbert_base_turkish_cased_allnli_tr
* Add model 2023-04-20-distilbert_base_turkish_cased_snli_tr
* Add model 2023-04-20-distilbert_base_turkish_cased_multinli_tr
* Update and rename 2023-04-20-distilbert_base_turkish_cased_allnli_tr.md to 2023-04-20-distilbert_base_zero_shot_classifier_turkish_cased_allnli_tr.md
* Update and rename 2023-04-20-distilbert_base_turkish_cased_multinli_tr.md to 2023-04-20-distilbert_base_zero_shot_classifier_turkish_cased_multinli_tr.md
* Update and rename 2023-04-20-distilbert_base_turkish_cased_snli_tr.md to 2023-04-20-distilbert_base_zero_shot_classifier_turkish_cased_snli_tr.md
* Update and rename 2023-04-20-distilbert_base_uncased_mnli_en.md to distilbert_base_zero_shot_classifier_turkish_cased_snli
* Rename distilbert_base_zero_shot_classifier_turkish_cased_snli to distilbert_base_zero_shot_classifier_turkish_cased_snli_en.md
* Update 2023-04-20-distilbert_base_zero_shot_classifier_turkish_cased_snli_tr.md
* Update 2023-04-20-distilbert_base_zero_shot_classifier_turkish_cased_multinli_tr.md
* Update 2023-04-20-distilbert_base_zero_shot_classifier_turkish_cased_allnli_tr.md

---------

Co-authored-by: ahmedlone127 <[email protected]>
1 parent afb700e commit bb9a155

4 files changed: +429 -0 lines changed
@@ -0,0 +1,107 @@
---
layout: model
title: DistilBERT Zero-Shot Classification Base - distilbert_base_zero_shot_classifier_turkish_cased_allnli
author: John Snow Labs
name: distilbert_base_zero_shot_classifier_turkish_cased_allnli
date: 2023-04-20
tags: [zero_shot, distilbert, base, tr, turkish, cased, open_source, tensorflow]
task: Zero-Shot Classification
language: tr
edition: Spark NLP 4.4.1
spark_version: [3.2, 3.0]
supported: true
engine: tensorflow
annotator: DistilBertForZeroShotClassification
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This model is intended for zero-shot text classification, especially in Turkish. It is fine-tuned on MNLI using the DistilBERT Base Uncased model.

DistilBertForZeroShotClassification uses a ModelForSequenceClassification trained on NLI (natural language inference) tasks. It is the equivalent of the DistilBertForSequenceClassification models, except that it does not require a hardcoded number of potential classes; the classes can be chosen at runtime. This usually makes it slower, but much more flexible.

We used TFDistilBertForSequenceClassification to train this model and the DistilBertForZeroShotClassification annotator in Spark NLP 🚀 for prediction at scale!

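The runtime label selection described above can be illustrated with a minimal, library-free sketch of the NLI-based zero-shot idea: each candidate label is turned into a hypothesis, an NLI model scores how strongly the text entails it, and the scores are normalized with a softmax. This is only an illustration under stated assumptions, not the Spark NLP internals: `toy_entailment_score` is a hypothetical stand-in for a real NLI model, and the hypothesis template `"This example is {}."` is an assumed default.

```python
import math
from typing import Callable, Dict, List


def zero_shot_classify(
    text: str,
    candidate_labels: List[str],
    entailment_score: Callable[[str, str], float],
    template: str = "This example is {}.",  # assumed template, not from the model card
) -> Dict[str, float]:
    """Score each label by how strongly `text` entails its hypothesis."""
    # One hypothesis per label; labels are plain strings supplied at call time.
    logits = [
        entailment_score(text, template.format(label))
        for label in candidate_labels
    ]
    # Softmax over the labels (subtract the max for numerical stability).
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(candidate_labels, exps)}


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model: counts shared lowercase words."""
    premise_words = set(premise.lower().replace(".", "").split())
    hypothesis_words = set(hypothesis.lower().replace(".", "").split())
    return float(len(premise_words & hypothesis_words))


scores = zero_shot_classify(
    "The dollar keeps rising against the lira.",
    ["dollar", "football"],
    toy_entailment_score,
)
best_label = max(scores, key=scores.get)
print(best_label, scores)
```

In this scheme the scoring function is called once per candidate label, so the cost grows with the number of labels. That is one way to see why a zero-shot classifier is usually slower than a classifier with a fixed set of classes.
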
## Predicted Entities

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_zero_shot_classifier_turkish_cased_allnli_4.4.1_3.2_1681950583033.zip){:.button.button-orange}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_zero_shot_classifier_turkish_cased_allnli_tr_4.4.1_3.2_1681950583033.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForZeroShotClassification
from pyspark.ml import Pipeline

# Start (or reuse) a Spark session with Spark NLP loaded
spark = sparknlp.start()

# Wrap the raw text column as a document annotation
document_assembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Candidate labels are supplied at runtime; the model language is Turkish ('tr')
zeroShotClassifier = DistilBertForZeroShotClassification \
    .pretrained('distilbert_base_zero_shot_classifier_turkish_cased_allnli', 'tr') \
    .setInputCols(['token', 'document']) \
    .setOutputCol('class') \
    .setCaseSensitive(True) \
    .setMaxSentenceLength(512) \
    .setCandidateLabels(["olumsuz", "olumlu"])

pipeline = Pipeline(stages=[
    document_assembler,
    tokenizer,
    zeroShotClassifier
])

example = spark.createDataFrame([['Senaryo çok saçmaydı, beğendim diyemem.']]).toDF("text")
result = pipeline.fit(example).transform(example)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
// Needed for .toDS; assumes an active SparkSession named `spark`
import spark.implicits._

// Wrap the raw text column as a document annotation
val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// Candidate labels are supplied at runtime; the model language is Turkish ("tr")
val zeroShotClassifier = DistilBertForZeroShotClassification.pretrained("distilbert_base_zero_shot_classifier_turkish_cased_allnli", "tr")
  .setInputCols("document", "token")
  .setOutputCol("class")
  .setCaseSensitive(true)
  .setMaxSentenceLength(512)
  .setCandidateLabels(Array("olumsuz", "olumlu"))

val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, zeroShotClassifier))

val example = Seq("Senaryo çok saçmaydı, beğendim diyemem.").toDS.toDF("text")

val result = pipeline.fit(example).transform(example)
```
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|distilbert_base_zero_shot_classifier_turkish_cased_allnli|
|Compatibility:|Spark NLP 4.4.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[token, document]|
|Output Labels:|[multi_class]|
|Language:|tr|
|Size:|254.3 MB|
|Case sensitive:|true|
@@ -0,0 +1,108 @@
---
layout: model
title: DistilBERT Zero-Shot Classification Base - distilbert_base_zero_shot_classifier_turkish_cased_multinli
author: John Snow Labs
name: distilbert_base_zero_shot_classifier_turkish_cased_multinli
date: 2023-04-20
tags: [zero_shot, tr, turkish, distilbert, base, cased, open_source, tensorflow]
task: Zero-Shot Classification
language: tr
edition: Spark NLP 4.4.1
spark_version: [3.2, 3.0]
supported: true
engine: tensorflow
annotator: DistilBertForZeroShotClassification
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This model is intended for zero-shot text classification, especially in Turkish. It is fine-tuned on MNLI using the DistilBERT Base Uncased model.

DistilBertForZeroShotClassification uses a ModelForSequenceClassification trained on NLI (natural language inference) tasks. It is the equivalent of the DistilBertForSequenceClassification models, except that it does not require a hardcoded number of potential classes; the classes can be chosen at runtime. This usually makes it slower, but much more flexible.

We used TFDistilBertForSequenceClassification to train this model and the DistilBertForZeroShotClassification annotator in Spark NLP 🚀 for prediction at scale!

## Predicted Entities

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_zero_shot_classifier_turkish_cased_multinli_tr_4.4.1_3.2_1681952299918.zip){:.button.button-orange}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_zero_shot_classifier_turkish_cased_multinli_tr_4.4.1_3.2_1681952299918.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForZeroShotClassification
from pyspark.ml import Pipeline

# Start (or reuse) a Spark session with Spark NLP loaded
spark = sparknlp.start()

# Wrap the raw text column as a document annotation
document_assembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Candidate labels are supplied at runtime; the model language is Turkish ('tr')
zeroShotClassifier = DistilBertForZeroShotClassification \
    .pretrained('distilbert_base_zero_shot_classifier_turkish_cased_multinli', 'tr') \
    .setInputCols(['token', 'document']) \
    .setOutputCol('class') \
    .setCaseSensitive(True) \
    .setMaxSentenceLength(512) \
    .setCandidateLabels(["ekonomi", "siyaset", "spor"])

pipeline = Pipeline(stages=[
    document_assembler,
    tokenizer,
    zeroShotClassifier
])

example = spark.createDataFrame([['Dolar yükselmeye devam ediyor.']]).toDF("text")
result = pipeline.fit(example).transform(example)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
// Needed for .toDS; assumes an active SparkSession named `spark`
import spark.implicits._

// Wrap the raw text column as a document annotation
val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// Candidate labels are supplied at runtime; the model language is Turkish ("tr")
val zeroShotClassifier = DistilBertForZeroShotClassification.pretrained("distilbert_base_zero_shot_classifier_turkish_cased_multinli", "tr")
  .setInputCols("document", "token")
  .setOutputCol("class")
  .setCaseSensitive(true)
  .setMaxSentenceLength(512)
  .setCandidateLabels(Array("ekonomi", "siyaset", "spor"))

val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, zeroShotClassifier))

val example = Seq("Dolar yükselmeye devam ediyor.").toDS.toDF("text")

val result = pipeline.fit(example).transform(example)
```
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|distilbert_base_zero_shot_classifier_turkish_cased_multinli|
|Compatibility:|Spark NLP 4.4.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[token, document]|
|Output Labels:|[multi_class]|
|Language:|tr|
|Size:|254.3 MB|
|Case sensitive:|true|
@@ -0,0 +1,107 @@
---
layout: model
title: DistilBERT Zero-Shot Classification Base - distilbert_base_zero_shot_classifier_turkish_cased_snli
author: John Snow Labs
name: distilbert_base_zero_shot_classifier_turkish_cased_snli
date: 2023-04-20
tags: [zero_shot, tr, turkish, distilbert, base, cased, open_source, tensorflow]
task: Zero-Shot Classification
language: tr
edition: Spark NLP 4.4.1
spark_version: [3.2, 3.0]
supported: true
engine: tensorflow
annotator: DistilBertForZeroShotClassification
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This model is intended for zero-shot text classification, especially in Turkish. It is fine-tuned on MNLI using the DistilBERT Base Uncased model.

DistilBertForZeroShotClassification uses a ModelForSequenceClassification trained on NLI (natural language inference) tasks. It is the equivalent of the DistilBertForSequenceClassification models, except that it does not require a hardcoded number of potential classes; the classes can be chosen at runtime. This usually makes it slower, but much more flexible.

We used TFDistilBertForSequenceClassification to train this model and the DistilBertForZeroShotClassification annotator in Spark NLP 🚀 for prediction at scale!

## Predicted Entities

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_zero_shot_classifier_turkish_cased_snli_tr_4.4.1_3.2_1681951486863.zip){:.button.button-orange}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_zero_shot_classifier_turkish_cased_snli_tr_4.4.1_3.2_1681951486863.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForZeroShotClassification
from pyspark.ml import Pipeline

# Start (or reuse) a Spark session with Spark NLP loaded
spark = sparknlp.start()

# Wrap the raw text column as a document annotation
document_assembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Candidate labels are supplied at runtime; the model language is Turkish ('tr')
zeroShotClassifier = DistilBertForZeroShotClassification \
    .pretrained('distilbert_base_zero_shot_classifier_turkish_cased_snli', 'tr') \
    .setInputCols(['token', 'document']) \
    .setOutputCol('class') \
    .setCaseSensitive(True) \
    .setMaxSentenceLength(512) \
    .setCandidateLabels(["olumsuz", "olumlu"])

pipeline = Pipeline(stages=[
    document_assembler,
    tokenizer,
    zeroShotClassifier
])

example = spark.createDataFrame([['Senaryo çok saçmaydı, beğendim diyemem.']]).toDF("text")
result = pipeline.fit(example).transform(example)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
// Needed for .toDS; assumes an active SparkSession named `spark`
import spark.implicits._

// Wrap the raw text column as a document annotation
val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// Candidate labels are supplied at runtime; the model language is Turkish ("tr")
val zeroShotClassifier = DistilBertForZeroShotClassification.pretrained("distilbert_base_zero_shot_classifier_turkish_cased_snli", "tr")
  .setInputCols("document", "token")
  .setOutputCol("class")
  .setCaseSensitive(true)
  .setMaxSentenceLength(512)
  .setCandidateLabels(Array("olumsuz", "olumlu"))

val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, zeroShotClassifier))

val example = Seq("Senaryo çok saçmaydı, beğendim diyemem.").toDS.toDF("text")

val result = pipeline.fit(example).transform(example)
```
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|distilbert_base_zero_shot_classifier_turkish_cased_snli|
|Compatibility:|Spark NLP 4.4.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[token, document]|
|Output Labels:|[multi_class]|
|Language:|tr|
|Size:|254.3 MB|
|Case sensitive:|true|
