|
| 1 | +--- |
| 2 | +layout: model |
| 3 | +title: DistilBERT Zero-Shot Classification Base - distilbert_base_zero_shot_classifier_turkish_cased_multinli |
| 4 | +author: John Snow Labs |
| 5 | +name: distilbert_base_zero_shot_classifier_turkish_cased_multinli |
| 6 | +date: 2023-04-20 |
| 7 | +tags: [zero_shot, tr, turkish, distilbert, base, cased, open_source, tensorflow] |
| 8 | +task: Zero-Shot Classification |
| 9 | +language: tr |
| 10 | +edition: Spark NLP 4.4.1 |
| 11 | +spark_version: [3.2, 3.0] |
| 12 | +supported: true |
| 13 | +engine: tensorflow |
| 14 | +annotator: DistilBertForZeroShotClassification |
| 15 | +article_header: |
| 16 | + type: cover |
| 17 | +use_language_switcher: "Python-Scala-Java" |
| 18 | +--- |
| 19 | + |
| 20 | +## Description |
| 21 | + |
| 22 | +This model is intended for zero-shot text classification, especially in Turkish. It was fine-tuned on MNLI, starting from the DistilBERT Base Uncased model. |
| 23 | + |
| 24 | +DistilBertForZeroShotClassification uses a ModelForSequenceClassification trained on NLI (natural language inference) tasks. It is the equivalent of the DistilBertForSequenceClassification models, except that it does not require a hardcoded number of potential classes; the classes can be chosen at runtime. This usually makes it slower, but much more flexible. |
| 25 | + |
| 26 | +We used TFDistilBertForSequenceClassification to train this model and the DistilBertForZeroShotClassification annotator in Spark NLP 🚀 for prediction at scale! |
| 27 | + |
| 28 | +## Predicted Entities |
| 29 | + |
| 30 | + |
| 31 | + |
| 32 | +{:.btn-box} |
| 33 | +<button class="button button-orange" disabled>Live Demo</button> |
| 34 | +<button class="button button-orange" disabled>Open in Colab</button> |
| 35 | +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_zero_shot_classifier_turkish_cased_multinli_tr_4.4.1_3.2_1681952299918.zip){:.button.button-orange} |
| 36 | +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_zero_shot_classifier_turkish_cased_multinli_tr_4.4.1_3.2_1681952299918.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} |
| 37 | + |
| 38 | +## How to use |
| 39 | + |
| 40 | + |
| 41 | + |
| 42 | +<div class="tabs-box" markdown="1"> |
| 43 | +{% include programmingLanguageSelectScalaPythonNLU.html %} |
| 44 | +```python |
| 45 | +document_assembler = DocumentAssembler() \ |
| 46 | +.setInputCol('text') \ |
| 47 | +.setOutputCol('document') |
| 48 | + |
| 49 | +tokenizer = Tokenizer() \ |
| 50 | +.setInputCols(['document']) \ |
| 51 | +.setOutputCol('token') |
| 52 | + |
| 53 | +zeroShotClassifier = DistilBertForZeroShotClassification \ |
| 54 | +.pretrained('distilbert_base_zero_shot_classifier_turkish_cased_multinli', 'tr') \ |
| 55 | +.setInputCols(['token', 'document']) \ |
| 56 | +.setOutputCol('class') \ |
| 57 | +.setCaseSensitive(True) \ |
| 58 | +.setMaxSentenceLength(512) \ |
| 59 | +.setCandidateLabels(["ekonomi", "siyaset","spor"]) |
| 60 | + |
| 61 | +pipeline = Pipeline(stages=[ |
| 62 | +document_assembler, |
| 63 | +tokenizer, |
| 64 | +zeroShotClassifier |
| 65 | +]) |
| 66 | + |
| 67 | +example = spark.createDataFrame([['Dolar yükselmeye devam ediyor.']]).toDF("text") |
| 68 | +result = pipeline.fit(example).transform(example) |
| 69 | + |
| 70 | +``` |
| 71 | +```scala |
| 72 | +val document_assembler = new DocumentAssembler() |
| 73 | +.setInputCol("text") |
| 74 | +.setOutputCol("document") |
| 75 | + |
| 76 | +val tokenizer = new Tokenizer() |
| 77 | +.setInputCols("document") |
| 78 | +.setOutputCol("token") |
| 79 | + |
| 80 | +val zeroShotClassifier = DistilBertForZeroShotClassification.pretrained("distilbert_base_zero_shot_classifier_turkish_cased_multinli", "tr") |
| 81 | +.setInputCols("document", "token") |
| 82 | +.setOutputCol("class") |
| 83 | +.setCaseSensitive(true) |
| 84 | +.setMaxSentenceLength(512) |
| 85 | +.setCandidateLabels(Array("ekonomi", "siyaset","spor")) |
| 86 | + |
| 87 | +val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, zeroShotClassifier)) |
| 88 | + |
| 89 | +val example = Seq("Dolar yükselmeye devam ediyor.").toDS.toDF("text") |
| 90 | + |
| 91 | +val result = pipeline.fit(example).transform(example) |
| 92 | +``` |
| 93 | +</div> |
| 94 | + |
| 95 | +{:.model-param} |
| 96 | +## Model Information |
| 97 | + |
| 98 | +{:.table-model} |
| 99 | +|---|---| |
| 100 | +|Model Name:|distilbert_base_zero_shot_classifier_turkish_cased_multinli| |
| 101 | +|Compatibility:|Spark NLP 4.4.1+| |
| 102 | +|License:|Open Source| |
| 103 | +|Edition:|Official| |
| 104 | +|Input Labels:|[token, document]| |
| 105 | +|Output Labels:|[multi_class]| |
| 106 | +|Language:|tr| |
| 107 | +|Size:|254.3 MB| |
| 108 | +|Case sensitive:|true| |
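
The Description notes that an NLI-trained classifier lets the candidate classes be chosen at runtime. A minimal sketch of that idea in plain Python follows: each candidate label is turned into a hypothesis, the text is scored against it for entailment, and the scores are normalized across labels. Everything here is an illustrative assumption rather than Spark NLP's internals: `toy_entailment_logit` is a hypothetical keyword-based stand-in for the real NLI model, the hypothesis template and hint words are made up, and normalizing with a softmax across labels is one common convention, not a confirmed detail of this annotator.

```python
import math
from typing import Callable, Dict, List


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_scores(
    text: str,
    candidate_labels: List[str],
    entailment_logit: Callable[[str, str], float],
    template: str = "This text is about {}.",  # hypothetical template
) -> Dict[str, float]:
    """Score each label by how strongly the text entails its hypothesis."""
    hypotheses = [template.format(label) for label in candidate_labels]
    logits = [entailment_logit(text, h) for h in hypotheses]
    # Normalize across labels so the scores sum to 1 (assumed convention).
    return dict(zip(candidate_labels, softmax(logits)))


def toy_entailment_logit(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model: keyword overlap, not real inference."""
    hints = {
        "ekonomi": ["dolar", "faiz"],
        "spor": ["maç", "gol"],
        "siyaset": ["seçim", "meclis"],
    }
    premise_lower = premise.lower()
    for label, words in hints.items():
        if label in hypothesis and any(w in premise_lower for w in words):
            return 3.0
    return 0.0


# The label list is supplied at call time, not baked into the model.
scores = zero_shot_scores(
    "Dolar yükselmeye devam ediyor.",
    ["ekonomi", "siyaset", "spor"],
    toy_entailment_logit,
)
best_label = max(scores, key=scores.get)
print(best_label, scores)
```

Because one entailment pass is needed per candidate label, the cost grows with the number of labels, which is consistent with the Description's remark that this approach is usually slower than a fixed-class classifier.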