diff --git a/docs/_posts/ahmedlone127/2024-09-04-burmese_awesome_wnut_model_sgonzalezsilot_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-04-burmese_awesome_wnut_model_sgonzalezsilot_pipeline_en.md new file mode 100644 index 00000000000000..4fac412ca9f8de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-04-burmese_awesome_wnut_model_sgonzalezsilot_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_sgonzalezsilot_pipeline pipeline DistilBertForTokenClassification from sgonzalezsilot +author: John Snow Labs +name: burmese_awesome_wnut_model_sgonzalezsilot_pipeline +date: 2024-09-04 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_sgonzalezsilot_pipeline` is a English model originally trained by sgonzalezsilot. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_sgonzalezsilot_pipeline_en_5.5.0_3.0_1725493021938.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_sgonzalezsilot_pipeline_en_5.5.0_3.0_1725493021938.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_wnut_model_sgonzalezsilot_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_wnut_model_sgonzalezsilot_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
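+
+The call above assumes an existing Spark DataFrame `df` with a `text` column. A minimal end-to-end sketch, assuming a session started with `sparknlp.start()` and an arbitrary example sentence:
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+pipeline = PretrainedPipeline("burmese_awesome_wnut_model_sgonzalezsilot_pipeline", lang="en")
+
+# DataFrame-based inference: the pipeline reads from a "text" column
+df = spark.createDataFrame([["John Snow Labs is based in Delaware"]]).toDF("text")
+annotations = pipeline.transform(df)
+annotations.printSchema()  # shows the annotation columns added by each stage
+
+# Single-string inference without building a DataFrame
+print(pipeline.annotate("John Snow Labs is based in Delaware"))
+```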
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_sgonzalezsilot_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/sgonzalezsilot/my_awesome_wnut_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-06-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_german_finetuned_english_tonga_tonga_islands_german_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-06-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_german_finetuned_english_tonga_tonga_islands_german_pipeline_en.md new file mode 100644 index 00000000000000..05ab6a6f3648d4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-06-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_german_finetuned_english_tonga_tonga_islands_german_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_german_finetuned_english_tonga_tonga_islands_german_pipeline pipeline MarianTransformer from MicMer17 +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_german_finetuned_english_tonga_tonga_islands_german_pipeline +date: 2024-09-06 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_german_finetuned_english_tonga_tonga_islands_german_pipeline` is a English model originally trained by MicMer17. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_german_finetuned_english_tonga_tonga_islands_german_pipeline_en_5.5.0_3.0_1725635684279.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_german_finetuned_english_tonga_tonga_islands_german_pipeline_en_5.5.0_3.0_1725635684279.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_german_finetuned_english_tonga_tonga_islands_german_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_german_finetuned_english_tonga_tonga_islands_german_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_german_finetuned_english_tonga_tonga_islands_german_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.1 MB| + +## References + +https://huggingface.co/MicMer17/opus-mt-en-ro-finetuned-en-to-de-finetuned-en-to-de + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-06-xlm_roberta_base_finetuned_emotion_37_labels_en.md b/docs/_posts/ahmedlone127/2024-09-06-xlm_roberta_base_finetuned_emotion_37_labels_en.md new file mode 100644 index 00000000000000..0fc405696689b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-06-xlm_roberta_base_finetuned_emotion_37_labels_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_emotion_37_labels XlmRoBertaForSequenceClassification from upsalite +author: John Snow Labs +name: xlm_roberta_base_finetuned_emotion_37_labels +date: 2024-09-06 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_emotion_37_labels` is a English model originally trained by upsalite. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_emotion_37_labels_en_5.5.0_3.0_1725617257200.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_emotion_37_labels_en_5.5.0_3.0_1725617257200.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_emotion_37_labels","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_emotion_37_labels", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
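+
+Once the fitted pipeline above has run, the predicted emotion label sits in the `result` field of the `class` column, and per-label scores, where the model exposes them, are kept in the annotation metadata. A quick way to inspect both:
+
+```python
+# Predicted label for each input row
+pipelineDF.select("text", "class.result").show(truncate=False)
+
+# Full annotation, including the metadata map with per-label scores
+pipelineDF.selectExpr("explode(`class`) as prediction") \
+    .select("prediction.result", "prediction.metadata") \
+    .show(truncate=False)
+```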
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_emotion_37_labels| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|877.5 MB| + +## References + +https://huggingface.co/upsalite/xlm-roberta-base-finetuned-emotion-37-labels \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_wnut_model_pavement_en.md b/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_wnut_model_pavement_en.md new file mode 100644 index 00000000000000..7ae54f250576c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_wnut_model_pavement_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_pavement DistilBertForTokenClassification from pavement +author: John Snow Labs +name: burmese_awesome_wnut_model_pavement +date: 2024-09-07 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_pavement` is a English model originally trained by pavement. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_pavement_en_5.5.0_3.0_1725730756627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_pavement_en_5.5.0_3.0_1725730756627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = DistilBertForTokenClassification.pretrained("burmese_awesome_wnut_model_pavement","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = DistilBertForTokenClassification.pretrained("burmese_awesome_wnut_model_pavement", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
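+
+For ad-hoc predictions on single sentences, the fitted `pipelineModel` above can be wrapped in a `LightPipeline`, which skips building a DataFrame for every call — a minimal sketch with an arbitrary example sentence:
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+result = light.annotate("My name is Wolfgang and I live in Berlin")
+
+# Pair each token with its predicted tag; the keys match the output columns above
+print(list(zip(result["token"], result["ner"])))
+```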
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_pavement| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/pavement/my_awesome_wnut_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-distilbert_base_uncased_finetuned_squad_soikit_en.md b/docs/_posts/ahmedlone127/2024-09-07-distilbert_base_uncased_finetuned_squad_soikit_en.md new file mode 100644 index 00000000000000..80ec0b755d5d82 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-distilbert_base_uncased_finetuned_squad_soikit_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_soikit DistilBertForQuestionAnswering from soikit +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_soikit +date: 2024-09-07 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_soikit` is a English model originally trained by soikit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_soikit_en_5.5.0_3.0_1725695167307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_soikit_en_5.5.0_3.0_1725695167307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_soikit","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_soikit", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
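+
+After the transform above, the extracted answer span is available in the `result` field of the `answer` column; additional question/context pairs can simply be added as extra rows of `data`:
+
+```python
+# One answer annotation per question/context pair
+pipelineDF.select("question", "answer.result").show(truncate=False)
+```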
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_soikit| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/soikit/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-multilingual_xlm_roberta_for_ner_c4n11_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-08-multilingual_xlm_roberta_for_ner_c4n11_pipeline_xx.md new file mode 100644 index 00000000000000..dab5043c138f75 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-multilingual_xlm_roberta_for_ner_c4n11_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual multilingual_xlm_roberta_for_ner_c4n11_pipeline pipeline XlmRoBertaForTokenClassification from c4n11 +author: John Snow Labs +name: multilingual_xlm_roberta_for_ner_c4n11_pipeline +date: 2024-09-08 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multilingual_xlm_roberta_for_ner_c4n11_pipeline` is a Multilingual model originally trained by c4n11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multilingual_xlm_roberta_for_ner_c4n11_pipeline_xx_5.5.0_3.0_1725773335687.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multilingual_xlm_roberta_for_ner_c4n11_pipeline_xx_5.5.0_3.0_1725773335687.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("multilingual_xlm_roberta_for_ner_c4n11_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("multilingual_xlm_roberta_for_ner_c4n11_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multilingual_xlm_roberta_for_ner_c4n11_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|839.7 MB| + +## References + +https://huggingface.co/c4n11/multilingual-xlm-roberta-for-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-q2d_ep3_35_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-q2d_ep3_35_pipeline_en.md new file mode 100644 index 00000000000000..7549bac7dc8820 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-q2d_ep3_35_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English q2d_ep3_35_pipeline pipeline MPNetEmbeddings from ingeol +author: John Snow Labs +name: q2d_ep3_35_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`q2d_ep3_35_pipeline` is a English model originally trained by ingeol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/q2d_ep3_35_pipeline_en_5.5.0_3.0_1725769074834.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/q2d_ep3_35_pipeline_en_5.5.0_3.0_1725769074834.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("q2d_ep3_35_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("q2d_ep3_35_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
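+
+A short usage sketch for this embeddings pipeline. The name of the output column holding the MPNet embeddings is defined inside the pretrained pipeline, so it is discovered with `printSchema` here rather than assumed:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("q2d_ep3_35_pipeline", lang="en")
+
+df = spark.createDataFrame([["what is the boiling point of water"]]).toDF("text")
+result = pipeline.transform(df)
+result.printSchema()  # locate the embeddings column added by the MPNet stage
+```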
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|q2d_ep3_35_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/ingeol/q2d_ep3_35 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-qqp_microsoft_deberta_v3_base_seed_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-qqp_microsoft_deberta_v3_base_seed_3_pipeline_en.md new file mode 100644 index 00000000000000..c61dcdfdcb243a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-qqp_microsoft_deberta_v3_base_seed_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English qqp_microsoft_deberta_v3_base_seed_3_pipeline pipeline DeBertaForSequenceClassification from utahnlp +author: John Snow Labs +name: qqp_microsoft_deberta_v3_base_seed_3_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qqp_microsoft_deberta_v3_base_seed_3_pipeline` is a English model originally trained by utahnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qqp_microsoft_deberta_v3_base_seed_3_pipeline_en_5.5.0_3.0_1725803323949.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qqp_microsoft_deberta_v3_base_seed_3_pipeline_en_5.5.0_3.0_1725803323949.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("qqp_microsoft_deberta_v3_base_seed_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("qqp_microsoft_deberta_v3_base_seed_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qqp_microsoft_deberta_v3_base_seed_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|648.9 MB| + +## References + +https://huggingface.co/utahnlp/qqp_microsoft_deberta-v3-base_seed-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-test_model_12944qwerty_en.md b/docs/_posts/ahmedlone127/2024-09-08-test_model_12944qwerty_en.md new file mode 100644 index 00000000000000..c3087a05fd3503 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-test_model_12944qwerty_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English test_model_12944qwerty DistilBertForQuestionAnswering from 12944qwerty +author: John Snow Labs +name: test_model_12944qwerty +date: 2024-09-08 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_model_12944qwerty` is a English model originally trained by 12944qwerty. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_model_12944qwerty_en_5.5.0_3.0_1725798023359.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_model_12944qwerty_en_5.5.0_3.0_1725798023359.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("test_model_12944qwerty","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("test_model_12944qwerty", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_model_12944qwerty| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/12944qwerty/test_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_copypaste_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_copypaste_en.md new file mode 100644 index 00000000000000..8acda05918149d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_copypaste_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_imdb_copypaste DistilBertEmbeddings from CopyPaste +author: John Snow Labs +name: distilbert_base_uncased_finetuned_imdb_copypaste +date: 2024-09-09 +tags: [en, open_source, onnx, embeddings, distilbert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_imdb_copypaste` is a English model originally trained by CopyPaste. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_copypaste_en_5.5.0_3.0_1725905426089.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_copypaste_en_5.5.0_3.0_1725905426089.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = DistilBertEmbeddings.pretrained("distilbert_base_uncased_finetuned_imdb_copypaste","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = DistilBertEmbeddings.pretrained("distilbert_base_uncased_finetuned_imdb_copypaste","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
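+
+To turn the token-level annotations in the `embeddings` column above into plain Spark vectors (for example, to feed Spark MLlib), an `EmbeddingsFinisher` can be appended — a minimal sketch:
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+finished = finisher.transform(pipelineDF)
+finished.selectExpr("explode(finished_embeddings) as embedding_vector").show(5, truncate=80)
+```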
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_imdb_copypaste| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[distilbert]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/CopyPaste/distilbert-base-uncased-finetuned-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_base_fine_freq_wce_unsampled_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_base_fine_freq_wce_unsampled_en.md new file mode 100644 index 00000000000000..05f7497f3ea337 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_base_fine_freq_wce_unsampled_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_base_fine_freq_wce_unsampled MarianTransformer from ethansimrm +author: John Snow Labs +name: opus_base_fine_freq_wce_unsampled +date: 2024-09-09 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_base_fine_freq_wce_unsampled` is a English model originally trained by ethansimrm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_base_fine_freq_wce_unsampled_en_5.5.0_3.0_1725891714649.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_base_fine_freq_wce_unsampled_en_5.5.0_3.0_1725891714649.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_base_fine_freq_wce_unsampled","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_base_fine_freq_wce_unsampled","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
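+
+With the columns wired as above, the translated text for each detected sentence can be read back directly from the `translation` column:
+
+```python
+pipelineDF.select("translation.result").show(truncate=False)
+```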
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_base_fine_freq_wce_unsampled| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.4 MB| + +## References + +https://huggingface.co/ethansimrm/opus_base_fine_freq_wce_unsampled \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-sentiment_analysis_benlitzen43_en.md b/docs/_posts/ahmedlone127/2024-09-09-sentiment_analysis_benlitzen43_en.md new file mode 100644 index 00000000000000..5c6dc8cde9d4de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-sentiment_analysis_benlitzen43_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_benlitzen43 DistilBertForSequenceClassification from Benlitzen43 +author: John Snow Labs +name: sentiment_analysis_benlitzen43 +date: 2024-09-09 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_benlitzen43` is a English model originally trained by Benlitzen43. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_benlitzen43_en_5.5.0_3.0_1725873353980.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_benlitzen43_en_5.5.0_3.0_1725873353980.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_benlitzen43","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_benlitzen43", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_benlitzen43| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Benlitzen43/Sentiment-Analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-albert_model__29_4_en.md b/docs/_posts/ahmedlone127/2024-09-10-albert_model__29_4_en.md new file mode 100644 index 00000000000000..2984b92630281f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-albert_model__29_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English albert_model__29_4 DistilBertForSequenceClassification from KalaiselvanD +author: John Snow Labs +name: albert_model__29_4 +date: 2024-09-10 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_model__29_4` is a English model originally trained by KalaiselvanD. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_model__29_4_en_5.5.0_3.0_1726009637645.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_model__29_4_en_5.5.0_3.0_1726009637645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("albert_model__29_4","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("albert_model__29_4", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_model__29_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KalaiselvanD/albert_model__29_4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-burmese_nepal_bhasa_model_en.md b/docs/_posts/ahmedlone127/2024-09-10-burmese_nepal_bhasa_model_en.md new file mode 100644 index 00000000000000..e3f20ef3e91028 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-burmese_nepal_bhasa_model_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English burmese_nepal_bhasa_model DistilBertForSequenceClassification from CohleM +author: John Snow Labs +name: burmese_nepal_bhasa_model +date: 2024-09-10 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_nepal_bhasa_model` is a English model originally trained by CohleM. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_nepal_bhasa_model_en_5.5.0_3.0_1725936036718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_nepal_bhasa_model_en_5.5.0_3.0_1725936036718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+document_assembler = DocumentAssembler()\
+    .setInputCol("text")\
+    .setOutputCol("document")
+
+tokenizer = Tokenizer()\
+    .setInputCols("document")\
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_nepal_bhasa_model","en")\
+    .setInputCols(["document","token"])\
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_nepal_bhasa_model","en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_nepal_bhasa_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|406.8 MB| + +## References + +References + +https://huggingface.co/CohleM/my_new_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-cuad_distil_document_name_cased_08_31_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-cuad_distil_document_name_cased_08_31_v1_pipeline_en.md new file mode 100644 index 00000000000000..10edc7c0f8de82 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-cuad_distil_document_name_cased_08_31_v1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English cuad_distil_document_name_cased_08_31_v1_pipeline pipeline DistilBertForQuestionAnswering from saraks +author: John Snow Labs +name: cuad_distil_document_name_cased_08_31_v1_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cuad_distil_document_name_cased_08_31_v1_pipeline` is a English model originally trained by saraks. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cuad_distil_document_name_cased_08_31_v1_pipeline_en_5.5.0_3.0_1725960078109.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cuad_distil_document_name_cased_08_31_v1_pipeline_en_5.5.0_3.0_1725960078109.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cuad_distil_document_name_cased_08_31_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cuad_distil_document_name_cased_08_31_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
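+
+A usage sketch for this question-answering pipeline. The input column names read by the included `MultiDocumentAssembler` are fixed inside the pipeline; `question`/`context` is assumed here and the contract text is a made-up placeholder, so confirm the expected inputs via `pipeline.model.stages` or `printSchema` before relying on this:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("cuad_distil_document_name_cased_08_31_v1_pipeline", lang="en")
+
+df = spark.createDataFrame(
+    [["What is the name of this document?", "This SERVICE AGREEMENT is entered into by ..."]]
+).toDF("question", "context")
+
+annotations = pipeline.transform(df)
+annotations.printSchema()  # the question-answering stage adds the answer annotation column
+```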
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cuad_distil_document_name_cased_08_31_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/saraks/cuad-distil-document_name-cased-08-31-v1 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_detect_ai_generated_text_luciayn_en.md b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_detect_ai_generated_text_luciayn_en.md new file mode 100644 index 00000000000000..34b7df8f153888 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_detect_ai_generated_text_luciayn_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_detect_ai_generated_text_luciayn DistilBertForSequenceClassification from luciayn +author: John Snow Labs +name: distilbert_base_uncased_detect_ai_generated_text_luciayn +date: 2024-09-10 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_detect_ai_generated_text_luciayn` is a English model originally trained by luciayn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_detect_ai_generated_text_luciayn_en_5.5.0_3.0_1726008980329.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_detect_ai_generated_text_luciayn_en_5.5.0_3.0_1726008980329.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_detect_ai_generated_text_luciayn","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_detect_ai_generated_text_luciayn", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_detect_ai_generated_text_luciayn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/luciayn/distilbert-base-uncased-detect_ai_generated_text \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_imdb_mightyvuai_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_imdb_mightyvuai_pipeline_en.md new file mode 100644 index 00000000000000..49f06754ee27b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_imdb_mightyvuai_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_imdb_mightyvuai_pipeline pipeline DistilBertEmbeddings from MightyVuAI +author: John Snow Labs +name: distilbert_base_uncased_finetuned_imdb_mightyvuai_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_imdb_mightyvuai_pipeline` is a English model originally trained by MightyVuAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_mightyvuai_pipeline_en_5.5.0_3.0_1725935213931.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_mightyvuai_pipeline_en_5.5.0_3.0_1725935213931.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_mightyvuai_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_mightyvuai_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_imdb_mightyvuai_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/MightyVuAI/distilbert-base-uncased-finetuned-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-output_en.md b/docs/_posts/ahmedlone127/2024-09-10-output_en.md new file mode 100644 index 00000000000000..59c6774f200871 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-output_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English output DistilBertEmbeddings from soyisauce +author: John Snow Labs +name: output +date: 2024-09-10 +tags: [distilbert, en, open_source, fill_mask, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`output` is a English model originally trained by soyisauce. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/output_en_5.5.0_3.0_1725980309782.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/output_en_5.5.0_3.0_1725980309782.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+document_assembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+embeddings = DistilBertEmbeddings.pretrained("output","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([document_assembler, tokenizer, embeddings])
+
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipelineModel = pipeline.fit(data)
+
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val document_assembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val embeddings = DistilBertEmbeddings
+    .pretrained("output", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, embeddings))
+
+val data = Seq("I love spark-nlp").toDF("text")
+
+val pipelineModel = pipeline.fit(data)
+
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|output| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +References + +References + +https://huggingface.co/soyisauce/output \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-whisper_small_yue_chinese_full_en.md b/docs/_posts/ahmedlone127/2024-09-10-whisper_small_yue_chinese_full_en.md new file mode 100644 index 00000000000000..2272dc194239f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-whisper_small_yue_chinese_full_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_yue_chinese_full WhisperForCTC from safecantonese +author: John Snow Labs +name: whisper_small_yue_chinese_full +date: 2024-09-10 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_yue_chinese_full` is a English model originally trained by safecantonese. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_yue_chinese_full_en_5.5.0_3.0_1725949325788.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_yue_chinese_full_en_5.5.0_3.0_1725949325788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_yue_chinese_full","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_yue_chinese_full", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
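+
+The snippet above references a `data` DataFrame that must hold an `audio_content` column of raw audio samples. One way to build it — assuming `librosa` is available and a local `sample.wav` file; Whisper expects 16 kHz mono floats:
+
+```python
+import librosa
+
+raw_floats, _ = librosa.load("sample.wav", sr=16000, mono=True)
+
+data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+pipelineDF.select("text.result").show(truncate=False)
+```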
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_yue_chinese_full| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/safecantonese/whisper-small-yue-full \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_english_g22tk021_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_english_g22tk021_pipeline_en.md new file mode 100644 index 00000000000000..b4032f1ac4e9de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_english_g22tk021_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_g22tk021_pipeline pipeline XlmRoBertaForTokenClassification from g22tk021 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_g22tk021_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_g22tk021_pipeline` is a English model originally trained by g22tk021. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_g22tk021_pipeline_en_5.5.0_3.0_1725973335366.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_g22tk021_pipeline_en_5.5.0_3.0_1725973335366.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_g22tk021_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_g22tk021_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
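
The call above assumes a `df` with a `text` column. A minimal sketch of creating one and inspecting the predicted tags; the `ner` output column name is assumed from the token-classification stage listed under Included Models.

```python
# Sketch only: build the input DataFrame the pipeline expects and read the tags.
df = spark.createDataFrame([["John Snow Labs is based in Delaware."]]).toDF("text")

annotations = pipeline.transform(df)
# Pair each token with its predicted tag ("ner" is the assumed output column of the
# XlmRoBertaForTokenClassification stage).
annotations.selectExpr("explode(arrays_zip(token.result, ner.result)) as token_tag") \
    .show(truncate=False)
```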
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_g22tk021_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/g22tk021/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-bert_base_uncased_finetuned_srl_arg_en.md b/docs/_posts/ahmedlone127/2024-09-11-bert_base_uncased_finetuned_srl_arg_en.md new file mode 100644 index 00000000000000..0a736dffa2fb8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-bert_base_uncased_finetuned_srl_arg_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_srl_arg BertForTokenClassification from dannashao +author: John Snow Labs +name: bert_base_uncased_finetuned_srl_arg +date: 2024-09-11 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_srl_arg` is a English model originally trained by dannashao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_srl_arg_en_5.5.0_3.0_1726026192173.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_srl_arg_en_5.5.0_3.0_1726026192173.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncased_finetuned_srl_arg","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncased_finetuned_srl_arg", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_srl_arg| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.4 MB| + +## References + +https://huggingface.co/dannashao/bert-base-uncased-finetuned-srl_arg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-bert_uncased_slot_filling_en.md b/docs/_posts/ahmedlone127/2024-09-11-bert_uncased_slot_filling_en.md new file mode 100644 index 00000000000000..38046db6d81b6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-bert_uncased_slot_filling_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_uncased_slot_filling BertForTokenClassification from andgonzalez +author: John Snow Labs +name: bert_uncased_slot_filling +date: 2024-09-11 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_uncased_slot_filling` is a English model originally trained by andgonzalez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_uncased_slot_filling_en_5.5.0_3.0_1726026081713.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_uncased_slot_filling_en_5.5.0_3.0_1726026081713.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("bert_uncased_slot_filling","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("bert_uncased_slot_filling", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_uncased_slot_filling| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.4 MB| + +## References + +https://huggingface.co/andgonzalez/bert-uncased-slot-filling \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_emotion_overall_normalised_text_3_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_emotion_overall_normalised_text_3_0_pipeline_en.md new file mode 100644 index 00000000000000..3200728e7c7a74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-distilbert_base_uncased_finetuned_emotion_overall_normalised_text_3_0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_overall_normalised_text_3_0_pipeline pipeline DistilBertForSequenceClassification from LeBruse +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_overall_normalised_text_3_0_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_overall_normalised_text_3_0_pipeline` is a English model originally trained by LeBruse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_overall_normalised_text_3_0_pipeline_en_5.5.0_3.0_1726014530074.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_overall_normalised_text_3_0_pipeline_en_5.5.0_3.0_1726014530074.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_overall_normalised_text_3_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_overall_normalised_text_3_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_overall_normalised_text_3_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LeBruse/distilbert-base-uncased-finetuned-emotion-overall-normalised-text-3.0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-exxon_semantic_search_en.md b/docs/_posts/ahmedlone127/2024-09-11-exxon_semantic_search_en.md new file mode 100644 index 00000000000000..2e25b48bc0fefb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-exxon_semantic_search_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English exxon_semantic_search MPNetEmbeddings from akshitguptafintek24 +author: John Snow Labs +name: exxon_semantic_search +date: 2024-09-11 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`exxon_semantic_search` is a English model originally trained by akshitguptafintek24. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/exxon_semantic_search_en_5.5.0_3.0_1726033590658.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/exxon_semantic_search_en_5.5.0_3.0_1726033590658.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("exxon_semantic_search","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("exxon_semantic_search","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
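
To use the vectors downstream, the embedding arrays can be pulled out of the annotation column produced above. This follows the standard Spark NLP annotation schema and is a sketch rather than part of the original model card.

```python
# Each annotation in "embeddings" carries the sentence vector in its `embeddings` field.
vectors = pipelineDF.selectExpr("explode(embeddings.embeddings) as sentence_embedding")
vectors.show(truncate=80)
```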
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|exxon_semantic_search| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|406.7 MB| + +## References + +https://huggingface.co/akshitguptafintek24/exxon-semantic-search \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-n_roberta_sst5_padding0model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-n_roberta_sst5_padding0model_pipeline_en.md new file mode 100644 index 00000000000000..ebac64c8cc1ce5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-n_roberta_sst5_padding0model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_roberta_sst5_padding0model_pipeline pipeline RoBertaForSequenceClassification from Realgon +author: John Snow Labs +name: n_roberta_sst5_padding0model_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_roberta_sst5_padding0model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_roberta_sst5_padding0model_pipeline_en_5.5.0_3.0_1726053242992.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_roberta_sst5_padding0model_pipeline_en_5.5.0_3.0_1726053242992.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("n_roberta_sst5_padding0model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("n_roberta_sst5_padding0model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_roberta_sst5_padding0model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|437.8 MB| + +## References + +https://huggingface.co/Realgon/N_roberta_sst5_padding0model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-nerd_nerd_random3_seed0_twitter_roberta_base_2022_154m_en.md b/docs/_posts/ahmedlone127/2024-09-11-nerd_nerd_random3_seed0_twitter_roberta_base_2022_154m_en.md new file mode 100644 index 00000000000000..a68358859083e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-nerd_nerd_random3_seed0_twitter_roberta_base_2022_154m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nerd_nerd_random3_seed0_twitter_roberta_base_2022_154m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: nerd_nerd_random3_seed0_twitter_roberta_base_2022_154m +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerd_nerd_random3_seed0_twitter_roberta_base_2022_154m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerd_nerd_random3_seed0_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1726071125925.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerd_nerd_random3_seed0_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1726071125925.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("nerd_nerd_random3_seed0_twitter_roberta_base_2022_154m","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("nerd_nerd_random3_seed0_twitter_roberta_base_2022_154m", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerd_nerd_random3_seed0_twitter_roberta_base_2022_154m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/tweettemposhift/nerd-nerd_random3_seed0-twitter-roberta-base-2022-154m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_indonesian_opus100_en.md b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_indonesian_opus100_en.md new file mode 100644 index 00000000000000..0c9361b097d41c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_english_indonesian_opus100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_indonesian_opus100 MarianTransformer from yonathanstwn +author: John Snow Labs +name: opus_maltese_english_indonesian_opus100 +date: 2024-09-11 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_indonesian_opus100` is a English model originally trained by yonathanstwn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_indonesian_opus100_en_5.5.0_3.0_1726038969534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_indonesian_opus100_en_5.5.0_3.0_1726038969534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("opus_maltese_english_indonesian_opus100","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("opus_maltese_english_indonesian_opus100","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_indonesian_opus100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|481.7 MB| + +## References + +https://huggingface.co/yonathanstwn/opus-mt-en-id-opus100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-roberta_base_squad_i8_f32_p50_en.md b/docs/_posts/ahmedlone127/2024-09-11-roberta_base_squad_i8_f32_p50_en.md new file mode 100644 index 00000000000000..7b4ec6b23f67df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-roberta_base_squad_i8_f32_p50_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_base_squad_i8_f32_p50 RoBertaForQuestionAnswering from pminha +author: John Snow Labs +name: roberta_base_squad_i8_f32_p50 +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_squad_i8_f32_p50` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_squad_i8_f32_p50_en_5.5.0_3.0_1726036093510.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_squad_i8_f32_p50_en_5.5.0_3.0_1726036093510.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_squad_i8_f32_p50","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_squad_i8_f32_p50", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
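
Once the pipeline has run, the predicted span is available in the `answer` column. A short sketch follows; the extra question/context pair is invented for illustration.

```python
# Read the predicted answer text for each row.
pipelineDF.select("answer.result").show(truncate=False)

# Scoring another (made-up) question/context pair with the fitted pipeline:
more = spark.createDataFrame(
    [["Who maintains Spark NLP?", "Spark NLP is maintained by John Snow Labs."]]
).toDF("question", "context")
pipelineModel.transform(more).select("answer.result").show(truncate=False)
```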
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_squad_i8_f32_p50| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|228.5 MB| + +## References + +https://huggingface.co/pminha/roberta-base-squad-i8-f32-p50 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-sentimentanalysis_yelpreviews_optimizedmodel_en.md b/docs/_posts/ahmedlone127/2024-09-11-sentimentanalysis_yelpreviews_optimizedmodel_en.md new file mode 100644 index 00000000000000..735649cbf2bbdf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-sentimentanalysis_yelpreviews_optimizedmodel_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentimentanalysis_yelpreviews_optimizedmodel DistilBertForSequenceClassification from ElizaClaPa +author: John Snow Labs +name: sentimentanalysis_yelpreviews_optimizedmodel +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentimentanalysis_yelpreviews_optimizedmodel` is a English model originally trained by ElizaClaPa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentimentanalysis_yelpreviews_optimizedmodel_en_5.5.0_3.0_1726052441672.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentimentanalysis_yelpreviews_optimizedmodel_en_5.5.0_3.0_1726052441672.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentimentanalysis_yelpreviews_optimizedmodel","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentimentanalysis_yelpreviews_optimizedmodel", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentimentanalysis_yelpreviews_optimizedmodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/ElizaClaPa/SentimentAnalysis-YelpReviews-OptimizedModel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_sonatafyai_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_sonatafyai_pipeline_en.md new file mode 100644 index 00000000000000..2fbc999fcbebbd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_sonatafyai_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_sonatafyai_pipeline pipeline BertForSequenceClassification from Sonatafyai +author: John Snow Labs +name: symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_sonatafyai_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_sonatafyai_pipeline` is a English model originally trained by Sonatafyai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_sonatafyai_pipeline_en_5.5.0_3.0_1726015182769.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_sonatafyai_pipeline_en_5.5.0_3.0_1726015182769.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_sonatafyai_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_sonatafyai_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
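
For quick experiments the pipeline can also annotate a plain string instead of a DataFrame. The symptom description below is invented, and the `class` key is assumed from the sequence-classification stage listed under Included Models.

```python
# Sketch only: score a single text without building a DataFrame.
result = pipeline.annotate("Persistent cough, fever and shortness of breath for three days.")
print(result["class"])  # predicted diagnosis label(s)
```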
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_sonatafyai_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/Sonatafyai/Symptoms_to_Diagnosis_SonatafyAI_BERT_v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-tesla_earningscall_sentiment_analysis_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-tesla_earningscall_sentiment_analysis_pipeline_en.md new file mode 100644 index 00000000000000..38494d0afb8fc2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-tesla_earningscall_sentiment_analysis_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tesla_earningscall_sentiment_analysis_pipeline pipeline RoBertaForSequenceClassification from weip9012 +author: John Snow Labs +name: tesla_earningscall_sentiment_analysis_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tesla_earningscall_sentiment_analysis_pipeline` is a English model originally trained by weip9012. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tesla_earningscall_sentiment_analysis_pipeline_en_5.5.0_3.0_1726060975160.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tesla_earningscall_sentiment_analysis_pipeline_en_5.5.0_3.0_1726060975160.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tesla_earningscall_sentiment_analysis_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tesla_earningscall_sentiment_analysis_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tesla_earningscall_sentiment_analysis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.8 MB| + +## References + +https://huggingface.co/weip9012/tesla_earningscall_sentiment_analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-financial_phrasebank_oversampling_10perc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-financial_phrasebank_oversampling_10perc_pipeline_en.md new file mode 100644 index 00000000000000..ab179696473979 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-financial_phrasebank_oversampling_10perc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English financial_phrasebank_oversampling_10perc_pipeline pipeline RoBertaForSequenceClassification from kruthof +author: John Snow Labs +name: financial_phrasebank_oversampling_10perc_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`financial_phrasebank_oversampling_10perc_pipeline` is a English model originally trained by kruthof. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/financial_phrasebank_oversampling_10perc_pipeline_en_5.5.0_3.0_1726117934012.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/financial_phrasebank_oversampling_10perc_pipeline_en_5.5.0_3.0_1726117934012.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("financial_phrasebank_oversampling_10perc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("financial_phrasebank_oversampling_10perc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|financial_phrasebank_oversampling_10perc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|428.8 MB| + +## References + +https://huggingface.co/kruthof/financial_phrasebank_oversampling_10perc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-rbt8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-rbt8_pipeline_en.md new file mode 100644 index 00000000000000..c6e17291b5b08c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-rbt8_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English rbt8_pipeline pipeline RoBertaForQuestionAnswering from SUTS102779289 +author: John Snow Labs +name: rbt8_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rbt8_pipeline` is a English model originally trained by SUTS102779289. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rbt8_pipeline_en_5.5.0_3.0_1726106232059.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rbt8_pipeline_en_5.5.0_3.0_1726106232059.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rbt8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rbt8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rbt8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.6 MB| + +## References + +https://huggingface.co/SUTS102779289/rbt8 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-roberta_indosquadv2_1691593432_16_2e_06_0_01_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-roberta_indosquadv2_1691593432_16_2e_06_0_01_5_pipeline_en.md new file mode 100644 index 00000000000000..c48125f3b1a04f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-roberta_indosquadv2_1691593432_16_2e_06_0_01_5_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_indosquadv2_1691593432_16_2e_06_0_01_5_pipeline pipeline RoBertaForQuestionAnswering from rizquuula +author: John Snow Labs +name: roberta_indosquadv2_1691593432_16_2e_06_0_01_5_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_indosquadv2_1691593432_16_2e_06_0_01_5_pipeline` is a English model originally trained by rizquuula. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_indosquadv2_1691593432_16_2e_06_0_01_5_pipeline_en_5.5.0_3.0_1726175807702.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_indosquadv2_1691593432_16_2e_06_0_01_5_pipeline_en_5.5.0_3.0_1726175807702.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_indosquadv2_1691593432_16_2e_06_0_01_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_indosquadv2_1691593432_16_2e_06_0_01_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_indosquadv2_1691593432_16_2e_06_0_01_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|447.2 MB| + +## References + +https://huggingface.co/rizquuula/RoBERTa-IndoSQuADv2_1691593432-16-2e-06-0.01-5 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-roberta_spanish_v1_en.md b/docs/_posts/ahmedlone127/2024-09-12-roberta_spanish_v1_en.md new file mode 100644 index 00000000000000..2282355120e85f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-roberta_spanish_v1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_spanish_v1 RoBertaForQuestionAnswering from enriquesaou +author: John Snow Labs +name: roberta_spanish_v1 +date: 2024-09-12 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_spanish_v1` is a English model originally trained by enriquesaou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_spanish_v1_en_5.5.0_3.0_1726106625576.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_spanish_v1_en_5.5.0_3.0_1726106625576.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_spanish_v1","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_spanish_v1", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_spanish_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|438.2 MB| + +## References + +https://huggingface.co/enriquesaou/roberta_es_v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-spamai_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-spamai_pipeline_en.md new file mode 100644 index 00000000000000..a3583eb7979e8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-spamai_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English spamai_pipeline pipeline BertForSequenceClassification from cybert79 +author: John Snow Labs +name: spamai_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spamai_pipeline` is a English model originally trained by cybert79. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spamai_pipeline_en_5.5.0_3.0_1726123203710.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spamai_pipeline_en_5.5.0_3.0_1726123203710.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("spamai_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("spamai_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spamai_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/cybert79/spamai + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-whisper_small_afrikaans_za_ptah23_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-whisper_small_afrikaans_za_ptah23_pipeline_en.md new file mode 100644 index 00000000000000..f2fed0f8cbcf34 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-whisper_small_afrikaans_za_ptah23_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_afrikaans_za_ptah23_pipeline pipeline WhisperForCTC from ptah23 +author: John Snow Labs +name: whisper_small_afrikaans_za_ptah23_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_afrikaans_za_ptah23_pipeline` is a English model originally trained by ptah23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_afrikaans_za_ptah23_pipeline_en_5.5.0_3.0_1726150689066.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_afrikaans_za_ptah23_pipeline_en_5.5.0_3.0_1726150689066.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_afrikaans_za_ptah23_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_afrikaans_za_ptah23_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_afrikaans_za_ptah23_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ptah23/whisper-small-af-ZA + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-whisper_tiny_english_jonnhan_en.md b/docs/_posts/ahmedlone127/2024-09-12-whisper_tiny_english_jonnhan_en.md new file mode 100644 index 00000000000000..603d0d425700bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-whisper_tiny_english_jonnhan_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_english_jonnhan WhisperForCTC from Jonnhan +author: John Snow Labs +name: whisper_tiny_english_jonnhan +date: 2024-09-12 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_jonnhan` is a English model originally trained by Jonnhan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_jonnhan_en_5.5.0_3.0_1726134921342.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_jonnhan_en_5.5.0_3.0_1726134921342.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_english_jonnhan","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
# "data" is assumed to be a DataFrame with an "audio_content" column (array of floats)
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// "data" is assumed to be a DataFrame with an "audio_content" column (array of floats)
val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_english_jonnhan", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_jonnhan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/Jonnhan/whisper-tiny-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_mikeldiez_en.md b/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_mikeldiez_en.md new file mode 100644 index 00000000000000..f40f519921ffd8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_mikeldiez_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_mikeldiez DistilBertForQuestionAnswering from mikeldiez +author: John Snow Labs +name: burmese_awesome_qa_model_mikeldiez +date: 2024-09-13 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_mikeldiez` is a English model originally trained by mikeldiez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_mikeldiez_en_5.5.0_3.0_1726267117335.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_mikeldiez_en_5.5.0_3.0_1726267117335.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_mikeldiez","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_mikeldiez", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_mikeldiez| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|505.4 MB| + +## References + +https://huggingface.co/mikeldiez/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_igniter909_en.md b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_igniter909_en.md new file mode 100644 index 00000000000000..8ab47689809f6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_emotion_igniter909_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_igniter909 DistilBertForSequenceClassification from Igniter909 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_igniter909 +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_igniter909` is a English model originally trained by Igniter909. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_igniter909_en_5.5.0_3.0_1726242656901.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_igniter909_en_5.5.0_3.0_1726242656901.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_igniter909","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_igniter909", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
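
When single documents need to be scored with low latency, the fitted pipeline can be wrapped in a `LightPipeline`. A minimal sketch, assuming the code above has already been run:

```python
from sparknlp.base import LightPipeline

# Sketch only: avoid the DataFrame round-trip for one-off predictions.
light = LightPipeline(pipelineModel)
print(light.annotate("I am thrilled with these results!")["class"])  # "class" mirrors the classifier's output column
```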
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_igniter909| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Igniter909/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-rmse_1_en.md b/docs/_posts/ahmedlone127/2024-09-13-rmse_1_en.md new file mode 100644 index 00000000000000..3a9c95ecec5fe4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-rmse_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English rmse_1 RoBertaForSequenceClassification from BaronSch +author: John Snow Labs +name: rmse_1 +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rmse_1` is a English model originally trained by BaronSch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rmse_1_en_5.5.0_3.0_1726248042220.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rmse_1_en_5.5.0_3.0_1726248042220.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("rmse_1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("rmse_1", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rmse_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/BaronSch/RMSE_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-roberta_finetuned_subjqa_movies_2_ayoubsassi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-roberta_finetuned_subjqa_movies_2_ayoubsassi_pipeline_en.md new file mode 100644 index 00000000000000..a5562b4cd7c785 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-roberta_finetuned_subjqa_movies_2_ayoubsassi_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_finetuned_subjqa_movies_2_ayoubsassi_pipeline pipeline RoBertaForQuestionAnswering from ayoubsassi +author: John Snow Labs +name: roberta_finetuned_subjqa_movies_2_ayoubsassi_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_subjqa_movies_2_ayoubsassi_pipeline` is a English model originally trained by ayoubsassi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_ayoubsassi_pipeline_en_5.5.0_3.0_1726207102885.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_ayoubsassi_pipeline_en_5.5.0_3.0_1726207102885.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_finetuned_subjqa_movies_2_ayoubsassi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_finetuned_subjqa_movies_2_ayoubsassi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
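
The snippet above assumes an existing DataFrame `df`. One hedged way to build it, assuming the bundled MultiDocumentAssembler reads `question` and `context` columns (the actual column names depend on how the pipeline was exported), is:

```python
# Hypothetical input DataFrame for the question-answering pipeline above;
# adjust the column names to match the pipeline's MultiDocumentAssembler settings.
df = spark.createDataFrame(
    [["Who directed the movie?", "The movie was directed by Jane Doe."]]
).toDF("question", "context")
```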
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_subjqa_movies_2_ayoubsassi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/ayoubsassi/roberta-finetuned-subjqa-movies_2 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-roberta_small_turkish_clean_uncased_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-13-roberta_small_turkish_clean_uncased_pipeline_tr.md new file mode 100644 index 00000000000000..a3a4d017174dcb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-roberta_small_turkish_clean_uncased_pipeline_tr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Turkish roberta_small_turkish_clean_uncased_pipeline pipeline RoBertaEmbeddings from burakaytan +author: John Snow Labs +name: roberta_small_turkish_clean_uncased_pipeline +date: 2024-09-13 +tags: [tr, open_source, pipeline, onnx] +task: Embeddings +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_small_turkish_clean_uncased_pipeline` is a Turkish model originally trained by burakaytan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_small_turkish_clean_uncased_pipeline_tr_5.5.0_3.0_1726264761396.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_small_turkish_clean_uncased_pipeline_tr_5.5.0_3.0_1726264761396.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_small_turkish_clean_uncased_pipeline", lang = "tr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_small_turkish_clean_uncased_pipeline", lang = "tr") +val annotations = pipeline.transform(df) + +``` +
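
`df` is assumed to exist already; a minimal sketch of building it, with a Turkish sentence in a `text` column (the column name is an assumption based on the DocumentAssembler defaults used in these pipelines), is:

```python
# Hypothetical input DataFrame; the pipeline's DocumentAssembler is assumed to read a `text` column.
df = spark.createDataFrame([["Spark NLP kullanmayı seviyorum."]]).toDF("text")
```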
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_small_turkish_clean_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|222.4 MB| + +## References + +https://huggingface.co/burakaytan/roberta-small-turkish-clean-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-sent_bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-sent_bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline_en.md new file mode 100644 index 00000000000000..6b07b1f301e1bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-sent_bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline pipeline BertSentenceEmbeddings from betteib +author: John Snow Labs +name: sent_bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline` is a English model originally trained by betteib. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline_en_5.5.0_3.0_1726246349573.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline_en_5.5.0_3.0_1726246349573.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_arabert_finetuned_mdeberta_tswana_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|505.1 MB| + +## References + +https://huggingface.co/betteib/bert-base-arabert-finetuned-mdeberta-tn-v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-sent_qe3_ar.md b/docs/_posts/ahmedlone127/2024-09-13-sent_qe3_ar.md new file mode 100644 index 00000000000000..9f807c3b2a0441 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-sent_qe3_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic sent_qe3 BertSentenceEmbeddings from NLP-EXP +author: John Snow Labs +name: sent_qe3 +date: 2024-09-13 +tags: [ar, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_qe3` is a Arabic model originally trained by NLP-EXP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_qe3_ar_5.5.0_3.0_1726233105019.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_qe3_ar_5.5.0_3.0_1726233105019.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_qe3","ar") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_qe3","ar") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
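
To look at the resulting sentence vectors, a minimal sketch assuming `pipelineDF` from the example above:

```python
# Each sentence annotation carries its vector in the `embeddings` field.
pipelineDF.selectExpr("explode(embeddings) as sent") \
    .select("sent.result", "sent.embeddings") \
    .show(truncate=80)
```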
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_qe3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ar| +|Size:|504.1 MB| + +## References + +https://huggingface.co/NLP-EXP/QE3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-test1_jliucy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-test1_jliucy_pipeline_en.md new file mode 100644 index 00000000000000..e6ca31ed4421ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-test1_jliucy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test1_jliucy_pipeline pipeline DistilBertForSequenceClassification from jliucy +author: John Snow Labs +name: test1_jliucy_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test1_jliucy_pipeline` is a English model originally trained by jliucy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test1_jliucy_pipeline_en_5.5.0_3.0_1726262143495.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test1_jliucy_pipeline_en_5.5.0_3.0_1726262143495.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test1_jliucy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test1_jliucy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test1_jliucy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jliucy/test1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-trainer_chapter4_lixiwu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-trainer_chapter4_lixiwu_pipeline_en.md new file mode 100644 index 00000000000000..ada8c1f15b8bc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-trainer_chapter4_lixiwu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trainer_chapter4_lixiwu_pipeline pipeline DistilBertForSequenceClassification from lixiwu +author: John Snow Labs +name: trainer_chapter4_lixiwu_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trainer_chapter4_lixiwu_pipeline` is a English model originally trained by lixiwu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trainer_chapter4_lixiwu_pipeline_en_5.5.0_3.0_1726262553116.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trainer_chapter4_lixiwu_pipeline_en_5.5.0_3.0_1726262553116.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("trainer_chapter4_lixiwu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("trainer_chapter4_lixiwu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trainer_chapter4_lixiwu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lixiwu/trainer-chapter4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-withinapps_ndd_mantisbt_test_content_tags_cwadj_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-withinapps_ndd_mantisbt_test_content_tags_cwadj_pipeline_en.md new file mode 100644 index 00000000000000..ad4919b6d16894 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-withinapps_ndd_mantisbt_test_content_tags_cwadj_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English withinapps_ndd_mantisbt_test_content_tags_cwadj_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_mantisbt_test_content_tags_cwadj_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_mantisbt_test_content_tags_cwadj_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mantisbt_test_content_tags_cwadj_pipeline_en_5.5.0_3.0_1726242379802.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mantisbt_test_content_tags_cwadj_pipeline_en_5.5.0_3.0_1726242379802.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("withinapps_ndd_mantisbt_test_content_tags_cwadj_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("withinapps_ndd_mantisbt_test_content_tags_cwadj_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_mantisbt_test_content_tags_cwadj_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-mantisbt_test-content_tags-CWAdj + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-albert_hatespeech_classifier6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-albert_hatespeech_classifier6_pipeline_en.md new file mode 100644 index 00000000000000..6409c4a28d6102 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-albert_hatespeech_classifier6_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English albert_hatespeech_classifier6_pipeline pipeline AlbertForSequenceClassification from samuelcolvin26 +author: John Snow Labs +name: albert_hatespeech_classifier6_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_hatespeech_classifier6_pipeline` is a English model originally trained by samuelcolvin26. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_hatespeech_classifier6_pipeline_en_5.5.0_3.0_1726336546941.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_hatespeech_classifier6_pipeline_en_5.5.0_3.0_1726336546941.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("albert_hatespeech_classifier6_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("albert_hatespeech_classifier6_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_hatespeech_classifier6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/samuelcolvin26/Albert_Hatespeech_Classifier6 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_eli5_mlm_model_2_en.md b/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_eli5_mlm_model_2_en.md new file mode 100644 index 00000000000000..edf3f11d8af368 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_eli5_mlm_model_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_2 RoBertaEmbeddings from amirhamza11 +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_2 +date: 2024-09-14 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_2` is a English model originally trained by amirhamza11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_2_en_5.5.0_3.0_1726338676643.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_2_en_5.5.0_3.0_1726338676643.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_2","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_2","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
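
Token-level vectors can be inspected in the same session; a minimal sketch assuming `pipelineDF` from the example above:

```python
# One row per token: the token text and its contextual embedding.
pipelineDF.selectExpr("explode(embeddings) as tok") \
    .select("tok.result", "tok.embeddings") \
    .show(truncate=50)
```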
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/amirhamza11/my_awesome_eli5_mlm_model_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-finetuned_twitter_sentiment_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-finetuned_twitter_sentiment_roberta_pipeline_en.md new file mode 100644 index 00000000000000..5ec234eff2d5c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-finetuned_twitter_sentiment_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_twitter_sentiment_roberta_pipeline pipeline XlmRoBertaForSequenceClassification from coderSounak +author: John Snow Labs +name: finetuned_twitter_sentiment_roberta_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_twitter_sentiment_roberta_pipeline` is a English model originally trained by coderSounak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_twitter_sentiment_roberta_pipeline_en_5.5.0_3.0_1726318129570.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_twitter_sentiment_roberta_pipeline_en_5.5.0_3.0_1726318129570.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_twitter_sentiment_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_twitter_sentiment_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_twitter_sentiment_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/coderSounak/finetuned_twitter_sentiment_roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-greeklegalroberta_v4_en.md b/docs/_posts/ahmedlone127/2024-09-14-greeklegalroberta_v4_en.md new file mode 100644 index 00000000000000..d107aa5373c1cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-greeklegalroberta_v4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English greeklegalroberta_v4 RoBertaEmbeddings from AI-team-UoA +author: John Snow Labs +name: greeklegalroberta_v4 +date: 2024-09-14 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`greeklegalroberta_v4` is a English model originally trained by AI-team-UoA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/greeklegalroberta_v4_en_5.5.0_3.0_1726300053283.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/greeklegalroberta_v4_en_5.5.0_3.0_1726300053283.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("greeklegalroberta_v4","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("greeklegalroberta_v4","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|greeklegalroberta_v4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|464.6 MB| + +## References + +https://huggingface.co/AI-team-UoA/GreekLegalRoBERTa_v4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-iwslt17_marian_big_ctx2_cwd0_english_french_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-iwslt17_marian_big_ctx2_cwd0_english_french_pipeline_en.md new file mode 100644 index 00000000000000..6404e53778a825 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-iwslt17_marian_big_ctx2_cwd0_english_french_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English iwslt17_marian_big_ctx2_cwd0_english_french_pipeline pipeline MarianTransformer from context-mt +author: John Snow Labs +name: iwslt17_marian_big_ctx2_cwd0_english_french_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`iwslt17_marian_big_ctx2_cwd0_english_french_pipeline` is a English model originally trained by context-mt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/iwslt17_marian_big_ctx2_cwd0_english_french_pipeline_en_5.5.0_3.0_1726350872257.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/iwslt17_marian_big_ctx2_cwd0_english_french_pipeline_en_5.5.0_3.0_1726350872257.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("iwslt17_marian_big_ctx2_cwd0_english_french_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("iwslt17_marian_big_ctx2_cwd0_english_french_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
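
The example assumes a DataFrame `df`. Since the output column name of the bundled MarianTransformer is not listed here, a safe sketch is to build a `text` DataFrame (the input column name is an assumption) and print the schema to discover the annotation columns:

```python
# Hypothetical input; the DocumentAssembler inside the pipeline is assumed to read `text`.
df = spark.createDataFrame([["This is a short sentence to translate into French."]]).toDF("text")
annotations = pipeline.transform(df)
annotations.printSchema()  # reveals the translation output column produced by the pipeline
```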
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|iwslt17_marian_big_ctx2_cwd0_english_french_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/context-mt/iwslt17-marian-big-ctx2-cwd0-en-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-limiwhisper_small_korean_dia_gs_ko.md b/docs/_posts/ahmedlone127/2024-09-14-limiwhisper_small_korean_dia_gs_ko.md new file mode 100644 index 00000000000000..d561e9e38bbfe2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-limiwhisper_small_korean_dia_gs_ko.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Korean limiwhisper_small_korean_dia_gs WhisperForCTC from p4b +author: John Snow Labs +name: limiwhisper_small_korean_dia_gs +date: 2024-09-14 +tags: [ko, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`limiwhisper_small_korean_dia_gs` is a Korean model originally trained by p4b. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/limiwhisper_small_korean_dia_gs_ko_5.5.0_3.0_1726330725287.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/limiwhisper_small_korean_dia_gs_ko_5.5.0_3.0_1726330725287.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# `data` is assumed to be a DataFrame with an `audio_content` column of float samples.
audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("limiwhisper_small_korean_dia_gs","ko") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// `data` is assumed to be a DataFrame with an `audio_content` column of float samples.
val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("limiwhisper_small_korean_dia_gs", "ko")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
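
Neither snippet defines `data`. A hedged way to prepare it, assuming a 16 kHz mono recording and using librosa purely as an example loader, is:

```python
# Hypothetical audio-loading step; any loader that yields float samples works.
import librosa

waveform, _ = librosa.load("sample.wav", sr=16000)  # 16 kHz mono is an assumption
data = spark.createDataFrame([[waveform.tolist()]]).toDF("audio_content")
```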
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|limiwhisper_small_korean_dia_gs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ko| +|Size:|1.1 GB| + +## References + +https://huggingface.co/p4b/limiwhisper-small-ko-dia-gs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-medrurobertalarge_sayula_popoluca_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-medrurobertalarge_sayula_popoluca_pipeline_en.md new file mode 100644 index 00000000000000..98eb9cbeb206e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-medrurobertalarge_sayula_popoluca_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English medrurobertalarge_sayula_popoluca_pipeline pipeline RoBertaForTokenClassification from DimasikKurd +author: John Snow Labs +name: medrurobertalarge_sayula_popoluca_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`medrurobertalarge_sayula_popoluca_pipeline` is a English model originally trained by DimasikKurd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/medrurobertalarge_sayula_popoluca_pipeline_en_5.5.0_3.0_1726315195837.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/medrurobertalarge_sayula_popoluca_pipeline_en_5.5.0_3.0_1726315195837.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("medrurobertalarge_sayula_popoluca_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("medrurobertalarge_sayula_popoluca_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|medrurobertalarge_sayula_popoluca_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/DimasikKurd/MedRuRobertaLarge_pos + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-mobilebert_sst2_en.md b/docs/_posts/ahmedlone127/2024-09-14-mobilebert_sst2_en.md new file mode 100644 index 00000000000000..f28fcbc5a5bf8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-mobilebert_sst2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mobilebert_sst2 BertForSequenceClassification from Alireza1044 +author: John Snow Labs +name: mobilebert_sst2 +date: 2024-09-14 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mobilebert_sst2` is a English model originally trained by Alireza1044. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mobilebert_sst2_en_5.5.0_3.0_1726348524905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mobilebert_sst2_en_5.5.0_3.0_1726348524905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("mobilebert_sst2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("mobilebert_sst2", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
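
To see the predicted SST-2 label together with any exported scores, a minimal sketch (assuming `pipelineDF` from the example above):

```python
# The label is in `class.result`; per-class scores, when the model exports them, appear in `class.metadata`.
pipelineDF.select("class.result", "class.metadata").show(truncate=False)
```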
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mobilebert_sst2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|92.5 MB| + +## References + +https://huggingface.co/Alireza1044/mobilebert_sst2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-robertabase_ppt_occitan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-robertabase_ppt_occitan_pipeline_en.md new file mode 100644 index 00000000000000..488bd14bce116a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-robertabase_ppt_occitan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robertabase_ppt_occitan_pipeline pipeline RoBertaEmbeddings from mehrshadk +author: John Snow Labs +name: robertabase_ppt_occitan_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertabase_ppt_occitan_pipeline` is a English model originally trained by mehrshadk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertabase_ppt_occitan_pipeline_en_5.5.0_3.0_1726338440401.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertabase_ppt_occitan_pipeline_en_5.5.0_3.0_1726338440401.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robertabase_ppt_occitan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robertabase_ppt_occitan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertabase_ppt_occitan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.8 MB| + +## References + +https://huggingface.co/mehrshadk/robertaBase_ppt_OC + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-sent_bert_base_multilingual_cased_finetuned_luganda_xx.md b/docs/_posts/ahmedlone127/2024-09-14-sent_bert_base_multilingual_cased_finetuned_luganda_xx.md new file mode 100644 index 00000000000000..98dd009d182b80 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-sent_bert_base_multilingual_cased_finetuned_luganda_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual sent_bert_base_multilingual_cased_finetuned_luganda BertSentenceEmbeddings from Davlan +author: John Snow Labs +name: sent_bert_base_multilingual_cased_finetuned_luganda +date: 2024-09-14 +tags: [xx, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_multilingual_cased_finetuned_luganda` is a Multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_luganda_xx_5.5.0_3.0_1726310900295.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_luganda_xx_5.5.0_3.0_1726310900295.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_multilingual_cased_finetuned_luganda","xx") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_multilingual_cased_finetuned_luganda","xx") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
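
Sentence vectors from this model can be compared directly. The sketch below computes a cosine similarity between two single-sentence inputs; it assumes the fitted pipeline from the example above, NumPy, and that each input yields exactly one sentence annotation.

```python
import numpy as np

# Embed two sentences with the fitted pipeline from the example above.
pairs = spark.createDataFrame([["I love Spark NLP"], ["Spark NLP is great"]]).toDF("text")
rows = pipelineModel.transform(pairs) \
    .selectExpr("explode(embeddings) as e") \
    .select("e.embeddings") \
    .collect()
a, b = np.array(rows[0].embeddings), np.array(rows[1].embeddings)
print(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))  # cosine similarity
```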
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_multilingual_cased_finetuned_luganda| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|xx| +|Size:|665.0 MB| + +## References + +https://huggingface.co/Davlan/bert-base-multilingual-cased-finetuned-luganda \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_english_occupy1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_english_occupy1_pipeline_en.md new file mode 100644 index 00000000000000..7cacaeec662661 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_english_occupy1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_occupy1_pipeline pipeline XlmRoBertaForTokenClassification from occupy1 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_occupy1_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_occupy1_pipeline` is a English model originally trained by occupy1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_occupy1_pipeline_en_5.5.0_3.0_1726345485770.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_occupy1_pipeline_en_5.5.0_3.0_1726345485770.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_occupy1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_occupy1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_occupy1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/occupy1/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-bert_large2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-bert_large2_pipeline_en.md new file mode 100644 index 00000000000000..fef83d20a46464 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-bert_large2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_large2_pipeline pipeline RoBertaForSequenceClassification from RogerKam +author: John Snow Labs +name: bert_large2_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large2_pipeline` is a English model originally trained by RogerKam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large2_pipeline_en_5.5.0_3.0_1726401399024.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large2_pipeline_en_5.5.0_3.0_1726401399024.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_large2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_large2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|434.3 MB| + +## References + +https://huggingface.co/RogerKam/BERT-large2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-bsc_bio_ehr_spanish_nubes_es.md b/docs/_posts/ahmedlone127/2024-09-15-bsc_bio_ehr_spanish_nubes_es.md new file mode 100644 index 00000000000000..83082d68568412 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-bsc_bio_ehr_spanish_nubes_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_nubes RoBertaForTokenClassification from IIC +author: John Snow Labs +name: bsc_bio_ehr_spanish_nubes +date: 2024-09-15 +tags: [es, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_nubes` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_nubes_es_5.5.0_3.0_1726403494287.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_nubes_es_5.5.0_3.0_1726403494287.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_nubes","es") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_nubes", "es")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
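
A minimal way to pair each token with its predicted tag, assuming `pipelineDF` from the example above:

```python
# Token text alongside its predicted label from the `ner` column.
pipelineDF.select("token.result", "ner.result").show(truncate=False)
```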
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_nubes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|434.7 MB| + +## References + +https://huggingface.co/IIC/bsc-bio-ehr-es-nubes \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_faaany_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_faaany_pipeline_en.md new file mode 100644 index 00000000000000..ec1fd58d20f72f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_faaany_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_faaany_pipeline pipeline DistilBertForQuestionAnswering from faaany +author: John Snow Labs +name: burmese_awesome_qa_model_faaany_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_faaany_pipeline` is a English model originally trained by faaany. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_faaany_pipeline_en_5.5.0_3.0_1726382877012.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_faaany_pipeline_en_5.5.0_3.0_1726382877012.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_faaany_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_faaany_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_faaany_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/faaany/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_adlv_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_adlv_pipeline_en.md new file mode 100644 index 00000000000000..cc78f2f7a58241 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_adlv_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_adlv_pipeline pipeline DistilBertForSequenceClassification from adlv +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_adlv_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_adlv_pipeline` is a English model originally trained by adlv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_adlv_pipeline_en_5.5.0_3.0_1726366204215.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_adlv_pipeline_en_5.5.0_3.0_1726366204215.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_adlv_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_adlv_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
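+
+The snippet above assumes an existing Spark DataFrame `df`. Mirroring the full pipeline examples elsewhere in these docs, the input is assumed to be a `text` column of raw strings; the sketch below builds such a DataFrame and inspects the pipeline's output schema rather than guessing output column names.
+
+```python
+# Hypothetical input DataFrame for the pretrained pipeline above; the sample
+# sentences and the "text" column name are assumptions based on the other
+# examples in these docs.
+df = spark.createDataFrame(
+    [["I am thrilled with these results"], ["This delay is really frustrating"]]
+).toDF("text")
+
+annotations = pipeline.transform(df)
+annotations.printSchema()  # inspect which annotation columns the pipeline adds
+```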
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_adlv_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/adlv/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_ashkanero_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_ashkanero_pipeline_en.md new file mode 100644 index 00000000000000..efc3fae752199d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_ashkanero_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ashkanero_pipeline pipeline DistilBertForSequenceClassification from Ashkanero +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ashkanero_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ashkanero_pipeline` is a English model originally trained by Ashkanero. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ashkanero_pipeline_en_5.5.0_3.0_1726365846427.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ashkanero_pipeline_en_5.5.0_3.0_1726365846427.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ashkanero_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ashkanero_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ashkanero_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Ashkanero/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-fine_tuned_resume_model_en.md b/docs/_posts/ahmedlone127/2024-09-15-fine_tuned_resume_model_en.md new file mode 100644 index 00000000000000..15c3831361c68c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-fine_tuned_resume_model_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English fine_tuned_resume_model DistilBertForSequenceClassification from Invimatic +author: John Snow Labs +name: fine_tuned_resume_model +date: 2024-09-15 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_resume_model` is a English model originally trained by Invimatic. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_resume_model_en_5.5.0_3.0_1726385216113.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_resume_model_en_5.5.0_3.0_1726385216113.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+document_assembler = DocumentAssembler()\
+    .setInputCol("text")\
+    .setOutputCol("document")
+
+tokenizer = Tokenizer()\
+    .setInputCols("document")\
+    .setOutputCol("token")
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("fine_tuned_resume_model","en")\
+    .setInputCols(["document","token"])\
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("fine_tuned_resume_model","en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_resume_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.4 MB| + +## References + +References + +https://huggingface.co/Invimatic/fine_tuned_resume_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-mentalroberta_empai_final3_en.md b/docs/_posts/ahmedlone127/2024-09-15-mentalroberta_empai_final3_en.md new file mode 100644 index 00000000000000..3204a7f1b7a89e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-mentalroberta_empai_final3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mentalroberta_empai_final3 RoBertaEmbeddings from LuangMV97 +author: John Snow Labs +name: mentalroberta_empai_final3 +date: 2024-09-15 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mentalroberta_empai_final3` is a English model originally trained by LuangMV97. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mentalroberta_empai_final3_en_5.5.0_3.0_1726413205588.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mentalroberta_empai_final3_en_5.5.0_3.0_1726413205588.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("mentalroberta_empai_final3","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("mentalroberta_empai_final3","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
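+
+Each row of the `embeddings` column is an array of token-level annotations whose vectors live in the annotation's `embeddings` field. A small sketch for checking the tokens and the vector dimensionality, assuming `pipelineDF` from the example above:
+
+```python
+# Inspect token text and embedding size; assumes `pipelineDF` from the snippet above.
+pipelineDF.selectExpr("explode(embeddings) as emb") \
+    .selectExpr("emb.result as token", "size(emb.embeddings) as dimensions") \
+    .show(truncate=False)
+```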
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mentalroberta_empai_final3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/LuangMV97/MentalRoBERTa_EmpAI_final3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_base_lora_591k_squad_model3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_base_lora_591k_squad_model3_pipeline_en.md new file mode 100644 index 00000000000000..71ba736db37112 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_base_lora_591k_squad_model3_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_base_lora_591k_squad_model3_pipeline pipeline RoBertaForQuestionAnswering from varun-v-rao +author: John Snow Labs +name: roberta_base_lora_591k_squad_model3_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_lora_591k_squad_model3_pipeline` is a English model originally trained by varun-v-rao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_lora_591k_squad_model3_pipeline_en_5.5.0_3.0_1726369290494.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_lora_591k_squad_model3_pipeline_en_5.5.0_3.0_1726369290494.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_lora_591k_squad_model3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_lora_591k_squad_model3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_lora_591k_squad_model3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|317.4 MB| + +## References + +https://huggingface.co/varun-v-rao/roberta-base-lora-591K-squad-model3 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_large_finetuned_mrpc_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_large_finetuned_mrpc_en.md new file mode 100644 index 00000000000000..8e6930db08b459 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_large_finetuned_mrpc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_finetuned_mrpc RoBertaForSequenceClassification from VitaliiVrublevskyi +author: John Snow Labs +name: roberta_large_finetuned_mrpc +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_mrpc` is a English model originally trained by VitaliiVrublevskyi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_mrpc_en_5.5.0_3.0_1726401680278.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_mrpc_en_5.5.0_3.0_1726401680278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_finetuned_mrpc","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_finetuned_mrpc", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_mrpc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/VitaliiVrublevskyi/roberta-large-finetuned-mrpc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-schem_roberta_demographic_text_disagreement_predictor_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-schem_roberta_demographic_text_disagreement_predictor_pipeline_en.md new file mode 100644 index 00000000000000..fa9b5537d9daa1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-schem_roberta_demographic_text_disagreement_predictor_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English schem_roberta_demographic_text_disagreement_predictor_pipeline pipeline RoBertaForSequenceClassification from RuyuanWan +author: John Snow Labs +name: schem_roberta_demographic_text_disagreement_predictor_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`schem_roberta_demographic_text_disagreement_predictor_pipeline` is a English model originally trained by RuyuanWan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/schem_roberta_demographic_text_disagreement_predictor_pipeline_en_5.5.0_3.0_1726401544764.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/schem_roberta_demographic_text_disagreement_predictor_pipeline_en_5.5.0_3.0_1726401544764.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("schem_roberta_demographic_text_disagreement_predictor_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("schem_roberta_demographic_text_disagreement_predictor_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|schem_roberta_demographic_text_disagreement_predictor_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|422.8 MB| + +## References + +https://huggingface.co/RuyuanWan/SChem_RoBERTa_Demographic-text_Disagreement_Predictor + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_10lang_cased_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_10lang_cased_pipeline_xx.md new file mode 100644 index 00000000000000..df157e08c940ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_10lang_cased_pipeline_xx.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Multilingual sent_bert_base_10lang_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_10lang_cased_pipeline +date: 2024-09-15 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_10lang_cased_pipeline` is a Multilingual model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_10lang_cased_pipeline_xx_5.5.0_3.0_1726436556417.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_10lang_cased_pipeline_xx_5.5.0_3.0_1726436556417.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_10lang_cased_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_10lang_cased_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_10lang_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|514.9 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-10lang-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_german_tatsunori_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_german_tatsunori_pipeline_en.md new file mode 100644 index 00000000000000..a596a36832e2e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_german_tatsunori_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_tatsunori_pipeline pipeline XlmRoBertaForTokenClassification from tatsunori +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_tatsunori_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_tatsunori_pipeline` is a English model originally trained by tatsunori. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_tatsunori_pipeline_en_5.5.0_3.0_1726370327393.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_tatsunori_pipeline_en_5.5.0_3.0_1726370327393.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_tatsunori_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_tatsunori_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
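+
+Besides `transform` on a DataFrame, a `PretrainedPipeline` can be exercised on a single string with `annotate`, which returns a plain dict keyed by the pipeline's output columns. The sample sentence below and the exact key names are assumptions; printing the keys shows what this particular pipeline exposes.
+
+```python
+# Quick single-string check; the German sample text is purely illustrative.
+result = pipeline.annotate("John Snow Labs ist ein Unternehmen mit Sitz in Delaware.")
+print(result.keys())  # discover the output names (e.g. tokens, NER tags)
+print(result)
+```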
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_tatsunori_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/tatsunori/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-2nddeproberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-2nddeproberta_pipeline_en.md new file mode 100644 index 00000000000000..3c8b44bedc154f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-2nddeproberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2nddeproberta_pipeline pipeline RoBertaForSequenceClassification from ericNguyen0132 +author: John Snow Labs +name: 2nddeproberta_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2nddeproberta_pipeline` is a English model originally trained by ericNguyen0132. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2nddeproberta_pipeline_en_5.5.0_3.0_1726470999355.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2nddeproberta_pipeline_en_5.5.0_3.0_1726470999355.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("2nddeproberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("2nddeproberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2nddeproberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ericNguyen0132/2ndDepRoBERTa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bert_base_tweet_topic_classification_en.md b/docs/_posts/ahmedlone127/2024-09-16-bert_base_tweet_topic_classification_en.md new file mode 100644 index 00000000000000..8c7c9b9c06bb5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bert_base_tweet_topic_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_tweet_topic_classification BertForSequenceClassification from GeeDino +author: John Snow Labs +name: bert_base_tweet_topic_classification +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_tweet_topic_classification` is a English model originally trained by GeeDino. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_tweet_topic_classification_en_5.5.0_3.0_1726499150857.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_tweet_topic_classification_en_5.5.0_3.0_1726499150857.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_tweet_topic_classification","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_tweet_topic_classification", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_tweet_topic_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|627.8 MB| + +## References + +https://huggingface.co/GeeDino/bert-base-tweet-topic-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_bibibobo777_en.md b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_bibibobo777_en.md new file mode 100644 index 00000000000000..e4305a2a2a0d46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-burmese_awesome_qa_model_bibibobo777_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_bibibobo777 DistilBertForQuestionAnswering from bibibobo777 +author: John Snow Labs +name: burmese_awesome_qa_model_bibibobo777 +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_bibibobo777` is a English model originally trained by bibibobo777. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_bibibobo777_en_5.5.0_3.0_1726469593410.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_bibibobo777_en_5.5.0_3.0_1726469593410.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_bibibobo777","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_bibibobo777", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDS.toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
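+
+The predicted answer span ends up in the `answer` column's `result` field. Assuming `pipelineDF` from the example above:
+
+```python
+# Show the extracted answer for each question/context pair.
+pipelineDF.selectExpr("question", "explode(answer.result) as predicted_answer").show(truncate=False)
+```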
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_bibibobo777| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/bibibobo777/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_cased_distilled_squad_bloomlonely_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_cased_distilled_squad_bloomlonely_pipeline_en.md new file mode 100644 index 00000000000000..219ac9c6806b45 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_cased_distilled_squad_bloomlonely_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_cased_distilled_squad_bloomlonely_pipeline pipeline DistilBertForQuestionAnswering from BloomLonely +author: John Snow Labs +name: distilbert_base_cased_distilled_squad_bloomlonely_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_distilled_squad_bloomlonely_pipeline` is a English model originally trained by BloomLonely. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_distilled_squad_bloomlonely_pipeline_en_5.5.0_3.0_1726515569100.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_distilled_squad_bloomlonely_pipeline_en_5.5.0_3.0_1726515569100.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_cased_distilled_squad_bloomlonely_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_cased_distilled_squad_bloomlonely_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_distilled_squad_bloomlonely_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/BloomLonely/distilbert-base-cased-distilled-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_detect_ai_generated_text_lau123_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_detect_ai_generated_text_lau123_pipeline_en.md new file mode 100644 index 00000000000000..1a8f84bb8b53a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_detect_ai_generated_text_lau123_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_detect_ai_generated_text_lau123_pipeline pipeline DistilBertForSequenceClassification from Lau123 +author: John Snow Labs +name: distilbert_base_uncased_detect_ai_generated_text_lau123_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_detect_ai_generated_text_lau123_pipeline` is a English model originally trained by Lau123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_detect_ai_generated_text_lau123_pipeline_en_5.5.0_3.0_1726525696933.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_detect_ai_generated_text_lau123_pipeline_en_5.5.0_3.0_1726525696933.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_detect_ai_generated_text_lau123_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_detect_ai_generated_text_lau123_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_detect_ai_generated_text_lau123_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Lau123/distilbert-base-uncased-detect_ai_generated_text + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_distilled_clinc_jeongyeom_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_distilled_clinc_jeongyeom_pipeline_en.md new file mode 100644 index 00000000000000..93d2857cb0ae42 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_distilled_clinc_jeongyeom_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_jeongyeom_pipeline pipeline DistilBertForSequenceClassification from jeongyeom +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_jeongyeom_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_jeongyeom_pipeline` is a English model originally trained by jeongyeom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_jeongyeom_pipeline_en_5.5.0_3.0_1726525589020.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_jeongyeom_pipeline_en_5.5.0_3.0_1726525589020.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_distilled_clinc_jeongyeom_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_distilled_clinc_jeongyeom_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_jeongyeom_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/jeongyeom/distilbert-base-uncased-distilled-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_suthanhcong_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_suthanhcong_pipeline_en.md new file mode 100644 index 00000000000000..bc83547a56c7fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_suthanhcong_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_suthanhcong_pipeline pipeline DistilBertForQuestionAnswering from suthanhcong +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_suthanhcong_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_suthanhcong_pipeline` is a English model originally trained by suthanhcong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_suthanhcong_pipeline_en_5.5.0_3.0_1726515559291.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_suthanhcong_pipeline_en_5.5.0_3.0_1726515559291.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_suthanhcong_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_suthanhcong_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_suthanhcong_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/suthanhcong/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squad2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squad2_pipeline_en.md new file mode 100644 index 00000000000000..9609e47159371a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_finetuned_squad2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_finetuned_squad2_pipeline pipeline DistilBertForQuestionAnswering from NMCxyz +author: John Snow Labs +name: distilbert_finetuned_squad2_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_squad2_pipeline` is a English model originally trained by NMCxyz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squad2_pipeline_en_5.5.0_3.0_1726515153825.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_squad2_pipeline_en_5.5.0_3.0_1726515153825.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_finetuned_squad2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_finetuned_squad2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_squad2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/NMCxyz/distilbert-finetuned-squad2 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_sanskrit_saskta_glue_experiment_qnli_96_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_sanskrit_saskta_glue_experiment_qnli_96_en.md new file mode 100644 index 00000000000000..d8e1205195af50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_sanskrit_saskta_glue_experiment_qnli_96_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_qnli_96 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_qnli_96 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_qnli_96` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_qnli_96_en_5.5.0_3.0_1726525464435.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_qnli_96_en_5.5.0_3.0_1726525464435.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_qnli_96","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_qnli_96", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
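+
+To read off the predicted label per input row, assuming `pipelineDF` and the `class` output column defined above:
+
+```python
+# Pair each input text with its predicted label from the "class" annotation column.
+pipelineDF.selectExpr("text", "`class`.result as predicted_label").show(truncate=False)
+```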
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_qnli_96| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|25.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_qnli_96 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-hecone_pipeline_he.md b/docs/_posts/ahmedlone127/2024-09-16-hecone_pipeline_he.md new file mode 100644 index 00000000000000..ad59ac51cd1ce4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-hecone_pipeline_he.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Hebrew hecone_pipeline pipeline RoBertaForTokenClassification from HeTree +author: John Snow Labs +name: hecone_pipeline +date: 2024-09-16 +tags: [he, open_source, pipeline, onnx] +task: Named Entity Recognition +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hecone_pipeline` is a Hebrew model originally trained by HeTree. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hecone_pipeline_he_5.5.0_3.0_1726452841616.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hecone_pipeline_he_5.5.0_3.0_1726452841616.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hecone_pipeline", lang = "he") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hecone_pipeline", lang = "he") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hecone_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|he| +|Size:|466.0 MB| + +## References + +https://huggingface.co/HeTree/HeConE + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_strict_2023_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_strict_2023_pipeline_en.md new file mode 100644 index 00000000000000..d022eda06d359b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_strict_2023_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_strict_2023_pipeline pipeline RoBertaEmbeddings from babylm +author: John Snow Labs +name: roberta_base_strict_2023_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_strict_2023_pipeline` is a English model originally trained by babylm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_strict_2023_pipeline_en_5.5.0_3.0_1726513672725.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_strict_2023_pipeline_en_5.5.0_3.0_1726513672725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_strict_2023_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_strict_2023_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_strict_2023_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.5 MB| + +## References + +https://huggingface.co/babylm/roberta-base-strict-2023 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_model_babylm_challenge_strict_small_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_model_babylm_challenge_strict_small_en.md new file mode 100644 index 00000000000000..9175f8fc7fb685 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_model_babylm_challenge_strict_small_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_model_babylm_challenge_strict_small RoBertaEmbeddings from TheBguy87 +author: John Snow Labs +name: roberta_model_babylm_challenge_strict_small +date: 2024-09-16 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_model_babylm_challenge_strict_small` is a English model originally trained by TheBguy87. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_model_babylm_challenge_strict_small_en_5.5.0_3.0_1726513849160.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_model_babylm_challenge_strict_small_en_5.5.0_3.0_1726513849160.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_model_babylm_challenge_strict_small","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_model_babylm_challenge_strict_small","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_model_babylm_challenge_strict_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.5 MB| + +## References + +https://huggingface.co/TheBguy87/roBERTa-Model-BabyLM-Challenge-Strict-Small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-sent_clinical_pubmed_bert_base_512_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-sent_clinical_pubmed_bert_base_512_pipeline_en.md new file mode 100644 index 00000000000000..aaa22ac48adc56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-sent_clinical_pubmed_bert_base_512_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_clinical_pubmed_bert_base_512_pipeline pipeline BertSentenceEmbeddings from Tsubasaz +author: John Snow Labs +name: sent_clinical_pubmed_bert_base_512_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_clinical_pubmed_bert_base_512_pipeline` is a English model originally trained by Tsubasaz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_clinical_pubmed_bert_base_512_pipeline_en_5.5.0_3.0_1726501123685.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_clinical_pubmed_bert_base_512_pipeline_en_5.5.0_3.0_1726501123685.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_clinical_pubmed_bert_base_512_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_clinical_pubmed_bert_base_512_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
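+
+The `df` referenced above is not constructed in the snippet. A minimal sketch, assuming an active Spark NLP session and that this sentence-embeddings pipeline reads raw text from a column named `text` (an assumption):
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+
+# Hypothetical clinical sentence used purely for illustration.
+df = spark.createDataFrame([["The patient was discharged on oral antibiotics."]]).toDF("text")
+
+pipeline = PretrainedPipeline("sent_clinical_pubmed_bert_base_512_pipeline", lang="en")
+pipeline.transform(df).printSchema()  # inspect the sentence-embedding column the pipeline adds
+```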
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_clinical_pubmed_bert_base_512_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.6 MB| + +## References + +https://huggingface.co/Tsubasaz/clinical-pubmed-bert-base-512 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-sentiment_analysis_on_covid_tweets_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-sentiment_analysis_on_covid_tweets_pipeline_en.md new file mode 100644 index 00000000000000..df3e5ff0e1772c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-sentiment_analysis_on_covid_tweets_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_on_covid_tweets_pipeline pipeline RoBertaForSequenceClassification from AmpomahChief +author: John Snow Labs +name: sentiment_analysis_on_covid_tweets_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_on_covid_tweets_pipeline` is a English model originally trained by AmpomahChief. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_on_covid_tweets_pipeline_en_5.5.0_3.0_1726456086917.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_on_covid_tweets_pipeline_en_5.5.0_3.0_1726456086917.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_analysis_on_covid_tweets_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_analysis_on_covid_tweets_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
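+
+A minimal sketch of how the missing `df` could be built and how a single tweet can be scored directly, assuming an active Spark NLP session and a text input column named `text` (both assumptions):
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+
+pipeline = PretrainedPipeline("sentiment_analysis_on_covid_tweets_pipeline", lang="en")
+
+# For quick, driver-side checks, PretrainedPipeline also exposes annotate(); the tweet is made up.
+print(pipeline.annotate("Got my second vaccine dose today, feeling hopeful!"))
+
+# For DataFrame input, provide the text in a "text" column (an assumption about the first stage).
+df = spark.createDataFrame([["Lockdown extended again, this is exhausting."]]).toDF("text")
+annotations = pipeline.transform(df)
+```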
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_on_covid_tweets_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/AmpomahChief/sentiment_analysis_on_covid_tweets + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-translator_en.md b/docs/_posts/ahmedlone127/2024-09-16-translator_en.md new file mode 100644 index 00000000000000..5bb87dd4d893cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-translator_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English translator MarianTransformer from motmans-pj +author: John Snow Labs +name: translator +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`translator` is a English model originally trained by motmans-pj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/translator_en_5.5.0_3.0_1726491520646.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/translator_en_5.5.0_3.0_1726491520646.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("translator","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("translator","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|translator| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|548.9 MB| + +## References + +https://huggingface.co/motmans-pj/translator \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_small_seiching_zh.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_seiching_zh.md new file mode 100644 index 00000000000000..b60736b3dfe19d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_small_seiching_zh.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Chinese whisper_small_seiching WhisperForCTC from seiching +author: John Snow Labs +name: whisper_small_seiching +date: 2024-09-16 +tags: [zh, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_seiching` is a Chinese model originally trained by seiching. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_seiching_zh_5.5.0_3.0_1726478635840.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_seiching_zh_5.5.0_3.0_1726478635840.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_seiching","zh") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+# "data" must hold raw audio samples in an "audio_content" column;
+# a placeholder second of 16 kHz silence is used here for illustration.
+data = spark.createDataFrame([([0.0] * 16000,)], ["audio_content"])
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_seiching", "zh")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+// "data" is expected to be a DataFrame with raw audio samples in an "audio_content" column
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_seiching| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|zh| +|Size:|1.7 GB| + +## References + +https://huggingface.co/seiching/whisper-small-seiching \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_polyai_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_polyai_pipeline_en.md new file mode 100644 index 00000000000000..111417edbaea79 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_polyai_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_polyai_pipeline pipeline WhisperForCTC from giocs2017 +author: John Snow Labs +name: whisper_tiny_polyai_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_polyai_pipeline` is a English model originally trained by giocs2017. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_polyai_pipeline_en_5.5.0_3.0_1726485521296.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_polyai_pipeline_en_5.5.0_3.0_1726485521296.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_polyai_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_polyai_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
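+
+The `df` above is never defined, and for an ASR pipeline it needs raw audio samples rather than text. A minimal sketch, assuming the included AudioAssembler reads a float-array column named `audio_content` (the convention used by the stand-alone Whisper examples in these docs):
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+
+# Placeholder audio: one second of 16 kHz silence; replace with real decoded samples.
+samples = [0.0] * 16000
+df = spark.createDataFrame([(samples,)], ["audio_content"])
+
+pipeline = PretrainedPipeline("whisper_tiny_polyai_pipeline", lang="en")
+pipeline.transform(df).printSchema()  # the transcription appears in one of the added columns
+```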
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_polyai_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/giocs2017/whisper-tiny-polyai + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_fyl1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_fyl1_pipeline_en.md new file mode 100644 index 00000000000000..a8778ce80cddf4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_fyl1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_fyl1_pipeline pipeline XlmRoBertaForTokenClassification from fyl1 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_fyl1_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_fyl1_pipeline` is a English model originally trained by fyl1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_fyl1_pipeline_en_5.5.0_3.0_1726496262153.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_fyl1_pipeline_en_5.5.0_3.0_1726496262153.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_fyl1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_fyl1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
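+
+A minimal sketch of the missing `df`, assuming an active Spark NLP session and that this NER pipeline reads text from a column named `text` (an assumption); the German sentence is a made-up example in the spirit of PAN-X:
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+
+df = spark.createDataFrame([["Angela Merkel besuchte das Brandenburger Tor in Berlin."]]).toDF("text")
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_fyl1_pipeline", lang="en")
+pipeline.transform(df).printSchema()  # the token-level predictions appear in one of the added columns
+```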
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_fyl1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/fyl1/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-yt_special_batch8_tiny_en.md b/docs/_posts/ahmedlone127/2024-09-16-yt_special_batch8_tiny_en.md new file mode 100644 index 00000000000000..73d3f71d3e3138 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-yt_special_batch8_tiny_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English yt_special_batch8_tiny WhisperForCTC from TheRains +author: John Snow Labs +name: yt_special_batch8_tiny +date: 2024-09-16 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`yt_special_batch8_tiny` is a English model originally trained by TheRains. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/yt_special_batch8_tiny_en_5.5.0_3.0_1726483691434.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/yt_special_batch8_tiny_en_5.5.0_3.0_1726483691434.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("yt_special_batch8_tiny","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+# "data" must hold raw audio samples in an "audio_content" column;
+# a placeholder second of 16 kHz silence is used here for illustration.
+data = spark.createDataFrame([([0.0] * 16000,)], ["audio_content"])
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("yt_special_batch8_tiny", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+// "data" is expected to be a DataFrame with raw audio samples in an "audio_content" column
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|yt_special_batch8_tiny| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.8 MB| + +## References + +https://huggingface.co/TheRains/yt-special-batch8-tiny \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_pipeline_en.md new file mode 100644 index 00000000000000..d2ec89733a66e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_pipeline_en_5.5.0_3.0_1726545199415.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_pipeline_en_5.5.0_3.0_1726545199415.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_220240904182329_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.220240904182329 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_ep_10_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_1000_en.md b/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_ep_10_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_1000_en.md new file mode 100644 index 00000000000000..b7d6543d5cd6cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_ep_10_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_1000_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_10_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_1000 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_10_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_1000 +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_10_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_1000` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_10_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_1000_en_5.5.0_3.0_1726567649974.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_10_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_1000_en_5.5.0_3.0_1726567649974.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_10_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_1000","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_10_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_1000", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
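+
+Once the snippet above has run, the prediction can be read back out of the `answer` column it produced; a small follow-up sketch:
+
+```python
+# The predicted answer span is the `result` field of the "answer" annotations.
+pipelineDF.selectExpr("explode(answer.result) AS predicted_answer").show(truncate=False)
+```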
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_10_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-10.0-b-32-lr-8e-07-dp-0.5-ss-0-st-False-fh-False-hs-1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_qnli_modeltc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_qnli_modeltc_pipeline_en.md new file mode 100644 index 00000000000000..950b3dee50c23e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_qnli_modeltc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_qnli_modeltc_pipeline pipeline BertForSequenceClassification from ModelTC +author: John Snow Labs +name: bert_base_uncased_qnli_modeltc_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_qnli_modeltc_pipeline` is a English model originally trained by ModelTC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qnli_modeltc_pipeline_en_5.5.0_3.0_1726604380481.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qnli_modeltc_pipeline_en_5.5.0_3.0_1726604380481.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_qnli_modeltc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_qnli_modeltc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
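+
+A minimal sketch of the missing `df`, assuming an active Spark NLP session and a text input column named `text`. QNLI pairs a question with a candidate answer sentence; passing both as one string is an assumption about how this single-text pipeline expects its input:
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+
+df = spark.createDataFrame(
+    [["What is the capital of France? Paris is the capital and largest city of France."]]
+).toDF("text")
+
+pipeline = PretrainedPipeline("bert_base_uncased_qnli_modeltc_pipeline", lang="en")
+pipeline.transform(df).printSchema()
+```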
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_qnli_modeltc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ModelTC/bert-base-uncased-qnli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_model_katxtong_en.md b/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_model_katxtong_en.md new file mode 100644 index 00000000000000..6965434c838498 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_model_katxtong_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_model_katxtong DistilBertForQuestionAnswering from katxtong +author: John Snow Labs +name: burmese_awesome_model_katxtong +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_katxtong` is a English model originally trained by katxtong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_katxtong_en_5.5.0_3.0_1726575010959.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_katxtong_en_5.5.0_3.0_1726575010959.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_model_katxtong","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_model_katxtong", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_katxtong| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/katxtong/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_qa_model_meziane_en.md b/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_qa_model_meziane_en.md new file mode 100644 index 00000000000000..5d796225e71b15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_qa_model_meziane_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_meziane DistilBertForQuestionAnswering from Meziane +author: John Snow Labs +name: burmese_awesome_qa_model_meziane +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_meziane` is a English model originally trained by Meziane. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_meziane_en_5.5.0_3.0_1726600018002.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_meziane_en_5.5.0_3.0_1726600018002.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_meziane","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_meziane", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_meziane| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Meziane/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-clasificador_onestop_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-clasificador_onestop_english_pipeline_en.md new file mode 100644 index 00000000000000..f3c68cf97b4e66 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-clasificador_onestop_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English clasificador_onestop_english_pipeline pipeline AlbertForSequenceClassification from algomet +author: John Snow Labs +name: clasificador_onestop_english_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clasificador_onestop_english_pipeline` is a English model originally trained by algomet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clasificador_onestop_english_pipeline_en_5.5.0_3.0_1726601102539.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clasificador_onestop_english_pipeline_en_5.5.0_3.0_1726601102539.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("clasificador_onestop_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("clasificador_onestop_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
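+
+A minimal sketch of the missing `df`, assuming an active Spark NLP session and that this readability classifier reads text from a column named `text` (an assumption); the sentence is a made-up example:
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+
+df = spark.createDataFrame([["The government announced a new policy to reduce emissions."]]).toDF("text")
+
+pipeline = PretrainedPipeline("clasificador_onestop_english_pipeline", lang="en")
+pipeline.transform(df).printSchema()  # inspect the class column the pipeline adds
+```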
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clasificador_onestop_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/algomet/clasificador-onestop-english + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-classifier__generated_data_only__meansdetection_albert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-classifier__generated_data_only__meansdetection_albert_pipeline_en.md new file mode 100644 index 00000000000000..d4328c307072aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-classifier__generated_data_only__meansdetection_albert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English classifier__generated_data_only__meansdetection_albert_pipeline pipeline AlbertForSequenceClassification from yevhenkost +author: John Snow Labs +name: classifier__generated_data_only__meansdetection_albert_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classifier__generated_data_only__meansdetection_albert_pipeline` is a English model originally trained by yevhenkost. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classifier__generated_data_only__meansdetection_albert_pipeline_en_5.5.0_3.0_1726600527951.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classifier__generated_data_only__meansdetection_albert_pipeline_en_5.5.0_3.0_1726600527951.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("classifier__generated_data_only__meansdetection_albert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("classifier__generated_data_only__meansdetection_albert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
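+
+A minimal sketch of the missing `df`, assuming an active Spark NLP session and a text input column named `text` (an assumption); the sentence is a made-up example:
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+
+df = spark.createDataFrame([["We reached the goal by automating the review process."]]).toDF("text")
+
+pipeline = PretrainedPipeline("classifier__generated_data_only__meansdetection_albert_pipeline", lang="en")
+pipeline.transform(df).printSchema()
+```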
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classifier__generated_data_only__meansdetection_albert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/yevhenkost/classifier__generated_data_only__meansdetection_albert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-covid_augment_tweet_roberta_large_e4_en.md b/docs/_posts/ahmedlone127/2024-09-17-covid_augment_tweet_roberta_large_e4_en.md new file mode 100644 index 00000000000000..0c82830b6cb289 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-covid_augment_tweet_roberta_large_e4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English covid_augment_tweet_roberta_large_e4 RoBertaForSequenceClassification from JerryYanJiang +author: John Snow Labs +name: covid_augment_tweet_roberta_large_e4 +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`covid_augment_tweet_roberta_large_e4` is a English model originally trained by JerryYanJiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/covid_augment_tweet_roberta_large_e4_en_5.5.0_3.0_1726591451910.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/covid_augment_tweet_roberta_large_e4_en_5.5.0_3.0_1726591451910.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("covid_augment_tweet_roberta_large_e4","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("covid_augment_tweet_roberta_large_e4", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
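+
+A small follow-up for reading the predicted label out of the `class` column created above, assuming the Python snippet has already run:
+
+```python
+# `class.result` holds the predicted label(s) for each input row.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```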
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|covid_augment_tweet_roberta_large_e4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/JerryYanJiang/covid-augment-tweet-roberta-large-e4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_clinc_seddiktrk_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_clinc_seddiktrk_en.md new file mode 100644 index 00000000000000..a28f31c0a65b08 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_clinc_seddiktrk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_seddiktrk DistilBertForSequenceClassification from seddiktrk +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_seddiktrk +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_seddiktrk` is a English model originally trained by seddiktrk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_seddiktrk_en_5.5.0_3.0_1726584476859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_seddiktrk_en_5.5.0_3.0_1726584476859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_seddiktrk","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_seddiktrk", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_seddiktrk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/seddiktrk/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_test1_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_test1_en.md new file mode 100644 index 00000000000000..732badf525d57a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_test1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_test1 DistilBertForQuestionAnswering from allistair99 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_test1 +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_test1` is a English model originally trained by allistair99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_test1_en_5.5.0_3.0_1726586457773.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_test1_en_5.5.0_3.0_1726586457773.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_test1","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_test1", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_test1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/allistair99/distilbert-base-uncased-finetuned-squad-test1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_transcripts_calls_avitalby_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_transcripts_calls_avitalby_en.md new file mode 100644 index 00000000000000..1671bcd269e41d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_transcripts_calls_avitalby_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_transcripts_calls_avitalby DistilBertForQuestionAnswering from AvitalBY +author: John Snow Labs +name: distilbert_base_uncased_finetuned_transcripts_calls_avitalby +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_transcripts_calls_avitalby` is a English model originally trained by AvitalBY. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_transcripts_calls_avitalby_en_5.5.0_3.0_1726599943235.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_transcripts_calls_avitalby_en_5.5.0_3.0_1726599943235.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_transcripts_calls_avitalby","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_transcripts_calls_avitalby", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_transcripts_calls_avitalby| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/AvitalBY/distilbert-base-uncased-finetuned-transcripts-calls \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilroberta_base_ft_tifu_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilroberta_base_ft_tifu_en.md new file mode 100644 index 00000000000000..edb0660fa6cc64 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilroberta_base_ft_tifu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_ft_tifu RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_tifu +date: 2024-09-17 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_tifu` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_tifu_en_5.5.0_3.0_1726602931400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_tifu_en_5.5.0_3.0_1726602931400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_tifu","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_tifu","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_tifu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.4 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-tifu \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-feedback_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-feedback_classification_pipeline_en.md new file mode 100644 index 00000000000000..d67bf7bbe811f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-feedback_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English feedback_classification_pipeline pipeline BertForSequenceClassification from Yousefmd +author: John Snow Labs +name: feedback_classification_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`feedback_classification_pipeline` is a English model originally trained by Yousefmd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/feedback_classification_pipeline_en_5.5.0_3.0_1726605205107.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/feedback_classification_pipeline_en_5.5.0_3.0_1726605205107.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("feedback_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("feedback_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
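The `df` referenced above is left undefined. Below is a minimal end-to-end sketch, assuming a running Spark NLP session; the sample feedback strings are illustrative, and the `class` output column name is inferred from the included `BertForSequenceClassification` stage rather than stated in the card.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
pipeline = PretrainedPipeline("feedback_classification_pipeline", lang="en")

# transform() expects a DataFrame with a "text" column
df = spark.createDataFrame(
    [["The delivery was late and nobody answered my emails."]]
).toDF("text")
pipeline.transform(df).select("text", "class.result").show(truncate=False)

# For a single string, annotate() skips the DataFrame round trip
print(pipeline.annotate("Great service, very happy with the product!"))
```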
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|feedback_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.4 GB| + +## References + +https://huggingface.co/Yousefmd/feedback-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-igbo_model_pipeline_ig.md b/docs/_posts/ahmedlone127/2024-09-17-igbo_model_pipeline_ig.md new file mode 100644 index 00000000000000..e972a3ced3e24d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-igbo_model_pipeline_ig.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Igbo igbo_model_pipeline pipeline XlmRoBertaForTokenClassification from ignatius +author: John Snow Labs +name: igbo_model_pipeline +date: 2024-09-17 +tags: [ig, open_source, pipeline, onnx] +task: Named Entity Recognition +language: ig +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`igbo_model_pipeline` is a Igbo model originally trained by ignatius. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/igbo_model_pipeline_ig_5.5.0_3.0_1726577158736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/igbo_model_pipeline_ig_5.5.0_3.0_1726577158736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("igbo_model_pipeline", lang = "ig") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("igbo_model_pipeline", lang = "ig") +val annotations = pipeline.transform(df) + +``` +
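For this NER pipeline, `df` is again assumed to be a DataFrame with a `text` column. The sketch below pairs tokens with their predicted tags; the `token` and `ner` column names are assumptions based on the Included Models list, and the Igbo sample sentence is only illustrative.

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("igbo_model_pipeline", lang="ig")

df = spark.createDataFrame([["Chinua Achebe dere Things Fall Apart."]]).toDF("text")
result = pipeline.transform(df)

# Zip each token with its predicted entity tag
result.selectExpr("explode(arrays_zip(token.result, ner.result)) as cols") \
      .selectExpr("cols['0'] as token", "cols['1'] as tag") \
      .show(truncate=False)
```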
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|igbo_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ig| +|Size:|443.2 MB| + +## References + +https://huggingface.co/ignatius/igbo_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-interlingua_multilingual_original_script_roberta_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-17-interlingua_multilingual_original_script_roberta_pipeline_xx.md new file mode 100644 index 00000000000000..c3ef1cd575056c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-interlingua_multilingual_original_script_roberta_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual interlingua_multilingual_original_script_roberta_pipeline pipeline RoBertaEmbeddings from ibm +author: John Snow Labs +name: interlingua_multilingual_original_script_roberta_pipeline +date: 2024-09-17 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`interlingua_multilingual_original_script_roberta_pipeline` is a Multilingual model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/interlingua_multilingual_original_script_roberta_pipeline_xx_5.5.0_3.0_1726595623586.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/interlingua_multilingual_original_script_roberta_pipeline_xx_5.5.0_3.0_1726595623586.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("interlingua_multilingual_original_script_roberta_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("interlingua_multilingual_original_script_roberta_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|interlingua_multilingual_original_script_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|638.6 MB| + +## References + +https://huggingface.co/ibm/ia-multilingual-original-script-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-interval_model_en.md b/docs/_posts/ahmedlone127/2024-09-17-interval_model_en.md new file mode 100644 index 00000000000000..6466eeb16c5a06 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-interval_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English interval_model DistilBertForSequenceClassification from coggpt +author: John Snow Labs +name: interval_model +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`interval_model` is a English model originally trained by coggpt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/interval_model_en_5.5.0_3.0_1726593973029.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/interval_model_en_5.5.0_3.0_1726593973029.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("interval_model","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("interval_model", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
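Continuing from `pipelineDF` in the block above, the predicted label is stored in the `class` column; the snippet below is a small follow-up sketch for reading it back out together with the raw scores kept in the annotation metadata.

```python
# One predicted label per input row
pipelineDF.select("text", "class.result").show(truncate=False)

# The per-class scores live in the annotation metadata
pipelineDF.selectExpr("explode(class) as c") \
          .select("c.result", "c.metadata") \
          .show(truncate=False)
```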
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|interval_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/coggpt/interval_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-opus_wmt_finetuned_enfr_wu_2022_en.md b/docs/_posts/ahmedlone127/2024-09-17-opus_wmt_finetuned_enfr_wu_2022_en.md new file mode 100644 index 00000000000000..ffeba24647b95c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-opus_wmt_finetuned_enfr_wu_2022_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_wmt_finetuned_enfr_wu_2022 MarianTransformer from ethansimrm +author: John Snow Labs +name: opus_wmt_finetuned_enfr_wu_2022 +date: 2024-09-17 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_wmt_finetuned_enfr_wu_2022` is a English model originally trained by ethansimrm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_wmt_finetuned_enfr_wu_2022_en_5.5.0_3.0_1726532831064.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_wmt_finetuned_enfr_wu_2022_en_5.5.0_3.0_1726532831064.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

marian = MarianTransformer.pretrained("opus_wmt_finetuned_enfr_wu_2022","en") \
    .setInputCols(["sentence"]) \
    .setOutputCol("translation")

pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence")

val marian = MarianTransformer.pretrained("opus_wmt_finetuned_enfr_wu_2022","en")
    .setInputCols(Array("sentence"))
    .setOutputCol("translation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
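To read the output of the block above back as plain strings: the translation is written to the `translation` column, one entry per detected sentence. Note the sample input is English and this checkpoint was fine-tuned for English-to-French, so a French rendering is expected; the sketch below only shows how to select the result.

```python
# One translated string per detected sentence
pipelineDF.select("translation.result").show(truncate=False)
```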
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_wmt_finetuned_enfr_wu_2022| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.4 MB| + +## References + +https://huggingface.co/ethansimrm/opus_wmt_finetuned_enfr_wu_2022 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-question_answearing_7_distillbert_en.md b/docs/_posts/ahmedlone127/2024-09-17-question_answearing_7_distillbert_en.md new file mode 100644 index 00000000000000..06c9dcd31ff866 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-question_answearing_7_distillbert_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English question_answearing_7_distillbert DistilBertForQuestionAnswering from Meziane +author: John Snow Labs +name: question_answearing_7_distillbert +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`question_answearing_7_distillbert` is a English model originally trained by Meziane. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/question_answearing_7_distillbert_en_5.5.0_3.0_1726586688577.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/question_answearing_7_distillbert_en_5.5.0_3.0_1726586688577.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("question_answearing_7_distillbert","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = DistilBertForQuestionAnswering.pretrained("question_answearing_7_distillbert", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
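Reusing `pipelineDF` from the block above, the predicted span is returned in the `answer` column. A short sketch for displaying the question next to the extracted answer:

```python
pipelineDF.selectExpr(
    "document_question.result as question",
    "answer.result as answer"
).show(truncate=False)
```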
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|question_answearing_7_distillbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Meziane/question_answearing_7_distillbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-robbert_2023_dutch_large_ft_lcn_actua_en.md b/docs/_posts/ahmedlone127/2024-09-17-robbert_2023_dutch_large_ft_lcn_actua_en.md new file mode 100644 index 00000000000000..644ea59da4bb0d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-robbert_2023_dutch_large_ft_lcn_actua_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robbert_2023_dutch_large_ft_lcn_actua RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: robbert_2023_dutch_large_ft_lcn_actua +date: 2024-09-17 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_2023_dutch_large_ft_lcn_actua` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_2023_dutch_large_ft_lcn_actua_en_5.5.0_3.0_1726603011194.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_2023_dutch_large_ft_lcn_actua_en_5.5.0_3.0_1726603011194.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("robbert_2023_dutch_large_ft_lcn_actua","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("robbert_2023_dutch_large_ft_lcn_actua","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
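Because this checkpoint is about 1.3 GB, it can be worth persisting the fitted pipeline instead of calling `pretrained()` on every run. Below is a sketch using standard Spark ML persistence; the path is an arbitrary example and `pipelineModel`/`data` come from the block above.

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline once (the path is illustrative)
pipelineModel.write().overwrite().save("/tmp/robbert_2023_dutch_large_ft_lcn_actua")

# Later sessions can reload it without re-downloading the weights
restored = PipelineModel.load("/tmp/robbert_2023_dutch_large_ft_lcn_actua")
restoredDF = restored.transform(data)
```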
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_2023_dutch_large_ft_lcn_actua| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/btamm12/robbert-2023-dutch-large-ft-lcn-actua \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-sent_dictabert_tiny_he.md b/docs/_posts/ahmedlone127/2024-09-17-sent_dictabert_tiny_he.md new file mode 100644 index 00000000000000..704563d94576b7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-sent_dictabert_tiny_he.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Hebrew sent_dictabert_tiny BertSentenceEmbeddings from dicta-il +author: John Snow Labs +name: sent_dictabert_tiny +date: 2024-09-17 +tags: [he, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_dictabert_tiny` is a Hebrew model originally trained by dicta-il. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_dictabert_tiny_he_5.5.0_3.0_1726587190963.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_dictabert_tiny_he_5.5.0_3.0_1726587190963.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_dictabert_tiny","he") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_dictabert_tiny","he") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
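The boilerplate example feeds the English string "I love spark-nlp" into a Hebrew sentence-embedding model; for meaningful vectors the input should be Hebrew. A small sketch with an illustrative Hebrew sentence, reusing `pipelineModel` from above:

```python
# "אני אוהב עיבוד שפה טבעית" ~ "I love natural language processing"
data = spark.createDataFrame([["אני אוהב עיבוד שפה טבעית"]]).toDF("text")

result = pipelineModel.transform(data)
result.selectExpr("explode(embeddings) as e") \
      .selectExpr("e.result as sentence", "size(e.embeddings) as dimension") \
      .show(truncate=False)
```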
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_dictabert_tiny| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|he| +|Size:|108.4 MB| + +## References + +https://huggingface.co/dicta-il/dictabert-tiny \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-telugu_sentiment_analysis_pipeline_te.md b/docs/_posts/ahmedlone127/2024-09-17-telugu_sentiment_analysis_pipeline_te.md new file mode 100644 index 00000000000000..3d54eec4d9e1a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-telugu_sentiment_analysis_pipeline_te.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Telugu telugu_sentiment_analysis_pipeline pipeline AlbertForSequenceClassification from aashish-249 +author: John Snow Labs +name: telugu_sentiment_analysis_pipeline +date: 2024-09-17 +tags: [te, open_source, pipeline, onnx] +task: Text Classification +language: te +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`telugu_sentiment_analysis_pipeline` is a Telugu model originally trained by aashish-249. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/telugu_sentiment_analysis_pipeline_te_5.5.0_3.0_1726605971118.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/telugu_sentiment_analysis_pipeline_te_5.5.0_3.0_1726605971118.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("telugu_sentiment_analysis_pipeline", lang = "te") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("telugu_sentiment_analysis_pipeline", lang = "te") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|telugu_sentiment_analysis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|te| +|Size:|125.9 MB| + +## References + +https://huggingface.co/aashish-249/Telugu-sentiment_analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-telugu_sentiment_analysis_te.md b/docs/_posts/ahmedlone127/2024-09-17-telugu_sentiment_analysis_te.md new file mode 100644 index 00000000000000..298bd19a81ceaa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-telugu_sentiment_analysis_te.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Telugu telugu_sentiment_analysis AlbertForSequenceClassification from aashish-249 +author: John Snow Labs +name: telugu_sentiment_analysis +date: 2024-09-17 +tags: [te, open_source, onnx, sequence_classification, albert] +task: Text Classification +language: te +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`telugu_sentiment_analysis` is a Telugu model originally trained by aashish-249. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/telugu_sentiment_analysis_te_5.5.0_3.0_1726605964902.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/telugu_sentiment_analysis_te_5.5.0_3.0_1726605964902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = AlbertForSequenceClassification.pretrained("telugu_sentiment_analysis","te") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
# The sample string is a placeholder; use Telugu text for meaningful predictions
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = AlbertForSequenceClassification.pretrained("telugu_sentiment_analysis", "te")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
// The sample string is a placeholder; use Telugu text for meaningful predictions
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|telugu_sentiment_analysis| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|te| +|Size:|125.9 MB| + +## References + +https://huggingface.co/aashish-249/Telugu-sentiment_analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-test_squad_karin25_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-test_squad_karin25_pipeline_en.md new file mode 100644 index 00000000000000..9ba4aef45530fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-test_squad_karin25_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English test_squad_karin25_pipeline pipeline DistilBertForQuestionAnswering from karin25 +author: John Snow Labs +name: test_squad_karin25_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_squad_karin25_pipeline` is a English model originally trained by karin25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_squad_karin25_pipeline_en_5.5.0_3.0_1726599795187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_squad_karin25_pipeline_en_5.5.0_3.0_1726599795187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_squad_karin25_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_squad_karin25_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_squad_karin25_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/karin25/test-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_katti_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_katti_en.md new file mode 100644 index 00000000000000..6c5f00d3d10aa7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_katti_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_katti WhisperForCTC from shreyasdesaisuperU +author: John Snow Labs +name: whisper_small_katti +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_katti` is a English model originally trained by shreyasdesaisuperU. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_katti_en_5.5.0_3.0_1726541232967.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_katti_en_5.5.0_3.0_1726541232967.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_katti","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
# `data` is assumed to be a DataFrame with an "audio_content" column
# holding the raw waveform as an array of floats
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_katti", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
// `data` is assumed to be a DataFrame with an "audio_content" column of float arrays
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
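The `data` DataFrame is left undefined in the block above. WhisperForCTC expects an `audio_content` column holding the raw waveform as an array of floats, typically 16 kHz mono. The sketch below builds such a DataFrame with librosa; the library dependency and the file path are assumptions that are not part of this card.

```python
import librosa

# Load a local recording as 16 kHz mono floats (path is illustrative)
waveform, _ = librosa.load("sample_speech.wav", sr=16000)

data = spark.createDataFrame([[waveform.tolist()]]).toDF("audio_content")

pipelineDF = pipeline.fit(data).transform(data)
pipelineDF.select("text.result").show(truncate=False)
```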
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_katti| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/shreyasdesaisuperU/whisper-small-katti \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_swedish_v4_pipeline_sv.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_swedish_v4_pipeline_sv.md new file mode 100644 index 00000000000000..4196c28cb0c31b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_swedish_v4_pipeline_sv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Swedish whisper_small_swedish_v4_pipeline pipeline WhisperForCTC from AdrianHR +author: John Snow Labs +name: whisper_small_swedish_v4_pipeline +date: 2024-09-17 +tags: [sv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_swedish_v4_pipeline` is a Swedish model originally trained by AdrianHR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_swedish_v4_pipeline_sv_5.5.0_3.0_1726547719212.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_swedish_v4_pipeline_sv_5.5.0_3.0_1726547719212.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_swedish_v4_pipeline", lang = "sv") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_swedish_v4_pipeline", lang = "sv") +val annotations = pipeline.transform(df) + +``` +
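Unlike the text pipelines on this page, `df` here must expose an `audio_content` column of float arrays rather than a `text` column, since this pipeline starts with an AudioAssembler (see the Included Models list below). A hedged sketch, using one second of silence as a stand-in signal and assuming the transcription lands in a `text` output column:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("whisper_small_swedish_v4_pipeline", lang="sv")

# Stand-in waveform: one second of 16 kHz silence; replace with real audio samples
audio_floats = [0.0] * 16000
df = spark.createDataFrame([[audio_floats]]).toDF("audio_content")

pipeline.transform(df).select("text.result").show(truncate=False)
```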
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_swedish_v4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/AdrianHR/whisper-small-sv-v4 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_lbr47_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_lbr47_en.md new file mode 100644 index 00000000000000..f044e1db69d11a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_lbr47_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_lbr47 WhisperForCTC from LBR47 +author: John Snow Labs +name: whisper_tiny_lbr47 +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_lbr47` is a English model originally trained by LBR47. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_lbr47_en_5.5.0_3.0_1726548104245.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_lbr47_en_5.5.0_3.0_1726548104245.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_lbr47","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
# `data` is assumed to be a DataFrame with an "audio_content" column
# holding the raw waveform as an array of floats
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_lbr47", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
// `data` is assumed to be a DataFrame with an "audio_content" column of float arrays
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_lbr47| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|242.8 MB| + +## References + +https://huggingface.co/LBR47/whisper-tiny \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_lbr47_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_lbr47_pipeline_en.md new file mode 100644 index 00000000000000..11b5e4a8f3d6c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_lbr47_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_lbr47_pipeline pipeline WhisperForCTC from LBR47 +author: John Snow Labs +name: whisper_tiny_lbr47_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_lbr47_pipeline` is a English model originally trained by LBR47. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_lbr47_pipeline_en_5.5.0_3.0_1726548174832.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_lbr47_pipeline_en_5.5.0_3.0_1726548174832.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_lbr47_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_lbr47_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_lbr47_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|242.9 MB| + +## References + +https://huggingface.co/LBR47/whisper-tiny + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_spanish_herme_es.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_spanish_herme_es.md new file mode 100644 index 00000000000000..a417d3fea1baa9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_spanish_herme_es.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Castilian, Spanish whisper_tiny_spanish_herme WhisperForCTC from herme +author: John Snow Labs +name: whisper_tiny_spanish_herme +date: 2024-09-17 +tags: [es, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_spanish_herme` is a Castilian, Spanish model originally trained by herme. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_spanish_herme_es_5.5.0_3.0_1726550881250.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_spanish_herme_es_5.5.0_3.0_1726550881250.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_spanish_herme","es") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
# `data` is assumed to be a DataFrame with an "audio_content" column
# holding the raw waveform as an array of floats
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_spanish_herme", "es")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
// `data` is assumed to be a DataFrame with an "audio_content" column of float arrays
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_spanish_herme| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|es| +|Size:|390.8 MB| + +## References + +https://huggingface.co/herme/whisper-tiny-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_italian_u00890358_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_italian_u00890358_pipeline_en.md new file mode 100644 index 00000000000000..c96dfc1f80dc81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_italian_u00890358_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_u00890358_pipeline pipeline XlmRoBertaForTokenClassification from u00890358 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_u00890358_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_u00890358_pipeline` is a English model originally trained by u00890358. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_u00890358_pipeline_en_5.5.0_3.0_1726611641185.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_u00890358_pipeline_en_5.5.0_3.0_1726611641185.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_u00890358_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_u00890358_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_u00890358_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|816.8 MB| + +## References + +https://huggingface.co/u00890358/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-2404v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-2404v2_pipeline_en.md new file mode 100644 index 00000000000000..c29fd4e8b18680 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-2404v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2404v2_pipeline pipeline RoBertaForSequenceClassification from adriansanz +author: John Snow Labs +name: 2404v2_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2404v2_pipeline` is a English model originally trained by adriansanz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2404v2_pipeline_en_5.5.0_3.0_1726650568817.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2404v2_pipeline_en_5.5.0_3.0_1726650568817.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("2404v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("2404v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2404v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|450.6 MB| + +## References + +https://huggingface.co/adriansanz/2404v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-albert_base_jackh1995_en.md b/docs/_posts/ahmedlone127/2024-09-18-albert_base_jackh1995_en.md new file mode 100644 index 00000000000000..6cf33dce2fee51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-albert_base_jackh1995_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English albert_base_jackh1995 BertForQuestionAnswering from jackh1995 +author: John Snow Labs +name: albert_base_jackh1995 +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_base_jackh1995` is a English model originally trained by jackh1995. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_base_jackh1995_en_5.5.0_3.0_1726658607463.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_base_jackh1995_en_5.5.0_3.0_1726658607463.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("albert_base_jackh1995","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("albert_base_jackh1995", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
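For low-latency, one-off questions it can be more convenient to wrap the fitted model in a LightPipeline than to go through a DataFrame. This is a sketch reusing `pipelineModel` from the block above; the question/context pair is the same illustrative example.

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# fullAnnotate takes the question first and the context second
result = light.fullAnnotate("What framework do I use?", "I use spark-nlp.")
print(result[0]["answer"][0].result)
```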
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_base_jackh1995| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|380.8 MB| + +## References + +https://huggingface.co/jackh1995/albert-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-bert_vllm_gemma2b_deterministic_7_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-bert_vllm_gemma2b_deterministic_7_pipeline_en.md new file mode 100644 index 00000000000000..8f878e590994ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-bert_vllm_gemma2b_deterministic_7_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_vllm_gemma2b_deterministic_7_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_vllm_gemma2b_deterministic_7_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_vllm_gemma2b_deterministic_7_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_deterministic_7_pipeline_en_5.5.0_3.0_1726695114071.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_deterministic_7_pipeline_en_5.5.0_3.0_1726695114071.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_vllm_gemma2b_deterministic_7_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_vllm_gemma2b_deterministic_7_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_vllm_gemma2b_deterministic_7_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_vllm-gemma2b-deterministic_7 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_finetuned_emotion_distilbert_zijay_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_finetuned_emotion_distilbert_zijay_en.md new file mode 100644 index 00000000000000..1b8e204f8cad87 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_finetuned_emotion_distilbert_zijay_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_finetuned_emotion_distilbert_zijay DistilBertForSequenceClassification from zijay +author: John Snow Labs +name: burmese_finetuned_emotion_distilbert_zijay +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_finetuned_emotion_distilbert_zijay` is a English model originally trained by zijay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_finetuned_emotion_distilbert_zijay_en_5.5.0_3.0_1726696233540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_finetuned_emotion_distilbert_zijay_en_5.5.0_3.0_1726696233540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_finetuned_emotion_distilbert_zijay","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_finetuned_emotion_distilbert_zijay", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_finetuned_emotion_distilbert_zijay| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/zijay/my-finetuned-emotion-distilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-category_1_delivery_cancellation_distilbert_base_uncased_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-category_1_delivery_cancellation_distilbert_base_uncased_v1_pipeline_en.md new file mode 100644 index 00000000000000..068cdea9055be9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-category_1_delivery_cancellation_distilbert_base_uncased_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English category_1_delivery_cancellation_distilbert_base_uncased_v1_pipeline pipeline DistilBertForSequenceClassification from chuuhtetnaing +author: John Snow Labs +name: category_1_delivery_cancellation_distilbert_base_uncased_v1_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`category_1_delivery_cancellation_distilbert_base_uncased_v1_pipeline` is a English model originally trained by chuuhtetnaing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/category_1_delivery_cancellation_distilbert_base_uncased_v1_pipeline_en_5.5.0_3.0_1726669521039.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/category_1_delivery_cancellation_distilbert_base_uncased_v1_pipeline_en_5.5.0_3.0_1726669521039.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("category_1_delivery_cancellation_distilbert_base_uncased_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("category_1_delivery_cancellation_distilbert_base_uncased_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
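+
+The snippet above assumes `df` is an existing Spark DataFrame. As a rough, hypothetical sketch (assuming the pretrained pipeline reads raw text from a column named `text` and that its classifier stage writes to a column named `class`, as in the other cards in this folder), it could be built and queried like this:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("category_1_delivery_cancellation_distilbert_base_uncased_v1_pipeline", lang = "en")
+
+# Hypothetical input DataFrame with a "text" column (an assumption, not part of the original card)
+df = spark.createDataFrame([["My parcel never arrived, please cancel the order."]]).toDF("text")
+annotations = pipeline.transform(df)
+annotations.select("class.result").show(truncate=False)
+```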
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|category_1_delivery_cancellation_distilbert_base_uncased_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chuuhtetnaing/category-1-delivery-cancellation-distilbert-base-uncased-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-ceva_en.md b/docs/_posts/ahmedlone127/2024-09-18-ceva_en.md new file mode 100644 index 00000000000000..9b896d8ed546c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-ceva_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ceva RoBertaForSequenceClassification from dianamihalache27 +author: John Snow Labs +name: ceva +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ceva` is a English model originally trained by dianamihalache27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ceva_en_5.5.0_3.0_1726650180046.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ceva_en_5.5.0_3.0_1726650180046.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("ceva", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("ceva", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ceva| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/dianamihalache27/ceva \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr15_seed3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr15_seed3_pipeline_en.md new file mode 100644 index 00000000000000..d07522bfe6f3e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr15_seed3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cold_fusion_itr15_seed3_pipeline pipeline RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr15_seed3_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr15_seed3_pipeline` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr15_seed3_pipeline_en_5.5.0_3.0_1726649700149.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr15_seed3_pipeline_en_5.5.0_3.0_1726649700149.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cold_fusion_itr15_seed3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cold_fusion_itr15_seed3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr15_seed3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr15-seed3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-did_the_doctor_give_you_his_name_bert_first128_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-did_the_doctor_give_you_his_name_bert_first128_pipeline_en.md new file mode 100644 index 00000000000000..12a4334bffa149 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-did_the_doctor_give_you_his_name_bert_first128_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English did_the_doctor_give_you_his_name_bert_first128_pipeline pipeline BertForSequenceClassification from etadevosyan +author: John Snow Labs +name: did_the_doctor_give_you_his_name_bert_first128_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`did_the_doctor_give_you_his_name_bert_first128_pipeline` is a English model originally trained by etadevosyan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/did_the_doctor_give_you_his_name_bert_first128_pipeline_en_5.5.0_3.0_1726624470534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/did_the_doctor_give_you_his_name_bert_first128_pipeline_en_5.5.0_3.0_1726624470534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("did_the_doctor_give_you_his_name_bert_first128_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("did_the_doctor_give_you_his_name_bert_first128_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|did_the_doctor_give_you_his_name_bert_first128_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|666.5 MB| + +## References + +https://huggingface.co/etadevosyan/did_the_doctor_give_you_his_name_bert_First128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_hashemghanem_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_hashemghanem_pipeline_en.md new file mode 100644 index 00000000000000..bfd9a82bfa2662 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_hashemghanem_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_hashemghanem_pipeline pipeline DistilBertForSequenceClassification from Hashemghanem +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_hashemghanem_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_hashemghanem_pipeline` is a English model originally trained by Hashemghanem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_hashemghanem_pipeline_en_5.5.0_3.0_1726677355734.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_hashemghanem_pipeline_en_5.5.0_3.0_1726677355734.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_hashemghanem_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_hashemghanem_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_hashemghanem_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Hashemghanem/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_edosevering_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_edosevering_en.md new file mode 100644 index 00000000000000..1e4af32a5cf347 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_edosevering_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_edosevering DistilBertForSequenceClassification from edoSevering +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_edosevering +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_edosevering` is a English model originally trained by edoSevering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_edosevering_en_5.5.0_3.0_1726695058212.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_edosevering_en_5.5.0_3.0_1726695058212.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_edosevering", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_edosevering", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_edosevering| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/edoSevering/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_facehugger69420_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_facehugger69420_pipeline_en.md new file mode 100644 index 00000000000000..e7f92e042340c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_facehugger69420_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_facehugger69420_pipeline pipeline DistilBertForSequenceClassification from FaceHugger69420 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_facehugger69420_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_facehugger69420_pipeline` is a English model originally trained by FaceHugger69420. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_facehugger69420_pipeline_en_5.5.0_3.0_1726695316515.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_facehugger69420_pipeline_en_5.5.0_3.0_1726695316515.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_facehugger69420_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_facehugger69420_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_facehugger69420_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/FaceHugger69420/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_mu7annad_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_mu7annad_en.md new file mode 100644 index 00000000000000..f11545f673cf86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_mu7annad_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_mu7annad DistilBertForSequenceClassification from Mu7annad +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_mu7annad +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_mu7annad` is a English model originally trained by Mu7annad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_mu7annad_en_5.5.0_3.0_1726682028738.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_mu7annad_en_5.5.0_3.0_1726682028738.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_mu7annad", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_mu7annad", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
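+
+Beyond the top label in `class.result`, each classification annotation typically also carries a metadata map with per-label confidence scores. A small sketch for inspecting them, reusing `pipelineDF` from the example above:
+
+```python
+from pyspark.sql.functions import explode
+
+# Flatten the "class" annotation array and show the predicted label alongside its score map
+pipelineDF.select(explode("class").alias("c")) \
+    .select("c.result", "c.metadata") \
+    .show(truncate=False)
+```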
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_mu7annad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Mu7annad/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_teraz_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_teraz_pipeline_en.md new file mode 100644 index 00000000000000..03acdf1ba386b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_teraz_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_teraz_pipeline pipeline DistilBertForSequenceClassification from Teraz +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_teraz_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_teraz_pipeline` is a English model originally trained by Teraz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_teraz_pipeline_en_5.5.0_3.0_1726630604677.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_teraz_pipeline_en_5.5.0_3.0_1726630604677.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_teraz_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_teraz_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_teraz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Teraz/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_mental_social_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_mental_social_en.md new file mode 100644 index 00000000000000..28bd7c72f3fe71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_mental_social_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_mental_social DistilBertForSequenceClassification from PriyankaDS +author: John Snow Labs +name: distilbert_base_uncased_finetuned_mental_social +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_mental_social` is a English model originally trained by PriyankaDS. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_mental_social_en_5.5.0_3.0_1726625793219.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_mental_social_en_5.5.0_3.0_1726625793219.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_mental_social", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_mental_social", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
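+
+For scoring individual texts, where building a DataFrame is unnecessary overhead, the fitted `pipelineModel` from the example above can be wrapped in a `LightPipeline`. A minimal sketch (the sample sentence is purely illustrative):
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+# annotate() returns a dict keyed by the pipeline's output columns, e.g. "class"
+print(light.annotate("I have been feeling low and avoiding my friends lately")["class"])
+```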
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_mental_social| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/PriyankaDS/distilbert-base-uncased-finetuned-mental_social \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_mental_social_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_mental_social_pipeline_en.md new file mode 100644 index 00000000000000..2e1a7130bd13c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_mental_social_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_mental_social_pipeline pipeline DistilBertForSequenceClassification from PriyankaDS +author: John Snow Labs +name: distilbert_base_uncased_finetuned_mental_social_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_mental_social_pipeline` is a English model originally trained by PriyankaDS. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_mental_social_pipeline_en_5.5.0_3.0_1726625805601.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_mental_social_pipeline_en_5.5.0_3.0_1726625805601.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_mental_social_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_mental_social_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_mental_social_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/PriyankaDS/distilbert-base-uncased-finetuned-mental_social + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_d5716d28_alex_atelo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_d5716d28_alex_atelo_pipeline_en.md new file mode 100644 index 00000000000000..c46bb559ae9899 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_d5716d28_alex_atelo_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_d5716d28_alex_atelo_pipeline pipeline DistilBertForQuestionAnswering from alex-atelo +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_d5716d28_alex_atelo_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_d5716d28_alex_atelo_pipeline` is a English model originally trained by alex-atelo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_alex_atelo_pipeline_en_5.5.0_3.0_1726644033787.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_alex_atelo_pipeline_en_5.5.0_3.0_1726644033787.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_alex_atelo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_alex_atelo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_d5716d28_alex_atelo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/alex-atelo/distilbert-base-uncased-finetuned-squad-d5716d28 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_songhyundong_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_songhyundong_en.md new file mode 100644 index 00000000000000..452423f8029063 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_songhyundong_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_songhyundong DistilBertForQuestionAnswering from songhyundong +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_songhyundong +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_songhyundong` is a English model originally trained by songhyundong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_songhyundong_en_5.5.0_3.0_1726644170135.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_songhyundong_en_5.5.0_3.0_1726644170135.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import MultiDocumentAssembler
+from sparknlp.annotator import DistilBertForQuestionAnswering
+from pyspark.ml import Pipeline
+
+# Assemble the question and context columns into document annotations
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_songhyundong", "en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.MultiDocumentAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_songhyundong", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
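+
+The `answer` column in `pipelineDF` holds the extracted span. A quick check, reusing the names from the example above:
+
+```python
+# Show the question and context next to the predicted answer span
+pipelineDF.select("question", "context", "answer.result").show(truncate=False)
+```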
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_songhyundong| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/songhyundong/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st8sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st8sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md new file mode 100644 index 00000000000000..049b89c8e6db7b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st8sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st8sd_ut72ut1largepfxnf_simsp300_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st8sd_ut72ut1largepfxnf_simsp300_clean200_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st8sd_ut72ut1largepfxnf_simsp300_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st8sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en_5.5.0_3.0_1726630805093.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st8sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en_5.5.0_3.0_1726630805093.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st8sd_ut72ut1largepfxnf_simsp300_clean200_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st8sd_ut72ut1largepfxnf_simsp300_clean200_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st8sd_ut72ut1largepfxnf_simsp300_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st8sd_ut72ut1largePfxNf_simsp300_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_lora_merged_p10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_lora_merged_p10_pipeline_en.md new file mode 100644 index 00000000000000..80dd67b53bd55c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_lora_merged_p10_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_lora_merged_p10_pipeline pipeline DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_lora_merged_p10_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_lora_merged_p10_pipeline` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_lora_merged_p10_pipeline_en_5.5.0_3.0_1726640985564.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_lora_merged_p10_pipeline_en_5.5.0_3.0_1726640985564.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_squad2_lora_merged_p10_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_squad2_lora_merged_p10_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_lora_merged_p10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|237.6 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-lora-merged-p10 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_p85_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_p85_pipeline_en.md new file mode 100644 index 00000000000000..3cc1f415cbd89c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_p85_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_p85_pipeline pipeline DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_p85_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_p85_pipeline` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p85_pipeline_en_5.5.0_3.0_1726641165642.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p85_pipeline_en_5.5.0_3.0_1726641165642.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_squad2_p85_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_squad2_p85_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_p85_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|130.7 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-p85 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_pipeline_en.md new file mode 100644 index 00000000000000..7302ca13c10efe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_pipeline_en_5.5.0_3.0_1726696349747.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_pipeline_en_5.5.0_3.0_1726696349747.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut72ut1_ad7_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut72ut1_ad7_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_emotion_neelaa_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_emotion_neelaa_en.md new file mode 100644 index 00000000000000..3fd9ca3b63ce01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_emotion_neelaa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotion_neelaa DistilBertForSequenceClassification from neelaa +author: John Snow Labs +name: distilbert_emotion_neelaa +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_neelaa` is a English model originally trained by neelaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_neelaa_en_5.5.0_3.0_1726625667056.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_neelaa_en_5.5.0_3.0_1726625667056.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_neelaa", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_neelaa", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_neelaa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/neelaa/distilbert-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_emotions_fellowship_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_emotions_fellowship_en.md new file mode 100644 index 00000000000000..b8f8d119d8df24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_emotions_fellowship_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotions_fellowship DistilBertForSequenceClassification from Valwolfor +author: John Snow Labs +name: distilbert_emotions_fellowship +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotions_fellowship` is a English model originally trained by Valwolfor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotions_fellowship_en_5.5.0_3.0_1726670155995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotions_fellowship_en_5.5.0_3.0_1726670155995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotions_fellowship", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotions_fellowship", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotions_fellowship| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Valwolfor/distilbert_emotions_fellowship \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_imdb_huiang_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_imdb_huiang_en.md new file mode 100644 index 00000000000000..48c539aea054d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_imdb_huiang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_imdb_huiang DistilBertForSequenceClassification from huiang +author: John Snow Labs +name: distilbert_imdb_huiang +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_huiang` is a English model originally trained by huiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_huiang_en_5.5.0_3.0_1726630972201.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_huiang_en_5.5.0_3.0_1726630972201.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_huiang", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_huiang", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
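+
+Once fitted, the pipeline is a regular Spark ML `PipelineModel`, so it can be persisted and reloaded without refitting. A small sketch, where the path is only a placeholder:
+
+```python
+from pyspark.ml import PipelineModel
+
+# Persist the fitted pipeline to a placeholder path and load it back later
+pipelineModel.write().overwrite().save("/tmp/distilbert_imdb_huiang_pipeline")
+restored = PipelineModel.load("/tmp/distilbert_imdb_huiang_pipeline")
+restored.transform(data).select("class.result").show(truncate=False)
+```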
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_huiang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/huiang/distilbert-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_toxicity_classification_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_toxicity_classification_en.md new file mode 100644 index 00000000000000..3c1ba7350c3add --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_toxicity_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_toxicity_classification DistilBertForSequenceClassification from newsmediabias +author: John Snow Labs +name: distilbert_toxicity_classification +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_toxicity_classification` is a English model originally trained by newsmediabias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_toxicity_classification_en_5.5.0_3.0_1726625254125.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_toxicity_classification_en_5.5.0_3.0_1726625254125.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_toxicity_classification","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_toxicity_classification", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_toxicity_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/newsmediabias/DistilBert_Toxicity_Classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_turkish_turkish_news_tr.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_turkish_turkish_news_tr.md new file mode 100644 index 00000000000000..72ac1b7de7193a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_turkish_turkish_news_tr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Turkish distilbert_turkish_turkish_news DistilBertForSequenceClassification from anilguven +author: John Snow Labs +name: distilbert_turkish_turkish_news +date: 2024-09-18 +tags: [tr, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_turkish_turkish_news` is a Turkish model originally trained by anilguven. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_turkish_turkish_news_tr_5.5.0_3.0_1726676870194.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_turkish_turkish_news_tr_5.5.0_3.0_1726676870194.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_turkish_turkish_news","tr") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_turkish_turkish_news", "tr")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
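+
+Because this model was trained on Turkish news text, a Turkish input is more representative than the English placeholder above. A minimal sketch with an illustrative Turkish sentence, reusing the fitted pipeline from the snippet above:
+
+```python
+# Illustrative Turkish input; any DataFrame with a "text" column works.
+data = spark.createDataFrame([["Merkez bankası faiz kararını bugün açıkladı."]]).toDF("text")
+pipelineModel.transform(data).select("text", "class.result").show(truncate=False)
+```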
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_turkish_turkish_news| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|tr| +|Size:|254.1 MB| + +## References + +https://huggingface.co/anilguven/distilbert_tr_turkish_news \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbertmultilang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbertmultilang_pipeline_en.md new file mode 100644 index 00000000000000..a9551589663f46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbertmultilang_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbertmultilang_pipeline pipeline DistilBertForSequenceClassification from baihaqy +author: John Snow Labs +name: distilbertmultilang_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbertmultilang_pipeline` is a English model originally trained by baihaqy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbertmultilang_pipeline_en_5.5.0_3.0_1726696153251.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbertmultilang_pipeline_en_5.5.0_3.0_1726696153251.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbertmultilang_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbertmultilang_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
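+
+The `df` in the snippet above is assumed to be a Spark DataFrame with a `text` column. A minimal sketch of building one, plus the lighter `annotate` call for a single string:
+
+```python
+# Build the expected input DataFrame with a "text" column.
+df = spark.createDataFrame([["I love Spark NLP"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+# For quick checks on a single string, the pipeline can also annotate directly.
+result = pipeline.annotate("I love Spark NLP")
+```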
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbertmultilang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/baihaqy/distilbertmultilang + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_ft_conservatives_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_ft_conservatives_pipeline_en.md new file mode 100644 index 00000000000000..f443ffe4e0987a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_ft_conservatives_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_ft_conservatives_pipeline pipeline RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_conservatives_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_conservatives_pipeline` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_conservatives_pipeline_en_5.5.0_3.0_1726677977584.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_conservatives_pipeline_en_5.5.0_3.0_1726677977584.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_ft_conservatives_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_ft_conservatives_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_conservatives_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-conservatives + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-emoji_emoji_random2_seed1_bernice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-emoji_emoji_random2_seed1_bernice_pipeline_en.md new file mode 100644 index 00000000000000..c15e08490e60ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-emoji_emoji_random2_seed1_bernice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emoji_emoji_random2_seed1_bernice_pipeline pipeline XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random2_seed1_bernice_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random2_seed1_bernice_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random2_seed1_bernice_pipeline_en_5.5.0_3.0_1726697662733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random2_seed1_bernice_pipeline_en_5.5.0_3.0_1726697662733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("emoji_emoji_random2_seed1_bernice_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("emoji_emoji_random2_seed1_bernice_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random2_seed1_bernice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|825.2 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random2_seed1-bernice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_analysis_asif1997_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_analysis_asif1997_en.md new file mode 100644 index 00000000000000..c194d448778ef1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_analysis_asif1997_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_analysis_asif1997 DistilBertForSequenceClassification from Asif1997 +author: John Snow Labs +name: finetuning_sentiment_analysis_asif1997 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_analysis_asif1997` is a English model originally trained by Asif1997. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_asif1997_en_5.5.0_3.0_1726694954416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_asif1997_en_5.5.0_3.0_1726694954416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_analysis_asif1997","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_analysis_asif1997", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_analysis_asif1997| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Asif1997/finetuning-sentiment-analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-hate_hate_random0_seed2_twitter_roberta_base_2019_90m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-hate_hate_random0_seed2_twitter_roberta_base_2019_90m_pipeline_en.md new file mode 100644 index 00000000000000..0370e7a95a73ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-hate_hate_random0_seed2_twitter_roberta_base_2019_90m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_random0_seed2_twitter_roberta_base_2019_90m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_random0_seed2_twitter_roberta_base_2019_90m_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_random0_seed2_twitter_roberta_base_2019_90m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_random0_seed2_twitter_roberta_base_2019_90m_pipeline_en_5.5.0_3.0_1726641434740.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_random0_seed2_twitter_roberta_base_2019_90m_pipeline_en_5.5.0_3.0_1726641434740.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hate_hate_random0_seed2_twitter_roberta_base_2019_90m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hate_hate_random0_seed2_twitter_roberta_base_2019_90m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_random0_seed2_twitter_roberta_base_2019_90m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_random0_seed2-twitter-roberta-base-2019-90m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-legal_undedup_base_v1_5__checkpoint_last_en.md b/docs/_posts/ahmedlone127/2024-09-18-legal_undedup_base_v1_5__checkpoint_last_en.md new file mode 100644 index 00000000000000..c2897d115cdfc7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-legal_undedup_base_v1_5__checkpoint_last_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English legal_undedup_base_v1_5__checkpoint_last RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: legal_undedup_base_v1_5__checkpoint_last +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_undedup_base_v1_5__checkpoint_last` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_undedup_base_v1_5__checkpoint_last_en_5.5.0_3.0_1726618086891.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_undedup_base_v1_5__checkpoint_last_en_5.5.0_3.0_1726618086891.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("legal_undedup_base_v1_5__checkpoint_last","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("legal_undedup_base_v1_5__checkpoint_last","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
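+
+The token-level vectors end up in the `embeddings` output column as Spark NLP annotations. A minimal sketch for extracting them, assuming `pipelineDF` from the Python snippet above:
+
+```python
+from pyspark.sql.functions import explode
+
+# One row per token: the covered text plus its embedding vector.
+pipelineDF.select(explode("embeddings").alias("emb")) \
+    .select("emb.result", "emb.embeddings") \
+    .show(truncate=False)
+```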
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_undedup_base_v1_5__checkpoint_last| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|296.3 MB| + +## References + +https://huggingface.co/eduagarcia-temp/legal_undedup_base_v1_5__checkpoint_last \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-mobilebert_add_pre_training_complete_en.md b/docs/_posts/ahmedlone127/2024-09-18-mobilebert_add_pre_training_complete_en.md new file mode 100644 index 00000000000000..deabd330cee46a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-mobilebert_add_pre_training_complete_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mobilebert_add_pre_training_complete BertEmbeddings from gokuls +author: John Snow Labs +name: mobilebert_add_pre_training_complete +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mobilebert_add_pre_training_complete` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mobilebert_add_pre_training_complete_en_5.5.0_3.0_1726673607658.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mobilebert_add_pre_training_complete_en_5.5.0_3.0_1726673607658.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("mobilebert_add_pre_training_complete","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("mobilebert_add_pre_training_complete","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mobilebert_add_pre_training_complete| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|92.5 MB| + +## References + +https://huggingface.co/gokuls/mobilebert_add_pre-training-complete \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-mobilebert_add_pre_training_complete_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-mobilebert_add_pre_training_complete_pipeline_en.md new file mode 100644 index 00000000000000..9244cc72804058 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-mobilebert_add_pre_training_complete_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mobilebert_add_pre_training_complete_pipeline pipeline BertEmbeddings from gokuls +author: John Snow Labs +name: mobilebert_add_pre_training_complete_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mobilebert_add_pre_training_complete_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mobilebert_add_pre_training_complete_pipeline_en_5.5.0_3.0_1726673612502.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mobilebert_add_pre_training_complete_pipeline_en_5.5.0_3.0_1726673612502.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mobilebert_add_pre_training_complete_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mobilebert_add_pre_training_complete_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mobilebert_add_pre_training_complete_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|92.5 MB| + +## References + +https://huggingface.co/gokuls/mobilebert_add_pre-training-complete + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-n_distilbert_twitterfin_padding70model_en.md b/docs/_posts/ahmedlone127/2024-09-18-n_distilbert_twitterfin_padding70model_en.md new file mode 100644 index 00000000000000..391e6960781392 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-n_distilbert_twitterfin_padding70model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_distilbert_twitterfin_padding70model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_twitterfin_padding70model +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_twitterfin_padding70model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding70model_en_5.5.0_3.0_1726677316842.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding70model_en_5.5.0_3.0_1726677316842.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_twitterfin_padding70model","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_twitterfin_padding70model", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_twitterfin_padding70model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_twitterfin_padding70model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-nerd_nerd_random0_seed0_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-18-nerd_nerd_random0_seed0_bernice_en.md new file mode 100644 index 00000000000000..b6ab0f9ffbab8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-nerd_nerd_random0_seed0_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nerd_nerd_random0_seed0_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: nerd_nerd_random0_seed0_bernice +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerd_nerd_random0_seed0_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerd_nerd_random0_seed0_bernice_en_5.5.0_3.0_1726686617114.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerd_nerd_random0_seed0_bernice_en_5.5.0_3.0_1726686617114.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("nerd_nerd_random0_seed0_bernice","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("nerd_nerd_random0_seed0_bernice", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerd_nerd_random0_seed0_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|832.0 MB| + +## References + +https://huggingface.co/tweettemposhift/nerd-nerd_random0_seed0-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-norms_establish_check_reproducibility_16_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-norms_establish_check_reproducibility_16_pipeline_en.md new file mode 100644 index 00000000000000..dd17cc746f76a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-norms_establish_check_reproducibility_16_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English norms_establish_check_reproducibility_16_pipeline pipeline RoBertaForSequenceClassification from rose-e-wang +author: John Snow Labs +name: norms_establish_check_reproducibility_16_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`norms_establish_check_reproducibility_16_pipeline` is a English model originally trained by rose-e-wang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norms_establish_check_reproducibility_16_pipeline_en_5.5.0_3.0_1726642239636.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norms_establish_check_reproducibility_16_pipeline_en_5.5.0_3.0_1726642239636.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("norms_establish_check_reproducibility_16_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("norms_establish_check_reproducibility_16_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norms_establish_check_reproducibility_16_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/rose-e-wang/norms_establish_check_reproducibility_16 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-platzi_distilroberta_base_mrpc_glue_luis_rascon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-platzi_distilroberta_base_mrpc_glue_luis_rascon_pipeline_en.md new file mode 100644 index 00000000000000..da2c6f31402a1e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-platzi_distilroberta_base_mrpc_glue_luis_rascon_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English platzi_distilroberta_base_mrpc_glue_luis_rascon_pipeline pipeline RoBertaForSequenceClassification from platzi +author: John Snow Labs +name: platzi_distilroberta_base_mrpc_glue_luis_rascon_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_mrpc_glue_luis_rascon_pipeline` is a English model originally trained by platzi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_luis_rascon_pipeline_en_5.5.0_3.0_1726649859736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_luis_rascon_pipeline_en_5.5.0_3.0_1726649859736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("platzi_distilroberta_base_mrpc_glue_luis_rascon_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("platzi_distilroberta_base_mrpc_glue_luis_rascon_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_mrpc_glue_luis_rascon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/platzi/platzi-distilroberta-base-mrpc-glue-luis-rascon + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-polarizer_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-polarizer_base_pipeline_en.md new file mode 100644 index 00000000000000..eaa8c0b3dc1401 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-polarizer_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English polarizer_base_pipeline pipeline RoBertaEmbeddings from kyungmin011029 +author: John Snow Labs +name: polarizer_base_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`polarizer_base_pipeline` is a English model originally trained by kyungmin011029. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/polarizer_base_pipeline_en_5.5.0_3.0_1726618118159.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/polarizer_base_pipeline_en_5.5.0_3.0_1726618118159.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("polarizer_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("polarizer_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|polarizer_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.8 MB| + +## References + +https://huggingface.co/kyungmin011029/Polarizer-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_bne_finetuned_amazon_reviews_spanish_03_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_bne_finetuned_amazon_reviews_spanish_03_pipeline_en.md new file mode 100644 index 00000000000000..f1113d76d939fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_bne_finetuned_amazon_reviews_spanish_03_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_amazon_reviews_spanish_03_pipeline pipeline RoBertaForSequenceClassification from DevCar +author: John Snow Labs +name: roberta_base_bne_finetuned_amazon_reviews_spanish_03_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_amazon_reviews_spanish_03_pipeline` is a English model originally trained by DevCar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_amazon_reviews_spanish_03_pipeline_en_5.5.0_3.0_1726628501394.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_amazon_reviews_spanish_03_pipeline_en_5.5.0_3.0_1726628501394.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_bne_finetuned_amazon_reviews_spanish_03_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_bne_finetuned_amazon_reviews_spanish_03_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_amazon_reviews_spanish_03_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|442.7 MB| + +## References + +https://huggingface.co/DevCar/roberta-base-bne-finetuned-amazon_reviews_es_03 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_mrpc_vitaliivrublevskyi_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_mrpc_vitaliivrublevskyi_en.md new file mode 100644 index 00000000000000..6785cf61fd7c52 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_mrpc_vitaliivrublevskyi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_mrpc_vitaliivrublevskyi RoBertaForSequenceClassification from VitaliiVrublevskyi +author: John Snow Labs +name: roberta_base_finetuned_mrpc_vitaliivrublevskyi +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_mrpc_vitaliivrublevskyi` is a English model originally trained by VitaliiVrublevskyi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_mrpc_vitaliivrublevskyi_en_5.5.0_3.0_1726666766462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_mrpc_vitaliivrublevskyi_en_5.5.0_3.0_1726666766462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_mrpc_vitaliivrublevskyi","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_mrpc_vitaliivrublevskyi", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_mrpc_vitaliivrublevskyi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|442.4 MB| + +## References + +https://huggingface.co/VitaliiVrublevskyi/roberta-base-finetuned-mrpc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_poetry_happiness_crpo_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_poetry_happiness_crpo_en.md new file mode 100644 index 00000000000000..bbc0943669399d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_poetry_happiness_crpo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_poetry_happiness_crpo RoBertaEmbeddings from andreipb +author: John Snow Labs +name: roberta_poetry_happiness_crpo +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_poetry_happiness_crpo` is a English model originally trained by andreipb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_poetry_happiness_crpo_en_5.5.0_3.0_1726651871262.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_poetry_happiness_crpo_en_5.5.0_3.0_1726651871262.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_poetry_happiness_crpo","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_poetry_happiness_crpo","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_poetry_happiness_crpo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/andreipb/roberta-poetry-happiness-crpo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_similarity_mudasiryasin_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_similarity_mudasiryasin_pipeline_en.md new file mode 100644 index 00000000000000..98e67ffbcc1299 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_similarity_mudasiryasin_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_similarity_mudasiryasin_pipeline pipeline RoBertaForSequenceClassification from mudasiryasin +author: John Snow Labs +name: roberta_similarity_mudasiryasin_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_similarity_mudasiryasin_pipeline` is a English model originally trained by mudasiryasin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_similarity_mudasiryasin_pipeline_en_5.5.0_3.0_1726641438641.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_similarity_mudasiryasin_pipeline_en_5.5.0_3.0_1726641438641.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_similarity_mudasiryasin_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_similarity_mudasiryasin_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_similarity_mudasiryasin_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|440.6 MB| + +## References + +https://huggingface.co/mudasiryasin/roberta-similarity + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-scenario_tcr_data_cl_cardiff_cl_only20271_en.md b/docs/_posts/ahmedlone127/2024-09-18-scenario_tcr_data_cl_cardiff_cl_only20271_en.md new file mode 100644 index 00000000000000..96d792c4d65fbc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-scenario_tcr_data_cl_cardiff_cl_only20271_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English scenario_tcr_data_cl_cardiff_cl_only20271 XlmRoBertaForSequenceClassification from haryoaw +author: John Snow Labs +name: scenario_tcr_data_cl_cardiff_cl_only20271 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scenario_tcr_data_cl_cardiff_cl_only20271` is a English model originally trained by haryoaw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scenario_tcr_data_cl_cardiff_cl_only20271_en_5.5.0_3.0_1726697062037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scenario_tcr_data_cl_cardiff_cl_only20271_en_5.5.0_3.0_1726697062037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("scenario_tcr_data_cl_cardiff_cl_only20271","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("scenario_tcr_data_cl_cardiff_cl_only20271", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scenario_tcr_data_cl_cardiff_cl_only20271| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|840.5 MB| + +## References + +https://huggingface.co/haryoaw/scenario-TCR_data-cl-cardiff_cl_only20271 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_25lang_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_25lang_cased_pipeline_en.md new file mode 100644 index 00000000000000..e9c286245fa159 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_25lang_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_25lang_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_25lang_cased_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_25lang_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_25lang_cased_pipeline_en_5.5.0_3.0_1726687077805.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_25lang_cased_pipeline_en_5.5.0_3.0_1726687077805.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_25lang_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_25lang_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
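
The snippet above leaves the input DataFrame `df` implicit. A short Python sketch of the missing setup; the `text` column name follows the convention used by the DocumentAssembler-based pipelines on these pages:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# The downloaded pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["Spark NLP ships hundreds of pretrained pipelines."]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_25lang_cased_pipeline", lang="en")
annotations = pipeline.transform(df)
annotations.printSchema()  # inspect the annotation columns this pipeline produces
```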
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_25lang_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|565.8 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-25lang-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_english_vietnamese_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_english_vietnamese_cased_pipeline_en.md new file mode 100644 index 00000000000000..d0770b3eae2d31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_english_vietnamese_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_vietnamese_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_vietnamese_cased_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_vietnamese_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_vietnamese_cased_pipeline_en_5.5.0_3.0_1726675963015.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_vietnamese_cased_pipeline_en_5.5.0_3.0_1726675963015.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_english_vietnamese_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_english_vietnamese_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_vietnamese_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.4 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-vi-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_uncased_finetuned_academic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_uncased_finetuned_academic_pipeline_en.md new file mode 100644 index 00000000000000..98a80f3e58beb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_uncased_finetuned_academic_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_academic_pipeline pipeline BertSentenceEmbeddings from egumasa +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_academic_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_academic_pipeline` is a English model originally trained by egumasa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_academic_pipeline_en_5.5.0_3.0_1726687058487.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_academic_pipeline_en_5.5.0_3.0_1726687058487.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_academic_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_academic_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_academic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/egumasa/bert-base-uncased-finetuned-academic + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_uncased_issues_128_martinwunderlich_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_uncased_issues_128_martinwunderlich_pipeline_en.md new file mode 100644 index 00000000000000..a6d49520426ba3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_uncased_issues_128_martinwunderlich_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_martinwunderlich_pipeline pipeline BertSentenceEmbeddings from martinwunderlich +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_martinwunderlich_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_martinwunderlich_pipeline` is a English model originally trained by martinwunderlich. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_martinwunderlich_pipeline_en_5.5.0_3.0_1726694233934.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_martinwunderlich_pipeline_en_5.5.0_3.0_1726694233934.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_martinwunderlich_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_martinwunderlich_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_martinwunderlich_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/martinwunderlich/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_protaugment_lm_liu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_protaugment_lm_liu_pipeline_en.md new file mode 100644 index 00000000000000..37fc9d116a1857 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_protaugment_lm_liu_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_protaugment_lm_liu_pipeline pipeline BertSentenceEmbeddings from tdopierre +author: John Snow Labs +name: sent_protaugment_lm_liu_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_protaugment_lm_liu_pipeline` is a English model originally trained by tdopierre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_protaugment_lm_liu_pipeline_en_5.5.0_3.0_1726675783025.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_protaugment_lm_liu_pipeline_en_5.5.0_3.0_1726675783025.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_protaugment_lm_liu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_protaugment_lm_liu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_protaugment_lm_liu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.9 MB| + +## References + +https://huggingface.co/tdopierre/ProtAugment-LM-Liu + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_ssci_bert_e4_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_ssci_bert_e4_en.md new file mode 100644 index 00000000000000..1d1655157967ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_ssci_bert_e4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_ssci_bert_e4 BertSentenceEmbeddings from KM4STfulltext +author: John Snow Labs +name: sent_ssci_bert_e4 +date: 2024-09-18 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_ssci_bert_e4` is a English model originally trained by KM4STfulltext. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_ssci_bert_e4_en_5.5.0_3.0_1726687206830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_ssci_bert_e4_en_5.5.0_3.0_1726687206830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_ssci_bert_e4","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_ssci_bert_e4","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
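
To turn the `embeddings` annotations produced above into plain Spark vectors, an `EmbeddingsFinisher` can be appended; a sketch continuing the Python example (the finisher and its setters are standard Spark NLP, the column names follow the example):

```python
from sparknlp.base import EmbeddingsFinisher

# pipelineDF comes from the Python example above.
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["sentence_vectors"]) \
    .setOutputAsVector(True)

finished = finisher.transform(pipelineDF)
finished.selectExpr("explode(sentence_vectors) as sentence_vector").show(truncate=False)
```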
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_ssci_bert_e4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/KM4STfulltext/SSCI-BERT-e4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_tod_bert_jnt_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_tod_bert_jnt_v1_pipeline_en.md new file mode 100644 index 00000000000000..943107437d4ac2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_tod_bert_jnt_v1_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_tod_bert_jnt_v1_pipeline pipeline BertSentenceEmbeddings from TODBERT +author: John Snow Labs +name: sent_tod_bert_jnt_v1_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_tod_bert_jnt_v1_pipeline` is a English model originally trained by TODBERT. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_tod_bert_jnt_v1_pipeline_en_5.5.0_3.0_1726661910714.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_tod_bert_jnt_v1_pipeline_en_5.5.0_3.0_1726661910714.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_tod_bert_jnt_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_tod_bert_jnt_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
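
For quick checks on a single string, `PretrainedPipeline` also offers `annotate`, which returns a plain Python dictionary instead of a DataFrame. A small sketch, assuming a Spark NLP session has already been started:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("sent_tod_bert_jnt_v1_pipeline", lang="en")

# annotate() runs the pipeline on one string and returns {output_column: [values, ...]}.
result = pipeline.annotate("I would like to book a table for two at 7 pm.")
print(result.keys())  # inspect which annotation keys this pipeline exposes
```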
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_tod_bert_jnt_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.5 MB| + +## References + +https://huggingface.co/TODBERT/TOD-BERT-JNT-V1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_ajit_transformer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_ajit_transformer_pipeline_en.md new file mode 100644 index 00000000000000..e6d5c6fff8bc12 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_ajit_transformer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_ajit_transformer_pipeline pipeline XlmRoBertaForTokenClassification from ajit-transformer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_ajit_transformer_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_ajit_transformer_pipeline` is a English model originally trained by ajit-transformer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ajit_transformer_pipeline_en_5.5.0_3.0_1726664010280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ajit_transformer_pipeline_en_5.5.0_3.0_1726664010280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_ajit_transformer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_ajit_transformer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_ajit_transformer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.9 MB| + +## References + +https://huggingface.co/ajit-transformer/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_drigb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_drigb_pipeline_en.md new file mode 100644 index 00000000000000..d8d3934c8b4a7e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_drigb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_drigb_pipeline pipeline XlmRoBertaForTokenClassification from drigb +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_drigb_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_drigb_pipeline` is a English model originally trained by drigb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_drigb_pipeline_en_5.5.0_3.0_1726657521717.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_drigb_pipeline_en_5.5.0_3.0_1726657521717.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_drigb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_drigb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_drigb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/drigb/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_stevevee0101_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_stevevee0101_en.md new file mode 100644 index 00000000000000..4961e827e25396 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_stevevee0101_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_stevevee0101 XlmRoBertaForTokenClassification from stevevee0101 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_stevevee0101 +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_stevevee0101` is a English model originally trained by stevevee0101. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_stevevee0101_en_5.5.0_3.0_1726655796943.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_stevevee0101_en_5.5.0_3.0_1726655796943.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_stevevee0101","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_stevevee0101", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
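
The token classifier above emits one IOB tag per token in the `ner` column. To merge those tags into whole entity chunks, a `NerConverter` stage is commonly appended; a sketch continuing the Python example:

```python
from sparknlp.annotator import NerConverter

# pipelineDF comes from the Python example above.
converter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

chunks = converter.transform(pipelineDF)
chunks.selectExpr("explode(ner_chunk) as chunk") \
      .selectExpr("chunk.result as entity", "chunk.metadata['entity'] as label") \
      .show(truncate=False)
```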
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_stevevee0101| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/stevevee0101/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_italian_bessho_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_italian_bessho_en.md new file mode 100644 index 00000000000000..dca65e38452dff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_italian_bessho_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_bessho XlmRoBertaForTokenClassification from bessho +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_bessho +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_bessho` is a English model originally trained by bessho. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_bessho_en_5.5.0_3.0_1726664072532.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_bessho_en_5.5.0_3.0_1726664072532.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_bessho","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_bessho", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_bessho| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/bessho/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_lr0_0001_seed42_basic_original_kinyarwanda_hau_eng_train_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_lr0_0001_seed42_basic_original_kinyarwanda_hau_eng_train_en.md new file mode 100644 index 00000000000000..2112f2e17c601f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_lr0_0001_seed42_basic_original_kinyarwanda_hau_eng_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_lr0_0001_seed42_basic_original_kinyarwanda_hau_eng_train XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr0_0001_seed42_basic_original_kinyarwanda_hau_eng_train +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr0_0001_seed42_basic_original_kinyarwanda_hau_eng_train` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_0001_seed42_basic_original_kinyarwanda_hau_eng_train_en_5.5.0_3.0_1726685932292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_0001_seed42_basic_original_kinyarwanda_hau_eng_train_en_5.5.0_3.0_1726685932292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr0_0001_seed42_basic_original_kinyarwanda_hau_eng_train","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr0_0001_seed42_basic_original_kinyarwanda_hau_eng_train", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
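
After `transform`, each row's prediction sits inside the `class` annotation column; the label is in `result` and the per-class scores in the annotation metadata. A short sketch continuing the Python example:

```python
# pipelineDF comes from the Python example above.
pipelineDF.selectExpr("explode(`class`) as prediction") \
          .selectExpr("prediction.result as label", "prediction.metadata as scores") \
          .show(truncate=False)
```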
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr0_0001_seed42_basic_original_kinyarwanda_hau_eng_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|815.5 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr0.0001_seed42_basic_original_kin-hau-eng_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_trimmed_spanish_60000_tweet_sentiment_spanish_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_trimmed_spanish_60000_tweet_sentiment_spanish_pipeline_en.md new file mode 100644 index 00000000000000..a210b0c175b54b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_trimmed_spanish_60000_tweet_sentiment_spanish_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_spanish_60000_tweet_sentiment_spanish_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_spanish_60000_tweet_sentiment_spanish_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_spanish_60000_tweet_sentiment_spanish_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_spanish_60000_tweet_sentiment_spanish_pipeline_en_5.5.0_3.0_1726672308232.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_spanish_60000_tweet_sentiment_spanish_pipeline_en_5.5.0_3.0_1726672308232.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_trimmed_spanish_60000_tweet_sentiment_spanish_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_trimmed_spanish_60000_tweet_sentiment_spanish_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_spanish_60000_tweet_sentiment_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|442.0 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-es-60000-tweet-sentiment-es + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_french_trimmed_french_10000_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_french_trimmed_french_10000_en.md new file mode 100644 index 00000000000000..3c80d5c326151a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_french_trimmed_french_10000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_xnli_french_trimmed_french_10000 XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_xnli_french_trimmed_french_10000 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_xnli_french_trimmed_french_10000` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_french_trimmed_french_10000_en_5.5.0_3.0_1726659561011.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_french_trimmed_french_10000_en_5.5.0_3.0_1726659561011.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_french_trimmed_french_10000","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_french_trimmed_french_10000", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_xnli_french_trimmed_french_10000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|353.5 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-xnli-fr-trimmed-fr-10000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlmr_model_name_finetuned_panx_german_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlmr_model_name_finetuned_panx_german_pipeline_en.md new file mode 100644 index 00000000000000..57bcddd8a74435 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlmr_model_name_finetuned_panx_german_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_model_name_finetuned_panx_german_pipeline pipeline XlmRoBertaForTokenClassification from Denilah +author: John Snow Labs +name: xlmr_model_name_finetuned_panx_german_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_model_name_finetuned_panx_german_pipeline` is a English model originally trained by Denilah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_model_name_finetuned_panx_german_pipeline_en_5.5.0_3.0_1726701813783.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_model_name_finetuned_panx_german_pipeline_en_5.5.0_3.0_1726701813783.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmr_model_name_finetuned_panx_german_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmr_model_name_finetuned_panx_german_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_model_name_finetuned_panx_german_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.7 MB| + +## References + +https://huggingface.co/Denilah/xlmr_model_name-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-aift_model_review_multiple_label_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-aift_model_review_multiple_label_classification_pipeline_en.md new file mode 100644 index 00000000000000..b57fadbcec6638 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-aift_model_review_multiple_label_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English aift_model_review_multiple_label_classification_pipeline pipeline DistilBertForSequenceClassification from Cielciel +author: John Snow Labs +name: aift_model_review_multiple_label_classification_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`aift_model_review_multiple_label_classification_pipeline` is a English model originally trained by Cielciel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/aift_model_review_multiple_label_classification_pipeline_en_5.5.0_3.0_1726763556694.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/aift_model_review_multiple_label_classification_pipeline_en_5.5.0_3.0_1726763556694.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("aift_model_review_multiple_label_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("aift_model_review_multiple_label_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|aift_model_review_multiple_label_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Cielciel/aift-model-review-multiple-label-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-autotrain_g10wr_ryb7t_en.md b/docs/_posts/ahmedlone127/2024-09-19-autotrain_g10wr_ryb7t_en.md new file mode 100644 index 00000000000000..a73e52c3379a9f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-autotrain_g10wr_ryb7t_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English autotrain_g10wr_ryb7t RoBertaForTokenClassification from bikashpatra +author: John Snow Labs +name: autotrain_g10wr_ryb7t +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_g10wr_ryb7t` is a English model originally trained by bikashpatra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_g10wr_ryb7t_en_5.5.0_3.0_1726730984867.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_g10wr_ryb7t_en_5.5.0_3.0_1726730984867.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("autotrain_g10wr_ryb7t","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("autotrain_g10wr_ryb7t", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
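
For ad-hoc predictions without building a DataFrame, the fitted pipeline can be wrapped in a `LightPipeline`; a sketch continuing the Python example above:

```python
from sparknlp.base import LightPipeline

# pipelineModel comes from the Python example above.
light = LightPipeline(pipelineModel)
print(light.annotate("Alice Johnson filed the application in Berlin in 2019."))
```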
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_g10wr_ryb7t| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/bikashpatra/autotrain-g10wr-ryb7t \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bailii_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-19-bailii_roberta_en.md new file mode 100644 index 00000000000000..7da94db83ea99f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bailii_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bailii_roberta RoBertaEmbeddings from tsantosh7 +author: John Snow Labs +name: bailii_roberta +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bailii_roberta` is a English model originally trained by tsantosh7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bailii_roberta_en_5.5.0_3.0_1726747868726.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bailii_roberta_en_5.5.0_3.0_1726747868726.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("bailii_roberta","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("bailii_roberta","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
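
`RoBertaEmbeddings` produces one vector per token. If a single vector per document is needed, a `SentenceEmbeddings` pooling stage can be added; a sketch continuing the Python example above:

```python
from sparknlp.annotator import SentenceEmbeddings

# pipelineDF comes from the Python example above.
pooling = SentenceEmbeddings() \
    .setInputCols(["document", "embeddings"]) \
    .setOutputCol("sentence_embeddings") \
    .setPoolingStrategy("AVERAGE")

pooled = pooling.transform(pipelineDF)
pooled.selectExpr("explode(sentence_embeddings) as s") \
      .selectExpr("s.embeddings as document_vector") \
      .show(truncate=False)
```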
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bailii_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.8 MB| + +## References + +https://huggingface.co/tsantosh7/Bailii-Roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_pipeline_en.md new file mode 100644 index 00000000000000..68d6f5ddb7caef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_pipeline pipeline WhisperForCTC from saahith +author: John Snow Labs +name: base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_pipeline` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_pipeline_en_5.5.0_3.0_1726758409649.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_pipeline_en_5.5.0_3.0_1726758409649.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
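
Unlike the text pipelines in the surrounding cards, this one starts from an audio assembler, so `df` must carry raw audio samples rather than text. The sketch below is an assumption-laden illustration only: the `audio_content` column name and the 16 kHz mono input are typical for Whisper pipelines but are not confirmed by this card.

```python
import librosa  # any loader that yields a float array works; librosa is used purely for illustration
from sparknlp.pretrained import PretrainedPipeline

# Assumed: 16 kHz mono samples in a column named "audio_content" (not confirmed by this card).
samples, _ = librosa.load("sample.wav", sr=16000, mono=True)
df = spark.createDataFrame([(samples.tolist(),)], ["audio_content"])

pipeline = PretrainedPipeline("base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_pipeline", lang="en")
result = pipeline.transform(df)
result.printSchema()  # inspect the transcription column this pipeline produces
```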
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_english_combined_v4_1_0_8_1e_06_restful_sweep_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|646.1 MB| + +## References + +https://huggingface.co/saahith/base.en-combined_v4-1-0-8-1e-06-restful-sweep-5 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_1802_r1_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_1802_r1_en.md new file mode 100644 index 00000000000000..4b06029fa662fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_1802_r1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_1802_r1 BertEmbeddings from JamesKim +author: John Snow Labs +name: bert_base_uncased_1802_r1 +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_1802_r1` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_1802_r1_en_5.5.0_3.0_1726744741322.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_1802_r1_en_5.5.0_3.0_1726744741322.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_1802_r1","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_1802_r1","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_1802_r1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/JamesKim/bert-base-uncased_1802_r1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_news_1973_1974_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_news_1973_1974_pipeline_en.md new file mode 100644 index 00000000000000..9e1106b713eb91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_news_1973_1974_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_news_1973_1974_pipeline pipeline BertEmbeddings from sally9805 +author: John Snow Labs +name: bert_base_uncased_finetuned_news_1973_1974_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_news_1973_1974_pipeline` is a English model originally trained by sally9805. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_news_1973_1974_pipeline_en_5.5.0_3.0_1726734617602.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_news_1973_1974_pipeline_en_5.5.0_3.0_1726734617602.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_news_1973_1974_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_news_1973_1974_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_news_1973_1974_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sally9805/bert-base-uncased-finetuned-news-1973-1974 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_sijia_w_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_sijia_w_pipeline_en.md new file mode 100644 index 00000000000000..88aaf5f7e4b89f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_sijia_w_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_sijia_w_pipeline pipeline BertEmbeddings from sijia-w +author: John Snow Labs +name: bert_base_uncased_sijia_w_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_sijia_w_pipeline` is a English model originally trained by sijia-w. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_sijia_w_pipeline_en_5.5.0_3.0_1726731962786.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_sijia_w_pipeline_en_5.5.0_3.0_1726731962786.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_sijia_w_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_sijia_w_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_sijia_w_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sijia-w/bert-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_racial_cross_validation_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_racial_cross_validation_en.md new file mode 100644 index 00000000000000..90421fe070f830 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_racial_cross_validation_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_racial_cross_validation DistilBertForSequenceClassification from jamnik99 +author: John Snow Labs +name: bert_racial_cross_validation +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_racial_cross_validation` is a English model originally trained by jamnik99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_racial_cross_validation_en_5.5.0_3.0_1726719083250.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_racial_cross_validation_en_5.5.0_3.0_1726719083250.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

# assemble raw text into documents, tokenize, then classify with the pretrained model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_racial_cross_validation","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_racial_cross_validation", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
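`pipelineDF` stores its predictions as Spark NLP annotation structs. The follow-up sketch below pulls out the predicted label per document; the `class` column name comes from the configuration above, and the `result` field is part of the standard annotation schema.

```python
# Sketch: extract the predicted label(s) from the "class" annotation column.
from pyspark.sql import functions as F

pipelineDF.select(
    F.col("text"),
    F.col("class.result").alias("predicted_label")  # array with one label per document here
).show(truncate=False)
```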
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_racial_cross_validation| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jamnik99/BERT_racial_cross_validation \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_sbic_offensive_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_sbic_offensive_en.md new file mode 100644 index 00000000000000..aee1c573645ce1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_sbic_offensive_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_sbic_offensive BertForSequenceClassification from Cameron +author: John Snow Labs +name: bert_sbic_offensive +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_sbic_offensive` is a English model originally trained by Cameron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sbic_offensive_en_5.5.0_3.0_1726781767716.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sbic_offensive_en_5.5.0_3.0_1726781767716.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertForSequenceClassification
from pyspark.ml import Pipeline

# assemble raw text into documents, tokenize, then classify with the pretrained model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_sbic_offensive","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_sbic_offensive", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sbic_offensive| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Cameron/BERT-SBIC-offensive \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bertbased_hatespeech_pretrain_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bertbased_hatespeech_pretrain_pipeline_en.md new file mode 100644 index 00000000000000..7611408bf2906c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bertbased_hatespeech_pretrain_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bertbased_hatespeech_pretrain_pipeline pipeline BertEmbeddings from agvidit1 +author: John Snow Labs +name: bertbased_hatespeech_pretrain_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertbased_hatespeech_pretrain_pipeline` is a English model originally trained by agvidit1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertbased_hatespeech_pretrain_pipeline_en_5.5.0_3.0_1726705484685.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertbased_hatespeech_pretrain_pipeline_en_5.5.0_3.0_1726705484685.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bertbased_hatespeech_pretrain_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bertbased_hatespeech_pretrain_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertbased_hatespeech_pretrain_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/agvidit1/BertBased_HateSpeech_pretrain + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_adithya5243_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_adithya5243_en.md new file mode 100644 index 00000000000000..e2478d8b067e73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_adithya5243_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_adithya5243 DistilBertForSequenceClassification from adithya5243 +author: John Snow Labs +name: burmese_awesome_model_adithya5243 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_adithya5243` is a English model originally trained by adithya5243. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_adithya5243_en_5.5.0_3.0_1726704469224.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_adithya5243_en_5.5.0_3.0_1726704469224.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

# assemble raw text into documents, tokenize, then classify with the pretrained model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_adithya5243","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_adithya5243", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_adithya5243| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/adithya5243/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_fold_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_fold_5_pipeline_en.md new file mode 100644 index 00000000000000..8ad301bfc05c55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_fold_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_fold_5_pipeline pipeline DistilBertForSequenceClassification from Thebisso09 +author: John Snow Labs +name: burmese_awesome_model_fold_5_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_fold_5_pipeline` is a English model originally trained by Thebisso09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_fold_5_pipeline_en_5.5.0_3.0_1726763713233.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_fold_5_pipeline_en_5.5.0_3.0_1726763713233.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_fold_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_fold_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
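For quick checks on single strings, the pretrained pipeline can also be called through `annotate()` instead of a DataFrame transform. A minimal sketch follows; the keys of the returned dictionary (`document`, `token`, `class`) are assumptions based on the stages listed under Included Models.

```python
# Sketch: one-off prediction with annotate() (assumes a running Spark NLP session).
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
pipeline = PretrainedPipeline("burmese_awesome_model_fold_5_pipeline", lang="en")

result = pipeline.annotate("I love spark-nlp")
print(result.keys())        # expected to include a classification key such as "class" (assumption)
print(result.get("class"))  # predicted label(s), if that key is present
```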
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_fold_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Thebisso09/my_awesome_model_fold_5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_kiliemah_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_kiliemah_en.md new file mode 100644 index 00000000000000..86df50033fcd89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_kiliemah_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_kiliemah DistilBertForSequenceClassification from Kiliemah +author: John Snow Labs +name: burmese_awesome_model_kiliemah +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_kiliemah` is a English model originally trained by Kiliemah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_kiliemah_en_5.5.0_3.0_1726763346429.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_kiliemah_en_5.5.0_3.0_1726763346429.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

# assemble raw text into documents, tokenize, then classify with the pretrained model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_kiliemah","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_kiliemah", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_kiliemah| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Kiliemah/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_lxl2023_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_lxl2023_en.md new file mode 100644 index 00000000000000..f2926609cd8a25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_lxl2023_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_lxl2023 DistilBertForSequenceClassification from lxl2023 +author: John Snow Labs +name: burmese_awesome_model_lxl2023 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_lxl2023` is a English model originally trained by lxl2023. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_lxl2023_en_5.5.0_3.0_1726743079364.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_lxl2023_en_5.5.0_3.0_1726743079364.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

# assemble raw text into documents, tokenize, then classify with the pretrained model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_lxl2023","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_lxl2023", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_lxl2023| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lxl2023/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_qa_model_torchat_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_qa_model_torchat_en.md new file mode 100644 index 00000000000000..5050680d1d369b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_qa_model_torchat_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_torchat DistilBertForQuestionAnswering from torchat +author: John Snow Labs +name: burmese_awesome_qa_model_torchat +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_torchat` is a English model originally trained by torchat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_torchat_en_5.5.0_3.0_1726785824150.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_torchat_en_5.5.0_3.0_1726785824150.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import MultiDocumentAssembler
from sparknlp.annotator import DistilBertForQuestionAnswering
from pyspark.ml import Pipeline

# assemble the question/context pair, then extract the answer span
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_torchat","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_torchat", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
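The answer span predicted above can be read back from the `answer` annotation column; the sketch below assumes the column names configured in the pipeline above and the standard annotation schema.

```python
# Sketch: read the predicted answer text per question/context pair.
from pyspark.sql import functions as F

pipelineDF.select(
    F.col("question"),
    F.col("answer.result").alias("predicted_answer")
).show(truncate=False)
```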
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_torchat| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/torchat/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-code_bert_small_finetuned_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-code_bert_small_finetuned_v2_pipeline_en.md new file mode 100644 index 00000000000000..4df2f49b6f97a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-code_bert_small_finetuned_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English code_bert_small_finetuned_v2_pipeline pipeline RoBertaEmbeddings from mshn74 +author: John Snow Labs +name: code_bert_small_finetuned_v2_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`code_bert_small_finetuned_v2_pipeline` is a English model originally trained by mshn74. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/code_bert_small_finetuned_v2_pipeline_en_5.5.0_3.0_1726747098257.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/code_bert_small_finetuned_v2_pipeline_en_5.5.0_3.0_1726747098257.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("code_bert_small_finetuned_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("code_bert_small_finetuned_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|code_bert_small_finetuned_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.8 MB| + +## References + +https://huggingface.co/mshn74/code_bert_small-finetuned-v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-cold_fusion_itr25_seed1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-cold_fusion_itr25_seed1_pipeline_en.md new file mode 100644 index 00000000000000..9865a0b39b6080 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-cold_fusion_itr25_seed1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cold_fusion_itr25_seed1_pipeline pipeline RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr25_seed1_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr25_seed1_pipeline` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr25_seed1_pipeline_en_5.5.0_3.0_1726726274550.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr25_seed1_pipeline_en_5.5.0_3.0_1726726274550.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cold_fusion_itr25_seed1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cold_fusion_itr25_seed1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr25_seed1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr25-seed1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-defsent_roberta_base_cls_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-defsent_roberta_base_cls_pipeline_en.md new file mode 100644 index 00000000000000..9736473c3add47 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-defsent_roberta_base_cls_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English defsent_roberta_base_cls_pipeline pipeline RoBertaEmbeddings from cl-nagoya +author: John Snow Labs +name: defsent_roberta_base_cls_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`defsent_roberta_base_cls_pipeline` is a English model originally trained by cl-nagoya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/defsent_roberta_base_cls_pipeline_en_5.5.0_3.0_1726778336060.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/defsent_roberta_base_cls_pipeline_en_5.5.0_3.0_1726778336060.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("defsent_roberta_base_cls_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("defsent_roberta_base_cls_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|defsent_roberta_base_cls_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|413.1 MB| + +## References + +https://huggingface.co/cl-nagoya/defsent-roberta-base-cls + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distil_task_b_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distil_task_b_3_pipeline_en.md new file mode 100644 index 00000000000000..e365cb79e305f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distil_task_b_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distil_task_b_3_pipeline pipeline DistilBertForSequenceClassification from sheduele +author: John Snow Labs +name: distil_task_b_3_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distil_task_b_3_pipeline` is a English model originally trained by sheduele. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_task_b_3_pipeline_en_5.5.0_3.0_1726742816085.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_task_b_3_pipeline_en_5.5.0_3.0_1726742816085.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distil_task_b_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distil_task_b_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distil_task_b_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sheduele/distil_task_B_3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_70k_qa_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_70k_qa_model_pipeline_en.md new file mode 100644 index 00000000000000..8335497c87014d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_70k_qa_model_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_70k_qa_model_pipeline pipeline DistilBertForQuestionAnswering from Vasanth +author: John Snow Labs +name: distilbert_70k_qa_model_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_70k_qa_model_pipeline` is a English model originally trained by Vasanth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_70k_qa_model_pipeline_en_5.5.0_3.0_1726748426855.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_70k_qa_model_pipeline_en_5.5.0_3.0_1726748426855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_70k_qa_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_70k_qa_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
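Question answering pipelines take a question and a context per row. The sketch below shows one way to feed them; the `question`/`context` input column names and the `answer` output column are assumptions about how the bundled MultiDocumentAssembler and DistilBertForQuestionAnswering stages are configured, so check `annotations.printSchema()` if column resolution fails.

```python
# Sketch with assumed column names; verify against the pipeline's actual schema.
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
pipeline = PretrainedPipeline("distilbert_70k_qa_model_pipeline", lang="en")

df = spark.createDataFrame(
    [("What framework do I use?", "I use Spark NLP.")]
).toDF("question", "context")  # assumed input column names

annotations = pipeline.transform(df)
annotations.printSchema()
annotations.selectExpr("answer.result").show(truncate=False)  # "answer" is an assumed output column
```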
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_70k_qa_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/Vasanth/distilbert_70k_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_cased_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_cased_en.md new file mode 100644 index 00000000000000..69dc9869565953 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: DistilBERT base model (cased) +author: John Snow Labs +name: distilbert_base_cased +date: 2024-09-19 +tags: [distilbert, en, english, open_source, embeddings, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This model is a distilled version of the [BERT base model](https://huggingface.co/bert-base-cased). It was introduced in [this paper](https://arxiv.org/abs/1910.01108). The code for the distillation process can be found [here](https://github.com/huggingface/transformers/tree/master/examples/research_projects/distillation). This model is cased: it does make a difference between english and English. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_en_5.5.0_3.0_1726742707299.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_en_5.5.0_3.0_1726742707299.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + +{:.model-param} + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +embeddings = DistilBertEmbeddings.pretrained("distilbert_base_cased", "en") \ +.setInputCols("sentence", "token") \ +.setOutputCol("embeddings") +nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings]) +``` +```scala +val embeddings = DistilBertEmbeddings.pretrained("distilbert_base_cased", "en") +.setInputCols("sentence", "token") +.setOutputCol("embeddings") +val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings)) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.embed.distilbert").predict("""Put your text here.""") +``` +
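The Python snippet above assumes that `document_assembler`, `sentence_detector`, and `tokenizer` stages are already defined. A minimal sketch of those upstream stages is shown below; the stage and column names are illustrative assumptions rather than part of the original card.

```python
# Illustrative upstream stages assumed by the embeddings snippet above.
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentence_detector = SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

tokenizer = Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")
```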
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +References + +[https://huggingface.co/distilbert-base-cased](https://huggingface.co/distilbert-base-cased) + +## Benchmarking + +```bash + +Benchmarking + + +When fine-tuned on downstream tasks, this model achieves the following results: + +Glue test results: + +| Task | MNLI | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE | +|:----:|:----:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:| +| | 81.5 | 87.8 | 88.2 | 90.4 | 47.2 | 85.5 | 85.6 | 60.6 | +``` \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_adl_hw1_binga288_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_adl_hw1_binga288_pipeline_en.md new file mode 100644 index 00000000000000..eb6f71e14369a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_adl_hw1_binga288_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_adl_hw1_binga288_pipeline pipeline DistilBertForSequenceClassification from Binga288 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_adl_hw1_binga288_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_adl_hw1_binga288_pipeline` is a English model originally trained by Binga288. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw1_binga288_pipeline_en_5.5.0_3.0_1726764139410.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw1_binga288_pipeline_en_5.5.0_3.0_1726764139410.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_adl_hw1_binga288_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_adl_hw1_binga288_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_adl_hw1_binga288_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/Binga288/distilbert-base-uncased-finetuned-adl_hw1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_bwy071_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_bwy071_en.md new file mode 100644 index 00000000000000..f33c1556051585 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_bwy071_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_bwy071 DistilBertForSequenceClassification from bwy071 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_bwy071 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_bwy071` is a English model originally trained by bwy071. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_bwy071_en_5.5.0_3.0_1726704741223.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_bwy071_en_5.5.0_3.0_1726704741223.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

# assemble raw text into documents, tokenize, then classify with the pretrained model
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_bwy071","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_bwy071", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_bwy071| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bwy071/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_hanzla107_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_hanzla107_pipeline_en.md new file mode 100644 index 00000000000000..16b3601c381f44 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_hanzla107_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_hanzla107_pipeline pipeline DistilBertForSequenceClassification from hanzla107 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_hanzla107_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_hanzla107_pipeline` is a English model originally trained by hanzla107. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_hanzla107_pipeline_en_5.5.0_3.0_1726718995054.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_hanzla107_pipeline_en_5.5.0_3.0_1726718995054.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_hanzla107_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_hanzla107_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_hanzla107_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hanzla107/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_benshafat_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_benshafat_pipeline_en.md new file mode 100644 index 00000000000000..52336bd09fae7c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_benshafat_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_benshafat_pipeline pipeline DistilBertForSequenceClassification from benshafat +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_benshafat_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_benshafat_pipeline` is a English model originally trained by benshafat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_benshafat_pipeline_en_5.5.0_3.0_1726763571286.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_benshafat_pipeline_en_5.5.0_3.0_1726763571286.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_benshafat_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_benshafat_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_benshafat_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/benshafat/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_farisanki_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_farisanki_pipeline_en.md new file mode 100644 index 00000000000000..8508650e6c04a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_farisanki_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_farisanki_pipeline pipeline DistilBertForSequenceClassification from farisanki +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_farisanki_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_farisanki_pipeline` is a English model originally trained by farisanki. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_farisanki_pipeline_en_5.5.0_3.0_1726742979347.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_farisanki_pipeline_en_5.5.0_3.0_1726742979347.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_farisanki_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_farisanki_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_farisanki_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/farisanki/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_rairachit_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_rairachit_pipeline_en.md new file mode 100644 index 00000000000000..61d6d7ce939ba0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_rairachit_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_rairachit_pipeline pipeline DistilBertForSequenceClassification from RaiRachit +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_rairachit_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_rairachit_pipeline` is a English model originally trained by RaiRachit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_rairachit_pipeline_en_5.5.0_3.0_1726719221704.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_rairachit_pipeline_en_5.5.0_3.0_1726719221704.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_rairachit_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_rairachit_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_rairachit_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/RaiRachit/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_pipeline_en.md new file mode 100644 index 00000000000000..0b102a324bb772 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_pipeline pipeline DistilBertForQuestionAnswering from coreyabs-db +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_pipeline` is a English model originally trained by coreyabs-db. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_pipeline_en_5.5.0_3.0_1726727609746.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_pipeline_en_5.5.0_3.0_1726727609746.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_d5716d28_coreyabs_db_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/coreyabs-db/distilbert-base-uncased-finetuned-squad-d5716d28 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_t_communication_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_t_communication_pipeline_en.md new file mode 100644 index 00000000000000..f1d232b481ea9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_t_communication_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_t_communication_pipeline pipeline DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_t_communication_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_t_communication_pipeline` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_communication_pipeline_en_5.5.0_3.0_1726742872625.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_communication_pipeline_en_5.5.0_3.0_1726742872625.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_t_communication_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_t_communication_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_t_communication_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-t_communication + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_squad2_pruned_p50_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_squad2_pruned_p50_en.md new file mode 100644 index 00000000000000..2d529022e74824 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_squad2_pruned_p50_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_pruned_p50 DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_pruned_p50 +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_pruned_p50` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_pruned_p50_en_5.5.0_3.0_1726748506237.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_pruned_p50_en_5.5.0_3.0_1726748506237.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+     .setInputCols(["question", "context"]) \
+     .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_pruned_p50","en") \
+     .setInputCols(["document_question","document_context"]) \
+     .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+     .setInputCols(Array("question", "context"))
+     .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_pruned_p50", "en")
+     .setInputCols(Array("document_question","document_context"))
+     .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
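+
+Once `pipelineDF` has been produced as above, the predicted span is available in the `result` field of the `answer` column, with scores in `metadata`. A short, illustrative way to read it back, using only the column names set in the example:
+
+```python
+from pyspark.sql.functions import explode
+
+# "answer" is the output column configured on the span classifier above
+pipelineDF.select("answer.result").show(truncate=False)
+
+# one row per predicted answer, keeping the score metadata
+pipelineDF.select(explode("answer").alias("ans")) \
+    .select("ans.result", "ans.metadata") \
+    .show(truncate=False)
+```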
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_pruned_p50| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|185.2 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-pruned-p50 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_college_experience_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_college_experience_classifier_pipeline_en.md new file mode 100644 index 00000000000000..ea1b2d09b9c0c8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_college_experience_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_college_experience_classifier_pipeline pipeline DistilBertForSequenceClassification from jasonchay +author: John Snow Labs +name: distilbert_college_experience_classifier_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_college_experience_classifier_pipeline` is a English model originally trained by jasonchay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_college_experience_classifier_pipeline_en_5.5.0_3.0_1726742918779.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_college_experience_classifier_pipeline_en_5.5.0_3.0_1726742918779.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_college_experience_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_college_experience_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_college_experience_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jasonchay/distilbert-college-experience-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_mnli_192_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_mnli_192_pipeline_en.md new file mode 100644 index 00000000000000..330a92927e95b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_mnli_192_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_mnli_192_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_mnli_192_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_mnli_192_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_mnli_192_pipeline_en_5.5.0_3.0_1726742900045.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_mnli_192_pipeline_en_5.5.0_3.0_1726742900045.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_mnli_192_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_mnli_192_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_mnli_192_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|52.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_mnli_192 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_pipeline_en.md new file mode 100644 index 00000000000000..c9cef6acff12b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_pipeline pipeline RoBertaForSequenceClassification from Abdelrahman-Rezk +author: John Snow Labs +name: emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_pipeline` is a English model originally trained by Abdelrahman-Rezk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_pipeline_en_5.5.0_3.0_1726726000160.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_pipeline_en_5.5.0_3.0_1726726000160.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.1 MB| + +## References + +https://huggingface.co/Abdelrahman-Rezk/emotion-english-distilroberta-base-fine_tuned_for_amazon_reviews_english_3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuned_roberta_base_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuned_roberta_base_model_pipeline_en.md new file mode 100644 index 00000000000000..bd95b28eac06d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuned_roberta_base_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_roberta_base_model_pipeline pipeline RoBertaForSequenceClassification from KwabenaMufasa +author: John Snow Labs +name: finetuned_roberta_base_model_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_roberta_base_model_pipeline` is a English model originally trained by KwabenaMufasa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_roberta_base_model_pipeline_en_5.5.0_3.0_1726780253334.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_roberta_base_model_pipeline_en_5.5.0_3.0_1726780253334.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_roberta_base_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_roberta_base_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_roberta_base_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|436.3 MB| + +## References + +https://huggingface.co/KwabenaMufasa/Finetuned-Roberta-base-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_zhian66_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_zhian66_en.md new file mode 100644 index 00000000000000..d289aad1302210 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_zhian66_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_zhian66 DistilBertForSequenceClassification from Zhian66 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_zhian66 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_zhian66` is a English model originally trained by Zhian66. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_zhian66_en_5.5.0_3.0_1726719185627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_zhian66_en_5.5.0_3.0_1726719185627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_zhian66","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_zhian66", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
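+
+With `pipelineDF` computed as above, the predicted sentiment sits in the `result` field of the `class` column; a brief, illustrative way to inspect it:
+
+```python
+# "class" is the output column configured above; each annotation carries the
+# predicted label in `result` and, where provided, per-label scores in `metadata`
+pipelineDF.select("text", "class.result").show(truncate=False)
+```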
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_zhian66| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Zhian66/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_zhian66_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_zhian66_pipeline_en.md new file mode 100644 index 00000000000000..07fe5797a128fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_zhian66_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_zhian66_pipeline pipeline DistilBertForSequenceClassification from Zhian66 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_zhian66_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_zhian66_pipeline` is a English model originally trained by Zhian66. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_zhian66_pipeline_en_5.5.0_3.0_1726719198846.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_zhian66_pipeline_en_5.5.0_3.0_1726719198846.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_zhian66_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_zhian66_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_zhian66_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Zhian66/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-ft_danish_distilbert_gdp_en.md b/docs/_posts/ahmedlone127/2024-09-19-ft_danish_distilbert_gdp_en.md new file mode 100644 index 00000000000000..ae5da3bb8a2d85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-ft_danish_distilbert_gdp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ft_danish_distilbert_gdp DistilBertForSequenceClassification from gc394 +author: John Snow Labs +name: ft_danish_distilbert_gdp +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ft_danish_distilbert_gdp` is a English model originally trained by gc394. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ft_danish_distilbert_gdp_en_5.5.0_3.0_1726743645359.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ft_danish_distilbert_gdp_en_5.5.0_3.0_1726743645359.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("ft_danish_distilbert_gdp","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("ft_danish_distilbert_gdp", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
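+
+For ad-hoc scoring of single strings, the fitted pipeline can also be wrapped in a `LightPipeline`, which avoids DataFrame overhead. A small sketch; the sample sentence is an illustrative assumption:
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+
+# annotate() returns a plain dict of output column -> list of values
+print(light.annotate("Quarterly GDP growth slowed in the last quarter."))
+
+# fullAnnotate() keeps full Annotation objects, including metadata
+print(light.fullAnnotate("Quarterly GDP growth slowed in the last quarter."))
+```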
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ft_danish_distilbert_gdp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.4 MB| + +## References + +https://huggingface.co/gc394/ft_da_distilbert_gdp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-hate_hate_random2_seed1_twitter_roberta_base_2019_90m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-hate_hate_random2_seed1_twitter_roberta_base_2019_90m_pipeline_en.md new file mode 100644 index 00000000000000..24f22aad7a886e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-hate_hate_random2_seed1_twitter_roberta_base_2019_90m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_random2_seed1_twitter_roberta_base_2019_90m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_random2_seed1_twitter_roberta_base_2019_90m_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_random2_seed1_twitter_roberta_base_2019_90m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_random2_seed1_twitter_roberta_base_2019_90m_pipeline_en_5.5.0_3.0_1726751330761.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_random2_seed1_twitter_roberta_base_2019_90m_pipeline_en_5.5.0_3.0_1726751330761.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hate_hate_random2_seed1_twitter_roberta_base_2019_90m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hate_hate_random2_seed1_twitter_roberta_base_2019_90m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_random2_seed1_twitter_roberta_base_2019_90m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_random2_seed1-twitter-roberta-base-2019-90m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-maghriberta_en.md b/docs/_posts/ahmedlone127/2024-09-19-maghriberta_en.md new file mode 100644 index 00000000000000..365d2201824105 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-maghriberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English maghriberta RoBertaEmbeddings from nboudad +author: John Snow Labs +name: maghriberta +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`maghriberta` is a English model originally trained by nboudad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/maghriberta_en_5.5.0_3.0_1726747055219.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/maghriberta_en_5.5.0_3.0_1726747055219.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("maghriberta","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("maghriberta","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
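+
+To feed the token embeddings produced above into downstream Spark ML stages, an `EmbeddingsFinisher` can turn the annotation column into plain vectors. A sketch using the column names from the example:
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+finished = finisher.transform(pipelineDF)
+finished.selectExpr("explode(finished_embeddings) as vector").show(truncate=False)
+```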
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|maghriberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|346.5 MB| + +## References + +https://huggingface.co/nboudad/Maghriberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-multilingual_xlm_roberta_for_ner_ertyazilim_xx.md b/docs/_posts/ahmedlone127/2024-09-19-multilingual_xlm_roberta_for_ner_ertyazilim_xx.md new file mode 100644 index 00000000000000..07dff2af492647 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-multilingual_xlm_roberta_for_ner_ertyazilim_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual multilingual_xlm_roberta_for_ner_ertyazilim XlmRoBertaForTokenClassification from ertyazilim +author: John Snow Labs +name: multilingual_xlm_roberta_for_ner_ertyazilim +date: 2024-09-19 +tags: [xx, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multilingual_xlm_roberta_for_ner_ertyazilim` is a Multilingual model originally trained by ertyazilim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multilingual_xlm_roberta_for_ner_ertyazilim_xx_5.5.0_3.0_1726737425165.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multilingual_xlm_roberta_for_ner_ertyazilim_xx_5.5.0_3.0_1726737425165.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("multilingual_xlm_roberta_for_ner_ertyazilim","xx") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("multilingual_xlm_roberta_for_ner_ertyazilim", "xx")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
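+
+The token classifier above emits one tag per token in the `ner` column. A `NerConverter` stage can be appended to group consecutive tags into entity chunks; a sketch reusing the stages defined in the example:
+
+```python
+from sparknlp.annotator import NerConverter
+
+nerConverter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, nerConverter])
+result = pipeline.fit(data).transform(data)
+result.select("ner_chunk.result", "ner_chunk.metadata").show(truncate=False)
+```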
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multilingual_xlm_roberta_for_ner_ertyazilim| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|840.8 MB| + +## References + +https://huggingface.co/ertyazilim/multilingual-xlm-roberta-for-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-opp_115_user_choice_control_en.md b/docs/_posts/ahmedlone127/2024-09-19-opp_115_user_choice_control_en.md new file mode 100644 index 00000000000000..6ada2d597bd5e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-opp_115_user_choice_control_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opp_115_user_choice_control RoBertaForSequenceClassification from jakariamd +author: John Snow Labs +name: opp_115_user_choice_control +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opp_115_user_choice_control` is a English model originally trained by jakariamd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opp_115_user_choice_control_en_5.5.0_3.0_1726780458471.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opp_115_user_choice_control_en_5.5.0_3.0_1726780458471.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("opp_115_user_choice_control","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("opp_115_user_choice_control", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opp_115_user_choice_control| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/jakariamd/opp_115_user_choice_control \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-opp_115_user_choice_control_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-opp_115_user_choice_control_pipeline_en.md new file mode 100644 index 00000000000000..61a4d748907ba6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-opp_115_user_choice_control_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opp_115_user_choice_control_pipeline pipeline RoBertaForSequenceClassification from jakariamd +author: John Snow Labs +name: opp_115_user_choice_control_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opp_115_user_choice_control_pipeline` is a English model originally trained by jakariamd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opp_115_user_choice_control_pipeline_en_5.5.0_3.0_1726780480799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opp_115_user_choice_control_pipeline_en_5.5.0_3.0_1726780480799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opp_115_user_choice_control_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opp_115_user_choice_control_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opp_115_user_choice_control_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.6 MB| + +## References + +https://huggingface.co/jakariamd/opp_115_user_choice_control + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-patent_ner_test_noisyocr_version_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-patent_ner_test_noisyocr_version_pipeline_en.md new file mode 100644 index 00000000000000..c5a80e68f4d774 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-patent_ner_test_noisyocr_version_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English patent_ner_test_noisyocr_version_pipeline pipeline RoBertaForTokenClassification from matthewleechen +author: John Snow Labs +name: patent_ner_test_noisyocr_version_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`patent_ner_test_noisyocr_version_pipeline` is a English model originally trained by matthewleechen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/patent_ner_test_noisyocr_version_pipeline_en_5.5.0_3.0_1726729672508.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/patent_ner_test_noisyocr_version_pipeline_en_5.5.0_3.0_1726729672508.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("patent_ner_test_noisyocr_version_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("patent_ner_test_noisyocr_version_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|patent_ner_test_noisyocr_version_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/matthewleechen/patent_ner_test_noisyocr_version + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-platzi_distilroberta_base_mrpc_glue_andres_galvis_en.md b/docs/_posts/ahmedlone127/2024-09-19-platzi_distilroberta_base_mrpc_glue_andres_galvis_en.md new file mode 100644 index 00000000000000..923c100cd0429b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-platzi_distilroberta_base_mrpc_glue_andres_galvis_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English platzi_distilroberta_base_mrpc_glue_andres_galvis RoBertaForSequenceClassification from platzi +author: John Snow Labs +name: platzi_distilroberta_base_mrpc_glue_andres_galvis +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_mrpc_glue_andres_galvis` is a English model originally trained by platzi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_andres_galvis_en_5.5.0_3.0_1726751072961.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_andres_galvis_en_5.5.0_3.0_1726751072961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_andres_galvis","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_andres_galvis", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_mrpc_glue_andres_galvis| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/platzi/platzi-distilroberta-base-mrpc-glue-andres-galvis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_bne_finetuned_tripadvisor_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_bne_finetuned_tripadvisor_en.md new file mode 100644 index 00000000000000..1be2a08ffc813d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_bne_finetuned_tripadvisor_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_tripadvisor RoBertaEmbeddings from vg055 +author: John Snow Labs +name: roberta_base_bne_finetuned_tripadvisor +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_tripadvisor` is a English model originally trained by vg055. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_tripadvisor_en_5.5.0_3.0_1726747101775.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_tripadvisor_en_5.5.0_3.0_1726747101775.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_bne_finetuned_tripadvisor","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_bne_finetuned_tripadvisor","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_tripadvisor| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|463.7 MB| + +## References + +https://huggingface.co/vg055/roberta-base-bne-finetuned-tripAdvisor \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_genia_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_genia_ner_pipeline_en.md new file mode 100644 index 00000000000000..231fdb3cbc1c0e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_genia_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_genia_ner_pipeline pipeline RoBertaForTokenClassification from CheccoCando +author: John Snow Labs +name: roberta_base_genia_ner_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_genia_ner_pipeline` is a English model originally trained by CheccoCando. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_genia_ner_pipeline_en_5.5.0_3.0_1726745789247.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_genia_ner_pipeline_en_5.5.0_3.0_1726745789247.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_genia_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_genia_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_genia_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|436.0 MB| + +## References + +https://huggingface.co/CheccoCando/roberta-base_GENIA_NER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_large_merged_subtaskb_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_merged_subtaskb_en.md new file mode 100644 index 00000000000000..49cf3ac5f268c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_merged_subtaskb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_merged_subtaskb RoBertaForSequenceClassification from Sansh2003 +author: John Snow Labs +name: roberta_large_merged_subtaskb +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_merged_subtaskb` is a English model originally trained by Sansh2003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_merged_subtaskb_en_5.5.0_3.0_1726726488149.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_merged_subtaskb_en_5.5.0_3.0_1726726488149.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_merged_subtaskb","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_merged_subtaskb", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
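+
+Given the roughly 1.3 GB size of this checkpoint, it can be worth fitting the pipeline once and persisting the fitted model for reuse; the save path below is only a placeholder:
+
+```python
+from pyspark.ml import PipelineModel
+
+# placeholder path; any local/HDFS/S3 URI reachable by the cluster works
+pipelineModel.write().overwrite().save("/tmp/roberta_large_merged_subtaskb_pipeline")
+
+reloaded = PipelineModel.load("/tmp/roberta_large_merged_subtaskb_pipeline")
+reloaded.transform(data).select("class.result").show(truncate=False)
+```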
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_merged_subtaskb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Sansh2003/roberta-large-merged-subtaskB \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_large_ppt_occitan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_ppt_occitan_pipeline_en.md new file mode 100644 index 00000000000000..9be4d9d6dbc308 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_ppt_occitan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_ppt_occitan_pipeline pipeline RoBertaEmbeddings from mehrshadk +author: John Snow Labs +name: roberta_large_ppt_occitan_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_ppt_occitan_pipeline` is a English model originally trained by mehrshadk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_ppt_occitan_pipeline_en_5.5.0_3.0_1726749638382.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_ppt_occitan_pipeline_en_5.5.0_3.0_1726749638382.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_ppt_occitan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_ppt_occitan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
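
The snippet above assumes a DataFrame `df` with a `text` column already exists. A minimal sketch of the surrounding setup (starting a Spark NLP session and building `df`); the same pattern applies to the other PretrainedPipeline examples in these posts, and the example text is only a placeholder:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start (or attach to) a Spark NLP session.
spark = sparknlp.start()

# The pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("roberta_large_ppt_occitan_pipeline", lang = "en")
annotations = pipeline.transform(df)

# For a single string, annotate() returns plain Python lists instead of a DataFrame.
print(pipeline.annotate("I love spark-nlp"))
```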
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_ppt_occitan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/mehrshadk/roberta_Large_ppt_OC + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_untrained_1eps_seed995_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_untrained_1eps_seed995_pipeline_en.md new file mode 100644 index 00000000000000..ebd26f7e34a78c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_untrained_1eps_seed995_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_untrained_1eps_seed995_pipeline pipeline RoBertaForSequenceClassification from custeau +author: John Snow Labs +name: roberta_untrained_1eps_seed995_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_untrained_1eps_seed995_pipeline` is a English model originally trained by custeau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_untrained_1eps_seed995_pipeline_en_5.5.0_3.0_1726779888079.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_untrained_1eps_seed995_pipeline_en_5.5.0_3.0_1726779888079.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_untrained_1eps_seed995_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_untrained_1eps_seed995_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_untrained_1eps_seed995_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|447.9 MB| + +## References + +https://huggingface.co/custeau/roberta_untrained_1eps_seed995 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-robertaiqbal_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-robertaiqbal_pipeline_en.md new file mode 100644 index 00000000000000..a7739d1debde2d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-robertaiqbal_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robertaiqbal_pipeline pipeline RoBertaEmbeddings from cxfajar197 +author: John Snow Labs +name: robertaiqbal_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertaiqbal_pipeline` is a English model originally trained by cxfajar197. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertaiqbal_pipeline_en_5.5.0_3.0_1726747424973.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertaiqbal_pipeline_en_5.5.0_3.0_1726747424973.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robertaiqbal_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robertaiqbal_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertaiqbal_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|471.0 MB| + +## References + +https://huggingface.co/cxfajar197/robertaiqbal + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sbic_roberta_text_disagreement_predictor_en.md b/docs/_posts/ahmedlone127/2024-09-19-sbic_roberta_text_disagreement_predictor_en.md new file mode 100644 index 00000000000000..08a963adb0ee04 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sbic_roberta_text_disagreement_predictor_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sbic_roberta_text_disagreement_predictor RoBertaForSequenceClassification from RuyuanWan +author: John Snow Labs +name: sbic_roberta_text_disagreement_predictor +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sbic_roberta_text_disagreement_predictor` is a English model originally trained by RuyuanWan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sbic_roberta_text_disagreement_predictor_en_5.5.0_3.0_1726733058318.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sbic_roberta_text_disagreement_predictor_en_5.5.0_3.0_1726733058318.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("sbic_roberta_text_disagreement_predictor","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sbic_roberta_text_disagreement_predictor", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sbic_roberta_text_disagreement_predictor| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|415.5 MB| + +## References + +https://huggingface.co/RuyuanWan/SBIC_RoBERTa_Text_Disagreement_Predictor \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sbic_roberta_text_disagreement_predictor_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-sbic_roberta_text_disagreement_predictor_pipeline_en.md new file mode 100644 index 00000000000000..888fe281eb147d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sbic_roberta_text_disagreement_predictor_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sbic_roberta_text_disagreement_predictor_pipeline pipeline RoBertaForSequenceClassification from RuyuanWan +author: John Snow Labs +name: sbic_roberta_text_disagreement_predictor_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sbic_roberta_text_disagreement_predictor_pipeline` is a English model originally trained by RuyuanWan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sbic_roberta_text_disagreement_predictor_pipeline_en_5.5.0_3.0_1726733101562.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sbic_roberta_text_disagreement_predictor_pipeline_en_5.5.0_3.0_1726733101562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sbic_roberta_text_disagreement_predictor_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sbic_roberta_text_disagreement_predictor_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sbic_roberta_text_disagreement_predictor_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|415.5 MB| + +## References + +https://huggingface.co/RuyuanWan/SBIC_RoBERTa_Text_Disagreement_Predictor + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-schem_roberta_text_disagreement_binary_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-19-schem_roberta_text_disagreement_binary_classifier_en.md new file mode 100644 index 00000000000000..545019295e699b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-schem_roberta_text_disagreement_binary_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English schem_roberta_text_disagreement_binary_classifier RoBertaForSequenceClassification from RuyuanWan +author: John Snow Labs +name: schem_roberta_text_disagreement_binary_classifier +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`schem_roberta_text_disagreement_binary_classifier` is a English model originally trained by RuyuanWan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/schem_roberta_text_disagreement_binary_classifier_en_5.5.0_3.0_1726733086788.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/schem_roberta_text_disagreement_binary_classifier_en_5.5.0_3.0_1726733086788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("schem_roberta_text_disagreement_binary_classifier","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("schem_roberta_text_disagreement_binary_classifier", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|schem_roberta_text_disagreement_binary_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|422.8 MB| + +## References + +https://huggingface.co/RuyuanWan/SChem_RoBERTa_Text_Disagreement_Binary_Classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sent_javanese_bert_small_imdb_jv.md b/docs/_posts/ahmedlone127/2024-09-19-sent_javanese_bert_small_imdb_jv.md new file mode 100644 index 00000000000000..3c20ad8d29fcbd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sent_javanese_bert_small_imdb_jv.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Javanese sent_javanese_bert_small_imdb BertSentenceEmbeddings from w11wo +author: John Snow Labs +name: sent_javanese_bert_small_imdb +date: 2024-09-19 +tags: [jv, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: jv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_javanese_bert_small_imdb` is a Javanese model originally trained by w11wo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_javanese_bert_small_imdb_jv_5.5.0_3.0_1726782985552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_javanese_bert_small_imdb_jv_5.5.0_3.0_1726782985552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_javanese_bert_small_imdb","jv") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_javanese_bert_small_imdb","jv") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
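
Each detected sentence yields one annotation in the `embeddings` column. A minimal sketch, assuming the `pipelineDF` from the example above, for unpacking those annotations into plain sentence/vector pairs:

```python
from pyspark.sql import functions as F

# One row per sentence: `result` holds the sentence text,
# `embeddings` the raw float vector produced for it.
vectors = pipelineDF.select(F.explode("embeddings").alias("e")) \
    .select(F.col("e.result").alias("sentence"), F.col("e.embeddings").alias("vector"))
vectors.show(truncate=False)
```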
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_javanese_bert_small_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|jv| +|Size:|407.3 MB| + +## References + +https://huggingface.co/w11wo/javanese-bert-small-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sent_phs_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-sent_phs_bert_pipeline_en.md new file mode 100644 index 00000000000000..1f9b9e7567e545 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sent_phs_bert_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_phs_bert_pipeline pipeline BertSentenceEmbeddings from publichealthsurveillance +author: John Snow Labs +name: sent_phs_bert_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_phs_bert_pipeline` is a English model originally trained by publichealthsurveillance. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_phs_bert_pipeline_en_5.5.0_3.0_1726782769989.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_phs_bert_pipeline_en_5.5.0_3.0_1726782769989.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_phs_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_phs_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_phs_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/publichealthsurveillance/PHS-BERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-socmed_comment_roberta_base_indonesian_smsa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-socmed_comment_roberta_base_indonesian_smsa_pipeline_en.md new file mode 100644 index 00000000000000..232bb3fdd6f731 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-socmed_comment_roberta_base_indonesian_smsa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English socmed_comment_roberta_base_indonesian_smsa_pipeline pipeline RoBertaForSequenceClassification from databoks-irfan +author: John Snow Labs +name: socmed_comment_roberta_base_indonesian_smsa_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`socmed_comment_roberta_base_indonesian_smsa_pipeline` is a English model originally trained by databoks-irfan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/socmed_comment_roberta_base_indonesian_smsa_pipeline_en_5.5.0_3.0_1726780023599.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/socmed_comment_roberta_base_indonesian_smsa_pipeline_en_5.5.0_3.0_1726780023599.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("socmed_comment_roberta_base_indonesian_smsa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("socmed_comment_roberta_base_indonesian_smsa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|socmed_comment_roberta_base_indonesian_smsa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|467.7 MB| + +## References + +https://huggingface.co/databoks-irfan/socmed-comment-roberta-base-indonesian-smsa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-stresstweetrobertasentiment_en.md b/docs/_posts/ahmedlone127/2024-09-19-stresstweetrobertasentiment_en.md new file mode 100644 index 00000000000000..4f09e49946a0d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-stresstweetrobertasentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stresstweetrobertasentiment RoBertaForSequenceClassification from StephArn +author: John Snow Labs +name: stresstweetrobertasentiment +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stresstweetrobertasentiment` is a English model originally trained by StephArn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stresstweetrobertasentiment_en_5.5.0_3.0_1726779734363.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stresstweetrobertasentiment_en_5.5.0_3.0_1726779734363.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("stresstweetrobertasentiment","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("stresstweetrobertasentiment", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stresstweetrobertasentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/StephArn/StressTweetRobertaSentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_jrb_small_tamil_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_jrb_small_tamil_pipeline_en.md new file mode 100644 index 00000000000000..225cb584b105ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_jrb_small_tamil_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_jrb_small_tamil_pipeline pipeline WhisperForCTC from jbatista79 +author: John Snow Labs +name: whisper_jrb_small_tamil_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_jrb_small_tamil_pipeline` is a English model originally trained by jbatista79. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_jrb_small_tamil_pipeline_en_5.5.0_3.0_1726716270724.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_jrb_small_tamil_pipeline_en_5.5.0_3.0_1726716270724.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_jrb_small_tamil_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_jrb_small_tamil_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_jrb_small_tamil_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/jbatista79/whisper-jrb-small-ta + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_hausa_seon25_pipeline_ha.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_hausa_seon25_pipeline_ha.md new file mode 100644 index 00000000000000..bbd2b4ee32f562 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_hausa_seon25_pipeline_ha.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hausa whisper_small_hausa_seon25_pipeline pipeline WhisperForCTC from Seon25 +author: John Snow Labs +name: whisper_small_hausa_seon25_pipeline +date: 2024-09-19 +tags: [ha, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ha +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hausa_seon25_pipeline` is a Hausa model originally trained by Seon25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hausa_seon25_pipeline_ha_5.5.0_3.0_1726714465701.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hausa_seon25_pipeline_ha_5.5.0_3.0_1726714465701.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_hausa_seon25_pipeline", lang = "ha") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_hausa_seon25_pipeline", lang = "ha") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hausa_seon25_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ha| +|Size:|641.7 MB| + +## References + +https://huggingface.co/Seon25/whisper-small-ha + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_finetune_hindi_fleurs_hi.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_finetune_hindi_fleurs_hi.md new file mode 100644 index 00000000000000..674b74d184be20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_finetune_hindi_fleurs_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_tiny_finetune_hindi_fleurs WhisperForCTC from Aryan-401 +author: John Snow Labs +name: whisper_tiny_finetune_hindi_fleurs +date: 2024-09-19 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_finetune_hindi_fleurs` is a Hindi model originally trained by Aryan-401. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetune_hindi_fleurs_hi_5.5.0_3.0_1726714588010.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetune_hindi_fleurs_hi_5.5.0_3.0_1726714588010.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import AudioAssembler
from sparknlp.annotator import WhisperForCTC
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_finetune_hindi_fleurs","hi") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
# "data" is assumed to be a DataFrame with an "audio_content" column of raw float samples
# (see the sketch after this example for one way to build it).
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_finetune_hindi_fleurs", "hi")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
// "data" is assumed to be a DataFrame with an "audio_content" column of raw float samples.
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
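
The example above leaves `data` undefined: `WhisperForCTC` expects a DataFrame whose `audio_content` column holds the raw waveform as an array of floats (16 kHz mono). A minimal sketch of one way to build it, assuming `librosa` is available for decoding and using a placeholder file path:

```python
import librosa

# Decode a local file to 16 kHz mono floats; the path is a placeholder.
raw_floats, _ = librosa.load("sample_audio.wav", sr=16000, mono=True)

# One row per recording; the column name must match the AudioAssembler input.
data = spark.createDataFrame([[raw_floats.tolist()]]).toDF("audio_content")
```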
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_finetune_hindi_fleurs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|390.0 MB| + +## References + +https://huggingface.co/Aryan-401/whisper-tiny-finetune-hindi-fleurs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_albiecofie_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_albiecofie_pipeline_en.md new file mode 100644 index 00000000000000..f03029beb13fed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_albiecofie_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_albiecofie_pipeline pipeline XlmRoBertaForSequenceClassification from AlbieCofie +author: John Snow Labs +name: xlm_roberta_base_albiecofie_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_albiecofie_pipeline` is a English model originally trained by AlbieCofie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_albiecofie_pipeline_en_5.5.0_3.0_1726752910422.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_albiecofie_pipeline_en_5.5.0_3.0_1726752910422.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_albiecofie_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_albiecofie_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_albiecofie_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/AlbieCofie/xlm_roberta_base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_azaidi_face_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_azaidi_face_en.md new file mode 100644 index 00000000000000..f648a10f960362 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_azaidi_face_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_azaidi_face XlmRoBertaForTokenClassification from azaidi-face +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_azaidi_face +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_azaidi_face` is a English model originally trained by azaidi-face. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_azaidi_face_en_5.5.0_3.0_1726753927288.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_azaidi_face_en_5.5.0_3.0_1726753927288.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForTokenClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_azaidi_face","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_azaidi_face", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
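
The token classifier emits one IOB tag per token in the `ner` column. To get grouped entity chunks instead of per-token tags, a `NerConverter` stage can be appended to the pipeline; a minimal sketch, reusing the stages defined in the Python example above:

```python
from sparknlp.annotator import NerConverter

# Groups consecutive B-/I- tags into entity chunks.
nerConverter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, nerConverter])
result = pipeline.fit(data).transform(data)
result.select("ner_chunk.result").show(truncate=False)
```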
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_azaidi_face| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/azaidi-face/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_dinasalama_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_dinasalama_pipeline_en.md new file mode 100644 index 00000000000000..6c6e8548c83c5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_dinasalama_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_dinasalama_pipeline pipeline XlmRoBertaForTokenClassification from DinaSalama +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_dinasalama_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_dinasalama_pipeline` is a English model originally trained by DinaSalama. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_dinasalama_pipeline_en_5.5.0_3.0_1726738380246.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_dinasalama_pipeline_en_5.5.0_3.0_1726738380246.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_dinasalama_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_dinasalama_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_dinasalama_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|851.8 MB| + +## References + +https://huggingface.co/DinaSalama/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_prudhvip21_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_prudhvip21_pipeline_en.md new file mode 100644 index 00000000000000..b322f6d42b5a9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_prudhvip21_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_prudhvip21_pipeline pipeline XlmRoBertaForTokenClassification from prudhvip21 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_prudhvip21_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_prudhvip21_pipeline` is a English model originally trained by prudhvip21. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_prudhvip21_pipeline_en_5.5.0_3.0_1726753627145.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_prudhvip21_pipeline_en_5.5.0_3.0_1726753627145.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_prudhvip21_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_prudhvip21_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_prudhvip21_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/prudhvip21/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_finetuned_emojis_1_client_toxic_cen_2_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_finetuned_emojis_1_client_toxic_cen_2_en.md new file mode 100644 index 00000000000000..9b51d49e9a6a7e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_finetuned_emojis_1_client_toxic_cen_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_finetuned_emojis_1_client_toxic_cen_2 XlmRoBertaForSequenceClassification from Karim-Gamal +author: John Snow Labs +name: xlm_roberta_finetuned_emojis_1_client_toxic_cen_2 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_finetuned_emojis_1_client_toxic_cen_2` is a English model originally trained by Karim-Gamal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_1_client_toxic_cen_2_en_5.5.0_3.0_1726752183517.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_1_client_toxic_cen_2_en_5.5.0_3.0_1726752183517.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_finetuned_emojis_1_client_toxic_cen_2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_finetuned_emojis_1_client_toxic_cen_2", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_finetuned_emojis_1_client_toxic_cen_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Karim-Gamal/XLM-Roberta-finetuned-emojis-1-client-toxic-cen-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-0_00005_0_999_a98zhang_en.md b/docs/_posts/ahmedlone127/2024-09-20-0_00005_0_999_a98zhang_en.md new file mode 100644 index 00000000000000..8cb1371d0fd34e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-0_00005_0_999_a98zhang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 0_00005_0_999_a98zhang RoBertaForSequenceClassification from a98zhang +author: John Snow Labs +name: 0_00005_0_999_a98zhang +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`0_00005_0_999_a98zhang` is a English model originally trained by a98zhang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/0_00005_0_999_a98zhang_en_5.5.0_3.0_1726852109223.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/0_00005_0_999_a98zhang_en_5.5.0_3.0_1726852109223.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("0_00005_0_999_a98zhang","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("0_00005_0_999_a98zhang", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|0_00005_0_999_a98zhang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/a98zhang/0.00005_0.999 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-alberta_base_mathissimo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-alberta_base_mathissimo_pipeline_en.md new file mode 100644 index 00000000000000..ce3a0b69cfcbdf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-alberta_base_mathissimo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English alberta_base_mathissimo_pipeline pipeline RoBertaForSequenceClassification from Mathissimo +author: John Snow Labs +name: alberta_base_mathissimo_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`alberta_base_mathissimo_pipeline` is a English model originally trained by Mathissimo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/alberta_base_mathissimo_pipeline_en_5.5.0_3.0_1726850164600.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/alberta_base_mathissimo_pipeline_en_5.5.0_3.0_1726850164600.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("alberta_base_mathissimo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("alberta_base_mathissimo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|alberta_base_mathissimo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|442.6 MB| + +## References + +https://huggingface.co/Mathissimo/alberta_base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_banking_1000_16_5_oos_en.md b/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_banking_1000_16_5_oos_en.md new file mode 100644 index 00000000000000..feae548d60f70c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_banking_1000_16_5_oos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_banking_1000_16_5_oos RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_banking_1000_16_5_oos +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_banking_1000_16_5_oos` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_banking_1000_16_5_oos_en_5.5.0_3.0_1726804609648.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_banking_1000_16_5_oos_en_5.5.0_3.0_1726804609648.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_banking_1000_16_5_oos","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_banking_1000_16_5_oos", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
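
For quick single-document inference without building a DataFrame, the fitted model can also be wrapped in a `LightPipeline`; a minimal sketch, assuming the `pipelineModel` fitted in the Python example above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

# Returns a plain dict of annotator outputs, e.g. the predicted label under "class".
print(light.annotate("I love spark-nlp"))
```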
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_banking_1000_16_5_oos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-banking-1000-16-5-oos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_travel_4_16_5_oos_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_travel_4_16_5_oos_pipeline_en.md new file mode 100644 index 00000000000000..38e8e4b65cf6b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_travel_4_16_5_oos_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English all_roberta_large_v1_travel_4_16_5_oos_pipeline pipeline RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_travel_4_16_5_oos_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_travel_4_16_5_oos_pipeline` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_travel_4_16_5_oos_pipeline_en_5.5.0_3.0_1726804896730.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_travel_4_16_5_oos_pipeline_en_5.5.0_3.0_1726804896730.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("all_roberta_large_v1_travel_4_16_5_oos_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("all_roberta_large_v1_travel_4_16_5_oos_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
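
The snippets above assume an existing DataFrame `df` with a `text` column. A minimal sketch of preparing such an input and inspecting the output, assuming an active Spark NLP session; the example sentence and the `class` output column name are assumptions based on the equivalent standalone model card:

```python
from sparknlp.pretrained import PretrainedPipeline

# Hypothetical single-row input; any DataFrame with a "text" column works
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("all_roberta_large_v1_travel_4_16_5_oos_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.select("class.result").show(truncate=False)  # "class" column name assumed
```
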
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_travel_4_16_5_oos_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-travel-4-16-5-oos + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-azbertacontextualizedwordembeddingsinazerbaijanilanguage_en.md b/docs/_posts/ahmedlone127/2024-09-20-azbertacontextualizedwordembeddingsinazerbaijanilanguage_en.md new file mode 100644 index 00000000000000..c46abec428b787 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-azbertacontextualizedwordembeddingsinazerbaijanilanguage_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English azbertacontextualizedwordembeddingsinazerbaijanilanguage RoBertaEmbeddings from turalizada +author: John Snow Labs +name: azbertacontextualizedwordembeddingsinazerbaijanilanguage +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`azbertacontextualizedwordembeddingsinazerbaijanilanguage` is a English model originally trained by turalizada. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/azbertacontextualizedwordembeddingsinazerbaijanilanguage_en_5.5.0_3.0_1726857736688.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/azbertacontextualizedwordembeddingsinazerbaijanilanguage_en_5.5.0_3.0_1726857736688.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("azbertacontextualizedwordembeddingsinazerbaijanilanguage","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("azbertacontextualizedwordembeddingsinazerbaijanilanguage","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
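
To sanity-check the result, the token-level vectors in the `embeddings` column can be unpacked; a short sketch, assuming the pipeline above ran in an active Spark session:

```python
from pyspark.sql.functions import explode

# One row per token: the token text and its embedding vector
pipelineDF.select(explode("embeddings").alias("emb")) \
    .selectExpr("emb.result as token", "emb.embeddings as vector") \
    .show(5, truncate=80)
```
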
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|azbertacontextualizedwordembeddingsinazerbaijanilanguage| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/turalizada/AzBERTaContextualizedWordEmbeddingsinAzerbaijaniLanguage \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_portuguese_cased_tiagosanti_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_portuguese_cased_tiagosanti_en.md new file mode 100644 index 00000000000000..8bf312ffa39f43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_portuguese_cased_tiagosanti_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_portuguese_cased_tiagosanti BertForSequenceClassification from TiagoSanti +author: John Snow Labs +name: bert_base_portuguese_cased_tiagosanti +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_portuguese_cased_tiagosanti` is a English model originally trained by TiagoSanti. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_tiagosanti_en_5.5.0_3.0_1726859840548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_tiagosanti_en_5.5.0_3.0_1726859840548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertForSequenceClassification
from pyspark.ml import Pipeline

# Turn raw text into Spark NLP document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The pretrained classifier reads the document and token columns defined above
sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_portuguese_cased_tiagosanti","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._  // assumes an active SparkSession named `spark`

// Turn raw text into Spark NLP document annotations
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

// The pretrained classifier reads the document and token columns defined above
val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_portuguese_cased_tiagosanti", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_portuguese_cased_tiagosanti| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/TiagoSanti/bert-base-portuguese-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetuned_set_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetuned_set_3_pipeline_en.md new file mode 100644 index 00000000000000..4f32a1df5f3902 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetuned_set_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_set_3_pipeline pipeline BertForSequenceClassification from joetey +author: John Snow Labs +name: bert_base_uncased_finetuned_set_3_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_set_3_pipeline` is a English model originally trained by joetey. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_set_3_pipeline_en_5.5.0_3.0_1726797285307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_set_3_pipeline_en_5.5.0_3.0_1726797285307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_set_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_set_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
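
For quick experiments, the pipeline can also be applied to a single string instead of a DataFrame; a minimal sketch, assuming a running Spark NLP session:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_uncased_finetuned_set_3_pipeline", lang = "en")
# annotate() runs all pipeline stages on one string and returns a dict of results
result = pipeline.annotate("I love spark-nlp")
print(result)
```
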
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_set_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/joetey/bert-base-uncased-finetuned-set_3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_qqp_modeltc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_qqp_modeltc_pipeline_en.md new file mode 100644 index 00000000000000..633a6390e477d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_qqp_modeltc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_qqp_modeltc_pipeline pipeline BertForSequenceClassification from ModelTC +author: John Snow Labs +name: bert_base_uncased_qqp_modeltc_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_qqp_modeltc_pipeline` is a English model originally trained by ModelTC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qqp_modeltc_pipeline_en_5.5.0_3.0_1726828589206.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qqp_modeltc_pipeline_en_5.5.0_3.0_1726828589206.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_qqp_modeltc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_qqp_modeltc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_qqp_modeltc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ModelTC/bert-base-uncased-qqp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_classification_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_classification_en.md new file mode 100644 index 00000000000000..22daa376caf6de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_classification DistilBertForSequenceClassification from mdp0999 +author: John Snow Labs +name: bert_classification +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_classification` is a English model originally trained by mdp0999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_classification_en_5.5.0_3.0_1726860933086.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_classification_en_5.5.0_3.0_1726860933086.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

# Turn raw text into Spark NLP document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The pretrained classifier reads the document and token columns defined above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_classification","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._  // assumes an active SparkSession named `spark`

// Turn raw text into Spark NLP document annotations
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

// The pretrained classifier reads the document and token columns defined above
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_classification", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mdp0999/bert_classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_en.md new file mode 100644 index 00000000000000..2153422414e1f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert RoBertaEmbeddings from ai-ar +author: John Snow Labs +name: bert +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert` is a English model originally trained by ai-ar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_en_5.5.0_3.0_1726816418936.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_en_5.5.0_3.0_1726816418936.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("bert","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("bert","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ai-ar/bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_math_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_math_en.md new file mode 100644 index 00000000000000..71b6a4269be696 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_math_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_math DistilBertForSequenceClassification from CrissWang +author: John Snow Labs +name: bert_math +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_math` is a English model originally trained by CrissWang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_math_en_5.5.0_3.0_1726871615556.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_math_en_5.5.0_3.0_1726871615556.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

# Turn raw text into Spark NLP document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The pretrained classifier reads the document and token columns defined above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_math","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._  // assumes an active SparkSession named `spark`

// Turn raw text into Spark NLP document annotations
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

// The pretrained classifier reads the document and token columns defined above
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_math", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_math| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/CrissWang/bert-math \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_next_word_prediction_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_next_word_prediction_pipeline_en.md new file mode 100644 index 00000000000000..2204d4826a3ec0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_next_word_prediction_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_next_word_prediction_pipeline pipeline BertEmbeddings from MattNandavong +author: John Snow Labs +name: bert_next_word_prediction_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_next_word_prediction_pipeline` is a English model originally trained by MattNandavong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_next_word_prediction_pipeline_en_5.5.0_3.0_1726825722673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_next_word_prediction_pipeline_en_5.5.0_3.0_1726825722673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_next_word_prediction_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_next_word_prediction_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
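
The `df` referenced above is any Spark DataFrame with a `text` column; a minimal sketch of building one and checking which annotation columns the pipeline adds, assuming an active Spark NLP session:

```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # illustrative input

pipeline = PretrainedPipeline("bert_next_word_prediction_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.printSchema()  # lists the output columns produced by the included stages
```
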
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_next_word_prediction_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/MattNandavong/bert-next-word-prediction + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_pipeline_en.md new file mode 100644 index 00000000000000..0af97ad2c7243f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_pipeline pipeline RoBertaEmbeddings from ai-ar +author: John Snow Labs +name: bert_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_pipeline` is a English model originally trained by ai-ar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_pipeline_en_5.5.0_3.0_1726816482305.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_pipeline_en_5.5.0_3.0_1726816482305.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ai-ar/bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_eli5_mlm_model_zdaniar_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_eli5_mlm_model_zdaniar_en.md new file mode 100644 index 00000000000000..7a680b58f8c7ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_eli5_mlm_model_zdaniar_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_zdaniar RoBertaEmbeddings from zdaniar +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_zdaniar +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_zdaniar` is a English model originally trained by zdaniar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_zdaniar_en_5.5.0_3.0_1726796423076.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_zdaniar_en_5.5.0_3.0_1726796423076.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_zdaniar","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_zdaniar","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_zdaniar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.3 MB| + +## References + +https://huggingface.co/zdaniar/my_awesome_eli5_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_blitzapurva_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_blitzapurva_pipeline_en.md new file mode 100644 index 00000000000000..ae467dcf656848 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_blitzapurva_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_blitzapurva_pipeline pipeline DistilBertForSequenceClassification from blitzapurva +author: John Snow Labs +name: burmese_awesome_model_blitzapurva_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_blitzapurva_pipeline` is a English model originally trained by blitzapurva. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_blitzapurva_pipeline_en_5.5.0_3.0_1726848744568.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_blitzapurva_pipeline_en_5.5.0_3.0_1726848744568.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_blitzapurva_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_blitzapurva_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
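
The snippet above assumes a DataFrame `df` with a `text` column; a minimal, illustrative end-to-end run under that assumption:

```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # hypothetical input
pipeline = PretrainedPipeline("burmese_awesome_model_blitzapurva_pipeline", lang = "en")
pipeline.transform(df).show(truncate=False)
```
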
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_blitzapurva_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/blitzapurva/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_bsgreenb_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_bsgreenb_en.md new file mode 100644 index 00000000000000..beda77d20a5b3e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_bsgreenb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_bsgreenb DistilBertForSequenceClassification from bsgreenb +author: John Snow Labs +name: burmese_awesome_model_bsgreenb +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_bsgreenb` is a English model originally trained by bsgreenb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_bsgreenb_en_5.5.0_3.0_1726832954871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_bsgreenb_en_5.5.0_3.0_1726832954871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

# Turn raw text into Spark NLP document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The pretrained classifier reads the document and token columns defined above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_bsgreenb","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._  // assumes an active SparkSession named `spark`

// Turn raw text into Spark NLP document annotations
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

// The pretrained classifier reads the document and token columns defined above
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_bsgreenb", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_bsgreenb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bsgreenb/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_bsgreenb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_bsgreenb_pipeline_en.md new file mode 100644 index 00000000000000..f358b353185e7f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_bsgreenb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_bsgreenb_pipeline pipeline DistilBertForSequenceClassification from bsgreenb +author: John Snow Labs +name: burmese_awesome_model_bsgreenb_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_bsgreenb_pipeline` is a English model originally trained by bsgreenb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_bsgreenb_pipeline_en_5.5.0_3.0_1726832968996.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_bsgreenb_pipeline_en_5.5.0_3.0_1726832968996.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_bsgreenb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_bsgreenb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_bsgreenb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bsgreenb/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_gauravr12060102_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_gauravr12060102_pipeline_en.md new file mode 100644 index 00000000000000..afdb6bb4a0f432 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_gauravr12060102_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_gauravr12060102_pipeline pipeline DistilBertForSequenceClassification from GauravR12060102 +author: John Snow Labs +name: burmese_awesome_model_gauravr12060102_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_gauravr12060102_pipeline` is a English model originally trained by GauravR12060102. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_gauravr12060102_pipeline_en_5.5.0_3.0_1726832408770.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_gauravr12060102_pipeline_en_5.5.0_3.0_1726832408770.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_gauravr12060102_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_gauravr12060102_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_gauravr12060102_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/GauravR12060102/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_rk212_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_rk212_pipeline_en.md new file mode 100644 index 00000000000000..767a7b76e4b30e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_rk212_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_rk212_pipeline pipeline DistilBertForSequenceClassification from rk212 +author: John Snow Labs +name: burmese_awesome_model_rk212_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_rk212_pipeline` is a English model originally trained by rk212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_rk212_pipeline_en_5.5.0_3.0_1726833050390.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_rk212_pipeline_en_5.5.0_3.0_1726833050390.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_rk212_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_rk212_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
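
A single string can also be classified directly without building a DataFrame; a small sketch, assuming an active Spark NLP session:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("burmese_awesome_model_rk212_pipeline", lang = "en")
print(pipeline.annotate("I love spark-nlp"))  # returns a dict keyed by output column
```
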
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_rk212_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rk212/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_souh333_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_souh333_pipeline_en.md new file mode 100644 index 00000000000000..bf5e01285ffbc9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_souh333_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_souh333_pipeline pipeline DistilBertForSequenceClassification from Souh333 +author: John Snow Labs +name: burmese_awesome_model_souh333_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_souh333_pipeline` is a English model originally trained by Souh333. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_souh333_pipeline_en_5.5.0_3.0_1726832840123.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_souh333_pipeline_en_5.5.0_3.0_1726832840123.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_souh333_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_souh333_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_souh333_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Souh333/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_thepixel42_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_thepixel42_en.md new file mode 100644 index 00000000000000..5407dd9f6712ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_thepixel42_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_thepixel42 DistilBertForSequenceClassification from thePixel42 +author: John Snow Labs +name: burmese_awesome_model_thepixel42 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_thepixel42` is a English model originally trained by thePixel42. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_thepixel42_en_5.5.0_3.0_1726809082914.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_thepixel42_en_5.5.0_3.0_1726809082914.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

# Turn raw text into Spark NLP document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The pretrained classifier reads the document and token columns defined above
sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_thepixel42","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._  // assumes an active SparkSession named `spark`

// Turn raw text into Spark NLP document annotations
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

// The pretrained classifier reads the document and token columns defined above
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_thepixel42", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_thepixel42| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thePixel42/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_qa_model_anamgarcia_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_qa_model_anamgarcia_pipeline_en.md new file mode 100644 index 00000000000000..e45a98563f89cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_qa_model_anamgarcia_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_anamgarcia_pipeline pipeline DistilBertForQuestionAnswering from anamgarcia +author: John Snow Labs +name: burmese_awesome_qa_model_anamgarcia_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_anamgarcia_pipeline` is a English model originally trained by anamgarcia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_anamgarcia_pipeline_en_5.5.0_3.0_1726851177809.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_anamgarcia_pipeline_en_5.5.0_3.0_1726851177809.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_anamgarcia_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_anamgarcia_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_anamgarcia_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/anamgarcia/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_wnut_model_kanansharmaa_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_wnut_model_kanansharmaa_en.md new file mode 100644 index 00000000000000..57bfe1e1c3b0b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_wnut_model_kanansharmaa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_kanansharmaa RoBertaForTokenClassification from kanansharmaa +author: John Snow Labs +name: burmese_awesome_wnut_model_kanansharmaa +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_kanansharmaa` is a English model originally trained by kanansharmaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_kanansharmaa_en_5.5.0_3.0_1726847270855.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_kanansharmaa_en_5.5.0_3.0_1726847270855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, RoBertaForTokenClassification
from pyspark.ml import Pipeline

# Turn raw text into Spark NLP document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The pretrained token classifier tags each token with an entity label
tokenClassifier = RoBertaForTokenClassification.pretrained("burmese_awesome_wnut_model_kanansharmaa","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._  // assumes an active SparkSession named `spark`

// Turn raw text into Spark NLP document annotations
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// The pretrained token classifier tags each token with an entity label
val tokenClassifier = RoBertaForTokenClassification.pretrained("burmese_awesome_wnut_model_kanansharmaa", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
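
The `ner` column produced above contains one predicted tag per token. A minimal way to view tokens and tags side by side, assuming the pipeline above ran in an active Spark session:

```python
from pyspark.sql.functions import col

# Tokens and their predicted tags, as parallel arrays
pipelineDF.select(col("token.result").alias("tokens"), col("ner.result").alias("ner_tags")) \
    .show(truncate=False)
```
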
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_kanansharmaa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|434.2 MB| + +## References + +https://huggingface.co/kanansharmaa/my_awesome_wnut_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_wnut_model_kanansharmaa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_wnut_model_kanansharmaa_pipeline_en.md new file mode 100644 index 00000000000000..d3bd09afe357cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_wnut_model_kanansharmaa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_kanansharmaa_pipeline pipeline RoBertaForTokenClassification from kanansharmaa +author: John Snow Labs +name: burmese_awesome_wnut_model_kanansharmaa_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_kanansharmaa_pipeline` is a English model originally trained by kanansharmaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_kanansharmaa_pipeline_en_5.5.0_3.0_1726847303753.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_kanansharmaa_pipeline_en_5.5.0_3.0_1726847303753.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_wnut_model_kanansharmaa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_wnut_model_kanansharmaa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
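
The pipeline expects an input DataFrame `df` with a `text` column; a minimal sketch, with the example sentence purely illustrative:

```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("burmese_awesome_wnut_model_kanansharmaa_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.printSchema()  # shows the columns added by the included stages
```
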
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_kanansharmaa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|434.2 MB| + +## References + +https://huggingface.co/kanansharmaa/my_awesome_wnut_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_idea_classification_model_trial_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_idea_classification_model_trial_1_pipeline_en.md new file mode 100644 index 00000000000000..87d33813a5d4d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_idea_classification_model_trial_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_idea_classification_model_trial_1_pipeline pipeline DistilBertForSequenceClassification from manimaranpa07 +author: John Snow Labs +name: burmese_idea_classification_model_trial_1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_idea_classification_model_trial_1_pipeline` is a English model originally trained by manimaranpa07. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_idea_classification_model_trial_1_pipeline_en_5.5.0_3.0_1726809114485.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_idea_classification_model_trial_1_pipeline_en_5.5.0_3.0_1726809114485.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_idea_classification_model_trial_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_idea_classification_model_trial_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
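+
+For quick, ad-hoc checks on a single string, `PretrainedPipeline` also exposes an `annotate` method that runs the same stages without building a DataFrame first. A short sketch (the example sentence is illustrative):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("burmese_idea_classification_model_trial_1_pipeline", lang = "en")
+
+# annotate() returns a plain dict mapping each output column to its results
+result = pipeline.annotate("This proposal describes a new feature for the mobile app.")
+print(result)
+```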
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_idea_classification_model_trial_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/manimaranpa07/my_idea_classification_model_trial_1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_model1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_model1_pipeline_en.md new file mode 100644 index 00000000000000..a112a77a943a81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_model1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_model1_pipeline pipeline DistilBertForSequenceClassification from Asadbek1 +author: John Snow Labs +name: burmese_model1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_model1_pipeline` is a English model originally trained by Asadbek1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_model1_pipeline_en_5.5.0_3.0_1726842236201.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_model1_pipeline_en_5.5.0_3.0_1726842236201.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_model1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_model1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_model1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Asadbek1/my_model1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_model_jiangwf_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_model_jiangwf_pipeline_en.md new file mode 100644 index 00000000000000..516ddf8809d81d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_model_jiangwf_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_model_jiangwf_pipeline pipeline DistilBertForSequenceClassification from jiangwf +author: John Snow Labs +name: burmese_model_jiangwf_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_model_jiangwf_pipeline` is a English model originally trained by jiangwf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_model_jiangwf_pipeline_en_5.5.0_3.0_1726842093245.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_model_jiangwf_pipeline_en_5.5.0_3.0_1726842093245.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_model_jiangwf_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_model_jiangwf_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_model_jiangwf_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jiangwf/my_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_model_mlituma_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_model_mlituma_en.md new file mode 100644 index 00000000000000..4c99404c50fda9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_model_mlituma_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_model_mlituma DistilBertForSequenceClassification from mlituma +author: John Snow Labs +name: burmese_model_mlituma +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_model_mlituma` is a English model originally trained by mlituma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_model_mlituma_en_5.5.0_3.0_1726841322048.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_model_mlituma_en_5.5.0_3.0_1726841322048.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_model_mlituma","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_model_mlituma", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
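+
+After `transform`, the prediction is stored as an annotation in the `class` column; the label itself sits in the annotation's `result` field, with per-label scores in its metadata. A small sketch for inspecting it, continuing from the Python example above:
+
+```python
+# Predicted label per input row
+pipelineDF.select("text", "class.result").show(truncate = False)
+
+# Full annotation structs, including metadata with the class scores
+pipelineDF.selectExpr("explode(class) as prediction").show(truncate = False)
+```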
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_model_mlituma| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mlituma/my_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-cat_1_html_distilbert_base_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-20-cat_1_html_distilbert_base_uncased_en.md new file mode 100644 index 00000000000000..4ae3a8990af64d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-cat_1_html_distilbert_base_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cat_1_html_distilbert_base_uncased DistilBertForSequenceClassification from chuuhtetnaing +author: John Snow Labs +name: cat_1_html_distilbert_base_uncased +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_1_html_distilbert_base_uncased` is a English model originally trained by chuuhtetnaing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_1_html_distilbert_base_uncased_en_5.5.0_3.0_1726832500256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_1_html_distilbert_base_uncased_en_5.5.0_3.0_1726832500256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("cat_1_html_distilbert_base_uncased","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("cat_1_html_distilbert_base_uncased", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_1_html_distilbert_base_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chuuhtetnaing/cat-1-html-distilbert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-cyberta_en.md b/docs/_posts/ahmedlone127/2024-09-20-cyberta_en.md new file mode 100644 index 00000000000000..2a8bf6f83b41fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-cyberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cyberta RoBertaEmbeddings from mstaron +author: John Snow Labs +name: cyberta +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cyberta` is a English model originally trained by mstaron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cyberta_en_5.5.0_3.0_1726816179846.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cyberta_en_5.5.0_3.0_1726816179846.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("cyberta","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("cyberta","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
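+
+The `embeddings` column produced above holds one annotation per token, with the token text in `result` and the vector in the `embeddings` field. A sketch for pulling the raw vectors out, continuing from the Python example above:
+
+```python
+from pyspark.sql import functions as F
+
+# One row per token: the token and its embedding vector
+token_vectors = pipelineDF \
+    .select(F.explode("embeddings").alias("emb")) \
+    .select(F.col("emb.result").alias("token"), F.col("emb.embeddings").alias("vector"))
+
+token_vectors.show(truncate = 80)
+```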
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cyberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|310.9 MB| + +## References + +https://huggingface.co/mstaron/CyBERTa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-descr_class_two_cm_en.md b/docs/_posts/ahmedlone127/2024-09-20-descr_class_two_cm_en.md new file mode 100644 index 00000000000000..7031f5771d7cfc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-descr_class_two_cm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English descr_class_two_cm DistilBertForSequenceClassification from BanananaMax +author: John Snow Labs +name: descr_class_two_cm +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`descr_class_two_cm` is a English model originally trained by BanananaMax. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/descr_class_two_cm_en_5.5.0_3.0_1726849038165.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/descr_class_two_cm_en_5.5.0_3.0_1726849038165.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("descr_class_two_cm","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("descr_class_two_cm", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|descr_class_two_cm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/BanananaMax/descr_class_two_cm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_cased_airlines_news_multi_label_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_cased_airlines_news_multi_label_pipeline_en.md new file mode 100644 index 00000000000000..ab210bcf3d2b0e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_cased_airlines_news_multi_label_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_cased_airlines_news_multi_label_pipeline pipeline DistilBertForSequenceClassification from dahe827 +author: John Snow Labs +name: distilbert_base_cased_airlines_news_multi_label_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_airlines_news_multi_label_pipeline` is a English model originally trained by dahe827. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_airlines_news_multi_label_pipeline_en_5.5.0_3.0_1726792392806.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_airlines_news_multi_label_pipeline_en_5.5.0_3.0_1726792392806.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_cased_airlines_news_multi_label_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_cased_airlines_news_multi_label_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_airlines_news_multi_label_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/dahe827/distilbert-base-cased-airlines-news-multi-label + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_multilingual_cased_resumesclasssifierv1_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_multilingual_cased_resumesclasssifierv1_pipeline_xx.md new file mode 100644 index 00000000000000..a34ae05a4ba19e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_multilingual_cased_resumesclasssifierv1_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual distilbert_base_multilingual_cased_resumesclasssifierv1_pipeline pipeline DistilBertForSequenceClassification from youssefkhalil320 +author: John Snow Labs +name: distilbert_base_multilingual_cased_resumesclasssifierv1_pipeline +date: 2024-09-20 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_multilingual_cased_resumesclasssifierv1_pipeline` is a Multilingual model originally trained by youssefkhalil320. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_resumesclasssifierv1_pipeline_xx_5.5.0_3.0_1726809697652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_resumesclasssifierv1_pipeline_xx_5.5.0_3.0_1726809697652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_multilingual_cased_resumesclasssifierv1_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_multilingual_cased_resumesclasssifierv1_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
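+
+Because this is a multilingual checkpoint, it is loaded with Spark NLP's special `xx` language code rather than a specific language. A short sketch (the example sentence is illustrative):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# "xx" is the language code Spark NLP uses for multilingual models
+pipeline = PretrainedPipeline("distilbert_base_multilingual_cased_resumesclasssifierv1_pipeline", lang = "xx")
+print(pipeline.annotate("Ingeniero de software con cinco años de experiencia en desarrollo backend."))
+```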
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_multilingual_cased_resumesclasssifierv1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|507.8 MB| + +## References + +https://huggingface.co/youssefkhalil320/distilbert-base-multilingual-cased-resumesClasssifierV1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_akashjoy_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_akashjoy_en.md new file mode 100644 index 00000000000000..d1b2f4396bfd51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_akashjoy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_akashjoy DistilBertForSequenceClassification from akashjoy +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_akashjoy +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_akashjoy` is a English model originally trained by akashjoy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_akashjoy_en_5.5.0_3.0_1726842466550.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_akashjoy_en_5.5.0_3.0_1726842466550.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_akashjoy","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_akashjoy", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
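+
+For low-latency, single-document inference the fitted pipeline can be wrapped in a `LightPipeline`, which runs on plain Python strings instead of a DataFrame. A small sketch, continuing from the Python example above (the query is illustrative):
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+
+# Returns a dict of output columns for the single input string
+print(light.annotate("please transfer fifty dollars to my savings account"))
+```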
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_akashjoy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/akashjoy/distilbert-base-uncased-distilled-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_khalidr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_khalidr_pipeline_en.md new file mode 100644 index 00000000000000..a1c5cc264b3afd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_khalidr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_khalidr_pipeline pipeline DistilBertForSequenceClassification from khalidr +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_khalidr_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_khalidr_pipeline` is a English model originally trained by khalidr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_khalidr_pipeline_en_5.5.0_3.0_1726841568600.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_khalidr_pipeline_en_5.5.0_3.0_1726841568600.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_khalidr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_khalidr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_khalidr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/khalidr/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_k_kiron_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_k_kiron_pipeline_en.md new file mode 100644 index 00000000000000..ac4fed97273ad2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_k_kiron_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_k_kiron_pipeline pipeline DistilBertForSequenceClassification from K-kiron +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_k_kiron_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_k_kiron_pipeline` is a English model originally trained by K-kiron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_k_kiron_pipeline_en_5.5.0_3.0_1726823946955.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_k_kiron_pipeline_en_5.5.0_3.0_1726823946955.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_k_kiron_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_k_kiron_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_k_kiron_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/K-kiron/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_chhabi_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_chhabi_en.md new file mode 100644 index 00000000000000..e1ec9e5d0c51a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_chhabi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_chhabi DistilBertForSequenceClassification from Chhabi +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_chhabi +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_chhabi` is a English model originally trained by Chhabi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_chhabi_en_5.5.0_3.0_1726823530293.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_chhabi_en_5.5.0_3.0_1726823530293.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_chhabi","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_chhabi", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
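+
+Throughput can be tuned through the annotator's inference parameters; `setBatchSize` and `setMaxSentenceLength` are standard on Spark NLP transformer classifiers, and the values below are illustrative only:
+
+```python
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_chhabi","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class") \
+    .setBatchSize(16) \
+    .setMaxSentenceLength(128)
+```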
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_chhabi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Chhabi/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_edarmartinez_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_edarmartinez_en.md new file mode 100644 index 00000000000000..2f915292eee438 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_edarmartinez_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_edarmartinez DistilBertForSequenceClassification from edarmartinez +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_edarmartinez +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_edarmartinez` is a English model originally trained by edarmartinez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_edarmartinez_en_5.5.0_3.0_1726830224241.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_edarmartinez_en_5.5.0_3.0_1726830224241.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_edarmartinez","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_edarmartinez", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_edarmartinez| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/edarmartinez/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_jhagege_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_jhagege_en.md new file mode 100644 index 00000000000000..a8eb1592a63916 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_jhagege_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jhagege DistilBertForSequenceClassification from jhagege +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jhagege +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jhagege` is a English model originally trained by jhagege. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jhagege_en_5.5.0_3.0_1726842163474.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jhagege_en_5.5.0_3.0_1726842163474.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jhagege","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jhagege", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jhagege| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jhagege/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_lostsartre_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_lostsartre_en.md new file mode 100644 index 00000000000000..2a4457072f4fb2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_lostsartre_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_lostsartre DistilBertForSequenceClassification from lostsartre +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_lostsartre +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_lostsartre` is a English model originally trained by lostsartre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_lostsartre_en_5.5.0_3.0_1726841442221.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_lostsartre_en_5.5.0_3.0_1726841442221.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_lostsartre","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_lostsartre", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_lostsartre| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lostsartre/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_intent_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_intent_pipeline_en.md new file mode 100644 index 00000000000000..df083eb9ff8a95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_intent_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_intent_pipeline pipeline DistilBertForSequenceClassification from avivnat13 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_intent_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_intent_pipeline` is a English model originally trained by avivnat13. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_intent_pipeline_en_5.5.0_3.0_1726842387548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_intent_pipeline_en_5.5.0_3.0_1726842387548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_intent_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_intent_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_intent_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/avivnat13/distilbert-base-uncased-finetuned-intent + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_scam_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_scam_classification_pipeline_en.md new file mode 100644 index 00000000000000..a6a30702bbbcab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_scam_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_scam_classification_pipeline pipeline DistilBertForSequenceClassification from jaranohaal +author: John Snow Labs +name: distilbert_base_uncased_finetuned_scam_classification_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_scam_classification_pipeline` is a English model originally trained by jaranohaal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_scam_classification_pipeline_en_5.5.0_3.0_1726809539259.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_scam_classification_pipeline_en_5.5.0_3.0_1726809539259.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_scam_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_scam_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_scam_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jaranohaal/distilbert-base-uncased-finetuned-scam-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_pipeline_en.md new file mode 100644 index 00000000000000..97f4d20e1b30cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_pipeline_en_5.5.0_3.0_1726841359815.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_pipeline_en_5.5.0_3.0_1726841359815.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1large11pfxnf_simsp400_clean300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st11sd_ut72ut1large11PfxNf_simsp400_clean300 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200_en.md new file mode 100644 index 00000000000000..1a8b4dc8e37871 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200_en_5.5.0_3.0_1726841211476.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200_en_5.5.0_3.0_1726841211476.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
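+
+The label set bundled with the checkpoint can be listed straight from the loaded annotator; a one-line sketch, continuing from the Python example above:
+
+```python
+# Labels this checkpoint was fine-tuned with
+print(sequenceClassifier.getClasses())
+```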
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st14sd_ut72ut5_plprefix0stlarge14_simsp100_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st14sd_ut72ut5_PLPrefix0stlarge14_simsp100_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100_en.md new file mode 100644 index 00000000000000..1b7e621235df04 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100_en_5.5.0_3.0_1726848629986.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100_en_5.5.0_3.0_1726848629986.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
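+
+To read the predicted labels off the transformed DataFrame, the `result` field of the `class` annotation column can be selected. A short sketch, assuming the `pipelineDF` produced above:
+
+```python
+# "class" holds an array of annotations per row; "class.result" extracts just the label strings.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```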
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1large15pfxnf_simsp400_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st15sd_ut72ut1large15PfxNf_simsp400_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_pipeline_en.md new file mode 100644 index 00000000000000..2e1c4eb634e204 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_pipeline_en_5.5.0_3.0_1726823767070.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_pipeline_en_5.5.0_3.0_1726823767070.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_pipeline", lang = "en")
+
+# df can be any Spark DataFrame with a "text" column holding the documents to classify
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_pipeline", lang = "en")
+
+// df can be any DataFrame with a "text" column holding the documents to classify
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
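+
+Besides `transform`, a downloaded `PretrainedPipeline` can annotate plain strings directly. A minimal sketch, assuming the `pipeline` object above; the keys of the returned dict depend on the column names used inside the pipeline:
+
+```python
+# annotate() runs the full pipeline on one string and returns a dict of stage outputs.
+result = pipeline.annotate("I love spark-nlp")
+print(result)
+```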
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1_PLPrefix0stlarge80_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_en.md new file mode 100644 index 00000000000000..111bd1992a91e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_en_5.5.0_3.0_1726848930136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_en_5.5.0_3.0_1726848930136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
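+
+Fitting the pipeline downloads the model once; persisting the fitted `pipelineModel` avoids repeating that download on the next run. A sketch with a hypothetical local path:
+
+```python
+from pyspark.ml import PipelineModel
+
+# Save the fitted pipeline to disk (path is illustrative) and reload it later.
+pipelineModel.write().overwrite().save("./distilbert_seqcls_pipeline_model")
+restored = PipelineModel.load("./distilbert_seqcls_pipeline_model")
+```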
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1large90PfxNf_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_pipeline_en.md new file mode 100644 index 00000000000000..79def0c8663602 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_pipeline_en_5.5.0_3.0_1726849043401.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_pipeline_en_5.5.0_3.0_1726849043401.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_pipeline", lang = "en")
+
+# df can be any Spark DataFrame with a "text" column holding the documents to classify
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_pipeline", lang = "en")
+
+// df can be any DataFrame with a "text" column holding the documents to classify
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st5sd_ut72ut1_PLPrefix0stlarge5_simsp100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200_en.md new file mode 100644 index 00000000000000..ad23a6b39b51be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200_en_5.5.0_3.0_1726848524010.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200_en_5.5.0_3.0_1726848524010.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
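+
+Classifier annotations also carry per-class scores in their metadata, which can be useful for thresholding predictions. A sketch over the `pipelineDF` from the example above:
+
+```python
+# "class.metadata" exposes the score map the classifier attaches to each predicted label.
+pipelineDF.select("class.result", "class.metadata").show(truncate=False)
+```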
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp400_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st5sd_ut72ut1_PLPrefix0stlarge5_simsp400_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_pipeline_en.md new file mode 100644 index 00000000000000..4665dc5d10a931 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_pipeline pipeline DistilBertForSequenceClassification from Mou11209203 +author: John Snow Labs +name: distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_pipeline` is a English model originally trained by Mou11209203. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_pipeline_en_5.5.0_3.0_1726848947954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_pipeline_en_5.5.0_3.0_1726848947954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_pipeline", lang = "en")
+
+# df can be any Spark DataFrame with a "text" column holding the documents to classify
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_pipeline", lang = "en")
+
+// df can be any DataFrame with a "text" column holding the documents to classify
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_stock_classification_finetuned_dcard_epoch2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Mou11209203/distilbert-base-uncased_stock_classification_finetuned_dcard_epoch2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_en.md new file mode 100644 index 00000000000000..3088de542d7584 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_en_5.5.0_3.0_1726823471280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_en_5.5.0_3.0_1726823471280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
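+
+The same fitted pipeline can score any DataFrame that exposes a `text` column. A small sketch with made-up example sentences:
+
+```python
+# Build a tiny batch and reuse the fitted pipeline on it.
+batch = spark.createDataFrame([["great quality"], ["would not recommend"]]).toDF("text")
+pipelineModel.transform(batch).select("text", "class.result").show(truncate=False)
+```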
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_product_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_product_classifier_pipeline_en.md new file mode 100644 index 00000000000000..233f038cf1bf93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_product_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_product_classifier_pipeline pipeline BertForSequenceClassification from SavvySpender +author: John Snow Labs +name: distilbert_product_classifier_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_product_classifier_pipeline` is a English model originally trained by SavvySpender. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_product_classifier_pipeline_en_5.5.0_3.0_1726803653229.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_product_classifier_pipeline_en_5.5.0_3.0_1726803653229.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_product_classifier_pipeline", lang = "en")
+
+# df can be any Spark DataFrame with a "text" column holding the documents to classify
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_product_classifier_pipeline", lang = "en")
+
+// df can be any DataFrame with a "text" column holding the documents to classify
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_product_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/SavvySpender/distilbert-product-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_pipeline_en.md new file mode 100644 index 00000000000000..ad666099497c17 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_pipeline_en_5.5.0_3.0_1726840909601.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_pipeline_en_5.5.0_3.0_1726840909601.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_pipeline", lang = "en")
+
+# df can be any Spark DataFrame with a "text" column holding the documents to classify
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_pipeline", lang = "en")
+
+// df can be any DataFrame with a "text" column holding the documents to classify
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_qqp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|251.2 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_data_aug_qqp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sql_timeout_classifier_with_features_4096_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sql_timeout_classifier_with_features_4096_en.md new file mode 100644 index 00000000000000..dda2c342f11484 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sql_timeout_classifier_with_features_4096_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sql_timeout_classifier_with_features_4096 DistilBertForSequenceClassification from Lifehouse +author: John Snow Labs +name: distilbert_sql_timeout_classifier_with_features_4096 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sql_timeout_classifier_with_features_4096` is a English model originally trained by Lifehouse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sql_timeout_classifier_with_features_4096_en_5.5.0_3.0_1726823659753.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sql_timeout_classifier_with_features_4096_en_5.5.0_3.0_1726823659753.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sql_timeout_classifier_with_features_4096","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sql_timeout_classifier_with_features_4096", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
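+
+Once a larger DataFrame has been scored, the label distribution can be summarised with standard Spark SQL functions. A sketch assuming the `pipelineDF` from above:
+
+```python
+from pyspark.sql import functions as F
+
+# Explode the predicted labels and count how often each class occurs.
+pipelineDF.select(F.explode("class.result").alias("label")).groupBy("label").count().show()
+```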
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sql_timeout_classifier_with_features_4096| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|259.8 MB| + +## References + +https://huggingface.co/Lifehouse/distilbert-sql-timeout-classifier-with-features-4096 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distillbert_qsc_en.md b/docs/_posts/ahmedlone127/2024-09-20-distillbert_qsc_en.md new file mode 100644 index 00000000000000..7c875b265b08e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distillbert_qsc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distillbert_qsc DistilBertForSequenceClassification from thehyperpineapple +author: John Snow Labs +name: distillbert_qsc +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distillbert_qsc` is a English model originally trained by thehyperpineapple. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distillbert_qsc_en_5.5.0_3.0_1726824079221.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distillbert_qsc_en_5.5.0_3.0_1726824079221.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distillbert_qsc","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distillbert_qsc", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
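+
+For a single query string, `LightPipeline.fullAnnotate` returns the complete annotation objects, including their metadata, rather than just the label strings. A minimal sketch over the fitted `pipelineModel`:
+
+```python
+from sparknlp.base import LightPipeline
+
+# fullAnnotate keeps the Annotation objects; one result dict is returned per input string.
+annotations = LightPipeline(pipelineModel).fullAnnotate("I love spark-nlp")
+print(annotations[0]["class"])
+```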
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distillbert_qsc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thehyperpineapple/DistillBERT-QSC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilrubert_tiny_2nd_finetune_epru_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilrubert_tiny_2nd_finetune_epru_pipeline_en.md new file mode 100644 index 00000000000000..b745136675ee08 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilrubert_tiny_2nd_finetune_epru_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilrubert_tiny_2nd_finetune_epru_pipeline pipeline DistilBertForSequenceClassification from mmillet +author: John Snow Labs +name: distilrubert_tiny_2nd_finetune_epru_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilrubert_tiny_2nd_finetune_epru_pipeline` is a English model originally trained by mmillet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilrubert_tiny_2nd_finetune_epru_pipeline_en_5.5.0_3.0_1726848741278.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilrubert_tiny_2nd_finetune_epru_pipeline_en_5.5.0_3.0_1726848741278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilrubert_tiny_2nd_finetune_epru_pipeline", lang = "en")
+
+# df can be any Spark DataFrame with a "text" column holding the documents to classify
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("distilrubert_tiny_2nd_finetune_epru_pipeline", lang = "en")
+
+// df can be any DataFrame with a "text" column holding the documents to classify
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilrubert_tiny_2nd_finetune_epru_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|39.3 MB| + +## References + +https://huggingface.co/mmillet/distilrubert-tiny-2nd-finetune-epru + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_pipeline_en.md new file mode 100644 index 00000000000000..ba5fa169e058ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1726852379610.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1726852379610.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_pipeline", lang = "en")
+
+# df can be any Spark DataFrame with a "text" column holding the documents to classify
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_pipeline", lang = "en")
+
+// df can be any DataFrame with a "text" column holding the documents to classify
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
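+
+The `annotations` DataFrame returned by `transform` keeps the original columns and appends one column per pipeline stage. A quick way to see what the pretrained pipeline added is to print its schema:
+
+```python
+# Inspect which output columns the pretrained pipeline appended.
+annotations.printSchema()
+```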
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random0_seed1_twitter_roberta_base_2022_154m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.4 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random0_seed1-twitter-roberta-base-2022-154m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-emotion_detector_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-emotion_detector_pipeline_en.md new file mode 100644 index 00000000000000..78792d62d3ebc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-emotion_detector_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emotion_detector_pipeline pipeline DistilBertForSequenceClassification from Foulbubble +author: John Snow Labs +name: emotion_detector_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_detector_pipeline` is a English model originally trained by Foulbubble. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_detector_pipeline_en_5.5.0_3.0_1726809323317.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_detector_pipeline_en_5.5.0_3.0_1726809323317.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("emotion_detector_pipeline", lang = "en")
+
+# df can be any Spark DataFrame with a "text" column holding the documents to classify
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+val pipeline = new PretrainedPipeline("emotion_detector_pipeline", lang = "en")
+
+// df can be any DataFrame with a "text" column holding the documents to classify
+val df = Seq("I love spark-nlp").toDS.toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_detector_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Foulbubble/Emotion-Detector + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline_en.md new file mode 100644 index 00000000000000..13ba902000ce30 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline pipeline BertForQuestionAnswering from muhammadravi251001 +author: John Snow Labs +name: fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline` is a English model originally trained by muhammadravi251001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline_en_5.5.0_3.0_1726820475729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline_en_5.5.0_3.0_1726820475729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline", lang = "en")
+
+# df must expose the question and context text columns expected by the pipeline's MultiDocumentAssembler
+annotations = pipeline.transform(df)
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline", lang = "en")
+
+// df must expose the question and context text columns expected by the pipeline's MultiDocumentAssembler
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.7 MB| + +## References + +https://huggingface.co/muhammadravi251001/fine-tuned-DatasetQAS-IDK-MRC-with-indobert-base-uncased-with-ITTL-without-freeze-LR-1e-05 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_aaaaaiden_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_aaaaaiden_en.md new file mode 100644 index 00000000000000..45061513efbe26 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_aaaaaiden_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_aaaaaiden DistilBertForSequenceClassification from AAAAAiden +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_aaaaaiden +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_aaaaaiden` is a English model originally trained by AAAAAiden. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_aaaaaiden_en_5.5.0_3.0_1726832945070.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_aaaaaiden_en_5.5.0_3.0_1726832945070.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_aaaaaiden","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_aaaaaiden", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
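+
+As in the other classification examples, the sentiment prediction ends up in the `class` annotation column of `pipelineDF`. A short sketch that flattens it into a plain string column:
+
+```python
+from pyspark.sql import functions as F
+
+# Take the first (and only) predicted label per row as a regular string column.
+pipelineDF.withColumn("prediction", F.col("class.result")[0]).select("text", "prediction").show(truncate=False)
+```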
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_aaaaaiden| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AAAAAiden/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_bijupv_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_bijupv_en.md new file mode 100644 index 00000000000000..0e100e999f868e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_bijupv_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_bijupv DistilBertForSequenceClassification from BijuPV +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_bijupv +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_bijupv` is a English model originally trained by BijuPV. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_bijupv_en_5.5.0_3.0_1726792302809.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_bijupv_en_5.5.0_3.0_1726792302809.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_bijupv","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_bijupv", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_bijupv| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BijuPV/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_mk_20_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_mk_20_en.md new file mode 100644 index 00000000000000..c3782613cf302b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_mk_20_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_mk_20 DistilBertForSequenceClassification from mk-20 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_mk_20 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_mk_20` is a English model originally trained by mk-20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_mk_20_en_5.5.0_3.0_1726809105407.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_mk_20_en_5.5.0_3.0_1726809105407.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_mk_20","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_mk_20", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
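+
+`LightPipeline` also accepts a list of strings, which is convenient for spot-checking a handful of reviews at once. A sketch assuming the fitted `pipelineModel` above:
+
+```python
+from sparknlp.base import LightPipeline
+
+# Annotate several strings in one call; one result dict is returned per input.
+results = LightPipeline(pipelineModel).annotate(["loved it", "hated it"])
+for r in results:
+    print(r["class"])
+```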
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_mk_20| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mk-20/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_tanushgolwala_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_tanushgolwala_en.md new file mode 100644 index 00000000000000..2f3d829f5df5e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_tanushgolwala_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_tanushgolwala DistilBertForSequenceClassification from tanushgolwala +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_tanushgolwala +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_tanushgolwala` is a English model originally trained by tanushgolwala. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_tanushgolwala_en_5.5.0_3.0_1726824072566.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_tanushgolwala_en_5.5.0_3.0_1726824072566.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# Requires the Spark NLP Python package and an active session,
# e.g. spark = sparknlp.start()
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_tanushgolwala","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
// Requires an active SparkSession named `spark` with Spark NLP on the classpath.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_tanushgolwala", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_tanushgolwala| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tanushgolwala/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_tanushgolwala_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_tanushgolwala_pipeline_en.md new file mode 100644 index 00000000000000..8880403b339f59 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_tanushgolwala_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_tanushgolwala_pipeline pipeline DistilBertForSequenceClassification from tanushgolwala +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_tanushgolwala_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_tanushgolwala_pipeline` is a English model originally trained by tanushgolwala. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_tanushgolwala_pipeline_en_5.5.0_3.0_1726824084091.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_tanushgolwala_pipeline_en_5.5.0_3.0_1726824084091.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_tanushgolwala_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_tanushgolwala_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
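
The snippet above assumes a DataFrame `df` already exists. A slightly fuller sketch follows; it assumes the pipeline reads a `text` column and exposes its predictions in a `class` column, in line with the DocumentAssembler and DistilBertForSequenceClassification stages listed under Included Models:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_tanushgolwala_pipeline", lang="en")

# DataFrame scoring: the first stage expects a column named `text`.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline.transform(df).select("class.result").show(truncate=False)

# Single-string scoring without building a DataFrame.
print(pipeline.annotate("I love spark-nlp"))
```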
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_tanushgolwala_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tanushgolwala/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_enriquer_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_enriquer_en.md new file mode 100644 index 00000000000000..8a3ec2b1288763 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_enriquer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_enriquer DistilBertForSequenceClassification from EnriqueR +author: John Snow Labs +name: finetuning_sentiment_model_enriquer +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_enriquer` is a English model originally trained by EnriqueR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_enriquer_en_5.5.0_3.0_1726792530274.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_enriquer_en_5.5.0_3.0_1726792530274.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# Requires the Spark NLP Python package and an active session,
# e.g. spark = sparknlp.start()
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_enriquer","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
// Requires an active SparkSession named `spark` with Spark NLP on the classpath.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_enriquer", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_enriquer| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/EnriqueR/finetuning-sentiment-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ft_distilroberta_base_with_askscience_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-ft_distilroberta_base_with_askscience_pipeline_en.md new file mode 100644 index 00000000000000..d2d66f124bb82c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ft_distilroberta_base_with_askscience_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ft_distilroberta_base_with_askscience_pipeline pipeline RoBertaEmbeddings from aisuko +author: John Snow Labs +name: ft_distilroberta_base_with_askscience_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ft_distilroberta_base_with_askscience_pipeline` is a English model originally trained by aisuko. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ft_distilroberta_base_with_askscience_pipeline_en_5.5.0_3.0_1726796320099.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ft_distilroberta_base_with_askscience_pipeline_en_5.5.0_3.0_1726796320099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ft_distilroberta_base_with_askscience_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ft_distilroberta_base_with_askscience_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
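
For an embeddings pipeline, the useful output is the token-level vectors rather than a label. The sketch below is an assumption-heavy illustration: it presumes the pipeline reads a `text` column and writes its vectors to a column named `embeddings`, which is the usual convention but is not stated in this card:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("ft_distilroberta_base_with_askscience_pipeline", lang="en")

df = spark.createDataFrame([["Why is the sky blue?"]]).toDF("text")
result = pipeline.transform(df)

# One vector per token; `embeddings` is the assumed output column name.
result.selectExpr("explode(embeddings) as emb") \
    .select("emb.result", "emb.embeddings") \
    .show(5, truncate=80)
```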
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ft_distilroberta_base_with_askscience_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/aisuko/ft-distilroberta-base-with-askscience + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-hate_hate_random3_seed1_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-hate_hate_random3_seed1_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..da7ef4e8fac96c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-hate_hate_random3_seed1_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_random3_seed1_roberta_base_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_random3_seed1_roberta_base_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_random3_seed1_roberta_base_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_random3_seed1_roberta_base_pipeline_en_5.5.0_3.0_1726804464538.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_random3_seed1_roberta_base_pipeline_en_5.5.0_3.0_1726804464538.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hate_hate_random3_seed1_roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hate_hate_random3_seed1_roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_random3_seed1_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|427.6 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_random3_seed1-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-hf_qa_bert_base_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-hf_qa_bert_base_uncased_pipeline_en.md new file mode 100644 index 00000000000000..eb8f480158fa96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-hf_qa_bert_base_uncased_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English hf_qa_bert_base_uncased_pipeline pipeline BertForQuestionAnswering from rinogrego +author: John Snow Labs +name: hf_qa_bert_base_uncased_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hf_qa_bert_base_uncased_pipeline` is a English model originally trained by rinogrego. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hf_qa_bert_base_uncased_pipeline_en_5.5.0_3.0_1726808128012.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hf_qa_bert_base_uncased_pipeline_en_5.5.0_3.0_1726808128012.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hf_qa_bert_base_uncased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hf_qa_bert_base_uncased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
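
Because this pipeline starts with a MultiDocumentAssembler, it needs two input columns rather than a single `text` column. The sketch below assumes those columns are named `question` and `context` and that the prediction is written to an `answer` column; treat the names as assumptions, since the card does not list them:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("hf_qa_bert_base_uncased_pipeline", lang="en")

df = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]]
).toDF("question", "context")

# Assumed column names; adjust if the pipeline stages use different ones.
pipeline.transform(df).select("answer.result").show(truncate=False)
```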
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hf_qa_bert_base_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/rinogrego/HF-QA-bert-base-uncased + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ieq_bert_en.md b/docs/_posts/ahmedlone127/2024-09-20-ieq_bert_en.md new file mode 100644 index 00000000000000..a8e641bac0555f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ieq_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ieq_bert BertForSequenceClassification from ieq +author: John Snow Labs +name: ieq_bert +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ieq_bert` is a English model originally trained by ieq. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ieq_bert_en_5.5.0_3.0_1726828900783.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ieq_bert_en_5.5.0_3.0_1726828900783.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# Requires the Spark NLP Python package and an active session,
# e.g. spark = sparknlp.start()
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("ieq_bert","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
// Requires an active SparkSession named `spark` with Spark NLP on the classpath.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("ieq_bert", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ieq_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ieq/IEQ-BERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-infoxlm_base_on_custom_kural_500_en.md b/docs/_posts/ahmedlone127/2024-09-20-infoxlm_base_on_custom_kural_500_en.md new file mode 100644 index 00000000000000..8cfcf5d48fc0d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-infoxlm_base_on_custom_kural_500_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English infoxlm_base_on_custom_kural_500 XlmRoBertaForSequenceClassification from bikram22pi7 +author: John Snow Labs +name: infoxlm_base_on_custom_kural_500 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`infoxlm_base_on_custom_kural_500` is a English model originally trained by bikram22pi7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/infoxlm_base_on_custom_kural_500_en_5.5.0_3.0_1726846313309.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/infoxlm_base_on_custom_kural_500_en_5.5.0_3.0_1726846313309.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# Requires the Spark NLP Python package and an active session,
# e.g. spark = sparknlp.start()
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("infoxlm_base_on_custom_kural_500","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
// Requires an active SparkSession named `spark` with Spark NLP on the classpath.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("infoxlm_base_on_custom_kural_500", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
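
If the upstream Hugging Face card does not document the label set, the loaded annotator can report it. This one-line sketch reuses the `sequenceClassifier` defined in the Python example above, assuming Spark NLP's standard `getClasses()` accessor:

```python
# Labels this XLM-RoBERTa classifier was fine-tuned to predict.
print(sequenceClassifier.getClasses())
```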
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|infoxlm_base_on_custom_kural_500| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|777.7 MB| + +## References + +https://huggingface.co/bikram22pi7/infoxlm-base-on-custom-kural-500 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-job_listing_filtering_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-job_listing_filtering_model_pipeline_en.md new file mode 100644 index 00000000000000..d7a7b64712f333 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-job_listing_filtering_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English job_listing_filtering_model_pipeline pipeline XlmRoBertaForSequenceClassification from saattrupdan +author: John Snow Labs +name: job_listing_filtering_model_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`job_listing_filtering_model_pipeline` is a English model originally trained by saattrupdan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/job_listing_filtering_model_pipeline_en_5.5.0_3.0_1726846136591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/job_listing_filtering_model_pipeline_en_5.5.0_3.0_1726846136591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("job_listing_filtering_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("job_listing_filtering_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|job_listing_filtering_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|776.3 MB| + +## References + +https://huggingface.co/saattrupdan/job-listing-filtering-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-lenate_model_8_en.md b/docs/_posts/ahmedlone127/2024-09-20-lenate_model_8_en.md new file mode 100644 index 00000000000000..acfe6138367667 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-lenate_model_8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English lenate_model_8 DistilBertForSequenceClassification from lenate +author: John Snow Labs +name: lenate_model_8 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lenate_model_8` is a English model originally trained by lenate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lenate_model_8_en_5.5.0_3.0_1726832516504.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lenate_model_8_en_5.5.0_3.0_1726832516504.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# Requires the Spark NLP Python package and an active session,
# e.g. spark = sparknlp.start()
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("lenate_model_8","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
// Requires an active SparkSession named `spark` with Spark NLP on the classpath.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("lenate_model_8", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lenate_model_8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lenate/lenate_model_8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-lenate_model_8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-lenate_model_8_pipeline_en.md new file mode 100644 index 00000000000000..cbec513de707b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-lenate_model_8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lenate_model_8_pipeline pipeline DistilBertForSequenceClassification from lenate +author: John Snow Labs +name: lenate_model_8_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lenate_model_8_pipeline` is a English model originally trained by lenate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lenate_model_8_pipeline_en_5.5.0_3.0_1726832529038.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lenate_model_8_pipeline_en_5.5.0_3.0_1726832529038.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lenate_model_8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lenate_model_8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lenate_model_8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lenate/lenate_model_8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-llm_hw1_en.md b/docs/_posts/ahmedlone127/2024-09-20-llm_hw1_en.md new file mode 100644 index 00000000000000..800609e6616855 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-llm_hw1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English llm_hw1 DistilBertForSequenceClassification from Chenbirdy +author: John Snow Labs +name: llm_hw1 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llm_hw1` is a English model originally trained by Chenbirdy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llm_hw1_en_5.5.0_3.0_1726809438132.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llm_hw1_en_5.5.0_3.0_1726809438132.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# Requires the Spark NLP Python package and an active session,
# e.g. spark = sparknlp.start()
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("llm_hw1","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
// Requires an active SparkSession named `spark` with Spark NLP on the classpath.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("llm_hw1", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llm_hw1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Chenbirdy/LLM-HW1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-marbertv2_finetuned_egyptian_hate_speech_detection_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-20-marbertv2_finetuned_egyptian_hate_speech_detection_pipeline_ar.md new file mode 100644 index 00000000000000..2ee16ac4621941 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-marbertv2_finetuned_egyptian_hate_speech_detection_pipeline_ar.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Arabic marbertv2_finetuned_egyptian_hate_speech_detection_pipeline pipeline BertForSequenceClassification from IbrahimAmin +author: John Snow Labs +name: marbertv2_finetuned_egyptian_hate_speech_detection_pipeline +date: 2024-09-20 +tags: [ar, open_source, pipeline, onnx] +task: Text Classification +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marbertv2_finetuned_egyptian_hate_speech_detection_pipeline` is a Arabic model originally trained by IbrahimAmin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marbertv2_finetuned_egyptian_hate_speech_detection_pipeline_ar_5.5.0_3.0_1726860461568.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marbertv2_finetuned_egyptian_hate_speech_detection_pipeline_ar_5.5.0_3.0_1726860461568.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marbertv2_finetuned_egyptian_hate_speech_detection_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marbertv2_finetuned_egyptian_hate_speech_detection_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marbertv2_finetuned_egyptian_hate_speech_detection_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|608.8 MB| + +## References + +https://huggingface.co/IbrahimAmin/marbertv2-finetuned-egyptian-hate-speech-detection + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-model_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-model_3_pipeline_en.md new file mode 100644 index 00000000000000..019c3fd69462dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-model_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_3_pipeline pipeline BertForSequenceClassification from cannotbolt +author: John Snow Labs +name: model_3_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_3_pipeline` is a English model originally trained by cannotbolt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_3_pipeline_en_5.5.0_3.0_1726870106996.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_3_pipeline_en_5.5.0_3.0_1726870106996.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("model_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("model_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/cannotbolt/model_3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-nlp2_base_3e_4_nathanjlee_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-nlp2_base_3e_4_nathanjlee_pipeline_en.md new file mode 100644 index 00000000000000..9a2ecfd378ac1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-nlp2_base_3e_4_nathanjlee_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nlp2_base_3e_4_nathanjlee_pipeline pipeline DistilBertForSequenceClassification from NathanJLee +author: John Snow Labs +name: nlp2_base_3e_4_nathanjlee_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp2_base_3e_4_nathanjlee_pipeline` is a English model originally trained by NathanJLee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp2_base_3e_4_nathanjlee_pipeline_en_5.5.0_3.0_1726849212811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp2_base_3e_4_nathanjlee_pipeline_en_5.5.0_3.0_1726849212811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nlp2_base_3e_4_nathanjlee_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nlp2_base_3e_4_nathanjlee_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp2_base_3e_4_nathanjlee_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/NathanJLee/NLP2_Base_3e-4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ntuadlhw1_question_answering_en.md b/docs/_posts/ahmedlone127/2024-09-20-ntuadlhw1_question_answering_en.md new file mode 100644 index 00000000000000..a4e28ec5b42787 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ntuadlhw1_question_answering_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English ntuadlhw1_question_answering BertForQuestionAnswering from weitung8 +author: John Snow Labs +name: ntuadlhw1_question_answering +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ntuadlhw1_question_answering` is a English model originally trained by weitung8. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ntuadlhw1_question_answering_en_5.5.0_3.0_1726834371107.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ntuadlhw1_question_answering_en_5.5.0_3.0_1726834371107.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# Requires the Spark NLP Python package and an active session,
# e.g. spark = sparknlp.start()
from sparknlp.base import MultiDocumentAssembler
from sparknlp.annotator import BertForQuestionAnswering
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("ntuadlhw1_question_answering","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
// Requires an active SparkSession named `spark` with Spark NLP on the classpath.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("ntuadlhw1_question_answering", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
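
To read the extracted answers back out of `pipelineDF` from the example above, a simple projection is enough. This sketch assumes the `question` input column and the `answer` output column configured there:

```python
# Extracted answer span for each (question, context) pair.
pipelineDF.select("question", "answer.result").show(truncate=False)
```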
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ntuadlhw1_question_answering| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/weitung8/ntuadlhw1-question-answering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-pretrained_mario_bert_448_paths_ctx_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-pretrained_mario_bert_448_paths_ctx_pipeline_en.md new file mode 100644 index 00000000000000..f165e7682a536f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-pretrained_mario_bert_448_paths_ctx_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English pretrained_mario_bert_448_paths_ctx_pipeline pipeline RoBertaEmbeddings from shyamsn97 +author: John Snow Labs +name: pretrained_mario_bert_448_paths_ctx_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pretrained_mario_bert_448_paths_ctx_pipeline` is a English model originally trained by shyamsn97. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pretrained_mario_bert_448_paths_ctx_pipeline_en_5.5.0_3.0_1726796424906.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pretrained_mario_bert_448_paths_ctx_pipeline_en_5.5.0_3.0_1726796424906.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("pretrained_mario_bert_448_paths_ctx_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("pretrained_mario_bert_448_paths_ctx_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pretrained_mario_bert_448_paths_ctx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/shyamsn97/pretrained-mario-bert-448-paths-ctx + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-qa_persian_bert_persian_farsi_zwnj_base_en.md b/docs/_posts/ahmedlone127/2024-09-20-qa_persian_bert_persian_farsi_zwnj_base_en.md new file mode 100644 index 00000000000000..948a8e31fffabc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-qa_persian_bert_persian_farsi_zwnj_base_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English qa_persian_bert_persian_farsi_zwnj_base BertForQuestionAnswering from makhataei +author: John Snow Labs +name: qa_persian_bert_persian_farsi_zwnj_base +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_persian_bert_persian_farsi_zwnj_base` is a English model originally trained by makhataei. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_persian_bert_persian_farsi_zwnj_base_en_5.5.0_3.0_1726820527184.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_persian_bert_persian_farsi_zwnj_base_en_5.5.0_3.0_1726820527184.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# Requires the Spark NLP Python package and an active session,
# e.g. spark = sparknlp.start()
from sparknlp.base import MultiDocumentAssembler
from sparknlp.annotator import BertForQuestionAnswering
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("qa_persian_bert_persian_farsi_zwnj_base","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
// Requires an active SparkSession named `spark` with Spark NLP on the classpath.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val spanClassifier = BertForQuestionAnswering.pretrained("qa_persian_bert_persian_farsi_zwnj_base", "en")
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_persian_bert_persian_farsi_zwnj_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|441.7 MB| + +## References + +https://huggingface.co/makhataei/qa-persian-bert-fa-zwnj-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_agnews_padding20model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_agnews_padding20model_pipeline_en.md new file mode 100644 index 00000000000000..a36ff83da75722 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_agnews_padding20model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_agnews_padding20model_pipeline pipeline RoBertaForSequenceClassification from Realgon +author: John Snow Labs +name: roberta_agnews_padding20model_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_agnews_padding20model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_agnews_padding20model_pipeline_en_5.5.0_3.0_1726851921186.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_agnews_padding20model_pipeline_en_5.5.0_3.0_1726851921186.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_agnews_padding20model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_agnews_padding20model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_agnews_padding20model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.9 MB| + +## References + +https://huggingface.co/Realgon/roberta_agnews_padding20model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bc2gm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bc2gm_pipeline_en.md new file mode 100644 index 00000000000000..36f5245bdb0139 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bc2gm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bc2gm_pipeline pipeline RoBertaForTokenClassification from CheccoCando +author: John Snow Labs +name: roberta_base_bc2gm_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bc2gm_pipeline` is a English model originally trained by CheccoCando. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bc2gm_pipeline_en_5.5.0_3.0_1726862370596.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bc2gm_pipeline_en_5.5.0_3.0_1726862370596.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_bc2gm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_bc2gm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
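
For a token-classification pipeline, pairing each token with its predicted tag is the typical next step. The sketch below assumes the pipeline reads a `text` column and exposes `token` and `ner` output columns, following the Included Models list; the column names are assumptions:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("roberta_base_bc2gm_pipeline", lang="en")

df = spark.createDataFrame([["BRCA1 is a human tumor suppressor gene."]]).toDF("text")
result = pipeline.transform(df)

# Tokens and their predicted tags as parallel arrays.
result.select("token.result", "ner.result").show(truncate=False)
```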
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bc2gm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|442.0 MB| + +## References + +https://huggingface.co/CheccoCando/roberta-base_bc2gm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bne_linear_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bne_linear_ner_pipeline_en.md new file mode 100644 index 00000000000000..50ed82baf14c71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bne_linear_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bne_linear_ner_pipeline pipeline RoBertaForTokenClassification from hlhdatscience +author: John Snow Labs +name: roberta_base_bne_linear_ner_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_linear_ner_pipeline` is a English model originally trained by hlhdatscience. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_linear_ner_pipeline_en_5.5.0_3.0_1726853360393.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_linear_ner_pipeline_en_5.5.0_3.0_1726853360393.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_bne_linear_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_bne_linear_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_linear_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|458.9 MB| + +## References + +https://huggingface.co/hlhdatscience/roberta-base-bne-Linear-NER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_disaster_tweets_downpour_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_disaster_tweets_downpour_en.md new file mode 100644 index 00000000000000..6977988ebf78b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_disaster_tweets_downpour_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_disaster_tweets_downpour RoBertaForSequenceClassification from maxschlake +author: John Snow Labs +name: roberta_base_disaster_tweets_downpour +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_disaster_tweets_downpour` is a English model originally trained by maxschlake. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_disaster_tweets_downpour_en_5.5.0_3.0_1726851603878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_disaster_tweets_downpour_en_5.5.0_3.0_1726851603878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# Requires the Spark NLP Python package and an active session,
# e.g. spark = sparknlp.start()
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_disaster_tweets_downpour","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala
// Requires an active SparkSession named `spark` with Spark NLP on the classpath.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_disaster_tweets_downpour", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
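
Once the PipelineModel above has been fit, a LightPipeline gives low-latency scoring for individual tweets without building a DataFrame. This is a sketch reusing `pipelineModel` and the `class` output column from the example:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# Returns a dict keyed by output column; `class` holds the predicted label(s).
print(light.annotate("There is a wildfire spreading near the highway")["class"])
```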
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_disaster_tweets_downpour| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|444.9 MB| + +## References + +https://huggingface.co/maxschlake/roberta-base_disaster_tweets_downpour \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_56_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_56_pipeline_en.md new file mode 100644 index 00000000000000..7019816a30d869 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_epoch_56_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_56_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_56_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_56_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_56_pipeline_en_5.5.0_3.0_1726793784768.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_56_pipeline_en_5.5.0_3.0_1726793784768.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_56_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_56_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_56_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_56 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_codesearchnet_nepal_bhasa_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_codesearchnet_nepal_bhasa_en.md new file mode 100644 index 00000000000000..ee819e0eb7022f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_codesearchnet_nepal_bhasa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_codesearchnet_nepal_bhasa RoBertaEmbeddings from shradha01 +author: John Snow Labs +name: roberta_codesearchnet_nepal_bhasa +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_codesearchnet_nepal_bhasa` is a English model originally trained by shradha01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_codesearchnet_nepal_bhasa_en_5.5.0_3.0_1726816311900.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_codesearchnet_nepal_bhasa_en_5.5.0_3.0_1726816311900.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_codesearchnet_nepal_bhasa","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_codesearchnet_nepal_bhasa","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
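
To get from the annotation structs to plain token vectors, a sketch like the one below can be used; it assumes the standard Spark NLP annotation schema, where each annotation in the `embeddings` column carries its token text in `result` and its vector in `embeddings`.

```python
from pyspark.sql.functions import explode, col

# One output row per token: the token text and its embedding vector.
token_vectors = (
    pipelineDF
    .select(explode(col("embeddings")).alias("ann"))
    .select(col("ann.result").alias("token"), col("ann.embeddings").alias("vector"))
)
token_vectors.show(truncate=80)
```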
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_codesearchnet_nepal_bhasa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/shradha01/roberta_codesearchnet_new \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_detect_dep_v3_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_detect_dep_v3_en.md new file mode 100644 index 00000000000000..51aadbe9446eb2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_detect_dep_v3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_detect_dep_v3 RoBertaForSequenceClassification from Trong-Nghia +author: John Snow Labs +name: roberta_large_detect_dep_v3 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_detect_dep_v3` is a English model originally trained by Trong-Nghia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_detect_dep_v3_en_5.5.0_3.0_1726851948037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_detect_dep_v3_en_5.5.0_3.0_1726851948037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_detect_dep_v3","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_detect_dep_v3", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_detect_dep_v3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Trong-Nghia/roberta-large-detect-dep-v3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_lora_2_63m_snli_model1_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_lora_2_63m_snli_model1_en.md new file mode 100644 index 00000000000000..72d8d9cb25af9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_lora_2_63m_snli_model1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_lora_2_63m_snli_model1 RoBertaForSequenceClassification from varun-v-rao +author: John Snow Labs +name: roberta_large_lora_2_63m_snli_model1 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_lora_2_63m_snli_model1` is a English model originally trained by varun-v-rao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_lora_2_63m_snli_model1_en_5.5.0_3.0_1726804895956.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_lora_2_63m_snli_model1_en_5.5.0_3.0_1726804895956.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_lora_2_63m_snli_model1","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_lora_2_63m_snli_model1", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_lora_2_63m_snli_model1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|915.0 MB| + +## References + +https://huggingface.co/varun-v-rao/roberta-large-lora-2.63M-snli-model1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_ontonotes_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_ontonotes_pipeline_en.md new file mode 100644 index 00000000000000..ce1681ad7430cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_ontonotes_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_ontonotes_pipeline pipeline RoBertaForTokenClassification from CheccoCando +author: John Snow Labs +name: roberta_large_ontonotes_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_ontonotes_pipeline` is a English model originally trained by CheccoCando. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_ontonotes_pipeline_en_5.5.0_3.0_1726862944918.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_ontonotes_pipeline_en_5.5.0_3.0_1726862944918.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_ontonotes_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_ontonotes_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_ontonotes_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/CheccoCando/roberta-large_Ontonotes + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_sayula_popoluca_tagging_amir01_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_sayula_popoluca_tagging_amir01_en.md new file mode 100644 index 00000000000000..4c4a5180c4f1c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_sayula_popoluca_tagging_amir01_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_sayula_popoluca_tagging_amir01 RoBertaForTokenClassification from Amir01 +author: John Snow Labs +name: roberta_sayula_popoluca_tagging_amir01 +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_sayula_popoluca_tagging_amir01` is a English model originally trained by Amir01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_sayula_popoluca_tagging_amir01_en_5.5.0_3.0_1726853246028.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_sayula_popoluca_tagging_amir01_en_5.5.0_3.0_1726853246028.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_sayula_popoluca_tagging_amir01","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_sayula_popoluca_tagging_amir01", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
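
To line the predicted tags up with their tokens after running the pipeline above, a simple driver-side sketch (illustrative only; it collects a single row, so it is meant for spot checks rather than large datasets) is:

```python
# Pull the token texts and predicted tags for the first document and pair them up.
row = pipelineDF.select("token.result", "ner.result").first()
for token, tag in zip(row[0], row[1]):
    print(f"{token}\t{tag}")
```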
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_sayula_popoluca_tagging_amir01| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|464.5 MB| + +## References + +https://huggingface.co/Amir01/roberta-pos-tagging \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_manx_tl.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_manx_tl.md new file mode 100644 index 00000000000000..8db15cf3f2c4f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_manx_tl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Tagalog roberta_tagalog_base_ft_udpos213_manx RoBertaForTokenClassification from iceman2434 +author: John Snow Labs +name: roberta_tagalog_base_ft_udpos213_manx +date: 2024-09-20 +tags: [tl, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: tl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tagalog_base_ft_udpos213_manx` is a Tagalog model originally trained by iceman2434. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_manx_tl_5.5.0_3.0_1726847001856.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_manx_tl_5.5.0_3.0_1726847001856.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_tagalog_base_ft_udpos213_manx","tl") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_tagalog_base_ft_udpos213_manx", "tl")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tagalog_base_ft_udpos213_manx| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|tl| +|Size:|407.2 MB| + +## References + +https://huggingface.co/iceman2434/roberta-tagalog-base-ft-udpos213-gv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-robertalex_mlm_armas_inga_estrella_en.md b/docs/_posts/ahmedlone127/2024-09-20-robertalex_mlm_armas_inga_estrella_en.md new file mode 100644 index 00000000000000..c645cb60526159 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-robertalex_mlm_armas_inga_estrella_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robertalex_mlm_armas_inga_estrella RoBertaEmbeddings from JFernandoGRE +author: John Snow Labs +name: robertalex_mlm_armas_inga_estrella +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertalex_mlm_armas_inga_estrella` is a English model originally trained by JFernandoGRE. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertalex_mlm_armas_inga_estrella_en_5.5.0_3.0_1726857591924.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertalex_mlm_armas_inga_estrella_en_5.5.0_3.0_1726857591924.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("robertalex_mlm_armas_inga_estrella","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("robertalex_mlm_armas_inga_estrella","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertalex_mlm_armas_inga_estrella| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/JFernandoGRE/RoBERTalex_mlm_ARMAS_INGA_ESTRELLA \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sarcasm_detection_bert_base_uncased_sayula_popoluca_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-sarcasm_detection_bert_base_uncased_sayula_popoluca_pipeline_en.md new file mode 100644 index 00000000000000..3c92aeb17092cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sarcasm_detection_bert_base_uncased_sayula_popoluca_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sarcasm_detection_bert_base_uncased_sayula_popoluca_pipeline pipeline BertForSequenceClassification from jkhan447 +author: John Snow Labs +name: sarcasm_detection_bert_base_uncased_sayula_popoluca_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sarcasm_detection_bert_base_uncased_sayula_popoluca_pipeline` is a English model originally trained by jkhan447. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sarcasm_detection_bert_base_uncased_sayula_popoluca_pipeline_en_5.5.0_3.0_1726860155220.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sarcasm_detection_bert_base_uncased_sayula_popoluca_pipeline_en_5.5.0_3.0_1726860155220.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sarcasm_detection_bert_base_uncased_sayula_popoluca_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sarcasm_detection_bert_base_uncased_sayula_popoluca_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sarcasm_detection_bert_base_uncased_sayula_popoluca_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/jkhan447/sarcasm-detection-Bert-base-uncased-POS + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sst5_padding100model_en.md b/docs/_posts/ahmedlone127/2024-09-20-sst5_padding100model_en.md new file mode 100644 index 00000000000000..564883377f36bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sst5_padding100model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sst5_padding100model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: sst5_padding100model +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sst5_padding100model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sst5_padding100model_en_5.5.0_3.0_1726848683901.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sst5_padding100model_en_5.5.0_3.0_1726848683901.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("sst5_padding100model","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sst5_padding100model", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sst5_padding100model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/sst5_padding100model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-stereotype_italian_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-20-stereotype_italian_pipeline_it.md new file mode 100644 index 00000000000000..99806185836de4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-stereotype_italian_pipeline_it.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Italian stereotype_italian_pipeline pipeline BertForSequenceClassification from aequa-tech +author: John Snow Labs +name: stereotype_italian_pipeline +date: 2024-09-20 +tags: [it, open_source, pipeline, onnx] +task: Text Classification +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stereotype_italian_pipeline` is a Italian model originally trained by aequa-tech. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stereotype_italian_pipeline_it_5.5.0_3.0_1726859987957.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stereotype_italian_pipeline_it_5.5.0_3.0_1726859987957.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("stereotype_italian_pipeline", lang = "it") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("stereotype_italian_pipeline", lang = "it") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stereotype_italian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|691.9 MB| + +## References + +https://huggingface.co/aequa-tech/stereotype-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-t_100002_en.md b/docs/_posts/ahmedlone127/2024-09-20-t_100002_en.md new file mode 100644 index 00000000000000..ea45a4421425bf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-t_100002_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English t_100002 RoBertaForSequenceClassification from Pablojmed +author: John Snow Labs +name: t_100002 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t_100002` is a English model originally trained by Pablojmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t_100002_en_5.5.0_3.0_1726852190915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t_100002_en_5.5.0_3.0_1726852190915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("t_100002","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("t_100002", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t_100002| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|438.2 MB| + +## References + +https://huggingface.co/Pablojmed/t_100002 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-test_model_name_en.md b/docs/_posts/ahmedlone127/2024-09-20-test_model_name_en.md new file mode 100644 index 00000000000000..15f7a68ffd7ff5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-test_model_name_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_model_name DistilBertForSequenceClassification from lingaying +author: John Snow Labs +name: test_model_name +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_model_name` is a English model originally trained by lingaying. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_model_name_en_5.5.0_3.0_1726848654259.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_model_name_en_5.5.0_3.0_1726848654259.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_model_name","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_model_name", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_model_name| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lingaying/test_model_name \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-test_trainer_raghavsharma06_en.md b/docs/_posts/ahmedlone127/2024-09-20-test_trainer_raghavsharma06_en.md new file mode 100644 index 00000000000000..174bf824769b9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-test_trainer_raghavsharma06_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_trainer_raghavsharma06 DistilBertForSequenceClassification from raghavsharma06 +author: John Snow Labs +name: test_trainer_raghavsharma06 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_trainer_raghavsharma06` is a English model originally trained by raghavsharma06. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_trainer_raghavsharma06_en_5.5.0_3.0_1726861223495.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_trainer_raghavsharma06_en_5.5.0_3.0_1726861223495.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_trainer_raghavsharma06","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_trainer_raghavsharma06", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_trainer_raghavsharma06| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/raghavsharma06/test_trainer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-test_whisper_tiny_thai_kwanchiva_en.md b/docs/_posts/ahmedlone127/2024-09-20-test_whisper_tiny_thai_kwanchiva_en.md new file mode 100644 index 00000000000000..cfa7502ed6e140 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-test_whisper_tiny_thai_kwanchiva_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English test_whisper_tiny_thai_kwanchiva WhisperForCTC from kwanchiva +author: John Snow Labs +name: test_whisper_tiny_thai_kwanchiva +date: 2024-09-20 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_whisper_tiny_thai_kwanchiva` is a English model originally trained by kwanchiva. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_whisper_tiny_thai_kwanchiva_en_5.5.0_3.0_1726813864712.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_whisper_tiny_thai_kwanchiva_en_5.5.0_3.0_1726813864712.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("test_whisper_tiny_thai_kwanchiva","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("test_whisper_tiny_thai_kwanchiva", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
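
The `data` DataFrame is left undefined in the snippet above. One way to build it (an illustration that assumes `librosa` is installed, a placeholder file `sample.wav`, and 16 kHz mono audio, which is what Whisper checkpoints are trained on; depending on the Spark NLP version you may need to cast the values to floats) is:

```python
import librosa

# Load the clip as a mono float array resampled to 16 kHz.
audio, sampling_rate = librosa.load("sample.wav", sr=16000)

# AudioAssembler reads a column of raw audio samples; the column name must
# match setInputCol("audio_content") above.
data = spark.createDataFrame([[audio.tolist()]], ["audio_content"])
```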
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_whisper_tiny_thai_kwanchiva| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/kwanchiva/test-whisper-tiny-th \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-tmp_trainer_rajendrabaskota_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-tmp_trainer_rajendrabaskota_pipeline_en.md new file mode 100644 index 00000000000000..0fe6d2efd02f8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-tmp_trainer_rajendrabaskota_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tmp_trainer_rajendrabaskota_pipeline pipeline RoBertaForSequenceClassification from rajendrabaskota +author: John Snow Labs +name: tmp_trainer_rajendrabaskota_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tmp_trainer_rajendrabaskota_pipeline` is a English model originally trained by rajendrabaskota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tmp_trainer_rajendrabaskota_pipeline_en_5.5.0_3.0_1726804491369.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tmp_trainer_rajendrabaskota_pipeline_en_5.5.0_3.0_1726804491369.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tmp_trainer_rajendrabaskota_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tmp_trainer_rajendrabaskota_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tmp_trainer_rajendrabaskota_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|457.4 MB| + +## References + +https://huggingface.co/rajendrabaskota/tmp_trainer + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-twitchleaguebert_1000k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-twitchleaguebert_1000k_pipeline_en.md new file mode 100644 index 00000000000000..bac475912586c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-twitchleaguebert_1000k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitchleaguebert_1000k_pipeline pipeline RoBertaEmbeddings from Epidot +author: John Snow Labs +name: twitchleaguebert_1000k_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitchleaguebert_1000k_pipeline` is a English model originally trained by Epidot. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitchleaguebert_1000k_pipeline_en_5.5.0_3.0_1726796346204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitchleaguebert_1000k_pipeline_en_5.5.0_3.0_1726796346204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("twitchleaguebert_1000k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("twitchleaguebert_1000k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitchleaguebert_1000k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|305.5 MB| + +## References + +https://huggingface.co/Epidot/TwitchLeagueBert-1000k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_42_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_42_pipeline_en.md new file mode 100644 index 00000000000000..a35adb2ff35d54 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_42_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English uned_tfg_08_42_pipeline pipeline RoBertaForSequenceClassification from alexisdr +author: John Snow Labs +name: uned_tfg_08_42_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`uned_tfg_08_42_pipeline` is a English model originally trained by alexisdr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/uned_tfg_08_42_pipeline_en_5.5.0_3.0_1726852264168.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/uned_tfg_08_42_pipeline_en_5.5.0_3.0_1726852264168.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("uned_tfg_08_42_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("uned_tfg_08_42_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|uned_tfg_08_42_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|426.5 MB| + +## References + +https://huggingface.co/alexisdr/uned-tfg-08.42 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-wannasleep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-wannasleep_pipeline_en.md new file mode 100644 index 00000000000000..ee661bc8235e94 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-wannasleep_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English wannasleep_pipeline pipeline DistilBertForSequenceClassification from kithangw +author: John Snow Labs +name: wannasleep_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wannasleep_pipeline` is a English model originally trained by kithangw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wannasleep_pipeline_en_5.5.0_3.0_1726809654745.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wannasleep_pipeline_en_5.5.0_3.0_1726809654745.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("wannasleep_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("wannasleep_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wannasleep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kithangw/wannasleep + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_small_indonesian_zeinhasan_hi.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_indonesian_zeinhasan_hi.md new file mode 100644 index 00000000000000..b2be33bd4eb4e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_indonesian_zeinhasan_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_small_indonesian_zeinhasan WhisperForCTC from zeinhasan +author: John Snow Labs +name: whisper_small_indonesian_zeinhasan +date: 2024-09-20 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_indonesian_zeinhasan` is a Hindi model originally trained by zeinhasan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_zeinhasan_hi_5.5.0_3.0_1726811954031.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_zeinhasan_hi_5.5.0_3.0_1726811954031.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_indonesian_zeinhasan","hi") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_indonesian_zeinhasan", "hi")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_indonesian_zeinhasan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|389.8 MB| + +## References + +https://huggingface.co/zeinhasan/whisper-small-id \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_tamil_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_tamil_pipeline_hi.md new file mode 100644 index 00000000000000..d089eed6c1fba4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_tamil_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_tiny_tamil_pipeline pipeline WhisperForCTC from Sammarieo +author: John Snow Labs +name: whisper_tiny_tamil_pipeline +date: 2024-09-20 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_tamil_pipeline` is a Hindi model originally trained by Sammarieo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_tamil_pipeline_hi_5.5.0_3.0_1726813718761.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_tamil_pipeline_hi_5.5.0_3.0_1726813718761.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_tamil_pipeline", lang = "hi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_tamil_pipeline", lang = "hi") +val annotations = pipeline.transform(df) + +``` +
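+
+Because this is a speech-recognition pipeline (AudioAssembler followed by WhisperForCTC), the `df` passed to `transform` above is expected to carry raw audio samples rather than text. A minimal sketch of building it, assuming a local 16 kHz WAV file and the external `librosa` library:
+
+```python
+# Illustrative only: wrap 16 kHz audio floats in a one-row DataFrame for the pipeline.
+import librosa
+
+audio, _ = librosa.load("sample.wav", sr=16000)  # "sample.wav" is a placeholder path
+df = spark.createDataFrame([[audio.tolist()]]).toDF("audio_content")
+```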
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_tamil_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|390.7 MB| + +## References + +https://huggingface.co/Sammarieo/whisper-tiny-ta + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_kbleejohn_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_kbleejohn_en.md new file mode 100644 index 00000000000000..bbe7fc5a655a65 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_kbleejohn_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_kbleejohn XlmRoBertaForTokenClassification from kbleejohn +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_kbleejohn +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_kbleejohn` is a English model originally trained by kbleejohn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_kbleejohn_en_5.5.0_3.0_1726844501552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_kbleejohn_en_5.5.0_3.0_1726844501552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_kbleejohn","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_kbleejohn", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
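+
+After `transform`, the predicted entity labels are stored as Spark NLP annotations in the `ner` output column. A short sketch for inspecting them, continuing from the snippet above (column names are the ones defined there):
+
+```python
+# Show each input text next to its tokens and predicted NER tags.
+pipelineDF.select("text", "token.result", "ner.result").show(truncate=False)
+```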
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_kbleejohn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/kbleejohn/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_kbleejohn_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_kbleejohn_pipeline_en.md new file mode 100644 index 00000000000000..4a3134f91cb234 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_kbleejohn_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_kbleejohn_pipeline pipeline XlmRoBertaForTokenClassification from kbleejohn +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_kbleejohn_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_kbleejohn_pipeline` is a English model originally trained by kbleejohn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_kbleejohn_pipeline_en_5.5.0_3.0_1726844566516.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_kbleejohn_pipeline_en_5.5.0_3.0_1726844566516.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_kbleejohn_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_kbleejohn_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_kbleejohn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/kbleejohn/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_kenhoffman_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_kenhoffman_en.md new file mode 100644 index 00000000000000..0a99ba5b8653a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_kenhoffman_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_kenhoffman XlmRoBertaForTokenClassification from kenhoffman +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_kenhoffman +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_kenhoffman` is a English model originally trained by kenhoffman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_kenhoffman_en_5.5.0_3.0_1726843910561.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_kenhoffman_en_5.5.0_3.0_1726843910561.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_kenhoffman","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_kenhoffman", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_kenhoffman| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/kenhoffman/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-your_repo_name_iwaves_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-your_repo_name_iwaves_pipeline_en.md new file mode 100644 index 00000000000000..84d15f7d82612c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-your_repo_name_iwaves_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English your_repo_name_iwaves_pipeline pipeline DistilBertForSequenceClassification from Iwaves +author: John Snow Labs +name: your_repo_name_iwaves_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`your_repo_name_iwaves_pipeline` is a English model originally trained by Iwaves. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/your_repo_name_iwaves_pipeline_en_5.5.0_3.0_1726832707735.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/your_repo_name_iwaves_pipeline_en_5.5.0_3.0_1726832707735.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("your_repo_name_iwaves_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("your_repo_name_iwaves_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
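+
+The snippets above assume a DataFrame `df` with a `text` column, which is what the included DocumentAssembler reads. A minimal sketch of creating it, plus the lighter-weight `annotate` call that PretrainedPipeline offers for single strings (the example sentence is illustrative):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("your_repo_name_iwaves_pipeline", lang = "en")
+
+# Option 1: transform a DataFrame with a "text" column.
+df = spark.createDataFrame([["I love Spark NLP"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+# Option 2: annotate a single string and get a plain Python dict of annotations back.
+result = pipeline.annotate("I love Spark NLP")
+```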
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|your_repo_name_iwaves_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Iwaves/your-repo-name + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-autotrain_intent_classification_6categories_roberta_89129143858_en.md b/docs/_posts/ahmedlone127/2024-09-21-autotrain_intent_classification_6categories_roberta_89129143858_en.md new file mode 100644 index 00000000000000..d20482afa3c50f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-autotrain_intent_classification_6categories_roberta_89129143858_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English autotrain_intent_classification_6categories_roberta_89129143858 XlmRoBertaForSequenceClassification from yeye776 +author: John Snow Labs +name: autotrain_intent_classification_6categories_roberta_89129143858 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_intent_classification_6categories_roberta_89129143858` is a English model originally trained by yeye776. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_intent_classification_6categories_roberta_89129143858_en_5.5.0_3.0_1726932498765.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_intent_classification_6categories_roberta_89129143858_en_5.5.0_3.0_1726932498765.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("autotrain_intent_classification_6categories_roberta_89129143858","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("autotrain_intent_classification_6categories_roberta_89129143858", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_intent_classification_6categories_roberta_89129143858| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|770.2 MB| + +## References + +https://huggingface.co/yeye776/autotrain-intent-classification-6categories-roberta-89129143858 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_pipeline_en.md new file mode 100644 index 00000000000000..9d9ee19d13cb85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_pipeline pipeline WhisperForCTC from saahith +author: John Snow Labs +name: base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_pipeline` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_pipeline_en_5.5.0_3.0_1726950055300.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_pipeline_en_5.5.0_3.0_1726950055300.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_english_final_combined_2_0_8_1e_05_balmy_sweep_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|646.6 MB| + +## References + +https://huggingface.co/saahith/base.en-final-combined-2-0-8-1e-05-balmy-sweep-1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bengali_whisper_base_pipeline_bn.md b/docs/_posts/ahmedlone127/2024-09-21-bengali_whisper_base_pipeline_bn.md new file mode 100644 index 00000000000000..315a97dd959e84 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bengali_whisper_base_pipeline_bn.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Bengali bengali_whisper_base_pipeline pipeline WhisperForCTC from emon-j +author: John Snow Labs +name: bengali_whisper_base_pipeline +date: 2024-09-21 +tags: [bn, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bengali_whisper_base_pipeline` is a Bengali model originally trained by emon-j. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bengali_whisper_base_pipeline_bn_5.5.0_3.0_1726906170359.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bengali_whisper_base_pipeline_bn_5.5.0_3.0_1726906170359.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bengali_whisper_base_pipeline", lang = "bn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bengali_whisper_base_pipeline", lang = "bn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bengali_whisper_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bn| +|Size:|642.0 MB| + +## References + +https://huggingface.co/emon-j/Bengali-Whisper-Base + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_pipeline_en.md new file mode 100644 index 00000000000000..13357e1d5dc7cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_pipeline_en_5.5.0_3.0_1726947210977.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_pipeline_en_5.5.0_3.0_1726947210977.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_56_b_8_lr_4e_07_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_400_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.56-b-8-lr-4e-07-dp-1.0-ss-0-st-False-fh-False-hs-400 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_en.md new file mode 100644 index 00000000000000..0f72df0cb48911 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9 +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_en_5.5.0_3.0_1726946545397.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_en_5.5.0_3.0_1726946545397.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("document_question", "document_context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
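+
+Once the pipeline has run, the predicted answer span is stored in the `answer` output column. A quick sketch for reading it back out, continuing from the snippet above:
+
+```python
+# Display each question next to the extracted answer text.
+pipelineDF.select("document_question.result", "answer.result").show(truncate=False)
+```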
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-0.0001-wd-0.001-dp-0.9 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_large_cased_squad_model2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_large_cased_squad_model2_pipeline_en.md new file mode 100644 index 00000000000000..c0df10526376bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_large_cased_squad_model2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_large_cased_squad_model2_pipeline pipeline BertForQuestionAnswering from varun-v-rao +author: John Snow Labs +name: bert_large_cased_squad_model2_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_cased_squad_model2_pipeline` is a English model originally trained by varun-v-rao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_cased_squad_model2_pipeline_en_5.5.0_3.0_1726946879264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_cased_squad_model2_pipeline_en_5.5.0_3.0_1726946879264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_large_cased_squad_model2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_large_cased_squad_model2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_cased_squad_model2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/varun-v-rao/bert-large-cased-squad-model2 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_large_finetuned_tqa_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_large_finetuned_tqa_en.md new file mode 100644 index 00000000000000..2a84af452dcd72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_large_finetuned_tqa_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_large_finetuned_tqa BertForQuestionAnswering from tvsharish +author: John Snow Labs +name: bert_large_finetuned_tqa +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_finetuned_tqa` is a English model originally trained by tvsharish. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_finetuned_tqa_en_5.5.0_3.0_1726946543912.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_finetuned_tqa_en_5.5.0_3.0_1726946543912.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_large_finetuned_tqa","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_large_finetuned_tqa", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("document_question", "document_context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_finetuned_tqa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tvsharish/bert-large-finetuned-tqa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-berturk_earthquake_tweets_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-berturk_earthquake_tweets_classification_pipeline_en.md new file mode 100644 index 00000000000000..1c4e8d31ebdb39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-berturk_earthquake_tweets_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English berturk_earthquake_tweets_classification_pipeline pipeline BertForSequenceClassification from yhaslan +author: John Snow Labs +name: berturk_earthquake_tweets_classification_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`berturk_earthquake_tweets_classification_pipeline` is a English model originally trained by yhaslan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/berturk_earthquake_tweets_classification_pipeline_en_5.5.0_3.0_1726955844805.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/berturk_earthquake_tweets_classification_pipeline_en_5.5.0_3.0_1726955844805.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("berturk_earthquake_tweets_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("berturk_earthquake_tweets_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|berturk_earthquake_tweets_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.5 MB| + +## References + +https://huggingface.co/yhaslan/berturk-earthquake-tweets-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bertweet_large_reddit_gab_16000sample_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bertweet_large_reddit_gab_16000sample_pipeline_en.md new file mode 100644 index 00000000000000..0e52e5d2650367 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bertweet_large_reddit_gab_16000sample_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bertweet_large_reddit_gab_16000sample_pipeline pipeline RoBertaEmbeddings from HPL +author: John Snow Labs +name: bertweet_large_reddit_gab_16000sample_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertweet_large_reddit_gab_16000sample_pipeline` is a English model originally trained by HPL. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertweet_large_reddit_gab_16000sample_pipeline_en_5.5.0_3.0_1726957912393.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertweet_large_reddit_gab_16000sample_pipeline_en_5.5.0_3.0_1726957912393.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bertweet_large_reddit_gab_16000sample_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bertweet_large_reddit_gab_16000sample_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertweet_large_reddit_gab_16000sample_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/HPL/bertweet-large-reddit-gab-16000sample + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_confunius_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_confunius_pipeline_en.md new file mode 100644 index 00000000000000..3138cde892a874 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_confunius_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_confunius_pipeline pipeline RoBertaEmbeddings from confunius +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_confunius_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_confunius_pipeline` is a English model originally trained by confunius. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_confunius_pipeline_en_5.5.0_3.0_1726934712999.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_confunius_pipeline_en_5.5.0_3.0_1726934712999.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_confunius_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_confunius_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_confunius_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/confunius/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_riaraju_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_riaraju_en.md new file mode 100644 index 00000000000000..246264f1c82364 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_riaraju_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_riaraju DistilBertForSequenceClassification from riaraju +author: John Snow Labs +name: burmese_awesome_model_riaraju +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_riaraju` is a English model originally trained by riaraju. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_riaraju_en_5.5.0_3.0_1726884936396.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_riaraju_en_5.5.0_3.0_1726884936396.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_riaraju","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_riaraju", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_riaraju| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/riaraju/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-case_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-case_classifier_pipeline_en.md new file mode 100644 index 00000000000000..eb51ea8582eceb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-case_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English case_classifier_pipeline pipeline DistilBertForSequenceClassification from LahiruProjects +author: John Snow Labs +name: case_classifier_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`case_classifier_pipeline` is a English model originally trained by LahiruProjects. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/case_classifier_pipeline_en_5.5.0_3.0_1726888838380.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/case_classifier_pipeline_en_5.5.0_3.0_1726888838380.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("case_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("case_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|case_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LahiruProjects/case-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-colombian_sign_language_small_biased_random_20_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-colombian_sign_language_small_biased_random_20_pipeline_en.md new file mode 100644 index 00000000000000..867b45b1c68813 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-colombian_sign_language_small_biased_random_20_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English colombian_sign_language_small_biased_random_20_pipeline pipeline RoBertaEmbeddings from antolin +author: John Snow Labs +name: colombian_sign_language_small_biased_random_20_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`colombian_sign_language_small_biased_random_20_pipeline` is a English model originally trained by antolin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/colombian_sign_language_small_biased_random_20_pipeline_en_5.5.0_3.0_1726958055226.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/colombian_sign_language_small_biased_random_20_pipeline_en_5.5.0_3.0_1726958055226.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("colombian_sign_language_small_biased_random_20_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("colombian_sign_language_small_biased_random_20_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|colombian_sign_language_small_biased_random_20_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|470.6 MB| + +## References + +https://huggingface.co/antolin/csn-small-biased-random-20 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-deepset_bert_base_cased_squad2_orkg_unchanged_5e_05_en.md b/docs/_posts/ahmedlone127/2024-09-21-deepset_bert_base_cased_squad2_orkg_unchanged_5e_05_en.md new file mode 100644 index 00000000000000..1b99215cd5ac5a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-deepset_bert_base_cased_squad2_orkg_unchanged_5e_05_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English deepset_bert_base_cased_squad2_orkg_unchanged_5e_05 BertForQuestionAnswering from Moussab +author: John Snow Labs +name: deepset_bert_base_cased_squad2_orkg_unchanged_5e_05 +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deepset_bert_base_cased_squad2_orkg_unchanged_5e_05` is a English model originally trained by Moussab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deepset_bert_base_cased_squad2_orkg_unchanged_5e_05_en_5.5.0_3.0_1726946445195.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deepset_bert_base_cased_squad2_orkg_unchanged_5e_05_en_5.5.0_3.0_1726946445195.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("deepset_bert_base_cased_squad2_orkg_unchanged_5e_05","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("deepset_bert_base_cased_squad2_orkg_unchanged_5e_05", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("document_question", "document_context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deepset_bert_base_cased_squad2_orkg_unchanged_5e_05| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Moussab/deepset_bert-base-cased-squad2-orkg-unchanged-5e-05 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_distilled_clinc_mrwetsnow_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_distilled_clinc_mrwetsnow_en.md new file mode 100644 index 00000000000000..5b34b647b80320 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_distilled_clinc_mrwetsnow_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_mrwetsnow DistilBertForSequenceClassification from MrWetsnow +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_mrwetsnow +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_mrwetsnow` is a English model originally trained by MrWetsnow. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_mrwetsnow_en_5.5.0_3.0_1726953235001.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_mrwetsnow_en_5.5.0_3.0_1726953235001.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_mrwetsnow","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_mrwetsnow", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
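+
+The predicted label ends up in the `class` output column. A short sketch for reading it back out, continuing from the snippet above:
+
+```python
+# Show each input text next to its predicted class label.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```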
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_mrwetsnow| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/MrWetsnow/distilbert-base-uncased-distilled-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline_en.md new file mode 100644 index 00000000000000..4aac673f2aae89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline pipeline DistilBertForSequenceClassification from MrWetsnow +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline` is a English model originally trained by MrWetsnow. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline_en_5.5.0_3.0_1726953247058.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline_en_5.5.0_3.0_1726953247058.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
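
The snippet above assumes a Spark DataFrame `df` with a `text` column that the example never defines. A minimal, illustrative sketch of how such a frame could be built and passed through the pipeline (the sample sentence is made up; an active Spark NLP session is assumed):

```python
from sparknlp.pretrained import PretrainedPipeline

# Illustrative input: any DataFrame with a "text" column can be passed to transform().
df = spark.createDataFrame([["I love Spark NLP!"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.printSchema()  # inspect the annotation columns added by the included models
```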
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_mrwetsnow_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/MrWetsnow/distilbert-base-uncased-distilled-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_cola_cltsai_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_cola_cltsai_en.md new file mode 100644 index 00000000000000..862810eca25855 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_cola_cltsai_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_cltsai DistilBertForSequenceClassification from cltsai +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_cltsai +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_cltsai` is a English model originally trained by cltsai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_cltsai_en_5.5.0_3.0_1726888780862.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_cltsai_en_5.5.0_3.0_1726888780862.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification

# Spark NLP pipeline: raw text -> document -> tokens -> class prediction
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_cltsai", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_cltsai", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_cltsai| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cltsai/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_leotunganh_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_leotunganh_en.md new file mode 100644 index 00000000000000..cda90210db340d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_leotunganh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_leotunganh DistilBertForSequenceClassification from LeoTungAnh +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_leotunganh +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_leotunganh` is a English model originally trained by LeoTungAnh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_leotunganh_en_5.5.0_3.0_1726888764161.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_leotunganh_en_5.5.0_3.0_1726888764161.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification

# Spark NLP pipeline: raw text -> document -> tokens -> class prediction
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_leotunganh", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_leotunganh", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_leotunganh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LeoTungAnh/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_yerkekz_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_yerkekz_pipeline_en.md new file mode 100644 index 00000000000000..37e28e514e1f97 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_yerkekz_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_yerkekz_pipeline pipeline DistilBertForSequenceClassification from yerkekz +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_yerkekz_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_yerkekz_pipeline` is a English model originally trained by yerkekz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yerkekz_pipeline_en_5.5.0_3.0_1726884594212.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yerkekz_pipeline_en_5.5.0_3.0_1726884594212.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_yerkekz_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_yerkekz_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
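
For a quick check on a single sentence, the pipeline's `annotate` helper can be used instead of building a DataFrame; a hedged sketch (the sample text is illustrative):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_yerkekz_pipeline", lang = "en")
# annotate() runs the whole pipeline on a plain string and returns a dict of annotation results
result = pipeline.annotate("I love Spark NLP!")
print(result)
```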
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_yerkekz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yerkekz/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_qnli_abhinavreddy17_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_qnli_abhinavreddy17_pipeline_en.md new file mode 100644 index 00000000000000..1c5557116f5300 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_qnli_abhinavreddy17_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_qnli_abhinavreddy17_pipeline pipeline DistilBertForSequenceClassification from abhinavreddy17 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_qnli_abhinavreddy17_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_qnli_abhinavreddy17_pipeline` is a English model originally trained by abhinavreddy17. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_qnli_abhinavreddy17_pipeline_en_5.5.0_3.0_1726884506167.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_qnli_abhinavreddy17_pipeline_en_5.5.0_3.0_1726884506167.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_qnli_abhinavreddy17_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_qnli_abhinavreddy17_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
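
Note that `df` is assumed to be a Spark DataFrame with a `text` column; a minimal sketch of how it could be created (the sample sentence is illustrative):

```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love Spark NLP!"]]).toDF("text")  # illustrative input
pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_qnli_abhinavreddy17_pipeline", lang = "en")
pipeline.transform(df).show(truncate = False)
```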
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_qnli_abhinavreddy17_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abhinavreddy17/distilbert-base-uncased-finetuned-qnli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_en.md new file mode 100644 index 00000000000000..9243f6d04693dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_en_5.5.0_3.0_1726923818099.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_en_5.5.0_3.0_1726923818099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification

# Spark NLP pipeline: raw text -> document -> tokens -> class prediction
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_refine_cl_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_refine_cl_pipeline_en.md new file mode 100644 index 00000000000000..c975ff522b1620 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_refine_cl_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_refine_cl_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_refine_cl_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_refine_cl_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_refine_cl_pipeline_en_5.5.0_3.0_1726924311919.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_refine_cl_pipeline_en_5.5.0_3.0_1726924311919.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_refine_cl_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_refine_cl_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
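
Since `df` is not defined in the snippet above, a single-sentence call through `annotate` may be the quickest way to try the pipeline; a hedged sketch (sample text is illustrative):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_refine_cl_pipeline", lang = "en")
# Returns a dict whose keys correspond to the output columns of the included annotators
result = pipeline.annotate("I love Spark NLP!")
print(result)
```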
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_refine_cl_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_refine_cl + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_emotion_aliciiavs_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_emotion_aliciiavs_en.md new file mode 100644 index 00000000000000..ae577092e0fa8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_emotion_aliciiavs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotion_aliciiavs DistilBertForSequenceClassification from aliciiavs +author: John Snow Labs +name: distilbert_emotion_aliciiavs +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_aliciiavs` is a English model originally trained by aliciiavs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_aliciiavs_en_5.5.0_3.0_1726888838940.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_aliciiavs_en_5.5.0_3.0_1726888838940.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification

# Spark NLP pipeline: raw text -> document -> tokens -> class prediction
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_aliciiavs", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_aliciiavs", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_aliciiavs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aliciiavs/distilbert-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_twitterfin_padding10model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_twitterfin_padding10model_pipeline_en.md new file mode 100644 index 00000000000000..4899389fb62faa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_twitterfin_padding10model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_twitterfin_padding10model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_twitterfin_padding10model_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_twitterfin_padding10model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding10model_pipeline_en_5.5.0_3.0_1726888614013.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding10model_pipeline_en_5.5.0_3.0_1726888614013.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_twitterfin_padding10model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_twitterfin_padding10model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
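
The `df` referenced above is assumed to be a DataFrame with a `text` column; a minimal sketch of preparing one and running the pipeline (the sample tweet is illustrative):

```python
from sparknlp.pretrained import PretrainedPipeline

# Illustrative financial-tweet input; any "text" column works the same way.
df = spark.createDataFrame([["Shares rallied after the earnings report."]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_twitterfin_padding10model_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.printSchema()  # inspect the annotation columns produced by the pipeline
```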
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_twitterfin_padding10model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_twitterfin_padding10model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilkobert_ep3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilkobert_ep3_pipeline_en.md new file mode 100644 index 00000000000000..4f7cb94e098424 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilkobert_ep3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilkobert_ep3_pipeline pipeline DistilBertForSequenceClassification from yeye776 +author: John Snow Labs +name: distilkobert_ep3_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilkobert_ep3_pipeline` is a English model originally trained by yeye776. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilkobert_ep3_pipeline_en_5.5.0_3.0_1726923705730.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilkobert_ep3_pipeline_en_5.5.0_3.0_1726923705730.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilkobert_ep3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilkobert_ep3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
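
As in the other pipeline cards, `df` must be a DataFrame with a `text` column; a short, illustrative sketch using `annotate` on a single string instead:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilkobert_ep3_pipeline", lang = "en")
# Returns a dict keyed by the output columns of the included annotators
result = pipeline.annotate("I love Spark NLP!")
print(result)
```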
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilkobert_ep3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|106.5 MB| + +## References + +https://huggingface.co/yeye776/DistilKoBERT-ep3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_rbm213k_ep40_ep20_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_rbm213k_ep40_ep20_en.md new file mode 100644 index 00000000000000..0ca656c3ec42e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_rbm213k_ep40_ep20_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_rbm213k_ep40_ep20 RoBertaEmbeddings from judy93536 +author: John Snow Labs +name: distilroberta_rbm213k_ep40_ep20 +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_rbm213k_ep40_ep20` is a English model originally trained by judy93536. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_rbm213k_ep40_ep20_en_5.5.0_3.0_1726957827336.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_rbm213k_ep40_ep20_en_5.5.0_3.0_1726957827336.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, RoBertaEmbeddings

# Spark NLP pipeline: raw text -> document -> tokens -> token embeddings
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = RoBertaEmbeddings.pretrained("distilroberta_rbm213k_ep40_ep20", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val embeddings = RoBertaEmbeddings.pretrained("distilroberta_rbm213k_ep40_ep20", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_rbm213k_ep40_ep20| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.1 MB| + +## References + +https://huggingface.co/judy93536/distilroberta-rbm213k-ep40-ep20 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-emscad_skill_extraction_conference_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-emscad_skill_extraction_conference_pipeline_en.md new file mode 100644 index 00000000000000..97b79cdb4c54df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-emscad_skill_extraction_conference_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emscad_skill_extraction_conference_pipeline pipeline BertForSequenceClassification from Ivo +author: John Snow Labs +name: emscad_skill_extraction_conference_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emscad_skill_extraction_conference_pipeline` is a English model originally trained by Ivo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emscad_skill_extraction_conference_pipeline_en_5.5.0_3.0_1726956415213.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emscad_skill_extraction_conference_pipeline_en_5.5.0_3.0_1726956415213.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("emscad_skill_extraction_conference_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("emscad_skill_extraction_conference_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
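
The snippet assumes an existing DataFrame `df` with a `text` column; a minimal, illustrative sketch of building one and transforming it (the sample job-ad sentence is made up):

```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["Experience with Apache Spark is required."]]).toDF("text")  # illustrative input
pipeline = PretrainedPipeline("emscad_skill_extraction_conference_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.printSchema()  # inspect the annotation columns added by the pipeline
```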
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emscad_skill_extraction_conference_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Ivo/emscad-skill-extraction-conference + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-final_finetuned_model_en.md b/docs/_posts/ahmedlone127/2024-09-21-final_finetuned_model_en.md new file mode 100644 index 00000000000000..6133c3400f1c6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-final_finetuned_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English final_finetuned_model DistilBertForSequenceClassification from rahulgaikwad007 +author: John Snow Labs +name: final_finetuned_model +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_finetuned_model` is a English model originally trained by rahulgaikwad007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_finetuned_model_en_5.5.0_3.0_1726924222361.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_finetuned_model_en_5.5.0_3.0_1726924222361.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification

# Spark NLP pipeline: raw text -> document -> tokens -> class prediction
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("final_finetuned_model", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("final_finetuned_model", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_finetuned_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rahulgaikwad007/Final-Finetuned-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_2_pipeline_en.md new file mode 100644 index 00000000000000..2c8faac06bcd53 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_2_pipeline pipeline DistilBertForSequenceClassification from OscarSuarez +author: John Snow Labs +name: finetuning_sentiment_model_2_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_2_pipeline` is a English model originally trained by OscarSuarez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_2_pipeline_en_5.5.0_3.0_1726953159138.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_2_pipeline_en_5.5.0_3.0_1726953159138.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
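
Because `df` is not defined above, a quick single-sentence test through `annotate` may be the simplest way to try this sentiment pipeline; a hedged sketch (the review text is illustrative):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("finetuning_sentiment_model_2_pipeline", lang = "en")
# annotate() runs the full pipeline on a plain string and returns a dict of results
result = pipeline.annotate("The movie was a waste of time.")
print(result)
```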
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/OscarSuarez/finetuning-sentiment-model-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_diegodelvalle_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_diegodelvalle_pipeline_en.md new file mode 100644 index 00000000000000..4ad170f08300c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_diegodelvalle_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_diegodelvalle_pipeline pipeline DistilBertForSequenceClassification from DiegodelValle +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_diegodelvalle_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_diegodelvalle_pipeline` is a English model originally trained by DiegodelValle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_diegodelvalle_pipeline_en_5.5.0_3.0_1726884684711.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_diegodelvalle_pipeline_en_5.5.0_3.0_1726884684711.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_diegodelvalle_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_diegodelvalle_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
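
As above, `df` is assumed to be a DataFrame with a `text` column; a minimal, illustrative sketch (the sample review is made up):

```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["Absolutely loved this product!"]]).toDF("text")  # illustrative input
pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_diegodelvalle_pipeline", lang = "en")
pipeline.transform(df).show(truncate = False)
```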
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_diegodelvalle_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DiegodelValle/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_marcelarosalesj_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_marcelarosalesj_en.md new file mode 100644 index 00000000000000..4345c9b0a1285c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_marcelarosalesj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_marcelarosalesj DistilBertForSequenceClassification from marcelarosalesj +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_marcelarosalesj +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_marcelarosalesj` is a English model originally trained by marcelarosalesj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_marcelarosalesj_en_5.5.0_3.0_1726923712368.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_marcelarosalesj_en_5.5.0_3.0_1726923712368.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification

# Spark NLP pipeline: raw text -> document -> tokens -> class prediction
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_marcelarosalesj", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_marcelarosalesj", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_marcelarosalesj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/marcelarosalesj/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-firmner_v2_small_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-firmner_v2_small_pipeline_en.md new file mode 100644 index 00000000000000..911c0719f9c720 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-firmner_v2_small_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English firmner_v2_small_pipeline pipeline BertForTokenClassification from loyoladatamining +author: John Snow Labs +name: firmner_v2_small_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`firmner_v2_small_pipeline` is a English model originally trained by loyoladatamining. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/firmner_v2_small_pipeline_en_5.5.0_3.0_1726889734236.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/firmner_v2_small_pipeline_en_5.5.0_3.0_1726889734236.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("firmner_v2_small_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("firmner_v2_small_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
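
This NER pipeline also expects `df` to be a DataFrame with a `text` column; a minimal, illustrative sketch of running it on one sentence (the sample sentence and entities are made up):

```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["Apple Inc. reported record revenue this quarter."]]).toDF("text")  # illustrative input
pipeline = PretrainedPipeline("firmner_v2_small_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.printSchema()  # token-level NER predictions appear among the output columns
```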
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|firmner_v2_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|107.0 MB| + +## References + +https://huggingface.co/loyoladatamining/firmNER-v2-small + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-frozen_news_classifier_ft_pipeline_ru.md b/docs/_posts/ahmedlone127/2024-09-21-frozen_news_classifier_ft_pipeline_ru.md new file mode 100644 index 00000000000000..70648a01f38469 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-frozen_news_classifier_ft_pipeline_ru.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Russian frozen_news_classifier_ft_pipeline pipeline BertForSequenceClassification from data-silence +author: John Snow Labs +name: frozen_news_classifier_ft_pipeline +date: 2024-09-21 +tags: [ru, open_source, pipeline, onnx] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`frozen_news_classifier_ft_pipeline` is a Russian model originally trained by data-silence. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frozen_news_classifier_ft_pipeline_ru_5.5.0_3.0_1726955255135.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frozen_news_classifier_ft_pipeline_ru_5.5.0_3.0_1726955255135.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("frozen_news_classifier_ft_pipeline", lang = "ru") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("frozen_news_classifier_ft_pipeline", lang = "ru") +val annotations = pipeline.transform(df) + +``` +
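
The `df` above is assumed to hold Russian text in a `text` column; a hedged, single-sentence sketch using `annotate` (the sample headline is illustrative):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("frozen_news_classifier_ft_pipeline", lang = "ru")
# Returns a dict keyed by the output columns of the included annotators
result = pipeline.annotate("Сборная выиграла чемпионат мира по футболу")
print(result)
```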
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frozen_news_classifier_ft_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ru| +|Size:|1.8 GB| + +## References + +https://huggingface.co/data-silence/frozen_news_classifier_ft + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-ft_distilbert_en.md b/docs/_posts/ahmedlone127/2024-09-21-ft_distilbert_en.md new file mode 100644 index 00000000000000..a061781fd4c5d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-ft_distilbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ft_distilbert DistilBertForSequenceClassification from kumbi500 +author: John Snow Labs +name: ft_distilbert +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ft_distilbert` is a English model originally trained by kumbi500. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ft_distilbert_en_5.5.0_3.0_1726888759607.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ft_distilbert_en_5.5.0_3.0_1726888759607.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification

# Spark NLP pipeline: raw text -> document -> tokens -> class prediction
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("ft_distilbert", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("ft_distilbert", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ft_distilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kumbi500/FT_DistilBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-ft_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-ft_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..7856deb0f12b5a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-ft_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ft_distilbert_pipeline pipeline DistilBertForSequenceClassification from kumbi500 +author: John Snow Labs +name: ft_distilbert_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ft_distilbert_pipeline` is a English model originally trained by kumbi500. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ft_distilbert_pipeline_en_5.5.0_3.0_1726888771915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ft_distilbert_pipeline_en_5.5.0_3.0_1726888771915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ft_distilbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ft_distilbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
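
The snippet above assumes a DataFrame `df` with a `text` column; a minimal, illustrative sketch of creating one and inspecting the pipeline output (the sample sentence is made up):

```python
from sparknlp.pretrained import PretrainedPipeline

df = spark.createDataFrame([["I love Spark NLP!"]]).toDF("text")  # illustrative input
pipeline = PretrainedPipeline("ft_distilbert_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.printSchema()  # inspect the annotation columns added by the included models
```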
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ft_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kumbi500/FT_DistilBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-hraf_multilabel_hierarchical_en.md b/docs/_posts/ahmedlone127/2024-09-21-hraf_multilabel_hierarchical_en.md new file mode 100644 index 00000000000000..73994f20c72c8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-hraf_multilabel_hierarchical_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hraf_multilabel_hierarchical DistilBertForSequenceClassification from Chantland +author: John Snow Labs +name: hraf_multilabel_hierarchical +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hraf_multilabel_hierarchical` is a English model originally trained by Chantland. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hraf_multilabel_hierarchical_en_5.5.0_3.0_1726953472235.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hraf_multilabel_hierarchical_en_5.5.0_3.0_1726953472235.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification

# Spark NLP pipeline: raw text -> document -> tokens -> class prediction
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = DistilBertForSequenceClassification.pretrained("hraf_multilabel_hierarchical", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("hraf_multilabel_hierarchical", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hraf_multilabel_hierarchical| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Chantland/HRAF_MultiLabel_Hierarchical \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-jerteh355sentneg4_en.md b/docs/_posts/ahmedlone127/2024-09-21-jerteh355sentneg4_en.md new file mode 100644 index 00000000000000..1aeafda02461fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-jerteh355sentneg4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English jerteh355sentneg4 RoBertaForSequenceClassification from Tanor +author: John Snow Labs +name: jerteh355sentneg4 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jerteh355sentneg4` is a English model originally trained by Tanor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jerteh355sentneg4_en_5.5.0_3.0_1726900880289.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jerteh355sentneg4_en_5.5.0_3.0_1726900880289.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("jerteh355sentneg4","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("jerteh355sentneg4", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jerteh355sentneg4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Tanor/Jerteh355SENTNEG4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-mal_asr_whisper_small_imasc_1000_pipeline_nan.md b/docs/_posts/ahmedlone127/2024-09-21-mal_asr_whisper_small_imasc_1000_pipeline_nan.md new file mode 100644 index 00000000000000..574a46e283dc3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-mal_asr_whisper_small_imasc_1000_pipeline_nan.md @@ -0,0 +1,69 @@ +--- +layout: model +title: None mal_asr_whisper_small_imasc_1000_pipeline pipeline WhisperForCTC from leenag +author: John Snow Labs +name: mal_asr_whisper_small_imasc_1000_pipeline +date: 2024-09-21 +tags: [nan, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: nan +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mal_asr_whisper_small_imasc_1000_pipeline` is a None model originally trained by leenag. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mal_asr_whisper_small_imasc_1000_pipeline_nan_5.5.0_3.0_1726960285629.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mal_asr_whisper_small_imasc_1000_pipeline_nan_5.5.0_3.0_1726960285629.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mal_asr_whisper_small_imasc_1000_pipeline", lang = "nan") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mal_asr_whisper_small_imasc_1000_pipeline", lang = "nan") +val annotations = pipeline.transform(df) + +``` +
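+
+Here `df` must carry the raw waveform for the pipeline's `AudioAssembler` stage. A minimal sketch of building it, assuming an active Spark NLP session, is shown below; librosa is only one possible loader (an assumption, not part of the original example), and the model expects 16 kHz mono float audio in an `audio_content` column.
+
+```python
+import librosa
+from pyspark.sql.types import ArrayType, FloatType, StructField, StructType
+from sparknlp.pretrained import PretrainedPipeline
+
+# Load a local recording as a 16 kHz mono float waveform (file name is hypothetical)
+waveform, _ = librosa.load("sample_malayalam.wav", sr=16000)
+
+schema = StructType([StructField("audio_content", ArrayType(FloatType()))])
+df = spark.createDataFrame([([float(x) for x in waveform],)], schema=schema)
+
+pipeline = PretrainedPipeline("mal_asr_whisper_small_imasc_1000_pipeline", lang="nan")
+# "text" is assumed to be the output column of the WhisperForCTC stage
+pipeline.transform(df).select("text.result").show(truncate=False)
+```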
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mal_asr_whisper_small_imasc_1000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|nan| +|Size:|1.7 GB| + +## References + +https://huggingface.co/leenag/Mal_ASR_Whisper_small_imasc_1000 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-markuus_bert_base_multilingual_squad_cqa_urdu_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-21-markuus_bert_base_multilingual_squad_cqa_urdu_pipeline_xx.md new file mode 100644 index 00000000000000..8784e92aab7dd7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-markuus_bert_base_multilingual_squad_cqa_urdu_pipeline_xx.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Multilingual markuus_bert_base_multilingual_squad_cqa_urdu_pipeline pipeline BertForQuestionAnswering from imrazaa +author: John Snow Labs +name: markuus_bert_base_multilingual_squad_cqa_urdu_pipeline +date: 2024-09-21 +tags: [xx, open_source, pipeline, onnx] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`markuus_bert_base_multilingual_squad_cqa_urdu_pipeline` is a Multilingual model originally trained by imrazaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/markuus_bert_base_multilingual_squad_cqa_urdu_pipeline_xx_5.5.0_3.0_1726946921579.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/markuus_bert_base_multilingual_squad_cqa_urdu_pipeline_xx_5.5.0_3.0_1726946921579.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("markuus_bert_base_multilingual_squad_cqa_urdu_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("markuus_bert_base_multilingual_squad_cqa_urdu_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
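+
+For this question-answering pipeline, `df` needs the two input columns consumed by the MultiDocumentAssembler stage. The sketch below assumes they are named `question` and `context`, and that the answer is written to an `answer` column; both are assumptions, so verify them against the pipeline's stages before relying on this.
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame(
+    [["What is the capital of France?", "Paris is the capital of France."]]
+).toDF("question", "context")
+
+pipeline = PretrainedPipeline("markuus_bert_base_multilingual_squad_cqa_urdu_pipeline", lang="xx")
+pipeline.transform(df).select("answer.result").show(truncate=False)
+```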
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|markuus_bert_base_multilingual_squad_cqa_urdu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|625.5 MB| + +## References + +https://huggingface.co/imrazaa/markuus-bert-base-multilingual-squad-cqa-ur + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-mlroberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-mlroberta_pipeline_en.md new file mode 100644 index 00000000000000..4f4bd80f3c52d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-mlroberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mlroberta_pipeline pipeline RoBertaEmbeddings from shrutisingh +author: John Snow Labs +name: mlroberta_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mlroberta_pipeline` is a English model originally trained by shrutisingh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mlroberta_pipeline_en_5.5.0_3.0_1726942236334.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mlroberta_pipeline_en_5.5.0_3.0_1726942236334.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mlroberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mlroberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
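+
+Because this is an embeddings pipeline, the easiest way to see exactly which annotation columns it produces is to transform a small DataFrame and inspect the schema, so nothing about the output names has to be guessed. A minimal sketch, assuming an active Spark NLP session:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["Scientific papers need scalable NLP."]]).toDF("text")
+
+pipeline = PretrainedPipeline("mlroberta_pipeline", lang="en")
+result = pipeline.transform(df)
+
+# Lists the document, token and RoBERTa embedding columns added by the pipeline
+result.printSchema()
+```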
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mlroberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.8 MB| + +## References + +https://huggingface.co/shrutisingh/MLRoBERTa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-nepal_bhasa_finetuning_covidsenti_distilbert_model_en.md b/docs/_posts/ahmedlone127/2024-09-21-nepal_bhasa_finetuning_covidsenti_distilbert_model_en.md new file mode 100644 index 00000000000000..313108df9f5c88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-nepal_bhasa_finetuning_covidsenti_distilbert_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nepal_bhasa_finetuning_covidsenti_distilbert_model DistilBertForSequenceClassification from Letrica +author: John Snow Labs +name: nepal_bhasa_finetuning_covidsenti_distilbert_model +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_finetuning_covidsenti_distilbert_model` is a English model originally trained by Letrica. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_finetuning_covidsenti_distilbert_model_en_5.5.0_3.0_1726924360512.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_finetuning_covidsenti_distilbert_model_en_5.5.0_3.0_1726924360512.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("nepal_bhasa_finetuning_covidsenti_distilbert_model","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("nepal_bhasa_finetuning_covidsenti_distilbert_model", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_finetuning_covidsenti_distilbert_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Letrica/new-finetuning-COVIDSenti-distilbert-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-nepal_bhasa_finetuning_covidsenti_distilbert_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-nepal_bhasa_finetuning_covidsenti_distilbert_model_pipeline_en.md new file mode 100644 index 00000000000000..72874efb701845 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-nepal_bhasa_finetuning_covidsenti_distilbert_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nepal_bhasa_finetuning_covidsenti_distilbert_model_pipeline pipeline DistilBertForSequenceClassification from Letrica +author: John Snow Labs +name: nepal_bhasa_finetuning_covidsenti_distilbert_model_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_finetuning_covidsenti_distilbert_model_pipeline` is a English model originally trained by Letrica. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_finetuning_covidsenti_distilbert_model_pipeline_en_5.5.0_3.0_1726924372546.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_finetuning_covidsenti_distilbert_model_pipeline_en_5.5.0_3.0_1726924372546.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nepal_bhasa_finetuning_covidsenti_distilbert_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nepal_bhasa_finetuning_covidsenti_distilbert_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_finetuning_covidsenti_distilbert_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Letrica/new-finetuning-COVIDSenti-distilbert-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-recipes_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-recipes_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..7365f79bfb7c2e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-recipes_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English recipes_roberta_base_pipeline pipeline RoBertaEmbeddings from AnonymousSub +author: John Snow Labs +name: recipes_roberta_base_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`recipes_roberta_base_pipeline` is a English model originally trained by AnonymousSub. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/recipes_roberta_base_pipeline_en_5.5.0_3.0_1726934303061.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/recipes_roberta_base_pipeline_en_5.5.0_3.0_1726934303061.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("recipes_roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("recipes_roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|recipes_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/AnonymousSub/recipes-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_finetuned_wallisian_whisper_5ep_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_finetuned_wallisian_whisper_5ep_en.md new file mode 100644 index 00000000000000..9cc07d0af152a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_finetuned_wallisian_whisper_5ep_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_whisper_5ep RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_whisper_5ep +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_whisper_5ep` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_whisper_5ep_en_5.5.0_3.0_1726943981506.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_whisper_5ep_en_5.5.0_3.0_1726943981506.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_wallisian_whisper_5ep","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_wallisian_whisper_5ep","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
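+
+The `embeddings` column configured above holds one annotation per token, each carrying its vector in the `embeddings` field, so the word vectors can be pulled out of `pipelineDF` directly:
+
+```python
+from pyspark.sql import functions as F
+
+# One row per token vector (768 floats for a RoBERTa base model)
+pipelineDF.select(F.explode("embeddings.embeddings").alias("token_embedding")).show(3, truncate=80)
+```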
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_whisper_5ep| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-whisper-5ep \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_m_express_emo_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_m_express_emo_en.md new file mode 100644 index 00000000000000..24104cca7e7a78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_m_express_emo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_finetuned_m_express_emo RoBertaForSequenceClassification from Gregorig +author: John Snow Labs +name: roberta_large_finetuned_m_express_emo +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_m_express_emo` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_m_express_emo_en_5.5.0_3.0_1726900421013.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_m_express_emo_en_5.5.0_3.0_1726900421013.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_finetuned_m_express_emo","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_finetuned_m_express_emo", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_m_express_emo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Gregorig/roberta-large-finetuned-m_express_emo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_large_fp_sick_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_fp_sick_pipeline_en.md new file mode 100644 index 00000000000000..891488b72a41c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_fp_sick_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_fp_sick_pipeline pipeline RoBertaForSequenceClassification from varun-v-rao +author: John Snow Labs +name: roberta_large_fp_sick_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_fp_sick_pipeline` is a English model originally trained by varun-v-rao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_fp_sick_pipeline_en_5.5.0_3.0_1726940814436.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_fp_sick_pipeline_en_5.5.0_3.0_1726940814436.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_fp_sick_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_fp_sick_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_fp_sick_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/varun-v-rao/roberta-large-fp-sick + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-saved_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-saved_model_pipeline_en.md new file mode 100644 index 00000000000000..f7474c5a37acc9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-saved_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English saved_model_pipeline pipeline DistilBertForSequenceClassification from hanyp +author: John Snow Labs +name: saved_model_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`saved_model_pipeline` is a English model originally trained by hanyp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/saved_model_pipeline_en_5.5.0_3.0_1726888965838.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/saved_model_pipeline_en_5.5.0_3.0_1726888965838.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("saved_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("saved_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|saved_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hanyp/saved_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-scibert_ner_drugname_en.md b/docs/_posts/ahmedlone127/2024-09-21-scibert_ner_drugname_en.md new file mode 100644 index 00000000000000..080dc8669fedaa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-scibert_ner_drugname_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English scibert_ner_drugname BertForTokenClassification from duytu +author: John Snow Labs +name: scibert_ner_drugname +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scibert_ner_drugname` is a English model originally trained by duytu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scibert_ner_drugname_en_5.5.0_3.0_1726889563283.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scibert_ner_drugname_en_5.5.0_3.0_1726889563283.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, BertForTokenClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("scibert_ner_drugname","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("scibert_ner_drugname", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
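+
+With the token classifier writing to the `ner` column, the token strings and their predicted tags come back as parallel arrays, which is usually enough to inspect the recognized drug-name entities:
+
+```python
+# "token.result" and "ner.result" are aligned element by element
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```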
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scibert_ner_drugname| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|409.9 MB| + +## References + +https://huggingface.co/duytu/scibert_ner_drugname \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_aristoberto_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_aristoberto_en.md new file mode 100644 index 00000000000000..e7659207be3399 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_aristoberto_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_aristoberto BertSentenceEmbeddings from Jacobo +author: John Snow Labs +name: sent_aristoberto +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_aristoberto` is a English model originally trained by Jacobo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_aristoberto_en_5.5.0_3.0_1726941679489.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_aristoberto_en_5.5.0_3.0_1726941679489.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_aristoberto","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_aristoberto","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
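+
+Each sentence detected by the pipeline gets one vector in the `embeddings` column configured above, so the sentence embeddings can be collected from `pipelineDF` like this:
+
+```python
+from pyspark.sql import functions as F
+
+# One row per sentence vector produced by BertSentenceEmbeddings
+pipelineDF.select(F.explode("embeddings.embeddings").alias("sentence_embedding")).show(1, truncate=80)
+```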
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_aristoberto| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|420.1 MB| + +## References + +https://huggingface.co/Jacobo/aristoBERTo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_multilingual_cased_finetuned_igbo_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_multilingual_cased_finetuned_igbo_pipeline_xx.md new file mode 100644 index 00000000000000..4266b73b79bdcb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_multilingual_cased_finetuned_igbo_pipeline_xx.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Multilingual sent_bert_base_multilingual_cased_finetuned_igbo_pipeline pipeline BertSentenceEmbeddings from Davlan +author: John Snow Labs +name: sent_bert_base_multilingual_cased_finetuned_igbo_pipeline +date: 2024-09-21 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_multilingual_cased_finetuned_igbo_pipeline` is a Multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_igbo_pipeline_xx_5.5.0_3.0_1726898402847.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_igbo_pipeline_xx_5.5.0_3.0_1726898402847.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_multilingual_cased_finetuned_igbo_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_multilingual_cased_finetuned_igbo_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_multilingual_cased_finetuned_igbo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.6 MB| + +## References + +https://huggingface.co/Davlan/bert-base-multilingual-cased-finetuned-igbo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_issues_128_lijingxin_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_issues_128_lijingxin_en.md new file mode 100644 index 00000000000000..59d1df9c7b46f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_issues_128_lijingxin_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_lijingxin BertSentenceEmbeddings from lijingxin +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_lijingxin +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_lijingxin` is a English model originally trained by lijingxin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_lijingxin_en_5.5.0_3.0_1726941432089.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_lijingxin_en_5.5.0_3.0_1726941432089.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_lijingxin","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_lijingxin","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_lijingxin| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/lijingxin/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_pipeline_en.md new file mode 100644 index 00000000000000..52b77690a79dd0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_pipeline pipeline WhisperForCTC from saahith +author: John Snow Labs +name: tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_pipeline` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_pipeline_en_5.5.0_3.0_1726960775763.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_pipeline_en_5.5.0_3.0_1726960775763.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_english_final_combined_1_0_1_8_1e_06_daily_sweep_15_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|394.5 MB| + +## References + +https://huggingface.co/saahith/tiny.en-final-combined-1-0.1-8-1e-06-daily-sweep-15 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-toxicity_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-toxicity_classifier_pipeline_en.md new file mode 100644 index 00000000000000..11d83553399cb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-toxicity_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English toxicity_classifier_pipeline pipeline DistilBertForSequenceClassification from richterleo +author: John Snow Labs +name: toxicity_classifier_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`toxicity_classifier_pipeline` is a English model originally trained by richterleo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/toxicity_classifier_pipeline_en_5.5.0_3.0_1726888609200.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/toxicity_classifier_pipeline_en_5.5.0_3.0_1726888609200.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("toxicity_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("toxicity_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
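+
+Besides `transform()` on a DataFrame, a `PretrainedPipeline` can also be run on plain strings through `annotate()`, which is convenient for quick checks. The `class` key below assumes the DistilBertForSequenceClassification stage keeps its default output name:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("toxicity_classifier_pipeline", lang="en")
+
+# annotate() returns a dict mapping output column names to lists of results
+result = pipeline.annotate("This comment is rude and insulting.")
+print(result["class"])
+```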
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|toxicity_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/richterleo/toxicity_classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-twitterfin_padding100model_en.md b/docs/_posts/ahmedlone127/2024-09-21-twitterfin_padding100model_en.md new file mode 100644 index 00000000000000..6ad8fcc54791e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-twitterfin_padding100model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twitterfin_padding100model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: twitterfin_padding100model +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitterfin_padding100model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitterfin_padding100model_en_5.5.0_3.0_1726888870325.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitterfin_padding100model_en_5.5.0_3.0_1726888870325.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("twitterfin_padding100model","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("twitterfin_padding100model", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitterfin_padding100model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/twitterfin_padding100model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_fine_tuned_base_company_earnings_call_v0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_fine_tuned_base_company_earnings_call_v0_pipeline_en.md new file mode 100644 index 00000000000000..cefeb8d192dd45 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_fine_tuned_base_company_earnings_call_v0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_fine_tuned_base_company_earnings_call_v0_pipeline pipeline WhisperForCTC from MasatoShima1618 +author: John Snow Labs +name: whisper_fine_tuned_base_company_earnings_call_v0_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_fine_tuned_base_company_earnings_call_v0_pipeline` is a English model originally trained by MasatoShima1618. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_fine_tuned_base_company_earnings_call_v0_pipeline_en_5.5.0_3.0_1726962096819.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_fine_tuned_base_company_earnings_call_v0_pipeline_en_5.5.0_3.0_1726962096819.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_fine_tuned_base_company_earnings_call_v0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_fine_tuned_base_company_earnings_call_v0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_fine_tuned_base_company_earnings_call_v0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|642.5 MB| + +## References + +https://huggingface.co/MasatoShima1618/Whisper-fine-tuned-base-company-earnings-call-v0 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_md_greek_modern_intlv_xs_el.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_md_greek_modern_intlv_xs_el.md new file mode 100644 index 00000000000000..d8a02cbd7d409e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_md_greek_modern_intlv_xs_el.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Modern Greek (1453-) whisper_md_greek_modern_intlv_xs WhisperForCTC from farsipal +author: John Snow Labs +name: whisper_md_greek_modern_intlv_xs +date: 2024-09-21 +tags: [el, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: el +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_md_greek_modern_intlv_xs` is a Modern Greek (1453-) model originally trained by farsipal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_md_greek_modern_intlv_xs_el_5.5.0_3.0_1726962187330.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_md_greek_modern_intlv_xs_el_5.5.0_3.0_1726962187330.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import AudioAssembler
+from sparknlp.annotator import WhisperForCTC
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_md_greek_modern_intlv_xs","el") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# "data" is a DataFrame whose "audio_content" column holds the raw float waveform
+# (a loading sketch follows this example)
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_md_greek_modern_intlv_xs", "el")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// "data" is a DataFrame whose "audio_content" column holds the raw float waveform
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
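+
+The example above assumes `data` already exists. A minimal sketch of building it follows; librosa is just one way to obtain the 16 kHz mono float waveform that the `audio_content` column must hold (an assumption, not part of the original example), and the file name is hypothetical:
+
+```python
+import librosa
+from pyspark.sql.types import ArrayType, FloatType, StructField, StructType
+
+# Whisper models expect 16 kHz mono audio
+waveform, _ = librosa.load("greek_sample.wav", sr=16000)
+
+schema = StructType([StructField("audio_content", ArrayType(FloatType()))])
+data = spark.createDataFrame([([float(x) for x in waveform],)], schema=schema)
+```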
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_md_greek_modern_intlv_xs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|el| +|Size:|4.8 GB| + +## References + +https://huggingface.co/farsipal/whisper-md-el-intlv-xs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_arch4ngel_pipeline_dv.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_arch4ngel_pipeline_dv.md new file mode 100644 index 00000000000000..648111f12312da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_arch4ngel_pipeline_dv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_arch4ngel_pipeline pipeline WhisperForCTC from Arch4ngel +author: John Snow Labs +name: whisper_small_divehi_arch4ngel_pipeline +date: 2024-09-21 +tags: [dv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_arch4ngel_pipeline` is a Dhivehi, Divehi, Maldivian model originally trained by Arch4ngel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_arch4ngel_pipeline_dv_5.5.0_3.0_1726906317656.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_arch4ngel_pipeline_dv_5.5.0_3.0_1726906317656.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_divehi_arch4ngel_pipeline", lang = "dv") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_divehi_arch4ngel_pipeline", lang = "dv") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_arch4ngel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Arch4ngel/whisper-small-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_ptah23_dv.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_ptah23_dv.md new file mode 100644 index 00000000000000..482d87e39aa47e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_ptah23_dv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_ptah23 WhisperForCTC from ptah23 +author: John Snow Labs +name: whisper_small_divehi_ptah23 +date: 2024-09-21 +tags: [dv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_ptah23` is a Dhivehi, Divehi, Maldivian model originally trained by ptah23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_ptah23_dv_5.5.0_3.0_1726890817757.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_ptah23_dv_5.5.0_3.0_1726890817757.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_divehi_ptah23","dv") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_divehi_ptah23", "dv")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_ptah23| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ptah23/whisper-small-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_winmodel_dv.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_winmodel_dv.md new file mode 100644 index 00000000000000..36ffb1e8ae5fcd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_divehi_winmodel_dv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_winmodel WhisperForCTC from Winmodel +author: John Snow Labs +name: whisper_small_divehi_winmodel +date: 2024-09-21 +tags: [dv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_winmodel` is a Dhivehi, Divehi, Maldivian model originally trained by Winmodel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_winmodel_dv_5.5.0_3.0_1726935914727.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_winmodel_dv_5.5.0_3.0_1726935914727.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_divehi_winmodel","dv") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_divehi_winmodel", "dv")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_winmodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Winmodel/whisper-small-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_english_tonga_tonga_islands_myst55h_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_english_tonga_tonga_islands_myst55h_en.md new file mode 100644 index 00000000000000..3988d8a8d77b83 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_english_tonga_tonga_islands_myst55h_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_english_tonga_tonga_islands_myst55h WhisperForCTC from rishabhjain16 +author: John Snow Labs +name: whisper_small_english_tonga_tonga_islands_myst55h +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_english_tonga_tonga_islands_myst55h` is a English model originally trained by rishabhjain16. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_english_tonga_tonga_islands_myst55h_en_5.5.0_3.0_1726911883945.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_english_tonga_tonga_islands_myst55h_en_5.5.0_3.0_1726911883945.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_english_tonga_tonga_islands_myst55h","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_english_tonga_tonga_islands_myst55h", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_english_tonga_tonga_islands_myst55h| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/rishabhjain16/whisper_small_en_to_myst55h \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_european_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_european_pipeline_en.md new file mode 100644 index 00000000000000..76bb0f148b65d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_european_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_european_pipeline pipeline WhisperForCTC from aware-ai +author: John Snow Labs +name: whisper_small_european_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_european_pipeline` is a English model originally trained by aware-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_european_pipeline_en_5.5.0_3.0_1726936014455.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_european_pipeline_en_5.5.0_3.0_1726936014455.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_european_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_european_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_european_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/aware-ai/whisper-small-european + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hausa_phaeeza_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hausa_phaeeza_en.md new file mode 100644 index 00000000000000..81f7b2d2aa3664 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hausa_phaeeza_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_hausa_phaeeza WhisperForCTC from phaeeza +author: John Snow Labs +name: whisper_small_hausa_phaeeza +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hausa_phaeeza` is a English model originally trained by phaeeza. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hausa_phaeeza_en_5.5.0_3.0_1726947967917.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hausa_phaeeza_en_5.5.0_3.0_1726947967917.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_hausa_phaeeza","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_hausa_phaeeza", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hausa_phaeeza| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/phaeeza/whisper-small-ha \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_pashto_pipeline_ps.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_pashto_pipeline_ps.md new file mode 100644 index 00000000000000..9eb08520b1eb9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_pashto_pipeline_ps.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Pashto, Pushto whisper_small_pashto_pipeline pipeline WhisperForCTC from ihanif +author: John Snow Labs +name: whisper_small_pashto_pipeline +date: 2024-09-21 +tags: [ps, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ps +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_pashto_pipeline` is a Pashto, Pushto model originally trained by ihanif. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_pashto_pipeline_ps_5.5.0_3.0_1726878658350.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_pashto_pipeline_ps_5.5.0_3.0_1726878658350.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_pashto_pipeline", lang = "ps") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_pashto_pipeline", lang = "ps") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_pashto_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ps| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ihanif/whisper-small-ps + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_vietnamese_v4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_vietnamese_v4_pipeline_en.md new file mode 100644 index 00000000000000..828b119c573e0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_vietnamese_v4_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_vietnamese_v4_pipeline pipeline WhisperForCTC from thanhduycao +author: John Snow Labs +name: whisper_small_vietnamese_v4_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_vietnamese_v4_pipeline` is a English model originally trained by thanhduycao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_vietnamese_v4_pipeline_en_5.5.0_3.0_1726893767941.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_vietnamese_v4_pipeline_en_5.5.0_3.0_1726893767941.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_vietnamese_v4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_vietnamese_v4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_vietnamese_v4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/thanhduycao/whisper-small-vi-v4 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_korif_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_korif_en.md new file mode 100644 index 00000000000000..13618f450a88b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_korif_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_english_korif WhisperForCTC from KoRiF +author: John Snow Labs +name: whisper_tiny_english_korif +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_korif` is a English model originally trained by KoRiF. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_korif_en_5.5.0_3.0_1726962037931.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_korif_en_5.5.0_3.0_1726962037931.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_tiny_english_korif","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_tiny_english_korif", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_korif| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/KoRiF/whisper-tiny-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_faroese_temp_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_faroese_temp_en.md new file mode 100644 index 00000000000000..39237595c2cd27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_faroese_temp_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_faroese_temp WhisperForCTC from lukespeech +author: John Snow Labs +name: whisper_tiny_faroese_temp +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_faroese_temp` is a English model originally trained by lukespeech. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_faroese_temp_en_5.5.0_3.0_1726908840067.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_faroese_temp_en_5.5.0_3.0_1726908840067.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_tiny_faroese_temp","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_tiny_faroese_temp", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_faroese_temp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/lukespeech/whisper-tiny-fo-temp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_frenchmed_v1_fr.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_frenchmed_v1_fr.md new file mode 100644 index 00000000000000..d7341edeaa6374 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_frenchmed_v1_fr.md @@ -0,0 +1,84 @@ +--- +layout: model +title: French whisper_tiny_frenchmed_v1 WhisperForCTC from Hanhpt23 +author: John Snow Labs +name: whisper_tiny_frenchmed_v1 +date: 2024-09-21 +tags: [fr, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_frenchmed_v1` is a French model originally trained by Hanhpt23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_frenchmed_v1_fr_5.5.0_3.0_1726939373479.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_frenchmed_v1_fr_5.5.0_3.0_1726939373479.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_tiny_frenchmed_v1","fr") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_tiny_frenchmed_v1", "fr")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_frenchmed_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|fr| +|Size:|379.2 MB| + +## References + +https://huggingface.co/Hanhpt23/whisper-tiny-frenchmed-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_hindi_common_voice_16_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_hindi_common_voice_16_1_pipeline_en.md new file mode 100644 index 00000000000000..f7bb0c6a70cece --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_hindi_common_voice_16_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_hindi_common_voice_16_1_pipeline pipeline WhisperForCTC from archit342000 +author: John Snow Labs +name: whisper_tiny_hindi_common_voice_16_1_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_hindi_common_voice_16_1_pipeline` is a English model originally trained by archit342000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_hindi_common_voice_16_1_pipeline_en_5.5.0_3.0_1726961939227.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_hindi_common_voice_16_1_pipeline_en_5.5.0_3.0_1726961939227.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_hindi_common_voice_16_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_hindi_common_voice_16_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_hindi_common_voice_16_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.7 MB| + +## References + +https://huggingface.co/archit342000/whisper_tiny_hi_common_voice_16_1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_wd_1k_v1_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_wd_1k_v1_en.md new file mode 100644 index 00000000000000..bf3516769355c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_wd_1k_v1_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_wd_1k_v1 WhisperForCTC from devkyle +author: John Snow Labs +name: whisper_tiny_wd_1k_v1 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_wd_1k_v1` is a English model originally trained by devkyle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_wd_1k_v1_en_5.5.0_3.0_1726935875486.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_wd_1k_v1_en_5.5.0_3.0_1726935875486.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_tiny_wd_1k_v1","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_tiny_wd_1k_v1", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_wd_1k_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.7 MB| + +## References + +https://huggingface.co/devkyle/whisper-tiny-wd-1k-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_vsfc_100_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_vsfc_100_en.md new file mode 100644 index 00000000000000..a9ab826d65d92f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_vsfc_100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_vsfc_100 XlmRoBertaForSequenceClassification from tmnam20 +author: John Snow Labs +name: xlm_roberta_base_vsfc_100 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_vsfc_100` is a English model originally trained by tmnam20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vsfc_100_en_5.5.0_3.0_1726919308873.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vsfc_100_en_5.5.0_3.0_1726919308873.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_vsfc_100","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_vsfc_100", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
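+After `transform` has run, the predicted label for each input row is available in the `class` output column. A minimal sketch, reusing `pipelineDF` from the example above:
+
+```python
+# Show each input text next to its predicted class label.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```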
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_vsfc_100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|773.7 MB| + +## References + +https://huggingface.co/tmnam20/xlm-roberta-base-vsfc-100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_vsfc_100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_vsfc_100_pipeline_en.md new file mode 100644 index 00000000000000..cef999f57e6008 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_vsfc_100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_vsfc_100_pipeline pipeline XlmRoBertaForSequenceClassification from tmnam20 +author: John Snow Labs +name: xlm_roberta_base_vsfc_100_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_vsfc_100_pipeline` is a English model originally trained by tmnam20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vsfc_100_pipeline_en_5.5.0_3.0_1726919446713.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vsfc_100_pipeline_en_5.5.0_3.0_1726919446713.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_vsfc_100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_vsfc_100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_vsfc_100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|773.7 MB| + +## References + +https://huggingface.co/tmnam20/xlm-roberta-base-vsfc-100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_v_base_trimmed_arabic_xnli_arabic_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_v_base_trimmed_arabic_xnli_arabic_en.md new file mode 100644 index 00000000000000..feaac8de76511b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_v_base_trimmed_arabic_xnli_arabic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_v_base_trimmed_arabic_xnli_arabic XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_v_base_trimmed_arabic_xnli_arabic +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_v_base_trimmed_arabic_xnli_arabic` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_v_base_trimmed_arabic_xnli_arabic_en_5.5.0_3.0_1726933105156.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_v_base_trimmed_arabic_xnli_arabic_en_5.5.0_3.0_1726933105156.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_v_base_trimmed_arabic_xnli_arabic","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_v_base_trimmed_arabic_xnli_arabic", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_v_base_trimmed_arabic_xnli_arabic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|530.8 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-v-base-trimmed-ar-xnli-ar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-0_000003_0_9_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-0_000003_0_9_pipeline_en.md new file mode 100644 index 00000000000000..909d1a5f421932 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-0_000003_0_9_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 0_000003_0_9_pipeline pipeline RoBertaForSequenceClassification from rose-e-wang +author: John Snow Labs +name: 0_000003_0_9_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`0_000003_0_9_pipeline` is a English model originally trained by rose-e-wang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/0_000003_0_9_pipeline_en_5.5.0_3.0_1727016821291.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/0_000003_0_9_pipeline_en_5.5.0_3.0_1727016821291.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("0_000003_0_9_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("0_000003_0_9_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|0_000003_0_9_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/rose-e-wang/0.000003_0.9 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-2504separado3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-2504separado3_pipeline_en.md new file mode 100644 index 00000000000000..0d1d36148f0f2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-2504separado3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2504separado3_pipeline pipeline RoBertaForSequenceClassification from adriansanz +author: John Snow Labs +name: 2504separado3_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2504separado3_pipeline` is a English model originally trained by adriansanz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2504separado3_pipeline_en_5.5.0_3.0_1726972479204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2504separado3_pipeline_en_5.5.0_3.0_1726972479204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("2504separado3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("2504separado3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2504separado3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|448.4 MB| + +## References + +https://huggingface.co/adriansanz/2504separado3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_banking_8_16_5_oos_en.md b/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_banking_8_16_5_oos_en.md new file mode 100644 index 00000000000000..bbd1345f9e0c24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_banking_8_16_5_oos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_banking_8_16_5_oos RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_banking_8_16_5_oos +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_banking_8_16_5_oos` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_banking_8_16_5_oos_en_5.5.0_3.0_1727026800688.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_banking_8_16_5_oos_en_5.5.0_3.0_1727026800688.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_banking_8_16_5_oos","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_banking_8_16_5_oos", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_banking_8_16_5_oos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-banking-8-16-5-oos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-auro_1_en.md b/docs/_posts/ahmedlone127/2024-09-22-auro_1_en.md new file mode 100644 index 00000000000000..850cc0a4ec240e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-auro_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English auro_1 RoBertaForSequenceClassification from BaronSch +author: John Snow Labs +name: auro_1 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`auro_1` is a English model originally trained by BaronSch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/auro_1_en_5.5.0_3.0_1726972010668.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/auro_1_en_5.5.0_3.0_1726972010668.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("auro_1","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("auro_1", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|auro_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/BaronSch/AURO_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_emotion_ncduy_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_emotion_ncduy_en.md new file mode 100644 index 00000000000000..8aa0498e818777 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_emotion_ncduy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_cased_finetuned_emotion_ncduy BertForSequenceClassification from ncduy +author: John Snow Labs +name: bert_base_cased_finetuned_emotion_ncduy +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_emotion_ncduy` is a English model originally trained by ncduy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_emotion_ncduy_en_5.5.0_3.0_1727007715737.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_emotion_ncduy_en_5.5.0_3.0_1727007715737.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_finetuned_emotion_ncduy","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_finetuned_emotion_ncduy", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
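+Beyond the bare label, each prediction typically carries its class scores in the annotation metadata (an assumption based on how Spark NLP sequence classifiers usually populate their output). A minimal sketch, reusing `pipelineDF` from the example above:
+
+```python
+# Explode the "class" annotations to inspect labels together with their scores.
+pipelineDF.selectExpr("explode(class) as prediction") \
+    .selectExpr("prediction.result", "prediction.metadata") \
+    .show(truncate=False)
+```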
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_emotion_ncduy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/ncduy/bert-base-cased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_en.md new file mode 100644 index 00000000000000..39adf53add0597 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_cased_finetuned_squad BertForQuestionAnswering from Arup-Dutta-Bappy +author: John Snow Labs +name: bert_base_cased_finetuned_squad +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_squad` is a English model originally trained by Arup-Dutta-Bappy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_en_5.5.0_3.0_1727049190012.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_en_5.5.0_3.0_1727049190012.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_finetuned_squad","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_finetuned_squad", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDS.toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
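+The extracted answer span ends up in the `answer` column of the transformed DataFrame. A minimal sketch, reusing `pipelineDF` from the example above:
+
+```python
+# Print the predicted answer text for each question/context pair.
+pipelineDF.select("answer.result").show(truncate=False)
+```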
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Arup-Dutta-Bappy/bert-base-cased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_pipeline_en.md new file mode 100644 index 00000000000000..8c171b7919be8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_cased_finetuned_squad_pipeline pipeline BertForQuestionAnswering from Arup-Dutta-Bappy +author: John Snow Labs +name: bert_base_cased_finetuned_squad_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_squad_pipeline` is a English model originally trained by Arup-Dutta-Bappy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_pipeline_en_5.5.0_3.0_1727049216807.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_pipeline_en_5.5.0_3.0_1727049216807.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_cased_finetuned_squad_pipeline", lang = "en")

# df is any DataFrame whose columns match what the pipeline's first stage
# expects (for this question-answering pipeline, a question and its context).
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_base_cased_finetuned_squad_pipeline", lang = "en")

// df is any DataFrame whose columns match what the pipeline's first stage
// expects (for this question-answering pipeline, a question and its context).
val annotations = pipeline.transform(df)

```
</div>
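
Once `df` carries the input columns the pipeline expects, the result is a regular Spark DataFrame. A small sketch for inspecting it, assuming the included `BertForQuestionAnswering` stage writes to a column named `answer` (as in the standalone model card for this checkpoint):

```python
# The "answer" column name is an assumption based on the standalone
# BertForQuestionAnswering example; check annotations.columns if it differs.
annotations.select("answer.result").show(truncate=False)
```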
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Arup-Dutta-Bappy/bert-base-cased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_plane_ood_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_plane_ood_2_pipeline_en.md new file mode 100644 index 00000000000000..855e7d4923b956 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_plane_ood_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_cased_plane_ood_2_pipeline pipeline BertForSequenceClassification from lorenzoscottb +author: John Snow Labs +name: bert_base_cased_plane_ood_2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_plane_ood_2_pipeline` is a English model originally trained by lorenzoscottb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_plane_ood_2_pipeline_en_5.5.0_3.0_1726991265734.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_plane_ood_2_pipeline_en_5.5.0_3.0_1726991265734.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_cased_plane_ood_2_pipeline", lang = "en")

# df is any DataFrame whose columns match what the pipeline's first stage
# expects (for this text classification pipeline, a column of raw text).
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_base_cased_plane_ood_2_pipeline", lang = "en")

// df is any DataFrame whose columns match what the pipeline's first stage
// expects (for this text classification pipeline, a column of raw text).
val annotations = pipeline.transform(df)

```
</div>
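
As above, `annotations` is a plain DataFrame. A small sketch for reading the predictions, assuming the included `BertForSequenceClassification` stage writes to a column named `class` (as in the standalone sequence-classification cards):

```python
# The "class" column name is an assumption; annotations.columns lists the
# columns this saved pipeline actually produces.
annotations.select("class.result").show(truncate=False)
```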
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_plane_ood_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/lorenzoscottb/bert-base-cased-PLANE-ood-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_local_results_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_local_results_pipeline_en.md new file mode 100644 index 00000000000000..d35507c5e8afc7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_local_results_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_local_results_pipeline pipeline BertForSequenceClassification from serpapi +author: John Snow Labs +name: bert_base_local_results_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_local_results_pipeline` is a English model originally trained by serpapi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_local_results_pipeline_en_5.5.0_3.0_1726976516239.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_local_results_pipeline_en_5.5.0_3.0_1726976516239.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_local_results_pipeline", lang = "en")

# df is any DataFrame whose columns match what the pipeline's first stage
# expects (for this text classification pipeline, a column of raw text).
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_base_local_results_pipeline", lang = "en")

// df is any DataFrame whose columns match what the pipeline's first stage
// expects (for this text classification pipeline, a column of raw text).
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_local_results_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/serpapi/bert-base-local-results + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_en.md new file mode 100644 index 00000000000000..87f7605789a05c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_en_5.5.0_3.0_1727039380809.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_en_5.5.0_3.0_1727039380809.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])

# Input column names must match the assembler's setInputCols above.
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForQuestionAnswering
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))

// Input column names must match the assembler's setInputCols above.
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915121227 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_english_sentweet_profane_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_english_sentweet_profane_en.md new file mode 100644 index 00000000000000..f42a6ec713d41f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_english_sentweet_profane_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_english_sentweet_profane BertForSequenceClassification from jayanta +author: John Snow Labs +name: bert_base_uncased_english_sentweet_profane +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_english_sentweet_profane` is a English model originally trained by jayanta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_english_sentweet_profane_en_5.5.0_3.0_1727030006784.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_english_sentweet_profane_en_5.5.0_3.0_1727030006784.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Input columns must match the outputs of the stages above.
sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_english_sentweet_profane","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForSequenceClassification
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

// Input columns must match the outputs of the stages above.
val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_english_sentweet_profane", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_english_sentweet_profane| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/jayanta/bert-base-uncased-english-sentweet-profane \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_en.md new file mode 100644 index 00000000000000..84e016310493c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727042393714.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727042393714.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])

# Input column names must match the assembler's setInputCols above.
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForQuestionAnswering
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))

// Input column names must match the assembler's setInputCols above.
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-2.62-b-32-lr-4e-07-dp-1.0-ss-600-st-False-fh-True-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md new file mode 100644 index 00000000000000..d08670d5675b6a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727049190563.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727049190563.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])

# Input column names must match the assembler's setInputCols above.
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForQuestionAnswering
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))

// Input column names must match the assembler's setInputCols above.
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-2.69-b-32-lr-8e-07-dp-0.5-ss-0-st-True-fh-False-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_pipeline_en.md new file mode 100644 index 00000000000000..32cb5ed6c3c933 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_pipeline_en_5.5.0_3.0_1727042513733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_pipeline_en_5.5.0_3.0_1727042513733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_pipeline", lang = "en")

# df is any DataFrame whose columns match what the pipeline's first stage
# expects (for this question-answering pipeline, a question and its context).
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_pipeline", lang = "en")

// df is any DataFrame whose columns match what the pipeline's first stage
// expects (for this question-answering pipeline, a question and its context).
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-0.9-lr-1e-05-wd-0.001-dp-0.99999-ss-70000 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_pipeline_en.md new file mode 100644 index 00000000000000..cf23be8c9ab0b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_pipeline_en_5.5.0_3.0_1726991686796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_pipeline_en_5.5.0_3.0_1726991686796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_pipeline", lang = "en")

# df is any DataFrame whose columns match what the pipeline's first stage
# expects (for this question-answering pipeline, a question and its context).
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_pipeline", lang = "en")

// df is any DataFrame whose columns match what the pipeline's first stage
// expects (for this question-answering pipeline, a question and its context).
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.29-lr-4e-07-wd-1e-05-dp-0.3-ss-0-st-False-fh-False-hs-300 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_en.md new file mode 100644 index 00000000000000..cb96afd32c4c59 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_en_5.5.0_3.0_1727043004595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_en_5.5.0_3.0_1727043004595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])

# Input column names must match the assembler's setInputCols above.
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForQuestionAnswering
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))

// Input column names must match the assembler's setInputCols above.
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.3-lr-1e-06-wd-0.001-dp-0.99999-ss-120000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_pipeline_en.md new file mode 100644 index 00000000000000..3cff31b16a0c2e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_pipeline_en_5.5.0_3.0_1727042761630.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_pipeline_en_5.5.0_3.0_1727042761630.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_pipeline", lang = "en")

# df is any DataFrame whose columns match what the pipeline's first stage
# expects (for this question-answering pipeline, a question and its context).
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_pipeline", lang = "en")

// df is any DataFrame whose columns match what the pipeline's first stage
// expects (for this question-answering pipeline, a question and its context).
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-1e-05-wd-0.001-dp-0.02-ss-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_en.md new file mode 100644 index 00000000000000..9aaf94ba32959b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_en_5.5.0_3.0_1727042269236.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_en_5.5.0_3.0_1727042269236.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])

# Input column names must match the assembler's setInputCols above.
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForQuestionAnswering
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))

// Input column names must match the assembler's setInputCols above.
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-4e-07-wd-0.001-dp-0.999 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_pipeline_en.md new file mode 100644 index 00000000000000..2f93059f06e910 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_pipeline_en_5.5.0_3.0_1727042295683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_pipeline_en_5.5.0_3.0_1727042295683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_pipeline", lang = "en")

# df is any DataFrame whose columns match what the pipeline's first stage
# expects (for this question-answering pipeline, a question and its context).
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_pipeline", lang = "en")

// df is any DataFrame whose columns match what the pipeline's first stage
// expects (for this question-answering pipeline, a question and its context).
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_4e_07_wd_0_001_dp_0_999_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-4e-07-wd-0.001-dp-0.999 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_en.md new file mode 100644 index 00000000000000..e7f7391909f36f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_en_5.5.0_3.0_1726991973615.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_en_5.5.0_3.0_1726991973615.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])

# Input column names must match the assembler's setInputCols above.
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForQuestionAnswering
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))

// Input column names must match the assembler's setInputCols above.
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-4.0-lr-0.0005-wd-0.01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..496c97e68896bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_pipeline pipeline BertForQuestionAnswering from PabloGuinea +author: John Snow Labs +name: bert_base_uncased_finetuned_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_pipeline` is a English model originally trained by PabloGuinea. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_pipeline_en_5.5.0_3.0_1727043069170.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_pipeline_en_5.5.0_3.0_1727043069170.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_base_uncased_finetuned_pipeline", lang = "en")

# df is any DataFrame whose columns match what the pipeline's first stage
# expects (for this question-answering pipeline, a question and its context).
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_pipeline", lang = "en")

// df is any DataFrame whose columns match what the pipeline's first stage
// expects (for this question-answering pipeline, a question and its context).
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/PabloGuinea/bert-base-uncased-finetuned + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_benroma_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_benroma_en.md new file mode 100644 index 00000000000000..454d34d155232c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_benroma_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_finetuned_squad_benroma BertForQuestionAnswering from benroma +author: John Snow Labs +name: bert_finetuned_squad_benroma +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_squad_benroma` is a English model originally trained by benroma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_benroma_en_5.5.0_3.0_1727049384527.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_benroma_en_5.5.0_3.0_1727049384527.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_squad_benroma","en") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])

# Input column names must match the assembler's setInputCols above.
data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.MultiDocumentAssembler
import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForQuestionAnswering
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_squad_benroma", "en")
    .setInputCols(Array("document_question", "document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))

// Input column names must match the assembler's setInputCols above.
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_squad_benroma| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/benroma/bert-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_human_label_multiperspective_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_human_label_multiperspective_pipeline_en.md new file mode 100644 index 00000000000000..f31c648ea086c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_human_label_multiperspective_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_human_label_multiperspective_pipeline pipeline BertForSequenceClassification from Multiperspective +author: John Snow Labs +name: bert_human_label_multiperspective_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_human_label_multiperspective_pipeline` is a English model originally trained by Multiperspective. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_human_label_multiperspective_pipeline_en_5.5.0_3.0_1727032988381.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_human_label_multiperspective_pipeline_en_5.5.0_3.0_1727032988381.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_human_label_multiperspective_pipeline", lang = "en")

# df is any DataFrame whose columns match what the pipeline's first stage
# expects (for this text classification pipeline, a column of raw text).
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_human_label_multiperspective_pipeline", lang = "en")

// df is any DataFrame whose columns match what the pipeline's first stage
// expects (for this text classification pipeline, a column of raw text).
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_human_label_multiperspective_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Multiperspective/bert-human_label + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_large_uncased_wikistance_v1_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_large_uncased_wikistance_v1_en.md new file mode 100644 index 00000000000000..83dd6550722e8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_large_uncased_wikistance_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_large_uncased_wikistance_v1 BertForSequenceClassification from research-dump +author: John Snow Labs +name: bert_large_uncased_wikistance_v1 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_wikistance_v1` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_wikistance_v1_en_5.5.0_3.0_1726989229205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_wikistance_v1_en_5.5.0_3.0_1726989229205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Input columns must match the outputs of the stages above.
sequenceClassifier = BertForSequenceClassification.pretrained("bert_large_uncased_wikistance_v1","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.classifier.dl.BertForSequenceClassification
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

// Input columns must match the outputs of the stages above.
val sequenceClassifier = BertForSequenceClassification.pretrained("bert_large_uncased_wikistance_v1", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_wikistance_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/research-dump/bert-large-uncased_wikistance_v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_afishally_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_afishally_en.md new file mode 100644 index 00000000000000..1cbb1753e2288a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_afishally_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_afishally RoBertaEmbeddings from Afishally +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_afishally +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_afishally` is a English model originally trained by Afishally. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_afishally_en_5.5.0_3.0_1727041765860.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_afishally_en_5.5.0_3.0_1727041765860.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_afishally","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_afishally","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_afishally| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/Afishally/my_awesome_eli5_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_afishally_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_afishally_pipeline_en.md new file mode 100644 index 00000000000000..0dd8cb12dd7c33 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_afishally_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_afishally_pipeline pipeline RoBertaEmbeddings from Afishally +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_afishally_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_afishally_pipeline` is a English model originally trained by Afishally. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_afishally_pipeline_en_5.5.0_3.0_1727041780956.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_afishally_pipeline_en_5.5.0_3.0_1727041780956.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_afishally_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_afishally_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
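
The `df` passed to `transform` above is assumed to be a Spark DataFrame with a string column named `text`. A rough, self-contained sketch of preparing such input and reading the token embeddings back out (the `embeddings` output column name is an assumption based on the included `RoBertaEmbeddings` stage):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline
from pyspark.sql import functions as F

spark = sparknlp.start()

# Hypothetical input: any DataFrame with a string column named "text".
df = spark.createDataFrame([["Masked language models learn contextual embeddings."]]).toDF("text")

pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_afishally_pipeline", lang="en")
annotations = pipeline.transform(df)

# Assumes the RoBERTa stage writes its annotations to an "embeddings" column.
annotations.select(F.explode("embeddings").alias("emb")) \
    .select(F.col("emb.result").alias("token"), F.col("emb.embeddings").alias("vector")) \
    .show(5, truncate=80)
```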
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_afishally_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/Afishally/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_philander_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_philander_pipeline_en.md new file mode 100644 index 00000000000000..bd2853788d207b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_philander_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_philander_pipeline pipeline RoBertaEmbeddings from PHILANDER +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_philander_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_philander_pipeline` is a English model originally trained by PHILANDER. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_philander_pipeline_en_5.5.0_3.0_1727041630597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_philander_pipeline_en_5.5.0_3.0_1727041630597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_philander_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_philander_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_philander_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/PHILANDER/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_augmented_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_augmented_pipeline_en.md new file mode 100644 index 00000000000000..9fe7632bc7a34d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_augmented_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_augmented_pipeline pipeline DistilBertForSequenceClassification from Shozi +author: John Snow Labs +name: burmese_awesome_model_augmented_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_augmented_pipeline` is a English model originally trained by Shozi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_augmented_pipeline_en_5.5.0_3.0_1727033601000.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_augmented_pipeline_en_5.5.0_3.0_1727033601000.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_augmented_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_augmented_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
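
For quick checks on a single string, `PretrainedPipeline` also offers the lighter `annotate` method, which returns a plain Python dict instead of a DataFrame. A hedged sketch (the `class` key and the label values depend on how the underlying classifier was exported):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
pipeline = PretrainedPipeline("burmese_awesome_model_augmented_pipeline", lang="en")

# annotate() processes a single string and returns {output_column: [values]}.
result = pipeline.annotate("I really enjoyed this product and would buy it again.")
print(result.get("class"))  # predicted label(s); the exact labels depend on the fine-tuned model
```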
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_augmented_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Shozi/my_awesome_model_augmented + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_fabisor_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_fabisor_pipeline_en.md new file mode 100644 index 00000000000000..17089f29d82655 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_fabisor_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_fabisor_pipeline pipeline DistilBertForSequenceClassification from fabisor +author: John Snow Labs +name: burmese_awesome_model_fabisor_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_fabisor_pipeline` is a English model originally trained by fabisor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_fabisor_pipeline_en_5.5.0_3.0_1726980125076.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_fabisor_pipeline_en_5.5.0_3.0_1726980125076.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_fabisor_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_fabisor_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_fabisor_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/fabisor/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_iamaries_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_iamaries_en.md new file mode 100644 index 00000000000000..8621b3df898b55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_iamaries_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_iamaries DistilBertForSequenceClassification from iamaries +author: John Snow Labs +name: burmese_awesome_model_iamaries +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_iamaries` is a English model originally trained by iamaries. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_iamaries_en_5.5.0_3.0_1726980429985.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_iamaries_en_5.5.0_3.0_1726980429985.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_iamaries","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_iamaries", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_iamaries| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/iamaries/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_myousfi_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_myousfi_en.md new file mode 100644 index 00000000000000..8fa18063f4eef8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_myousfi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_myousfi DistilBertForSequenceClassification from myousfi +author: John Snow Labs +name: burmese_awesome_model_myousfi +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_myousfi` is a English model originally trained by myousfi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_myousfi_en_5.5.0_3.0_1727013032120.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_myousfi_en_5.5.0_3.0_1727013032120.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_myousfi","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_myousfi", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_myousfi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/myousfi/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_myousfi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_myousfi_pipeline_en.md new file mode 100644 index 00000000000000..4e81315129b579 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_myousfi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_myousfi_pipeline pipeline DistilBertForSequenceClassification from myousfi +author: John Snow Labs +name: burmese_awesome_model_myousfi_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_myousfi_pipeline` is a English model originally trained by myousfi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_myousfi_pipeline_en_5.5.0_3.0_1727013044101.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_myousfi_pipeline_en_5.5.0_3.0_1727013044101.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_myousfi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_myousfi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_myousfi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/myousfi/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ollamh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ollamh_pipeline_en.md new file mode 100644 index 00000000000000..410cf736377f29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ollamh_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_ollamh_pipeline pipeline DistilBertForSequenceClassification from ollamh +author: John Snow Labs +name: burmese_awesome_model_ollamh_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_ollamh_pipeline` is a English model originally trained by ollamh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ollamh_pipeline_en_5.5.0_3.0_1727012688309.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ollamh_pipeline_en_5.5.0_3.0_1727012688309.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_ollamh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_ollamh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_ollamh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ollamh/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-chemberta_zinc250k_v1_en.md b/docs/_posts/ahmedlone127/2024-09-22-chemberta_zinc250k_v1_en.md new file mode 100644 index 00000000000000..4a27e31106e802 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-chemberta_zinc250k_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English chemberta_zinc250k_v1 RoBertaEmbeddings from seyonec +author: John Snow Labs +name: chemberta_zinc250k_v1 +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chemberta_zinc250k_v1` is a English model originally trained by seyonec. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chemberta_zinc250k_v1_en_5.5.0_3.0_1726999929591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chemberta_zinc250k_v1_en_5.5.0_3.0_1726999929591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("chemberta_zinc250k_v1","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("chemberta_zinc250k_v1","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
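
If plain Spark ML vectors are needed downstream (for clustering, similarity search, etc.), an `EmbeddingsFinisher` stage can be appended to the pipeline sketched above. A minimal example; the SMILES input string is only an illustration, chosen because this model was trained on ZINC molecules:

```python
import sparknlp
from pyspark.ml import Pipeline
from pyspark.sql import functions as F
from sparknlp.base import DocumentAssembler, EmbeddingsFinisher
from sparknlp.annotator import Tokenizer, RoBertaEmbeddings

spark = sparknlp.start()

documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")
embeddings = RoBertaEmbeddings.pretrained("chemberta_zinc250k_v1", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

# Unwraps the annotation structs into plain Spark ML vectors.
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, finisher])
data = spark.createDataFrame([["CCO"]]).toDF("text")  # an illustrative SMILES string
result = pipeline.fit(data).transform(data)
result.select(F.explode("finished_embeddings").alias("token_vector")).show(truncate=80)
```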
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chemberta_zinc250k_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|309.9 MB| + +## References + +https://huggingface.co/seyonec/ChemBERTa-zinc250k-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-chemberta_zinc250k_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-chemberta_zinc250k_v1_pipeline_en.md new file mode 100644 index 00000000000000..627a998bbcc22f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-chemberta_zinc250k_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English chemberta_zinc250k_v1_pipeline pipeline RoBertaEmbeddings from seyonec +author: John Snow Labs +name: chemberta_zinc250k_v1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chemberta_zinc250k_v1_pipeline` is a English model originally trained by seyonec. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chemberta_zinc250k_v1_pipeline_en_5.5.0_3.0_1726999943429.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chemberta_zinc250k_v1_pipeline_en_5.5.0_3.0_1726999943429.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("chemberta_zinc250k_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("chemberta_zinc250k_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chemberta_zinc250k_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.9 MB| + +## References + +https://huggingface.co/seyonec/ChemBERTa-zinc250k-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-chinese_extract_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-chinese_extract_bert_pipeline_en.md new file mode 100644 index 00000000000000..ddc1656be95cef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-chinese_extract_bert_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English chinese_extract_bert_pipeline pipeline BertForQuestionAnswering from frett +author: John Snow Labs +name: chinese_extract_bert_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chinese_extract_bert_pipeline` is a English model originally trained by frett. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chinese_extract_bert_pipeline_en_5.5.0_3.0_1727039843187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chinese_extract_bert_pipeline_en_5.5.0_3.0_1727039843187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("chinese_extract_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("chinese_extract_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
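
Question-answering pipelines expect two input columns rather than a single `text` column. The sketch below shows one way to prepare `df` for the call above; the `question`/`context` input names and the `answer` output column are assumptions based on the `MultiDocumentAssembler` and `BertForQuestionAnswering` stages listed for this pipeline, so adjust them if the loaded pipeline reports different column names:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
pipeline = PretrainedPipeline("chinese_extract_bert_pipeline", lang="en")

# Hypothetical question/context pair; the first stage is assumed to read
# columns named "question" and "context".
df = spark.createDataFrame(
    [["报告是谁写的？", "这份报告由审计团队于2021年撰写。"]]
).toDF("question", "context")

annotations = pipeline.transform(df)
# The extracted span is assumed to land in an "answer" annotation column.
annotations.select("question", "answer.result").show(truncate=False)
```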
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chinese_extract_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|381.0 MB| + +## References + +https://huggingface.co/frett/chinese_extract_bert + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-classifiereutoplevelroberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-classifiereutoplevelroberta_pipeline_en.md new file mode 100644 index 00000000000000..1625401dc08b09 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-classifiereutoplevelroberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English classifiereutoplevelroberta_pipeline pipeline RoBertaForSequenceClassification from gianma +author: John Snow Labs +name: classifiereutoplevelroberta_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classifiereutoplevelroberta_pipeline` is a English model originally trained by gianma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classifiereutoplevelroberta_pipeline_en_5.5.0_3.0_1727037950517.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classifiereutoplevelroberta_pipeline_en_5.5.0_3.0_1727037950517.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("classifiereutoplevelroberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("classifiereutoplevelroberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classifiereutoplevelroberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/gianma/classifierEUtopLevelRoberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-clinicalbertqa_200_en.md b/docs/_posts/ahmedlone127/2024-09-22-clinicalbertqa_200_en.md new file mode 100644 index 00000000000000..fec4c381b4dc15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-clinicalbertqa_200_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English clinicalbertqa_200 BertForQuestionAnswering from lanzv +author: John Snow Labs +name: clinicalbertqa_200 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clinicalbertqa_200` is a English model originally trained by lanzv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinicalbertqa_200_en_5.5.0_3.0_1727049191313.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinicalbertqa_200_en_5.5.0_3.0_1727049191313.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
     .setInputCols(["question", "context"]) \
     .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("clinicalbertqa_200","en") \
     .setInputCols(["document_question","document_context"]) \
     .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
     .setInputCols(Array("question", "context"))
     .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("clinicalbertqa_200", "en")
     .setInputCols(Array("document_question","document_context"))
     .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinicalbertqa_200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.3 MB| + +## References + +https://huggingface.co/lanzv/ClinicalBERTQA_200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-cv9_special_batch12_lr6_small_pipeline_id.md b/docs/_posts/ahmedlone127/2024-09-22-cv9_special_batch12_lr6_small_pipeline_id.md new file mode 100644 index 00000000000000..065182c7f8fe56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-cv9_special_batch12_lr6_small_pipeline_id.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Indonesian cv9_special_batch12_lr6_small_pipeline pipeline WhisperForCTC from TheRains +author: John Snow Labs +name: cv9_special_batch12_lr6_small_pipeline +date: 2024-09-22 +tags: [id, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cv9_special_batch12_lr6_small_pipeline` is a Indonesian model originally trained by TheRains. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cv9_special_batch12_lr6_small_pipeline_id_5.5.0_3.0_1727024625699.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cv9_special_batch12_lr6_small_pipeline_id_5.5.0_3.0_1727024625699.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cv9_special_batch12_lr6_small_pipeline", lang = "id") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cv9_special_batch12_lr6_small_pipeline", lang = "id") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cv9_special_batch12_lr6_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|id| +|Size:|1.7 GB| + +## References + +https://huggingface.co/TheRains/cv9-special-batch12-lr6-small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-danish_bert_review_sentiment_da.md b/docs/_posts/ahmedlone127/2024-09-22-danish_bert_review_sentiment_da.md new file mode 100644 index 00000000000000..307238a60d8d31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-danish_bert_review_sentiment_da.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Danish danish_bert_review_sentiment BertForSequenceClassification from KennethTM +author: John Snow Labs +name: danish_bert_review_sentiment +date: 2024-09-22 +tags: [da, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: da +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`danish_bert_review_sentiment` is a Danish model originally trained by KennethTM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/danish_bert_review_sentiment_da_5.5.0_3.0_1727029775457.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/danish_bert_review_sentiment_da_5.5.0_3.0_1727029775457.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("danish_bert_review_sentiment","da") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("danish_bert_review_sentiment", "da")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|danish_bert_review_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|da| +|Size:|414.5 MB| + +## References + +https://huggingface.co/KennethTM/danish-bert-review-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-danish_bert_review_sentiment_pipeline_da.md b/docs/_posts/ahmedlone127/2024-09-22-danish_bert_review_sentiment_pipeline_da.md new file mode 100644 index 00000000000000..25295610a1d411 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-danish_bert_review_sentiment_pipeline_da.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Danish danish_bert_review_sentiment_pipeline pipeline BertForSequenceClassification from KennethTM +author: John Snow Labs +name: danish_bert_review_sentiment_pipeline +date: 2024-09-22 +tags: [da, open_source, pipeline, onnx] +task: Text Classification +language: da +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`danish_bert_review_sentiment_pipeline` is a Danish model originally trained by KennethTM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/danish_bert_review_sentiment_pipeline_da_5.5.0_3.0_1727029800396.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/danish_bert_review_sentiment_pipeline_da_5.5.0_3.0_1727029800396.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("danish_bert_review_sentiment_pipeline", lang = "da") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("danish_bert_review_sentiment_pipeline", lang = "da") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|danish_bert_review_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|da| +|Size:|414.5 MB| + +## References + +https://huggingface.co/KennethTM/danish-bert-review-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-defsent_roberta_base_mean_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-defsent_roberta_base_mean_pipeline_en.md new file mode 100644 index 00000000000000..479405408c7f46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-defsent_roberta_base_mean_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English defsent_roberta_base_mean_pipeline pipeline RoBertaEmbeddings from cl-nagoya +author: John Snow Labs +name: defsent_roberta_base_mean_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`defsent_roberta_base_mean_pipeline` is a English model originally trained by cl-nagoya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/defsent_roberta_base_mean_pipeline_en_5.5.0_3.0_1727041791826.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/defsent_roberta_base_mean_pipeline_en_5.5.0_3.0_1727041791826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("defsent_roberta_base_mean_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("defsent_roberta_base_mean_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|defsent_roberta_base_mean_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|413.1 MB| + +## References + +https://huggingface.co/cl-nagoya/defsent-roberta-base-mean + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-disaster_tweet_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-disaster_tweet_4_pipeline_en.md new file mode 100644 index 00000000000000..555ef275c46bfc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-disaster_tweet_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English disaster_tweet_4_pipeline pipeline RoBertaForSequenceClassification from aellxx +author: John Snow Labs +name: disaster_tweet_4_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`disaster_tweet_4_pipeline` is a English model originally trained by aellxx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/disaster_tweet_4_pipeline_en_5.5.0_3.0_1727027036081.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/disaster_tweet_4_pipeline_en_5.5.0_3.0_1727027036081.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("disaster_tweet_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("disaster_tweet_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|disaster_tweet_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/aellxx/disaster-tweet-4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-dissertation_sahil_bert_en.md b/docs/_posts/ahmedlone127/2024-09-22-dissertation_sahil_bert_en.md new file mode 100644 index 00000000000000..9b1eac749ea74f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-dissertation_sahil_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dissertation_sahil_bert BertForSequenceClassification from mahadev23 +author: John Snow Labs +name: dissertation_sahil_bert +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dissertation_sahil_bert` is a English model originally trained by mahadev23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dissertation_sahil_bert_en_5.5.0_3.0_1727029774986.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dissertation_sahil_bert_en_5.5.0_3.0_1727029774986.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("dissertation_sahil_bert","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("dissertation_sahil_bert", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dissertation_sahil_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/mahadev23/dissertation_sahil_bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-dissertation_sahil_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-dissertation_sahil_bert_pipeline_en.md new file mode 100644 index 00000000000000..b1f7da9eae1c1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-dissertation_sahil_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dissertation_sahil_bert_pipeline pipeline BertForSequenceClassification from mahadev23 +author: John Snow Labs +name: dissertation_sahil_bert_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dissertation_sahil_bert_pipeline` is a English model originally trained by mahadev23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dissertation_sahil_bert_pipeline_en_5.5.0_3.0_1727029799949.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dissertation_sahil_bert_pipeline_en_5.5.0_3.0_1727029799949.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dissertation_sahil_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dissertation_sahil_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dissertation_sahil_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/mahadev23/dissertation_sahil_bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distbert_cpcd_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distbert_cpcd_pipeline_en.md new file mode 100644 index 00000000000000..c559977fe8cc9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distbert_cpcd_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distbert_cpcd_pipeline pipeline DistilBertForSequenceClassification from jnwnlee +author: John Snow Labs +name: distbert_cpcd_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distbert_cpcd_pipeline` is a English model originally trained by jnwnlee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distbert_cpcd_pipeline_en_5.5.0_3.0_1727035522113.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distbert_cpcd_pipeline_en_5.5.0_3.0_1727035522113.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distbert_cpcd_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distbert_cpcd_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distbert_cpcd_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jnwnlee/distbert_cpcd + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_07_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_07_3_pipeline_en.md new file mode 100644 index 00000000000000..0dd166e1715c99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_07_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_07_3_pipeline pipeline DistilBertForSequenceClassification from KalaiselvanD +author: John Snow Labs +name: distilbert_07_3_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_07_3_pipeline` is a English model originally trained by KalaiselvanD. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_07_3_pipeline_en_5.5.0_3.0_1727033257962.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_07_3_pipeline_en_5.5.0_3.0_1727033257962.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_07_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_07_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_07_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KalaiselvanD/distilbert_07_3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_dataverse_2023_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_dataverse_2023_en.md new file mode 100644 index 00000000000000..266211444613a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_dataverse_2023_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_dataverse_2023 DistilBertForSequenceClassification from rajendrabaskota +author: John Snow Labs +name: distilbert_base_dataverse_2023 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_dataverse_2023` is a English model originally trained by rajendrabaskota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_dataverse_2023_en_5.5.0_3.0_1727033132297.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_dataverse_2023_en_5.5.0_3.0_1727033132297.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_dataverse_2023","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_dataverse_2023", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
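+
+The Python example above (and the analogous examples in the other cards in this batch) assumes the standard Spark NLP imports and an existing `spark` session. A sketch of that boilerplate follows; it is an assumption spelled out for convenience, not part of the original card.
+
+```python
+# Imports and session assumed by the example above.
+import sparknlp
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+spark = sparknlp.start()  # provides the `spark` handle used by spark.createDataFrame(...)
+```
+
+After `transform`, the predicted labels can be read from the `class` column, for example with `pipelineDF.select("class.result").show(truncate=False)`.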
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_dataverse_2023| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rajendrabaskota/distilbert-base-dataverse-2023 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_dataverse_2023_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_dataverse_2023_pipeline_en.md new file mode 100644 index 00000000000000..042b5c265d6d12 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_dataverse_2023_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_dataverse_2023_pipeline pipeline DistilBertForSequenceClassification from rajendrabaskota +author: John Snow Labs +name: distilbert_base_dataverse_2023_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_dataverse_2023_pipeline` is a English model originally trained by rajendrabaskota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_dataverse_2023_pipeline_en_5.5.0_3.0_1727033153188.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_dataverse_2023_pipeline_en_5.5.0_3.0_1727033153188.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_dataverse_2023_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_dataverse_2023_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
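+
+For quick checks on single strings, `PretrainedPipeline` also exposes `annotate`, which skips building a DataFrame. The sketch below is illustrative; the `class` key follows the output column of the classifier stage listed under Included Models.
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_base_dataverse_2023_pipeline", lang="en")
+result = pipeline.annotate("I love Spark NLP")  # dict mapping output columns to lists of strings
+print(result["class"])                          # predicted label(s)
+```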
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_dataverse_2023_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rajendrabaskota/distilbert-base-dataverse-2023 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_maybehesham_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_maybehesham_en.md new file mode 100644 index 00000000000000..7d7436753f5527 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_maybehesham_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_maybehesham DistilBertForSequenceClassification from MayBeHesham +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_maybehesham +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_maybehesham` is a English model originally trained by MayBeHesham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_maybehesham_en_5.5.0_3.0_1727012537981.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_maybehesham_en_5.5.0_3.0_1727012537981.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_maybehesham","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_maybehesham", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_maybehesham| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/MayBeHesham/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_bensonhugging_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_bensonhugging_en.md new file mode 100644 index 00000000000000..f2c6ff9b0f5406 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_bensonhugging_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_bensonhugging DistilBertForSequenceClassification from BensonHugging +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_bensonhugging +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_bensonhugging` is a English model originally trained by BensonHugging. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_bensonhugging_en_5.5.0_3.0_1727033258488.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_bensonhugging_en_5.5.0_3.0_1727033258488.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_bensonhugging","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_bensonhugging", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_bensonhugging| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BensonHugging/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_ravikant22_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_ravikant22_pipeline_en.md new file mode 100644 index 00000000000000..e82be3cf26026e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_ravikant22_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_ravikant22_pipeline pipeline DistilBertForSequenceClassification from ravikant22 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_ravikant22_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_ravikant22_pipeline` is a English model originally trained by ravikant22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_ravikant22_pipeline_en_5.5.0_3.0_1727020677398.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_ravikant22_pipeline_en_5.5.0_3.0_1727020677398.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_ravikant22_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_ravikant22_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_ravikant22_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ravikant22/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_beijaflor2024_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_beijaflor2024_en.md new file mode 100644 index 00000000000000..466671cee0455c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_beijaflor2024_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_beijaflor2024 DistilBertForSequenceClassification from Beijaflor2024 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_beijaflor2024 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_beijaflor2024` is a English model originally trained by Beijaflor2024. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_beijaflor2024_en_5.5.0_3.0_1727012538121.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_beijaflor2024_en_5.5.0_3.0_1727012538121.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_beijaflor2024","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_beijaflor2024", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_beijaflor2024| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Beijaflor2024/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jason_oh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jason_oh_pipeline_en.md new file mode 100644 index 00000000000000..a2f111fc87fad9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jason_oh_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jason_oh_pipeline pipeline DistilBertForSequenceClassification from Jason-Oh +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jason_oh_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jason_oh_pipeline` is a English model originally trained by Jason-Oh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jason_oh_pipeline_en_5.5.0_3.0_1727020904370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jason_oh_pipeline_en_5.5.0_3.0_1727020904370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jason_oh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jason_oh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jason_oh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Jason-Oh/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_kaebams_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_kaebams_pipeline_en.md new file mode 100644 index 00000000000000..22bb0d36c2d1f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_kaebams_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_kaebams_pipeline pipeline DistilBertForSequenceClassification from kaebaMS +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_kaebams_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_kaebams_pipeline` is a English model originally trained by kaebaMS. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_kaebams_pipeline_en_5.5.0_3.0_1726980124056.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_kaebams_pipeline_en_5.5.0_3.0_1726980124056.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_kaebams_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_kaebams_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_kaebams_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kaebaMS/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_ouba_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_ouba_pipeline_en.md new file mode 100644 index 00000000000000..75fdb55f17fe32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_ouba_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ouba_pipeline pipeline DistilBertForSequenceClassification from ouba +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ouba_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ouba_pipeline` is a English model originally trained by ouba. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ouba_pipeline_en_5.5.0_3.0_1727033830279.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ouba_pipeline_en_5.5.0_3.0_1727033830279.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ouba_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ouba_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ouba_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ouba/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotions_fibleep_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotions_fibleep_en.md new file mode 100644 index 00000000000000..69bb18c0a72a68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotions_fibleep_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotions_fibleep DistilBertForSequenceClassification from fibleep +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotions_fibleep +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotions_fibleep` is a English model originally trained by fibleep. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_fibleep_en_5.5.0_3.0_1727035254547.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_fibleep_en_5.5.0_3.0_1727035254547.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotions_fibleep","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotions_fibleep", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotions_fibleep| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/fibleep/distilbert-base-uncased-finetuned-emotions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_m_avoid_harm_seler_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_m_avoid_harm_seler_en.md new file mode 100644 index 00000000000000..0af343d7f4d0c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_m_avoid_harm_seler_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_m_avoid_harm_seler DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_m_avoid_harm_seler +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_m_avoid_harm_seler` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_m_avoid_harm_seler_en_5.5.0_3.0_1726979997943.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_m_avoid_harm_seler_en_5.5.0_3.0_1726979997943.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_m_avoid_harm_seler","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_m_avoid_harm_seler", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_m_avoid_harm_seler| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-m_avoid_harm_seler \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_feedback_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_feedback_pipeline_en.md new file mode 100644 index 00000000000000..76a1c733ea491b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_feedback_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_t_feedback_pipeline pipeline DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_t_feedback_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_t_feedback_pipeline` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_feedback_pipeline_en_5.5.0_3.0_1727012926017.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_feedback_pipeline_en_5.5.0_3.0_1727012926017.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_t_feedback_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_t_feedback_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_t_feedback_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-t_feedback + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_massive_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_massive_v1_pipeline_en.md new file mode 100644 index 00000000000000..22b5c23e00de0d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_massive_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_massive_v1_pipeline pipeline DistilBertForSequenceClassification from benayas +author: John Snow Labs +name: distilbert_base_uncased_massive_v1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_massive_v1_pipeline` is a English model originally trained by benayas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_massive_v1_pipeline_en_5.5.0_3.0_1726980760518.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_massive_v1_pipeline_en_5.5.0_3.0_1726980760518.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_massive_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_massive_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_massive_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|258.0 MB| + +## References + +https://huggingface.co/benayas/distilbert-base-uncased-massive-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_pipeline_en.md new file mode 100644 index 00000000000000..0b12fd1f8b1896 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_pipeline_en_5.5.0_3.0_1726980011775.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_pipeline_en_5.5.0_3.0_1726980011775.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st16sd_ut72ut1largePfxNf_simsp300_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md new file mode 100644 index 00000000000000..0a880ca42ad15a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en_5.5.0_3.0_1727033131932.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en_5.5.0_3.0_1727033131932.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st1sd_ut72ut5_PLPrefix0stlarge_simsp100_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_batch_size_64_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_batch_size_64_en.md new file mode 100644 index 00000000000000..5fdc359a18d1fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_batch_size_64_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_batch_size_64 DistilBertForSequenceClassification from K-kiron +author: John Snow Labs +name: distilbert_batch_size_64 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_batch_size_64` is a English model originally trained by K-kiron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_batch_size_64_en_5.5.0_3.0_1727012230226.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_batch_size_64_en_5.5.0_3.0_1727012230226.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_batch_size_64","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_batch_size_64", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_batch_size_64| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/K-kiron/distilbert-batch-size-64 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_emindurmus80_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_emindurmus80_en.md new file mode 100644 index 00000000000000..f978d9fbec7fdc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_emindurmus80_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotion_emindurmus80 DistilBertForSequenceClassification from EminDurmus80 +author: John Snow Labs +name: distilbert_emotion_emindurmus80 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_emindurmus80` is a English model originally trained by EminDurmus80. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_emindurmus80_en_5.5.0_3.0_1726980329243.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_emindurmus80_en_5.5.0_3.0_1726980329243.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_emindurmus80","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_emindurmus80", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_emindurmus80| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/EminDurmus80/distilbert-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_finetuned_ner_haydenbspence_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_finetuned_ner_haydenbspence_en.md new file mode 100644 index 00000000000000..ecde4f32e2e89b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_finetuned_ner_haydenbspence_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_finetuned_ner_haydenbspence BertForTokenClassification from haydenbspence +author: John Snow Labs +name: distilbert_finetuned_ner_haydenbspence +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_ner_haydenbspence` is a English model originally trained by haydenbspence. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_ner_haydenbspence_en_5.5.0_3.0_1727030910144.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_ner_haydenbspence_en_5.5.0_3.0_1727030910144.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("distilbert_finetuned_ner_haydenbspence","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols("document")
+  .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("distilbert_finetuned_ner_haydenbspence", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
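+
+The token classifier above emits one tag per token in the `ner` column. To group those tags into entity chunks you would typically append Spark NLP's `NerConverter`; the sketch below shows that optional extra stage and is an addition to the original example, reusing its variable names.
+
+```python
+# Optional post-processing: merge token-level tags into entity chunks.
+from sparknlp.annotator import NerConverter
+
+nerConverter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, nerConverter])
+```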
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_ner_haydenbspence| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.8 MB| + +## References + +https://huggingface.co/haydenbspence/distilbert-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_foundation_category_c5_finetune_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_foundation_category_c5_finetune_pipeline_en.md new file mode 100644 index 00000000000000..464d1d6e8795f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_foundation_category_c5_finetune_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_foundation_category_c5_finetune_pipeline pipeline DistilBertForSequenceClassification from eric-mc2 +author: John Snow Labs +name: distilbert_foundation_category_c5_finetune_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_foundation_category_c5_finetune_pipeline` is a English model originally trained by eric-mc2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_foundation_category_c5_finetune_pipeline_en_5.5.0_3.0_1727033501218.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_foundation_category_c5_finetune_pipeline_en_5.5.0_3.0_1727033501218.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_foundation_category_c5_finetune_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_foundation_category_c5_finetune_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
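+The snippet above assumes a DataFrame `df` with a `text` column already exists. A minimal, hedged sketch for building that input and inspecting the predicted labels follows; the `text` and `class` column names follow the usual Spark NLP conventions and are assumptions rather than guarantees of this card.
+
+```python
+# Hedged sketch: build the input DataFrame and read back the predicted labels.
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+
+df = spark.createDataFrame(
+    [["The foundation funds early-stage research grants."]]
+).toDF("text")
+
+pipeline = PretrainedPipeline("distilbert_foundation_category_c5_finetune_pipeline", lang="en")
+annotations = pipeline.transform(df)
+annotations.select("class.result").show(truncate=False)
+```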
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_foundation_category_c5_finetune_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/eric-mc2/distilbert-foundation-category-c5-finetune + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_pipeline_en.md new file mode 100644 index 00000000000000..6b2003e2fe764e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_pipeline_en_5.5.0_3.0_1726980651348.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_pipeline_en_5.5.0_3.0_1726980651348.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
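+For quick checks on a handful of sentences, the DataFrame can be skipped entirely and `annotate()` used instead. The sketch below is a hedged example; the `"class"` output key is assumed from the usual Spark NLP pipeline layout rather than guaranteed by this card.
+
+```python
+# Hedged sketch: annotate() runs the pretrained pipeline on plain strings.
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_pipeline", lang="en")
+result = pipeline.annotate("The cat sat on the mat because it was tired.")
+print(result["class"])
+```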
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_pretrain_wnli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_pretrain_wnli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_sst2_padding20model_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sst2_padding20model_en.md new file mode 100644 index 00000000000000..96cc084e3a5e6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sst2_padding20model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sst2_padding20model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_sst2_padding20model +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sst2_padding20model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sst2_padding20model_en_5.5.0_3.0_1726980724911.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sst2_padding20model_en_5.5.0_3.0_1726980724911.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sst2_padding20model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sst2_padding20model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
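+Note that the example above feeds the classifier a `documents` column while the assembler outputs `document`. The hedged sketch below keeps the names consistent and shows one way to read each prediction together with its score, which Spark NLP stores in the annotation metadata; it assumes an active session bound to `spark`.
+
+```python
+# Hedged sketch: consistent column names, predictions with their metadata scores.
+import pyspark.sql.functions as F
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
+tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")
+
+classifier = DistilBertForSequenceClassification.pretrained("distilbert_sst2_padding20model", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, classifier])
+data = spark.createDataFrame([["A thoroughly enjoyable film."], ["Dull and overlong."]]).toDF("text")
+result = pipeline.fit(data).transform(data)
+
+result.select("text", F.explode("class").alias("pred")) \
+    .select("text", "pred.result", "pred.metadata") \
+    .show(truncate=False)
+```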
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sst2_padding20model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_sst2_padding20model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_turkish_turkish_spam_email_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_turkish_turkish_spam_email_pipeline_tr.md new file mode 100644 index 00000000000000..bfc2b4c039a512 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_turkish_turkish_spam_email_pipeline_tr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Turkish distilbert_turkish_turkish_spam_email_pipeline pipeline DistilBertForSequenceClassification from anilguven +author: John Snow Labs +name: distilbert_turkish_turkish_spam_email_pipeline +date: 2024-09-22 +tags: [tr, open_source, pipeline, onnx] +task: Text Classification +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_turkish_turkish_spam_email_pipeline` is a Turkish model originally trained by anilguven. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_turkish_turkish_spam_email_pipeline_tr_5.5.0_3.0_1727020409409.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_turkish_turkish_spam_email_pipeline_tr_5.5.0_3.0_1727020409409.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_turkish_turkish_spam_email_pipeline", lang = "tr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_turkish_turkish_spam_email_pipeline", lang = "tr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_turkish_turkish_spam_email_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|254.1 MB| + +## References + +https://huggingface.co/anilguven/distilbert_tr_turkish_spam_email + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_finetuned_condition_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_finetuned_condition_classifier_pipeline_en.md new file mode 100644 index 00000000000000..5ba744db53ed24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_finetuned_condition_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_finetuned_condition_classifier_pipeline pipeline RoBertaForSequenceClassification from BanUrsus +author: John Snow Labs +name: distilroberta_base_finetuned_condition_classifier_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_condition_classifier_pipeline` is a English model originally trained by BanUrsus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_condition_classifier_pipeline_en_5.5.0_3.0_1727026427044.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_condition_classifier_pipeline_en_5.5.0_3.0_1727026427044.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_finetuned_condition_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_finetuned_condition_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_condition_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.4 MB| + +## References + +https://huggingface.co/BanUrsus/distilroberta-base-finetuned-condition-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-dopamin_post_training_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-dopamin_post_training_pipeline_en.md new file mode 100644 index 00000000000000..543b2218a27837 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-dopamin_post_training_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dopamin_post_training_pipeline pipeline RoBertaForSequenceClassification from Fsoft-AIC +author: John Snow Labs +name: dopamin_post_training_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dopamin_post_training_pipeline` is a English model originally trained by Fsoft-AIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dopamin_post_training_pipeline_en_5.5.0_3.0_1726967758843.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dopamin_post_training_pipeline_en_5.5.0_3.0_1726967758843.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dopamin_post_training_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dopamin_post_training_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dopamin_post_training_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.4 MB| + +## References + +https://huggingface.co/Fsoft-AIC/dopamin-post-training + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_en.md b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_en.md new file mode 100644 index 00000000000000..d4879658d6aa39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_en_5.5.0_3.0_1727037360173.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_en_5.5.0_3.0_1727037360173.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
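+When the goal is low-latency scoring of individual strings rather than a batch DataFrame, a fitted model can be wrapped in a `LightPipeline`. The sketch below is hedged: it assumes an active session bound to `spark` and the usual `document`/`token`/`class` column naming.
+
+```python
+# Hedged sketch: fit on an empty DataFrame, then score single strings.
+from sparknlp.base import DocumentAssembler, LightPipeline
+from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
+from pyspark.ml import Pipeline
+
+document = DocumentAssembler().setInputCol("text").setOutputCol("document")
+token = Tokenizer().setInputCols(["document"]).setOutputCol("token")
+classifier = RoBertaForSequenceClassification.pretrained(
+    "emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m", "en"
+).setInputCols(["document", "token"]).setOutputCol("class")
+
+empty = spark.createDataFrame([[""]]).toDF("text")
+model = Pipeline(stages=[document, token, classifier]).fit(empty)
+
+light = LightPipeline(model)
+print(light.annotate("Just landed in Tokyo, can't wait to explore!")["class"])
+```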
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.6 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random3_seed1-twitter-roberta-base-2019-90m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-fakenews_roberta_large_grad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-fakenews_roberta_large_grad_pipeline_en.md new file mode 100644 index 00000000000000..22cfdabc02a535 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-fakenews_roberta_large_grad_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fakenews_roberta_large_grad_pipeline pipeline RoBertaForSequenceClassification from Denyol +author: John Snow Labs +name: fakenews_roberta_large_grad_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fakenews_roberta_large_grad_pipeline` is a English model originally trained by Denyol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fakenews_roberta_large_grad_pipeline_en_5.5.0_3.0_1727037681607.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fakenews_roberta_large_grad_pipeline_en_5.5.0_3.0_1727037681607.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fakenews_roberta_large_grad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fakenews_roberta_large_grad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fakenews_roberta_large_grad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Denyol/FakeNews-roberta-large-grad + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_en.md b/docs/_posts/ahmedlone127/2024-09-22-final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_en.md new file mode 100644 index 00000000000000..527d59cd95cd4a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English final_ft__roberta_clinical_wl_spanish__70k_ultrasounds RoBertaEmbeddings from manucos +author: John Snow Labs +name: final_ft__roberta_clinical_wl_spanish__70k_ultrasounds +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_ft__roberta_clinical_wl_spanish__70k_ultrasounds` is a English model originally trained by manucos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_en_5.5.0_3.0_1726999500606.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_en_5.5.0_3.0_1726999500606.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("final_ft__roberta_clinical_wl_spanish__70k_ultrasounds","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("final_ft__roberta_clinical_wl_spanish__70k_ultrasounds","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
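+To feed these embeddings into a downstream Spark ML estimator, the annotation structs usually need to be converted into plain vectors, which an `EmbeddingsFinisher` stage handles. The sketch below is hedged (it assumes an active session bound to `spark`), and the Spanish sample sentence is illustrative only.
+
+```python
+# Hedged sketch: convert token embeddings into Spark ML vectors.
+from sparknlp.base import DocumentAssembler, EmbeddingsFinisher
+from sparknlp.annotator import Tokenizer, RoBertaEmbeddings
+from pyspark.ml import Pipeline
+
+document = DocumentAssembler().setInputCol("text").setOutputCol("document")
+token = Tokenizer().setInputCols(["document"]).setOutputCol("token")
+embeddings = RoBertaEmbeddings.pretrained(
+    "final_ft__roberta_clinical_wl_spanish__70k_ultrasounds", "en"
+).setInputCols(["document", "token"]).setOutputCol("embeddings")
+
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+pipeline = Pipeline(stages=[document, token, embeddings, finisher])
+data = spark.createDataFrame([["Ecografía abdominal sin hallazgos patológicos."]]).toDF("text")
+pipeline.fit(data).transform(data) \
+    .selectExpr("explode(finished_embeddings) as token_vector") \
+    .show(3, truncate=80)
+```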
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_ft__roberta_clinical_wl_spanish__70k_ultrasounds| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|469.7 MB| + +## References + +https://huggingface.co/manucos/final-ft__roberta-clinical-wl-es__70k-ultrasounds \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-final_model_thebisso09_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-final_model_thebisso09_pipeline_en.md new file mode 100644 index 00000000000000..8a93b37891b754 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-final_model_thebisso09_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English final_model_thebisso09_pipeline pipeline DistilBertForSequenceClassification from Thebisso09 +author: John Snow Labs +name: final_model_thebisso09_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_model_thebisso09_pipeline` is a English model originally trained by Thebisso09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_model_thebisso09_pipeline_en_5.5.0_3.0_1727033721385.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_model_thebisso09_pipeline_en_5.5.0_3.0_1727033721385.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("final_model_thebisso09_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("final_model_thebisso09_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_model_thebisso09_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Thebisso09/final_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-fine_tuned_roberta_nosql_injection_en.md b/docs/_posts/ahmedlone127/2024-09-22-fine_tuned_roberta_nosql_injection_en.md new file mode 100644 index 00000000000000..4343cd4aa349a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-fine_tuned_roberta_nosql_injection_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fine_tuned_roberta_nosql_injection RoBertaEmbeddings from ankush-003 +author: John Snow Labs +name: fine_tuned_roberta_nosql_injection +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_roberta_nosql_injection` is a English model originally trained by ankush-003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_roberta_nosql_injection_en_5.5.0_3.0_1727041555348.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_roberta_nosql_injection_en_5.5.0_3.0_1727041555348.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("fine_tuned_roberta_nosql_injection","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("fine_tuned_roberta_nosql_injection","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_roberta_nosql_injection| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/ankush-003/fine-tuned-roberta-nosql-injection \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-fine_tuned_roberta_nosql_injection_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-fine_tuned_roberta_nosql_injection_pipeline_en.md new file mode 100644 index 00000000000000..f76f3d5b714439 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-fine_tuned_roberta_nosql_injection_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fine_tuned_roberta_nosql_injection_pipeline pipeline RoBertaEmbeddings from ankush-003 +author: John Snow Labs +name: fine_tuned_roberta_nosql_injection_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_roberta_nosql_injection_pipeline` is a English model originally trained by ankush-003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_roberta_nosql_injection_pipeline_en_5.5.0_3.0_1727041579383.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_roberta_nosql_injection_pipeline_en_5.5.0_3.0_1727041579383.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fine_tuned_roberta_nosql_injection_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fine_tuned_roberta_nosql_injection_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_roberta_nosql_injection_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/ankush-003/fine-tuned-roberta-nosql-injection + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuned_sentiment_model_imdb_distilbert_2_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuned_sentiment_model_imdb_distilbert_2_en.md new file mode 100644 index 00000000000000..07ec3d15cbfb7d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuned_sentiment_model_imdb_distilbert_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_sentiment_model_imdb_distilbert_2 DistilBertForSequenceClassification from Tzimon +author: John Snow Labs +name: finetuned_sentiment_model_imdb_distilbert_2 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_sentiment_model_imdb_distilbert_2` is a English model originally trained by Tzimon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_sentiment_model_imdb_distilbert_2_en_5.5.0_3.0_1727020969574.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_sentiment_model_imdb_distilbert_2_en_5.5.0_3.0_1727020969574.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_sentiment_model_imdb_distilbert_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_sentiment_model_imdb_distilbert_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_sentiment_model_imdb_distilbert_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Tzimon/finetuned_sentiment_model_imdb_distilbert_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_bianchidev_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_bianchidev_pipeline_en.md new file mode 100644 index 00000000000000..93d2d6ef1b69e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_bianchidev_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_bianchidev_pipeline pipeline DistilBertForSequenceClassification from BianchiDev +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_bianchidev_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_bianchidev_pipeline` is a English model originally trained by BianchiDev. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_bianchidev_pipeline_en_5.5.0_3.0_1727020698997.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_bianchidev_pipeline_en_5.5.0_3.0_1727020698997.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_bianchidev_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_bianchidev_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_bianchidev_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BianchiDev/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_lwhite_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_lwhite_pipeline_en.md new file mode 100644 index 00000000000000..09612af17adcad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_lwhite_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_lwhite_pipeline pipeline DistilBertForSequenceClassification from lwhite +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_lwhite_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_lwhite_pipeline` is a English model originally trained by lwhite. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_lwhite_pipeline_en_5.5.0_3.0_1726980318052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_lwhite_pipeline_en_5.5.0_3.0_1726980318052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_lwhite_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_lwhite_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_lwhite_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lwhite/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_neo111x_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_neo111x_en.md new file mode 100644 index 00000000000000..69399fa0f89f4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_neo111x_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_neo111x DistilBertForSequenceClassification from Neo111x +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_neo111x +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_neo111x` is a English model originally trained by Neo111x. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_neo111x_en_5.5.0_3.0_1727020393301.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_neo111x_en_5.5.0_3.0_1727020393301.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_neo111x","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_neo111x", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_neo111x| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Neo111x/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_nerproject7_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_nerproject7_en.md new file mode 100644 index 00000000000000..30451bbc02aff0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_nerproject7_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_nerproject7 DistilBertForSequenceClassification from nerproject7 +author: John Snow Labs +name: finetuning_sentiment_model_nerproject7 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_nerproject7` is a English model originally trained by nerproject7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_nerproject7_en_5.5.0_3.0_1726980204184.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_nerproject7_en_5.5.0_3.0_1726980204184.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_nerproject7","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_nerproject7", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_nerproject7| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/nerproject7/finetuning-sentiment-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ft_distilbert_base_uncased_nlp_feup_en.md b/docs/_posts/ahmedlone127/2024-09-22-ft_distilbert_base_uncased_nlp_feup_en.md new file mode 100644 index 00000000000000..628db2ceb9a5fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ft_distilbert_base_uncased_nlp_feup_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ft_distilbert_base_uncased_nlp_feup DistilBertForSequenceClassification from NLP-FEUP +author: John Snow Labs +name: ft_distilbert_base_uncased_nlp_feup +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ft_distilbert_base_uncased_nlp_feup` is a English model originally trained by NLP-FEUP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ft_distilbert_base_uncased_nlp_feup_en_5.5.0_3.0_1727035506995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ft_distilbert_base_uncased_nlp_feup_en_5.5.0_3.0_1727035506995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("ft_distilbert_base_uncased_nlp_feup","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("ft_distilbert_base_uncased_nlp_feup", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ft_distilbert_base_uncased_nlp_feup| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/NLP-FEUP/FT-distilbert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-fyp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-fyp_pipeline_en.md new file mode 100644 index 00000000000000..e2eebabe3dbeaf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-fyp_pipeline_en.md @@ -0,0 +1,72 @@ +--- +layout: model +title: English fyp_pipeline pipeline T5Transformer from yaashwardhan +author: John Snow Labs +name: fyp_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: [Question Answering, Summarization, Translation, Text Generation] +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained T5Transformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fyp_pipeline` is a English model originally trained by yaashwardhan. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fyp_pipeline_en_5.5.0_3.0_1727034870402.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fyp_pipeline_en_5.5.0_3.0_1727034870402.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +pipeline = PretrainedPipeline("fyp_pipeline", lang = "en") +annotations = pipeline.transform(df) +``` +```scala +val pipeline = new PretrainedPipeline("fyp_pipeline", lang = "en") +val annotations = pipeline.transform(df) +``` +
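+Both snippets assume an initialized Spark NLP session and an existing DataFrame `df`. A hedged, end-to-end sketch is given below; the keys returned by `annotate()` depend on the stages packaged in the pipeline and are not guaranteed by this card.
+
+```python
+# Hedged sketch: start a session and run the pipeline on a single string.
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+pipeline = PretrainedPipeline("fyp_pipeline", lang="en")
+print(pipeline.annotate("This is a short sample sentence."))
+```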
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fyp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.0 MB| + +## References + +References + +https://huggingface.co/yaashwardhan/fyp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-genztranscribe_base_hindi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-genztranscribe_base_hindi_pipeline_en.md new file mode 100644 index 00000000000000..5b288edc521098 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-genztranscribe_base_hindi_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English genztranscribe_base_hindi_pipeline pipeline WhisperForCTC from KshitizPandya +author: John Snow Labs +name: genztranscribe_base_hindi_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`genztranscribe_base_hindi_pipeline` is a English model originally trained by KshitizPandya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/genztranscribe_base_hindi_pipeline_en_5.5.0_3.0_1726996400983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/genztranscribe_base_hindi_pipeline_en_5.5.0_3.0_1726996400983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("genztranscribe_base_hindi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("genztranscribe_base_hindi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
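+Unlike the text pipelines on this page, this pipeline begins with an `AudioAssembler`, so the input DataFrame must carry raw audio samples rather than text. The sketch below is a hedged illustration: `librosa`, the 16 kHz sampling rate, the `recording.wav` path, and the `audio_content` column name are all assumptions, not requirements stated by this card.
+
+```python
+# Hedged sketch: load 16 kHz float samples and pass them through the pipeline.
+import librosa
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+
+samples, _ = librosa.load("recording.wav", sr=16000)  # hypothetical input file
+df = spark.createDataFrame([[samples.tolist()]]).toDF("audio_content")
+
+pipeline = PretrainedPipeline("genztranscribe_base_hindi_pipeline", lang="en")
+result = pipeline.transform(df)
+result.printSchema()  # the transcription column name depends on the packaged stages
+result.show(truncate=False)
+```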
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|genztranscribe_base_hindi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|643.6 MB| + +## References + +https://huggingface.co/KshitizPandya/GenzTranscribe-base-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hindi_wordpiece_bert_test_2m_en.md b/docs/_posts/ahmedlone127/2024-09-22-hindi_wordpiece_bert_test_2m_en.md new file mode 100644 index 00000000000000..8b90382d0f7fa1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hindi_wordpiece_bert_test_2m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hindi_wordpiece_bert_test_2m BertEmbeddings from rg1683 +author: John Snow Labs +name: hindi_wordpiece_bert_test_2m +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hindi_wordpiece_bert_test_2m` is a English model originally trained by rg1683. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hindi_wordpiece_bert_test_2m_en_5.5.0_3.0_1727008149400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hindi_wordpiece_bert_test_2m_en_5.5.0_3.0_1727008149400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("hindi_wordpiece_bert_test_2m","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("hindi_wordpiece_bert_test_2m","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hindi_wordpiece_bert_test_2m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|377.7 MB| + +## References + +https://huggingface.co/rg1683/hindi_wordpiece_bert_test_2m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hindi_wordpiece_bert_test_2m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-hindi_wordpiece_bert_test_2m_pipeline_en.md new file mode 100644 index 00000000000000..bd91ff4be01a8e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hindi_wordpiece_bert_test_2m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hindi_wordpiece_bert_test_2m_pipeline pipeline BertEmbeddings from rg1683 +author: John Snow Labs +name: hindi_wordpiece_bert_test_2m_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hindi_wordpiece_bert_test_2m_pipeline` is a English model originally trained by rg1683. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hindi_wordpiece_bert_test_2m_pipeline_en_5.5.0_3.0_1727008166350.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hindi_wordpiece_bert_test_2m_pipeline_en_5.5.0_3.0_1727008166350.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hindi_wordpiece_bert_test_2m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hindi_wordpiece_bert_test_2m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
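+The call above leaves out the import and the construction of `df`; a fuller sketch follows. It assumes the bundled DocumentAssembler reads a `text` column, which is the usual convention for these pipelines but is not stated on this card.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("hindi_wordpiece_bert_test_2m_pipeline", lang="en")
+annotations = pipeline.transform(df)
+annotations.printSchema()  # columns added by DocumentAssembler, Tokenizer and BertEmbeddings
+```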
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hindi_wordpiece_bert_test_2m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|377.7 MB| + +## References + +https://huggingface.co/rg1683/hindi_wordpiece_bert_test_2m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hp_search_deberta_en.md b/docs/_posts/ahmedlone127/2024-09-22-hp_search_deberta_en.md new file mode 100644 index 00000000000000..7c1385a152f48c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hp_search_deberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hp_search_deberta BertForTokenClassification from cynthiachan +author: John Snow Labs +name: hp_search_deberta +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hp_search_deberta` is a English model originally trained by cynthiachan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hp_search_deberta_en_5.5.0_3.0_1726977667543.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hp_search_deberta_en_5.5.0_3.0_1726977667543.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("hp_search_deberta","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("hp_search_deberta", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
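+With `pipelineDF` from the snippet above, the predicted per-token labels can be read straight from the `token` and `ner` columns it defines (a sketch using only columns declared above):
+
+```python
+# Tokens and their predicted labels, one array per input row.
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```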
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hp_search_deberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.8 MB| + +## References + +https://huggingface.co/cynthiachan/hp-search-deberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-imdb_5_en.md b/docs/_posts/ahmedlone127/2024-09-22-imdb_5_en.md new file mode 100644 index 00000000000000..b25f9e34e098c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-imdb_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English imdb_5 DistilBertForSequenceClassification from draghicivlad +author: John Snow Labs +name: imdb_5 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdb_5` is a English model originally trained by draghicivlad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdb_5_en_5.5.0_3.0_1726980734662.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdb_5_en_5.5.0_3.0_1726980734662.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+
+spark = sparknlp.start()
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("imdb_5","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// assumes an active SparkSession named `spark`
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("imdb_5", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
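+To read the predicted label, or to score single strings without building a DataFrame, the fitted `pipelineModel` above can be wrapped in a LightPipeline (sketch):
+
+```python
+from sparknlp.base import LightPipeline
+
+# Predicted class per input row.
+pipelineDF.select("text", "class.result").show(truncate=False)
+
+# Driver-side inference on a single string; the dict is keyed by output column names.
+light = LightPipeline(pipelineModel)
+print(light.annotate("I love spark-nlp")["class"])
+```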
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdb_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/draghicivlad/imdb_5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-irony_italian_it.md b/docs/_posts/ahmedlone127/2024-09-22-irony_italian_it.md new file mode 100644 index 00000000000000..ae1ace73586a59 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-irony_italian_it.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Italian irony_italian BertForSequenceClassification from aequa-tech +author: John Snow Labs +name: irony_italian +date: 2024-09-22 +tags: [it, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`irony_italian` is a Italian model originally trained by aequa-tech. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/irony_italian_it_5.5.0_3.0_1726976873858.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/irony_italian_it_5.5.0_3.0_1726976873858.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("irony_italian","it") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("irony_italian", "it") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
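+Since this is an Italian irony classifier, the English toy sentence above only exercises the plumbing; a sketch with Italian input follows (the sentence is illustrative, not taken from the card):
+
+```python
+# Reuses `spark` and the `pipeline` defined in the snippet above.
+data_it = spark.createDataFrame([["Che bella giornata, piove da tre ore."]]).toDF("text")
+pipeline.fit(data_it).transform(data_it).select("text", "class.result").show(truncate=False)
+```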
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|irony_italian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|it| +|Size:|691.9 MB| + +## References + +https://huggingface.co/aequa-tech/irony-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-irony_italian_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-22-irony_italian_pipeline_it.md new file mode 100644 index 00000000000000..86448aae8736f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-irony_italian_pipeline_it.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Italian irony_italian_pipeline pipeline BertForSequenceClassification from aequa-tech +author: John Snow Labs +name: irony_italian_pipeline +date: 2024-09-22 +tags: [it, open_source, pipeline, onnx] +task: Text Classification +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`irony_italian_pipeline` is a Italian model originally trained by aequa-tech. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/irony_italian_pipeline_it_5.5.0_3.0_1726976904272.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/irony_italian_pipeline_it_5.5.0_3.0_1726976904272.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("irony_italian_pipeline", lang = "it") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("irony_italian_pipeline", lang = "it") +val annotations = pipeline.transform(df) + +``` +
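+A fuller sketch of the Python call above, with the missing import and an Italian input `df`; the `text` input column and the `class` output column are assumptions about how the bundled stages were configured.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+df = spark.createDataFrame([["Che bella giornata, piove da tre ore."]]).toDF("text")
+
+pipeline = PretrainedPipeline("irony_italian_pipeline", lang="it")
+pipeline.transform(df).select("class.result").show(truncate=False)
+```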
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|irony_italian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|691.9 MB| + +## References + +https://huggingface.co/aequa-tech/irony-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-kgi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-kgi_pipeline_en.md new file mode 100644 index 00000000000000..dc6a5b8a33f1de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-kgi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English kgi_pipeline pipeline DistilBertForSequenceClassification from shrikant11 +author: John Snow Labs +name: kgi_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kgi_pipeline` is a English model originally trained by shrikant11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kgi_pipeline_en_5.5.0_3.0_1727033271387.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kgi_pipeline_en_5.5.0_3.0_1727033271387.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("kgi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("kgi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
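+For quick, driver-side checks the pretrained pipeline can also be called through `annotate`, which returns a plain Python dict keyed by the pipeline's output columns (sketch):
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+sparknlp.start()
+pipeline = PretrainedPipeline("kgi_pipeline", lang="en")
+print(pipeline.annotate("I love spark-nlp"))
+```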
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kgi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/shrikant11/KGI + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-kitchen_applinces_bert_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-22-kitchen_applinces_bert_classifier_en.md new file mode 100644 index 00000000000000..14d88f6cef24aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-kitchen_applinces_bert_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English kitchen_applinces_bert_classifier DistilBertForSequenceClassification from decepticonsIsAllYouNeed +author: John Snow Labs +name: kitchen_applinces_bert_classifier +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kitchen_applinces_bert_classifier` is a English model originally trained by decepticonsIsAllYouNeed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kitchen_applinces_bert_classifier_en_5.5.0_3.0_1727033567569.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kitchen_applinces_bert_classifier_en_5.5.0_3.0_1727033567569.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("kitchen_applinces_bert_classifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("kitchen_applinces_bert_classifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kitchen_applinces_bert_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.8 MB| + +## References + +https://huggingface.co/decepticonsIsAllYouNeed/kitchen_applinces_bert_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-legal_base_v1_5__checkpoint2_en.md b/docs/_posts/ahmedlone127/2024-09-22-legal_base_v1_5__checkpoint2_en.md new file mode 100644 index 00000000000000..fbd55f79e9292b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-legal_base_v1_5__checkpoint2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English legal_base_v1_5__checkpoint2 RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: legal_base_v1_5__checkpoint2 +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_base_v1_5__checkpoint2` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_base_v1_5__checkpoint2_en_5.5.0_3.0_1727041968560.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_base_v1_5__checkpoint2_en_5.5.0_3.0_1727041968560.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("legal_base_v1_5__checkpoint2","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("legal_base_v1_5__checkpoint2","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
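+If plain Spark vectors are more convenient than annotation structs, an EmbeddingsFinisher can be applied to the `pipelineDF` produced above (sketch):
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+finisher.transform(pipelineDF).select("finished_embeddings").show(1, truncate=80)
+```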
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_base_v1_5__checkpoint2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|296.5 MB| + +## References + +https://huggingface.co/eduagarcia-temp/legal_base_v1_5__checkpoint2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-llm_practice001_en.md b/docs/_posts/ahmedlone127/2024-09-22-llm_practice001_en.md new file mode 100644 index 00000000000000..81797c2799908f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-llm_practice001_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English llm_practice001 DistilBertForSequenceClassification from JiAYu1997 +author: John Snow Labs +name: llm_practice001 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llm_practice001` is a English model originally trained by JiAYu1997. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llm_practice001_en_5.5.0_3.0_1727020505317.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llm_practice001_en_5.5.0_3.0_1727020505317.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("llm_practice001","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("llm_practice001", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llm_practice001| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/JiAYu1997/LLM_Practice001 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-minilmv2_l6_h384_from_bert_large_mrqa_en.md b/docs/_posts/ahmedlone127/2024-09-22-minilmv2_l6_h384_from_bert_large_mrqa_en.md new file mode 100644 index 00000000000000..bbb8cb0afd916c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-minilmv2_l6_h384_from_bert_large_mrqa_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English minilmv2_l6_h384_from_bert_large_mrqa BertForQuestionAnswering from VMware +author: John Snow Labs +name: minilmv2_l6_h384_from_bert_large_mrqa +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`minilmv2_l6_h384_from_bert_large_mrqa` is a English model originally trained by VMware. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/minilmv2_l6_h384_from_bert_large_mrqa_en_5.5.0_3.0_1726991819674.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/minilmv2_l6_h384_from_bert_large_mrqa_en_5.5.0_3.0_1726991819674.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+import sparknlp
+from pyspark.ml import Pipeline
+from sparknlp.base import MultiDocumentAssembler
+from sparknlp.annotator import BertForQuestionAnswering
+
+spark = sparknlp.start()
+
+documentAssembler = MultiDocumentAssembler() \
+     .setInputCols(["question", "context"]) \
+     .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("minilmv2_l6_h384_from_bert_large_mrqa","en") \
+     .setInputCols(["document_question","document_context"]) \
+     .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// assumes an active SparkSession named `spark`
+import com.johnsnowlabs.nlp.MultiDocumentAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new MultiDocumentAssembler()
+     .setInputCols(Array("question", "context"))
+     .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("minilmv2_l6_h384_from_bert_large_mrqa", "en")
+     .setInputCols(Array("document_question","document_context"))
+     .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
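+The extracted answer span ends up in the `answer` column declared above; a one-line sketch for reading it:
+
+```python
+# One predicted answer per question/context pair.
+pipelineDF.select("answer.result").show(truncate=False)
+```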
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|minilmv2_l6_h384_from_bert_large_mrqa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|84.3 MB| + +## References + +https://huggingface.co/VMware/minilmv2-l6-h384-from-bert-large-mrqa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-minilmv2_l6_h384_from_bert_large_mrqa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-minilmv2_l6_h384_from_bert_large_mrqa_pipeline_en.md new file mode 100644 index 00000000000000..073162d3dc6f72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-minilmv2_l6_h384_from_bert_large_mrqa_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English minilmv2_l6_h384_from_bert_large_mrqa_pipeline pipeline BertForQuestionAnswering from VMware +author: John Snow Labs +name: minilmv2_l6_h384_from_bert_large_mrqa_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`minilmv2_l6_h384_from_bert_large_mrqa_pipeline` is a English model originally trained by VMware. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/minilmv2_l6_h384_from_bert_large_mrqa_pipeline_en_5.5.0_3.0_1726991824080.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/minilmv2_l6_h384_from_bert_large_mrqa_pipeline_en_5.5.0_3.0_1726991824080.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("minilmv2_l6_h384_from_bert_large_mrqa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("minilmv2_l6_h384_from_bert_large_mrqa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
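+An end-to-end sketch of the call above; the `question`/`context` input column names are an assumption about how the bundled MultiDocumentAssembler was configured, and the `answer` output column mirrors the standalone model card.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+df = spark.createDataFrame(
+    [["What framework do I use?", "I use spark-nlp."]]
+).toDF("question", "context")
+
+pipeline = PretrainedPipeline("minilmv2_l6_h384_from_bert_large_mrqa_pipeline", lang="en")
+pipeline.transform(df).select("answer.result").show(truncate=False)
+```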
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|minilmv2_l6_h384_from_bert_large_mrqa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|84.3 MB| + +## References + +https://huggingface.co/VMware/minilmv2-l6-h384-from-bert-large-mrqa + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-mmlu_physics_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-22-mmlu_physics_classifier_en.md new file mode 100644 index 00000000000000..2260a5ad6c0767 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-mmlu_physics_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mmlu_physics_classifier RoBertaForSequenceClassification from chrisliu298 +author: John Snow Labs +name: mmlu_physics_classifier +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mmlu_physics_classifier` is a English model originally trained by chrisliu298. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mmlu_physics_classifier_en_5.5.0_3.0_1727026588008.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mmlu_physics_classifier_en_5.5.0_3.0_1727026588008.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("mmlu_physics_classifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("mmlu_physics_classifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
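+Besides the winning label, each `class` annotation produced above carries the per-label scores in its metadata; a sketch for surfacing both:
+
+```python
+from pyspark.sql import functions as F
+
+pipelineDF.select(F.explode("class").alias("c")) \
+          .select("c.result", "c.metadata") \
+          .show(truncate=False)
+```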
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mmlu_physics_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|439.3 MB| + +## References + +https://huggingface.co/chrisliu298/mmlu-physics_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-model_sentence_entailment_hackaton_coliee_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-model_sentence_entailment_hackaton_coliee_pipeline_en.md new file mode 100644 index 00000000000000..a94e1ed8dce1d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-model_sentence_entailment_hackaton_coliee_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_sentence_entailment_hackaton_coliee_pipeline pipeline RoBertaForSequenceClassification from ludoviciarraga +author: John Snow Labs +name: model_sentence_entailment_hackaton_coliee_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_sentence_entailment_hackaton_coliee_pipeline` is a English model originally trained by ludoviciarraga. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_sentence_entailment_hackaton_coliee_pipeline_en_5.5.0_3.0_1726967688097.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_sentence_entailment_hackaton_coliee_pipeline_en_5.5.0_3.0_1726967688097.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("model_sentence_entailment_hackaton_coliee_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("model_sentence_entailment_hackaton_coliee_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_sentence_entailment_hackaton_coliee_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ludoviciarraga/model_sentence_entailment_hackaton_coliee + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-mpoclassification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-mpoclassification_pipeline_en.md new file mode 100644 index 00000000000000..402a3017699c32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-mpoclassification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mpoclassification_pipeline pipeline DistilBertForSequenceClassification from inXistant +author: John Snow Labs +name: mpoclassification_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mpoclassification_pipeline` is a English model originally trained by inXistant. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mpoclassification_pipeline_en_5.5.0_3.0_1727033901341.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mpoclassification_pipeline_en_5.5.0_3.0_1727033901341.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mpoclassification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mpoclassification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mpoclassification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/inXistant/MPOClassification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-multiwd_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-multiwd_pipeline_en.md new file mode 100644 index 00000000000000..2b072f49819893 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-multiwd_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English multiwd_pipeline pipeline BertForSequenceClassification from Tianlin668 +author: John Snow Labs +name: multiwd_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multiwd_pipeline` is a English model originally trained by Tianlin668. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multiwd_pipeline_en_5.5.0_3.0_1727030531152.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multiwd_pipeline_en_5.5.0_3.0_1727030531152.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("multiwd_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("multiwd_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multiwd_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.9 MB| + +## References + +https://huggingface.co/Tianlin668/MultiWD + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-mysterious_bouncy_flan_2_en.md b/docs/_posts/ahmedlone127/2024-09-22-mysterious_bouncy_flan_2_en.md new file mode 100644 index 00000000000000..5d2d649d26bbe0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-mysterious_bouncy_flan_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mysterious_bouncy_flan_2 DistilBertForSequenceClassification from gaodrew +author: John Snow Labs +name: mysterious_bouncy_flan_2 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mysterious_bouncy_flan_2` is a English model originally trained by gaodrew. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mysterious_bouncy_flan_2_en_5.5.0_3.0_1727012813647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mysterious_bouncy_flan_2_en_5.5.0_3.0_1727012813647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("mysterious_bouncy_flan_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("mysterious_bouncy_flan_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mysterious_bouncy_flan_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gaodrew/mysterious-bouncy-flan-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst2_padding50model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst2_padding50model_pipeline_en.md new file mode 100644 index 00000000000000..45e97a85a63079 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst2_padding50model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_distilbert_sst2_padding50model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_sst2_padding50model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_sst2_padding50model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_sst2_padding50model_pipeline_en_5.5.0_3.0_1727020802858.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_sst2_padding50model_pipeline_en_5.5.0_3.0_1727020802858.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("n_distilbert_sst2_padding50model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("n_distilbert_sst2_padding50model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_sst2_padding50model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_sst2_padding50model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst5_padding10model_realgon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst5_padding10model_realgon_pipeline_en.md new file mode 100644 index 00000000000000..79e33f8ce1006e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst5_padding10model_realgon_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_distilbert_sst5_padding10model_realgon_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_sst5_padding10model_realgon_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_sst5_padding10model_realgon_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding10model_realgon_pipeline_en_5.5.0_3.0_1727033711955.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding10model_realgon_pipeline_en_5.5.0_3.0_1727033711955.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("n_distilbert_sst5_padding10model_realgon_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("n_distilbert_sst5_padding10model_realgon_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_sst5_padding10model_realgon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_sst5_padding10model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ner_gec_roberta_v3_en.md b/docs/_posts/ahmedlone127/2024-09-22-ner_gec_roberta_v3_en.md new file mode 100644 index 00000000000000..12f6965f6d1c38 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ner_gec_roberta_v3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_gec_roberta_v3 RoBertaForTokenClassification from fursov +author: John Snow Labs +name: ner_gec_roberta_v3 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_gec_roberta_v3` is a English model originally trained by fursov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_gec_roberta_v3_en_5.5.0_3.0_1727048525709.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_gec_roberta_v3_en_5.5.0_3.0_1727048525709.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("ner_gec_roberta_v3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("ner_gec_roberta_v3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
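+To group the per-token tags emitted above into entity chunks, a NerConverter stage can be appended; the sketch below assumes the `ner` labels follow the usual BIO scheme.
+
+```python
+from sparknlp.annotator import NerConverter
+
+converter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("ner_chunk")
+
+converter.transform(pipelineDF).select("ner_chunk.result").show(truncate=False)
+```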
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_gec_roberta_v3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|439.1 MB| + +## References + +https://huggingface.co/fursov/ner-gec-roberta-v3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-nlp2_base_3e_4_fixed_en.md b/docs/_posts/ahmedlone127/2024-09-22-nlp2_base_3e_4_fixed_en.md new file mode 100644 index 00000000000000..f3689c7482d605 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-nlp2_base_3e_4_fixed_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nlp2_base_3e_4_fixed DistilBertForSequenceClassification from VRT-2428211 +author: John Snow Labs +name: nlp2_base_3e_4_fixed +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp2_base_3e_4_fixed` is a English model originally trained by VRT-2428211. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp2_base_3e_4_fixed_en_5.5.0_3.0_1727033713332.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp2_base_3e_4_fixed_en_5.5.0_3.0_1727033713332.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp2_base_3e_4_fixed","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp2_base_3e_4_fixed", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp2_base_3e_4_fixed| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/VRT-2428211/NLP2_Base_3e-4_Fixed \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-p_model_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-p_model_2_pipeline_en.md new file mode 100644 index 00000000000000..77c3d49c98d0c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-p_model_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English p_model_2_pipeline pipeline DistilBertForSequenceClassification from Habaznya +author: John Snow Labs +name: p_model_2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`p_model_2_pipeline` is a English model originally trained by Habaznya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/p_model_2_pipeline_en_5.5.0_3.0_1727012583305.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/p_model_2_pipeline_en_5.5.0_3.0_1727012583305.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("p_model_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("p_model_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
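The `df` referenced above is assumed to exist already. Below is a minimal, illustrative sketch of preparing that input and reading the predictions; it assumes an active Spark NLP session, that the pipeline's DocumentAssembler reads a `text` column, and the sample sentence is only a placeholder:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()  # start or attach a Spark NLP session

pipeline = PretrainedPipeline("p_model_2_pipeline", lang = "en")
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # assumed input column name
pipeline.transform(df).show(truncate = False)

# annotate() returns a dict keyed by each stage's output column, which avoids
# having to know the classifier's output column name in advance.
print(pipeline.annotate("I love spark-nlp"))
```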
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|p_model_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/Habaznya/p_model_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-platzi_distilroberta_base_mrpc_glue_ricardo_talavera_en.md b/docs/_posts/ahmedlone127/2024-09-22-platzi_distilroberta_base_mrpc_glue_ricardo_talavera_en.md new file mode 100644 index 00000000000000..dab036b6654e72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-platzi_distilroberta_base_mrpc_glue_ricardo_talavera_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English platzi_distilroberta_base_mrpc_glue_ricardo_talavera RoBertaForSequenceClassification from ricardotalavera +author: John Snow Labs +name: platzi_distilroberta_base_mrpc_glue_ricardo_talavera +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_mrpc_glue_ricardo_talavera` is a English model originally trained by ricardotalavera. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_ricardo_talavera_en_5.5.0_3.0_1727026768137.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_ricardo_talavera_en_5.5.0_3.0_1727026768137.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
from pyspark.ml import Pipeline

spark = sparknlp.start()  # start or reuse a Spark NLP session

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_ricardo_talavera","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_ricardo_talavera", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_mrpc_glue_ricardo_talavera| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/ricardotalavera/platzi-distilroberta-base-mrpc-glue-ricardo-talavera \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-proposed_mediumf_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-proposed_mediumf_model_pipeline_en.md new file mode 100644 index 00000000000000..f9fc667b6f9be8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-proposed_mediumf_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English proposed_mediumf_model_pipeline pipeline RoBertaEmbeddings from athar +author: John Snow Labs +name: proposed_mediumf_model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`proposed_mediumf_model_pipeline` is a English model originally trained by athar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/proposed_mediumf_model_pipeline_en_5.5.0_3.0_1727041567718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/proposed_mediumf_model_pipeline_en_5.5.0_3.0_1727041567718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("proposed_mediumf_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("proposed_mediumf_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
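Since this pipeline produces token embeddings rather than labels, `fullAnnotate` is the easier way to reach the vectors, which are stored on each annotation rather than in its text result. A hedged sketch, assuming a `text` input column and the conventional `embeddings` output name:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()  # start or attach a Spark NLP session

pipeline = PretrainedPipeline("proposed_mediumf_model_pipeline", lang = "en")
result = pipeline.fullAnnotate("I love spark-nlp")[0]
print(result.keys())  # output columns of the included stages

# Token-level vectors, assuming the embeddings stage keeps the usual "embeddings"
# output name (verify against the keys printed above).
vectors = [ann.embeddings for ann in result.get("embeddings", [])]
```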
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|proposed_mediumf_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|364.7 MB| + +## References + +https://huggingface.co/athar/proposed_MEDIUMF-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-pytorch_distilbert3_fallsclassifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-pytorch_distilbert3_fallsclassifier_pipeline_en.md new file mode 100644 index 00000000000000..340cba8fa21a63 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-pytorch_distilbert3_fallsclassifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English pytorch_distilbert3_fallsclassifier_pipeline pipeline DistilBertForSequenceClassification from Blaise-MR +author: John Snow Labs +name: pytorch_distilbert3_fallsclassifier_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pytorch_distilbert3_fallsclassifier_pipeline` is a English model originally trained by Blaise-MR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pytorch_distilbert3_fallsclassifier_pipeline_en_5.5.0_3.0_1726980013810.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pytorch_distilbert3_fallsclassifier_pipeline_en_5.5.0_3.0_1726980013810.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("pytorch_distilbert3_fallsclassifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("pytorch_distilbert3_fallsclassifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
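A short end-to-end sketch for this classifier pipeline, assuming a running Spark NLP session and a `text` input column; the example sentence is illustrative only:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("pytorch_distilbert3_fallsclassifier_pipeline", lang = "en")
df = spark.createDataFrame([["The patient slipped on the wet floor and fell."]]).toDF("text")
pipeline.transform(df).show(truncate = False)

# dict keyed by each stage's output column
print(pipeline.annotate("The patient slipped on the wet floor and fell."))
```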
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pytorch_distilbert3_fallsclassifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Blaise-MR/pytorch_distilbert3_fallsclassifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-qa_bert_base_multilingual_cased_finetuned_squad_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-22-qa_bert_base_multilingual_cased_finetuned_squad_pipeline_xx.md new file mode 100644 index 00000000000000..3b6b961e010c8d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-qa_bert_base_multilingual_cased_finetuned_squad_pipeline_xx.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Multilingual qa_bert_base_multilingual_cased_finetuned_squad_pipeline pipeline BertForQuestionAnswering from itsamitkumar +author: John Snow Labs +name: qa_bert_base_multilingual_cased_finetuned_squad_pipeline +date: 2024-09-22 +tags: [xx, open_source, pipeline, onnx] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_bert_base_multilingual_cased_finetuned_squad_pipeline` is a Multilingual model originally trained by itsamitkumar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_bert_base_multilingual_cased_finetuned_squad_pipeline_xx_5.5.0_3.0_1727049250023.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_bert_base_multilingual_cased_finetuned_squad_pipeline_xx_5.5.0_3.0_1727049250023.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("qa_bert_base_multilingual_cased_finetuned_squad_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("qa_bert_base_multilingual_cased_finetuned_squad_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
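Because this pipeline starts with a MultiDocumentAssembler, it expects a question and a context rather than a single `text` column. The exact input column names are not listed on this card, so the sketch below first inspects the fitted stages and then assumes `question`/`context`; treat those names as placeholders to verify:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("qa_bert_base_multilingual_cased_finetuned_squad_pipeline", lang = "xx")
print(pipeline.model.stages)  # inspect the stages to confirm the expected input columns

# Assumed column names -- adjust if the stages above report different ones.
df = spark.createDataFrame(
    [["What is Spark NLP?", "Spark NLP is an NLP library built on Apache Spark."]]
).toDF("question", "context")
pipeline.transform(df).show(truncate = False)
```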
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_bert_base_multilingual_cased_finetuned_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/itsamitkumar/qa_bert-base-multilingual-cased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-reward_model_en.md b/docs/_posts/ahmedlone127/2024-09-22-reward_model_en.md new file mode 100644 index 00000000000000..ceee99d26db197 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-reward_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English reward_model RoBertaForSequenceClassification from lillybak +author: John Snow Labs +name: reward_model +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`reward_model` is a English model originally trained by lillybak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/reward_model_en_5.5.0_3.0_1727037836585.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/reward_model_en_5.5.0_3.0_1727037836585.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
from pyspark.ml import Pipeline

spark = sparknlp.start()  # start or reuse a Spark NLP session

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("reward_model","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("reward_model", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|reward_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/lillybak/reward_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-robbert_emotions_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-robbert_emotions_pipeline_en.md new file mode 100644 index 00000000000000..5f3e59e33d0735 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-robbert_emotions_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robbert_emotions_pipeline pipeline RoBertaForSequenceClassification from rroell +author: John Snow Labs +name: robbert_emotions_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_emotions_pipeline` is a English model originally trained by rroell. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_emotions_pipeline_en_5.5.0_3.0_1727026887748.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_emotions_pipeline_en_5.5.0_3.0_1727026887748.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robbert_emotions_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robbert_emotions_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
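As in the other classification cards, `df` must be built first. A minimal sketch, assuming a `text` input column; the sentence is just an illustrative emotion-bearing example:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("robbert_emotions_pipeline", lang = "en")
df = spark.createDataFrame([["I am so happy with these results!"]]).toDF("text")
pipeline.transform(df).show(truncate = False)
print(pipeline.annotate("I am so happy with these results!"))  # dict keyed by stage output columns
```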
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_emotions_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.0 MB| + +## References + +https://huggingface.co/rroell/RoBBERT-emotions + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_dark_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_dark_en.md new file mode 100644 index 00000000000000..22b17a3476494e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_dark_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_dark RoBertaForSequenceClassification from geektech +author: John Snow Labs +name: roberta_base_finetuned_dark +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_dark` is a English model originally trained by geektech. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_dark_en_5.5.0_3.0_1727026437444.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_dark_en_5.5.0_3.0_1727026437444.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
from pyspark.ml import Pipeline

spark = sparknlp.start()  # start or reuse a Spark NLP session

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_dark","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_dark", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_dark| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|423.3 MB| + +## References + +https://huggingface.co/geektech/roberta-base-finetuned-dark \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_hoax_classifier_defs_1h2r_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_hoax_classifier_defs_1h2r_pipeline_en.md new file mode 100644 index 00000000000000..3b7f749560b0b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_hoax_classifier_defs_1h2r_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_hoax_classifier_defs_1h2r_pipeline pipeline RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_base_hoax_classifier_defs_1h2r_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_hoax_classifier_defs_1h2r_pipeline` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_hoax_classifier_defs_1h2r_pipeline_en_5.5.0_3.0_1727037412878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_hoax_classifier_defs_1h2r_pipeline_en_5.5.0_3.0_1727037412878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_hoax_classifier_defs_1h2r_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_hoax_classifier_defs_1h2r_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
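A brief usage sketch, assuming a `text` input column and an active Spark NLP session; the claim below is only a placeholder input:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("roberta_base_hoax_classifier_defs_1h2r_pipeline", lang = "en")
df = spark.createDataFrame([["The article claims the event never happened."]]).toDF("text")
pipeline.transform(df).show(truncate = False)
print(pipeline.annotate("The article claims the event never happened."))  # dict of stage outputs
```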
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_hoax_classifier_defs_1h2r_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|432.7 MB| + +## References + +https://huggingface.co/research-dump/roberta-base_hoax_classifier_defs_1h2r + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_md_gender_bias_trained_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_md_gender_bias_trained_pipeline_en.md new file mode 100644 index 00000000000000..5d0dcb4b6a0a31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_md_gender_bias_trained_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_md_gender_bias_trained_pipeline pipeline RoBertaForSequenceClassification from JakobKaiser +author: John Snow Labs +name: roberta_base_md_gender_bias_trained_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_md_gender_bias_trained_pipeline` is a English model originally trained by JakobKaiser. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_md_gender_bias_trained_pipeline_en_5.5.0_3.0_1727017119375.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_md_gender_bias_trained_pipeline_en_5.5.0_3.0_1727017119375.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_md_gender_bias_trained_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_md_gender_bias_trained_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
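The same pattern applies here; a minimal sketch assuming a `text` input column and an active Spark NLP session:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("roberta_base_md_gender_bias_trained_pipeline", lang = "en")
df = spark.createDataFrame([["She is a brilliant engineer."]]).toDF("text")
pipeline.transform(df).show(truncate = False)
print(pipeline.annotate("She is a brilliant engineer."))  # dict keyed by stage output columns
```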
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_md_gender_bias_trained_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|436.0 MB| + +## References + +https://huggingface.co/JakobKaiser/roberta-base-md_gender_bias-trained + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_plausibility_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_plausibility_en.md new file mode 100644 index 00000000000000..2f58898c2ac32d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_plausibility_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_plausibility RoBertaForSequenceClassification from ianporada +author: John Snow Labs +name: roberta_base_plausibility +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_plausibility` is a English model originally trained by ianporada. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_plausibility_en_5.5.0_3.0_1727026440179.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_plausibility_en_5.5.0_3.0_1727026440179.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
from pyspark.ml import Pipeline

spark = sparknlp.start()  # start or reuse a Spark NLP session

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_plausibility","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_plausibility", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_plausibility| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|420.3 MB| + +## References + +https://huggingface.co/ianporada/roberta_base_plausibility \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_bert_10_unmalicious_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_bert_10_unmalicious_pipeline_en.md new file mode 100644 index 00000000000000..44b388c74b3f59 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_bert_10_unmalicious_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_bert_10_unmalicious_pipeline pipeline RoBertaEmbeddings from ubaskota +author: John Snow Labs +name: roberta_bert_10_unmalicious_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_bert_10_unmalicious_pipeline` is a English model originally trained by ubaskota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_bert_10_unmalicious_pipeline_en_5.5.0_3.0_1726999645200.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_bert_10_unmalicious_pipeline_en_5.5.0_3.0_1726999645200.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_bert_10_unmalicious_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_bert_10_unmalicious_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
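As with the other embeddings pipelines, the vectors sit on the annotations themselves, so `fullAnnotate` is convenient. A hedged sketch assuming a `text` input column and the conventional `embeddings` output name:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("roberta_bert_10_unmalicious_pipeline", lang = "en")
result = pipeline.fullAnnotate("I love spark-nlp")[0]
print(result.keys())  # confirm the actual output column names
vectors = [ann.embeddings for ann in result.get("embeddings", [])]  # "embeddings" name assumed
```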
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_bert_10_unmalicious_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/ubaskota/roberta_BERT_10_unmalicious + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_cws_ctb6_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_cws_ctb6_en.md new file mode 100644 index 00000000000000..960ca7773f42c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_cws_ctb6_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_cws_ctb6 BertForTokenClassification from tjspross +author: John Snow Labs +name: roberta_cws_ctb6 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_cws_ctb6` is a English model originally trained by tjspross. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_cws_ctb6_en_5.5.0_3.0_1727045698272.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_cws_ctb6_en_5.5.0_3.0_1727045698272.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertForTokenClassification
from pyspark.ml import Pipeline

spark = sparknlp.start()  # start or reuse a Spark NLP session

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("roberta_cws_ctb6","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("roberta_cws_ctb6", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_cws_ctb6| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/tjspross/roberta_cws_ctb6 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_cws_ctb6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_cws_ctb6_pipeline_en.md new file mode 100644 index 00000000000000..2179a17a482cfc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_cws_ctb6_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_cws_ctb6_pipeline pipeline BertForTokenClassification from tjspross +author: John Snow Labs +name: roberta_cws_ctb6_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_cws_ctb6_pipeline` is a English model originally trained by tjspross. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_cws_ctb6_pipeline_en_5.5.0_3.0_1727045755776.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_cws_ctb6_pipeline_en_5.5.0_3.0_1727045755776.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_cws_ctb6_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_cws_ctb6_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
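Although the card lists the language as `en`, the model name points to Chinese word segmentation on CTB6, so a Chinese sentence is used below purely for illustration. The sketch assumes a `text` input column and an active Spark NLP session:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("roberta_cws_ctb6_pipeline", lang = "en")
df = spark.createDataFrame([["我喜欢自然语言处理"]]).toDF("text")  # illustrative Chinese sample
pipeline.transform(df).show(truncate = False)
print(pipeline.annotate("我喜欢自然语言处理"))  # dict keyed by each stage's output column
```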
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_cws_ctb6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/tjspross/roberta_cws_ctb6 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_large_mnli_model3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_large_mnli_model3_pipeline_en.md new file mode 100644 index 00000000000000..794907ad110878 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_large_mnli_model3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_mnli_model3_pipeline pipeline RoBertaForSequenceClassification from varun-v-rao +author: John Snow Labs +name: roberta_large_mnli_model3_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_mnli_model3_pipeline` is a English model originally trained by varun-v-rao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_mnli_model3_pipeline_en_5.5.0_3.0_1727038084170.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_mnli_model3_pipeline_en_5.5.0_3.0_1727038084170.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_mnli_model3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_mnli_model3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
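Note that the included pipeline takes a single text column, so for an MNLI-style model the premise and hypothesis would have to be supplied as one string; how they should be joined is not documented on this card. A minimal sketch under that caveat, assuming a `text` input column:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("roberta_large_mnli_model3_pipeline", lang = "en")
df = spark.createDataFrame([["A man is playing a guitar. A person is making music."]]).toDF("text")
pipeline.transform(df).show(truncate = False)
print(pipeline.annotate("A man is playing a guitar. A person is making music."))
```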
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_mnli_model3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/varun-v-rao/roberta-large-mnli-model3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_astro_hep_bert_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_astro_hep_bert_en.md new file mode 100644 index 00000000000000..474e149420dee4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_astro_hep_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_astro_hep_bert BertSentenceEmbeddings from arnosimons +author: John Snow Labs +name: sent_astro_hep_bert +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_astro_hep_bert` is a English model originally trained by arnosimons. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_astro_hep_bert_en_5.5.0_3.0_1726964678554.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_astro_hep_bert_en_5.5.0_3.0_1726964678554.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_astro_hep_bert","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_astro_hep_bert","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_astro_hep_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|404.1 MB| + +## References + +https://huggingface.co/arnosimons/astro-hep-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_finnish_europeana_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_finnish_europeana_cased_pipeline_en.md new file mode 100644 index 00000000000000..ef1938c0260ae6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_finnish_europeana_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_finnish_europeana_cased_pipeline pipeline BertSentenceEmbeddings from dbmdz +author: John Snow Labs +name: sent_bert_base_finnish_europeana_cased_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_finnish_europeana_cased_pipeline` is a English model originally trained by dbmdz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_finnish_europeana_cased_pipeline_en_5.5.0_3.0_1727013611484.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_finnish_europeana_cased_pipeline_en_5.5.0_3.0_1727013611484.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_finnish_europeana_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_finnish_europeana_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
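For this sentence-embedding pipeline the vectors are attached to each sentence annotation, so `fullAnnotate` is the simplest entry point. A hedged sketch, assuming a `text` input column and the usual `embeddings` output name:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("sent_bert_base_finnish_europeana_cased_pipeline", lang = "en")
result = pipeline.fullAnnotate("I love spark-nlp")[0]
print(result.keys())  # output columns of the four included stages
sentence_vectors = [ann.embeddings for ann in result.get("embeddings", [])]  # name assumed
```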
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_finnish_europeana_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.9 MB| + +## References + +https://huggingface.co/dbmdz/bert-base-finnish-europeana-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_cased_finetuned_swiss_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_cased_finetuned_swiss_pipeline_de.md new file mode 100644 index 00000000000000..d87e16d4b4a069 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_cased_finetuned_swiss_pipeline_de.md @@ -0,0 +1,71 @@ +--- +layout: model +title: German sent_bert_base_german_cased_finetuned_swiss_pipeline pipeline BertSentenceEmbeddings from statworx +author: John Snow Labs +name: sent_bert_base_german_cased_finetuned_swiss_pipeline +date: 2024-09-22 +tags: [de, open_source, pipeline, onnx] +task: Embeddings +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_german_cased_finetuned_swiss_pipeline` is a German model originally trained by statworx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_german_cased_finetuned_swiss_pipeline_de_5.5.0_3.0_1727047063507.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_german_cased_finetuned_swiss_pipeline_de_5.5.0_3.0_1727047063507.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_german_cased_finetuned_swiss_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_german_cased_finetuned_swiss_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
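The same pattern applies here, with a German sentence as an illustrative input; the `text` input column and the `embeddings` output name are conventional defaults, not confirmed by the card:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("sent_bert_base_german_cased_finetuned_swiss_pipeline", lang = "de")
result = pipeline.fullAnnotate("Ich liebe Spark NLP.")[0]
print(result.keys())  # confirm the actual output column names
sentence_vectors = [ann.embeddings for ann in result.get("embeddings", [])]
```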
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_german_cased_finetuned_swiss_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|407.4 MB| + +## References + +https://huggingface.co/statworx/bert-base-german-cased-finetuned-swiss + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_multilingual_cased_finetuned_swahili_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_multilingual_cased_finetuned_swahili_pipeline_xx.md new file mode 100644 index 00000000000000..fba8ba76e20848 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_multilingual_cased_finetuned_swahili_pipeline_xx.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Multilingual sent_bert_base_multilingual_cased_finetuned_swahili_pipeline pipeline BertSentenceEmbeddings from Davlan +author: John Snow Labs +name: sent_bert_base_multilingual_cased_finetuned_swahili_pipeline +date: 2024-09-22 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_multilingual_cased_finetuned_swahili_pipeline` is a Multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_swahili_pipeline_xx_5.5.0_3.0_1727001928114.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_swahili_pipeline_xx_5.5.0_3.0_1727001928114.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_multilingual_cased_finetuned_swahili_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_multilingual_cased_finetuned_swahili_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
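A minimal sketch for the multilingual sentence-embedding pipeline; the Swahili sentence is illustrative only, and the `text` input column and `embeddings` output name are assumptions:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("sent_bert_base_multilingual_cased_finetuned_swahili_pipeline", lang = "xx")
result = pipeline.fullAnnotate("Ninapenda Spark NLP.")[0]
print(result.keys())  # confirm the actual output column names
sentence_vectors = [ann.embeddings for ann in result.get("embeddings", [])]
```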
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_multilingual_cased_finetuned_swahili_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|664.7 MB| + +## References + +https://huggingface.co/Davlan/bert-base-multilingual-cased-finetuned-swahili + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_bible_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_bible_en.md new file mode 100644 index 00000000000000..06980b4b50c23a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_bible_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_bible BertSentenceEmbeddings from Pragash-Mohanarajah +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_bible +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_bible` is a English model originally trained by Pragash-Mohanarajah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_bible_en_5.5.0_3.0_1727001517315.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_bible_en_5.5.0_3.0_1727001517315.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_bible","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_bible","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_bible| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/Pragash-Mohanarajah/bert-base-uncased-finetuned-bible \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_transformersbook_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_transformersbook_en.md new file mode 100644 index 00000000000000..297032ee0aff4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_transformersbook_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_transformersbook BertSentenceEmbeddings from transformersbook +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_transformersbook +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_transformersbook` is a English model originally trained by transformersbook. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_transformersbook_en_5.5.0_3.0_1727013623210.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_transformersbook_en_5.5.0_3.0_1727013623210.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_transformersbook","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_transformersbook","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_transformersbook| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/transformersbook/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bioptimus_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bioptimus_pipeline_en.md new file mode 100644 index 00000000000000..d8ce22ac4bcccc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bioptimus_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bioptimus_pipeline pipeline BertSentenceEmbeddings from rttl-ai +author: John Snow Labs +name: sent_bioptimus_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bioptimus_pipeline` is a English model originally trained by rttl-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bioptimus_pipeline_en_5.5.0_3.0_1727047088568.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bioptimus_pipeline_en_5.5.0_3.0_1727047088568.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bioptimus_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bioptimus_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
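Given the biomedical origin of BIOptimus, a clinical-style sentence is used below for illustration; as elsewhere, the `text` input column and `embeddings` output name are assumptions:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("sent_bioptimus_pipeline", lang = "en")
result = pipeline.fullAnnotate("The patient was started on 5 mg of amlodipine.")[0]
print(result.keys())  # confirm the actual output column names
sentence_vectors = [ann.embeddings for ann in result.get("embeddings", [])]
```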
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bioptimus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.7 MB| + +## References + +https://huggingface.co/rttl-ai/BIOptimus + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_hindi_bpe_bert_test_2m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_hindi_bpe_bert_test_2m_pipeline_en.md new file mode 100644 index 00000000000000..2a7677f5e7bda0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_hindi_bpe_bert_test_2m_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_hindi_bpe_bert_test_2m_pipeline pipeline BertSentenceEmbeddings from rg1683 +author: John Snow Labs +name: sent_hindi_bpe_bert_test_2m_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hindi_bpe_bert_test_2m_pipeline` is a English model originally trained by rg1683. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hindi_bpe_bert_test_2m_pipeline_en_5.5.0_3.0_1727004347253.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hindi_bpe_bert_test_2m_pipeline_en_5.5.0_3.0_1727004347253.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_hindi_bpe_bert_test_2m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_hindi_bpe_bert_test_2m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hindi_bpe_bert_test_2m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|378.3 MB| + +## References + +https://huggingface.co/rg1683/hindi_bpe_bert_test_2m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_ksl_bert_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_ksl_bert_en.md new file mode 100644 index 00000000000000..9e5b0673a3ef86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_ksl_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_ksl_bert BertSentenceEmbeddings from dobbytk +author: John Snow Labs +name: sent_ksl_bert +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_ksl_bert` is a English model originally trained by dobbytk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_ksl_bert_en_5.5.0_3.0_1727047077059.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_ksl_bert_en_5.5.0_3.0_1727047077059.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_ksl_bert","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_ksl_bert","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
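+
+Because each detected sentence gets its own vector in the `embeddings` column, a common follow-up is comparing two sentences. The sketch below is not part of the original card; it collects the vectors from `pipelineDF` (defined in the example above) to the driver and uses NumPy, so it is only suitable for small samples.
+
+```python
+import numpy as np
+
+# Pull the raw float vectors out of the annotation structs.
+rows = pipelineDF.selectExpr("explode(embeddings.embeddings) as vector").collect()
+vectors = [np.array(r["vector"]) for r in rows]
+
+def cosine(a: np.ndarray, b: np.ndarray) -> float:
+    """Cosine similarity between two embedding vectors."""
+    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
+
+if len(vectors) >= 2:
+    print(cosine(vectors[0], vectors[1]))
+```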
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_ksl_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|406.2 MB| + +## References + +https://huggingface.co/dobbytk/KSL-BERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_ksl_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_ksl_bert_pipeline_en.md new file mode 100644 index 00000000000000..599dec27802abe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_ksl_bert_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_ksl_bert_pipeline pipeline BertSentenceEmbeddings from dobbytk +author: John Snow Labs +name: sent_ksl_bert_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_ksl_bert_pipeline` is a English model originally trained by dobbytk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_ksl_bert_pipeline_en_5.5.0_3.0_1727047097969.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_ksl_bert_pipeline_en_5.5.0_3.0_1727047097969.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_ksl_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_ksl_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
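+
+For quick interactive checks it can be easier to skip building a DataFrame altogether: `PretrainedPipeline` also exposes `annotate` and `fullAnnotate` helpers. The sketch below is illustrative; the exact output keys depend on the stages listed under Included Models, so inspect the returned dictionary rather than relying on the names guessed here.
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("sent_ksl_bert_pipeline", lang="en")
+
+# fullAnnotate returns one result per input string, keyed by output column name.
+result = pipeline.fullAnnotate("Spark NLP makes distributed NLP straightforward.")[0]
+print(result.keys())  # expected keys such as document, sentence, token, embeddings (assumed)
+```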
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_ksl_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.7 MB| + +## References + +https://huggingface.co/dobbytk/KSL-BERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentence_classification_bitnet_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentence_classification_bitnet_en.md new file mode 100644 index 00000000000000..abbe265caace01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentence_classification_bitnet_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentence_classification_bitnet DistilBertForSequenceClassification from sanjeev-bhandari01 +author: John Snow Labs +name: sentence_classification_bitnet +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentence_classification_bitnet` is a English model originally trained by sanjeev-bhandari01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentence_classification_bitnet_en_5.5.0_3.0_1726980414810.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentence_classification_bitnet_en_5.5.0_3.0_1726980414810.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentence_classification_bitnet","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentence_classification_bitnet", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
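+
+After `transform`, the predictions live inside the `class` annotation column produced above. A small post-processing sketch (field names follow Spark NLP's Annotation schema; the label strings themselves depend on how this model was fine-tuned, which the card does not state):
+
+```python
+# Each row of "class" is an array of annotations: result holds the predicted label,
+# metadata holds the per-label confidence scores.
+pipelineDF.selectExpr(
+    "text",
+    "`class`.result[0] as predicted_label",
+    "`class`.metadata[0] as scores",
+).show(truncate=False)
+```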
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentence_classification_bitnet| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|248.5 MB| + +## References + +https://huggingface.co/sanjeev-bhandari01/sentence_classification_bitnet \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_preetham04_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_preetham04_en.md new file mode 100644 index 00000000000000..f5f7266b11bc5c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_preetham04_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_preetham04 BertForSequenceClassification from Preetham04 +author: John Snow Labs +name: sentiment_analysis_preetham04 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_preetham04` is a English model originally trained by Preetham04. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_preetham04_en_5.5.0_3.0_1727034100449.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_preetham04_en_5.5.0_3.0_1727034100449.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("sentiment_analysis_preetham04","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("sentiment_analysis_preetham04", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
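+
+For low-latency scoring of individual strings, the fitted `pipelineModel` from the example above can be wrapped in a `LightPipeline`. The example sentences are illustrative, and the returned labels depend on this model's training data, which the card does not document.
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+
+# annotate() accepts a single string or a list of strings and returns plain Python results.
+print(light.annotate("The product exceeded my expectations!")["class"])
+print(light.annotate("Terrible support, would not recommend.")["class"])
+```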
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_preetham04| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Preetham04/sentiment-analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_en.md new file mode 100644 index 00000000000000..030a582ef40375 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020 RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_en_5.5.0_3.0_1727026421434.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_en_5.5.0_3.0_1727026421434.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random0_seed2-twitter-roberta-base-dec2020 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_bertweet_large_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_bertweet_large_en.md new file mode 100644 index 00000000000000..6a6afeec5861ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_bertweet_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_sentiment_small_random3_seed0_bertweet_large RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random3_seed0_bertweet_large +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random3_seed0_bertweet_large` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_bertweet_large_en_5.5.0_3.0_1727037682282.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_bertweet_large_en_5.5.0_3.0_1727037682282.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random3_seed0_bertweet_large","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random3_seed0_bertweet_large", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random3_seed0_bertweet_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random3_seed0-bertweet-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_bertweet_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_bertweet_large_pipeline_en.md new file mode 100644 index 00000000000000..ed537da0cff14c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_bertweet_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_sentiment_small_random3_seed0_bertweet_large_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random3_seed0_bertweet_large_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random3_seed0_bertweet_large_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_bertweet_large_pipeline_en_5.5.0_3.0_1727037780253.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_bertweet_large_pipeline_en_5.5.0_3.0_1727037780253.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_sentiment_small_random3_seed0_bertweet_large_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_sentiment_small_random3_seed0_bertweet_large_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
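+
+At roughly 1.3 GB (see the table below), this pipeline benefits from a generously sized driver. One way to request more memory when starting Spark NLP from Python is sketched here; the `memory` parameter of `sparknlp.start()` is the commonly documented option, but treat the exact value as an assumption to tune for your cluster.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+# Ask for a larger driver heap before loading the ~1.3 GB pipeline.
+spark = sparknlp.start(memory="16G")
+
+pipeline = PretrainedPipeline("sentiment_sentiment_small_random3_seed0_bertweet_large_pipeline", lang="en")
+```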
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random3_seed0_bertweet_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random3_seed0-bertweet-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_roberta_base_en.md new file mode 100644 index 00000000000000..8a028ebaf92777 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_roberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_sentiment_small_random3_seed0_roberta_base RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random3_seed0_roberta_base +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random3_seed0_roberta_base` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_roberta_base_en_5.5.0_3.0_1727026583169.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_roberta_base_en_5.5.0_3.0_1727026583169.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random3_seed0_roberta_base","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random3_seed0_roberta_base", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random3_seed0_roberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|430.5 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random3_seed0-roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed1_bertweet_large_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed1_bertweet_large_en.md new file mode 100644 index 00000000000000..d771b5b5aecffe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed1_bertweet_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_sentiment_small_random3_seed1_bertweet_large RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random3_seed1_bertweet_large +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random3_seed1_bertweet_large` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed1_bertweet_large_en_5.5.0_3.0_1727037875534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed1_bertweet_large_en_5.5.0_3.0_1727037875534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random3_seed1_bertweet_large","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random3_seed1_bertweet_large", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random3_seed1_bertweet_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random3_seed1-bertweet-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed1_bertweet_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed1_bertweet_large_pipeline_en.md new file mode 100644 index 00000000000000..70a364b82d7fc2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed1_bertweet_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_sentiment_small_random3_seed1_bertweet_large_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random3_seed1_bertweet_large_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random3_seed1_bertweet_large_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed1_bertweet_large_pipeline_en_5.5.0_3.0_1727037968650.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed1_bertweet_large_pipeline_en_5.5.0_3.0_1727037968650.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_sentiment_small_random3_seed1_bertweet_large_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_sentiment_small_random3_seed1_bertweet_large_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random3_seed1_bertweet_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random3_seed1-bertweet-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-spam_en.md b/docs/_posts/ahmedlone127/2024-09-22-spam_en.md new file mode 100644 index 00000000000000..223d28120f71ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-spam_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English spam DistilBertForSequenceClassification from Luisdahuis +author: John Snow Labs +name: spam +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spam` is a English model originally trained by Luisdahuis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spam_en_5.5.0_3.0_1727020393173.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spam_en_5.5.0_3.0_1727020393173.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("spam","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("spam", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spam| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Luisdahuis/spam \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-tanya_mama_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-tanya_mama_ner_pipeline_en.md new file mode 100644 index 00000000000000..bec6e4d2a3e527 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-tanya_mama_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tanya_mama_ner_pipeline pipeline XlmRoBertaForTokenClassification from Domo123 +author: John Snow Labs +name: tanya_mama_ner_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tanya_mama_ner_pipeline` is a English model originally trained by Domo123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tanya_mama_ner_pipeline_en_5.5.0_3.0_1727019586437.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tanya_mama_ner_pipeline_en_5.5.0_3.0_1727019586437.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tanya_mama_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tanya_mama_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
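+
+The included token classifier emits one NER tag per token, but this card does not list the pipeline's output column names. The sketch below assumes a conventional `ner` output column; confirm the real schema first and adapt the column name accordingly.
+
+```python
+# Inspect the schema, then unpack the assumed "ner" column into per-token tags.
+annotations.printSchema()
+annotations.selectExpr("explode(ner) as entity") \
+    .selectExpr("entity.begin", "entity.end", "entity.result as tag") \
+    .show(truncate=False)
+```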
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tanya_mama_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|841.6 MB| + +## References + +https://huggingface.co/Domo123/tanya-mama-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-test_glue_en.md b/docs/_posts/ahmedlone127/2024-09-22-test_glue_en.md new file mode 100644 index 00000000000000..68c7fd4022fc56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-test_glue_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_glue DistilBertForSequenceClassification from honghk +author: John Snow Labs +name: test_glue +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_glue` is a English model originally trained by honghk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_glue_en_5.5.0_3.0_1727012749644.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_glue_en_5.5.0_3.0_1727012749644.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_glue","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_glue", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
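+
+The single-sentence example above generalizes directly to batch scoring: reuse the fitted `pipelineModel` on a larger DataFrame and aggregate the predicted labels. The input sentences below are illustrative, and the label values depend on the GLUE task this model was fine-tuned on, which the card does not specify.
+
+```python
+# Score a small batch and look at the predicted label distribution.
+batch = spark.createDataFrame(
+    [["The movie was great."], ["The plot made no sense."], ["An average effort."]], ["text"]
+)
+scored = pipelineModel.transform(batch)
+scored.selectExpr("`class`.result[0] as label").groupBy("label").count().show()
+```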
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_glue| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/honghk/test-glue \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random1_seed1_roberta_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random1_seed1_roberta_large_pipeline_en.md new file mode 100644 index 00000000000000..50118f187bc6fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random1_seed1_roberta_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English topic_topic_random1_seed1_roberta_large_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random1_seed1_roberta_large_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random1_seed1_roberta_large_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random1_seed1_roberta_large_pipeline_en_5.5.0_3.0_1727027029830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random1_seed1_roberta_large_pipeline_en_5.5.0_3.0_1727027029830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("topic_topic_random1_seed1_roberta_large_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("topic_topic_random1_seed1_roberta_large_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random1_seed1_roberta_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random1_seed1-roberta-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random2_seed2_bertweet_large_en.md b/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random2_seed2_bertweet_large_en.md new file mode 100644 index 00000000000000..c56319d3ccc6c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random2_seed2_bertweet_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English topic_topic_random2_seed2_bertweet_large RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random2_seed2_bertweet_large +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random2_seed2_bertweet_large` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random2_seed2_bertweet_large_en_5.5.0_3.0_1727016881218.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random2_seed2_bertweet_large_en_5.5.0_3.0_1727016881218.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("topic_topic_random2_seed2_bertweet_large","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("topic_topic_random2_seed2_bertweet_large", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random2_seed2_bertweet_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random2_seed2-bertweet-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random2_seed2_bertweet_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random2_seed2_bertweet_large_pipeline_en.md new file mode 100644 index 00000000000000..1a8c1edaf62be6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random2_seed2_bertweet_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English topic_topic_random2_seed2_bertweet_large_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random2_seed2_bertweet_large_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random2_seed2_bertweet_large_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random2_seed2_bertweet_large_pipeline_en_5.5.0_3.0_1727016953455.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random2_seed2_bertweet_large_pipeline_en_5.5.0_3.0_1727016953455.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("topic_topic_random2_seed2_bertweet_large_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("topic_topic_random2_seed2_bertweet_large_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random2_seed2_bertweet_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random2_seed2-bertweet-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-transformer_classification_lex_ceo_test_en.md b/docs/_posts/ahmedlone127/2024-09-22-transformer_classification_lex_ceo_test_en.md new file mode 100644 index 00000000000000..aa5f1591bf1b5e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-transformer_classification_lex_ceo_test_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English transformer_classification_lex_ceo_test RoBertaForSequenceClassification from rd-1 +author: John Snow Labs +name: transformer_classification_lex_ceo_test +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`transformer_classification_lex_ceo_test` is a English model originally trained by rd-1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/transformer_classification_lex_ceo_test_en_5.5.0_3.0_1727027006388.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/transformer_classification_lex_ceo_test_en_5.5.0_3.0_1727027006388.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("transformer_classification_lex_ceo_test","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("transformer_classification_lex_ceo_test", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|transformer_classification_lex_ceo_test| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.4 MB| + +## References + +https://huggingface.co/rd-1/transformer_classification_lex_ceo_test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-uned_tfg_08_62_mas_frecuentes_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-uned_tfg_08_62_mas_frecuentes_pipeline_en.md new file mode 100644 index 00000000000000..25257bcaf249ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-uned_tfg_08_62_mas_frecuentes_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English uned_tfg_08_62_mas_frecuentes_pipeline pipeline RoBertaForSequenceClassification from alexisdr +author: John Snow Labs +name: uned_tfg_08_62_mas_frecuentes_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`uned_tfg_08_62_mas_frecuentes_pipeline` is a English model originally trained by alexisdr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/uned_tfg_08_62_mas_frecuentes_pipeline_en_5.5.0_3.0_1727027016389.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/uned_tfg_08_62_mas_frecuentes_pipeline_en_5.5.0_3.0_1727027016389.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("uned_tfg_08_62_mas_frecuentes_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("uned_tfg_08_62_mas_frecuentes_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|uned_tfg_08_62_mas_frecuentes_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|430.8 MB| + +## References + +https://huggingface.co/alexisdr/uned-tfg-08.62_mas_frecuentes + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_base_germanmed_full_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_base_germanmed_full_pipeline_de.md new file mode 100644 index 00000000000000..59714bd0ae1e60 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_base_germanmed_full_pipeline_de.md @@ -0,0 +1,69 @@ +--- +layout: model +title: German whisper_base_germanmed_full_pipeline pipeline WhisperForCTC from Hanhpt23 +author: John Snow Labs +name: whisper_base_germanmed_full_pipeline +date: 2024-09-22 +tags: [de, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_germanmed_full_pipeline` is a German model originally trained by Hanhpt23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_germanmed_full_pipeline_de_5.5.0_3.0_1727023141529.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_germanmed_full_pipeline_de_5.5.0_3.0_1727023141529.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_germanmed_full_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_germanmed_full_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_germanmed_full_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|614.6 MB| + +## References + +https://huggingface.co/Hanhpt23/whisper-base-GermanMed-full + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabict12_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabict12_pipeline_ar.md new file mode 100644 index 00000000000000..7e0f1bf5bb10db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabict12_pipeline_ar.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Arabic whisper_small_arabict12_pipeline pipeline WhisperForCTC from taqwa92 +author: John Snow Labs +name: whisper_small_arabict12_pipeline +date: 2024-09-22 +tags: [ar, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabict12_pipeline` is a Arabic model originally trained by taqwa92. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabict12_pipeline_ar_5.5.0_3.0_1726994585142.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabict12_pipeline_ar_5.5.0_3.0_1726994585142.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_arabict12_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_arabict12_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabict12_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/taqwa92/whisper-small-ArabicT12 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_atc_san2003m_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_atc_san2003m_en.md new file mode 100644 index 00000000000000..080bb3436725f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_atc_san2003m_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_atc_san2003m WhisperForCTC from san2003m +author: John Snow Labs +name: whisper_small_atc_san2003m +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_atc_san2003m` is a English model originally trained by san2003m. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_atc_san2003m_en_5.5.0_3.0_1726983312154.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_atc_san2003m_en_5.5.0_3.0_1726983312154.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_atc_san2003m","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_atc_san2003m", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
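
Both snippets above reference a DataFrame `data` that is not defined on this card. One way to build it, assuming 16 kHz mono audio decoded with the `librosa` package and the same `spark` session convention used elsewhere in these docs (the file name and schema are illustrative):

```python
import librosa
from pyspark.sql.types import ArrayType, FloatType, StructField, StructType

# WhisperForCTC expects 16 kHz mono audio supplied as an array of floats
waveform, _ = librosa.load("recording.wav", sr=16000, mono=True)

schema = StructType([StructField("audio_content", ArrayType(FloatType()))])
data = spark.createDataFrame([([float(x) for x in waveform],)], schema=schema)

# After running the pipeline above, the transcription is in the "text" column:
# pipelineDF.select("text.result").show(truncate=False)
```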
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_atc_san2003m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/san2003m/whisper-small-atc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_avnishkanungo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_avnishkanungo_pipeline_en.md new file mode 100644 index 00000000000000..a70907333a7bc5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_avnishkanungo_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_divehi_avnishkanungo_pipeline pipeline WhisperForCTC from avnishkanungo +author: John Snow Labs +name: whisper_small_divehi_avnishkanungo_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_avnishkanungo_pipeline` is a English model originally trained by avnishkanungo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_avnishkanungo_pipeline_en_5.5.0_3.0_1727024706116.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_avnishkanungo_pipeline_en_5.5.0_3.0_1727024706116.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_divehi_avnishkanungo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_divehi_avnishkanungo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_avnishkanungo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/avnishkanungo/whisper-small-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_macedonian_mk.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_macedonian_mk.md new file mode 100644 index 00000000000000..e380fc71db0e7c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_macedonian_mk.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Macedonian whisper_small_macedonian WhisperForCTC from goran +author: John Snow Labs +name: whisper_small_macedonian +date: 2024-09-22 +tags: [mk, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: mk +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_macedonian` is a Macedonian model originally trained by goran. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_macedonian_mk_5.5.0_3.0_1726995006790.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_macedonian_mk_5.5.0_3.0_1726995006790.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_macedonian","mk") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_macedonian", "mk")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_macedonian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|mk| +|Size:|1.7 GB| + +## References + +https://huggingface.co/goran/whisper-small.mk \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_r2_50k_2ep_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_r2_50k_2ep_en.md new file mode 100644 index 00000000000000..4606a058038e4c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_r2_50k_2ep_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_r2_50k_2ep WhisperForCTC from spsither +author: John Snow Labs +name: whisper_small_r2_50k_2ep +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_r2_50k_2ep` is a English model originally trained by spsither. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_r2_50k_2ep_en_5.5.0_3.0_1727024814275.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_r2_50k_2ep_en_5.5.0_3.0_1727024814275.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_r2_50k_2ep","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_r2_50k_2ep", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_r2_50k_2ep| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/spsither/whisper-small-r2-50k-2ep \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_turkish_cp2_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_turkish_cp2_pipeline_tr.md new file mode 100644 index 00000000000000..437fd1f9924849 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_turkish_cp2_pipeline_tr.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Turkish whisper_small_turkish_cp2_pipeline pipeline WhisperForCTC from Kiwipirate +author: John Snow Labs +name: whisper_small_turkish_cp2_pipeline +date: 2024-09-22 +tags: [tr, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_turkish_cp2_pipeline` is a Turkish model originally trained by Kiwipirate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_cp2_pipeline_tr_5.5.0_3.0_1727025076704.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_cp2_pipeline_tr_5.5.0_3.0_1727025076704.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_turkish_cp2_pipeline", lang = "tr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_turkish_cp2_pipeline", lang = "tr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_turkish_cp2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Kiwipirate/whisper-small-tr-cp2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_uzbek_with_uzbekvoice_pipeline_uz.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_uzbek_with_uzbekvoice_pipeline_uz.md new file mode 100644 index 00000000000000..3b8f4d20fbdc81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_uzbek_with_uzbekvoice_pipeline_uz.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Uzbek whisper_small_uzbek_with_uzbekvoice_pipeline pipeline WhisperForCTC from aslon1213 +author: John Snow Labs +name: whisper_small_uzbek_with_uzbekvoice_pipeline +date: 2024-09-22 +tags: [uz, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: uz +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_uzbek_with_uzbekvoice_pipeline` is a Uzbek model originally trained by aslon1213. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_uzbek_with_uzbekvoice_pipeline_uz_5.5.0_3.0_1726984917682.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_uzbek_with_uzbekvoice_pipeline_uz_5.5.0_3.0_1726984917682.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_uzbek_with_uzbekvoice_pipeline", lang = "uz") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_uzbek_with_uzbekvoice_pipeline", lang = "uz") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_uzbek_with_uzbekvoice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|uz| +|Size:|1.7 GB| + +## References + +https://huggingface.co/aslon1213/whisper-small-uz-with-uzbekvoice + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_uzbek_with_uzbekvoice_uz.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_uzbek_with_uzbekvoice_uz.md new file mode 100644 index 00000000000000..10b019058b5d98 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_uzbek_with_uzbekvoice_uz.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Uzbek whisper_small_uzbek_with_uzbekvoice WhisperForCTC from aslon1213 +author: John Snow Labs +name: whisper_small_uzbek_with_uzbekvoice +date: 2024-09-22 +tags: [uz, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: uz +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_uzbek_with_uzbekvoice` is a Uzbek model originally trained by aslon1213. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_uzbek_with_uzbekvoice_uz_5.5.0_3.0_1726984832480.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_uzbek_with_uzbekvoice_uz_5.5.0_3.0_1726984832480.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_uzbek_with_uzbekvoice","uz") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_uzbek_with_uzbekvoice", "uz")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_uzbek_with_uzbekvoice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|uz| +|Size:|1.7 GB| + +## References + +https://huggingface.co/aslon1213/whisper-small-uz-with-uzbekvoice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_tamil_v2_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_tamil_v2_en.md new file mode 100644 index 00000000000000..7cbaf241c2121f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_tamil_v2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tamil_v2 WhisperForCTC from tamilnlpSLIIT +author: John Snow Labs +name: whisper_tamil_v2 +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tamil_v2` is a English model originally trained by tamilnlpSLIIT. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tamil_v2_en_5.5.0_3.0_1726994139593.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tamil_v2_en_5.5.0_3.0_1726994139593.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tamil_v2","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tamil_v2", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tamil_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.0 MB| + +## References + +https://huggingface.co/tamilnlpSLIIT/whisper-ta-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_finetuned_minds14_english_v2_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_finetuned_minds14_english_v2_en.md new file mode 100644 index 00000000000000..a17df38f3bee9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_finetuned_minds14_english_v2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_finetuned_minds14_english_v2 WhisperForCTC from vineetsharma +author: John Snow Labs +name: whisper_tiny_finetuned_minds14_english_v2 +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_finetuned_minds14_english_v2` is a English model originally trained by vineetsharma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetuned_minds14_english_v2_en_5.5.0_3.0_1726981231933.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetuned_minds14_english_v2_en_5.5.0_3.0_1726981231933.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_finetuned_minds14_english_v2","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_finetuned_minds14_english_v2", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_finetuned_minds14_english_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/vineetsharma/whisper-tiny-finetuned-minds14-en-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_hewliyang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_hewliyang_pipeline_en.md new file mode 100644 index 00000000000000..164fe8f34b3e97 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_hewliyang_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_minds14_hewliyang_pipeline pipeline WhisperForCTC from hewliyang +author: John Snow Labs +name: whisper_tiny_minds14_hewliyang_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_hewliyang_pipeline` is a English model originally trained by hewliyang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_hewliyang_pipeline_en_5.5.0_3.0_1727022290284.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_hewliyang_pipeline_en_5.5.0_3.0_1727022290284.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_minds14_hewliyang_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_minds14_hewliyang_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_hewliyang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.0 MB| + +## References + +https://huggingface.co/hewliyang/whisper-tiny-minds14 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_olegs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_olegs_pipeline_en.md new file mode 100644 index 00000000000000..48c97c73aa499e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_olegs_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_minds14_olegs_pipeline pipeline WhisperForCTC from olegs +author: John Snow Labs +name: whisper_tiny_minds14_olegs_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_olegs_pipeline` is a English model originally trained by olegs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_olegs_pipeline_en_5.5.0_3.0_1726994160440.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_olegs_pipeline_en_5.5.0_3.0_1726994160440.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_minds14_olegs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_minds14_olegs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_olegs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/olegs/whisper-tiny-minds14 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-withinapps_ndd_mrbs_test_tags_cwadj_en.md b/docs/_posts/ahmedlone127/2024-09-22-withinapps_ndd_mrbs_test_tags_cwadj_en.md new file mode 100644 index 00000000000000..8ce181a1d323f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-withinapps_ndd_mrbs_test_tags_cwadj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English withinapps_ndd_mrbs_test_tags_cwadj DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_mrbs_test_tags_cwadj +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_mrbs_test_tags_cwadj` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mrbs_test_tags_cwadj_en_5.5.0_3.0_1727012231607.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mrbs_test_tags_cwadj_en_5.5.0_3.0_1727012231607.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_mrbs_test_tags_cwadj","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_mrbs_test_tags_cwadj", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
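
For quick checks on a few strings, the fitted pipeline above can also be wrapped in a `LightPipeline`, which runs on the driver without building a DataFrame; a short sketch (the sample sentence is an illustrative assumption):

```python
from sparknlp.base import LightPipeline

# Driver-side inference on plain Python strings
light = LightPipeline(pipelineModel)
print(light.annotate("I love spark-nlp"))

# The predicted label is also available on the DataFrame output
pipelineDF.select("class.result").show(truncate=False)
```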
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_mrbs_test_tags_cwadj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-mrbs_test-tags-CWAdj \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_ashrielbrian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_ashrielbrian_pipeline_en.md new file mode 100644 index 00000000000000..ca5a9656bc457f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_ashrielbrian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_ashrielbrian_pipeline pipeline XlmRoBertaForTokenClassification from ashrielbrian +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_ashrielbrian_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_ashrielbrian_pipeline` is a English model originally trained by ashrielbrian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ashrielbrian_pipeline_en_5.5.0_3.0_1726970706114.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ashrielbrian_pipeline_en_5.5.0_3.0_1726970706114.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_ashrielbrian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_ashrielbrian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_ashrielbrian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/ashrielbrian/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_rupe_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_rupe_pipeline_en.md new file mode 100644 index 00000000000000..1c64bd8c88e603 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_rupe_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_rupe_pipeline pipeline XlmRoBertaForTokenClassification from RupE +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_rupe_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_rupe_pipeline` is a English model originally trained by RupE. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_rupe_pipeline_en_5.5.0_3.0_1727018823943.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_rupe_pipeline_en_5.5.0_3.0_1727018823943.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_rupe_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_rupe_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_rupe_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|833.0 MB| + +## References + +https://huggingface.co/RupE/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_gewissta_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_gewissta_en.md new file mode 100644 index 00000000000000..ab1ce69a1792ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_gewissta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_gewissta XlmRoBertaForTokenClassification from gewissta +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_gewissta +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_gewissta` is a English model originally trained by gewissta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_gewissta_en_5.5.0_3.0_1727019295325.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_gewissta_en_5.5.0_3.0_1727019295325.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_gewissta","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_gewissta", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
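
Since `token.result` and `ner.result` are aligned arrays in the output above, token/tag pairs can be read off directly; a small post-processing sketch:

```python
# Tokens and their predicted tags, as two aligned arrays per row
pipelineDF.select("token.result", "ner.result").show(truncate=False)

# Or one (token, tag) pair per row via Spark SQL
pipelineDF.selectExpr(
    "explode(arrays_zip(token.result, ner.result)) as pair"
).show(truncate=False)
```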
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_gewissta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/gewissta/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_hcy5561_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_hcy5561_pipeline_en.md new file mode 100644 index 00000000000000..18df591d9e0666 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_hcy5561_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_hcy5561_pipeline pipeline XlmRoBertaForTokenClassification from hcy5561 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_hcy5561_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_hcy5561_pipeline` is a English model originally trained by hcy5561. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hcy5561_pipeline_en_5.5.0_3.0_1727019320920.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hcy5561_pipeline_en_5.5.0_3.0_1727019320920.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_hcy5561_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_hcy5561_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_hcy5561_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/hcy5561/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_italian_k3lana_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_italian_k3lana_pipeline_en.md new file mode 100644 index 00000000000000..56f5cf02a99829 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_italian_k3lana_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_k3lana_pipeline pipeline XlmRoBertaForTokenClassification from k3lana +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_k3lana_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_k3lana_pipeline` is a English model originally trained by k3lana. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_k3lana_pipeline_en_5.5.0_3.0_1726970528017.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_k3lana_pipeline_en_5.5.0_3.0_1726970528017.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_k3lana_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_k3lana_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_k3lana_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.7 MB| + +## References + +https://huggingface.co/k3lana/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_tamil_neelrr_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_tamil_neelrr_en.md new file mode 100644 index 00000000000000..97aeb6c08f458d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_tamil_neelrr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_tamil_neelrr XlmRoBertaForTokenClassification from neelrr +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_tamil_neelrr +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_tamil_neelrr` is a English model originally trained by neelrr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_tamil_neelrr_en_5.5.0_3.0_1727018501487.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_tamil_neelrr_en_5.5.0_3.0_1727018501487.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_tamil_neelrr","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_tamil_neelrr", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_tamil_neelrr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|833.6 MB| + +## References + +https://huggingface.co/neelrr/xlm-roberta-base-finetuned-panx-ta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_pipeline_en.md new file mode 100644 index 00000000000000..bd7297a062980b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_pipeline pipeline XlmRoBertaForTokenClassification from iceman2434 +author: John Snow Labs +name: xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_pipeline` is a English model originally trained by iceman2434. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_pipeline_en_5.5.0_3.0_1727019297152.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_pipeline_en_5.5.0_3.0_1727019297152.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|805.3 MB| + +## References + +https://huggingface.co/iceman2434/xlm-roberta-base_ft_udpos213-top8lang-st + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_en.md new file mode 100644 index 00000000000000..94485777eda9c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_en_5.5.0_3.0_1727010064521.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_en_5.5.0_3.0_1727010064521.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
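
Once fitted, the pipeline above can be persisted with the standard Spark ML writer so the large (~800 MB) model does not have to be re-downloaded on every run; a small sketch with an illustrative path:

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline (the path is an example)
pipelineModel.write().overwrite().save("/tmp/xlm_roberta_seqcls_pipeline")

# Reload later for inference only
reloaded = PipelineModel.load("/tmp/xlm_roberta_seqcls_pipeline")
reloaded.transform(data).select("class.result").show(truncate=False)
```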
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|800.8 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr2e-05_seed42_kin-hau-eng_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_en.md new file mode 100644 index 00000000000000..3fb1fb4147d91b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1 XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_en_5.5.0_3.0_1727009816875.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_en_5.5.0_3.0_1727009816875.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|795.7 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-New_VietNam-aug_insert_synonym-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_vietnam_aug_insert_w2v_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_vietnam_aug_insert_w2v_en.md new file mode 100644 index 00000000000000..9db83f8bb9f5da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_vietnam_aug_insert_w2v_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_vietnam_aug_insert_w2v XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_vietnam_aug_insert_w2v +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_vietnam_aug_insert_w2v` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vietnam_aug_insert_w2v_en_5.5.0_3.0_1727009589410.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vietnam_aug_insert_w2v_en_5.5.0_3.0_1727009589410.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_vietnam_aug_insert_w2v","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_vietnam_aug_insert_w2v", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_vietnam_aug_insert_w2v| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|795.0 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-VietNam-aug_insert_w2v \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline_en.md new file mode 100644 index 00000000000000..cb883535365765 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline_en_5.5.0_3.0_1727009372385.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline_en_5.5.0_3.0_1727009372385.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
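
The example above leaves `df` undefined; a minimal sketch of a suitable input (assuming a Spark session started with `sparknlp.start()` and a single-column `text` DataFrame, both assumptions rather than part of the original card) could look like this:

```python
from sparknlp.pretrained import PretrainedPipeline

# Assumed input: pretrained pipelines expect a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline", lang="en")
annotations = pipeline.transform(df)                    # batch inference over the DataFrame
single_result = pipeline.annotate("I love spark-nlp")   # quick single-string check
```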
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_xnli_german_trimmed_german_30000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.3 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-xnli-de-trimmed-de-30000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-2020_q1_25p_filtered_en.md b/docs/_posts/ahmedlone127/2024-09-23-2020_q1_25p_filtered_en.md new file mode 100644 index 00000000000000..60395920c87ef7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-2020_q1_25p_filtered_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2020_q1_25p_filtered RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q1_25p_filtered +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q1_25p_filtered` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q1_25p_filtered_en_5.5.0_3.0_1727121898228.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q1_25p_filtered_en_5.5.0_3.0_1727121898228.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("2020_q1_25p_filtered","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("2020_q1_25p_filtered","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
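
To pull the raw vectors out of the `embeddings` column, one token-level extraction sketch (assuming the `pipelineDF` from the Python example above) is:

```python
from pyspark.sql import functions as F

# Each annotation in "embeddings" carries one vector per token in its `embeddings` field.
token_vectors = pipelineDF.select(F.explode("embeddings.embeddings").alias("token_embedding"))
token_vectors.show(1, truncate=80)
```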
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q1_25p_filtered| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q1-25p-filtered \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-2020_q4_test_v0_3_en.md b/docs/_posts/ahmedlone127/2024-09-23-2020_q4_test_v0_3_en.md new file mode 100644 index 00000000000000..85e4398cabf19b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-2020_q4_test_v0_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2020_q4_test_v0_3 RoBertaEmbeddings from Magdk01 +author: John Snow Labs +name: 2020_q4_test_v0_3 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q4_test_v0_3` is a English model originally trained by Magdk01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q4_test_v0_3_en_5.5.0_3.0_1727080978995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q4_test_v0_3_en_5.5.0_3.0_1727080978995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("2020_q4_test_v0_3","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("2020_q4_test_v0_3","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q4_test_v0_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/Magdk01/2020_Q4_test_v0.3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-2020_q4_test_v0_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-2020_q4_test_v0_3_pipeline_en.md new file mode 100644 index 00000000000000..828db38fe05b9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-2020_q4_test_v0_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2020_q4_test_v0_3_pipeline pipeline RoBertaEmbeddings from Magdk01 +author: John Snow Labs +name: 2020_q4_test_v0_3_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q4_test_v0_3_pipeline` is a English model originally trained by Magdk01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q4_test_v0_3_pipeline_en_5.5.0_3.0_1727081001295.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q4_test_v0_3_pipeline_en_5.5.0_3.0_1727081001295.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("2020_q4_test_v0_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("2020_q4_test_v0_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q4_test_v0_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/Magdk01/2020_Q4_test_v0.3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-agnews_padding0model_en.md b/docs/_posts/ahmedlone127/2024-09-23-agnews_padding0model_en.md new file mode 100644 index 00000000000000..3d2f619c520eb9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-agnews_padding0model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English agnews_padding0model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: agnews_padding0model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`agnews_padding0model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/agnews_padding0model_en_5.5.0_3.0_1727108318948.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/agnews_padding0model_en_5.5.0_3.0_1727108318948.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("agnews_padding0model","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("agnews_padding0model", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
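
For ad-hoc scoring of single strings without building a DataFrame, the fitted model can be wrapped in a `LightPipeline`; the sketch below assumes the `pipelineModel` from the Python example above:

```python
from sparknlp.base import LightPipeline

# LightPipeline runs the same stages in memory, which is handy for quick checks.
light = LightPipeline(pipelineModel)
print(light.annotate("I love spark-nlp")["class"])
```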
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|agnews_padding0model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/agnews_padding0model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-all_roberta_large_v1_auto_and_commute_1000_16_5_oos_en.md b/docs/_posts/ahmedlone127/2024-09-23-all_roberta_large_v1_auto_and_commute_1000_16_5_oos_en.md new file mode 100644 index 00000000000000..cd1d419cf647c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-all_roberta_large_v1_auto_and_commute_1000_16_5_oos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_auto_and_commute_1000_16_5_oos RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_auto_and_commute_1000_16_5_oos +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_auto_and_commute_1000_16_5_oos` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_auto_and_commute_1000_16_5_oos_en_5.5.0_3.0_1727134875001.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_auto_and_commute_1000_16_5_oos_en_5.5.0_3.0_1727134875001.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_auto_and_commute_1000_16_5_oos","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_auto_and_commute_1000_16_5_oos", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_auto_and_commute_1000_16_5_oos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-auto_and_commute-1000-16-5-oos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-amazon_0_en.md b/docs/_posts/ahmedlone127/2024-09-23-amazon_0_en.md new file mode 100644 index 00000000000000..70f7a5ff3ccd32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-amazon_0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English amazon_0 DistilBertForSequenceClassification from draghicivlad +author: John Snow Labs +name: amazon_0 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amazon_0` is a English model originally trained by draghicivlad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amazon_0_en_5.5.0_3.0_1727108745660.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amazon_0_en_5.5.0_3.0_1727108745660.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("amazon_0","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("amazon_0", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amazon_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/draghicivlad/amazon_0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-apps2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-apps2_pipeline_en.md new file mode 100644 index 00000000000000..deb3a892ed1b9c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-apps2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English apps2_pipeline pipeline DistilBertForSequenceClassification from Frana9812 +author: John Snow Labs +name: apps2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`apps2_pipeline` is a English model originally trained by Frana9812. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/apps2_pipeline_en_5.5.0_3.0_1727094085604.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/apps2_pipeline_en_5.5.0_3.0_1727094085604.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("apps2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("apps2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|apps2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Frana9812/apps2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-araroberta_luxembourgish_ar.md b/docs/_posts/ahmedlone127/2024-09-23-araroberta_luxembourgish_ar.md new file mode 100644 index 00000000000000..6cf6f43ce46131 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-araroberta_luxembourgish_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic araroberta_luxembourgish RoBertaEmbeddings from reemalyami +author: John Snow Labs +name: araroberta_luxembourgish +date: 2024-09-23 +tags: [ar, open_source, onnx, embeddings, roberta] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`araroberta_luxembourgish` is a Arabic model originally trained by reemalyami. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/araroberta_luxembourgish_ar_5.5.0_3.0_1727121659638.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/araroberta_luxembourgish_ar_5.5.0_3.0_1727121659638.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("araroberta_luxembourgish","ar") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("araroberta_luxembourgish","ar") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|araroberta_luxembourgish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|ar| +|Size:|470.6 MB| + +## References + +https://huggingface.co/reemalyami/AraRoBERTa-LB \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-augmented_model_one_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-augmented_model_one_pipeline_en.md new file mode 100644 index 00000000000000..dc3ea507e4b441 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-augmented_model_one_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English augmented_model_one_pipeline pipeline DistilBertForSequenceClassification from LeonardoFettucciari +author: John Snow Labs +name: augmented_model_one_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`augmented_model_one_pipeline` is a English model originally trained by LeonardoFettucciari. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/augmented_model_one_pipeline_en_5.5.0_3.0_1727087121223.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/augmented_model_one_pipeline_en_5.5.0_3.0_1727087121223.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("augmented_model_one_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("augmented_model_one_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|augmented_model_one_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LeonardoFettucciari/augmented_model_one + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-autotrain_qr7os_gstst_en.md b/docs/_posts/ahmedlone127/2024-09-23-autotrain_qr7os_gstst_en.md new file mode 100644 index 00000000000000..66ed9c5e830ef9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-autotrain_qr7os_gstst_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English autotrain_qr7os_gstst RoBertaForSequenceClassification from Nishthaa321 +author: John Snow Labs +name: autotrain_qr7os_gstst +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_qr7os_gstst` is a English model originally trained by Nishthaa321. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_qr7os_gstst_en_5.5.0_3.0_1727135288651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_qr7os_gstst_en_5.5.0_3.0_1727135288651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("autotrain_qr7os_gstst","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("autotrain_qr7os_gstst", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_qr7os_gstst| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/Nishthaa321/autotrain-qr7os-gstst \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-autotrain_qr7os_gstst_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-autotrain_qr7os_gstst_pipeline_en.md new file mode 100644 index 00000000000000..6a1aefdad443bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-autotrain_qr7os_gstst_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autotrain_qr7os_gstst_pipeline pipeline RoBertaForSequenceClassification from Nishthaa321 +author: John Snow Labs +name: autotrain_qr7os_gstst_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_qr7os_gstst_pipeline` is a English model originally trained by Nishthaa321. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_qr7os_gstst_pipeline_en_5.5.0_3.0_1727135312742.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_qr7os_gstst_pipeline_en_5.5.0_3.0_1727135312742.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("autotrain_qr7os_gstst_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("autotrain_qr7os_gstst_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_qr7os_gstst_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/Nishthaa321/autotrain-qr7os-gstst + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-autotrain_xlmroberta_iuexist_50302120401_en.md b/docs/_posts/ahmedlone127/2024-09-23-autotrain_xlmroberta_iuexist_50302120401_en.md new file mode 100644 index 00000000000000..6e66a131d647fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-autotrain_xlmroberta_iuexist_50302120401_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English autotrain_xlmroberta_iuexist_50302120401 XlmRoBertaForSequenceClassification from Muhsabrys +author: John Snow Labs +name: autotrain_xlmroberta_iuexist_50302120401 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_xlmroberta_iuexist_50302120401` is a English model originally trained by Muhsabrys. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_xlmroberta_iuexist_50302120401_en_5.5.0_3.0_1727125993259.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_xlmroberta_iuexist_50302120401_en_5.5.0_3.0_1727125993259.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("autotrain_xlmroberta_iuexist_50302120401","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("autotrain_xlmroberta_iuexist_50302120401", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_xlmroberta_iuexist_50302120401| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Muhsabrys/autotrain-xlmroberta-iuexist-50302120401 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bds_en.md b/docs/_posts/ahmedlone127/2024-09-23-bds_en.md new file mode 100644 index 00000000000000..de32c0d32a35c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bds_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bds DistilBertForSequenceClassification from LogischeIP +author: John Snow Labs +name: bds +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bds` is a English model originally trained by LogischeIP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bds_en_5.5.0_3.0_1727087001288.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bds_en_5.5.0_3.0_1727087001288.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("bds","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bds", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bds| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LogischeIP/BDS \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_en.md new file mode 100644 index 00000000000000..e479b908877e30 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_cased_squad_v1_1_portuguese_ibama_v0_1 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_cased_squad_v1_1_portuguese_ibama_v0_1 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_squad_v1_1_portuguese_ibama_v0_1` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_en_5.5.0_3.0_1727127794172.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_en_5.5.0_3.0_1727127794172.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_squad_v1_1_portuguese_ibama_v0_1","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_squad_v1_1_portuguese_ibama_v0_1", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
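
The extracted answer span is returned in the `result` field of the `answer` annotations; a short sketch (assuming the `pipelineDF` from the Python example above):

```python
# The "answer" annotations hold the extracted span in their `result` field.
pipelineDF.select("document_question.result", "answer.result").show(truncate=False)
```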
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_squad_v1_1_portuguese_ibama_v0_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-cased-squad-v1.1-pt_IBAMA_v0.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_pipeline_en.md new file mode 100644 index 00000000000000..e8b095a1d7b9de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_pipeline_en_5.5.0_3.0_1727127815103.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_pipeline_en_5.5.0_3.0_1727127815103.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_squad_v1_1_portuguese_ibama_v0_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-cased-squad-v1.1-pt_IBAMA_v0.1 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_finetuned_qa_tar_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_finetuned_qa_tar_en.md new file mode 100644 index 00000000000000..17e36efcc35109 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_finetuned_qa_tar_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_spanish_wwm_cased_finetuned_qa_tar BertForQuestionAnswering from dccuchile +author: John Snow Labs +name: bert_base_spanish_wwm_cased_finetuned_qa_tar +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_wwm_cased_finetuned_qa_tar` is a English model originally trained by dccuchile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_finetuned_qa_tar_en_5.5.0_3.0_1727127872293.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_finetuned_qa_tar_en_5.5.0_3.0_1727127872293.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("bert_base_spanish_wwm_cased_finetuned_qa_tar","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_spanish_wwm_cased_finetuned_qa_tar", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_cased_finetuned_qa_tar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased-finetuned-qa-tar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_finetuned_qa_tar_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_finetuned_qa_tar_pipeline_en.md new file mode 100644 index 00000000000000..b959cc37a54a63 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_finetuned_qa_tar_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_spanish_wwm_cased_finetuned_qa_tar_pipeline pipeline BertForQuestionAnswering from dccuchile +author: John Snow Labs +name: bert_base_spanish_wwm_cased_finetuned_qa_tar_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_wwm_cased_finetuned_qa_tar_pipeline` is a English model originally trained by dccuchile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_finetuned_qa_tar_pipeline_en_5.5.0_3.0_1727127893217.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_finetuned_qa_tar_pipeline_en_5.5.0_3.0_1727127893217.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_spanish_wwm_cased_finetuned_qa_tar_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_spanish_wwm_cased_finetuned_qa_tar_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
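+
+The snippet above assumes that `PretrainedPipeline` has been imported from `sparknlp.pretrained`, that a Spark NLP session is running, and that a DataFrame `df` already exists. A minimal sketch of `df`, continuing from the Python snippet (the input column names are an assumption here; the bundled MultiDocumentAssembler's configuration can be inspected via `pipeline.model.stages`):
+
+```python
+# One question/context pair per row; adjust the column names to match the pipeline's first stage.
+df = spark.createDataFrame(
+    [["What framework do I use?", "I use spark-nlp."]]
+).toDF("question", "context")
+
+annotations = pipeline.transform(df)
+annotations.printSchema()  # inspect the annotation columns the pipeline produces
+```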
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_cased_finetuned_qa_tar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased-finetuned-qa-tar + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_en.md new file mode 100644 index 00000000000000..ba063d3f94f42b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_en_5.5.0_3.0_1727127885221.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_en_5.5.0_3.0_1727127885221.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assemble the raw question and context columns into document annotations.
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// Assemble the raw question and context columns into document annotations.
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.220240904191111 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_en.md new file mode 100644 index 00000000000000..57ca8ab1caaa15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_en_5.5.0_3.0_1727128019484.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_en_5.5.0_3.0_1727128019484.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assemble the raw question and context columns into document annotations.
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// Assemble the raw question and context columns into document annotations.
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240914220642 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_pipeline_en.md new file mode 100644 index 00000000000000..24a1e540c6aa7f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_pipeline_en_5.5.0_3.0_1727128040724.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_pipeline_en_5.5.0_3.0_1727128040724.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
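+
+Note that `df` is not defined in the snippet above (and `PretrainedPipeline` comes from `sparknlp.pretrained`). A minimal, hedged sketch continuing the Python example — the expected input columns are assumed, not read from the pipeline metadata:
+
+```python
+# Assumed question/context input columns; verify against pipeline.model.stages.
+df = spark.createDataFrame(
+    [["What framework do I use?", "I use spark-nlp."]]
+).toDF("question", "context")
+
+annotations = pipeline.transform(df)
+annotations.select("answer.result").show(truncate=False)  # output column name assumed from the included models
+```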
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240914220642_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240914220642 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_en.md new file mode 100644 index 00000000000000..2217b154314396 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_en_5.5.0_3.0_1727127747366.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_en_5.5.0_3.0_1727127747366.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assemble the raw question and context columns into document annotations.
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// Assemble the raw question and context columns into document annotations.
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915001955 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_pipeline_en.md new file mode 100644 index 00000000000000..cdee93cda10ecf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_pipeline_en_5.5.0_3.0_1727127770037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_pipeline_en_5.5.0_3.0_1727127770037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
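+
+As with the other pipeline cards, `df` must be supplied by the caller. A brief sketch continuing the Python snippet, under the assumption that the pipeline reads question/context columns (check `pipeline.model.stages` to confirm):
+
+```python
+df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+annotations = pipeline.transform(df)
+annotations.printSchema()
+```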
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915001955_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915001955 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..08f9904ba2f2f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727127772746.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727127772746.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
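+
+`df` above is assumed to already exist. One possible construction, continuing from the Python snippet (the column names are an illustration, not taken from the pipeline itself):
+
+```python
+# A single question/context pair; rename the columns if the pipeline's first stage expects different ones.
+df = spark.createDataFrame(
+    [["What framework do I use?", "I use spark-nlp."]]
+).toDF("question", "context")
+
+annotations = pipeline.transform(df)
+annotations.printSchema()
+```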
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.45-b-32-lr-1.2e-06-dp-0.3-ss-300-st-False-fh-True-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_en.md new file mode 100644 index 00000000000000..38821d503e02e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_en_5.5.0_3.0_1727050170173.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_en_5.5.0_3.0_1727050170173.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assemble the raw question and context columns into document annotations.
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// Assemble the raw question and context columns into document annotations.
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
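+
+A brief follow-up sketch (an addition, not from the original card): once the pipeline has been fitted and applied, the predicted answer can be pulled out of the `answer` column.
+
+```python
+# `answer.result` is an array of predicted answer strings, one entry per row.
+pipelineDF.selectExpr("explode(answer.result) as predicted_answer").show(truncate=False)
+```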
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-0.8-lr-1e-05-wd-0.001-dp-0.99999-ss-140000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_en.md new file mode 100644 index 00000000000000..59042b0c3dfcc3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727049969744.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727049969744.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assemble the raw question and context columns into document annotations.
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// Assemble the raw question and context columns into document annotations.
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-10.0-lr-4e-07-wd-1e-05-dp-1.0-ss-100-st-False-fh-True-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_pipeline_en.md new file mode 100644 index 00000000000000..3bd744f5790d2e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_pipeline_en_5.5.0_3.0_1727050282501.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_pipeline_en_5.5.0_3.0_1727050282501.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
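+
+The example assumes a DataFrame `df` with the raw inputs. A minimal sketch, continuing from the Python snippet and assuming the usual question/context layout (verify with `pipeline.model.stages`):
+
+```python
+df = spark.createDataFrame(
+    [["What framework do I use?", "I use spark-nlp."]]
+).toDF("question", "context")  # assumed column names
+
+annotations = pipeline.transform(df)
+annotations.printSchema()
+```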
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-05-wd-0.001-dp-0.99999-ss-900 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_pipeline_en.md new file mode 100644 index 00000000000000..c951f366efd8ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_pipeline_en_5.5.0_3.0_1727049603752.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_pipeline_en_5.5.0_3.0_1727049603752.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
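+
+Here too, `df` has to be provided by the caller. A hedged sketch of a one-row input, continuing the Python example (column names are an assumption):
+
+```python
+df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+annotations = pipeline.transform(df)
+annotations.show(truncate=False)
+```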
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-4e-07-wd-1e-05-dp-1.0-ss-0-st-False-fh-False-hs-100 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_en.md new file mode 100644 index 00000000000000..b5d4c0f997a4cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727128027683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727128027683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assemble the raw question and context columns into document annotations.
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// Assemble the raw question and context columns into document annotations.
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-4e-07-wd-1e-05-dp-1.0-ss-100-st-False-fh-True-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..60df331cab47d4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727128050307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727128050307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
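+
+As in the other question-answering pipeline cards, `df` is assumed to exist already. A minimal continuation sketch (input column names assumed; inspect `pipeline.model.stages` to confirm):
+
+```python
+df = spark.createDataFrame(
+    [["What framework do I use?", "I use spark-nlp."]]
+).toDF("question", "context")
+
+annotations = pipeline.transform(df)
+annotations.printSchema()
+```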
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-4e-07-wd-1e-05-dp-1.0-ss-100-st-False-fh-True-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_368items_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_368items_en.md new file mode 100644 index 00000000000000..2331595dd7d658 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_368items_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_368items BertForSequenceClassification from luminar9 +author: John Snow Labs +name: bert_finetuned_368items +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_368items` is a English model originally trained by luminar9. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_368items_en_5.5.0_3.0_1727095940278.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_368items_en_5.5.0_3.0_1727095940278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Turn the raw text column into document annotations and tokenize it.
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = BertForSequenceClassification.pretrained("bert_finetuned_368items","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// Turn the raw text column into document annotations and tokenize it.
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("bert_finetuned_368items", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
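+
+After `transform`, the predicted label sits in the `class` annotation column. A brief sketch (not part of the original card) for reading it back:
+
+```python
+# `class.result` is an array containing the predicted label for each input row.
+pipelineDF.select("text", "class.result").show(truncate=False)
+```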
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_368items| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/luminar9/bert-finetuned-368items \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_ner_asos_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_ner_asos_uncased_pipeline_en.md new file mode 100644 index 00000000000000..96a6d534921933 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_ner_asos_uncased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_ner_asos_uncased_pipeline pipeline BertForTokenClassification from vantagediscovery +author: John Snow Labs +name: bert_finetuned_ner_asos_uncased_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_asos_uncased_pipeline` is a English model originally trained by vantagediscovery. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_asos_uncased_pipeline_en_5.5.0_3.0_1727111847300.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_asos_uncased_pipeline_en_5.5.0_3.0_1727111847300.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_ner_asos_uncased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_ner_asos_uncased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
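+
+For this token-classification pipeline, `df` is assumed to be a DataFrame holding the raw text to tag. A minimal sketch, continuing the Python snippet (the input column name is an assumption; the pipeline's DocumentAssembler may expect a different one):
+
+```python
+# A single free-text column; rename it if the pipeline's first stage expects something else.
+df = spark.createDataFrame([["John Snow Labs is based in Delaware."]]).toDF("text")
+
+annotations = pipeline.transform(df)
+annotations.printSchema()  # inspect the token-level annotation columns
+```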
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_asos_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.3 MB| + +## References + +https://huggingface.co/vantagediscovery/bert-finetuned-ner-asos-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_finetuning_demo_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuning_demo_en.md new file mode 100644 index 00000000000000..296eefc00b892f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuning_demo_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_finetuning_demo BertForQuestionAnswering from internetoftim +author: John Snow Labs +name: bert_finetuning_demo +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuning_demo` is a English model originally trained by internetoftim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuning_demo_en_5.5.0_3.0_1727128443496.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuning_demo_en_5.5.0_3.0_1727128443496.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assemble the raw question and context columns into document annotations.
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuning_demo","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// Assemble the raw question and context columns into document annotations.
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuning_demo", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuning_demo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|797.5 MB| + +## References + +https://huggingface.co/internetoftim/BERT-Finetuning-Demo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_finetuning_demo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuning_demo_pipeline_en.md new file mode 100644 index 00000000000000..ccc91947ec1731 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuning_demo_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_finetuning_demo_pipeline pipeline BertForQuestionAnswering from internetoftim +author: John Snow Labs +name: bert_finetuning_demo_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuning_demo_pipeline` is a English model originally trained by internetoftim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuning_demo_pipeline_en_5.5.0_3.0_1727128670577.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuning_demo_pipeline_en_5.5.0_3.0_1727128670577.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuning_demo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuning_demo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
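+
+`df` is again left undefined in the snippet; a minimal continuation sketch for this question-answering pipeline (assumed column names, to be checked against `pipeline.model.stages`):
+
+```python
+df = spark.createDataFrame(
+    [["What framework do I use?", "I use spark-nlp."]]
+).toDF("question", "context")
+
+annotations = pipeline.transform(df)
+annotations.printSchema()
+```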
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuning_demo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|797.5 MB| + +## References + +https://huggingface.co/internetoftim/BERT-Finetuning-Demo + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_finetuned_policy_number_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_finetuned_policy_number_en.md new file mode 100644 index 00000000000000..2181c47e0a3072 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_finetuned_policy_number_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_large_uncased_whole_word_masking_finetuned_policy_number BertForQuestionAnswering from Ineract +author: John Snow Labs +name: bert_large_uncased_whole_word_masking_finetuned_policy_number +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_whole_word_masking_finetuned_policy_number` is a English model originally trained by Ineract. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_finetuned_policy_number_en_5.5.0_3.0_1727049964871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_finetuned_policy_number_en_5.5.0_3.0_1727049964871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assemble the raw question and context columns into document annotations.
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_whole_word_masking_finetuned_policy_number","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// Assemble the raw question and context columns into document annotations.
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_whole_word_masking_finetuned_policy_number", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
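+
+As a small usage note (not in the original card), the prediction produced by the code above ends up in the `answer` annotation column:
+
+```python
+# Show each question next to the predicted answer text.
+pipelineDF.select("question", "answer.result").show(truncate=False)
+```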
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_whole_word_masking_finetuned_policy_number| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Ineract/bert-large-uncased-whole-word-masking-finetuned-policy-number \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_en.md new file mode 100644 index 00000000000000..10390f16cc927d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_large_uncased_whole_word_masking_squad2_train_data_unmodified BertForQuestionAnswering from mdzrg +author: John Snow Labs +name: bert_large_uncased_whole_word_masking_squad2_train_data_unmodified +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_whole_word_masking_squad2_train_data_unmodified` is a English model originally trained by mdzrg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_en_5.5.0_3.0_1727128644295.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_en_5.5.0_3.0_1727128644295.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assemble the raw question and context columns into document annotations
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+# Extract the answer span from the context
+spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_whole_word_masking_squad2_train_data_unmodified", "en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.MultiDocumentAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// Assemble the raw question and context columns into document annotations
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+// Extract the answer span from the context
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_whole_word_masking_squad2_train_data_unmodified", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_whole_word_masking_squad2_train_data_unmodified| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/mdzrg/bert-large-uncased-whole-word-masking-squad2-train-data-unmodified \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_pipeline_en.md new file mode 100644 index 00000000000000..4936e8fb2dbbd5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_pipeline pipeline BertForQuestionAnswering from mdzrg +author: John Snow Labs +name: bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_pipeline` is a English model originally trained by mdzrg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_pipeline_en_5.5.0_3.0_1727128703878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_pipeline_en_5.5.0_3.0_1727128703878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+pipeline = PretrainedPipeline("bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+val pipeline = new PretrainedPipeline("bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
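+
+The snippet above assumes `df` already exists. A hedged sketch of preparing an input DataFrame and reading the result; the `question`/`context` input columns and the `answer` output column are assumptions carried over from the standalone model card for this question-answering model, not guaranteed by this pipeline, so verify them with `printSchema()`:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Assumed input columns for a question-answering pipeline: "question" and "context"
+df = spark.createDataFrame(
+    [("What framework do I use?", "I use spark-nlp.")],
+    ["question", "context"]
+)
+
+pipeline = PretrainedPipeline("bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_pipeline", lang="en")
+annotations = pipeline.transform(df)
+
+# Assumed output column: "answer"
+annotations.selectExpr("answer.result").show(truncate=False)
+```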
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_whole_word_masking_squad2_train_data_unmodified_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/mdzrg/bert-large-uncased-whole-word-masking-squad2-train-data-unmodified + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_massa_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-23-bert_massa_pipeline_es.md new file mode 100644 index 00000000000000..ea4fd44444ea17 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_massa_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bert_massa_pipeline pipeline XlmRoBertaForSequenceClassification from nmarinnn +author: John Snow Labs +name: bert_massa_pipeline +date: 2024-09-23 +tags: [es, open_source, pipeline, onnx] +task: Text Classification +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_massa_pipeline` is a Castilian, Spanish model originally trained by nmarinnn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_massa_pipeline_es_5.5.0_3.0_1727126157927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_massa_pipeline_es_5.5.0_3.0_1727126157927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+pipeline = PretrainedPipeline("bert_massa_pipeline", lang = "es")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+val pipeline = new PretrainedPipeline("bert_massa_pipeline", lang = "es")
+val annotations = pipeline.transform(df)
+```
+</div>
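+
+A hedged sketch of preparing `df` for this text-classification pipeline; the `text` input column and the `class` output column are assumptions based on how the standalone classification models in this collection are wired, so check them against the actual schema:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# Assumed input column: "text"
+df = spark.createDataFrame([["Me encanta spark-nlp"]]).toDF("text")
+
+pipeline = PretrainedPipeline("bert_massa_pipeline", lang="es")
+annotations = pipeline.transform(df)
+
+# Inspect the output columns, then read the predicted label (assumed to be "class")
+annotations.printSchema()
+annotations.select("class.result").show(truncate=False)
+```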
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_massa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|1.0 GB| + +## References + +https://huggingface.co/nmarinnn/bert-massa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_medquad_500_tokens_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_medquad_500_tokens_en.md new file mode 100644 index 00000000000000..f56772eb9bd36d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_medquad_500_tokens_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_medquad_500_tokens BertForQuestionAnswering from DataScientist1122 +author: John Snow Labs +name: bert_medquad_500_tokens +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_medquad_500_tokens` is a English model originally trained by DataScientist1122. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_medquad_500_tokens_en_5.5.0_3.0_1727128486045.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_medquad_500_tokens_en_5.5.0_3.0_1727128486045.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assemble the raw question and context columns into document annotations
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+# Extract the answer span from the context
+spanClassifier = BertForQuestionAnswering.pretrained("bert_medquad_500_tokens", "en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.MultiDocumentAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// Assemble the raw question and context columns into document annotations
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+// Extract the answer span from the context
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_medquad_500_tokens", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_medquad_500_tokens| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/DataScientist1122/BERT_MedQuad_500_tokens \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_medquad_500_tokens_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_medquad_500_tokens_pipeline_en.md new file mode 100644 index 00000000000000..40bcb3dc8a5858 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_medquad_500_tokens_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_medquad_500_tokens_pipeline pipeline BertForQuestionAnswering from DataScientist1122 +author: John Snow Labs +name: bert_medquad_500_tokens_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_medquad_500_tokens_pipeline` is a English model originally trained by DataScientist1122. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_medquad_500_tokens_pipeline_en_5.5.0_3.0_1727128508041.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_medquad_500_tokens_pipeline_en_5.5.0_3.0_1727128508041.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+pipeline = PretrainedPipeline("bert_medquad_500_tokens_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+val pipeline = new PretrainedPipeline("bert_medquad_500_tokens_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_medquad_500_tokens_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/DataScientist1122/BERT_MedQuad_500_tokens + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_poop_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_poop_0_pipeline_en.md new file mode 100644 index 00000000000000..de7bb25bcc4c71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_poop_0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_poop_0_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_poop_0_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_poop_0_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_poop_0_pipeline_en_5.5.0_3.0_1727082124408.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_poop_0_pipeline_en_5.5.0_3.0_1727082124408.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+pipeline = PretrainedPipeline("bert_poop_0_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+val pipeline = new PretrainedPipeline("bert_poop_0_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_poop_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_poop_0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_portuguese_squad_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_portuguese_squad_en.md new file mode 100644 index 00000000000000..b68d4f49e9956a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_portuguese_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_portuguese_squad BertForQuestionAnswering from lfcc +author: John Snow Labs +name: bert_portuguese_squad +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_portuguese_squad` is a English model originally trained by lfcc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_portuguese_squad_en_5.5.0_3.0_1727127915089.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_portuguese_squad_en_5.5.0_3.0_1727127915089.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assemble the raw question and context columns into document annotations
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+# Extract the answer span from the context
+spanClassifier = BertForQuestionAnswering.pretrained("bert_portuguese_squad", "en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.MultiDocumentAssembler
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+// Assemble the raw question and context columns into document annotations
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+// Extract the answer span from the context
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_portuguese_squad", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_portuguese_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/lfcc/bert-portuguese-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_portuguese_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_portuguese_squad_pipeline_en.md new file mode 100644 index 00000000000000..52f236944e0f26 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_portuguese_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_portuguese_squad_pipeline pipeline BertForQuestionAnswering from lfcc +author: John Snow Labs +name: bert_portuguese_squad_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_portuguese_squad_pipeline` is a English model originally trained by lfcc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_portuguese_squad_pipeline_en_5.5.0_3.0_1727127936015.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_portuguese_squad_pipeline_en_5.5.0_3.0_1727127936015.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+pipeline = PretrainedPipeline("bert_portuguese_squad_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+val pipeline = new PretrainedPipeline("bert_portuguese_squad_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_portuguese_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/lfcc/bert-portuguese-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bertin_roberta_base_spanish_finetuned_xnli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bertin_roberta_base_spanish_finetuned_xnli_pipeline_en.md new file mode 100644 index 00000000000000..9fbf6aa222e35a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bertin_roberta_base_spanish_finetuned_xnli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bertin_roberta_base_spanish_finetuned_xnli_pipeline pipeline RoBertaForSequenceClassification from dccuchile +author: John Snow Labs +name: bertin_roberta_base_spanish_finetuned_xnli_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertin_roberta_base_spanish_finetuned_xnli_pipeline` is a English model originally trained by dccuchile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertin_roberta_base_spanish_finetuned_xnli_pipeline_en_5.5.0_3.0_1727135255172.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertin_roberta_base_spanish_finetuned_xnli_pipeline_en_5.5.0_3.0_1727135255172.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+pipeline = PretrainedPipeline("bertin_roberta_base_spanish_finetuned_xnli_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+val pipeline = new PretrainedPipeline("bertin_roberta_base_spanish_finetuned_xnli_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertin_roberta_base_spanish_finetuned_xnli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.5 MB| + +## References + +https://huggingface.co/dccuchile/bertin-roberta-base-spanish-finetuned-xnli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bge_large_repmus_cross_entropy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bge_large_repmus_cross_entropy_pipeline_en.md new file mode 100644 index 00000000000000..c3f7cbbd10166a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bge_large_repmus_cross_entropy_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_repmus_cross_entropy_pipeline pipeline BGEEmbeddings from tessimago +author: John Snow Labs +name: bge_large_repmus_cross_entropy_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_large_repmus_cross_entropy_pipeline` is a English model originally trained by tessimago. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_repmus_cross_entropy_pipeline_en_5.5.0_3.0_1727106012986.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_repmus_cross_entropy_pipeline_en_5.5.0_3.0_1727106012986.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+pipeline = PretrainedPipeline("bge_large_repmus_cross_entropy_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+val pipeline = new PretrainedPipeline("bge_large_repmus_cross_entropy_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
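+
+This card does not list the name of the embeddings output column, so the sketch below prints the schema first and then reads the vectors under an assumed column name (`bge`); the `text` input column is also an assumption, so adjust both to whatever the schema actually shows:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+df = spark.createDataFrame([["Search and rescue planning document."]]).toDF("text")  # assumed "text" input column
+pipeline = PretrainedPipeline("bge_large_repmus_cross_entropy_pipeline", lang="en")
+annotations = pipeline.transform(df)
+
+annotations.printSchema()  # locate the embeddings annotation column
+annotations.selectExpr("bge.embeddings").show(truncate=False)  # "bge" is an assumed column name
+```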
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_repmus_cross_entropy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/tessimago/bge-large-repmus-cross_entropy + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bnt5_101_en.md b/docs/_posts/ahmedlone127/2024-09-23-bnt5_101_en.md new file mode 100644 index 00000000000000..e765143650e742 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bnt5_101_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bnt5_101 T5Transformer from kawsarahmd +author: John Snow Labs +name: bnt5_101 +date: 2024-09-23 +tags: [en, open_source, onnx, t5, question_answering, summarization, translation, text_generation] +task: [Question Answering, Summarization, Translation, Text Generation] +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: T5Transformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained T5Transformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bnt5_101` is a English model originally trained by kawsarahmd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bnt5_101_en_5.5.0_3.0_1727124636646.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bnt5_101_en_5.5.0_3.0_1727124636646.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+# Generate text with the fine-tuned T5 model
+t5 = T5Transformer.pretrained("bnt5_101", "en") \
+    .setInputCols(["document"]) \
+    .setOutputCol("output")
+
+pipeline = Pipeline().setStages([documentAssembler, t5])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+// Generate text with the fine-tuned T5 model
+val t5 = T5Transformer.pretrained("bnt5_101", "en")
+  .setInputCols(Array("document"))
+  .setOutputCol("output")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, t5))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
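+
+The generated text ends up in the `output` annotation column. A minimal follow-up for reading it (generation settings such as `setMaxOutputLength` can also be configured on the annotator before fitting, if needed):
+
+```python
+# Read the text produced by the T5 model
+pipelineDF.selectExpr("output.result").show(truncate=False)
+```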
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bnt5_101| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[output]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/kawsarahmd/bnt5-101 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bnt5_101_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bnt5_101_pipeline_en.md new file mode 100644 index 00000000000000..09657fb91bcc2e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bnt5_101_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bnt5_101_pipeline pipeline T5Transformer from kawsarahmd +author: John Snow Labs +name: bnt5_101_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: [Question Answering, Summarization, Translation, Text Generation] +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained T5Transformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bnt5_101_pipeline` is a English model originally trained by kawsarahmd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bnt5_101_pipeline_en_5.5.0_3.0_1727124684796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bnt5_101_pipeline_en_5.5.0_3.0_1727124684796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+pipeline = PretrainedPipeline("bnt5_101_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+val pipeline = new PretrainedPipeline("bnt5_101_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bnt5_101_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/kawsarahmd/bnt5-101 + +## Included Models + +- DocumentAssembler +- T5Transformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bpe_selfies_pubchem_shard00_70k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bpe_selfies_pubchem_shard00_70k_pipeline_en.md new file mode 100644 index 00000000000000..7f620799b0847a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bpe_selfies_pubchem_shard00_70k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bpe_selfies_pubchem_shard00_70k_pipeline pipeline RoBertaEmbeddings from seyonec +author: John Snow Labs +name: bpe_selfies_pubchem_shard00_70k_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bpe_selfies_pubchem_shard00_70k_pipeline` is a English model originally trained by seyonec. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bpe_selfies_pubchem_shard00_70k_pipeline_en_5.5.0_3.0_1727092340901.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bpe_selfies_pubchem_shard00_70k_pipeline_en_5.5.0_3.0_1727092340901.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+pipeline = PretrainedPipeline("bpe_selfies_pubchem_shard00_70k_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+val pipeline = new PretrainedPipeline("bpe_selfies_pubchem_shard00_70k_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bpe_selfies_pubchem_shard00_70k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.5 MB| + +## References + +https://huggingface.co/seyonec/BPE_SELFIES_PubChem_shard00_70k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_eitanli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_eitanli_pipeline_en.md new file mode 100644 index 00000000000000..c205ae5a6cb673 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_eitanli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_eitanli_pipeline pipeline RoBertaEmbeddings from Eitanli +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_eitanli_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_eitanli_pipeline` is a English model originally trained by Eitanli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_eitanli_pipeline_en_5.5.0_3.0_1727080664803.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_eitanli_pipeline_en_5.5.0_3.0_1727080664803.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_eitanli_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_eitanli_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_eitanli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/Eitanli/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_nerdygene_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_nerdygene_en.md new file mode 100644 index 00000000000000..e7a36473f62ea3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_nerdygene_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_nerdygene RoBertaEmbeddings from nerdygene +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_nerdygene +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_nerdygene` is a English model originally trained by nerdygene. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_nerdygene_en_5.5.0_3.0_1727121581035.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_nerdygene_en_5.5.0_3.0_1727121581035.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# Produce one contextual embedding per token
+embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_nerdygene", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+// Produce one contextual embedding per token
+val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_nerdygene", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
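+
+Each token receives one annotation in the `embeddings` column, with its vector stored in the annotation's `embeddings` field. A short sketch of flattening them into token/vector pairs:
+
+```python
+from pyspark.sql.functions import explode
+
+# One row per token with its embedding vector
+pipelineDF.select(explode("embeddings").alias("emb")) \
+    .selectExpr("emb.result as token", "emb.embeddings as vector") \
+    .show(truncate=False)
+```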
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_nerdygene| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/nerdygene/my_awesome_eli5_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_nerdygene_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_nerdygene_pipeline_en.md new file mode 100644 index 00000000000000..1718993a6f2df1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_nerdygene_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_nerdygene_pipeline pipeline RoBertaEmbeddings from nerdygene +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_nerdygene_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_nerdygene_pipeline` is a English model originally trained by nerdygene. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_nerdygene_pipeline_en_5.5.0_3.0_1727121595562.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_nerdygene_pipeline_en_5.5.0_3.0_1727121595562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_nerdygene_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_nerdygene_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_nerdygene_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/nerdygene/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_brianrigoni_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_brianrigoni_en.md new file mode 100644 index 00000000000000..ff879bf7908d5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_brianrigoni_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_brianrigoni DistilBertForSequenceClassification from brianrigoni +author: John Snow Labs +name: burmese_awesome_model_brianrigoni +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_brianrigoni` is a English model originally trained by brianrigoni. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_brianrigoni_en_5.5.0_3.0_1727097055185.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_brianrigoni_en_5.5.0_3.0_1727097055185.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# Predict one label per input text
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_brianrigoni", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+// Predict one label per input text
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_brianrigoni", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
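+
+The predicted label is stored as the `result` of the `class` annotation column, so it can be read straight off `pipelineDF`:
+
+```python
+# Input text next to the predicted label
+pipelineDF.select("text", "class.result").show(truncate=False)
+```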
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_brianrigoni| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/brianrigoni/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_jbar646_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_jbar646_pipeline_en.md new file mode 100644 index 00000000000000..08d838562f25de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_jbar646_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_jbar646_pipeline pipeline DistilBertForSequenceClassification from jbar646 +author: John Snow Labs +name: burmese_awesome_model_jbar646_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_jbar646_pipeline` is a English model originally trained by jbar646. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_jbar646_pipeline_en_5.5.0_3.0_1727059256013.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_jbar646_pipeline_en_5.5.0_3.0_1727059256013.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+pipeline = PretrainedPipeline("burmese_awesome_model_jbar646_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+val pipeline = new PretrainedPipeline("burmese_awesome_model_jbar646_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_jbar646_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jbar646/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_nandini54_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_nandini54_pipeline_en.md new file mode 100644 index 00000000000000..044dc6952be30b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_nandini54_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_nandini54_pipeline pipeline DistilBertForSequenceClassification from Nandini54 +author: John Snow Labs +name: burmese_awesome_model_nandini54_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_nandini54_pipeline` is a English model originally trained by Nandini54. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_nandini54_pipeline_en_5.5.0_3.0_1727108304530.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_nandini54_pipeline_en_5.5.0_3.0_1727108304530.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+pipeline = PretrainedPipeline("burmese_awesome_model_nandini54_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+val pipeline = new PretrainedPipeline("burmese_awesome_model_nandini54_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_nandini54_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Nandini54/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_nataliacristina_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_nataliacristina_en.md new file mode 100644 index 00000000000000..320a1fda96730e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_nataliacristina_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_nataliacristina DistilBertForSequenceClassification from nataliacristina +author: John Snow Labs +name: burmese_awesome_model_nataliacristina +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_nataliacristina` is a English model originally trained by nataliacristina. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_nataliacristina_en_5.5.0_3.0_1727073747794.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_nataliacristina_en_5.5.0_3.0_1727073747794.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# Predict one label per input text
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_nataliacristina", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+// Predict one label per input text
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_nataliacristina", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_nataliacristina| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/nataliacristina/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_omertnks_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_omertnks_en.md new file mode 100644 index 00000000000000..6c0cecb5f5dde6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_omertnks_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_omertnks DistilBertForSequenceClassification from omertnks +author: John Snow Labs +name: burmese_awesome_model_omertnks +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_omertnks` is a English model originally trained by omertnks. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_omertnks_en_5.5.0_3.0_1727110600913.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_omertnks_en_5.5.0_3.0_1727110600913.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols(["document"]) \
+    .setOutputCol("token")
+
+# Predict one label per input text
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_omertnks", "en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+// Predict one label per input text
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_omertnks", "en")
+  .setInputCols(Array("document", "token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_omertnks| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/omertnks/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_efar98_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_efar98_pipeline_en.md new file mode 100644 index 00000000000000..486c2d30796129 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_efar98_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_efar98_pipeline pipeline DistilBertForTokenClassification from Efar98 +author: John Snow Labs +name: burmese_awesome_wnut_model_efar98_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_efar98_pipeline` is a English model originally trained by Efar98. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_efar98_pipeline_en_5.5.0_3.0_1727065452944.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_efar98_pipeline_en_5.5.0_3.0_1727065452944.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+pipeline = PretrainedPipeline("burmese_awesome_wnut_model_efar98_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df: a Spark DataFrame with the raw input column(s) expected by the stages listed under "Included Models"
+val pipeline = new PretrainedPipeline("burmese_awesome_wnut_model_efar98_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_efar98_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Efar98/my_awesome_wnut_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_mandel94_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_mandel94_en.md new file mode 100644 index 00000000000000..dd5de948ff00f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_mandel94_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_mandel94 DistilBertForTokenClassification from mandel94 +author: John Snow Labs +name: burmese_awesome_wnut_model_mandel94 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_mandel94` is a English model originally trained by mandel94. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_mandel94_en_5.5.0_3.0_1727120665317.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_mandel94_en_5.5.0_3.0_1727120665317.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Assemble raw text, tokenize it, and run the pretrained token classifier (NER).
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = DistilBertForTokenClassification.pretrained("burmese_awesome_wnut_model_mandel94","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = DistilBertForTokenClassification.pretrained("burmese_awesome_wnut_model_mandel94", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
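As a small follow-up (not part of the original card), the predicted tags can be read back from the `ner` column produced above; every Spark NLP annotation exposes `begin`, `end`, and `result` fields:

```python
# Explode the token-level annotations and keep the character offsets plus the predicted label.
pipelineDF.selectExpr("explode(ner) as ann") \
    .selectExpr("ann.begin", "ann.end", "ann.result as tag") \
    .show(truncate=False)
```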
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_mandel94| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/mandel94/my_awesome_wnut_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_mandel94_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_mandel94_pipeline_en.md new file mode 100644 index 00000000000000..83024f907fd6a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_mandel94_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_mandel94_pipeline pipeline DistilBertForTokenClassification from mandel94 +author: John Snow Labs +name: burmese_awesome_wnut_model_mandel94_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_mandel94_pipeline` is a English model originally trained by mandel94. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_mandel94_pipeline_en_5.5.0_3.0_1727120677223.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_mandel94_pipeline_en_5.5.0_3.0_1727120677223.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_wnut_model_mandel94_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_wnut_model_mandel94_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_mandel94_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/mandel94/my_awesome_wnut_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-canbert_en.md b/docs/_posts/ahmedlone127/2024-09-23-canbert_en.md new file mode 100644 index 00000000000000..a34f2f1fc0ea10 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-canbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English canbert RoBertaEmbeddings from ebelenwaf +author: John Snow Labs +name: canbert +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`canbert` is a English model originally trained by ebelenwaf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/canbert_en_5.5.0_3.0_1727056582415.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/canbert_en_5.5.0_3.0_1727056582415.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("canbert","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("canbert","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
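A brief inspection step, added here for convenience rather than taken from the original card: the `embeddings` column holds one annotation per token, and the vector itself sits in the annotation's `embeddings` field.

```python
# Show each token together with the dimensionality of its RoBERTa vector.
pipelineDF.selectExpr("explode(embeddings) as ann") \
    .selectExpr("ann.result as token", "size(ann.embeddings) as dimensions") \
    .show(truncate=False)
```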
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|canbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.4 MB| + +## References + +https://huggingface.co/ebelenwaf/canbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cebfil_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-cebfil_roberta_pipeline_en.md new file mode 100644 index 00000000000000..c2c13f601879a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cebfil_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cebfil_roberta_pipeline pipeline RoBertaEmbeddings from jfernandez +author: John Snow Labs +name: cebfil_roberta_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cebfil_roberta_pipeline` is a English model originally trained by jfernandez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cebfil_roberta_pipeline_en_5.5.0_3.0_1727057025789.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cebfil_roberta_pipeline_en_5.5.0_3.0_1727057025789.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cebfil_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cebfil_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cebfil_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|469.5 MB| + +## References + +https://huggingface.co/jfernandez/cebfil-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-code_search_codebert_base_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-code_search_codebert_base_2_pipeline_en.md new file mode 100644 index 00000000000000..a304601fd13fdd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-code_search_codebert_base_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English code_search_codebert_base_2_pipeline pipeline RoBertaForTokenClassification from DianaIulia +author: John Snow Labs +name: code_search_codebert_base_2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`code_search_codebert_base_2_pipeline` is a English model originally trained by DianaIulia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_2_pipeline_en_5.5.0_3.0_1727081625835.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_2_pipeline_en_5.5.0_3.0_1727081625835.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("code_search_codebert_base_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("code_search_codebert_base_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|code_search_codebert_base_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/DianaIulia/code_search_codebert_base_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-coha1920s_en.md b/docs/_posts/ahmedlone127/2024-09-23-coha1920s_en.md new file mode 100644 index 00000000000000..f965f782b36426 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-coha1920s_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English coha1920s RoBertaEmbeddings from simonmun +author: John Snow Labs +name: coha1920s +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`coha1920s` is a English model originally trained by simonmun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/coha1920s_en_5.5.0_3.0_1727121976760.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/coha1920s_en_5.5.0_3.0_1727121976760.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("coha1920s","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("coha1920s","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|coha1920s| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.4 MB| + +## References + +https://huggingface.co/simonmun/COHA1920s \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-coha1920s_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-coha1920s_pipeline_en.md new file mode 100644 index 00000000000000..9a5de14eb39629 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-coha1920s_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English coha1920s_pipeline pipeline RoBertaEmbeddings from simonmun +author: John Snow Labs +name: coha1920s_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`coha1920s_pipeline` is a English model originally trained by simonmun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/coha1920s_pipeline_en_5.5.0_3.0_1727121990916.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/coha1920s_pipeline_en_5.5.0_3.0_1727121990916.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("coha1920s_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("coha1920s_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|coha1920s_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.5 MB| + +## References + +https://huggingface.co/simonmun/COHA1920s + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-coha1930s_en.md b/docs/_posts/ahmedlone127/2024-09-23-coha1930s_en.md new file mode 100644 index 00000000000000..276548f45c575e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-coha1930s_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English coha1930s RoBertaEmbeddings from simonmun +author: John Snow Labs +name: coha1930s +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`coha1930s` is a English model originally trained by simonmun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/coha1930s_en_5.5.0_3.0_1727121606144.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/coha1930s_en_5.5.0_3.0_1727121606144.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("coha1930s","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("coha1930s","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|coha1930s| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.8 MB| + +## References + +https://huggingface.co/simonmun/COHA1930s \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-coha1930s_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-coha1930s_pipeline_en.md new file mode 100644 index 00000000000000..7d693a28c08f97 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-coha1930s_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English coha1930s_pipeline pipeline RoBertaEmbeddings from simonmun +author: John Snow Labs +name: coha1930s_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`coha1930s_pipeline` is a English model originally trained by simonmun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/coha1930s_pipeline_en_5.5.0_3.0_1727121620775.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/coha1930s_pipeline_en_5.5.0_3.0_1727121620775.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("coha1930s_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("coha1930s_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|coha1930s_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.9 MB| + +## References + +https://huggingface.co/simonmun/COHA1930s + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr23_seed1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr23_seed1_pipeline_en.md new file mode 100644 index 00000000000000..9e64ec18c6eb0c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr23_seed1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cold_fusion_itr23_seed1_pipeline pipeline RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr23_seed1_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr23_seed1_pipeline` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr23_seed1_pipeline_en_5.5.0_3.0_1727135614447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr23_seed1_pipeline_en_5.5.0_3.0_1727135614447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cold_fusion_itr23_seed1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cold_fusion_itr23_seed1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr23_seed1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr23-seed1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr25_seed4_en.md b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr25_seed4_en.md new file mode 100644 index 00000000000000..12390a1fea6691 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr25_seed4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cold_fusion_itr25_seed4 RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr25_seed4 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr25_seed4` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr25_seed4_en_5.5.0_3.0_1727134691420.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr25_seed4_en_5.5.0_3.0_1727134691420.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Assemble raw text, tokenize it, and run the pretrained sequence classifier.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr25_seed4","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr25_seed4", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr25_seed4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr25-seed4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_en.md b/docs/_posts/ahmedlone127/2024-09-23-correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_en.md new file mode 100644 index 00000000000000..88adf4a4682744 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19 BertForTokenClassification from ali2066 +author: John Snow Labs +name: correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_en_5.5.0_3.0_1727111269967.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_en_5.5.0_3.0_1727111269967.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Assemble raw text, tokenize it, and run the pretrained token classifier (NER).
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = BertForTokenClassification.pretrained("correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = BertForTokenClassification.pretrained("correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ali2066/correct_BERT_token_itr0_0.0001_all_01_03_2022-15_52_19 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-dapt_plus_tapt_helpfulness_base_pretraining_model_ltuzova_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-dapt_plus_tapt_helpfulness_base_pretraining_model_ltuzova_pipeline_en.md new file mode 100644 index 00000000000000..37998785e19abf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-dapt_plus_tapt_helpfulness_base_pretraining_model_ltuzova_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dapt_plus_tapt_helpfulness_base_pretraining_model_ltuzova_pipeline pipeline RoBertaEmbeddings from ltuzova +author: John Snow Labs +name: dapt_plus_tapt_helpfulness_base_pretraining_model_ltuzova_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dapt_plus_tapt_helpfulness_base_pretraining_model_ltuzova_pipeline` is a English model originally trained by ltuzova. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dapt_plus_tapt_helpfulness_base_pretraining_model_ltuzova_pipeline_en_5.5.0_3.0_1727122125631.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dapt_plus_tapt_helpfulness_base_pretraining_model_ltuzova_pipeline_en_5.5.0_3.0_1727122125631.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dapt_plus_tapt_helpfulness_base_pretraining_model_ltuzova_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dapt_plus_tapt_helpfulness_base_pretraining_model_ltuzova_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dapt_plus_tapt_helpfulness_base_pretraining_model_ltuzova_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/ltuzova/dapt_plus_tapt_helpfulness_base_pretraining_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-deepset_bert_base_cased_squad2_orkg_which_5e_05_en.md b/docs/_posts/ahmedlone127/2024-09-23-deepset_bert_base_cased_squad2_orkg_which_5e_05_en.md new file mode 100644 index 00000000000000..27ee3b783a84d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-deepset_bert_base_cased_squad2_orkg_which_5e_05_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English deepset_bert_base_cased_squad2_orkg_which_5e_05 BertForQuestionAnswering from Moussab +author: John Snow Labs +name: deepset_bert_base_cased_squad2_orkg_which_5e_05 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deepset_bert_base_cased_squad2_orkg_which_5e_05` is a English model originally trained by Moussab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deepset_bert_base_cased_squad2_orkg_which_5e_05_en_5.5.0_3.0_1727070757142.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deepset_bert_base_cased_squad2_orkg_which_5e_05_en_5.5.0_3.0_1727070757142.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Pair each question with its context and run the pretrained extractive QA model.
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("deepset_bert_base_cased_squad2_orkg_which_5e_05","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("deepset_bert_base_cased_squad2_orkg_which_5e_05", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
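To read the prediction back (a convenience step, not part of the original card), the answer text is available in the `result` field of the `answer` column:

```python
# Each row contains the extracted answer span for its question/context pair.
pipelineDF.select("answer.result").show(truncate=False)
```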
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deepset_bert_base_cased_squad2_orkg_which_5e_05| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/Moussab/deepset_bert-base-cased-squad2-orkg-which-5e-05 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-deepset_bert_base_cased_squad2_orkg_which_5e_05_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-deepset_bert_base_cased_squad2_orkg_which_5e_05_pipeline_en.md new file mode 100644 index 00000000000000..0effd7df1db152 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-deepset_bert_base_cased_squad2_orkg_which_5e_05_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English deepset_bert_base_cased_squad2_orkg_which_5e_05_pipeline pipeline BertForQuestionAnswering from Moussab +author: John Snow Labs +name: deepset_bert_base_cased_squad2_orkg_which_5e_05_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deepset_bert_base_cased_squad2_orkg_which_5e_05_pipeline` is a English model originally trained by Moussab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deepset_bert_base_cased_squad2_orkg_which_5e_05_pipeline_en_5.5.0_3.0_1727070776049.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deepset_bert_base_cased_squad2_orkg_which_5e_05_pipeline_en_5.5.0_3.0_1727070776049.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deepset_bert_base_cased_squad2_orkg_which_5e_05_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deepset_bert_base_cased_squad2_orkg_which_5e_05_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deepset_bert_base_cased_squad2_orkg_which_5e_05_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Moussab/deepset_bert-base-cased-squad2-orkg-which-5e-05 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-denilsenaxel_xlm_roberta_finetuned_language_detection_en.md b/docs/_posts/ahmedlone127/2024-09-23-denilsenaxel_xlm_roberta_finetuned_language_detection_en.md new file mode 100644 index 00000000000000..a31b7e78dbca6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-denilsenaxel_xlm_roberta_finetuned_language_detection_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English denilsenaxel_xlm_roberta_finetuned_language_detection XlmRoBertaForSequenceClassification from DenilsenAxel +author: John Snow Labs +name: denilsenaxel_xlm_roberta_finetuned_language_detection +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`denilsenaxel_xlm_roberta_finetuned_language_detection` is a English model originally trained by DenilsenAxel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/denilsenaxel_xlm_roberta_finetuned_language_detection_en_5.5.0_3.0_1727126352193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/denilsenaxel_xlm_roberta_finetuned_language_detection_en_5.5.0_3.0_1727126352193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Assemble raw text, tokenize it, and run the pretrained sequence classifier.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("denilsenaxel_xlm_roberta_finetuned_language_detection","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("denilsenaxel_xlm_roberta_finetuned_language_detection", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|denilsenaxel_xlm_roberta_finetuned_language_detection| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|792.3 MB| + +## References + +https://huggingface.co/DenilsenAxel/denilsenaxel-xlm-roberta-finetuned-language-detection \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-deproberta_v4_en.md b/docs/_posts/ahmedlone127/2024-09-23-deproberta_v4_en.md new file mode 100644 index 00000000000000..c31ca8280ee7e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-deproberta_v4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deproberta_v4 RoBertaForSequenceClassification from ericNguyen0132 +author: John Snow Labs +name: deproberta_v4 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deproberta_v4` is a English model originally trained by ericNguyen0132. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deproberta_v4_en_5.5.0_3.0_1727135610488.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deproberta_v4_en_5.5.0_3.0_1727135610488.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Assemble raw text, tokenize it, and run the pretrained sequence classifier.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("deproberta_v4","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("deproberta_v4", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deproberta_v4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ericNguyen0132/DepRoBERTa-v4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-deproberta_v4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-deproberta_v4_pipeline_en.md new file mode 100644 index 00000000000000..b99c92c1576aee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-deproberta_v4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English deproberta_v4_pipeline pipeline RoBertaForSequenceClassification from ericNguyen0132 +author: John Snow Labs +name: deproberta_v4_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deproberta_v4_pipeline` is a English model originally trained by ericNguyen0132. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deproberta_v4_pipeline_en_5.5.0_3.0_1727135683679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deproberta_v4_pipeline_en_5.5.0_3.0_1727135683679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deproberta_v4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deproberta_v4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deproberta_v4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ericNguyen0132/DepRoBERTa-v4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-diabetes_bert_two_en.md b/docs/_posts/ahmedlone127/2024-09-23-diabetes_bert_two_en.md new file mode 100644 index 00000000000000..832d787f8fdcd4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-diabetes_bert_two_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English diabetes_bert_two RoBertaEmbeddings from ubaskota +author: John Snow Labs +name: diabetes_bert_two +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`diabetes_bert_two` is a English model originally trained by ubaskota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/diabetes_bert_two_en_5.5.0_3.0_1727122254339.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/diabetes_bert_two_en_5.5.0_3.0_1727122254339.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("diabetes_bert_two","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("diabetes_bert_two","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|diabetes_bert_two| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|463.7 MB| + +## References + +https://huggingface.co/ubaskota/diabetes_BERT_two \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-diabetes_bert_two_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-diabetes_bert_two_pipeline_en.md new file mode 100644 index 00000000000000..c72e575d6988cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-diabetes_bert_two_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English diabetes_bert_two_pipeline pipeline RoBertaEmbeddings from ubaskota +author: John Snow Labs +name: diabetes_bert_two_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`diabetes_bert_two_pipeline` is a English model originally trained by ubaskota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/diabetes_bert_two_pipeline_en_5.5.0_3.0_1727122276200.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/diabetes_bert_two_pipeline_en_5.5.0_3.0_1727122276200.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("diabetes_bert_two_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("diabetes_bert_two_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|diabetes_bert_two_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.7 MB| + +## References + +https://huggingface.co/ubaskota/diabetes_BERT_two + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-disaster_tweet_3_en.md b/docs/_posts/ahmedlone127/2024-09-23-disaster_tweet_3_en.md new file mode 100644 index 00000000000000..4900b42201bf5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-disaster_tweet_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English disaster_tweet_3 RoBertaForSequenceClassification from aellxx +author: John Snow Labs +name: disaster_tweet_3 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`disaster_tweet_3` is a English model originally trained by aellxx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/disaster_tweet_3_en_5.5.0_3.0_1727134743885.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/disaster_tweet_3_en_5.5.0_3.0_1727134743885.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Assemble raw text, tokenize it, and run the pretrained sequence classifier.
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("disaster_tweet_3","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("disaster_tweet_3", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|disaster_tweet_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/aellxx/disaster-tweet-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-disaster_tweet_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-disaster_tweet_3_pipeline_en.md new file mode 100644 index 00000000000000..2224702088916e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-disaster_tweet_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English disaster_tweet_3_pipeline pipeline RoBertaForSequenceClassification from aellxx +author: John Snow Labs +name: disaster_tweet_3_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`disaster_tweet_3_pipeline` is a English model originally trained by aellxx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/disaster_tweet_3_pipeline_en_5.5.0_3.0_1727134767965.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/disaster_tweet_3_pipeline_en_5.5.0_3.0_1727134767965.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("disaster_tweet_3_pipeline", lang = "en")
# df is any DataFrame with a "text" column holding the documents to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("disaster_tweet_3_pipeline", lang = "en")
// df is any DataFrame with a "text" column holding the documents to annotate
val df = Seq("I love spark-nlp").toDS.toDF("text")
val annotations = pipeline.transform(df)

```
</div>
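
For ad-hoc checks on single strings, `PretrainedPipeline` also exposes an `annotate` helper, so no DataFrame is needed. A small sketch assuming the pipeline object created above; the sample tweet is purely illustrative:

```python
# Returns a dict whose keys mirror the pipeline's internal output columns
result = pipeline.annotate("Forest fire near La Ronge Sask. Canada")
print(result)
```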
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|disaster_tweet_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/aellxx/disaster-tweet-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distil_whisper_medium_hindi_test_v2_en.md b/docs/_posts/ahmedlone127/2024-09-23-distil_whisper_medium_hindi_test_v2_en.md new file mode 100644 index 00000000000000..a9dddbd9a73498 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distil_whisper_medium_hindi_test_v2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English distil_whisper_medium_hindi_test_v2 WhisperForCTC from yi-ching +author: John Snow Labs +name: distil_whisper_medium_hindi_test_v2 +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distil_whisper_medium_hindi_test_v2` is a English model originally trained by yi-ching. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_whisper_medium_hindi_test_v2_en_5.5.0_3.0_1727077585696.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_whisper_medium_hindi_test_v2_en_5.5.0_3.0_1727077585696.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("distil_whisper_medium_hindi_test_v2","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
# data is a DataFrame with an "audio_content" column holding the raw audio samples as an array of floats
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.AudioAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("distil_whisper_medium_hindi_test_v2", "en")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
// data is a DataFrame with an "audio_content" column holding the raw audio samples as an Array[Float]
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
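
WhisperForCTC consumes raw audio samples rather than file paths, so the input DataFrame has to be built from decoded audio. A minimal sketch, assuming a local `sample.wav` file and the `librosa` library, neither of which ships with Spark NLP:

```python
import librosa

# Decode to a mono float array at 16 kHz, the sampling rate Whisper checkpoints expect
raw_audio, _ = librosa.load("sample.wav", sr=16000)

# One row per recording; the column name must match AudioAssembler.setInputCol
data = spark.createDataFrame([[raw_audio.tolist()]]).toDF("audio_content")
```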
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distil_whisper_medium_hindi_test_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.8 GB| + +## References + +https://huggingface.co/yi-ching/distil-whisper-medium-hi-test-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding50model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding50model_pipeline_en.md new file mode 100644 index 00000000000000..6774137e8f5b60 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding50model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_agnews_padding50model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_agnews_padding50model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_agnews_padding50model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding50model_pipeline_en_5.5.0_3.0_1727087120314.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding50model_pipeline_en_5.5.0_3.0_1727087120314.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_agnews_padding50model_pipeline", lang = "en")
# df is any DataFrame with a "text" column holding the documents to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_agnews_padding50model_pipeline", lang = "en")
// df is any DataFrame with a "text" column holding the documents to annotate
val df = Seq("I love spark-nlp").toDS.toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_agnews_padding50model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_agnews_padding50model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_cased_hatespeech_ft_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_cased_hatespeech_ft_en.md new file mode 100644 index 00000000000000..64d300b9313990 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_cased_hatespeech_ft_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_cased_hatespeech_ft DistilBertForSequenceClassification from EgehanEralp +author: John Snow Labs +name: distilbert_base_cased_hatespeech_ft +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_hatespeech_ft` is a English model originally trained by EgehanEralp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_hatespeech_ft_en_5.5.0_3.0_1727082637367.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_hatespeech_ft_en_5.5.0_3.0_1727082637367.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_cased_hatespeech_ft","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_cased_hatespeech_ft", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
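
For low-latency scoring of a handful of texts, the fitted model can also be wrapped in a `LightPipeline`. A sketch reusing the `pipelineModel` from the example above:

```python
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
# Returns one dict per input string; the "class" key holds the predicted label
print(light.annotate("I love spark-nlp"))
```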
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_hatespeech_ft| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/EgehanEralp/distilbert-base-cased-hatespeech-ft \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_3epoch10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_3epoch10_pipeline_en.md new file mode 100644 index 00000000000000..1b2de858511fcf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_3epoch10_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_3epoch10_pipeline pipeline DistilBertForSequenceClassification from dianamihalache27 +author: John Snow Labs +name: distilbert_base_uncased_3epoch10_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_3epoch10_pipeline` is a English model originally trained by dianamihalache27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_3epoch10_pipeline_en_5.5.0_3.0_1727093930102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_3epoch10_pipeline_en_5.5.0_3.0_1727093930102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_3epoch10_pipeline", lang = "en")
# df is any DataFrame with a "text" column holding the documents to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_3epoch10_pipeline", lang = "en")
// df is any DataFrame with a "text" column holding the documents to annotate
val df = Seq("I love spark-nlp").toDS.toDF("text")
val annotations = pipeline.transform(df)

```
</div>
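
`transform` returns the input DataFrame extended with one annotation column per stage of the pretrained pipeline. A sketch for peeking at the classifier output, assuming the final column is named `class` as in the standalone model examples in this batch:

```python
# Show the original text next to the predicted label(s)
annotations.select("text", "class.result").show(truncate=False)
```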
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_3epoch10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dianamihalache27/distilbert-base-uncased_3epoch10 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_adl_hw_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_adl_hw_pipeline_en.md new file mode 100644 index 00000000000000..36c046c9b721a1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_adl_hw_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_adl_hw_pipeline pipeline DistilBertForSequenceClassification from Hongu +author: John Snow Labs +name: distilbert_base_uncased_finetuned_adl_hw_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_adl_hw_pipeline` is a English model originally trained by Hongu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw_pipeline_en_5.5.0_3.0_1727074058740.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw_pipeline_en_5.5.0_3.0_1727074058740.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_adl_hw_pipeline", lang = "en")
# df is any DataFrame with a "text" column holding the documents to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_adl_hw_pipeline", lang = "en")
// df is any DataFrame with a "text" column holding the documents to annotate
val df = Seq("I love spark-nlp").toDS.toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_adl_hw_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/Hongu/distilbert-base-uncased-finetuned-adl_hw + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cc_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cc_en.md new file mode 100644 index 00000000000000..2edf96dcf65e6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cc DistilBertForSequenceClassification from gtalibov +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cc +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cc` is a English model originally trained by gtalibov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cc_en_5.5.0_3.0_1727093714112.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cc_en_5.5.0_3.0_1727093714112.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cc","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cc", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
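
Because the fitted object is a regular Spark ML `PipelineModel`, it can be persisted and reloaded like any other Spark model. A sketch with an illustrative local path:

```python
from pyspark.ml import PipelineModel

# Save the fitted pipeline and load it back for later scoring
pipelineModel.write().overwrite().save("/tmp/distilbert_finetuned_cc_pipeline")
restored = PipelineModel.load("/tmp/distilbert_finetuned_cc_pipeline")
restored.transform(data).select("class.result").show(truncate=False)
```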
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/gtalibov/distilbert-base-uncased-finetuned-CC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_bobtk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_bobtk_pipeline_en.md new file mode 100644 index 00000000000000..19950c8deec632 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_bobtk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_bobtk_pipeline pipeline DistilBertForSequenceClassification from bobtk +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_bobtk_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_bobtk_pipeline` is a English model originally trained by bobtk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_bobtk_pipeline_en_5.5.0_3.0_1727093722353.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_bobtk_pipeline_en_5.5.0_3.0_1727093722353.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_bobtk_pipeline", lang = "en")
# df is any DataFrame with a "text" column holding the documents to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_bobtk_pipeline", lang = "en")
// df is any DataFrame with a "text" column holding the documents to annotate
val df = Seq("I love spark-nlp").toDS.toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_bobtk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/bobtk/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_cjfghk5697_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_cjfghk5697_en.md new file mode 100644 index 00000000000000..0d26be7cc437b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_cjfghk5697_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_cjfghk5697 DistilBertForSequenceClassification from cjfghk5697 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_cjfghk5697 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_cjfghk5697` is a English model originally trained by cjfghk5697. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_cjfghk5697_en_5.5.0_3.0_1727059227313.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_cjfghk5697_en_5.5.0_3.0_1727059227313.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_cjfghk5697","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_cjfghk5697", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
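
The same fitted pipeline scores any number of rows in one pass. A sketch with a slightly larger toy DataFrame; the utterances are illustrative intent-style queries, not examples from the training data:

```python
batch = spark.createDataFrame(
    [("book a table for two tonight",),
     ("what is my checking account balance",)],
    ["text"],
)
pipelineModel.transform(batch).select("text", "class.result").show(truncate=False)
```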
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_cjfghk5697| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/cjfghk5697/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_hrayrm_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_hrayrm_en.md new file mode 100644 index 00000000000000..fe8024f3d4045c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_hrayrm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_hrayrm DistilBertForSequenceClassification from HrayrM +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_hrayrm +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_hrayrm` is a English model originally trained by HrayrM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_hrayrm_en_5.5.0_3.0_1727110387998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_hrayrm_en_5.5.0_3.0_1727110387998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_hrayrm","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_hrayrm", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_hrayrm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/HrayrM/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_gamallo_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_gamallo_en.md new file mode 100644 index 00000000000000..94d712ee054f16 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_gamallo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_gamallo DistilBertForSequenceClassification from gamallo +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_gamallo +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_gamallo` is a English model originally trained by gamallo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_gamallo_en_5.5.0_3.0_1727059375729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_gamallo_en_5.5.0_3.0_1727059375729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_gamallo","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_gamallo", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
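
If this checkpoint follows the usual CoLA setup (grammatical-acceptability classification), a more telling probe than the placeholder sentence is to score a grammatical and an ungrammatical string side by side. A sketch reusing the fitted `pipelineModel` from above:

```python
probe = spark.createDataFrame(
    [("The cat sat on the mat.",),
     ("The cat sat mat the on.",)],
    ["text"],
)
pipelineModel.transform(probe).select("text", "class.result").show(truncate=False)
```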
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_gamallo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gamallo/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_poodja_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_poodja_pipeline_en.md new file mode 100644 index 00000000000000..3129e1ffbf1fe5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_poodja_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_poodja_pipeline pipeline DistilBertForSequenceClassification from Poodja +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_poodja_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_poodja_pipeline` is a English model originally trained by Poodja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_poodja_pipeline_en_5.5.0_3.0_1727108555603.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_poodja_pipeline_en_5.5.0_3.0_1727108555603.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_poodja_pipeline", lang = "en")
# df is any DataFrame with a "text" column holding the documents to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_poodja_pipeline", lang = "en")
// df is any DataFrame with a "text" column holding the documents to annotate
val df = Seq("I love spark-nlp").toDS.toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_poodja_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Poodja/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_zeid_hazboun_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_zeid_hazboun_pipeline_en.md new file mode 100644 index 00000000000000..9722661a5f8043 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_zeid_hazboun_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_zeid_hazboun_pipeline pipeline DistilBertForSequenceClassification from Zeid-Hazboun +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_zeid_hazboun_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_zeid_hazboun_pipeline` is a English model originally trained by Zeid-Hazboun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_zeid_hazboun_pipeline_en_5.5.0_3.0_1727108410486.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_zeid_hazboun_pipeline_en_5.5.0_3.0_1727108410486.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_zeid_hazboun_pipeline", lang = "en")
# df is any DataFrame with a "text" column holding the documents to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_zeid_hazboun_pipeline", lang = "en")
// df is any DataFrame with a "text" column holding the documents to annotate
val df = Seq("I love spark-nlp").toDS.toDF("text")
val annotations = pipeline.transform(df)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_zeid_hazboun_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Zeid-Hazboun/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_conceptos_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_conceptos_en.md new file mode 100644 index 00000000000000..420f5ab8d7914b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_conceptos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_conceptos DistilBertForSequenceClassification from jcesquivel +author: John Snow Labs +name: distilbert_base_uncased_finetuned_conceptos +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_conceptos` is a English model originally trained by jcesquivel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_conceptos_en_5.5.0_3.0_1727087112051.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_conceptos_en_5.5.0_3.0_1727087112051.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_conceptos","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_conceptos", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_conceptos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jcesquivel/distilbert-base-uncased-finetuned-conceptos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_dataset_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_dataset_en.md new file mode 100644 index 00000000000000..6353d60aaddadf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_dataset_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_dataset DistilBertForSequenceClassification from boisalai +author: John Snow Labs +name: distilbert_base_uncased_finetuned_dataset +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_dataset` is a English model originally trained by boisalai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_dataset_en_5.5.0_3.0_1727059927006.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_dataset_en_5.5.0_3.0_1727059927006.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_dataset","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_dataset", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
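
On a larger scored DataFrame it is often useful to see how the predicted labels distribute. A sketch using the `pipelineDF` from the example above:

```python
from pyspark.sql.functions import explode

# Flatten the predicted labels and count how often each one occurs
(pipelineDF
    .select(explode("class.result").alias("label"))
    .groupBy("label")
    .count()
    .show())
```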
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_dataset| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/boisalai/distilbert-base-uncased-finetuned-dataset \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_disaster_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_disaster_en.md new file mode 100644 index 00000000000000..8b217fd6b2d0ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_disaster_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_disaster DistilBertForSequenceClassification from RaiRachit +author: John Snow Labs +name: distilbert_base_uncased_finetuned_disaster +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_disaster` is a English model originally trained by RaiRachit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_disaster_en_5.5.0_3.0_1727108517271.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_disaster_en_5.5.0_3.0_1727108517271.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_disaster","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_disaster", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_disaster| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/RaiRachit/distilbert-base-uncased-finetuned-disaster \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_2hab_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_2hab_en.md new file mode 100644 index 00000000000000..65c61138751766 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_2hab_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_2hab DistilBertForSequenceClassification from 2hab +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_2hab +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_2hab` is a English model originally trained by 2hab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_2hab_en_5.5.0_3.0_1727110394824.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_2hab_en_5.5.0_3.0_1727110394824.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_2hab","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_2hab", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
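
Besides the winning label, classification annotations generally keep the per-label scores in their metadata map. A sketch for pulling them out, assuming the `pipelineDF` from the example above:

```python
from pyspark.sql.functions import explode

# One row per prediction, with the label and its metadata (where the scores live)
(pipelineDF
    .select(explode("class").alias("prediction"))
    .select("prediction.result", "prediction.metadata")
    .show(truncate=False))
```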
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_2hab| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/2hab/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cereline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cereline_en.md new file mode 100644 index 00000000000000..a87c560d35b55d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cereline_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_cereline DistilBertForSequenceClassification from cereline +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_cereline +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_cereline` is a English model originally trained by cereline. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cereline_en_5.5.0_3.0_1727110610693.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cereline_en_5.5.0_3.0_1727110610693.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_cereline","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_cereline", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_cereline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cereline/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_dljh1214_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_dljh1214_pipeline_en.md new file mode 100644 index 00000000000000..64036a0c256625 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_dljh1214_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_dljh1214_pipeline pipeline DistilBertForSequenceClassification from dljh1214 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_dljh1214_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_dljh1214_pipeline` is a English model originally trained by dljh1214. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_dljh1214_pipeline_en_5.5.0_3.0_1727094032502.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_dljh1214_pipeline_en_5.5.0_3.0_1727094032502.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_dljh1214_pipeline", lang = "en")
# df is any DataFrame with a "text" column holding the documents to annotate
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
annotations = pipeline.transform(df)

```
```scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._

val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_dljh1214_pipeline", lang = "en")
// df is any DataFrame with a "text" column holding the documents to annotate
val df = Seq("I love spark-nlp").toDS.toDF("text")
val annotations = pipeline.transform(df)

```
</div>
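
To get the emotion label together with its confidence from a single string, `fullAnnotate` returns complete annotation objects rather than plain strings. A sketch assuming the pipeline above and that its classifier column is named `class`; the sample sentence is illustrative:

```python
# fullAnnotate keeps annotation metadata, which typically holds per-label scores
[annotated] = pipeline.fullAnnotate("I am thrilled with how this turned out")
for ann in annotated["class"]:
    print(ann.result, ann.metadata)
```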
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_dljh1214_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dljh1214/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_hun0520_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_hun0520_en.md new file mode 100644 index 00000000000000..fd74dd0e39de3e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_hun0520_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_hun0520 DistilBertForSequenceClassification from hun0520 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_hun0520 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_hun0520` is a English model originally trained by hun0520. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hun0520_en_5.5.0_3.0_1727093929959.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hun0520_en_5.5.0_3.0_1727093929959.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Imports added for completeness; assumes an active SparkSession `spark` started via sparknlp.start().
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier's input columns must match the outputs of the stages above: "document" and "token".
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_hun0520","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// Imports added for completeness; assumes Spark NLP on the classpath and an active SparkSession `spark`.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

// The classifier's input columns must match the outputs of the stages above: "document" and "token".
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_hun0520", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
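To read the predictions back from `pipelineDF` above, a short sketch (the `class` column name comes from `.setOutputCol("class")` in the example):

```python
# Each row's predicted label(s) sit in the "class" annotation column's `result` field.
pipelineDF.select("text", "class.result").show(truncate=False)
```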
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_hun0520| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hun0520/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jeongyeom_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jeongyeom_en.md new file mode 100644 index 00000000000000..58eca5db095881 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jeongyeom_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jeongyeom DistilBertForSequenceClassification from jeongyeom +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jeongyeom +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jeongyeom` is a English model originally trained by jeongyeom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jeongyeom_en_5.5.0_3.0_1727059771316.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jeongyeom_en_5.5.0_3.0_1727059771316.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jeongyeom","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jeongyeom", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jeongyeom| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jeongyeom/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jeongyeom_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jeongyeom_pipeline_en.md new file mode 100644 index 00000000000000..eec3b59d0cf6e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jeongyeom_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jeongyeom_pipeline pipeline DistilBertForSequenceClassification from jeongyeom +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jeongyeom_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jeongyeom_pipeline` is a English model originally trained by jeongyeom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jeongyeom_pipeline_en_5.5.0_3.0_1727059784678.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jeongyeom_pipeline_en_5.5.0_3.0_1727059784678.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jeongyeom_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jeongyeom_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
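The snippet above assumes `df` is an existing Spark DataFrame with a `text` column. Below is a minimal sketch of preparing one and running the pipeline end to end; the session setup, sample sentence, and the `class` output column name are illustrative assumptions rather than part of this model card.

```python
# Minimal sketch: start Spark NLP, build an input DataFrame, and run the pretrained pipeline.
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jeongyeom_pipeline", lang = "en")
annotations = pipeline.transform(df)

# The predicted label is assumed to land in the "class" annotation column, as on the standalone model card.
annotations.select("text", "class.result").show(truncate=False)
```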
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jeongyeom_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jeongyeom/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ladoza03_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ladoza03_en.md new file mode 100644 index 00000000000000..a8d2baa413f734 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ladoza03_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ladoza03 DistilBertForSequenceClassification from ladoza03 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ladoza03 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ladoza03` is a English model originally trained by ladoza03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ladoza03_en_5.5.0_3.0_1727110497479.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ladoza03_en_5.5.0_3.0_1727110497479.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Imports added for completeness; assumes an active SparkSession `spark` started via sparknlp.start().
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier's input columns must match the outputs of the stages above: "document" and "token".
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ladoza03","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// Imports added for completeness; assumes Spark NLP on the classpath and an active SparkSession `spark`.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

// The classifier's input columns must match the outputs of the stages above: "document" and "token".
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ladoza03", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
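To read the predictions back from `pipelineDF` above, a short sketch (the `class` column name comes from `.setOutputCol("class")` in the example):

```python
# Each row's predicted label(s) sit in the "class" annotation column's `result` field.
pipelineDF.select("text", "class.result").show(truncate=False)
```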
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ladoza03| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ladoza03/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ladoza03_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ladoza03_pipeline_en.md new file mode 100644 index 00000000000000..89c2828ac442aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ladoza03_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ladoza03_pipeline pipeline DistilBertForSequenceClassification from ladoza03 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ladoza03_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ladoza03_pipeline` is a English model originally trained by ladoza03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ladoza03_pipeline_en_5.5.0_3.0_1727110509485.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ladoza03_pipeline_en_5.5.0_3.0_1727110509485.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ladoza03_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ladoza03_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
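The snippet above assumes `df` is an existing Spark DataFrame with a `text` column. Below is a minimal sketch of preparing one and running the pipeline end to end; the session setup, sample sentence, and the `class` output column name are illustrative assumptions rather than part of this model card.

```python
# Minimal sketch: start Spark NLP, build an input DataFrame, and run the pretrained pipeline.
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ladoza03_pipeline", lang = "en")
annotations = pipeline.transform(df)

# The predicted label is assumed to land in the "class" annotation column, as on the standalone model card.
annotations.select("text", "class.result").show(truncate=False)
```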
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ladoza03_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ladoza03/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_mikhab_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_mikhab_en.md new file mode 100644 index 00000000000000..849f27927e500c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_mikhab_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_mikhab DistilBertForSequenceClassification from mikhab +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_mikhab +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_mikhab` is a English model originally trained by mikhab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_mikhab_en_5.5.0_3.0_1727059479937.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_mikhab_en_5.5.0_3.0_1727059479937.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_mikhab","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_mikhab", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_mikhab| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mikhab/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_sharadhonavar_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_sharadhonavar_en.md new file mode 100644 index 00000000000000..8c74aa12482a09 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_sharadhonavar_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_sharadhonavar DistilBertForSequenceClassification from Sharadhonavar +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_sharadhonavar +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_sharadhonavar` is a English model originally trained by Sharadhonavar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_sharadhonavar_en_5.5.0_3.0_1727097276844.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_sharadhonavar_en_5.5.0_3.0_1727097276844.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_sharadhonavar","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_sharadhonavar", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_sharadhonavar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Sharadhonavar/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_trsekhar_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_trsekhar_en.md new file mode 100644 index 00000000000000..891f91aed04e3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_trsekhar_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_trsekhar DistilBertForSequenceClassification from trsekhar +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_trsekhar +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_trsekhar` is a English model originally trained by trsekhar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_trsekhar_en_5.5.0_3.0_1727073952136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_trsekhar_en_5.5.0_3.0_1727073952136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_trsekhar","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_trsekhar", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_trsekhar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/trsekhar/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ziwone_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ziwone_pipeline_en.md new file mode 100644 index 00000000000000..20ae2c2b5d1366 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ziwone_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ziwone_pipeline pipeline DistilBertForSequenceClassification from ziwone +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ziwone_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ziwone_pipeline` is a English model originally trained by ziwone. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ziwone_pipeline_en_5.5.0_3.0_1727074069147.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ziwone_pipeline_en_5.5.0_3.0_1727074069147.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ziwone_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ziwone_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
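The snippet above assumes `df` is an existing Spark DataFrame with a `text` column. Below is a minimal sketch of preparing one and running the pipeline end to end; the session setup, sample sentence, and the `class` output column name are illustrative assumptions rather than part of this model card.

```python
# Minimal sketch: start Spark NLP, build an input DataFrame, and run the pretrained pipeline.
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ziwone_pipeline", lang = "en")
annotations = pipeline.transform(df)

# The predicted label is assumed to land in the "class" annotation column, as on the standalone model card.
annotations.select("text", "class.result").show(truncate=False)
```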
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ziwone_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ziwone/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_m_share_facts_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_m_share_facts_pipeline_en.md new file mode 100644 index 00000000000000..bd24666b649460 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_m_share_facts_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_m_share_facts_pipeline pipeline DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_m_share_facts_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_m_share_facts_pipeline` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_m_share_facts_pipeline_en_5.5.0_3.0_1727082214896.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_m_share_facts_pipeline_en_5.5.0_3.0_1727082214896.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_m_share_facts_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_m_share_facts_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
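The snippet above assumes `df` is an existing Spark DataFrame with a `text` column. Below is a minimal sketch of preparing one and running the pipeline end to end; the session setup, sample sentence, and the `class` output column name are illustrative assumptions rather than part of this model card.

```python
# Minimal sketch: start Spark NLP, build an input DataFrame, and run the pretrained pipeline.
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_m_share_facts_pipeline", lang = "en")
annotations = pipeline.transform(df)

# The predicted label is assumed to land in the "class" annotation column, as on the standalone model card.
annotations.select("text", "class.result").show(truncate=False)
```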
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_m_share_facts_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-m_share_facts + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_en.md new file mode 100644 index 00000000000000..d26a9101f864f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen DistilBertForSequenceClassification from kghanlon +author: John Snow Labs +name: distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen` is a English model originally trained by kghanlon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_en_5.5.0_3.0_1727096984328.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_en_5.5.0_3.0_1727096984328.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/kghanlon/distilbert-base-uncased-finetuned-MP-unannotated-half-frozen-v1-FULL_CLASSES-v1_un_frozen \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_pipeline_en.md new file mode 100644 index 00000000000000..c1dc311d8ee632 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_pipeline pipeline DistilBertForSequenceClassification from kghanlon +author: John Snow Labs +name: distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_pipeline` is a English model originally trained by kghanlon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_pipeline_en_5.5.0_3.0_1727096996121.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_pipeline_en_5.5.0_3.0_1727096996121.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
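The snippet above assumes `df` is an existing Spark DataFrame with a `text` column. Below is a minimal sketch of preparing one and running the pipeline end to end; the session setup, sample sentence, and the `class` output column name are illustrative assumptions rather than part of this model card.

```python
# Minimal sketch: start Spark NLP, build an input DataFrame, and run the pretrained pipeline.
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_pipeline", lang = "en")
annotations = pipeline.transform(df)

# The predicted label is assumed to land in the "class" annotation column, as on the standalone model card.
annotations.select("text", "class.result").show(truncate=False)
```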
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_mp_unannotated_half_frozen_v1_full_classes_v1_un_frozen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/kghanlon/distilbert-base-uncased-finetuned-MP-unannotated-half-frozen-v1-FULL_CLASSES-v1_un_frozen + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_pad_clf_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_pad_clf_v2_pipeline_en.md new file mode 100644 index 00000000000000..604640f2ee4291 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_pad_clf_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_pad_clf_v2_pipeline pipeline DistilBertForSequenceClassification from netoferraz +author: John Snow Labs +name: distilbert_base_uncased_finetuned_pad_clf_v2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_pad_clf_v2_pipeline` is a English model originally trained by netoferraz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_pad_clf_v2_pipeline_en_5.5.0_3.0_1727110399689.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_pad_clf_v2_pipeline_en_5.5.0_3.0_1727110399689.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_pad_clf_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_pad_clf_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
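The snippet above assumes `df` is an existing Spark DataFrame with a `text` column. Below is a minimal sketch of preparing one and running the pipeline end to end; the session setup, sample sentence, and the `class` output column name are illustrative assumptions rather than part of this model card.

```python
# Minimal sketch: start Spark NLP, build an input DataFrame, and run the pretrained pipeline.
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_pad_clf_v2_pipeline", lang = "en")
annotations = pipeline.transform(df)

# The predicted label is assumed to land in the "class" annotation column, as on the standalone model card.
annotations.select("text", "class.result").show(truncate=False)
```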
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_pad_clf_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/netoferraz/distilbert-base-uncased-finetuned-pad-clf-v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_en.md new file mode 100644 index 00000000000000..eb4df75da01309 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024 DistilBertForSequenceClassification from Beijaflor2024 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024` is a English model originally trained by Beijaflor2024. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_en_5.5.0_3.0_1727093987044.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_en_5.5.0_3.0_1727093987044.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Beijaflor2024/distilbert-base-uncased-finetuned-sst-2-english \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_tweets_dataset_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_tweets_dataset_en.md new file mode 100644 index 00000000000000..5760775125dbe8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_tweets_dataset_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_tweets_dataset DistilBertForSequenceClassification from lambda101 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_tweets_dataset +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_tweets_dataset` is a English model originally trained by lambda101. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_tweets_dataset_en_5.5.0_3.0_1727086751833.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_tweets_dataset_en_5.5.0_3.0_1727086751833.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_tweets_dataset","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_tweets_dataset", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_tweets_dataset| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lambda101/distilbert-base-uncased-finetuned-tweets-dataset \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_en.md new file mode 100644 index 00000000000000..6e5c35bb4feb95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_en_5.5.0_3.0_1727108557731.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_en_5.5.0_3.0_1727108557731.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Imports added for completeness; assumes an active SparkSession `spark` started via sparknlp.start().
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# The classifier's input columns must match the outputs of the stages above: "document" and "token".
sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

// Imports added for completeness; assumes Spark NLP on the classpath and an active SparkSession `spark`.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

// The classifier's input columns must match the outputs of the stages above: "document" and "token".
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
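To read the predictions back from `pipelineDF` above, a short sketch (the `class` column name comes from `.setOutputCol("class")` in the example):

```python
# Each row's predicted label(s) sit in the "class" annotation column's `result` field.
pipelineDF.select("text", "class.result").show(truncate=False)
```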
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st16sd_ut72ut1largePfxNf_simsp300_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_en.md new file mode 100644 index 00000000000000..605a4b0414a49c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_en_5.5.0_3.0_1727093720340.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_en_5.5.0_3.0_1727093720340.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st17sd_ut72ut5_PLPrefix0stlarge17_simsp100_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_pipeline_en.md new file mode 100644 index 00000000000000..a7d29db27afc9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_pipeline_en_5.5.0_3.0_1727093733181.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_pipeline_en_5.5.0_3.0_1727093733181.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
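The snippet above assumes `df` is an existing Spark DataFrame with a `text` column. Below is a minimal sketch of preparing one and running the pipeline end to end; the session setup, sample sentence, and the `class` output column name are illustrative assumptions rather than part of this model card.

```python
# Minimal sketch: start Spark NLP, build an input DataFrame, and run the pretrained pipeline.
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_pipeline", lang = "en")
annotations = pipeline.transform(df)

# The predicted label is assumed to land in the "class" annotation column, as on the standalone model card.
annotations.select("text", "class.result").show(truncate=False)
```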
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st17sd_ut72ut5_plprefix0stlarge17_simsp100_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st17sd_ut72ut5_PLPrefix0stlarge17_simsp100_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en.md new file mode 100644 index 00000000000000..12d752d8199483 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en_5.5.0_3.0_1727097220467.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en_5.5.0_3.0_1727097220467.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st1sd_ut72ut5_PLPrefix0stlarge_simsp100_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en.md new file mode 100644 index 00000000000000..ffd4e83b16ec4c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en_5.5.0_3.0_1727093568416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en_5.5.0_3.0_1727093568416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
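+
+Once the fitted pipeline has produced `pipelineDF` as above, the predictions are stored as Spark NLP annotations in the `class` column. A small follow-up sketch (column names as in the example; the per-label scores in `metadata` are an assumption about the model's metadata fields):
+
+```python
+# "result" holds the predicted label for each row
+pipelineDF.select("text", "class.result").show(truncate=False)
+
+# Each annotation also carries metadata, which for classifiers usually includes per-label scores
+pipelineDF.selectExpr("explode(`class`) as prediction") \
+    .select("prediction.result", "prediction.metadata") \
+    .show(truncate=False)
+```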
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st2sd_ut72ut5_PLPrefix0stlarge_simsp100_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_pipeline_en.md new file mode 100644 index 00000000000000..5b9853bf6070fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_pipeline_en_5.5.0_3.0_1727108654944.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_pipeline_en_5.5.0_3.0_1727108654944.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1_PLPrefix0stlarge30_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_pipeline_en.md new file mode 100644 index 00000000000000..9bde458637c8fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_pipeline_en_5.5.0_3.0_1727093835824.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_pipeline_en_5.5.0_3.0_1727093835824.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut92ut1_PL0stlarge42_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_resumesclasssifierv1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_resumesclasssifierv1_pipeline_en.md new file mode 100644 index 00000000000000..53bf534fec794f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_resumesclasssifierv1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_resumesclasssifierv1_pipeline pipeline DistilBertForSequenceClassification from youssefkhalil320 +author: John Snow Labs +name: distilbert_base_uncased_resumesclasssifierv1_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_resumesclasssifierv1_pipeline` is a English model originally trained by youssefkhalil320. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_resumesclasssifierv1_pipeline_en_5.5.0_3.0_1727073682591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_resumesclasssifierv1_pipeline_en_5.5.0_3.0_1727073682591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_resumesclasssifierv1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_resumesclasssifierv1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_resumesclasssifierv1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/youssefkhalil320/distilbert-base-uncased-resumesClasssifierV1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_pipeline_en.md new file mode 100644 index 00000000000000..0a6b2cc8d0100d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_pipeline_en_5.5.0_3.0_1727087350420.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_pipeline_en_5.5.0_3.0_1727087350420.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut52ut1_ad7_sp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_pipeline_en.md new file mode 100644 index 00000000000000..0bf782309c3e28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_pipeline_en_5.5.0_3.0_1727108578579.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_pipeline_en_5.5.0_3.0_1727108578579.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut72ut1_plainPrefix_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_gthivaios_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_gthivaios_pipeline_en.md new file mode 100644 index 00000000000000..856e77f46c62fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_gthivaios_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_emotion_gthivaios_pipeline pipeline DistilBertForSequenceClassification from gthivaios +author: John Snow Labs +name: distilbert_emotion_gthivaios_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_gthivaios_pipeline` is a English model originally trained by gthivaios. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_gthivaios_pipeline_en_5.5.0_3.0_1727087111405.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_gthivaios_pipeline_en_5.5.0_3.0_1727087111405.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_emotion_gthivaios_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_emotion_gthivaios_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_gthivaios_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gthivaios/distilbert-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_fine_turned_classification_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_fine_turned_classification_en.md new file mode 100644 index 00000000000000..60deb08f9cd405 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_fine_turned_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_fine_turned_classification DistilBertForSequenceClassification from abhimanyuaryan +author: John Snow Labs +name: distilbert_fine_turned_classification +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_fine_turned_classification` is a English model originally trained by abhimanyuaryan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_fine_turned_classification_en_5.5.0_3.0_1727110503231.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_fine_turned_classification_en_5.5.0_3.0_1727110503231.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_fine_turned_classification","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_fine_turned_classification", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_fine_turned_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abhimanyuaryan/distilbert-fine-turned-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_foundation_category_funders_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_foundation_category_funders_en.md new file mode 100644 index 00000000000000..c8035faee74e1f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_foundation_category_funders_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_foundation_category_funders DistilBertForSequenceClassification from eric-mc2 +author: John Snow Labs +name: distilbert_foundation_category_funders +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_foundation_category_funders` is a English model originally trained by eric-mc2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_foundation_category_funders_en_5.5.0_3.0_1727108670250.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_foundation_category_funders_en_5.5.0_3.0_1727108670250.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_foundation_category_funders","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_foundation_category_funders", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_foundation_category_funders| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/eric-mc2/distilbert-foundation-category-funders \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_imdb_kuma9831_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_imdb_kuma9831_pipeline_en.md new file mode 100644 index 00000000000000..50202b671e5287 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_imdb_kuma9831_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_imdb_kuma9831_pipeline pipeline DistilBertForSequenceClassification from kuma9831 +author: John Snow Labs +name: distilbert_imdb_kuma9831_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_kuma9831_pipeline` is a English model originally trained by kuma9831. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_kuma9831_pipeline_en_5.5.0_3.0_1727097259922.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_kuma9831_pipeline_en_5.5.0_3.0_1727097259922.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_imdb_kuma9831_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_imdb_kuma9831_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_kuma9831_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kuma9831/distilbert-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_pd_books_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_pd_books_pipeline_en.md new file mode 100644 index 00000000000000..e16c0516a907f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_pd_books_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_pd_books_pipeline pipeline DistilBertForSequenceClassification from Gaxys +author: John Snow Labs +name: distilbert_pd_books_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_pd_books_pipeline` is a English model originally trained by Gaxys. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_pd_books_pipeline_en_5.5.0_3.0_1727087238320.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_pd_books_pipeline_en_5.5.0_3.0_1727087238320.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_pd_books_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_pd_books_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_pd_books_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gaxys/DistilBERT-PD_Books + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_pipeline_en.md new file mode 100644 index 00000000000000..c1cbe420d5f582 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_pipeline_en_5.5.0_3.0_1727082204816.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_pipeline_en_5.5.0_3.0_1727082204816.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|71.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_data_aug_cola_256 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_pipeline_en.md new file mode 100644 index 00000000000000..d11c685f568269 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_pipeline_en_5.5.0_3.0_1727097085462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_pipeline_en_5.5.0_3.0_1727097085462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|111.9 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_qnli_384 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_sentiment_test_2023dec_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sentiment_test_2023dec_en.md new file mode 100644 index 00000000000000..3b65635eaf2eeb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sentiment_test_2023dec_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sentiment_test_2023dec DistilBertForSequenceClassification from FungSung +author: John Snow Labs +name: distilbert_sentiment_test_2023dec +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sentiment_test_2023dec` is a English model originally trained by FungSung. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sentiment_test_2023dec_en_5.5.0_3.0_1727108675899.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sentiment_test_2023dec_en_5.5.0_3.0_1727108675899.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sentiment_test_2023dec","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sentiment_test_2023dec", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sentiment_test_2023dec| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/FungSung/distilBert_sentiment_test_2023DEC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_twitterfin_padding70model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_twitterfin_padding70model_pipeline_en.md new file mode 100644 index 00000000000000..545bc323a13a55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_twitterfin_padding70model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_twitterfin_padding70model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_twitterfin_padding70model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_twitterfin_padding70model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding70model_pipeline_en_5.5.0_3.0_1727059757491.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding70model_pipeline_en_5.5.0_3.0_1727059757491.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_twitterfin_padding70model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_twitterfin_padding70model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_twitterfin_padding70model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_twitterfin_padding70model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distillbert_indiv_vocab_ver4_1_en.md b/docs/_posts/ahmedlone127/2024-09-23-distillbert_indiv_vocab_ver4_1_en.md new file mode 100644 index 00000000000000..bac514a2c10da3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distillbert_indiv_vocab_ver4_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distillbert_indiv_vocab_ver4_1 DistilBertForTokenClassification from AdiShingote +author: John Snow Labs +name: distillbert_indiv_vocab_ver4_1 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distillbert_indiv_vocab_ver4_1` is a English model originally trained by AdiShingote. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distillbert_indiv_vocab_ver4_1_en_5.5.0_3.0_1727120690210.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distillbert_indiv_vocab_ver4_1_en_5.5.0_3.0_1727120690210.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = DistilBertForTokenClassification.pretrained("distillbert_indiv_vocab_ver4_1","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = DistilBertForTokenClassification.pretrained("distillbert_indiv_vocab_ver4_1", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
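+
+Because this is a token-level model, `pipelineDF` ends up with one `ner` annotation per token. A minimal sketch for lining the predicted tags up with their tokens (column names follow the example above):
+
+```python
+# token.result and ner.result are parallel arrays: one predicted tag per token
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```
+
+If whole entity chunks are needed rather than per-token tags, a `NerConverter` stage can usually be appended after the token classifier to merge B-/I- labelled tokens into spans.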
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distillbert_indiv_vocab_ver4_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|352.8 MB| + +## References + +https://huggingface.co/AdiShingote/Distillbert-indiv-vocab-ver4.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distillbert_indiv_vocab_ver4_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distillbert_indiv_vocab_ver4_1_pipeline_en.md new file mode 100644 index 00000000000000..c693fce390488b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distillbert_indiv_vocab_ver4_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distillbert_indiv_vocab_ver4_1_pipeline pipeline DistilBertForTokenClassification from AdiShingote +author: John Snow Labs +name: distillbert_indiv_vocab_ver4_1_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distillbert_indiv_vocab_ver4_1_pipeline` is a English model originally trained by AdiShingote. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distillbert_indiv_vocab_ver4_1_pipeline_en_5.5.0_3.0_1727120706053.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distillbert_indiv_vocab_ver4_1_pipeline_en_5.5.0_3.0_1727120706053.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distillbert_indiv_vocab_ver4_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distillbert_indiv_vocab_ver4_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distillbert_indiv_vocab_ver4_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|352.8 MB| + +## References + +https://huggingface.co/AdiShingote/Distillbert-indiv-vocab-ver4.1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_fitness_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_fitness_en.md new file mode 100644 index 00000000000000..29ef3a392dd9e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_fitness_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_ft_fitness RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_fitness +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_fitness` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_fitness_en_5.5.0_3.0_1727121703017.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_fitness_en_5.5.0_3.0_1727121703017.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_fitness","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_fitness","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
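+
+The embeddings produced above are stored as one annotation per token, with the vector in each annotation's `embeddings` field. A hedged sketch of two common ways to get at the raw vectors (the `EmbeddingsFinisher` settings shown are illustrative, not taken from the original card):
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+# Flatten annotation structs into vectors that downstream Spark ML stages can consume
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+finisher.transform(pipelineDF).select("finished_embeddings").show(1, truncate=80)
+
+# Or read the nested annotation field directly
+pipelineDF.select("embeddings.embeddings").show(1, truncate=80)
+```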
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_fitness| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-Fitness \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_fitness_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_fitness_pipeline_en.md new file mode 100644 index 00000000000000..19b34ba14d8203 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_fitness_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_ft_fitness_pipeline pipeline RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_fitness_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_fitness_pipeline` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_fitness_pipeline_en_5.5.0_3.0_1727121716644.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_fitness_pipeline_en_5.5.0_3.0_1727121716644.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_ft_fitness_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_ft_fitness_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_fitness_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-Fitness + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-dp_roberta_large_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-dp_roberta_large_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..fa15ba9d299395 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-dp_roberta_large_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dp_roberta_large_finetuned_pipeline pipeline RoBertaForSequenceClassification from GRMenon +author: John Snow Labs +name: dp_roberta_large_finetuned_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dp_roberta_large_finetuned_pipeline` is a English model originally trained by GRMenon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dp_roberta_large_finetuned_pipeline_en_5.5.0_3.0_1727086075857.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dp_roberta_large_finetuned_pipeline_en_5.5.0_3.0_1727086075857.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dp_roberta_large_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dp_roberta_large_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dp_roberta_large_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/GRMenon/dp-roberta-large-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ellis_v2_emotion_leadership_multi_label_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-ellis_v2_emotion_leadership_multi_label_pipeline_en.md new file mode 100644 index 00000000000000..19cc838667dadd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ellis_v2_emotion_leadership_multi_label_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ellis_v2_emotion_leadership_multi_label_pipeline pipeline DistilBertForSequenceClassification from gsl22 +author: John Snow Labs +name: ellis_v2_emotion_leadership_multi_label_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ellis_v2_emotion_leadership_multi_label_pipeline` is a English model originally trained by gsl22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ellis_v2_emotion_leadership_multi_label_pipeline_en_5.5.0_3.0_1727082282019.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ellis_v2_emotion_leadership_multi_label_pipeline_en_5.5.0_3.0_1727082282019.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ellis_v2_emotion_leadership_multi_label_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ellis_v2_emotion_leadership_multi_label_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ellis_v2_emotion_leadership_multi_label_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gsl22/ellis-v2-emotion-leadership-multi-label + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ensemble_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-23-ensemble_roberta_en.md new file mode 100644 index 00000000000000..24f608a4499512 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ensemble_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ensemble_roberta RoBertaForSequenceClassification from Crayo1902 +author: John Snow Labs +name: ensemble_roberta +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ensemble_roberta` is a English model originally trained by Crayo1902. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ensemble_roberta_en_5.5.0_3.0_1727135664732.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ensemble_roberta_en_5.5.0_3.0_1727135664732.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("ensemble_roberta","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("ensemble_roberta", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
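As a follow-up sketch to the Python snippet above: after `transform`, the predictions sit in the classifier's output column (`class`) as annotation structs, and the classifier stage should consume the columns actually produced upstream (`document`, `token`).

```python
# Continues from the Python snippet above (illustrative only). The classifier's
# setInputCols should match the upstream output columns ("document", "token");
# its predictions land in "class.result" as a one-element array per row.
import pyspark.sql.functions as F

(pipelineDF
    .select(
        F.col("text"),
        F.col("class.result").getItem(0).alias("predicted_label"))
    .show(truncate=False))
```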
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ensemble_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|430.7 MB| + +## References + +https://huggingface.co/Crayo1902/ensemble-roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ensemble_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-ensemble_roberta_pipeline_en.md new file mode 100644 index 00000000000000..4a8b4e4958135a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ensemble_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ensemble_roberta_pipeline pipeline RoBertaForSequenceClassification from Crayo1902 +author: John Snow Labs +name: ensemble_roberta_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ensemble_roberta_pipeline` is a English model originally trained by Crayo1902. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ensemble_roberta_pipeline_en_5.5.0_3.0_1727135692542.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ensemble_roberta_pipeline_en_5.5.0_3.0_1727135692542.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ensemble_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ensemble_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ensemble_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|430.7 MB| + +## References + +https://huggingface.co/Crayo1902/ensemble-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-environmentalbert_forest_en.md b/docs/_posts/ahmedlone127/2024-09-23-environmentalbert_forest_en.md new file mode 100644 index 00000000000000..bb74f8e63682c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-environmentalbert_forest_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English environmentalbert_forest RoBertaForSequenceClassification from ESGBERT +author: John Snow Labs +name: environmentalbert_forest +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`environmentalbert_forest` is a English model originally trained by ESGBERT. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/environmentalbert_forest_en_5.5.0_3.0_1727135072808.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/environmentalbert_forest_en_5.5.0_3.0_1727135072808.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("environmentalbert_forest","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("environmentalbert_forest", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|environmentalbert_forest| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/ESGBERT/EnvironmentalBERT-forest \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-fake_news_en.md b/docs/_posts/ahmedlone127/2024-09-23-fake_news_en.md new file mode 100644 index 00000000000000..5fdc41856f5811 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-fake_news_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fake_news DistilBertForSequenceClassification from nlp-godfathers +author: John Snow Labs +name: fake_news +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fake_news` is a English model originally trained by nlp-godfathers. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fake_news_en_5.5.0_3.0_1727082217772.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fake_news_en_5.5.0_3.0_1727082217772.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("fake_news","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("fake_news", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
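Fitting the pipeline above downloads the pretrained weights, so when the same classifier is reused across jobs it can help to persist the fitted `PipelineModel` once and reload it later. The path below is only an example.

```python
# Sketch: persist the fitted pipeline from the snippet above with standard
# Spark ML persistence, then reload it in a later job. The path is assumed.
from pyspark.ml import PipelineModel

save_path = "/tmp/fake_news_pipeline_model"
pipelineModel.write().overwrite().save(save_path)

reloaded = PipelineModel.load(save_path)
reloaded.transform(data).select("class.result").show(truncate=False)
```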
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fake_news| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.2 MB| + +## References + +https://huggingface.co/nlp-godfathers/fake_news \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-financial_phrasebank_fulltraindata_8020split_en.md b/docs/_posts/ahmedlone127/2024-09-23-financial_phrasebank_fulltraindata_8020split_en.md new file mode 100644 index 00000000000000..1a970bf5ac722b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-financial_phrasebank_fulltraindata_8020split_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English financial_phrasebank_fulltraindata_8020split RoBertaForSequenceClassification from kruthof +author: John Snow Labs +name: financial_phrasebank_fulltraindata_8020split +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`financial_phrasebank_fulltraindata_8020split` is a English model originally trained by kruthof. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/financial_phrasebank_fulltraindata_8020split_en_5.5.0_3.0_1727135422668.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/financial_phrasebank_fulltraindata_8020split_en_5.5.0_3.0_1727135422668.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("financial_phrasebank_fulltraindata_8020split","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("financial_phrasebank_fulltraindata_8020split", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
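In practice the input usually comes from files rather than an in-memory example. A sketch continuing from the snippet above; the file path and single-column CSV layout are assumptions.

```python
# Continues from the Python snippet above: score sentences read from disk.
# The CSV path and its single "text" column are assumptions for illustration.
sentences = (spark.read
             .option("header", "true")
             .csv("/data/financial_sentences.csv")
             .select("text"))

scored = pipelineModel.transform(sentences)
scored.select("text", "class.result").show(10, truncate=False)
```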
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|financial_phrasebank_fulltraindata_8020split| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|435.1 MB| + +## References + +https://huggingface.co/kruthof/financial_phrasebank_fullTrainData_8020split \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-financial_phrasebank_fulltraindata_8020split_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-financial_phrasebank_fulltraindata_8020split_pipeline_en.md new file mode 100644 index 00000000000000..a9ca6f10ff88c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-financial_phrasebank_fulltraindata_8020split_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English financial_phrasebank_fulltraindata_8020split_pipeline pipeline RoBertaForSequenceClassification from kruthof +author: John Snow Labs +name: financial_phrasebank_fulltraindata_8020split_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`financial_phrasebank_fulltraindata_8020split_pipeline` is a English model originally trained by kruthof. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/financial_phrasebank_fulltraindata_8020split_pipeline_en_5.5.0_3.0_1727135453320.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/financial_phrasebank_fulltraindata_8020split_pipeline_en_5.5.0_3.0_1727135453320.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("financial_phrasebank_fulltraindata_8020split_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("financial_phrasebank_fulltraindata_8020split_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|financial_phrasebank_fulltraindata_8020split_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|435.1 MB| + +## References + +https://huggingface.co/kruthof/financial_phrasebank_fullTrainData_8020split + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuned_bio_clinicalbert_2012i2b2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuned_bio_clinicalbert_2012i2b2_pipeline_en.md new file mode 100644 index 00000000000000..945d0e508f7bd4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuned_bio_clinicalbert_2012i2b2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_bio_clinicalbert_2012i2b2_pipeline pipeline BertForTokenClassification from xiaojingduan +author: John Snow Labs +name: finetuned_bio_clinicalbert_2012i2b2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_bio_clinicalbert_2012i2b2_pipeline` is a English model originally trained by xiaojingduan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_bio_clinicalbert_2012i2b2_pipeline_en_5.5.0_3.0_1727060807463.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_bio_clinicalbert_2012i2b2_pipeline_en_5.5.0_3.0_1727060807463.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_bio_clinicalbert_2012i2b2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_bio_clinicalbert_2012i2b2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
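Because this pipeline performs token-level NER, character offsets are often needed alongside the predicted tags. A hedged sketch using `fullAnnotate`; the sample sentence and the `ner` output key are assumptions, so inspect the returned dictionary keys to confirm them.

```python
# Sketch: fullAnnotate() returns annotation objects with begin/end offsets.
# The example sentence and the "ner" output key are assumptions.
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("finetuned_bio_clinicalbert_2012i2b2_pipeline", lang="en")

result = pipeline.fullAnnotate("Patient was started on metformin in 2012.")[0]
for token, tag in zip(result["token"], result.get("ner", [])):
    print(token.result, tag.result, (tag.begin, tag.end))
```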
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_bio_clinicalbert_2012i2b2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.4 MB| + +## References + +https://huggingface.co/xiaojingduan/finetuned_bio_clinicalbert_2012i2b2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_imsoumyaneel_25k_epoch_10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_imsoumyaneel_25k_epoch_10_pipeline_en.md new file mode 100644 index 00000000000000..afd44c39985dd0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_imsoumyaneel_25k_epoch_10_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_model_imsoumyaneel_25k_epoch_10_pipeline pipeline DistilBertForSequenceClassification from rahulgaikwad007 +author: John Snow Labs +name: finetuned_model_imsoumyaneel_25k_epoch_10_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_model_imsoumyaneel_25k_epoch_10_pipeline` is a English model originally trained by rahulgaikwad007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_model_imsoumyaneel_25k_epoch_10_pipeline_en_5.5.0_3.0_1727086887236.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_model_imsoumyaneel_25k_epoch_10_pipeline_en_5.5.0_3.0_1727086887236.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_model_imsoumyaneel_25k_epoch_10_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_model_imsoumyaneel_25k_epoch_10_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_model_imsoumyaneel_25k_epoch_10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rahulgaikwad007/Finetuned-model-imsoumyaneel-25k-Epoch-10 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuned_sentiment_modell_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuned_sentiment_modell_pipeline_en.md new file mode 100644 index 00000000000000..5a9aa911b6c056 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuned_sentiment_modell_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_sentiment_modell_pipeline pipeline XlmRoBertaForSequenceClassification from Justin-J +author: John Snow Labs +name: finetuned_sentiment_modell_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_sentiment_modell_pipeline` is a English model originally trained by Justin-J. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_sentiment_modell_pipeline_en_5.5.0_3.0_1727126220838.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_sentiment_modell_pipeline_en_5.5.0_3.0_1727126220838.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_sentiment_modell_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_sentiment_modell_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_sentiment_modell_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Justin-J/finetuned_sentiment_modell + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_emotion_model_eric313_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_emotion_model_eric313_en.md new file mode 100644 index 00000000000000..03c2b24faf0b16 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_emotion_model_eric313_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_emotion_model_eric313 DistilBertForSequenceClassification from Eric313 +author: John Snow Labs +name: finetuning_emotion_model_eric313 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_emotion_model_eric313` is a English model originally trained by Eric313. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_emotion_model_eric313_en_5.5.0_3.0_1727074054564.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_emotion_model_eric313_en_5.5.0_3.0_1727074054564.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_emotion_model_eric313","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_emotion_model_eric313", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
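For interactive or low-latency use, the fitted pipeline above can be wrapped in a `LightPipeline`, which runs on plain strings without building a DataFrame. The example texts below are assumptions.

```python
# Continues from the Python snippet above: wrap the fitted PipelineModel in a
# LightPipeline for fast, driver-side inference on individual strings.
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)

print(light.annotate("I am thrilled with these results"))
print(light.annotate(["this is frustrating", "what a lovely surprise"]))
```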
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_emotion_model_eric313| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Eric313/finetuning-emotion-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_emotion_model_eric313_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_emotion_model_eric313_pipeline_en.md new file mode 100644 index 00000000000000..ffbcbb6b39e513 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_emotion_model_eric313_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_emotion_model_eric313_pipeline pipeline DistilBertForSequenceClassification from Eric313 +author: John Snow Labs +name: finetuning_emotion_model_eric313_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_emotion_model_eric313_pipeline` is a English model originally trained by Eric313. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_emotion_model_eric313_pipeline_en_5.5.0_3.0_1727074066174.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_emotion_model_eric313_pipeline_en_5.5.0_3.0_1727074066174.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_emotion_model_eric313_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_emotion_model_eric313_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_emotion_model_eric313_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Eric313/finetuning-emotion-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_assoboss_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_assoboss_en.md new file mode 100644 index 00000000000000..552fd6348a54ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_assoboss_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_assoboss DistilBertForSequenceClassification from assoboss +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_assoboss +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_assoboss` is a English model originally trained by assoboss. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_assoboss_en_5.5.0_3.0_1727074269741.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_assoboss_en_5.5.0_3.0_1727074269741.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_assoboss","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_assoboss", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_assoboss| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/assoboss/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_emmaly0937245_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_emmaly0937245_en.md new file mode 100644 index 00000000000000..f70dc8870db5b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_emmaly0937245_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_emmaly0937245 DistilBertForSequenceClassification from emmaly0937245 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_emmaly0937245 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_emmaly0937245` is a English model originally trained by emmaly0937245. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_emmaly0937245_en_5.5.0_3.0_1727059655993.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_emmaly0937245_en_5.5.0_3.0_1727059655993.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_emmaly0937245","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_emmaly0937245", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_emmaly0937245| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/emmaly0937245/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_emmaly0937245_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_emmaly0937245_pipeline_en.md new file mode 100644 index 00000000000000..c98fc2486bcaf9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_emmaly0937245_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_emmaly0937245_pipeline pipeline DistilBertForSequenceClassification from emmaly0937245 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_emmaly0937245_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_emmaly0937245_pipeline` is a English model originally trained by emmaly0937245. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_emmaly0937245_pipeline_en_5.5.0_3.0_1727059668020.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_emmaly0937245_pipeline_en_5.5.0_3.0_1727059668020.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_emmaly0937245_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_emmaly0937245_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_emmaly0937245_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/emmaly0937245/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ih8l1ght_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ih8l1ght_pipeline_en.md new file mode 100644 index 00000000000000..88a2219f69d1aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ih8l1ght_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ih8l1ght_pipeline pipeline DistilBertForSequenceClassification from ih8l1ght +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ih8l1ght_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ih8l1ght_pipeline` is a English model originally trained by ih8l1ght. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ih8l1ght_pipeline_en_5.5.0_3.0_1727094039365.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ih8l1ght_pipeline_en_5.5.0_3.0_1727094039365.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_ih8l1ght_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_ih8l1ght_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ih8l1ght_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ih8l1ght/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_inn_ctrl_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_inn_ctrl_en.md new file mode 100644 index 00000000000000..e53501897118a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_inn_ctrl_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_inn_ctrl DistilBertForSequenceClassification from inn-ctrl +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_inn_ctrl +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_inn_ctrl` is a English model originally trained by inn-ctrl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_inn_ctrl_en_5.5.0_3.0_1727110394593.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_inn_ctrl_en_5.5.0_3.0_1727110394593.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_inn_ctrl","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_inn_ctrl", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
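When a small labelled sample is available, the transformed DataFrame allows a quick sanity check of the classifier. The sketch below is illustrative only; the gold labels and their string values are assumptions and should match the labels this model actually emits.

```python
# Continues from the Python snippet above: a rough accuracy check on a tiny
# hand-labelled sample. The "positive"/"negative" label strings are assumed.
import pyspark.sql.functions as F

labelled = spark.createDataFrame(
    [("I love spark-nlp", "positive"), ("this was a waste of time", "negative")],
    ["text", "gold"])

scored = pipelineModel.transform(labelled) \
    .withColumn("pred", F.col("class.result").getItem(0))

scored.select("text", "gold", "pred").show(truncate=False)
scored.agg(F.avg((F.col("pred") == F.col("gold")).cast("double"))
           .alias("accuracy")).show()
```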
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_inn_ctrl| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/inn-ctrl/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_inn_ctrl_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_inn_ctrl_pipeline_en.md new file mode 100644 index 00000000000000..a354d6f29381a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_inn_ctrl_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_inn_ctrl_pipeline pipeline DistilBertForSequenceClassification from inn-ctrl +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_inn_ctrl_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_inn_ctrl_pipeline` is a English model originally trained by inn-ctrl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_inn_ctrl_pipeline_en_5.5.0_3.0_1727110406993.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_inn_ctrl_pipeline_en_5.5.0_3.0_1727110406993.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_inn_ctrl_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_inn_ctrl_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_inn_ctrl_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/inn-ctrl/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_nandyala12_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_nandyala12_en.md new file mode 100644 index 00000000000000..a222ffa089c2e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_nandyala12_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_nandyala12 DistilBertForSequenceClassification from Nandyala12 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_nandyala12 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_nandyala12` is a English model originally trained by Nandyala12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_nandyala12_en_5.5.0_3.0_1727097358137.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_nandyala12_en_5.5.0_3.0_1727097358137.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_nandyala12","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_nandyala12", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_nandyala12| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Nandyala12/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ritesh47_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ritesh47_en.md new file mode 100644 index 00000000000000..3e4216f5bc15cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ritesh47_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ritesh47 DistilBertForSequenceClassification from ritesh47 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ritesh47 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ritesh47` is a English model originally trained by ritesh47. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ritesh47_en_5.5.0_3.0_1727073633754.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ritesh47_en_5.5.0_3.0_1727073633754.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ritesh47","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ritesh47", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ritesh47| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ritesh47/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-gal_sayula_popoluca_iw_2_en.md b/docs/_posts/ahmedlone127/2024-09-23-gal_sayula_popoluca_iw_2_en.md new file mode 100644 index 00000000000000..078d36d78761aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-gal_sayula_popoluca_iw_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English gal_sayula_popoluca_iw_2 XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: gal_sayula_popoluca_iw_2 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gal_sayula_popoluca_iw_2` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_sayula_popoluca_iw_2_en_5.5.0_3.0_1727132889328.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_sayula_popoluca_iw_2_en_5.5.0_3.0_1727132889328.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_sayula_popoluca_iw_2","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_sayula_popoluca_iw_2", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_sayula_popoluca_iw_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|417.1 MB| + +## References + +https://huggingface.co/homersimpson/gal-pos-iw-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-groberta_goemotions_en.md b/docs/_posts/ahmedlone127/2024-09-23-groberta_goemotions_en.md new file mode 100644 index 00000000000000..7a3218d1ded30d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-groberta_goemotions_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English groberta_goemotions RoBertaForSequenceClassification from Mukundhan32 +author: John Snow Labs +name: groberta_goemotions +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`groberta_goemotions` is a English model originally trained by Mukundhan32. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/groberta_goemotions_en_5.5.0_3.0_1727085809523.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/groberta_goemotions_en_5.5.0_3.0_1727085809523.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("groberta_goemotions","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("groberta_goemotions", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|groberta_goemotions| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|453.3 MB| + +## References + +https://huggingface.co/Mukundhan32/Groberta-goemotions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_en.md b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_en.md new file mode 100644 index 00000000000000..a06c37e990072d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_en_5.5.0_3.0_1727055474650.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_en_5.5.0_3.0_1727055474650.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random2_seed0-twitter-roberta-base-2021-124m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-imdbreviews_classification_distilbert_sst2_transfer_learning_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-imdbreviews_classification_distilbert_sst2_transfer_learning_pipeline_en.md new file mode 100644 index 00000000000000..7e28c52af1c38f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-imdbreviews_classification_distilbert_sst2_transfer_learning_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English imdbreviews_classification_distilbert_sst2_transfer_learning_pipeline pipeline DistilBertForSequenceClassification from darmendarizp +author: John Snow Labs +name: imdbreviews_classification_distilbert_sst2_transfer_learning_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdbreviews_classification_distilbert_sst2_transfer_learning_pipeline` is a English model originally trained by darmendarizp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_distilbert_sst2_transfer_learning_pipeline_en_5.5.0_3.0_1727082664941.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_distilbert_sst2_transfer_learning_pipeline_en_5.5.0_3.0_1727082664941.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("imdbreviews_classification_distilbert_sst2_transfer_learning_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("imdbreviews_classification_distilbert_sst2_transfer_learning_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
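+
+The `df` referenced above is assumed to be any Spark DataFrame with a `text` column; for quick checks on single strings, `annotate` can be used instead of `transform`. A usage sketch, not part of the original card:
+
+```python
+# Hypothetical input DataFrame for the pretrained pipeline above
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+
+# Light-weight alternative for a single string
+result = pipeline.annotate("I love spark-nlp")
+```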
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdbreviews_classification_distilbert_sst2_transfer_learning_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/darmendarizp/imdbreviews_classification_distilbert_sst2_transfer_learning + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-kanglish_offensive_language_identification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-kanglish_offensive_language_identification_pipeline_en.md new file mode 100644 index 00000000000000..ec0ee3747fe3f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-kanglish_offensive_language_identification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English kanglish_offensive_language_identification_pipeline pipeline RoBertaForSequenceClassification from seanbenhur +author: John Snow Labs +name: kanglish_offensive_language_identification_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kanglish_offensive_language_identification_pipeline` is a English model originally trained by seanbenhur. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kanglish_offensive_language_identification_pipeline_en_5.5.0_3.0_1727134938987.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kanglish_offensive_language_identification_pipeline_en_5.5.0_3.0_1727134938987.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("kanglish_offensive_language_identification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("kanglish_offensive_language_identification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kanglish_offensive_language_identification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|451.8 MB| + +## References + +https://huggingface.co/seanbenhur/kanglish-offensive-language-identification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-kor_bert_qa_test_2_en.md b/docs/_posts/ahmedlone127/2024-09-23-kor_bert_qa_test_2_en.md new file mode 100644 index 00000000000000..db63e0edae4675 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-kor_bert_qa_test_2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English kor_bert_qa_test_2 BertForQuestionAnswering from lemonTree5366 +author: John Snow Labs +name: kor_bert_qa_test_2 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kor_bert_qa_test_2` is a English model originally trained by lemonTree5366. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kor_bert_qa_test_2_en_5.5.0_3.0_1727070299469.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kor_bert_qa_test_2_en_5.5.0_3.0_1727070299469.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("kor_bert_qa_test_2","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("kor_bert_qa_test_2", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
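+
+Assuming `pipelineDF` from the snippet above, the predicted answer span can be read from the `result` field of the `answer` annotation column. A usage sketch:
+
+```python
+# Each row of `answer` is an array of annotations; `result` holds the answer text
+pipelineDF.selectExpr("explode(answer.result) as predicted_answer").show(truncate=False)
+```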
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kor_bert_qa_test_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|441.2 MB| + +## References + +https://huggingface.co/lemonTree5366/kor_bert_qa_test_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-kor_bert_qa_test_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-kor_bert_qa_test_2_pipeline_en.md new file mode 100644 index 00000000000000..2f47827171af5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-kor_bert_qa_test_2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English kor_bert_qa_test_2_pipeline pipeline BertForQuestionAnswering from lemonTree5366 +author: John Snow Labs +name: kor_bert_qa_test_2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kor_bert_qa_test_2_pipeline` is a English model originally trained by lemonTree5366. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kor_bert_qa_test_2_pipeline_en_5.5.0_3.0_1727070321205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kor_bert_qa_test_2_pipeline_en_5.5.0_3.0_1727070321205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("kor_bert_qa_test_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("kor_bert_qa_test_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kor_bert_qa_test_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|441.2 MB| + +## References + +https://huggingface.co/lemonTree5366/kor_bert_qa_test_2 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-lab2_whisper_swedish_hi.md b/docs/_posts/ahmedlone127/2024-09-23-lab2_whisper_swedish_hi.md new file mode 100644 index 00000000000000..5faf945a3a1300 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-lab2_whisper_swedish_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi lab2_whisper_swedish WhisperForCTC from SodraZatre +author: John Snow Labs +name: lab2_whisper_swedish +date: 2024-09-23 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab2_whisper_swedish` is a Hindi model originally trained by SodraZatre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab2_whisper_swedish_hi_5.5.0_3.0_1727116461072.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab2_whisper_swedish_hi_5.5.0_3.0_1727116461072.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("lab2_whisper_swedish","hi") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+# `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("lab2_whisper_swedish", "hi")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+// `data` is assumed to be a DataFrame with an "audio_content" column of raw audio samples
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
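+
+The `data` DataFrame is assumed to hold raw audio samples in an `audio_content` column. A hypothetical way to build it from a local file (`librosa` and the file path are assumptions, not part of this card):
+
+```python
+import librosa
+
+# Whisper checkpoints are typically trained on 16 kHz mono audio
+samples, _ = librosa.load("sample.wav", sr=16000)
+# Depending on your Spark NLP version you may need to cast the samples to float explicitly
+data = spark.createDataFrame([[[float(s) for s in samples]]]).toDF("audio_content")
+```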
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab2_whisper_swedish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/SodraZatre/lab2-whisper-sv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-lnm_ner_en.md b/docs/_posts/ahmedlone127/2024-09-23-lnm_ner_en.md new file mode 100644 index 00000000000000..a53cf63d2a4b74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-lnm_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English lnm_ner BertForTokenClassification from suyashmittal +author: John Snow Labs +name: lnm_ner +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lnm_ner` is a English model originally trained by suyashmittal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lnm_ner_en_5.5.0_3.0_1727129850854.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lnm_ner_en_5.5.0_3.0_1727129850854.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("lnm_ner","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("lnm_ner", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lnm_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/suyashmittal/lnm-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-model_1_8_en.md b/docs/_posts/ahmedlone127/2024-09-23-model_1_8_en.md new file mode 100644 index 00000000000000..4b9d05b7c96055 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-model_1_8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_1_8 RoBertaForSequenceClassification from raydentseng +author: John Snow Labs +name: model_1_8 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_1_8` is a English model originally trained by raydentseng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_1_8_en_5.5.0_3.0_1727135064897.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_1_8_en_5.5.0_3.0_1727135064897.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("model_1_8","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("model_1_8", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_1_8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|437.9 MB| + +## References + +https://huggingface.co/raydentseng/model_1_8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-model_1_8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-model_1_8_pipeline_en.md new file mode 100644 index 00000000000000..9dc2ac5d17ef21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-model_1_8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_1_8_pipeline pipeline RoBertaForSequenceClassification from raydentseng +author: John Snow Labs +name: model_1_8_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_1_8_pipeline` is a English model originally trained by raydentseng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_1_8_pipeline_en_5.5.0_3.0_1727135087186.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_1_8_pipeline_en_5.5.0_3.0_1727135087186.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("model_1_8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("model_1_8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_1_8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.0 MB| + +## References + +https://huggingface.co/raydentseng/model_1_8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-model_sentence_entailment_hackaton_2_en.md b/docs/_posts/ahmedlone127/2024-09-23-model_sentence_entailment_hackaton_2_en.md new file mode 100644 index 00000000000000..fe20e7d95a0695 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-model_sentence_entailment_hackaton_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_sentence_entailment_hackaton_2 RoBertaForSequenceClassification from ludoviciarraga +author: John Snow Labs +name: model_sentence_entailment_hackaton_2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_sentence_entailment_hackaton_2` is a English model originally trained by ludoviciarraga. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_sentence_entailment_hackaton_2_en_5.5.0_3.0_1727135295638.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_sentence_entailment_hackaton_2_en_5.5.0_3.0_1727135295638.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("model_sentence_entailment_hackaton_2","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("model_sentence_entailment_hackaton_2", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_sentence_entailment_hackaton_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ludoviciarraga/model_sentence_entailment_hackaton_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-model_sentence_entailment_hackaton_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-model_sentence_entailment_hackaton_2_pipeline_en.md new file mode 100644 index 00000000000000..07b37a002441fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-model_sentence_entailment_hackaton_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_sentence_entailment_hackaton_2_pipeline pipeline RoBertaForSequenceClassification from ludoviciarraga +author: John Snow Labs +name: model_sentence_entailment_hackaton_2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_sentence_entailment_hackaton_2_pipeline` is a English model originally trained by ludoviciarraga. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_sentence_entailment_hackaton_2_pipeline_en_5.5.0_3.0_1727135371470.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_sentence_entailment_hackaton_2_pipeline_en_5.5.0_3.0_1727135371470.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("model_sentence_entailment_hackaton_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("model_sentence_entailment_hackaton_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_sentence_entailment_hackaton_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ludoviciarraga/model_sentence_entailment_hackaton_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding20model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding20model_pipeline_en.md new file mode 100644 index 00000000000000..415bd98d40877c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding20model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_distilbert_sst5_padding20model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_sst5_padding20model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_sst5_padding20model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding20model_pipeline_en_5.5.0_3.0_1727110763622.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding20model_pipeline_en_5.5.0_3.0_1727110763622.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("n_distilbert_sst5_padding20model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("n_distilbert_sst5_padding20model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_sst5_padding20model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_sst5_padding20model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding60model_en.md b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding60model_en.md new file mode 100644 index 00000000000000..3161373f706be1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding60model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_distilbert_sst5_padding60model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_sst5_padding60model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_sst5_padding60model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding60model_en_5.5.0_3.0_1727093928776.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding60model_en_5.5.0_3.0_1727093928776.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst5_padding60model","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst5_padding60model", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_sst5_padding60model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_sst5_padding60model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_twitterfin_padding30model_en.md b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_twitterfin_padding30model_en.md new file mode 100644 index 00000000000000..325b8beed62dbf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_twitterfin_padding30model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_distilbert_twitterfin_padding30model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_twitterfin_padding30model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_twitterfin_padding30model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding30model_en_5.5.0_3.0_1727110724511.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding30model_en_5.5.0_3.0_1727110724511.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_twitterfin_padding30model","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_twitterfin_padding30model", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_twitterfin_padding30model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_twitterfin_padding30model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_dataset_bert_en.md b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_dataset_bert_en.md new file mode 100644 index 00000000000000..2954f7f6c61df6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_dataset_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nepal_bhasa_dataset_bert RoBertaEmbeddings from ubaskota +author: John Snow Labs +name: nepal_bhasa_dataset_bert +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_dataset_bert` is a English model originally trained by ubaskota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_dataset_bert_en_5.5.0_3.0_1727122069822.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_dataset_bert_en_5.5.0_3.0_1727122069822.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("nepal_bhasa_dataset_bert","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("nepal_bhasa_dataset_bert","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
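+
+Assuming `pipelineDF` from the snippet above, the per-token vectors live in the `embeddings` field of the `embeddings` annotation column. A usage sketch:
+
+```python
+# One array of floats per token
+pipelineDF.selectExpr("explode(embeddings.embeddings) as token_embedding").show(1, truncate=False)
+```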
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_dataset_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|461.7 MB| + +## References + +https://huggingface.co/ubaskota/new_dataset_bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_dataset_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_dataset_bert_pipeline_en.md new file mode 100644 index 00000000000000..789269d0439f5f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_dataset_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nepal_bhasa_dataset_bert_pipeline pipeline RoBertaEmbeddings from ubaskota +author: John Snow Labs +name: nepal_bhasa_dataset_bert_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_dataset_bert_pipeline` is a English model originally trained by ubaskota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_dataset_bert_pipeline_en_5.5.0_3.0_1727122092171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_dataset_bert_pipeline_en_5.5.0_3.0_1727122092171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nepal_bhasa_dataset_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nepal_bhasa_dataset_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_dataset_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|461.7 MB| + +## References + +https://huggingface.co/ubaskota/new_dataset_bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_phishing_email_detection2_en.md b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_phishing_email_detection2_en.md new file mode 100644 index 00000000000000..32e3351f3e7299 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_phishing_email_detection2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nepal_bhasa_phishing_email_detection2 DistilBertForSequenceClassification from kamikaze20 +author: John Snow Labs +name: nepal_bhasa_phishing_email_detection2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_phishing_email_detection2` is a English model originally trained by kamikaze20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_phishing_email_detection2_en_5.5.0_3.0_1727074260367.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_phishing_email_detection2_en_5.5.0_3.0_1727074260367.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("nepal_bhasa_phishing_email_detection2","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("nepal_bhasa_phishing_email_detection2", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_phishing_email_detection2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/kamikaze20/new_phishing-email-detection2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ner_clue_en.md b/docs/_posts/ahmedlone127/2024-09-23-ner_clue_en.md new file mode 100644 index 00000000000000..300018cc981e5f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ner_clue_en.md @@ -0,0 +1,88 @@ +--- +layout: model +title: English ner_clue T5Transformer from helloya0908 +author: John Snow Labs +name: ner_clue +date: 2024-09-23 +tags: [en, open_source, onnx, t5, question_answering, summarization, translation, text_generation] +task: [Question Answering, Summarization, Translation, Text Generation] +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: T5Transformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained T5Transformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_clue` is a English model originally trained by helloya0908. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_clue_en_5.5.0_3.0_1727124792734.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_clue_en_5.5.0_3.0_1727124792734.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+t5 = T5Transformer.pretrained("ner_clue","en") \
+    .setInputCols(["document"]) \
+    .setOutputCol("output")
+
+pipeline = Pipeline().setStages([documentAssembler, t5])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val t5 = T5Transformer.pretrained("ner_clue", "en")
+    .setInputCols(Array("document"))
+    .setOutputCol("output")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, t5))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
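+
+For quick experiments on single strings, the fitted pipeline above can also be wrapped in a `LightPipeline`. A usage sketch; the input text is illustrative only:
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+annotations = light.annotate("I love spark-nlp")  # dict with an "output" key holding the generated text
+```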
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_clue| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[output]| +|Language:|en| +|Size:|950.4 MB| + +## References + +References + +https://huggingface.co/helloya0908/NER_CLUE \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ner_clue_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-ner_clue_pipeline_en.md new file mode 100644 index 00000000000000..9434ed18a628d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ner_clue_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English ner_clue_pipeline pipeline T5Transformer from helloya0908 +author: John Snow Labs +name: ner_clue_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: [Question Answering, Summarization, Translation, Text Generation] +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained T5Transformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_clue_pipeline` is a English model originally trained by helloya0908. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_clue_pipeline_en_5.5.0_3.0_1727124855743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_clue_pipeline_en_5.5.0_3.0_1727124855743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +pipeline = PretrainedPipeline("ner_clue_pipeline", lang = "en") +annotations = pipeline.transform(df) +``` +```scala +val pipeline = new PretrainedPipeline("ner_clue_pipeline", lang = "en") +val annotations = pipeline.transform(df) +``` +
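+
+The snippet above applies the pipeline to a DataFrame `df` that is never defined in the card. A minimal sketch of how such a frame might be built (the `text` column name and the sample sentence are illustrative assumptions, not part of the original card):
+
+```python
+# Hypothetical input frame; any DataFrame with a "text" column should work.
+df = spark.createDataFrame([["I love Spark NLP"]]).toDF("text")
+annotations = pipeline.transform(df)
+annotations.printSchema()  # inspect which output columns the pipeline adds
+```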
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_clue_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|950.4 MB| + +## References + +References + +https://huggingface.co/helloya0908/NER_CLUE + +## Included Models + +- DocumentAssembler +- T5Transformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ner_model_nathali99_en.md b/docs/_posts/ahmedlone127/2024-09-23-ner_model_nathali99_en.md new file mode 100644 index 00000000000000..b6a052d2090d5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ner_model_nathali99_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_model_nathali99 BertForTokenClassification from Nathali99 +author: John Snow Labs +name: ner_model_nathali99 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_model_nathali99` is a English model originally trained by Nathali99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_model_nathali99_en_5.5.0_3.0_1727129840770.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_model_nathali99_en_5.5.0_3.0_1727129840770.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("ner_model_nathali99","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("ner_model_nathali99", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_model_nathali99| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Nathali99/ner-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ner_ner_random1_seed0_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-23-ner_ner_random1_seed0_bernice_en.md new file mode 100644 index 00000000000000..d57952b78b534f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ner_ner_random1_seed0_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_ner_random1_seed0_bernice XlmRoBertaForTokenClassification from tweettemposhift +author: John Snow Labs +name: ner_ner_random1_seed0_bernice +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_ner_random1_seed0_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_ner_random1_seed0_bernice_en_5.5.0_3.0_1727132537180.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_ner_random1_seed0_bernice_en_5.5.0_3.0_1727132537180.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("ner_ner_random1_seed0_bernice","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("ner_ner_random1_seed0_bernice", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_ner_random1_seed0_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|802.5 MB| + +## References + +https://huggingface.co/tweettemposhift/ner-ner_random1_seed0-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-nerd_nerd_random0_seed1_roberta_large_en.md b/docs/_posts/ahmedlone127/2024-09-23-nerd_nerd_random0_seed1_roberta_large_en.md new file mode 100644 index 00000000000000..dfb3d08e4e3860 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-nerd_nerd_random0_seed1_roberta_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nerd_nerd_random0_seed1_roberta_large RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: nerd_nerd_random0_seed1_roberta_large +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerd_nerd_random0_seed1_roberta_large` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerd_nerd_random0_seed1_roberta_large_en_5.5.0_3.0_1727135019648.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerd_nerd_random0_seed1_roberta_large_en_5.5.0_3.0_1727135019648.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("nerd_nerd_random0_seed1_roberta_large","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("nerd_nerd_random0_seed1_roberta_large", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerd_nerd_random0_seed1_roberta_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/nerd-nerd_random0_seed1-roberta-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-nerd_nerd_random0_seed1_roberta_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-nerd_nerd_random0_seed1_roberta_large_pipeline_en.md new file mode 100644 index 00000000000000..c7d3d083afd8ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-nerd_nerd_random0_seed1_roberta_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nerd_nerd_random0_seed1_roberta_large_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: nerd_nerd_random0_seed1_roberta_large_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerd_nerd_random0_seed1_roberta_large_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerd_nerd_random0_seed1_roberta_large_pipeline_en_5.5.0_3.0_1727135092501.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerd_nerd_random0_seed1_roberta_large_pipeline_en_5.5.0_3.0_1727135092501.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nerd_nerd_random0_seed1_roberta_large_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nerd_nerd_random0_seed1_roberta_large_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
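+
+As in the other pipeline cards, the DataFrame `df` is assumed to exist already. One possible way to prepare it, plus the lighter-weight `annotate` call for ad-hoc strings (the sample sentence and the `text` column name are invented for illustration):
+
+```python
+# DataFrame route: batch scoring over a "text" column (assumed name).
+df = spark.createDataFrame([["The announcement was made yesterday."]]).toDF("text")
+annotations = pipeline.transform(df)
+
+# String route: a quick check on a single sentence.
+result = pipeline.annotate("The announcement was made yesterday.")
+```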
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerd_nerd_random0_seed1_roberta_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/nerd-nerd_random0_seed1-roberta-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-paraquantizar_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-paraquantizar_pipeline_en.md new file mode 100644 index 00000000000000..9d2031dda8d805 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-paraquantizar_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English paraquantizar_pipeline pipeline RoBertaForSequenceClassification from Heber77 +author: John Snow Labs +name: paraquantizar_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`paraquantizar_pipeline` is a English model originally trained by Heber77. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/paraquantizar_pipeline_en_5.5.0_3.0_1727055597488.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/paraquantizar_pipeline_en_5.5.0_3.0_1727055597488.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("paraquantizar_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("paraquantizar_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
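+
+A hedged sketch of the missing `df`, assuming the single `text` input column used by these sequence-classification pipelines (both the column name and the test sentence are assumptions):
+
+```python
+# Build a one-row frame just to smoke-test the pipeline.
+df = spark.createDataFrame([["a quick test sentence"]]).toDF("text")
+pipeline.transform(df).show(truncate=False)
+```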
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|paraquantizar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/Heber77/paraquantizar + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-paws_x_xlm_r_only_spanish_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-paws_x_xlm_r_only_spanish_pipeline_en.md new file mode 100644 index 00000000000000..52af452662e97b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-paws_x_xlm_r_only_spanish_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English paws_x_xlm_r_only_spanish_pipeline pipeline XlmRoBertaForSequenceClassification from semindan +author: John Snow Labs +name: paws_x_xlm_r_only_spanish_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`paws_x_xlm_r_only_spanish_pipeline` is a English model originally trained by semindan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/paws_x_xlm_r_only_spanish_pipeline_en_5.5.0_3.0_1727099457879.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/paws_x_xlm_r_only_spanish_pipeline_en_5.5.0_3.0_1727099457879.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("paws_x_xlm_r_only_spanish_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("paws_x_xlm_r_only_spanish_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
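+
+Here too `df` must be supplied by the caller. A minimal sketch under the assumption that the pipeline reads a single `text` column; for a PAWS-X style paraphrase model the two sentences of a pair would presumably be packed into that one field, but that detail is not documented in the card:
+
+```python
+# Hypothetical sentence-pair input packed into one "text" column.
+df = spark.createDataFrame(
+    [["La casa es grande. La vivienda es amplia."]]
+).toDF("text")
+annotations = pipeline.transform(df)
+```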
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|paws_x_xlm_r_only_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|803.4 MB| + +## References + +https://huggingface.co/semindan/paws_x_xlm_r_only_es + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-phowhisper_small_vi.md b/docs/_posts/ahmedlone127/2024-09-23-phowhisper_small_vi.md new file mode 100644 index 00000000000000..966e2f72c6b635 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-phowhisper_small_vi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Vietnamese phowhisper_small WhisperForCTC from huuquyet +author: John Snow Labs +name: phowhisper_small +date: 2024-09-23 +tags: [vi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: vi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phowhisper_small` is a Vietnamese model originally trained by huuquyet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phowhisper_small_vi_5.5.0_3.0_1727117014948.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phowhisper_small_vi_5.5.0_3.0_1727117014948.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("phowhisper_small","vi") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("phowhisper_small", "vi")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
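+
+Both snippets above reference a `data` frame that the card never constructs. A sketch of one way to build it, assuming `raw_floats` is a Python list of mono audio samples (floats) loaded beforehand with an external library; the variable name and loading step are assumptions, not part of the original card:
+
+```python
+# raw_floats: list of float samples for one utterance, loaded elsewhere (assumed).
+data = spark.createDataFrame([[raw_floats]]).toDF("audio_content")
+
+# With `data` defined, the fit/transform lines above run as written.
+pipelineDF = pipeline.fit(data).transform(data)
+pipelineDF.select("text.result").show(truncate=False)
+```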
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phowhisper_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|vi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/huuquyet/PhoWhisper-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-predict_perception_xlmr_cause_human_en.md b/docs/_posts/ahmedlone127/2024-09-23-predict_perception_xlmr_cause_human_en.md new file mode 100644 index 00000000000000..1596a5ac143e30 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-predict_perception_xlmr_cause_human_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English predict_perception_xlmr_cause_human XlmRoBertaForSequenceClassification from responsibility-framing +author: John Snow Labs +name: predict_perception_xlmr_cause_human +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`predict_perception_xlmr_cause_human` is a English model originally trained by responsibility-framing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/predict_perception_xlmr_cause_human_en_5.5.0_3.0_1727126499130.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/predict_perception_xlmr_cause_human_en_5.5.0_3.0_1727126499130.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("predict_perception_xlmr_cause_human","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("predict_perception_xlmr_cause_human", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|predict_perception_xlmr_cause_human| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|837.6 MB| + +## References + +https://huggingface.co/responsibility-framing/predict-perception-xlmr-cause-human \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-predict_perception_xlmr_cause_human_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-predict_perception_xlmr_cause_human_pipeline_en.md new file mode 100644 index 00000000000000..264b921e61fe2c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-predict_perception_xlmr_cause_human_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English predict_perception_xlmr_cause_human_pipeline pipeline XlmRoBertaForSequenceClassification from responsibility-framing +author: John Snow Labs +name: predict_perception_xlmr_cause_human_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`predict_perception_xlmr_cause_human_pipeline` is a English model originally trained by responsibility-framing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/predict_perception_xlmr_cause_human_pipeline_en_5.5.0_3.0_1727126563385.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/predict_perception_xlmr_cause_human_pipeline_en_5.5.0_3.0_1727126563385.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("predict_perception_xlmr_cause_human_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("predict_perception_xlmr_cause_human_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
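+
+The input `df` is again assumed to exist. A short sketch (the sentence and column name are illustrative only); selecting `class.result` assumes the classifier stage keeps the output column shown in the standalone model card above:
+
+```python
+df = spark.createDataFrame([["The storm damaged several houses."]]).toDF("text")
+pipeline.transform(df).select("class.result").show(truncate=False)
+```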
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|predict_perception_xlmr_cause_human_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|837.6 MB| + +## References + +https://huggingface.co/responsibility-framing/predict-perception-xlmr-cause-human + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-qa_persian_albert_persian_farsi_zwnj_base_v2_en.md b/docs/_posts/ahmedlone127/2024-09-23-qa_persian_albert_persian_farsi_zwnj_base_v2_en.md new file mode 100644 index 00000000000000..1ab5f6e7b695a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-qa_persian_albert_persian_farsi_zwnj_base_v2_en.md @@ -0,0 +1,88 @@ +--- +layout: model +title: English qa_persian_albert_persian_farsi_zwnj_base_v2 AlbertForQuestionAnswering from makhataei +author: John Snow Labs +name: qa_persian_albert_persian_farsi_zwnj_base_v2 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, albert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_persian_albert_persian_farsi_zwnj_base_v2` is a English model originally trained by makhataei. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_persian_albert_persian_farsi_zwnj_base_v2_en_5.5.0_3.0_1727128416736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_persian_albert_persian_farsi_zwnj_base_v2_en_5.5.0_3.0_1727128416736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = AlbertForQuestionAnswering.pretrained("qa_persian_albert_persian_farsi_zwnj_base_v2","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = AlbertForQuestionAnswering.pretrained("qa_persian_albert_persian_farsi_zwnj_base_v2", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDS.toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_persian_albert_persian_farsi_zwnj_base_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|41.8 MB| + +## References + +References + +https://huggingface.co/makhataei/qa-persian-albert-fa-zwnj-base-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-qa_persian_albert_persian_farsi_zwnj_base_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-qa_persian_albert_persian_farsi_zwnj_base_v2_pipeline_en.md new file mode 100644 index 00000000000000..9035bb7eb9dee5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-qa_persian_albert_persian_farsi_zwnj_base_v2_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English qa_persian_albert_persian_farsi_zwnj_base_v2_pipeline pipeline AlbertForQuestionAnswering from makhataei +author: John Snow Labs +name: qa_persian_albert_persian_farsi_zwnj_base_v2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_persian_albert_persian_farsi_zwnj_base_v2_pipeline` is a English model originally trained by makhataei. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_persian_albert_persian_farsi_zwnj_base_v2_pipeline_en_5.5.0_3.0_1727128419239.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_persian_albert_persian_farsi_zwnj_base_v2_pipeline_en_5.5.0_3.0_1727128419239.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +pipeline = PretrainedPipeline("qa_persian_albert_persian_farsi_zwnj_base_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) +``` +```scala +val pipeline = new PretrainedPipeline("qa_persian_albert_persian_farsi_zwnj_base_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) +``` +
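+
+For this question-answering pipeline the input frame presumably needs the `question` and `context` columns consumed by the MultiDocumentAssembler stage shown in the standalone card above; the example pair below is made up, and the `answer` output column is assumed from that card:
+
+```python
+df = spark.createDataFrame(
+    [["What framework do I use?", "I use spark-nlp."]]
+).toDF("question", "context")
+annotations = pipeline.transform(df)
+annotations.select("answer.result").show(truncate=False)
+```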
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_persian_albert_persian_farsi_zwnj_base_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|41.8 MB| + +## References + +References + +https://huggingface.co/makhataei/qa-persian-albert-fa-zwnj-base-v2 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-results_yildizt_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-results_yildizt_pipeline_en.md new file mode 100644 index 00000000000000..2daccee7238df3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-results_yildizt_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English results_yildizt_pipeline pipeline DistilBertForSequenceClassification from yildizt +author: John Snow Labs +name: results_yildizt_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`results_yildizt_pipeline` is a English model originally trained by yildizt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/results_yildizt_pipeline_en_5.5.0_3.0_1727087309816.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/results_yildizt_pipeline_en_5.5.0_3.0_1727087309816.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("results_yildizt_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("results_yildizt_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
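+
+A minimal, assumed construction of `df` for this text-classification pipeline (a single `text` column with an invented sample sentence):
+
+```python
+df = spark.createDataFrame([["This movie was surprisingly good."]]).toDF("text")
+pipeline.transform(df).show(truncate=False)
+```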
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|results_yildizt_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yildizt/results + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-robbert_cosmetic_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-23-robbert_cosmetic_finetuned_en.md new file mode 100644 index 00000000000000..9c249942247cc0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-robbert_cosmetic_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robbert_cosmetic_finetuned RoBertaEmbeddings from ymelka +author: John Snow Labs +name: robbert_cosmetic_finetuned +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_cosmetic_finetuned` is a English model originally trained by ymelka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_cosmetic_finetuned_en_5.5.0_3.0_1727121936523.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_cosmetic_finetuned_en_5.5.0_3.0_1727121936523.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("robbert_cosmetic_finetuned","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("robbert_cosmetic_finetuned","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_cosmetic_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|443.8 MB| + +## References + +https://huggingface.co/ymelka/robbert-cosmetic-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-robbert_cosmetic_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-robbert_cosmetic_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..5715b6f78cff1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-robbert_cosmetic_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robbert_cosmetic_finetuned_pipeline pipeline RoBertaEmbeddings from ymelka +author: John Snow Labs +name: robbert_cosmetic_finetuned_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_cosmetic_finetuned_pipeline` is a English model originally trained by ymelka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_cosmetic_finetuned_pipeline_en_5.5.0_3.0_1727121957099.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_cosmetic_finetuned_pipeline_en_5.5.0_3.0_1727121957099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robbert_cosmetic_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robbert_cosmetic_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
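+
+Since this pipeline produces token embeddings rather than labels, a quick way to see what it returns, again assuming an input `text` column (the sample text is illustrative):
+
+```python
+df = spark.createDataFrame([["I love Spark NLP"]]).toDF("text")
+annotations = pipeline.transform(df)
+annotations.printSchema()  # the embedding vectors sit inside one of the annotation columns
+```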
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_cosmetic_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|443.8 MB| + +## References + +https://huggingface.co/ymelka/robbert-cosmetic-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_airlines_news_multi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_airlines_news_multi_pipeline_en.md new file mode 100644 index 00000000000000..6cf3f5cd8f7acb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_airlines_news_multi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_airlines_news_multi_pipeline pipeline RoBertaForSequenceClassification from dahe827 +author: John Snow Labs +name: roberta_base_airlines_news_multi_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_airlines_news_multi_pipeline` is a English model originally trained by dahe827. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_airlines_news_multi_pipeline_en_5.5.0_3.0_1727085408158.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_airlines_news_multi_pipeline_en_5.5.0_3.0_1727085408158.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_airlines_news_multi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_airlines_news_multi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
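+
+A hedged sketch of the missing `df`, with an airline-news style sentence invented purely for illustration and the usual assumed `text` column:
+
+```python
+df = spark.createDataFrame(
+    [["The carrier announced new routes to three European cities."]]
+).toDF("text")
+annotations = pipeline.transform(df)
+```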
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_airlines_news_multi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|434.2 MB| + +## References + +https://huggingface.co/dahe827/roberta-base-airlines-news-multi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_en.md new file mode 100644 index 00000000000000..3d0617f6c3a7c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad RoBertaForSequenceClassification from vg055 +author: John Snow Labs +name: roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad` is a English model originally trained by vg055. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_en_5.5.0_3.0_1727135401091.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_en_5.5.0_3.0_1727135401091.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|463.6 MB| + +## References + +https://huggingface.co/vg055/roberta-base-bne-finetuned-analisis-sentimiento-textos-turisticos-mx-polaridad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_pipeline_en.md new file mode 100644 index 00000000000000..1f93189266948f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_pipeline pipeline RoBertaForSequenceClassification from vg055 +author: John Snow Labs +name: roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_pipeline` is a English model originally trained by vg055. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_pipeline_en_5.5.0_3.0_1727135425452.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_pipeline_en_5.5.0_3.0_1727135425452.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
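+
+Judging from the model name, the underlying classifier appears to have been fine-tuned on Spanish tourism reviews, so an invented Spanish sentence is used in this assumed example; the `text` column name and the `class` output column follow the other cards:
+
+```python
+df = spark.createDataFrame([["El hotel era precioso y el personal muy amable."]]).toDF("text")
+pipeline.transform(df).select("class.result").show(truncate=False)
+```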
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_analisis_sentimiento_textos_turisticos_mx_polaridad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.6 MB| + +## References + +https://huggingface.co/vg055/roberta-base-bne-finetuned-analisis-sentimiento-textos-turisticos-mx-polaridad + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_45_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_45_en.md new file mode 100644 index 00000000000000..990b5508e437e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_45_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_45 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_45 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_45` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_45_en_5.5.0_3.0_1727056789478.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_45_en_5.5.0_3.0_1727056789478.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_45","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_45","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_45| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_45 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_46_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_46_en.md new file mode 100644 index 00000000000000..f1ab1ca8443aa6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_46_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_46 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_46 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_46` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_46_en_5.5.0_3.0_1727122195755.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_46_en_5.5.0_3.0_1727122195755.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_46","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_46","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_46| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_46 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_50_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_50_en.md new file mode 100644 index 00000000000000..cc661ff73d9762 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_50_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_50 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_50 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_50` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_50_en_5.5.0_3.0_1727121907390.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_50_en_5.5.0_3.0_1727121907390.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_50","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_50","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_50| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_50 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_50_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_50_pipeline_en.md new file mode 100644 index 00000000000000..b32b562fd2fa48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_50_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_50_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_50_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_50_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_50_pipeline_en_5.5.0_3.0_1727121989570.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_50_pipeline_en_5.5.0_3.0_1727121989570.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_50_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_50_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
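+
+The snippet above assumes a DataFrame `df` that already has a `text` column. A small self-contained sketch of both entry points (illustrative only; it assumes an active `spark` session with Spark NLP started):
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("roberta_base_epoch_50_pipeline", lang="en")
+
+# DataFrame path: the pipeline expects a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+# String path: annotate() returns a plain dict of result lists
+result = pipeline.annotate("I love spark-nlp")
+```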
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_50_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_50 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_he111111_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_he111111_en.md new file mode 100644 index 00000000000000..9120b794fe0e86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_he111111_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_he111111 RoBertaForSequenceClassification from he111111 +author: John Snow Labs +name: roberta_base_he111111 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_he111111` is a English model originally trained by he111111. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_he111111_en_5.5.0_3.0_1727135155055.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_he111111_en_5.5.0_3.0_1727135155055.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_he111111","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_he111111", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
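+
+To read the predictions back out (a short sketch assuming the Python example above), the `class` column carries one annotation per row whose `result` field is the predicted label:
+
+```python
+# "class.result" is the array of predicted labels produced by the classifier stage
+pipelineDF.select("text", "class.result").show(truncate=False)
+```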
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_he111111| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|451.4 MB| + +## References + +https://huggingface.co/he111111/Roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_he111111_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_he111111_pipeline_en.md new file mode 100644 index 00000000000000..2d21f206d437a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_he111111_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_he111111_pipeline pipeline RoBertaForSequenceClassification from he111111 +author: John Snow Labs +name: roberta_base_he111111_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_he111111_pipeline` is a English model originally trained by he111111. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_he111111_pipeline_en_5.5.0_3.0_1727135183943.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_he111111_pipeline_en_5.5.0_3.0_1727135183943.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_he111111_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_he111111_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_he111111_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|451.4 MB| + +## References + +https://huggingface.co/he111111/Roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_sentiment_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_sentiment_en.md new file mode 100644 index 00000000000000..7c59688a7e235c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_sentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_sentiment RoBertaForSequenceClassification from 51la5 +author: John Snow Labs +name: roberta_base_sentiment +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_sentiment` is a English model originally trained by 51la5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_sentiment_en_5.5.0_3.0_1727135542779.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_sentiment_en_5.5.0_3.0_1727135542779.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_sentiment","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_sentiment", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|433.2 MB| + +## References + +https://huggingface.co/51la5/roberta-base-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..2e89720f0a4403 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_sentiment_pipeline pipeline RoBertaForSequenceClassification from 51la5 +author: John Snow Labs +name: roberta_base_sentiment_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_sentiment_pipeline` is a English model originally trained by 51la5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_sentiment_pipeline_en_5.5.0_3.0_1727135582929.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_sentiment_pipeline_en_5.5.0_3.0_1727135582929.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_sentiment_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_sentiment_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|433.2 MB| + +## References + +https://huggingface.co/51la5/roberta-base-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_snli_mtreviso_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_snli_mtreviso_en.md new file mode 100644 index 00000000000000..e9f7c02f67bf39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_snli_mtreviso_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_snli_mtreviso RoBertaForSequenceClassification from mtreviso +author: John Snow Labs +name: roberta_base_snli_mtreviso +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_snli_mtreviso` is a English model originally trained by mtreviso. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_snli_mtreviso_en_5.5.0_3.0_1727134839277.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_snli_mtreviso_en_5.5.0_3.0_1727134839277.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_snli_mtreviso","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_snli_mtreviso", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_snli_mtreviso| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|447.8 MB| + +## References + +https://huggingface.co/mtreviso/roberta-base-snli \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_conll_epoch_4_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_conll_epoch_4_en.md new file mode 100644 index 00000000000000..bb1bea33e0397e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_conll_epoch_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_conll_epoch_4 RoBertaForTokenClassification from ICT2214Team7 +author: John Snow Labs +name: roberta_conll_epoch_4 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_conll_epoch_4` is a English model originally trained by ICT2214Team7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_4_en_5.5.0_3.0_1727081465346.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_4_en_5.5.0_3.0_1727081465346.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_conll_epoch_4","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_conll_epoch_4", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
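+
+To line tokens up with their predicted tags (a short sketch assuming the Python example above), note that the `token` and `ner` annotation arrays are aligned one-to-one:
+
+```python
+# token.result and ner.result are positionally aligned, so they can be read side by side
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```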
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_conll_epoch_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/ICT2214Team7/RoBERTa_conll_epoch_4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_english_financialnews_tuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_english_financialnews_tuned_pipeline_en.md new file mode 100644 index 00000000000000..354f2553c4afeb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_english_financialnews_tuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_english_financialnews_tuned_pipeline pipeline RoBertaEmbeddings from CCCCC5 +author: John Snow Labs +name: roberta_english_financialnews_tuned_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_english_financialnews_tuned_pipeline` is a English model originally trained by CCCCC5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_english_financialnews_tuned_pipeline_en_5.5.0_3.0_1727066359701.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_english_financialnews_tuned_pipeline_en_5.5.0_3.0_1727066359701.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_english_financialnews_tuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_english_financialnews_tuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_english_financialnews_tuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|426.0 MB| + +## References + +https://huggingface.co/CCCCC5/RoBERTa_English_FinancialNews_tuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_inspirational_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_inspirational_en.md new file mode 100644 index 00000000000000..416907077dbd07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_inspirational_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_finetuned_inspirational XlmRoBertaForSequenceClassification from reecursion +author: John Snow Labs +name: roberta_finetuned_inspirational +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_inspirational` is a English model originally trained by reecursion. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_inspirational_en_5.5.0_3.0_1727127152928.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_inspirational_en_5.5.0_3.0_1727127152928.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("roberta_finetuned_inspirational","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("roberta_finetuned_inspirational", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_inspirational| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|813.0 MB| + +## References + +https://huggingface.co/reecursion/roberta-finetuned-inspirational \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_inspirational_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_inspirational_pipeline_en.md new file mode 100644 index 00000000000000..ce8804b49b39bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_inspirational_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_finetuned_inspirational_pipeline pipeline XlmRoBertaForSequenceClassification from reecursion +author: John Snow Labs +name: roberta_finetuned_inspirational_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_inspirational_pipeline` is a English model originally trained by reecursion. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_inspirational_pipeline_en_5.5.0_3.0_1727127269042.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_inspirational_pipeline_en_5.5.0_3.0_1727127269042.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_finetuned_inspirational_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_finetuned_inspirational_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_inspirational_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|813.1 MB| + +## References + +https://huggingface.co/reecursion/roberta-finetuned-inspirational + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_fever_sagnikrayc_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_fever_sagnikrayc_en.md new file mode 100644 index 00000000000000..6fdd455353f7c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_fever_sagnikrayc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_fever_sagnikrayc RoBertaForSequenceClassification from sagnikrayc +author: John Snow Labs +name: roberta_large_fever_sagnikrayc +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_fever_sagnikrayc` is a English model originally trained by sagnikrayc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_fever_sagnikrayc_en_5.5.0_3.0_1727135453464.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_fever_sagnikrayc_en_5.5.0_3.0_1727135453464.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_fever_sagnikrayc","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_fever_sagnikrayc", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_fever_sagnikrayc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/sagnikrayc/roberta-large-fever \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_fever_sagnikrayc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_fever_sagnikrayc_pipeline_en.md new file mode 100644 index 00000000000000..4bd6161f3fbb26 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_fever_sagnikrayc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_fever_sagnikrayc_pipeline pipeline RoBertaForSequenceClassification from sagnikrayc +author: John Snow Labs +name: roberta_large_fever_sagnikrayc_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_fever_sagnikrayc_pipeline` is a English model originally trained by sagnikrayc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_fever_sagnikrayc_pipeline_en_5.5.0_3.0_1727135524693.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_fever_sagnikrayc_pipeline_en_5.5.0_3.0_1727135524693.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_fever_sagnikrayc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_fever_sagnikrayc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_fever_sagnikrayc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/sagnikrayc/roberta-large-fever + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_movie_genre_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_movie_genre_en.md new file mode 100644 index 00000000000000..7c7418fe2a21fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_movie_genre_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_movie_genre RoBertaEmbeddings from Shiro +author: John Snow Labs +name: roberta_large_movie_genre +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_movie_genre` is a English model originally trained by Shiro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_movie_genre_en_5.5.0_3.0_1727121940506.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_movie_genre_en_5.5.0_3.0_1727121940506.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_large_movie_genre","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_large_movie_genre","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_movie_genre| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Shiro/roberta-large-movie-genre \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_movie_genre_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_movie_genre_pipeline_en.md new file mode 100644 index 00000000000000..1b54e56ca21add --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_movie_genre_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_movie_genre_pipeline pipeline RoBertaEmbeddings from Shiro +author: John Snow Labs +name: roberta_large_movie_genre_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_movie_genre_pipeline` is a English model originally trained by Shiro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_movie_genre_pipeline_en_5.5.0_3.0_1727122002400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_movie_genre_pipeline_en_5.5.0_3.0_1727122002400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_movie_genre_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_movie_genre_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_movie_genre_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Shiro/roberta-large-movie-genre + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_unlabeled_gab_semeval2023_task10_45000sample_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_unlabeled_gab_semeval2023_task10_45000sample_en.md new file mode 100644 index 00000000000000..e7092ac4948f0f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_unlabeled_gab_semeval2023_task10_45000sample_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_unlabeled_gab_semeval2023_task10_45000sample RoBertaEmbeddings from HPL +author: John Snow Labs +name: roberta_large_unlabeled_gab_semeval2023_task10_45000sample +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_unlabeled_gab_semeval2023_task10_45000sample` is a English model originally trained by HPL. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_unlabeled_gab_semeval2023_task10_45000sample_en_5.5.0_3.0_1727121645108.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_unlabeled_gab_semeval2023_task10_45000sample_en_5.5.0_3.0_1727121645108.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_large_unlabeled_gab_semeval2023_task10_45000sample","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_large_unlabeled_gab_semeval2023_task10_45000sample","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_unlabeled_gab_semeval2023_task10_45000sample| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/HPL/roberta-large-unlabeled-gab-semeval2023-task10-45000sample \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ruroberta_large_neg_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-ruroberta_large_neg_pipeline_en.md new file mode 100644 index 00000000000000..2b5a29e6f331ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ruroberta_large_neg_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ruroberta_large_neg_pipeline pipeline RoBertaForTokenClassification from DimasikKurd +author: John Snow Labs +name: ruroberta_large_neg_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ruroberta_large_neg_pipeline` is a English model originally trained by DimasikKurd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ruroberta_large_neg_pipeline_en_5.5.0_3.0_1727072720324.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ruroberta_large_neg_pipeline_en_5.5.0_3.0_1727072720324.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ruroberta_large_neg_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ruroberta_large_neg_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
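+
+As with the other pretrained pipelines on this page, `df` is assumed to already exist with a `text` column. A hedged sketch showing the lighter string-based entry point as well:
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("ruroberta_large_neg_pipeline", lang="en")
+
+# DataFrame path
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+# fullAnnotate keeps begin/end offsets, which is convenient for NER output
+results = pipeline.fullAnnotate("I love spark-nlp")
+```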
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ruroberta_large_neg_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/DimasikKurd/ruRoberta-large_neg + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-s_ohm_en.md b/docs/_posts/ahmedlone127/2024-09-23-s_ohm_en.md new file mode 100644 index 00000000000000..acb6929db97a92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-s_ohm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English s_ohm RoBertaEmbeddings from anandohm +author: John Snow Labs +name: s_ohm +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`s_ohm` is a English model originally trained by anandohm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/s_ohm_en_5.5.0_3.0_1727091880416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/s_ohm_en_5.5.0_3.0_1727091880416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("s_ohm","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("s_ohm","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|s_ohm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|310.1 MB| + +## References + +https://huggingface.co/anandohm/S_ohm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_en.md b/docs/_posts/ahmedlone127/2024-09-23-sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_en.md new file mode 100644 index 00000000000000..8894a5ed060b73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15 BertForQuestionAnswering from phd411r1 +author: John Snow Labs +name: sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15` is a English model originally trained by phd411r1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_en_5.5.0_3.0_1727128026158.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_en_5.5.0_3.0_1727128026158.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols("question", "context")
+    .setOutputCols("document_question", "document_context")
+
+val spanClassifier = BertForQuestionAnswering.pretrained("sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
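+
+To read the extracted answers (a short sketch assuming the Python example above):
+
+```python
+# "answer.result" holds the predicted answer span for each question/context row
+pipelineDF.select("answer.result").show(truncate=False)
+```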
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|606.5 MB| + +## References + +https://huggingface.co/phd411r1/SajjadAyoubi_bert-base-fa-qa_finetune_on_am_15 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_pipeline_en.md new file mode 100644 index 00000000000000..011ce86c77e54f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_pipeline pipeline BertForQuestionAnswering from phd411r1 +author: John Snow Labs +name: sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_pipeline` is a English model originally trained by phd411r1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_pipeline_en_5.5.0_3.0_1727128061168.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_pipeline_en_5.5.0_3.0_1727128061168.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sajjadayoubi_bert_base_persian_farsi_qa_finetune_on_amharic_15_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|606.5 MB| + +## References + +https://huggingface.co/phd411r1/SajjadAyoubi_bert-base-fa-qa_finetune_on_am_15 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_arabertmo_base_v8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_arabertmo_base_v8_pipeline_en.md new file mode 100644 index 00000000000000..0b7b2f59b3332d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_arabertmo_base_v8_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_arabertmo_base_v8_pipeline pipeline BertSentenceEmbeddings from Ebtihal +author: John Snow Labs +name: sent_arabertmo_base_v8_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_arabertmo_base_v8_pipeline` is a English model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v8_pipeline_en_5.5.0_3.0_1727091070845.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v8_pipeline_en_5.5.0_3.0_1727091070845.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_arabertmo_base_v8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_arabertmo_base_v8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_arabertmo_base_v8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.4 MB| + +## References + +https://huggingface.co/Ebtihal/AraBertMo_base_V8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_portuguese_lenerbr_alynneoya_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_portuguese_lenerbr_alynneoya_pipeline_en.md new file mode 100644 index 00000000000000..b2bdac35728ca1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_portuguese_lenerbr_alynneoya_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_cased_portuguese_lenerbr_alynneoya_pipeline pipeline BertSentenceEmbeddings from alynneoya +author: John Snow Labs +name: sent_bert_base_cased_portuguese_lenerbr_alynneoya_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_cased_portuguese_lenerbr_alynneoya_pipeline` is a English model originally trained by alynneoya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_portuguese_lenerbr_alynneoya_pipeline_en_5.5.0_3.0_1727113905335.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_portuguese_lenerbr_alynneoya_pipeline_en_5.5.0_3.0_1727113905335.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pretrained pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_cased_portuguese_lenerbr_alynneoya_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// The pretrained pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_cased_portuguese_lenerbr_alynneoya_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_cased_portuguese_lenerbr_alynneoya_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.5 MB| + +## References + +https://huggingface.co/alynneoya/bert-base-cased-pt-lenerbr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_dutch_cased_finetuned_manx_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_dutch_cased_finetuned_manx_en.md new file mode 100644 index 00000000000000..e28429767e4a52 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_dutch_cased_finetuned_manx_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_dutch_cased_finetuned_manx BertSentenceEmbeddings from Pyjay +author: John Snow Labs +name: sent_bert_base_dutch_cased_finetuned_manx +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_dutch_cased_finetuned_manx` is a English model originally trained by Pyjay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_dutch_cased_finetuned_manx_en_5.5.0_3.0_1727109962610.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_dutch_cased_finetuned_manx_en_5.5.0_3.0_1727109962610.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_dutch_cased_finetuned_manx","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_dutch_cased_finetuned_manx","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
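The snippet above assumes that a Spark session with Spark NLP is already running and that the annotator classes are imported. A minimal bootstrap sketch (standard Spark NLP module names; exact versions are not pinned by this card) could look like:

```python
# Sketch of the setup assumed by the example above; install Spark NLP first,
# e.g. `pip install spark-nlp pyspark`.
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

# Starts (or reuses) a Spark session with the Spark NLP jar on the classpath.
spark = sparknlp.start()
print(sparknlp.version(), spark.version)
```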
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_dutch_cased_finetuned_manx| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/Pyjay/bert-base-dutch-cased-finetuned-gv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_german_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_german_cased_pipeline_en.md new file mode 100644 index 00000000000000..175a53b8dbbc4e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_german_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_german_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_german_cased_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_german_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_german_cased_pipeline_en_5.5.0_3.0_1727091008286.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_german_cased_pipeline_en_5.5.0_3.0_1727091008286.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pretrained pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_english_german_cased_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// The pretrained pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_english_german_cased_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_german_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|422.4 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-de-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_japanese_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_japanese_cased_pipeline_en.md new file mode 100644 index 00000000000000..a2322514fb77fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_japanese_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_japanese_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_japanese_cased_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_japanese_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_japanese_cased_pipeline_en_5.5.0_3.0_1727104998249.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_japanese_cased_pipeline_en_5.5.0_3.0_1727104998249.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pretrained pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_english_japanese_cased_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// The pretrained pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_english_japanese_cased_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_japanese_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|416.9 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-ja-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_portuguese_cased_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_portuguese_cased_en.md new file mode 100644 index 00000000000000..6b43fb50c702d4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_portuguese_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_portuguese_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_portuguese_cased +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_portuguese_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_portuguese_cased_en_5.5.0_3.0_1727105325121.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_portuguese_cased_en_5.5.0_3.0_1727105325121.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_portuguese_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_portuguese_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
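After `pipelineDF` is computed as shown, the sentence vectors sit inside the `embeddings` annotation column. A hedged sketch of how they are typically extracted (column names follow the snippet above; BERT-base checkpoints such as this one normally produce 768-dimensional vectors):

```python
# Each row of "embeddings" is an array of annotations; the float vector is
# stored in the nested "embeddings" field of every annotation.
vectors = (
    pipelineDF
    .selectExpr("explode(embeddings) as annotation")
    .selectExpr("annotation.result as sentence", "annotation.embeddings as vector")
)
vectors.show(truncate=60)
```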
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_portuguese_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|419.2 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-pt-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_turkish_cased_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_turkish_cased_en.md new file mode 100644 index 00000000000000..32aeb3073bca39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_turkish_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_turkish_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_turkish_cased +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_turkish_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_turkish_cased_en_5.5.0_3.0_1727109881858.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_turkish_cased_en_5.5.0_3.0_1727109881858.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_turkish_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_turkish_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
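For quick, single-machine experiments, the fitted `pipelineModel` from the example can also be wrapped in a `LightPipeline`. The sketch below is illustrative and assumes the variables defined in the Python snippet above:

```python
from sparknlp.base import LightPipeline

# Wraps the fitted PipelineModel for fast, local (non-distributed) inference.
light = LightPipeline(pipelineModel)

# fullAnnotate keeps the embedding vectors in the returned Annotation objects.
result = light.fullAnnotate("I love spark-nlp")[0]
for annotation in result["embeddings"]:
    print(len(annotation.embeddings), annotation.result[:60])
```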
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_turkish_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|410.7 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-tr-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_turkish_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_turkish_cased_pipeline_en.md new file mode 100644 index 00000000000000..626e6fdd6efe40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_turkish_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_turkish_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_turkish_cased_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_turkish_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_turkish_cased_pipeline_en_5.5.0_3.0_1727109901413.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_turkish_cased_pipeline_en_5.5.0_3.0_1727109901413.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pretrained pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_english_turkish_cased_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// The pretrained pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_english_turkish_cased_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_turkish_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.2 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-tr-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v2_finetuned_polylex_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v2_finetuned_polylex_en.md new file mode 100644 index 00000000000000..ef7acea9aa1a52 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v2_finetuned_polylex_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_greek_uncased_v2_finetuned_polylex BertSentenceEmbeddings from snousias +author: John Snow Labs +name: sent_bert_base_greek_uncased_v2_finetuned_polylex +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_greek_uncased_v2_finetuned_polylex` is a English model originally trained by snousias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v2_finetuned_polylex_en_5.5.0_3.0_1727113449962.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v2_finetuned_polylex_en_5.5.0_3.0_1727113449962.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_greek_uncased_v2_finetuned_polylex","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_greek_uncased_v2_finetuned_polylex","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
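If the vectors are meant to feed a downstream Spark ML stage, an `EmbeddingsFinisher` is commonly appended after the snippet above. The sketch keeps the same column names; `finished_embeddings` is an arbitrary output column name chosen for illustration:

```python
from sparknlp.base import EmbeddingsFinisher

# Converts the annotation structs in "embeddings" into plain Spark ML vectors
# that downstream estimators (clustering, classification, ...) can consume.
finisher = (
    EmbeddingsFinisher()
    .setInputCols(["embeddings"])
    .setOutputCols(["finished_embeddings"])
    .setOutputAsVector(True)
)

finisher.transform(pipelineDF).select("finished_embeddings").show(truncate=60)
```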
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_greek_uncased_v2_finetuned_polylex| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|421.1 MB| + +## References + +https://huggingface.co/snousias/bert-base-greek-uncased-v2-finetuned-polylex \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v2_finetuned_polylex_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v2_finetuned_polylex_pipeline_en.md new file mode 100644 index 00000000000000..6047432cb57906 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v2_finetuned_polylex_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_greek_uncased_v2_finetuned_polylex_pipeline pipeline BertSentenceEmbeddings from snousias +author: John Snow Labs +name: sent_bert_base_greek_uncased_v2_finetuned_polylex_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_greek_uncased_v2_finetuned_polylex_pipeline` is a English model originally trained by snousias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v2_finetuned_polylex_pipeline_en_5.5.0_3.0_1727113469909.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v2_finetuned_polylex_pipeline_en_5.5.0_3.0_1727113469909.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pretrained pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_greek_uncased_v2_finetuned_polylex_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// The pretrained pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_greek_uncased_v2_finetuned_polylex_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_greek_uncased_v2_finetuned_polylex_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|421.7 MB| + +## References + +https://huggingface.co/snousias/bert-base-greek-uncased-v2-finetuned-polylex + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_en.md new file mode 100644 index 00000000000000..aafb6bc4e960cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy BertSentenceEmbeddings from polylexmg +author: John Snow Labs +name: sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy` is a English model originally trained by polylexmg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_en_5.5.0_3.0_1727110158116.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_en_5.5.0_3.0_1727110158116.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
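As in the other examples on this page, the code above relies on an existing Spark NLP session and the relevant imports. One possible bootstrap, shown only as an illustration:

```python
# Illustrative setup only; adjust to your own cluster and package versions.
import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings

spark = sparknlp.start()  # starts a local session with Spark NLP attached
```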
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|421.1 MB| + +## References + +https://huggingface.co/polylexmg/bert-base-greek-uncased-v6-finetuned-polylex-mg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_pipeline_en.md new file mode 100644 index 00000000000000..a4d935d0d29889 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_pipeline pipeline BertSentenceEmbeddings from polylexmg +author: John Snow Labs +name: sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_pipeline` is a English model originally trained by polylexmg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_pipeline_en_5.5.0_3.0_1727110178153.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_pipeline_en_5.5.0_3.0_1727110178153.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pretrained pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// The pretrained pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_greek_uncased_v6_finetuned_polylex_malagasy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|421.7 MB| + +## References + +https://huggingface.co/polylexmg/bert-base-greek-uncased-v6-finetuned-polylex-mg + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_stackoverflow_comments_1m_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_stackoverflow_comments_1m_en.md new file mode 100644 index 00000000000000..1b0e5508739676 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_stackoverflow_comments_1m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_stackoverflow_comments_1m BertSentenceEmbeddings from giganticode +author: John Snow Labs +name: sent_bert_base_stackoverflow_comments_1m +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_stackoverflow_comments_1m` is a English model originally trained by giganticode. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_stackoverflow_comments_1m_en_5.5.0_3.0_1727122964051.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_stackoverflow_comments_1m_en_5.5.0_3.0_1727122964051.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_stackoverflow_comments_1m","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_stackoverflow_comments_1m","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
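To inspect the output of the pipeline above, the nested vectors can be flattened with standard DataFrame expressions; the sketch below reuses the `pipelineDF` variable from the example:

```python
# Flatten the annotation structs produced above into (sentence, vector) rows.
from pyspark.sql.functions import explode, col

exploded = pipelineDF.select(explode(col("embeddings")).alias("annotation"))
exploded.select(
    col("annotation.result").alias("sentence"),
    col("annotation.embeddings").alias("vector"),
).show(truncate=60)
```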
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_stackoverflow_comments_1m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.0 MB| + +## References + +https://huggingface.co/giganticode/bert-base-StackOverflow-comments_1M \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_turkish_uncased_offensive_mlm_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_turkish_uncased_offensive_mlm_pipeline_tr.md new file mode 100644 index 00000000000000..0f1bd259718628 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_turkish_uncased_offensive_mlm_pipeline_tr.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Turkish sent_bert_base_turkish_uncased_offensive_mlm_pipeline pipeline BertSentenceEmbeddings from Overfit-GM +author: John Snow Labs +name: sent_bert_base_turkish_uncased_offensive_mlm_pipeline +date: 2024-09-23 +tags: [tr, open_source, pipeline, onnx] +task: Embeddings +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_turkish_uncased_offensive_mlm_pipeline` is a Turkish model originally trained by Overfit-GM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_turkish_uncased_offensive_mlm_pipeline_tr_5.5.0_3.0_1727109586947.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_turkish_uncased_offensive_mlm_pipeline_tr_5.5.0_3.0_1727109586947.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pretrained pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_turkish_uncased_offensive_mlm_pipeline", lang = "tr")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// The pretrained pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_turkish_uncased_offensive_mlm_pipeline", lang = "tr")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_turkish_uncased_offensive_mlm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|413.0 MB| + +## References + +https://huggingface.co/Overfit-GM/bert-base-turkish-uncased-offensive-mlm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_turkish_uncased_offensive_mlm_tr.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_turkish_uncased_offensive_mlm_tr.md new file mode 100644 index 00000000000000..57b0a491dcffdc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_turkish_uncased_offensive_mlm_tr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Turkish sent_bert_base_turkish_uncased_offensive_mlm BertSentenceEmbeddings from Overfit-GM +author: John Snow Labs +name: sent_bert_base_turkish_uncased_offensive_mlm +date: 2024-09-23 +tags: [tr, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_turkish_uncased_offensive_mlm` is a Turkish model originally trained by Overfit-GM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_turkish_uncased_offensive_mlm_tr_5.5.0_3.0_1727109566445.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_turkish_uncased_offensive_mlm_tr_5.5.0_3.0_1727109566445.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_turkish_uncased_offensive_mlm","tr") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_turkish_uncased_offensive_mlm","tr") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
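For ad-hoc checks without building a DataFrame, the fitted model above can be run through a `LightPipeline`; the Turkish input sentence below is purely illustrative:

```python
from sparknlp.base import LightPipeline

# Runs the fitted stages locally, without a DataFrame round-trip.
light = LightPipeline(pipelineModel)

# Hypothetical Turkish sentence used only for illustration.
result = light.fullAnnotate("Spark NLP harika bir kütüphane.")[0]
for annotation in result["embeddings"]:
    print(len(annotation.embeddings), annotation.result)
```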
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_turkish_uncased_offensive_mlm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|tr| +|Size:|412.5 MB| + +## References + +https://huggingface.co/Overfit-GM/bert-base-turkish-uncased-offensive-mlm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_1802_r2_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_1802_r2_en.md new file mode 100644 index 00000000000000..35fb09eba1e2af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_1802_r2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_1802_r2 BertSentenceEmbeddings from JamesKim +author: John Snow Labs +name: sent_bert_base_uncased_1802_r2 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_1802_r2` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_r2_en_5.5.0_3.0_1727123126737.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_r2_en_5.5.0_3.0_1727123126737.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_1802_r2","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_1802_r2","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
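To turn the annotation output of the example above into plain Spark ML vectors, an `EmbeddingsFinisher` stage can be added; the sketch reuses `pipelineDF` and picks an arbitrary output column name:

```python
from sparknlp.base import EmbeddingsFinisher

# "finished_embeddings" is an arbitrary column name chosen for this sketch.
finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

finished = finisher.transform(pipelineDF)
finished.select("finished_embeddings").show(truncate=60)
```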
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_1802_r2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/JamesKim/bert-base-uncased_1802_r2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_1802_r2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_1802_r2_pipeline_en.md new file mode 100644 index 00000000000000..8df58e3d45ece0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_1802_r2_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_1802_r2_pipeline pipeline BertSentenceEmbeddings from JamesKim +author: John Snow Labs +name: sent_bert_base_uncased_1802_r2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_1802_r2_pipeline` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_r2_pipeline_en_5.5.0_3.0_1727123146014.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_r2_pipeline_en_5.5.0_3.0_1727123146014.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pretrained pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_uncased_1802_r2_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// The pretrained pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_uncased_1802_r2_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_1802_r2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/JamesKim/bert-base-uncased_1802_r2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_2022_nvidia_test_3_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_2022_nvidia_test_3_en.md new file mode 100644 index 00000000000000..99b8ddc7fde1d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_2022_nvidia_test_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_2022_nvidia_test_3 BertSentenceEmbeddings from philschmid +author: John Snow Labs +name: sent_bert_base_uncased_2022_nvidia_test_3 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_2022_nvidia_test_3` is a English model originally trained by philschmid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_2022_nvidia_test_3_en_5.5.0_3.0_1727113940265.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_2022_nvidia_test_3_en_5.5.0_3.0_1727113940265.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_2022_nvidia_test_3","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_2022_nvidia_test_3","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
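The example presupposes the usual Spark NLP bootstrap; a short sketch of the imports and session start it assumes (standard module names, nothing specific to this particular model):

```python
# Imports and session start assumed by the snippet above.
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, BertSentenceEmbeddings
from pyspark.ml import Pipeline

spark = sparknlp.start()
```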
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_2022_nvidia_test_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|412.1 MB| + +## References + +https://huggingface.co/philschmid/bert-base-uncased-2022-nvidia-test-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_dish_descriptions_128_0_5m_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_dish_descriptions_128_0_5m_en.md new file mode 100644 index 00000000000000..0b8dc1be841312 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_dish_descriptions_128_0_5m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_dish_descriptions_128_0_5m BertSentenceEmbeddings from abhilashawasthi +author: John Snow Labs +name: sent_bert_base_uncased_dish_descriptions_128_0_5m +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_dish_descriptions_128_0_5m` is a English model originally trained by abhilashawasthi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_dish_descriptions_128_0_5m_en_5.5.0_3.0_1727113419882.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_dish_descriptions_128_0_5m_en_5.5.0_3.0_1727113419882.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_dish_descriptions_128_0_5m","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_dish_descriptions_128_0_5m","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
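The embeddings produced above are nested inside annotation structs; one common way to pull them out, assuming the same column names as in the snippet:

```python
# Each annotation in "embeddings" carries the sentence text in `result` and the
# float vector in its nested `embeddings` field.
rows = pipelineDF.selectExpr("explode(embeddings) as ann").selectExpr(
    "ann.result as sentence",
    "size(ann.embeddings) as dim",
    "ann.embeddings as vector",
)
rows.show(truncate=60)
```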
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_dish_descriptions_128_0_5m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/abhilashawasthi/bert-base-uncased_dish_descriptions_128_0.5M \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_dish_descriptions_128_0_5m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_dish_descriptions_128_0_5m_pipeline_en.md new file mode 100644 index 00000000000000..4197ab515e7404 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_dish_descriptions_128_0_5m_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_dish_descriptions_128_0_5m_pipeline pipeline BertSentenceEmbeddings from abhilashawasthi +author: John Snow Labs +name: sent_bert_base_uncased_dish_descriptions_128_0_5m_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_dish_descriptions_128_0_5m_pipeline` is a English model originally trained by abhilashawasthi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_dish_descriptions_128_0_5m_pipeline_en_5.5.0_3.0_1727113439605.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_dish_descriptions_128_0_5m_pipeline_en_5.5.0_3.0_1727113439605.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from sparknlp.pretrained import PretrainedPipeline

# The pretrained pipeline expects a DataFrame with a "text" column.
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("sent_bert_base_uncased_dish_descriptions_128_0_5m_pipeline", lang = "en")
annotations = pipeline.transform(df)
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// The pretrained pipeline expects a DataFrame with a "text" column.
val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("sent_bert_base_uncased_dish_descriptions_128_0_5m_pipeline", lang = "en")
val annotations = pipeline.transform(df)
```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_dish_descriptions_128_0_5m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.6 MB| + +## References + +https://huggingface.co/abhilashawasthi/bert-base-uncased_dish_descriptions_128_0.5M + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_duplicate_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_duplicate_en.md new file mode 100644 index 00000000000000..3fb8c6933b0661 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_duplicate_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_duplicate BertSentenceEmbeddings from julien-c +author: John Snow Labs +name: sent_bert_base_uncased_duplicate +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_duplicate` is a English model originally trained by julien-c. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_duplicate_en_5.5.0_3.0_1727105106174.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_duplicate_en_5.5.0_3.0_1727105106174.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_duplicate","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_duplicate","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
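A `LightPipeline` wrapper is a convenient way to sanity-check the model above on a few strings; this sketch assumes the `pipelineModel` variable from the Python example:

```python
from sparknlp.base import LightPipeline

# Quick local check of the fitted model on a handful of strings.
light = LightPipeline(pipelineModel)
results = light.fullAnnotate(["I love spark-nlp", "Sentence embeddings are useful."])

for result in results:
    for annotation in result["embeddings"]:
        print(len(annotation.embeddings), annotation.result)
```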
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_duplicate| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/julien-c/bert-base-uncased-duplicate \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_imdb_medhabi_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_imdb_medhabi_en.md new file mode 100644 index 00000000000000..f6f4cd62323a3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_imdb_medhabi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_imdb_medhabi BertSentenceEmbeddings from medhabi +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_imdb_medhabi +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_imdb_medhabi` is a English model originally trained by medhabi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_imdb_medhabi_en_5.5.0_3.0_1727113933512.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_imdb_medhabi_en_5.5.0_3.0_1727113933512.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_imdb_medhabi","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_imdb_medhabi","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_imdb_medhabi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/medhabi/bert-base-uncased-finetuned-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_en.md new file mode 100644 index 00000000000000..7f1b4e3bf833d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies BertSentenceEmbeddings from ietz +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies` is a English model originally trained by ietz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_en_5.5.0_3.0_1727113803766.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_en_5.5.0_3.0_1727113803766.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/ietz/bert-base-uncased-finetuned-jira-hyperledger-issue-titles-and-bodies \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_pipeline_en.md new file mode 100644 index 00000000000000..38c8e19d798bac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_pipeline pipeline BertSentenceEmbeddings from ietz +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_pipeline` is a English model originally trained by ietz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_pipeline_en_5.5.0_3.0_1727113822958.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_pipeline_en_5.5.0_3.0_1727113822958.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
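+
+The snippet above assumes an active Spark NLP session and an already-defined input DataFrame `df`. A minimal setup sketch follows; the `"text"` column name and the sample sentence are illustrative assumptions borrowed from the other cards in this series, not part of the original example.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline  # import assumed by the snippet above
+
+# Start (or attach to) a Spark session with Spark NLP available.
+spark = sparknlp.start()
+
+# Input consumed by pipeline.transform(df) above; the raw text is assumed to live
+# in a column named "text".
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+```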
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_jira_hyperledger_issue_titles_and_bodies_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/ietz/bert-base-uncased-finetuned-jira-hyperledger-issue-titles-and-bodies + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_news_1929_1932_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_news_1929_1932_pipeline_en.md new file mode 100644 index 00000000000000..6869ae37e23261 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_news_1929_1932_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_news_1929_1932_pipeline pipeline BertSentenceEmbeddings from sally9805 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_news_1929_1932_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_news_1929_1932_pipeline` is a English model originally trained by sally9805. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_news_1929_1932_pipeline_en_5.5.0_3.0_1727109854077.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_news_1929_1932_pipeline_en_5.5.0_3.0_1727109854077.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_news_1929_1932_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_news_1929_1932_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
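+
+The snippet above assumes an active Spark NLP session and an already-defined input DataFrame `df`. A minimal setup sketch follows; the `"text"` column name and the sample sentence are illustrative assumptions borrowed from the other cards in this series, not part of the original example.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline  # import assumed by the snippet above
+
+# Start (or attach to) a Spark session with Spark NLP available.
+spark = sparknlp.start()
+
+# Input consumed by pipeline.transform(df) above; the raw text is assumed to live
+# in a column named "text".
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+```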
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_news_1929_1932_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/sally9805/bert-base-uncased-finetuned-news-1929-1932 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_bh8648_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_bh8648_pipeline_en.md new file mode 100644 index 00000000000000..5b261e3f18fc6b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_bh8648_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_bh8648_pipeline pipeline BertSentenceEmbeddings from bh8648 +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_bh8648_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_bh8648_pipeline` is a English model originally trained by bh8648. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_bh8648_pipeline_en_5.5.0_3.0_1727104995723.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_bh8648_pipeline_en_5.5.0_3.0_1727104995723.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_bh8648_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_bh8648_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
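+
+The snippet above assumes an active Spark NLP session and an already-defined input DataFrame `df`. A minimal setup sketch follows; the `"text"` column name and the sample sentence are illustrative assumptions borrowed from the other cards in this series, not part of the original example.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline  # import assumed by the snippet above
+
+# Start (or attach to) a Spark session with Spark NLP available.
+spark = sparknlp.start()
+
+# Input consumed by pipeline.transform(df) above; the raw text is assumed to live
+# in a column named "text".
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+```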
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_bh8648_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/bh8648/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_pensuke_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_pensuke_en.md new file mode 100644 index 00000000000000..7739d5b407fdeb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_pensuke_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_pensuke BertSentenceEmbeddings from pensuke +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_pensuke +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_pensuke` is a English model originally trained by pensuke. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_pensuke_en_5.5.0_3.0_1727123265957.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_pensuke_en_5.5.0_3.0_1727123265957.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_pensuke","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_pensuke","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_pensuke| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/pensuke/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_pensuke_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_pensuke_pipeline_en.md new file mode 100644 index 00000000000000..3b08fb8627d860 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_pensuke_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_pensuke_pipeline pipeline BertSentenceEmbeddings from pensuke +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_pensuke_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_pensuke_pipeline` is a English model originally trained by pensuke. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_pensuke_pipeline_en_5.5.0_3.0_1727123285915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_pensuke_pipeline_en_5.5.0_3.0_1727123285915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_pensuke_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_pensuke_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
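+
+The snippet above assumes an active Spark NLP session and an already-defined input DataFrame `df`. A minimal setup sketch follows; the `"text"` column name and the sample sentence are illustrative assumptions borrowed from the other cards in this series, not part of the original example.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline  # import assumed by the snippet above
+
+# Start (or attach to) a Spark session with Spark NLP available.
+spark = sparknlp.start()
+
+# Input consumed by pipeline.transform(df) above; the raw text is assumed to live
+# in a column named "text".
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+```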
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_pensuke_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/pensuke/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_seddiktrk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_seddiktrk_pipeline_en.md new file mode 100644 index 00000000000000..263c8247a6093d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_seddiktrk_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_seddiktrk_pipeline pipeline BertSentenceEmbeddings from seddiktrk +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_seddiktrk_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_seddiktrk_pipeline` is a English model originally trained by seddiktrk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_seddiktrk_pipeline_en_5.5.0_3.0_1727105308520.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_seddiktrk_pipeline_en_5.5.0_3.0_1727105308520.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_seddiktrk_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_seddiktrk_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
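+
+The snippet above assumes an active Spark NLP session and an already-defined input DataFrame `df`. A minimal setup sketch follows; the `"text"` column name and the sample sentence are illustrative assumptions borrowed from the other cards in this series, not part of the original example.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline  # import assumed by the snippet above
+
+# Start (or attach to) a Spark session with Spark NLP available.
+spark = sparknlp.start()
+
+# Input consumed by pipeline.transform(df) above; the raw text is assumed to live
+# in a column named "text".
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+```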
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_seddiktrk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/seddiktrk/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_kinyarwanda_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_kinyarwanda_finetuned_en.md new file mode 100644 index 00000000000000..143c39be274309 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_kinyarwanda_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_kinyarwanda_finetuned BertSentenceEmbeddings from RogerB +author: John Snow Labs +name: sent_bert_base_uncased_kinyarwanda_finetuned +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_kinyarwanda_finetuned` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_kinyarwanda_finetuned_en_5.5.0_3.0_1727109798634.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_kinyarwanda_finetuned_en_5.5.0_3.0_1727109798634.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_kinyarwanda_finetuned","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_kinyarwanda_finetuned","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_kinyarwanda_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/RogerB/bert-base-uncased-kinyarwanda-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_kinyarwanda_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_kinyarwanda_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..0eeeb3cf0acf76 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_kinyarwanda_finetuned_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_kinyarwanda_finetuned_pipeline pipeline BertSentenceEmbeddings from RogerB +author: John Snow Labs +name: sent_bert_base_uncased_kinyarwanda_finetuned_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_kinyarwanda_finetuned_pipeline` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_kinyarwanda_finetuned_pipeline_en_5.5.0_3.0_1727109818125.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_kinyarwanda_finetuned_pipeline_en_5.5.0_3.0_1727109818125.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_kinyarwanda_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_kinyarwanda_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
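+
+The snippet above assumes an active Spark NLP session and an already-defined input DataFrame `df`. A minimal setup sketch follows; the `"text"` column name and the sample sentence are illustrative assumptions borrowed from the other cards in this series, not part of the original example.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline  # import assumed by the snippet above
+
+# Start (or attach to) a Spark session with Spark NLP available.
+spark = sparknlp.start()
+
+# Input consumed by pipeline.transform(df) above; the raw text is assumed to live
+# in a column named "text".
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+```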
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_kinyarwanda_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/RogerB/bert-base-uncased-kinyarwanda-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_hinglish_big_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_hinglish_big_en.md new file mode 100644 index 00000000000000..8820f7ecdebe46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_hinglish_big_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_hinglish_big BertSentenceEmbeddings from aditeyabaral +author: John Snow Labs +name: sent_bert_hinglish_big +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_hinglish_big` is a English model originally trained by aditeyabaral. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_hinglish_big_en_5.5.0_3.0_1727109552454.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_hinglish_big_en_5.5.0_3.0_1727109552454.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_hinglish_big","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_hinglish_big","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_hinglish_big| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|249.0 MB| + +## References + +https://huggingface.co/aditeyabaral/bert-hinglish-big \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_hinglish_big_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_hinglish_big_pipeline_en.md new file mode 100644 index 00000000000000..5d27c9860cb391 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_hinglish_big_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_hinglish_big_pipeline pipeline BertSentenceEmbeddings from aditeyabaral +author: John Snow Labs +name: sent_bert_hinglish_big_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_hinglish_big_pipeline` is a English model originally trained by aditeyabaral. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_hinglish_big_pipeline_en_5.5.0_3.0_1727109564269.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_hinglish_big_pipeline_en_5.5.0_3.0_1727109564269.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_hinglish_big_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_hinglish_big_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
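+
+The snippet above assumes an active Spark NLP session and an already-defined input DataFrame `df`. A minimal setup sketch follows; the `"text"` column name and the sample sentence are illustrative assumptions borrowed from the other cards in this series, not part of the original example.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline  # import assumed by the snippet above
+
+# Start (or attach to) a Spark session with Spark NLP available.
+spark = sparknlp.start()
+
+# Input consumed by pipeline.transform(df) above; the raw text is assumed to live
+# in a column named "text".
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+```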
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_hinglish_big_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aditeyabaral/bert-hinglish-big + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_14_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_14_en.md new file mode 100644 index 00000000000000..a49b7e1fe10f97 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_14_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_14 BertSentenceEmbeddings from jojoUla +author: John Snow Labs +name: sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_14 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_14` is a English model originally trained by jojoUla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_14_en_5.5.0_3.0_1727122909698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_14_en_5.5.0_3.0_1727122909698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_14","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_14","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_14| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/jojoUla/bert-large-cased-sigir-support-refute-no-label-40-2nd-test-LR10-8-fast-14 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_pipeline_en.md new file mode 100644 index 00000000000000..6a9caf0d6d2f89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_pipeline pipeline BertSentenceEmbeddings from jojoUla +author: John Snow Labs +name: sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_pipeline` is a English model originally trained by jojoUla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_pipeline_en_5.5.0_3.0_1727113815073.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_pipeline_en_5.5.0_3.0_1727113815073.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
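+
+The snippet above assumes an active Spark NLP session and an already-defined input DataFrame `df`. A minimal setup sketch follows; the `"text"` column name and the sample sentence are illustrative assumptions borrowed from the other cards in this series, not part of the original example.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline  # import assumed by the snippet above
+
+# Start (or attach to) a Spark session with Spark NLP available.
+spark = sparknlp.start()
+
+# Input consumed by pipeline.transform(df) above; the raw text is assumed to live
+# in a column named "text".
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+```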
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/jojoUla/bert-large-cased-sigir-support-refute-no-label-40-2nd-test-LR10-8-fast-8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_9_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_9_pipeline_en.md new file mode 100644 index 00000000000000..8044371e975b06 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_9_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_9_pipeline pipeline BertSentenceEmbeddings from jojoUla +author: John Snow Labs +name: sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_9_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_9_pipeline` is a English model originally trained by jojoUla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_9_pipeline_en_5.5.0_3.0_1727102339622.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_9_pipeline_en_5.5.0_3.0_1727102339622.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_9_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_9_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
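+
+The snippet above assumes an active Spark NLP session and an already-defined input DataFrame `df`. A minimal setup sketch follows; the `"text"` column name and the sample sentence are illustrative assumptions borrowed from the other cards in this series, not part of the original example.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline  # import assumed by the snippet above
+
+# Start (or attach to) a Spark session with Spark NLP available.
+spark = sparknlp.start()
+
+# Input consumed by pipeline.transform(df) above; the raw text is assumed to live
+# in a column named "text".
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+```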
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_9_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/jojoUla/bert-large-cased-sigir-support-refute-no-label-40-2nd-test-LR10-8-fast-9 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_mini_domain_adapted_imdb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_mini_domain_adapted_imdb_pipeline_en.md new file mode 100644 index 00000000000000..c325666c914aa9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_mini_domain_adapted_imdb_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_mini_domain_adapted_imdb_pipeline pipeline BertSentenceEmbeddings from rasyosef +author: John Snow Labs +name: sent_bert_mini_domain_adapted_imdb_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_mini_domain_adapted_imdb_pipeline` is a English model originally trained by rasyosef. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_mini_domain_adapted_imdb_pipeline_en_5.5.0_3.0_1727122777219.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_mini_domain_adapted_imdb_pipeline_en_5.5.0_3.0_1727122777219.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_mini_domain_adapted_imdb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_mini_domain_adapted_imdb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
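+
+The snippet above assumes an active Spark NLP session and an already-defined input DataFrame `df`. A minimal setup sketch follows; the `"text"` column name and the sample sentence are illustrative assumptions borrowed from the other cards in this series, not part of the original example.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline  # import assumed by the snippet above
+
+# Start (or attach to) a Spark session with Spark NLP available.
+spark = sparknlp.start()
+
+# Input consumed by pipeline.transform(df) above; the raw text is assumed to live
+# in a column named "text".
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+```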
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_mini_domain_adapted_imdb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|42.4 MB| + +## References + +https://huggingface.co/rasyosef/bert-mini-domain-adapted-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_pretraining_gaudi_2_batch_size_64_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_pretraining_gaudi_2_batch_size_64_en.md new file mode 100644 index 00000000000000..c4d505b28cd4aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_pretraining_gaudi_2_batch_size_64_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_pretraining_gaudi_2_batch_size_64 BertSentenceEmbeddings from regisss +author: John Snow Labs +name: sent_bert_pretraining_gaudi_2_batch_size_64 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_pretraining_gaudi_2_batch_size_64` is a English model originally trained by regisss. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_pretraining_gaudi_2_batch_size_64_en_5.5.0_3.0_1727122873673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_pretraining_gaudi_2_batch_size_64_en_5.5.0_3.0_1727122873673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_pretraining_gaudi_2_batch_size_64","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_pretraining_gaudi_2_batch_size_64","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_pretraining_gaudi_2_batch_size_64| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|412.4 MB| + +## References + +https://huggingface.co/regisss/bert-pretraining-gaudi-2-batch-size-64 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_pretraining_gaudi_2_batch_size_64_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_pretraining_gaudi_2_batch_size_64_pipeline_en.md new file mode 100644 index 00000000000000..8d57e4eeabb972 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_pretraining_gaudi_2_batch_size_64_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_pretraining_gaudi_2_batch_size_64_pipeline pipeline BertSentenceEmbeddings from regisss +author: John Snow Labs +name: sent_bert_pretraining_gaudi_2_batch_size_64_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_pretraining_gaudi_2_batch_size_64_pipeline` is a English model originally trained by regisss. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_pretraining_gaudi_2_batch_size_64_pipeline_en_5.5.0_3.0_1727122893192.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_pretraining_gaudi_2_batch_size_64_pipeline_en_5.5.0_3.0_1727122893192.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_pretraining_gaudi_2_batch_size_64_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_pretraining_gaudi_2_batch_size_64_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
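+
+The snippet above assumes an active Spark NLP session and an already-defined input DataFrame `df`. A minimal setup sketch follows; the `"text"` column name and the sample sentence are illustrative assumptions borrowed from the other cards in this series, not part of the original example.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline  # import assumed by the snippet above
+
+# Start (or attach to) a Spark session with Spark NLP available.
+spark = sparknlp.start()
+
+# Input consumed by pipeline.transform(df) above; the raw text is assumed to live
+# in a column named "text".
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+```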
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_pretraining_gaudi_2_batch_size_64_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.9 MB| + +## References + +https://huggingface.co/regisss/bert-pretraining-gaudi-2-batch-size-64 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_small_finetuned_legal_contracts10train10val_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_small_finetuned_legal_contracts10train10val_en.md new file mode 100644 index 00000000000000..0f88a96527b3f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_small_finetuned_legal_contracts10train10val_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_small_finetuned_legal_contracts10train10val BertSentenceEmbeddings from muhtasham +author: John Snow Labs +name: sent_bert_small_finetuned_legal_contracts10train10val +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_small_finetuned_legal_contracts10train10val` is a English model originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_small_finetuned_legal_contracts10train10val_en_5.5.0_3.0_1727110088145.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_small_finetuned_legal_contracts10train10val_en_5.5.0_3.0_1727110088145.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_small_finetuned_legal_contracts10train10val","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_small_finetuned_legal_contracts10train10val","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_small_finetuned_legal_contracts10train10val| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|107.0 MB| + +## References + +https://huggingface.co/muhtasham/bert-small-finetuned-legal-contracts10train10val \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_small_finetuned_legal_contracts10train10val_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_small_finetuned_legal_contracts10train10val_pipeline_en.md new file mode 100644 index 00000000000000..6635eb2dc78462 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_small_finetuned_legal_contracts10train10val_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_small_finetuned_legal_contracts10train10val_pipeline pipeline BertSentenceEmbeddings from muhtasham +author: John Snow Labs +name: sent_bert_small_finetuned_legal_contracts10train10val_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_small_finetuned_legal_contracts10train10val_pipeline` is a English model originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_small_finetuned_legal_contracts10train10val_pipeline_en_5.5.0_3.0_1727110093220.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_small_finetuned_legal_contracts10train10val_pipeline_en_5.5.0_3.0_1727110093220.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("sent_bert_small_finetuned_legal_contracts10train10val_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("sent_bert_small_finetuned_legal_contracts10train10val_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
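For quick checks on a single string, the pretrained pipeline can also be driven without building a DataFrame. This is only a sketch of the usual light-weight path in Spark NLP's Python API (`annotate` returns the string results per output column; `fullAnnotate` returns the full annotations, including the embedding vectors); the example sentence is arbitrary:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("sent_bert_small_finetuned_legal_contracts10train10val_pipeline", lang="en")

# Keys follow the output columns of the included stages (e.g. document, sentence, embeddings).
result = pipeline.annotate("The parties agree to the terms of this contract.")
print(result.keys())

# Full annotation objects, each carrying its sentence embedding.
full = pipeline.fullAnnotate("The parties agree to the terms of this contract.")
```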
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_small_finetuned_legal_contracts10train10val_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|107.5 MB| + +## References + +https://huggingface.co/muhtasham/bert-small-finetuned-legal-contracts10train10val + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_tiny_finetuned_legal_definitions_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_tiny_finetuned_legal_definitions_pipeline_en.md new file mode 100644 index 00000000000000..25f1457ac4434e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_tiny_finetuned_legal_definitions_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_tiny_finetuned_legal_definitions_pipeline pipeline BertSentenceEmbeddings from muhtasham +author: John Snow Labs +name: sent_bert_tiny_finetuned_legal_definitions_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_tiny_finetuned_legal_definitions_pipeline` is a English model originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_tiny_finetuned_legal_definitions_pipeline_en_5.5.0_3.0_1727113418618.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_tiny_finetuned_legal_definitions_pipeline_en_5.5.0_3.0_1727113418618.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("sent_bert_tiny_finetuned_legal_definitions_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("sent_bert_tiny_finetuned_legal_definitions_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_tiny_finetuned_legal_definitions_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|17.2 MB| + +## References + +https://huggingface.co/muhtasham/bert-tiny-finetuned-legal-definitions + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_uncased_l_6_h_128_a_2_cord19_200616_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_uncased_l_6_h_128_a_2_cord19_200616_en.md new file mode 100644 index 00000000000000..183d49bb9ca0ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_uncased_l_6_h_128_a_2_cord19_200616_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_uncased_l_6_h_128_a_2_cord19_200616 BertSentenceEmbeddings from aodiniz +author: John Snow Labs +name: sent_bert_uncased_l_6_h_128_a_2_cord19_200616 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_uncased_l_6_h_128_a_2_cord19_200616` is a English model originally trained by aodiniz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_uncased_l_6_h_128_a_2_cord19_200616_en_5.5.0_3.0_1727122852915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_uncased_l_6_h_128_a_2_cord19_200616_en_5.5.0_3.0_1727122852915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_uncased_l_6_h_128_a_2_cord19_200616","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_uncased_l_6_h_128_a_2_cord19_200616","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
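Sentence embeddings such as these are usually consumed by comparing vectors. As an illustrative sketch only (LightPipeline and `numpy` are assumptions here, not part of the model card), cosine similarity between two sentences could be computed from the fitted `pipelineModel` above:

```python
import numpy as np
from sparknlp.base import LightPipeline

light = LightPipeline(pipelineModel)
annotated = light.fullAnnotate([
    "The virus spreads through respiratory droplets.",
    "Transmission mainly occurs via droplets.",
])

def sentence_vector(result):
    # Take the first sentence-level annotation from the `embeddings` output column.
    return np.array(result["embeddings"][0].embeddings)

v1, v2 = sentence_vector(annotated[0]), sentence_vector(annotated[1])
cosine = float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
print(f"cosine similarity: {cosine:.3f}")
```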
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_uncased_l_6_h_128_a_2_cord19_200616| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|19.6 MB| + +## References + +https://huggingface.co/aodiniz/bert_uncased_L-6_H-128_A-2_cord19-200616 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bertbase_uyghur_3e_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bertbase_uyghur_3e_en.md new file mode 100644 index 00000000000000..1326deaf12682b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bertbase_uyghur_3e_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bertbase_uyghur_3e BertSentenceEmbeddings from TurkLangsTeamURFU +author: John Snow Labs +name: sent_bertbase_uyghur_3e +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertbase_uyghur_3e` is a English model originally trained by TurkLangsTeamURFU. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertbase_uyghur_3e_en_5.5.0_3.0_1727123195451.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertbase_uyghur_3e_en_5.5.0_3.0_1727123195451.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bertbase_uyghur_3e","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bertbase_uyghur_3e","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertbase_uyghur_3e| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|492.9 MB| + +## References + +https://huggingface.co/TurkLangsTeamURFU/BertBase_UG_3e \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bertinho_galician_base_cased_gl.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bertinho_galician_base_cased_gl.md new file mode 100644 index 00000000000000..0669df15547b49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bertinho_galician_base_cased_gl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Galician sent_bertinho_galician_base_cased BertSentenceEmbeddings from dvilares +author: John Snow Labs +name: sent_bertinho_galician_base_cased +date: 2024-09-23 +tags: [gl, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: gl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertinho_galician_base_cased` is a Galician model originally trained by dvilares. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertinho_galician_base_cased_gl_5.5.0_3.0_1727105071248.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertinho_galician_base_cased_gl_5.5.0_3.0_1727105071248.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bertinho_galician_base_cased","gl") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bertinho_galician_base_cased","gl") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertinho_galician_base_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|gl| +|Size:|405.3 MB| + +## References + +https://huggingface.co/dvilares/bertinho-gl-base-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bertinho_galician_base_cased_pipeline_gl.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bertinho_galician_base_cased_pipeline_gl.md new file mode 100644 index 00000000000000..8103076f01f277 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bertinho_galician_base_cased_pipeline_gl.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Galician sent_bertinho_galician_base_cased_pipeline pipeline BertSentenceEmbeddings from dvilares +author: John Snow Labs +name: sent_bertinho_galician_base_cased_pipeline +date: 2024-09-23 +tags: [gl, open_source, pipeline, onnx] +task: Embeddings +language: gl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertinho_galician_base_cased_pipeline` is a Galician model originally trained by dvilares. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertinho_galician_base_cased_pipeline_gl_5.5.0_3.0_1727105091575.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertinho_galician_base_cased_pipeline_gl_5.5.0_3.0_1727105091575.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("sent_bertinho_galician_base_cased_pipeline", lang = "gl")
annotations = pipeline.transform(df)

```
```scala

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("sent_bertinho_galician_base_cased_pipeline", lang = "gl")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertinho_galician_base_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|gl| +|Size:|405.8 MB| + +## References + +https://huggingface.co/dvilares/bertinho-gl-base-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bio_mobilebert_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bio_mobilebert_en.md new file mode 100644 index 00000000000000..4ac90ece21c39f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bio_mobilebert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bio_mobilebert BertSentenceEmbeddings from nlpie +author: John Snow Labs +name: sent_bio_mobilebert +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bio_mobilebert` is a English model originally trained by nlpie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bio_mobilebert_en_5.5.0_3.0_1727105328224.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bio_mobilebert_en_5.5.0_3.0_1727105328224.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bio_mobilebert","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bio_mobilebert","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bio_mobilebert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|92.5 MB| + +## References + +https://huggingface.co/nlpie/bio-mobilebert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_clinical_pubmed_bert_base_128_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_clinical_pubmed_bert_base_128_en.md new file mode 100644 index 00000000000000..65e0528c34838e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_clinical_pubmed_bert_base_128_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_clinical_pubmed_bert_base_128 BertSentenceEmbeddings from Tsubasaz +author: John Snow Labs +name: sent_clinical_pubmed_bert_base_128 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_clinical_pubmed_bert_base_128` is a English model originally trained by Tsubasaz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_clinical_pubmed_bert_base_128_en_5.5.0_3.0_1727101752096.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_clinical_pubmed_bert_base_128_en_5.5.0_3.0_1727101752096.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_clinical_pubmed_bert_base_128","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_clinical_pubmed_bert_base_128","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_clinical_pubmed_bert_base_128| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|408.0 MB| + +## References + +https://huggingface.co/Tsubasaz/clinical-pubmed-bert-base-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_clr_pretrained_bert_base_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_clr_pretrained_bert_base_uncased_en.md new file mode 100644 index 00000000000000..840f1c04d0804f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_clr_pretrained_bert_base_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_clr_pretrained_bert_base_uncased BertSentenceEmbeddings from SauravMaheshkar +author: John Snow Labs +name: sent_clr_pretrained_bert_base_uncased +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_clr_pretrained_bert_base_uncased` is a English model originally trained by SauravMaheshkar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_clr_pretrained_bert_base_uncased_en_5.5.0_3.0_1727113811922.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_clr_pretrained_bert_base_uncased_en_5.5.0_3.0_1727113811922.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_clr_pretrained_bert_base_uncased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_clr_pretrained_bert_base_uncased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_clr_pretrained_bert_base_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/SauravMaheshkar/clr-pretrained-bert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_clr_pretrained_bert_base_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_clr_pretrained_bert_base_uncased_pipeline_en.md new file mode 100644 index 00000000000000..52ddeb21d50207 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_clr_pretrained_bert_base_uncased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_clr_pretrained_bert_base_uncased_pipeline pipeline BertSentenceEmbeddings from SauravMaheshkar +author: John Snow Labs +name: sent_clr_pretrained_bert_base_uncased_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_clr_pretrained_bert_base_uncased_pipeline` is a English model originally trained by SauravMaheshkar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_clr_pretrained_bert_base_uncased_pipeline_en_5.5.0_3.0_1727113830946.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_clr_pretrained_bert_base_uncased_pipeline_en_5.5.0_3.0_1727113830946.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("sent_clr_pretrained_bert_base_uncased_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("sent_clr_pretrained_bert_base_uncased_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_clr_pretrained_bert_base_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/SauravMaheshkar/clr-pretrained-bert-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_danish_legal_bert_base_da.md b/docs/_posts/ahmedlone127/2024-09-23-sent_danish_legal_bert_base_da.md new file mode 100644 index 00000000000000..b2189163b2ed27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_danish_legal_bert_base_da.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Danish sent_danish_legal_bert_base BertSentenceEmbeddings from coastalcph +author: John Snow Labs +name: sent_danish_legal_bert_base +date: 2024-09-23 +tags: [da, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: da +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_danish_legal_bert_base` is a Danish model originally trained by coastalcph. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_danish_legal_bert_base_da_5.5.0_3.0_1727123277252.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_danish_legal_bert_base_da_5.5.0_3.0_1727123277252.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_danish_legal_bert_base","da") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_danish_legal_bert_base","da") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_danish_legal_bert_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|da| +|Size:|411.6 MB| + +## References + +https://huggingface.co/coastalcph/danish-legal-bert-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_defsent_bert_base_uncased_max_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_defsent_bert_base_uncased_max_en.md new file mode 100644 index 00000000000000..e4319171a03fe5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_defsent_bert_base_uncased_max_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_defsent_bert_base_uncased_max BertSentenceEmbeddings from cl-nagoya +author: John Snow Labs +name: sent_defsent_bert_base_uncased_max +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_defsent_bert_base_uncased_max` is a English model originally trained by cl-nagoya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_defsent_bert_base_uncased_max_en_5.5.0_3.0_1727101689855.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_defsent_bert_base_uncased_max_en_5.5.0_3.0_1727101689855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_defsent_bert_base_uncased_max","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_defsent_bert_base_uncased_max","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_defsent_bert_base_uncased_max| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/cl-nagoya/defsent-bert-base-uncased-max \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_gujarati_bert_gu.md b/docs/_posts/ahmedlone127/2024-09-23-sent_gujarati_bert_gu.md new file mode 100644 index 00000000000000..7bb3bf0fe5192d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_gujarati_bert_gu.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Gujarati sent_gujarati_bert BertSentenceEmbeddings from l3cube-pune +author: John Snow Labs +name: sent_gujarati_bert +date: 2024-09-23 +tags: [gu, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: gu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_gujarati_bert` is a Gujarati model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_gujarati_bert_gu_5.5.0_3.0_1727101739126.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_gujarati_bert_gu_5.5.0_3.0_1727101739126.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_gujarati_bert","gu") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_gujarati_bert","gu") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_gujarati_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|gu| +|Size:|890.5 MB| + +## References + +https://huggingface.co/l3cube-pune/gujarati-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_mobilebert_sanskrit_saskta_pre_training_complete_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_mobilebert_sanskrit_saskta_pre_training_complete_en.md new file mode 100644 index 00000000000000..f57757f8bad17c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_mobilebert_sanskrit_saskta_pre_training_complete_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_mobilebert_sanskrit_saskta_pre_training_complete BertSentenceEmbeddings from gokuls +author: John Snow Labs +name: sent_mobilebert_sanskrit_saskta_pre_training_complete +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_mobilebert_sanskrit_saskta_pre_training_complete` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_mobilebert_sanskrit_saskta_pre_training_complete_en_5.5.0_3.0_1727105588822.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_mobilebert_sanskrit_saskta_pre_training_complete_en_5.5.0_3.0_1727105588822.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_mobilebert_sanskrit_saskta_pre_training_complete","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_mobilebert_sanskrit_saskta_pre_training_complete","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_mobilebert_sanskrit_saskta_pre_training_complete| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|92.5 MB| + +## References + +https://huggingface.co/gokuls/mobilebert_sa_pre-training-complete \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_roboust_nlp_xlmr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_roboust_nlp_xlmr_pipeline_en.md new file mode 100644 index 00000000000000..709d66a9348a9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_roboust_nlp_xlmr_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_roboust_nlp_xlmr_pipeline pipeline XlmRoBertaSentenceEmbeddings from Blue7Bird +author: John Snow Labs +name: sent_roboust_nlp_xlmr_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_roboust_nlp_xlmr_pipeline` is a English model originally trained by Blue7Bird. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_roboust_nlp_xlmr_pipeline_en_5.5.0_3.0_1727062803456.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_roboust_nlp_xlmr_pipeline_en_5.5.0_3.0_1727062803456.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("sent_roboust_nlp_xlmr_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("sent_roboust_nlp_xlmr_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_roboust_nlp_xlmr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Blue7Bird/Roboust_nlp_xlmr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- XlmRoBertaSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_test_bert_base_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_test_bert_base_uncased_en.md new file mode 100644 index 00000000000000..37382acfb8e303 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_test_bert_base_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_test_bert_base_uncased BertSentenceEmbeddings from kkkzzzkkk +author: John Snow Labs +name: sent_test_bert_base_uncased +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_test_bert_base_uncased` is a English model originally trained by kkkzzzkkk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_test_bert_base_uncased_en_5.5.0_3.0_1727123025663.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_test_bert_base_uncased_en_5.5.0_3.0_1727123025663.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_test_bert_base_uncased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_test_bert_base_uncased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_test_bert_base_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/kkkzzzkkk/test_bert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_tiny_mlm_glue_mrpc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_tiny_mlm_glue_mrpc_pipeline_en.md new file mode 100644 index 00000000000000..d6244a9bff55ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_tiny_mlm_glue_mrpc_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_tiny_mlm_glue_mrpc_pipeline pipeline BertSentenceEmbeddings from muhtasham +author: John Snow Labs +name: sent_tiny_mlm_glue_mrpc_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_tiny_mlm_glue_mrpc_pipeline` is a English model originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_tiny_mlm_glue_mrpc_pipeline_en_5.5.0_3.0_1727105588548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_tiny_mlm_glue_mrpc_pipeline_en_5.5.0_3.0_1727105588548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("sent_tiny_mlm_glue_mrpc_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("sent_tiny_mlm_glue_mrpc_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_tiny_mlm_glue_mrpc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|17.2 MB| + +## References + +https://huggingface.co/muhtasham/tiny-mlm-glue-mrpc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_twitch_bert_base_cased_pytorch_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_twitch_bert_base_cased_pytorch_en.md new file mode 100644 index 00000000000000..96fcfc50384550 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_twitch_bert_base_cased_pytorch_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_twitch_bert_base_cased_pytorch BertSentenceEmbeddings from veb +author: John Snow Labs +name: sent_twitch_bert_base_cased_pytorch +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_twitch_bert_base_cased_pytorch` is a English model originally trained by veb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_twitch_bert_base_cased_pytorch_en_5.5.0_3.0_1727113974443.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_twitch_bert_base_cased_pytorch_en_5.5.0_3.0_1727113974443.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_twitch_bert_base_cased_pytorch","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_twitch_bert_base_cased_pytorch","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_twitch_bert_base_cased_pytorch| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/veb/twitch-bert-base-cased-pytorch \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_team_28_a01794830_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_team_28_a01794830_en.md new file mode 100644 index 00000000000000..c9283df5ab0dc2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_team_28_a01794830_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_model_team_28_a01794830 DistilBertForSequenceClassification from a01794830 +author: John Snow Labs +name: sentiment_analysis_model_team_28_a01794830 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_model_team_28_a01794830` is a English model originally trained by a01794830. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_team_28_a01794830_en_5.5.0_3.0_1727094145191.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_team_28_a01794830_en_5.5.0_3.0_1727094145191.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_model_team_28_a01794830","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_model_team_28_a01794830", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
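After `transform`, the predicted label for each row is stored in the `class` column as an annotation array; a minimal sketch for reading it back out with the column names used above:

```python
# `class.result` exposes the predicted label(s) for each row of input text.
pipelineDF.select("text", "class.result").show(truncate=False)
```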
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_model_team_28_a01794830| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/a01794830/sentiment-analysis-model-team-28 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-somd_xlm_3stage_stage0_pre_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-somd_xlm_3stage_stage0_pre_v1_pipeline_en.md new file mode 100644 index 00000000000000..4baaaf76e14cfd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-somd_xlm_3stage_stage0_pre_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English somd_xlm_3stage_stage0_pre_v1_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: somd_xlm_3stage_stage0_pre_v1_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`somd_xlm_3stage_stage0_pre_v1_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/somd_xlm_3stage_stage0_pre_v1_pipeline_en_5.5.0_3.0_1727126203217.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/somd_xlm_3stage_stage0_pre_v1_pipeline_en_5.5.0_3.0_1727126203217.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipeline = PretrainedPipeline("somd_xlm_3stage_stage0_pre_v1_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

val df = Seq("I love spark-nlp").toDF("text")
val pipeline = new PretrainedPipeline("somd_xlm_3stage_stage0_pre_v1_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|somd_xlm_3stage_stage0_pre_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|791.4 MB| + +## References + +https://huggingface.co/ThuyNT03/SOMD-xlm-3stage-stage0-pre-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_pipeline_en.md new file mode 100644 index 00000000000000..0767d5f20e94f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_pipeline_en_5.5.0_3.0_1727059135535.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_pipeline_en_5.5.0_3.0_1727059135535.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-20-2024-07-26_12-23-45 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-suicide_bert_en.md b/docs/_posts/ahmedlone127/2024-09-23-suicide_bert_en.md new file mode 100644 index 00000000000000..fdbc5fdecd657d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-suicide_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English suicide_bert RoBertaForSequenceClassification from vishalp23 +author: John Snow Labs +name: suicide_bert +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`suicide_bert` is a English model originally trained by vishalp23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/suicide_bert_en_5.5.0_3.0_1727085371437.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/suicide_bert_en_5.5.0_3.0_1727085371437.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("suicide_bert","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("suicide_bert", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
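+
+For quick single-document inference, the fitted pipeline can also be wrapped in a `LightPipeline`; the sketch below assumes the `pipelineModel` built above:
+
+```python
+from sparknlp.base import LightPipeline
+
+# LightPipeline runs the same stages on plain Python strings, without a DataFrame.
+light = LightPipeline(pipelineModel)
+result = light.annotate("I love spark-nlp")
+print(result["class"])
+```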
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|suicide_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|467.5 MB| + +## References + +https://huggingface.co/vishalp23/suicide-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-suicide_distilbert_6_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-suicide_distilbert_6_5_pipeline_en.md new file mode 100644 index 00000000000000..0659e5ae3d6768 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-suicide_distilbert_6_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English suicide_distilbert_6_5_pipeline pipeline DistilBertForSequenceClassification from cuadron11 +author: John Snow Labs +name: suicide_distilbert_6_5_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`suicide_distilbert_6_5_pipeline` is a English model originally trained by cuadron11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/suicide_distilbert_6_5_pipeline_en_5.5.0_3.0_1727073786779.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/suicide_distilbert_6_5_pipeline_en_5.5.0_3.0_1727073786779.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("suicide_distilbert_6_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("suicide_distilbert_6_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|suicide_distilbert_6_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cuadron11/suicide-distilbert-6-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-tamilroberta_en.md b/docs/_posts/ahmedlone127/2024-09-23-tamilroberta_en.md new file mode 100644 index 00000000000000..ff8151b3ca9772 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-tamilroberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tamilroberta RoBertaEmbeddings from apkbala107 +author: John Snow Labs +name: tamilroberta +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tamilroberta` is a English model originally trained by apkbala107. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tamilroberta_en_5.5.0_3.0_1727121707582.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tamilroberta_en_5.5.0_3.0_1727121707582.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("tamilroberta","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("tamilroberta","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
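+
+A minimal sketch for reading the token vectors back out of `pipelineDF`, assuming the `embeddings` output column configured above:
+
+```python
+# Each element of the "embeddings" column is an annotation whose `result` field
+# holds the token text and whose `embeddings` field holds the vector.
+token_vectors = pipelineDF.selectExpr("explode(embeddings) as emb") \
+    .selectExpr("emb.result as token", "emb.embeddings as vector")
+token_vectors.show(truncate=False)
+```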
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tamilroberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|310.2 MB| + +## References + +https://huggingface.co/apkbala107/tamilroberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-tamilroberto_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-tamilroberto_pipeline_en.md new file mode 100644 index 00000000000000..c58ea078995535 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-tamilroberto_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tamilroberto_pipeline pipeline RoBertaEmbeddings from apkbala107 +author: John Snow Labs +name: tamilroberto_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tamilroberto_pipeline` is a English model originally trained by apkbala107. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tamilroberto_pipeline_en_5.5.0_3.0_1727056915857.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tamilroberto_pipeline_en_5.5.0_3.0_1727056915857.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tamilroberto_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tamilroberto_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tamilroberto_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|310.1 MB| + +## References + +https://huggingface.co/apkbala107/tamilroberto + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-text_classification_model_elijahriley_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-text_classification_model_elijahriley_pipeline_en.md new file mode 100644 index 00000000000000..430f3638ee79ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-text_classification_model_elijahriley_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English text_classification_model_elijahriley_pipeline pipeline DistilBertForSequenceClassification from elijahriley +author: John Snow Labs +name: text_classification_model_elijahriley_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_classification_model_elijahriley_pipeline` is a English model originally trained by elijahriley. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_classification_model_elijahriley_pipeline_en_5.5.0_3.0_1727073748733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_classification_model_elijahriley_pipeline_en_5.5.0_3.0_1727073748733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("text_classification_model_elijahriley_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("text_classification_model_elijahriley_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_classification_model_elijahriley_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/elijahriley/text_classification_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-uztext_568mb_roberta_bpe_en.md b/docs/_posts/ahmedlone127/2024-09-23-uztext_568mb_roberta_bpe_en.md new file mode 100644 index 00000000000000..2a8371eaa64dea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-uztext_568mb_roberta_bpe_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English uztext_568mb_roberta_bpe RoBertaEmbeddings from rifkat +author: John Snow Labs +name: uztext_568mb_roberta_bpe +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`uztext_568mb_roberta_bpe` is a English model originally trained by rifkat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/uztext_568mb_roberta_bpe_en_5.5.0_3.0_1727121549174.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/uztext_568mb_roberta_bpe_en_5.5.0_3.0_1727121549174.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("uztext_568mb_roberta_bpe","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("uztext_568mb_roberta_bpe","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|uztext_568mb_roberta_bpe| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.9 MB| + +## References + +https://huggingface.co/rifkat/uztext_568Mb_Roberta_BPE \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-viz_wiz_bert_base_uncased_f16_en.md b/docs/_posts/ahmedlone127/2024-09-23-viz_wiz_bert_base_uncased_f16_en.md new file mode 100644 index 00000000000000..30d62a5f02ab79 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-viz_wiz_bert_base_uncased_f16_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English viz_wiz_bert_base_uncased_f16 BertEmbeddings from eisenjulian +author: John Snow Labs +name: viz_wiz_bert_base_uncased_f16 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`viz_wiz_bert_base_uncased_f16` is a English model originally trained by eisenjulian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/viz_wiz_bert_base_uncased_f16_en_5.5.0_3.0_1727107587086.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/viz_wiz_bert_base_uncased_f16_en_5.5.0_3.0_1727107587086.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("viz_wiz_bert_base_uncased_f16","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("viz_wiz_bert_base_uncased_f16","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
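+
+If a single fixed-size vector per document is needed, the token vectors can be mean-pooled on the driver; a small sketch assuming the `pipelineDF` built above:
+
+```python
+import numpy as np
+
+# Collect the per-token vectors of the first row and average them into one
+# sentence-level representation (768 dimensions for a BERT base model).
+row = pipelineDF.selectExpr("embeddings.embeddings as vectors").first()
+sentence_vector = np.mean(np.array(row["vectors"]), axis=0)
+print(sentence_vector.shape)
+```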
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|viz_wiz_bert_base_uncased_f16| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/eisenjulian/viz-wiz-bert-base-uncased_f16 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whipser_small_r2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whipser_small_r2_pipeline_en.md new file mode 100644 index 00000000000000..ffda878398ec3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whipser_small_r2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whipser_small_r2_pipeline pipeline WhisperForCTC from spsither +author: John Snow Labs +name: whipser_small_r2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whipser_small_r2_pipeline` is a English model originally trained by spsither. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whipser_small_r2_pipeline_en_5.5.0_3.0_1727054035387.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whipser_small_r2_pipeline_en_5.5.0_3.0_1727054035387.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whipser_small_r2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whipser_small_r2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whipser_small_r2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/spsither/whipser-small-r2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_6e_4_clean_legion_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_6e_4_clean_legion_v2_pipeline_en.md new file mode 100644 index 00000000000000..81529e9603363a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_6e_4_clean_legion_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_6e_4_clean_legion_v2_pipeline pipeline WhisperForCTC from yusufagung29 +author: John Snow Labs +name: whisper_6e_4_clean_legion_v2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_6e_4_clean_legion_v2_pipeline` is a English model originally trained by yusufagung29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_6e_4_clean_legion_v2_pipeline_en_5.5.0_3.0_1727076325115.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_6e_4_clean_legion_v2_pipeline_en_5.5.0_3.0_1727076325115.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_6e_4_clean_legion_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_6e_4_clean_legion_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_6e_4_clean_legion_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/yusufagung29/whisper_6e-4_clean_legion_v2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_ai_nomi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_ai_nomi_pipeline_en.md new file mode 100644 index 00000000000000..bc3ba68c52fe72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_ai_nomi_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_ai_nomi_pipeline pipeline WhisperForCTC from susmitabhatt +author: John Snow Labs +name: whisper_ai_nomi_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_ai_nomi_pipeline` is a English model originally trained by susmitabhatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_ai_nomi_pipeline_en_5.5.0_3.0_1727117563862.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_ai_nomi_pipeline_en_5.5.0_3.0_1727117563862.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_ai_nomi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_ai_nomi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_ai_nomi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/susmitabhatt/whisper-ai-nomi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_base_pashto_ihanif_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_pashto_ihanif_en.md new file mode 100644 index 00000000000000..df4d122b5cacd8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_pashto_ihanif_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_pashto_ihanif WhisperForCTC from ihanif +author: John Snow Labs +name: whisper_base_pashto_ihanif +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_pashto_ihanif` is a English model originally trained by ihanif. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_pashto_ihanif_en_5.5.0_3.0_1727050761013.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_pashto_ihanif_en_5.5.0_3.0_1727050761013.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_pashto_ihanif","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+# `data` is assumed to be a DataFrame with an "audio_content" column holding the raw audio samples as floats
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_pashto_ihanif", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
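+
+A minimal sketch of how `data` could be assembled from a local 16 kHz mono recording, assuming the `soundfile` package and a placeholder file name:
+
+```python
+import soundfile as sf
+
+# Load the raw samples as floats and wrap them in the "audio_content" column
+# expected by the AudioAssembler defined above.
+samples, sampling_rate = sf.read("sample_recording.wav", dtype="float32")
+data = spark.createDataFrame([(samples.tolist(),)], ["audio_content"])
+
+pipelineDF = pipeline.fit(data).transform(data)
+pipelineDF.select("text.result").show(truncate=False)
+```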
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_pashto_ihanif| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|643.8 MB| + +## References + +https://huggingface.co/ihanif/whisper-base-pashto \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_base_pashto_ihanif_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_pashto_ihanif_pipeline_en.md new file mode 100644 index 00000000000000..3d445b31501cf8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_pashto_ihanif_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_pashto_ihanif_pipeline pipeline WhisperForCTC from ihanif +author: John Snow Labs +name: whisper_base_pashto_ihanif_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_pashto_ihanif_pipeline` is a English model originally trained by ihanif. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_pashto_ihanif_pipeline_en_5.5.0_3.0_1727050794305.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_pashto_ihanif_pipeline_en_5.5.0_3.0_1727050794305.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_pashto_ihanif_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_pashto_ihanif_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_pashto_ihanif_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|643.9 MB| + +## References + +https://huggingface.co/ihanif/whisper-base-pashto + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_base_thai_der_1_pipeline_th.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_thai_der_1_pipeline_th.md new file mode 100644 index 00000000000000..c7fa898311ade9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_thai_der_1_pipeline_th.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Thai whisper_base_thai_der_1_pipeline pipeline WhisperForCTC from arun100 +author: John Snow Labs +name: whisper_base_thai_der_1_pipeline +date: 2024-09-23 +tags: [th, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: th +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_thai_der_1_pipeline` is a Thai model originally trained by arun100. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_thai_der_1_pipeline_th_5.5.0_3.0_1727077846599.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_thai_der_1_pipeline_th_5.5.0_3.0_1727077846599.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_thai_der_1_pipeline", lang = "th") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_thai_der_1_pipeline", lang = "th") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_thai_der_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|th| +|Size:|642.6 MB| + +## References + +https://huggingface.co/arun100/whisper-base-thai-der-1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_base_v3_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_v3_en.md new file mode 100644 index 00000000000000..e8bbd9c471229b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_v3_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_v3 WhisperForCTC from raiyan007 +author: John Snow Labs +name: whisper_base_v3 +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_v3` is a English model originally trained by raiyan007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_v3_en_5.5.0_3.0_1727117972210.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_v3_en_5.5.0_3.0_1727117972210.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_v3","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+# `data` is assumed to be a DataFrame with an "audio_content" column holding the raw audio samples as floats
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_v3", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_v3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|642.8 MB| + +## References + +https://huggingface.co/raiyan007/whisper-base-v3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_arabic_original_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_arabic_original_en.md new file mode 100644 index 00000000000000..dfdeecc7608b35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_arabic_original_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_medium_arabic_original WhisperForCTC from aghannam +author: John Snow Labs +name: whisper_medium_arabic_original +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_arabic_original` is a English model originally trained by aghannam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_arabic_original_en_5.5.0_3.0_1727117875987.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_arabic_original_en_5.5.0_3.0_1727117875987.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_medium_arabic_original","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+# `data` is assumed to be a DataFrame with an "audio_content" column holding the raw audio samples as floats
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_medium_arabic_original", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_arabic_original| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/aghannam/whisper-medium-ar-original \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_portuguese_cv16_fleurs2_lr_wu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_portuguese_cv16_fleurs2_lr_wu_pipeline_en.md new file mode 100644 index 00000000000000..9598d993d0fa50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_portuguese_cv16_fleurs2_lr_wu_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_medium_portuguese_cv16_fleurs2_lr_wu_pipeline pipeline WhisperForCTC from fsicoli +author: John Snow Labs +name: whisper_medium_portuguese_cv16_fleurs2_lr_wu_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_portuguese_cv16_fleurs2_lr_wu_pipeline` is a English model originally trained by fsicoli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_portuguese_cv16_fleurs2_lr_wu_pipeline_en_5.5.0_3.0_1727080163618.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_portuguese_cv16_fleurs2_lr_wu_pipeline_en_5.5.0_3.0_1727080163618.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_medium_portuguese_cv16_fleurs2_lr_wu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_medium_portuguese_cv16_fleurs2_lr_wu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_portuguese_cv16_fleurs2_lr_wu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/fsicoli/whisper-medium-pt-cv16-fleurs2-lr-wu + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_sango_50_2_30_part2_30_2_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_sango_50_2_30_part2_30_2_en.md new file mode 100644 index 00000000000000..bc3313cfd2ac48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_sango_50_2_30_part2_30_2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_medium_sango_50_2_30_part2_30_2 WhisperForCTC from eighty88 +author: John Snow Labs +name: whisper_medium_sango_50_2_30_part2_30_2 +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_sango_50_2_30_part2_30_2` is a English model originally trained by eighty88. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_sango_50_2_30_part2_30_2_en_5.5.0_3.0_1727119418541.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_sango_50_2_30_part2_30_2_en_5.5.0_3.0_1727119418541.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_medium_sango_50_2_30_part2_30_2","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+# `data` is assumed to be a DataFrame with an "audio_content" column holding the raw audio samples as floats
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_medium_sango_50_2_30_part2_30_2", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_sango_50_2_30_part2_30_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/eighty88/whisper-medium-sg-50-2-30-part2-30-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_bangla_bn.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_bangla_bn.md new file mode 100644 index 00000000000000..c66b7e8071fd69 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_bangla_bn.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Bengali whisper_small_bangla WhisperForCTC from ashrafulparan +author: John Snow Labs +name: whisper_small_bangla +date: 2024-09-23 +tags: [bn, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_bangla` is a Bengali model originally trained by ashrafulparan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_bangla_bn_5.5.0_3.0_1727051052841.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_bangla_bn_5.5.0_3.0_1727051052841.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_bangla","bn") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+# `data` is assumed to be a DataFrame with an "audio_content" column holding the raw audio samples as floats
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_bangla", "bn")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_bangla| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|bn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ashrafulparan/whisper-small-bangla \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_child50k_timestretch_steplr_ko.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_child50k_timestretch_steplr_ko.md new file mode 100644 index 00000000000000..d46d05438f0d99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_child50k_timestretch_steplr_ko.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Korean whisper_small_child50k_timestretch_steplr WhisperForCTC from haseong8012 +author: John Snow Labs +name: whisper_small_child50k_timestretch_steplr +date: 2024-09-23 +tags: [ko, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_child50k_timestretch_steplr` is a Korean model originally trained by haseong8012. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_child50k_timestretch_steplr_ko_5.5.0_3.0_1727052144778.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_child50k_timestretch_steplr_ko_5.5.0_3.0_1727052144778.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_child50k_timestretch_steplr","ko") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+# `data` is assumed to be a DataFrame with an "audio_content" column holding the raw audio samples as floats
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_child50k_timestretch_steplr", "ko")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_child50k_timestretch_steplr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ko| +|Size:|1.7 GB| + +## References + +https://huggingface.co/haseong8012/whisper-small_child50K_timestretch_stepLR \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chinese_cn_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chinese_cn_pipeline_zh.md new file mode 100644 index 00000000000000..5ee5d7aa8254c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chinese_cn_pipeline_zh.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Chinese whisper_small_chinese_cn_pipeline pipeline WhisperForCTC from JunSir +author: John Snow Labs +name: whisper_small_chinese_cn_pipeline +date: 2024-09-23 +tags: [zh, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_chinese_cn_pipeline` is a Chinese model originally trained by JunSir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_chinese_cn_pipeline_zh_5.5.0_3.0_1727053166922.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_chinese_cn_pipeline_zh_5.5.0_3.0_1727053166922.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_chinese_cn_pipeline", lang = "zh") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_chinese_cn_pipeline", lang = "zh") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_chinese_cn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|1.7 GB| + +## References + +https://huggingface.co/JunSir/whisper-small-zh-CN + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chuvash_43_freeze_encoder_hi.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chuvash_43_freeze_encoder_hi.md new file mode 100644 index 00000000000000..6c3cdd3fb48167 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chuvash_43_freeze_encoder_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_small_chuvash_43_freeze_encoder WhisperForCTC from alikanakar +author: John Snow Labs +name: whisper_small_chuvash_43_freeze_encoder +date: 2024-09-23 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_chuvash_43_freeze_encoder` is a Hindi model originally trained by alikanakar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_chuvash_43_freeze_encoder_hi_5.5.0_3.0_1727117372970.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_chuvash_43_freeze_encoder_hi_5.5.0_3.0_1727117372970.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_chuvash_43_freeze_encoder","hi") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+# `data` is assumed to be a DataFrame with an "audio_content" column holding the raw audio samples as floats
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_chuvash_43_freeze_encoder", "hi")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_chuvash_43_freeze_encoder| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/alikanakar/whisper-small-CV-43-freeze-encoder \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chuvash_43_freeze_encoder_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chuvash_43_freeze_encoder_pipeline_hi.md new file mode 100644 index 00000000000000..62b57dd04e107d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chuvash_43_freeze_encoder_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_small_chuvash_43_freeze_encoder_pipeline pipeline WhisperForCTC from alikanakar +author: John Snow Labs +name: whisper_small_chuvash_43_freeze_encoder_pipeline +date: 2024-09-23 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_chuvash_43_freeze_encoder_pipeline` is a Hindi model originally trained by alikanakar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_chuvash_43_freeze_encoder_pipeline_hi_5.5.0_3.0_1727117456078.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_chuvash_43_freeze_encoder_pipeline_hi_5.5.0_3.0_1727117456078.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_chuvash_43_freeze_encoder_pipeline", lang = "hi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_chuvash_43_freeze_encoder_pipeline", lang = "hi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_chuvash_43_freeze_encoder_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/alikanakar/whisper-small-CV-43-freeze-encoder + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_divehi_shahukareem_dv.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_divehi_shahukareem_dv.md new file mode 100644 index 00000000000000..af3b887395b63a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_divehi_shahukareem_dv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_shahukareem WhisperForCTC from shahukareem +author: John Snow Labs +name: whisper_small_divehi_shahukareem +date: 2024-09-23 +tags: [dv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_shahukareem` is a Dhivehi, Divehi, Maldivian model originally trained by shahukareem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_shahukareem_dv_5.5.0_3.0_1727117023387.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_shahukareem_dv_5.5.0_3.0_1727117023387.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_divehi_shahukareem","dv") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_divehi_shahukareem", "dv")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_shahukareem| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/shahukareem/whisper-small-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_dutch_vl_nl.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_dutch_vl_nl.md new file mode 100644 index 00000000000000..4274befb2deeb1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_dutch_vl_nl.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dutch, Flemish whisper_small_dutch_vl WhisperForCTC from fibleep +author: John Snow Labs +name: whisper_small_dutch_vl +date: 2024-09-23 +tags: [nl, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_dutch_vl` is a Dutch, Flemish model originally trained by fibleep. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_dutch_vl_nl_5.5.0_3.0_1727116375931.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_dutch_vl_nl_5.5.0_3.0_1727116375931.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_dutch_vl","nl") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_dutch_vl", "nl")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_dutch_vl| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|nl| +|Size:|1.7 GB| + +## References + +https://huggingface.co/fibleep/whisper-small-nl-vl \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_11_mn.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_11_mn.md new file mode 100644 index 00000000000000..b73a92e6225edc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_11_mn.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Mongolian whisper_small_mongolian_11 WhisperForCTC from bayartsogt +author: John Snow Labs +name: whisper_small_mongolian_11 +date: 2024-09-23 +tags: [mn, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_mongolian_11` is a Mongolian model originally trained by bayartsogt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_mongolian_11_mn_5.5.0_3.0_1727053943998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_mongolian_11_mn_5.5.0_3.0_1727053943998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_small_mongolian_11","mn") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_small_mongolian_11", "mn")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_mongolian_11| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|mn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/bayartsogt/whisper-small-mn-11 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_romanian_yehoward_pipeline_ro.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_romanian_yehoward_pipeline_ro.md new file mode 100644 index 00000000000000..6a07678ff76a14 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_romanian_yehoward_pipeline_ro.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Moldavian, Moldovan, Romanian whisper_small_romanian_yehoward_pipeline pipeline WhisperForCTC from Yehoward +author: John Snow Labs +name: whisper_small_romanian_yehoward_pipeline +date: 2024-09-23 +tags: [ro, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ro +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_romanian_yehoward_pipeline` is a Moldavian, Moldovan, Romanian model originally trained by Yehoward. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_romanian_yehoward_pipeline_ro_5.5.0_3.0_1727079098548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_romanian_yehoward_pipeline_ro_5.5.0_3.0_1727079098548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_romanian_yehoward_pipeline", lang = "ro") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_romanian_yehoward_pipeline", lang = "ro") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_romanian_yehoward_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ro| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Yehoward/whisper-small-ro + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_sudanese_dialect_tiny_ayman_kagglee_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_sudanese_dialect_tiny_ayman_kagglee_en.md new file mode 100644 index 00000000000000..2ac9c0a23c4009 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_sudanese_dialect_tiny_ayman_kagglee_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_sudanese_dialect_tiny_ayman_kagglee WhisperForCTC from AymanMansour +author: John Snow Labs +name: whisper_sudanese_dialect_tiny_ayman_kagglee +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_sudanese_dialect_tiny_ayman_kagglee` is a English model originally trained by AymanMansour. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_sudanese_dialect_tiny_ayman_kagglee_en_5.5.0_3.0_1727117364896.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_sudanese_dialect_tiny_ayman_kagglee_en_5.5.0_3.0_1727117364896.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_sudanese_dialect_tiny_ayman_kagglee","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_sudanese_dialect_tiny_ayman_kagglee", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_sudanese_dialect_tiny_ayman_kagglee| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.7 MB| + +## References + +https://huggingface.co/AymanMansour/Whisper-Sudanese-Dialect-tiny-ayman-kagglee \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_chinese_cn_lr4_3600_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_chinese_cn_lr4_3600_pipeline_zh.md new file mode 100644 index 00000000000000..f0928e31a743f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_chinese_cn_lr4_3600_pipeline_zh.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Chinese whisper_tiny_chinese_cn_lr4_3600_pipeline pipeline WhisperForCTC from VingeNie +author: John Snow Labs +name: whisper_tiny_chinese_cn_lr4_3600_pipeline +date: 2024-09-23 +tags: [zh, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_chinese_cn_lr4_3600_pipeline` is a Chinese model originally trained by VingeNie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_chinese_cn_lr4_3600_pipeline_zh_5.5.0_3.0_1727117802346.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_chinese_cn_lr4_3600_pipeline_zh_5.5.0_3.0_1727117802346.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_chinese_cn_lr4_3600_pipeline", lang = "zh") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_chinese_cn_lr4_3600_pipeline", lang = "zh") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_chinese_cn_lr4_3600_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|389.2 MB| + +## References + +https://huggingface.co/VingeNie/whisper-tiny-zh_CN_lr4_3600 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_chinese_cn_lr4_3600_zh.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_chinese_cn_lr4_3600_zh.md new file mode 100644 index 00000000000000..db56394993a27c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_chinese_cn_lr4_3600_zh.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Chinese whisper_tiny_chinese_cn_lr4_3600 WhisperForCTC from VingeNie +author: John Snow Labs +name: whisper_tiny_chinese_cn_lr4_3600 +date: 2024-09-23 +tags: [zh, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_chinese_cn_lr4_3600` is a Chinese model originally trained by VingeNie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_chinese_cn_lr4_3600_zh_5.5.0_3.0_1727117779492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_chinese_cn_lr4_3600_zh_5.5.0_3.0_1727117779492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_chinese_cn_lr4_3600","zh") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_chinese_cn_lr4_3600", "zh")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_chinese_cn_lr4_3600| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|zh| +|Size:|389.2 MB| + +## References + +https://huggingface.co/VingeNie/whisper-tiny-zh_CN_lr4_3600 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hebrew_modern_2_pipeline_he.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hebrew_modern_2_pipeline_he.md new file mode 100644 index 00000000000000..1b37d62c91beef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hebrew_modern_2_pipeline_he.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hebrew whisper_tiny_hebrew_modern_2_pipeline pipeline WhisperForCTC from NS-Y +author: John Snow Labs +name: whisper_tiny_hebrew_modern_2_pipeline +date: 2024-09-23 +tags: [he, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_hebrew_modern_2_pipeline` is a Hebrew model originally trained by NS-Y. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_hebrew_modern_2_pipeline_he_5.5.0_3.0_1727117938886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_hebrew_modern_2_pipeline_he_5.5.0_3.0_1727117938886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_hebrew_modern_2_pipeline", lang = "he") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_hebrew_modern_2_pipeline", lang = "he") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_hebrew_modern_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|he| +|Size:|242.9 MB| + +## References + +https://huggingface.co/NS-Y/whisper-tiny-he-2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hindi_alexao_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hindi_alexao_en.md new file mode 100644 index 00000000000000..9cef0d3c4fb48a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hindi_alexao_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_hindi_alexao WhisperForCTC from Alexao +author: John Snow Labs +name: whisper_tiny_hindi_alexao +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_hindi_alexao` is a English model originally trained by Alexao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_hindi_alexao_en_5.5.0_3.0_1727118439012.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_hindi_alexao_en_5.5.0_3.0_1727118439012.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_hindi_alexao","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_hindi_alexao", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_hindi_alexao| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.8 MB| + +## References + +https://huggingface.co/Alexao/whisper-tiny-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hindi_alexao_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hindi_alexao_pipeline_en.md new file mode 100644 index 00000000000000..eff27d107c1179 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hindi_alexao_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_hindi_alexao_pipeline pipeline WhisperForCTC from Alexao +author: John Snow Labs +name: whisper_tiny_hindi_alexao_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_hindi_alexao_pipeline` is a English model originally trained by Alexao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_hindi_alexao_pipeline_en_5.5.0_3.0_1727118460972.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_hindi_alexao_pipeline_en_5.5.0_3.0_1727118460972.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_hindi_alexao_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_hindi_alexao_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_hindi_alexao_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.8 MB| + +## References + +https://huggingface.co/Alexao/whisper-tiny-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_sjdata_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_sjdata_en.md new file mode 100644 index 00000000000000..26074bbb0ff2e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_sjdata_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_minds14_sjdata WhisperForCTC from sjdata +author: John Snow Labs +name: whisper_tiny_minds14_sjdata +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_sjdata` is a English model originally trained by sjdata. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_sjdata_en_5.5.0_3.0_1727076889654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_sjdata_en_5.5.0_3.0_1727076889654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_sjdata","en") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
  .setInputCol("audio_content")
  .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_sjdata", "en")
  .setInputCols(Array("audio_assembler"))
  .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_sjdata| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/sjdata/whisper-tiny-minds14 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_pipeline_en.md new file mode 100644 index 00000000000000..455302d3cd4799 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_pipeline pipeline WhisperForCTC from sgonzalezsilot +author: John Snow Labs +name: whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_pipeline` is a English model originally trained by sgonzalezsilot. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_pipeline_en_5.5.0_3.0_1727117390711.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_pipeline_en_5.5.0_3.0_1727117390711.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.6 MB| + +## References + +https://huggingface.co/sgonzalezsilot/whisper-tiny-spanish-es-Nemo_unified_2024-06-26_09-12-11 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whispercheckpoints3_pipeline_sv.md b/docs/_posts/ahmedlone127/2024-09-23-whispercheckpoints3_pipeline_sv.md new file mode 100644 index 00000000000000..2324a2ad3413b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whispercheckpoints3_pipeline_sv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Swedish whispercheckpoints3_pipeline pipeline WhisperForCTC from Yulle +author: John Snow Labs +name: whispercheckpoints3_pipeline +date: 2024-09-23 +tags: [sv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whispercheckpoints3_pipeline` is a Swedish model originally trained by Yulle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whispercheckpoints3_pipeline_sv_5.5.0_3.0_1727053110214.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whispercheckpoints3_pipeline_sv_5.5.0_3.0_1727053110214.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whispercheckpoints3_pipeline", lang = "sv") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whispercheckpoints3_pipeline", lang = "sv") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whispercheckpoints3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Yulle/WhisperCheckpoints3 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_pipeline_en.md new file mode 100644 index 00000000000000..0209979d5b3c17 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_pipeline pipeline XlmRoBertaForSequenceClassification from RogerB +author: John Snow Labs +name: xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_pipeline` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_pipeline_en_5.5.0_3.0_1727099820980.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_pipeline_en_5.5.0_3.0_1727099820980.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
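As in the other pipeline snippets, `df` is assumed to already exist. A minimal sketch of building it, assuming an active `spark` session and that the pipeline's `DocumentAssembler` stage reads a column named `text` (the convention used by the standalone examples in this repository); the sample sentences are placeholders.

```python
# Hypothetical sketch: run the classification pipeline on a small text DataFrame.
from sparknlp.pretrained import PretrainedPipeline

# The DocumentAssembler stage is assumed to read a column named "text".
df = spark.createDataFrame([["I love Spark NLP"], ["This is another example sentence"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_pipeline", lang = "en")
annotations = pipeline.transform(df)
annotations.show(truncate=False)
```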
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/RogerB/xlm-roberta-base-finetuned-kinyarwanda-kin-sent2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_all_ligerre_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_all_ligerre_pipeline_en.md new file mode 100644 index 00000000000000..674ece68453120 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_all_ligerre_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_ligerre_pipeline pipeline XlmRoBertaForTokenClassification from ligerre +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_ligerre_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_ligerre_pipeline` is a English model originally trained by ligerre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ligerre_pipeline_en_5.5.0_3.0_1727062144173.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ligerre_pipeline_en_5.5.0_3.0_1727062144173.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_ligerre_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_ligerre_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_ligerre_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.1 MB| + +## References + +https://huggingface.co/ligerre/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_arabic_zaina01_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_arabic_zaina01_en.md new file mode 100644 index 00000000000000..d85d4e00e14127 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_arabic_zaina01_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_arabic_zaina01 XlmRoBertaForTokenClassification from zaina01 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_arabic_zaina01 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_arabic_zaina01` is a English model originally trained by zaina01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_zaina01_en_5.5.0_3.0_1727132561127.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_zaina01_en_5.5.0_3.0_1727132561127.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_arabic_zaina01","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_arabic_zaina01", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
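After the pipeline above has run, each token's predicted tag sits in the `ner` annotation column; a short inspection sketch in Python, assuming `pipelineDF` from the example:

```python
# Sketch: view tokens next to their predicted entity tags from the example above.
# Assumes `pipelineDF` from the Python snippet; "token" and "ner" are the output columns set there.
pipelineDF.select("token.result", "ner.result").show(truncate=False)

# Flatten to one predicted label per row for a quick tag distribution.
pipelineDF.selectExpr("explode(ner.result) as ner_label").groupBy("ner_label").count().show()
```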
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_arabic_zaina01| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.6 MB| + +## References + +https://huggingface.co/zaina01/xlm-roberta-base-finetuned-panx-ar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_arabic_zaina01_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_arabic_zaina01_pipeline_en.md new file mode 100644 index 00000000000000..7715cfbe0f311b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_arabic_zaina01_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_arabic_zaina01_pipeline pipeline XlmRoBertaForTokenClassification from zaina01 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_arabic_zaina01_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_arabic_zaina01_pipeline` is a English model originally trained by zaina01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_zaina01_pipeline_en_5.5.0_3.0_1727132633542.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_zaina01_pipeline_en_5.5.0_3.0_1727132633542.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_arabic_zaina01_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_arabic_zaina01_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_arabic_zaina01_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.7 MB| + +## References + +https://huggingface.co/zaina01/xlm-roberta-base-finetuned-panx-ar + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_english_juhyun76_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_english_juhyun76_en.md new file mode 100644 index 00000000000000..c0b9f4a6b62ad2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_english_juhyun76_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_juhyun76 XlmRoBertaForTokenClassification from juhyun76 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_juhyun76 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_juhyun76` is a English model originally trained by juhyun76. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_juhyun76_en_5.5.0_3.0_1727132867349.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_juhyun76_en_5.5.0_3.0_1727132867349.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_juhyun76","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_juhyun76", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_juhyun76| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.7 MB| + +## References + +https://huggingface.co/juhyun76/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_english_juhyun76_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_english_juhyun76_pipeline_en.md new file mode 100644 index 00000000000000..448cc027a31882 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_english_juhyun76_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_juhyun76_pipeline pipeline XlmRoBertaForTokenClassification from juhyun76 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_juhyun76_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_juhyun76_pipeline` is a English model originally trained by juhyun76. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_juhyun76_pipeline_en_5.5.0_3.0_1727132971232.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_juhyun76_pipeline_en_5.5.0_3.0_1727132971232.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_juhyun76_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_juhyun76_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_juhyun76_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.7 MB| + +## References + +https://huggingface.co/juhyun76/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_clboetticher_school_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_clboetticher_school_en.md new file mode 100644 index 00000000000000..5dfe5d08839c0c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_clboetticher_school_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_clboetticher_school XlmRoBertaForTokenClassification from clboetticher-school +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_clboetticher_school +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_clboetticher_school` is a English model originally trained by clboetticher-school. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_clboetticher_school_en_5.5.0_3.0_1727132540994.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_clboetticher_school_en_5.5.0_3.0_1727132540994.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_clboetticher_school","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_clboetticher_school", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_clboetticher_school| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/clboetticher-school/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_clboetticher_school_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_clboetticher_school_pipeline_en.md new file mode 100644 index 00000000000000..88093567d5b1a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_clboetticher_school_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_clboetticher_school_pipeline pipeline XlmRoBertaForTokenClassification from clboetticher-school +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_clboetticher_school_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_clboetticher_school_pipeline` is a English model originally trained by clboetticher-school. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_clboetticher_school_pipeline_en_5.5.0_3.0_1727132628814.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_clboetticher_school_pipeline_en_5.5.0_3.0_1727132628814.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_clboetticher_school_pipeline", lang = "en")
+# df: a Spark DataFrame with the input text in a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_clboetticher_school_pipeline", lang = "en")
+// df: a Spark DataFrame with the input text in a column named "text"
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
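+
+For quick, single-sentence checks, `PretrainedPipeline` can also be used without building a DataFrame. A minimal sketch, assuming the `pipeline` object created above (the output keys mirror the pipeline's column names, here expected to be `token` and `ner`):
+
+```python
+# driver-side annotation of a single string; returns a dict of output columns
+result = pipeline.annotate("John Snow Labs is based in Delaware")
+print(result["token"])
+print(result["ner"])
+```
+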
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_clboetticher_school_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/clboetticher-school/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_esperesa_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_esperesa_en.md new file mode 100644 index 00000000000000..f08cf7a2468d24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_esperesa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_esperesa XlmRoBertaForTokenClassification from esperesa +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_esperesa +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_esperesa` is a English model originally trained by esperesa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_esperesa_en_5.5.0_3.0_1727132067738.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_esperesa_en_5.5.0_3.0_1727132067738.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assumes an active SparkSession `spark`, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_esperesa","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_esperesa", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_esperesa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/esperesa/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_en.md new file mode 100644 index 00000000000000..73f506ed4d67d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan XlmRoBertaForTokenClassification from Arnaudmkonan +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan` is a English model originally trained by Arnaudmkonan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_en_5.5.0_3.0_1727132035379.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_en_5.5.0_3.0_1727132035379.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assumes an active SparkSession `spark`, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/Arnaudmkonan/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_pipeline_en.md new file mode 100644 index 00000000000000..4393b757a52f55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_pipeline pipeline XlmRoBertaForTokenClassification from Arnaudmkonan +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_pipeline` is a English model originally trained by Arnaudmkonan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_pipeline_en_5.5.0_3.0_1727132101751.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_pipeline_en_5.5.0_3.0_1727132101751.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_pipeline", lang = "en")
+# df: a Spark DataFrame with the input text in a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_pipeline", lang = "en")
+// df: a Spark DataFrame with the input text in a column named "text"
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_arnaudmkonan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/Arnaudmkonan/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_edwardjross_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_edwardjross_en.md new file mode 100644 index 00000000000000..914b8d3e7dc8a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_edwardjross_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_edwardjross XlmRoBertaForTokenClassification from edwardjross +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_edwardjross +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_edwardjross` is a English model originally trained by edwardjross. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_edwardjross_en_5.5.0_3.0_1727132220889.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_edwardjross_en_5.5.0_3.0_1727132220889.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assumes an active SparkSession `spark`, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_edwardjross","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_edwardjross", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
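+
+To turn the token-level IOB tags into full entity chunks, a `NerConverter` stage can be appended to the pipeline sketched above. This is an optional extension, not part of the packaged model; it assumes the `documentAssembler`, `tokenizer`, `tokenClassifier` and `data` objects from the example above:
+
+```python
+from sparknlp.annotator import NerConverter
+from pyspark.ml import Pipeline
+
+# groups consecutive B-/I- tags into one chunk per entity
+nerConverter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("ner_chunk")
+
+extended = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, nerConverter])
+extended.fit(data).transform(data) \
+    .selectExpr("explode(ner_chunk) as chunk") \
+    .selectExpr("chunk.result as entity_text", "chunk.metadata['entity'] as entity_label") \
+    .show(truncate=False)
+```
+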
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_edwardjross| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/edwardjross/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_edwardjross_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_edwardjross_pipeline_en.md new file mode 100644 index 00000000000000..5fa86f95fdef6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_edwardjross_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_edwardjross_pipeline pipeline XlmRoBertaForTokenClassification from edwardjross +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_edwardjross_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_edwardjross_pipeline` is a English model originally trained by edwardjross. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_edwardjross_pipeline_en_5.5.0_3.0_1727132284996.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_edwardjross_pipeline_en_5.5.0_3.0_1727132284996.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_edwardjross_pipeline", lang = "en")
+# df: a Spark DataFrame with the input text in a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_edwardjross_pipeline", lang = "en")
+// df: a Spark DataFrame with the input text in a column named "text"
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_edwardjross_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/edwardjross/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_ligerre_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_ligerre_pipeline_en.md new file mode 100644 index 00000000000000..8b994b72e838ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_ligerre_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_ligerre_pipeline pipeline XlmRoBertaForTokenClassification from ligerre +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_ligerre_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_ligerre_pipeline` is a English model originally trained by ligerre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_ligerre_pipeline_en_5.5.0_3.0_1727132729786.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_ligerre_pipeline_en_5.5.0_3.0_1727132729786.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_ligerre_pipeline", lang = "en")
+# df: a Spark DataFrame with the input text in a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_ligerre_pipeline", lang = "en")
+// df: a Spark DataFrame with the input text in a column named "text"
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
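+
+The downloaded pipeline wraps a regular Spark `PipelineModel`, so its stages (which should mirror the "Included Models" list below) can be inspected programmatically. A small sketch, assuming the `pipeline` object created above and that the wrapped model is exposed as `pipeline.model`:
+
+```python
+# list the annotator classes packaged inside the pretrained pipeline
+for stage in pipeline.model.stages:
+    print(type(stage).__name__)
+```
+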
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_ligerre_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/ligerre/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_misterneil_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_misterneil_en.md new file mode 100644 index 00000000000000..03813ab53e91a1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_misterneil_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_misterneil XlmRoBertaForTokenClassification from misterneil +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_misterneil +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_misterneil` is a English model originally trained by misterneil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_misterneil_en_5.5.0_3.0_1727133168862.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_misterneil_en_5.5.0_3.0_1727133168862.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assumes an active SparkSession `spark`, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_misterneil","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_misterneil", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_misterneil| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/misterneil/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_penguinman73_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_penguinman73_en.md new file mode 100644 index 00000000000000..cd998884554207 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_penguinman73_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_penguinman73 XlmRoBertaForTokenClassification from penguinman73 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_penguinman73 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_penguinman73` is a English model originally trained by penguinman73. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_penguinman73_en_5.5.0_3.0_1727062023810.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_penguinman73_en_5.5.0_3.0_1727062023810.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assumes an active SparkSession `spark`, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_penguinman73","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_penguinman73", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_penguinman73| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|827.0 MB| + +## References + +https://huggingface.co/penguinman73/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_thucdangvan020999_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_thucdangvan020999_en.md new file mode 100644 index 00000000000000..7d69996a0b52df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_thucdangvan020999_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_thucdangvan020999 XlmRoBertaForTokenClassification from thucdangvan020999 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_thucdangvan020999 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_thucdangvan020999` is a English model originally trained by thucdangvan020999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_thucdangvan020999_en_5.5.0_3.0_1727132679594.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_thucdangvan020999_en_5.5.0_3.0_1727132679594.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assumes an active SparkSession `spark`, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_thucdangvan020999","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_thucdangvan020999", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_thucdangvan020999| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/thucdangvan020999/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_agvelu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_agvelu_pipeline_en.md new file mode 100644 index 00000000000000..7a1b18210fd36c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_agvelu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_agvelu_pipeline pipeline XlmRoBertaForTokenClassification from agvelu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_agvelu_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_agvelu_pipeline` is a English model originally trained by agvelu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_agvelu_pipeline_en_5.5.0_3.0_1727062084285.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_agvelu_pipeline_en_5.5.0_3.0_1727062084285.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_agvelu_pipeline", lang = "en")
+# df: a Spark DataFrame with the input text in a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_agvelu_pipeline", lang = "en")
+// df: a Spark DataFrame with the input text in a column named "text"
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_agvelu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.7 MB| + +## References + +https://huggingface.co/agvelu/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_hcy5561_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_hcy5561_en.md new file mode 100644 index 00000000000000..e9c417a3dfe579 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_hcy5561_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_hcy5561 XlmRoBertaForTokenClassification from hcy5561 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_hcy5561 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_hcy5561` is a English model originally trained by hcy5561. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_hcy5561_en_5.5.0_3.0_1727061541689.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_hcy5561_en_5.5.0_3.0_1727061541689.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assumes an active SparkSession `spark`, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_hcy5561","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_hcy5561", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
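+
+When latency matters more than throughput (for example, serving single requests), the fitted model above can be wrapped in a `LightPipeline`. A brief sketch, assuming the `pipelineModel` from the example above:
+
+```python
+from sparknlp.base import LightPipeline
+
+# runs the same stages on the driver, avoiding a full Spark job per request
+light = LightPipeline(pipelineModel)
+print(light.annotate("John works at John Snow Labs in London"))
+```
+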
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_hcy5561| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|816.7 MB| + +## References + +https://huggingface.co/hcy5561/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_hcy5561_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_hcy5561_pipeline_en.md new file mode 100644 index 00000000000000..88f737d77de33a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_hcy5561_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_hcy5561_pipeline pipeline XlmRoBertaForTokenClassification from hcy5561 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_hcy5561_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_hcy5561_pipeline` is a English model originally trained by hcy5561. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_hcy5561_pipeline_en_5.5.0_3.0_1727061647734.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_hcy5561_pipeline_en_5.5.0_3.0_1727061647734.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_hcy5561_pipeline", lang = "en")
+# df: a Spark DataFrame with the input text in a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_hcy5561_pipeline", lang = "en")
+// df: a Spark DataFrame with the input text in a column named "text"
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_hcy5561_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|816.8 MB| + +## References + +https://huggingface.co/hcy5561/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_henryjiang_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_henryjiang_en.md new file mode 100644 index 00000000000000..3484cb9b8439cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_henryjiang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_henryjiang XlmRoBertaForTokenClassification from henryjiang +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_henryjiang +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_henryjiang` is a English model originally trained by henryjiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_henryjiang_en_5.5.0_3.0_1727132112453.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_henryjiang_en_5.5.0_3.0_1727132112453.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assumes an active SparkSession `spark`, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_henryjiang","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_henryjiang", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_henryjiang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|833.1 MB| + +## References + +https://huggingface.co/henryjiang/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_henryjiang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_henryjiang_pipeline_en.md new file mode 100644 index 00000000000000..bcb15a29258064 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_henryjiang_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_henryjiang_pipeline pipeline XlmRoBertaForTokenClassification from henryjiang +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_henryjiang_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_henryjiang_pipeline` is a English model originally trained by henryjiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_henryjiang_pipeline_en_5.5.0_3.0_1727132195280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_henryjiang_pipeline_en_5.5.0_3.0_1727132195280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_henryjiang_pipeline", lang = "en")
+# df: a Spark DataFrame with the input text in a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_henryjiang_pipeline", lang = "en")
+// df: a Spark DataFrame with the input text in a column named "text"
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_henryjiang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|833.1 MB| + +## References + +https://huggingface.co/henryjiang/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_ryatora_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_ryatora_en.md new file mode 100644 index 00000000000000..9a0e19c851ee98 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_ryatora_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_ryatora XlmRoBertaForTokenClassification from ryatora +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_ryatora +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_ryatora` is a English model originally trained by ryatora. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_ryatora_en_5.5.0_3.0_1727133268595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_ryatora_en_5.5.0_3.0_1727133268595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assumes an active SparkSession `spark`, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_ryatora","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_ryatora", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_ryatora| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/ryatora/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_seobak_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_seobak_en.md new file mode 100644 index 00000000000000..8fe9b3cc5da1ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_seobak_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_seobak XlmRoBertaForTokenClassification from seobak +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_seobak +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_seobak` is a English model originally trained by seobak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_seobak_en_5.5.0_3.0_1727133052888.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_seobak_en_5.5.0_3.0_1727133052888.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assumes an active SparkSession `spark`, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_seobak","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_seobak", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
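+
+A fitted pipeline can be saved and reloaded later without refitting, using standard Spark ML persistence. A short sketch, assuming the `pipelineModel` and `data` objects from the example above (the path is only an illustration):
+
+```python
+from pyspark.ml import PipelineModel
+
+# persist the fitted pipeline, then restore it and run inference again
+pipelineModel.write().overwrite().save("/tmp/xlmr_panx_it_ner_pipeline")  # hypothetical path
+restored = PipelineModel.load("/tmp/xlmr_panx_it_ner_pipeline")
+restored.transform(data).select("ner.result").show(truncate=False)
+```
+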
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_seobak| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|816.7 MB| + +## References + +https://huggingface.co/seobak/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_seobak_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_seobak_pipeline_en.md new file mode 100644 index 00000000000000..7b434c1bb642fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_seobak_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_seobak_pipeline pipeline XlmRoBertaForTokenClassification from seobak +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_seobak_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_seobak_pipeline` is a English model originally trained by seobak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_seobak_pipeline_en_5.5.0_3.0_1727133155549.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_seobak_pipeline_en_5.5.0_3.0_1727133155549.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_seobak_pipeline", lang = "en")
+# df: a Spark DataFrame with the input text in a column named "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_seobak_pipeline", lang = "en")
+// df: a Spark DataFrame with the input text in a column named "text"
+val df = Seq("I love spark-nlp").toDF("text")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_seobak_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|816.8 MB| + +## References + +https://huggingface.co/seobak/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_zardian_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_zardian_en.md new file mode 100644 index 00000000000000..e9226da2034f01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_zardian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_zardian XlmRoBertaForTokenClassification from Zardian +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_zardian +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_zardian` is a English model originally trained by Zardian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_zardian_en_5.5.0_3.0_1727133058148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_zardian_en_5.5.0_3.0_1727133058148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# assumes an active SparkSession `spark`, e.g. spark = sparknlp.start()
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_zardian","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_zardian", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_zardian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|816.7 MB| + +## References + +https://huggingface.co/Zardian/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_zardian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_zardian_pipeline_en.md new file mode 100644 index 00000000000000..83632726c915b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_zardian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_zardian_pipeline pipeline XlmRoBertaForTokenClassification from Zardian +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_zardian_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_zardian_pipeline` is a English model originally trained by Zardian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_zardian_pipeline_en_5.5.0_3.0_1727133160822.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_zardian_pipeline_en_5.5.0_3.0_1727133160822.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_zardian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_zardian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_zardian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|816.8 MB| + +## References + +https://huggingface.co/Zardian/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_hate_speech_ben_hin_bn.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_hate_speech_ben_hin_bn.md new file mode 100644 index 00000000000000..a657a2036bacb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_hate_speech_ben_hin_bn.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Bengali xlm_roberta_base_hate_speech_ben_hin XlmRoBertaForSequenceClassification from kingshukroy +author: John Snow Labs +name: xlm_roberta_base_hate_speech_ben_hin +date: 2024-09-23 +tags: [bn, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_hate_speech_ben_hin` is a Bengali model originally trained by kingshukroy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_hate_speech_ben_hin_bn_5.5.0_3.0_1727089310552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_hate_speech_ben_hin_bn_5.5.0_3.0_1727089310552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_hate_speech_ben_hin","bn") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_hate_speech_ben_hin", "bn")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_hate_speech_ben_hin| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|bn| +|Size:|791.2 MB| + +## References + +https://huggingface.co/kingshukroy/xlm-roberta-base-hate-speech-ben-hin \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_language_detection_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_language_detection_finetuned_en.md new file mode 100644 index 00000000000000..4d892cce314608 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_language_detection_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_language_detection_finetuned XlmRoBertaForSequenceClassification from RonTon05 +author: John Snow Labs +name: xlm_roberta_base_language_detection_finetuned +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_language_detection_finetuned` is a English model originally trained by RonTon05. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_language_detection_finetuned_en_5.5.0_3.0_1727088733158.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_language_detection_finetuned_en_5.5.0_3.0_1727088733158.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_language_detection_finetuned","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_language_detection_finetuned", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
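+
+Once the snippet above has produced `pipelineDF`, the predicted label sits in the `result` field of the `class` output column configured on the classifier; a short follow-up sketch (column names match the example above):
+
+```python
+# Each row carries an array of annotations; `result` holds the predicted label(s)
+pipelineDF.select("text", "class.result").show(truncate=False)
+```
+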
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_language_detection_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|890.4 MB| + +## References + +https://huggingface.co/RonTon05/xlm-roberta-base-language-detection-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_en.md new file mode 100644 index 00000000000000..0beeff250764aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_en_5.5.0_3.0_1727125835133.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_en_5.5.0_3.0_1727125835133.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|804.8 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr5e-06_seed42_basic_original_amh-esp-eng_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_pipeline_en.md new file mode 100644 index 00000000000000..faf39b0fefca5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_pipeline pipeline XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_pipeline` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_pipeline_en_5.5.0_3.0_1727125963604.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_pipeline_en_5.5.0_3.0_1727125963604.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr5e_06_seed42_basic_original_amh_esp_eng_train_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|804.8 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr5e-06_seed42_basic_original_amh-esp-eng_train + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_english_tweet_sentiment_english_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_english_tweet_sentiment_english_en.md new file mode 100644 index 00000000000000..354e7c91881b33 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_english_tweet_sentiment_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_english_tweet_sentiment_english XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_english_tweet_sentiment_english +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_english_tweet_sentiment_english` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_tweet_sentiment_english_en_5.5.0_3.0_1727088551061.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_tweet_sentiment_english_en_5.5.0_3.0_1727088551061.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_english_tweet_sentiment_english","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_english_tweet_sentiment_english", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_english_tweet_sentiment_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|647.0 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-en-tweet-sentiment-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_german_xnli_german_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_german_xnli_german_en.md new file mode 100644 index 00000000000000..1c76ddea6c1884 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_german_xnli_german_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_german_xnli_german XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_german_xnli_german +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_german_xnli_german` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_german_xnli_german_en_5.5.0_3.0_1727126538849.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_german_xnli_german_en_5.5.0_3.0_1727126538849.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_german_xnli_german","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_german_xnli_german", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_german_xnli_german| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|528.9 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-de-xnli-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_german_xnli_german_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_german_xnli_german_pipeline_en.md new file mode 100644 index 00000000000000..3acc64065564bf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_german_xnli_german_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_german_xnli_german_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_german_xnli_german_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_german_xnli_german_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_german_xnli_german_pipeline_en_5.5.0_3.0_1727126586159.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_german_xnli_german_pipeline_en_5.5.0_3.0_1727126586159.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_trimmed_german_xnli_german_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_trimmed_german_xnli_german_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_german_xnli_german_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|528.9 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-de-xnli-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_italian_60000_tweet_sentiment_italian_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_italian_60000_tweet_sentiment_italian_en.md new file mode 100644 index 00000000000000..3c67124b9bb52b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_italian_60000_tweet_sentiment_italian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_italian_60000_tweet_sentiment_italian XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_italian_60000_tweet_sentiment_italian +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_italian_60000_tweet_sentiment_italian` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_italian_60000_tweet_sentiment_italian_en_5.5.0_3.0_1727126717646.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_italian_60000_tweet_sentiment_italian_en_5.5.0_3.0_1727126717646.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_italian_60000_tweet_sentiment_italian","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_italian_60000_tweet_sentiment_italian", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_italian_60000_tweet_sentiment_italian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|443.4 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-it-60000-tweet-sentiment-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_tweet_sentiment_italian_trimmed_italian_15000_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_tweet_sentiment_italian_trimmed_italian_15000_en.md new file mode 100644 index 00000000000000..46a928bb376630 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_tweet_sentiment_italian_trimmed_italian_15000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_tweet_sentiment_italian_trimmed_italian_15000 XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_tweet_sentiment_italian_trimmed_italian_15000 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_tweet_sentiment_italian_trimmed_italian_15000` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_italian_trimmed_italian_15000_en_5.5.0_3.0_1727126677799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_italian_trimmed_italian_15000_en_5.5.0_3.0_1727126677799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_tweet_sentiment_italian_trimmed_italian_15000","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_tweet_sentiment_italian_trimmed_italian_15000", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_tweet_sentiment_italian_trimmed_italian_15000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|360.2 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-tweet-sentiment-it-trimmed-it-15000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_vaxxstance_spanish_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_vaxxstance_spanish_en.md new file mode 100644 index 00000000000000..e54cf92d4d45ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_vaxxstance_spanish_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_vaxxstance_spanish XlmRoBertaForSequenceClassification from nouman-10 +author: John Snow Labs +name: xlm_roberta_base_vaxxstance_spanish +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_vaxxstance_spanish` is a English model originally trained by nouman-10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vaxxstance_spanish_en_5.5.0_3.0_1727125849546.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vaxxstance_spanish_en_5.5.0_3.0_1727125849546.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_vaxxstance_spanish","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_vaxxstance_spanish", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_vaxxstance_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/nouman-10/xlm-roberta-base_vaxxstance_spanish \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_vaxxstance_spanish_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_vaxxstance_spanish_pipeline_en.md new file mode 100644 index 00000000000000..c430cce4bb8ae9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_vaxxstance_spanish_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_vaxxstance_spanish_pipeline pipeline XlmRoBertaForSequenceClassification from nouman-10 +author: John Snow Labs +name: xlm_roberta_base_vaxxstance_spanish_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_vaxxstance_spanish_pipeline` is a English model originally trained by nouman-10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vaxxstance_spanish_pipeline_en_5.5.0_3.0_1727125931087.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vaxxstance_spanish_pipeline_en_5.5.0_3.0_1727125931087.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_vaxxstance_spanish_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_vaxxstance_spanish_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_vaxxstance_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|831.3 MB| + +## References + +https://huggingface.co/nouman-10/xlm-roberta-base_vaxxstance_spanish + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_pipeline_en.md new file mode 100644 index 00000000000000..1ed0a2c68f078e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_pipeline_en_5.5.0_3.0_1727089335088.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_pipeline_en_5.5.0_3.0_1727089335088.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|353.6 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-xnli-es-trimmed-es-10000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xmlr_roberta_base_finetuned_panx_korean_en.md b/docs/_posts/ahmedlone127/2024-09-23-xmlr_roberta_base_finetuned_panx_korean_en.md new file mode 100644 index 00000000000000..43383a8aefb96f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xmlr_roberta_base_finetuned_panx_korean_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xmlr_roberta_base_finetuned_panx_korean XlmRoBertaForTokenClassification from ghks4861 +author: John Snow Labs +name: xmlr_roberta_base_finetuned_panx_korean +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xmlr_roberta_base_finetuned_panx_korean` is a English model originally trained by ghks4861. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xmlr_roberta_base_finetuned_panx_korean_en_5.5.0_3.0_1727132271934.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xmlr_roberta_base_finetuned_panx_korean_en_5.5.0_3.0_1727132271934.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xmlr_roberta_base_finetuned_panx_korean","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xmlr_roberta_base_finetuned_panx_korean", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
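+
+For token-level inspection of the output produced above, the token and NER arrays are aligned position by position; a brief follow-up sketch (column names match the example above):
+
+```python
+from pyspark.sql import functions as F
+
+# token.result and ner.result hold parallel arrays of tokens and predicted tags
+pipelineDF.select(
+    F.col("token.result").alias("tokens"),
+    F.col("ner.result").alias("ner_tags")
+).show(truncate=False)
+```
+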
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xmlr_roberta_base_finetuned_panx_korean| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/ghks4861/xmlr-roberta-base-finetuned-panx-ko \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xnli_xlm_r_only_bulgarian_en.md b/docs/_posts/ahmedlone127/2024-09-23-xnli_xlm_r_only_bulgarian_en.md new file mode 100644 index 00000000000000..0b173a804d5fca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xnli_xlm_r_only_bulgarian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xnli_xlm_r_only_bulgarian XlmRoBertaForSequenceClassification from semindan +author: John Snow Labs +name: xnli_xlm_r_only_bulgarian +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xnli_xlm_r_only_bulgarian` is a English model originally trained by semindan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_bulgarian_en_5.5.0_3.0_1727126414636.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_bulgarian_en_5.5.0_3.0_1727126414636.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xnli_xlm_r_only_bulgarian","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xnli_xlm_r_only_bulgarian", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xnli_xlm_r_only_bulgarian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|803.0 MB| + +## References + +https://huggingface.co/semindan/xnli_xlm_r_only_bg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-0_0000005_0_999_rose_e_wang_en.md b/docs/_posts/ahmedlone127/2024-09-24-0_0000005_0_999_rose_e_wang_en.md new file mode 100644 index 00000000000000..360018005e6bcd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-0_0000005_0_999_rose_e_wang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 0_0000005_0_999_rose_e_wang RoBertaForSequenceClassification from rose-e-wang +author: John Snow Labs +name: 0_0000005_0_999_rose_e_wang +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`0_0000005_0_999_rose_e_wang` is a English model originally trained by rose-e-wang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/0_0000005_0_999_rose_e_wang_en_5.5.0_3.0_1727171744851.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/0_0000005_0_999_rose_e_wang_en_5.5.0_3.0_1727171744851.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("0_0000005_0_999_rose_e_wang","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("0_0000005_0_999_rose_e_wang", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|0_0000005_0_999_rose_e_wang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/rose-e-wang/0.0000005_0.999 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-1030_1_en.md b/docs/_posts/ahmedlone127/2024-09-24-1030_1_en.md new file mode 100644 index 00000000000000..49cf9b5f541d0a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-1030_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 1030_1 DistilBertForSequenceClassification from tingchih +author: John Snow Labs +name: 1030_1 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`1030_1` is a English model originally trained by tingchih. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/1030_1_en_5.5.0_3.0_1727154388487.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/1030_1_en_5.5.0_3.0_1727154388487.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("1030_1","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("1030_1", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|1030_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tingchih/1030-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-1030_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-1030_1_pipeline_en.md new file mode 100644 index 00000000000000..64ac2aa2efe866 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-1030_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 1030_1_pipeline pipeline DistilBertForSequenceClassification from tingchih +author: John Snow Labs +name: 1030_1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`1030_1_pipeline` is a English model originally trained by tingchih. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/1030_1_pipeline_en_5.5.0_3.0_1727154406470.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/1030_1_pipeline_en_5.5.0_3.0_1727154406470.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("1030_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("1030_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|1030_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tingchih/1030-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-activitat3_en.md b/docs/_posts/ahmedlone127/2024-09-24-activitat3_en.md new file mode 100644 index 00000000000000..e7b5fc5a3c8915 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-activitat3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English activitat3 RoBertaForSequenceClassification from rcodina +author: John Snow Labs +name: activitat3 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`activitat3` is a English model originally trained by rcodina. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/activitat3_en_5.5.0_3.0_1727171075907.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/activitat3_en_5.5.0_3.0_1727171075907.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("activitat3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("activitat3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
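As a rough, self-contained variant of the Python example, the sketch below adds the imports the snippet relies on and wires the classifier to the same `document` column that the DocumentAssembler produces:

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, RoBertaForSequenceClassification
from pyspark.ml import Pipeline

spark = sparknlp.start()

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# The classifier reads the columns produced by the two stages above
sequence_classifier = RoBertaForSequenceClassification.pretrained("activitat3", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline(stages=[document_assembler, tokenizer, sequence_classifier])

data = spark.createDataFrame([["I love spark-nlp"]], ["text"])
result = pipeline.fit(data).transform(data)
result.select("class.result").show(truncate=False)
```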
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|activitat3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|426.4 MB| + +## References + +https://huggingface.co/rcodina/activitat3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-activitat3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-activitat3_pipeline_en.md new file mode 100644 index 00000000000000..3b769764772f00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-activitat3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English activitat3_pipeline pipeline RoBertaForSequenceClassification from rcodina +author: John Snow Labs +name: activitat3_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`activitat3_pipeline` is a English model originally trained by rcodina. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/activitat3_pipeline_en_5.5.0_3.0_1727171109140.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/activitat3_pipeline_en_5.5.0_3.0_1727171109140.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("activitat3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("activitat3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
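For quick checks on individual strings, the downloaded pipeline can also be driven without building a DataFrame first. The sketch below assumes the standard `annotate` helper exposed by `PretrainedPipeline`:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("activitat3_pipeline", lang="en")

# annotate() runs every stage on a single string and returns a dict
# keyed by output column, e.g. "document", "token" and "class"
result = pipeline.annotate("I love spark-nlp")
print(result["class"])
```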
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|activitat3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|426.4 MB| + +## References + +https://huggingface.co/rcodina/activitat3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-albert_small_kor_v1_en.md b/docs/_posts/ahmedlone127/2024-09-24-albert_small_kor_v1_en.md new file mode 100644 index 00000000000000..bb0f6c4d7e8f2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-albert_small_kor_v1_en.md @@ -0,0 +1,96 @@ +--- +layout: model +title: English albert_small_kor_v1 AlbertEmbeddings from bongsoo +author: John Snow Labs +name: albert_small_kor_v1 +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, albert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_small_kor_v1` is a English model originally trained by bongsoo. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_small_kor_v1_en_5.5.0_3.0_1727158725304.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_small_kor_v1_en_5.5.0_3.0_1727158725304.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = AlbertEmbeddings.pretrained("albert_small_kor_v1","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = AlbertEmbeddings.pretrained("albert_small_kor_v1","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
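If the token-level annotations in `embeddings` need to be consumed as plain vectors (for example by a downstream Spark ML stage), an `EmbeddingsFinisher` can be appended to the result. Treat the following as a sketch that keeps the variable and column names used above:

```python
from sparknlp.base import EmbeddingsFinisher

# Convert the Spark NLP annotations in "embeddings" into Spark ML vectors
embeddings_finisher = EmbeddingsFinisher() \
    .setInputCols(["embeddings"]) \
    .setOutputCols(["finished_embeddings"]) \
    .setOutputAsVector(True)

finished = embeddings_finisher.transform(pipelineDF)
finished.selectExpr("explode(finished_embeddings) AS token_vector").show(truncate=False)
```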
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_small_kor_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|41.7 MB| + +## References + +References + +https://huggingface.co/bongsoo/albert-small-kor-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_small_talk_5_16_5_en.md b/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_small_talk_5_16_5_en.md new file mode 100644 index 00000000000000..9a9d6abb2bcd06 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_small_talk_5_16_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_small_talk_5_16_5 RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_small_talk_5_16_5 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_small_talk_5_16_5` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_small_talk_5_16_5_en_5.5.0_3.0_1727167978436.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_small_talk_5_16_5_en_5.5.0_3.0_1727167978436.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_small_talk_5_16_5","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_small_talk_5_16_5", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
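Once `pipelineDF` has been computed as above, the predicted label for each row can be extracted from the `class` annotation column with ordinary DataFrame operations; a short illustrative follow-up:

```python
from pyspark.sql import functions as F

# Each row carries an array of annotations; keep the first predicted label
predictions = pipelineDF.select(
    "text",
    F.col("class.result").getItem(0).alias("predicted_label"),
)
predictions.show(truncate=False)
```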
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_small_talk_5_16_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-small_talk-5-16-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_small_talk_5_16_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_small_talk_5_16_5_pipeline_en.md new file mode 100644 index 00000000000000..1d5e082ecf5cf0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_small_talk_5_16_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English all_roberta_large_v1_small_talk_5_16_5_pipeline pipeline RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_small_talk_5_16_5_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_small_talk_5_16_5_pipeline` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_small_talk_5_16_5_pipeline_en_5.5.0_3.0_1727168043789.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_small_talk_5_16_5_pipeline_en_5.5.0_3.0_1727168043789.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("all_roberta_large_v1_small_talk_5_16_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("all_roberta_large_v1_small_talk_5_16_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_small_talk_5_16_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-small_talk-5-16-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_work_3_16_5_en.md b/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_work_3_16_5_en.md new file mode 100644 index 00000000000000..b3677abc2d801d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_work_3_16_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_work_3_16_5 RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_work_3_16_5 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_work_3_16_5` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_work_3_16_5_en_5.5.0_3.0_1727172010747.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_work_3_16_5_en_5.5.0_3.0_1727172010747.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_work_3_16_5","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_work_3_16_5", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_work_3_16_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-work-3-16-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_work_3_16_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_work_3_16_5_pipeline_en.md new file mode 100644 index 00000000000000..da9db2db9867ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-all_roberta_large_v1_work_3_16_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English all_roberta_large_v1_work_3_16_5_pipeline pipeline RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_work_3_16_5_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_work_3_16_5_pipeline` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_work_3_16_5_pipeline_en_5.5.0_3.0_1727172079312.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_work_3_16_5_pipeline_en_5.5.0_3.0_1727172079312.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("all_roberta_large_v1_work_3_16_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("all_roberta_large_v1_work_3_16_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_work_3_16_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-work-3-16-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-angela_untranslated_diacritics_eval_en.md b/docs/_posts/ahmedlone127/2024-09-24-angela_untranslated_diacritics_eval_en.md new file mode 100644 index 00000000000000..a19c520b19436f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-angela_untranslated_diacritics_eval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English angela_untranslated_diacritics_eval XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_untranslated_diacritics_eval +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`angela_untranslated_diacritics_eval` is a English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_untranslated_diacritics_eval_en_5.5.0_3.0_1727147416973.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_untranslated_diacritics_eval_en_5.5.0_3.0_1727147416973.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_untranslated_diacritics_eval","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_untranslated_diacritics_eval", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
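Token-level `ner` tags are often easier to work with once B-/I- tagged tokens are merged into entity chunks. The sketch below appends Spark NLP's `NerConverter` to the stages from the example; the column wiring simply mirrors the snippet above, and the sample sentence is illustrative:

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForTokenClassification, NerConverter
from pyspark.ml import Pipeline

spark = sparknlp.start()

document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")

token_classifier = XlmRoBertaForTokenClassification.pretrained(
    "angela_untranslated_diacritics_eval", "en"
).setInputCols(["document", "token"]).setOutputCol("ner")

# Group consecutive B-/I- tags into whole entity chunks
ner_converter = NerConverter() \
    .setInputCols(["document", "token", "ner"]) \
    .setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[document_assembler, tokenizer, token_classifier, ner_converter])

data = spark.createDataFrame([["My name is John and I live in Berlin"]], ["text"])
result = pipeline.fit(data).transform(data)
result.select("ner_chunk.result").show(truncate=False)
```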
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_untranslated_diacritics_eval| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_untranslated_diacritics_eval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-angela_untranslated_diacritics_eval_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-angela_untranslated_diacritics_eval_pipeline_en.md new file mode 100644 index 00000000000000..cd8b3ebb075296 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-angela_untranslated_diacritics_eval_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English angela_untranslated_diacritics_eval_pipeline pipeline XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_untranslated_diacritics_eval_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`angela_untranslated_diacritics_eval_pipeline` is a English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_untranslated_diacritics_eval_pipeline_en_5.5.0_3.0_1727147468390.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_untranslated_diacritics_eval_pipeline_en_5.5.0_3.0_1727147468390.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("angela_untranslated_diacritics_eval_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("angela_untranslated_diacritics_eval_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_untranslated_diacritics_eval_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_untranslated_diacritics_eval + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-autotrain_1_xlmr_rs_53879126771_en.md b/docs/_posts/ahmedlone127/2024-09-24-autotrain_1_xlmr_rs_53879126771_en.md new file mode 100644 index 00000000000000..04cd7cb03eba82 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-autotrain_1_xlmr_rs_53879126771_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English autotrain_1_xlmr_rs_53879126771 XlmRoBertaForTokenClassification from tinyYhorm +author: John Snow Labs +name: autotrain_1_xlmr_rs_53879126771 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_1_xlmr_rs_53879126771` is a English model originally trained by tinyYhorm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_1_xlmr_rs_53879126771_en_5.5.0_3.0_1727147790963.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_1_xlmr_rs_53879126771_en_5.5.0_3.0_1727147790963.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("autotrain_1_xlmr_rs_53879126771","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("autotrain_1_xlmr_rs_53879126771", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_1_xlmr_rs_53879126771| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|770.0 MB| + +## References + +https://huggingface.co/tinyYhorm/autotrain-1-xlmr-rs-53879126771 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-autotrain_1_xlmr_rs_53879126771_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-autotrain_1_xlmr_rs_53879126771_pipeline_en.md new file mode 100644 index 00000000000000..90f7f846f00e93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-autotrain_1_xlmr_rs_53879126771_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autotrain_1_xlmr_rs_53879126771_pipeline pipeline XlmRoBertaForTokenClassification from tinyYhorm +author: John Snow Labs +name: autotrain_1_xlmr_rs_53879126771_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_1_xlmr_rs_53879126771_pipeline` is a English model originally trained by tinyYhorm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_1_xlmr_rs_53879126771_pipeline_en_5.5.0_3.0_1727147952345.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_1_xlmr_rs_53879126771_pipeline_en_5.5.0_3.0_1727147952345.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("autotrain_1_xlmr_rs_53879126771_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("autotrain_1_xlmr_rs_53879126771_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_1_xlmr_rs_53879126771_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|770.0 MB| + +## References + +https://huggingface.co/tinyYhorm/autotrain-1-xlmr-rs-53879126771 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-autotrain_hindi_ner_xlmr_869827677_en.md b/docs/_posts/ahmedlone127/2024-09-24-autotrain_hindi_ner_xlmr_869827677_en.md new file mode 100644 index 00000000000000..a2e6fddf222c91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-autotrain_hindi_ner_xlmr_869827677_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English autotrain_hindi_ner_xlmr_869827677 XlmRoBertaForTokenClassification from pujaburman30 +author: John Snow Labs +name: autotrain_hindi_ner_xlmr_869827677 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_hindi_ner_xlmr_869827677` is a English model originally trained by pujaburman30. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_hindi_ner_xlmr_869827677_en_5.5.0_3.0_1727148036528.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_hindi_ner_xlmr_869827677_en_5.5.0_3.0_1727148036528.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("autotrain_hindi_ner_xlmr_869827677","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("autotrain_hindi_ner_xlmr_869827677", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
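Because `pipelineModel` above is an ordinary Spark ML `PipelineModel`, it can be saved once and reloaded later without re-downloading the weights; a brief sketch (the path is only an example):

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline; any local/HDFS/S3 URI works here
pipelineModel.write().overwrite().save("/tmp/autotrain_hindi_ner_xlmr_pipeline")

# Later, or in another job: reload and apply to new data
restored = PipelineModel.load("/tmp/autotrain_hindi_ner_xlmr_pipeline")
restored.transform(data).select("ner.result").show(truncate=False)
```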
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_hindi_ner_xlmr_869827677| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|770.6 MB| + +## References + +https://huggingface.co/pujaburman30/autotrain-hi_ner_xlmr-869827677 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-autotrain_hindi_ner_xlmr_869827677_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-autotrain_hindi_ner_xlmr_869827677_pipeline_en.md new file mode 100644 index 00000000000000..d668d8ca96ccab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-autotrain_hindi_ner_xlmr_869827677_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autotrain_hindi_ner_xlmr_869827677_pipeline pipeline XlmRoBertaForTokenClassification from pujaburman30 +author: John Snow Labs +name: autotrain_hindi_ner_xlmr_869827677_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_hindi_ner_xlmr_869827677_pipeline` is a English model originally trained by pujaburman30. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_hindi_ner_xlmr_869827677_pipeline_en_5.5.0_3.0_1727148187756.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_hindi_ner_xlmr_869827677_pipeline_en_5.5.0_3.0_1727148187756.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("autotrain_hindi_ner_xlmr_869827677_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("autotrain_hindi_ner_xlmr_869827677_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_hindi_ner_xlmr_869827677_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|770.6 MB| + +## References + +https://huggingface.co/pujaburman30/autotrain-hi_ner_xlmr-869827677 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_squad_v2_bosnian_16_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_squad_v2_bosnian_16_en.md new file mode 100644 index 00000000000000..c42409a1595927 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_squad_v2_bosnian_16_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_cased_finetuned_squad_v2_bosnian_16 BertForQuestionAnswering from lauraparra28 +author: John Snow Labs +name: bert_base_cased_finetuned_squad_v2_bosnian_16 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_squad_v2_bosnian_16` is a English model originally trained by lauraparra28. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_v2_bosnian_16_en_5.5.0_3.0_1727176007761.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_v2_bosnian_16_en_5.5.0_3.0_1727176007761.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_finetuned_squad_v2_bosnian_16","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_finetuned_squad_v2_bosnian_16", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
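A self-contained version of the Python example, with the imports it assumes and the answer selected at the end, might look like the following sketch (note that `MultiDocumentAssembler` takes the plural `setInputCols`/`setOutputCols` setters here, and the raw column names are kept consistent with them):

```python
import sparknlp
from sparknlp.base import MultiDocumentAssembler
from sparknlp.annotator import BertForQuestionAnswering
from pyspark.ml import Pipeline

spark = sparknlp.start()

document_assembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

span_classifier = (
    BertForQuestionAnswering.pretrained("bert_base_cased_finetuned_squad_v2_bosnian_16", "en")
    .setInputCols(["document_question", "document_context"])
    .setOutputCol("answer")
)

pipeline = Pipeline(stages=[document_assembler, span_classifier])

data = spark.createDataFrame(
    [["What framework do I use?", "I use spark-nlp."]],
    ["question", "context"],
)
result = pipeline.fit(data).transform(data)
result.select("answer.result").show(truncate=False)
```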
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_squad_v2_bosnian_16| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/lauraparra28/bert-base-cased-finetuned-squad_v2-bs_16 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_squad_v2_bosnian_16_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_squad_v2_bosnian_16_pipeline_en.md new file mode 100644 index 00000000000000..a96b3b68128091 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_squad_v2_bosnian_16_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_cased_finetuned_squad_v2_bosnian_16_pipeline pipeline BertForQuestionAnswering from lauraparra28 +author: John Snow Labs +name: bert_base_cased_finetuned_squad_v2_bosnian_16_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_squad_v2_bosnian_16_pipeline` is a English model originally trained by lauraparra28. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_v2_bosnian_16_pipeline_en_5.5.0_3.0_1727176028261.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_v2_bosnian_16_pipeline_en_5.5.0_3.0_1727176028261.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_cased_finetuned_squad_v2_bosnian_16_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_cased_finetuned_squad_v2_bosnian_16_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_squad_v2_bosnian_16_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/lauraparra28/bert-base-cased-finetuned-squad_v2-bs_16 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_scmedium_scqa2_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_scmedium_scqa2_en.md new file mode 100644 index 00000000000000..3a6bfecb642986 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_scmedium_scqa2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_cased_scmedium_scqa2 BertForQuestionAnswering from CambridgeMolecularEngineering +author: John Snow Labs +name: bert_base_cased_scmedium_scqa2 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_scmedium_scqa2` is a English model originally trained by CambridgeMolecularEngineering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_scmedium_scqa2_en_5.5.0_3.0_1727175347706.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_scmedium_scqa2_en_5.5.0_3.0_1727175347706.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_scmedium_scqa2","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_scmedium_scqa2", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_scmedium_scqa2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/CambridgeMolecularEngineering/bert-base-cased-scmedium-scqa2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_english_greek_modern_russian_cased_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_english_greek_modern_russian_cased_en.md new file mode 100644 index 00000000000000..399901617ca5e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_english_greek_modern_russian_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_english_greek_modern_russian_cased BertEmbeddings from Geotrend +author: John Snow Labs +name: bert_base_english_greek_modern_russian_cased +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_english_greek_modern_russian_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_english_greek_modern_russian_cased_en_5.5.0_3.0_1727161619829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_english_greek_modern_russian_cased_en_5.5.0_3.0_1727161619829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_english_greek_modern_russian_cased","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_english_greek_modern_russian_cased","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
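When a single vector per document is needed rather than one vector per token, Spark NLP's `SentenceEmbeddings` annotator can pool the token embeddings produced above; the following is a sketch that reuses the column and variable names from the example:

```python
from sparknlp.annotator import SentenceEmbeddings

# Average the token vectors in "embeddings" into one vector per document
sentence_embeddings = SentenceEmbeddings() \
    .setInputCols(["document", "embeddings"]) \
    .setOutputCol("sentence_embeddings") \
    .setPoolingStrategy("AVERAGE")

pooled = sentence_embeddings.transform(pipelineDF)
pooled.select("sentence_embeddings.embeddings").show(truncate=False)
```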
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_english_greek_modern_russian_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|433.2 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-el-ru-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_finetuned_rqa_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_finetuned_rqa_pipeline_xx.md new file mode 100644 index 00000000000000..21c4df879a1199 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_finetuned_rqa_pipeline_xx.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_finetuned_rqa_pipeline pipeline BertForQuestionAnswering from AsifAbrar6 +author: John Snow Labs +name: bert_base_multilingual_cased_finetuned_rqa_pipeline +date: 2024-09-24 +tags: [xx, open_source, pipeline, onnx] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_finetuned_rqa_pipeline` is a Multilingual model originally trained by AsifAbrar6. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_rqa_pipeline_xx_5.5.0_3.0_1727163292797.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_rqa_pipeline_xx_5.5.0_3.0_1727163292797.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_multilingual_cased_finetuned_rqa_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_multilingual_cased_finetuned_rqa_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_finetuned_rqa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/AsifAbrar6/bert-base-multilingual-cased-finetuned-RQA + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_finetuned_rqa_xx.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_finetuned_rqa_xx.md new file mode 100644 index 00000000000000..8e7f34e6ae6318 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_finetuned_rqa_xx.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_finetuned_rqa BertForQuestionAnswering from AsifAbrar6 +author: John Snow Labs +name: bert_base_multilingual_cased_finetuned_rqa +date: 2024-09-24 +tags: [xx, open_source, onnx, question_answering, bert] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_finetuned_rqa` is a Multilingual model originally trained by AsifAbrar6. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_rqa_xx_5.5.0_3.0_1727163256211.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_rqa_xx_5.5.0_3.0_1727163256211.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_multilingual_cased_finetuned_rqa","xx") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_multilingual_cased_finetuned_rqa", "xx") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
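Since the underlying checkpoint is multilingual, the fitted `pipelineModel` can also be applied to question/context pairs in other languages. The batch below is purely illustrative and assumes the input DataFrame uses the `question` and `context` columns declared by the assembler above:

```python
# Reuse pipelineModel from the example on a small multilingual batch
multilingual_data = spark.createDataFrame(
    [
        ["What framework do I use?", "I use spark-nlp."],
        ["¿Qué framework utilizo?", "Yo uso spark-nlp."],
    ],
    ["question", "context"],
)

answers = pipelineModel.transform(multilingual_data)
answers.select("question", "answer.result").show(truncate=False)
```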
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_finetuned_rqa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/AsifAbrar6/bert-base-multilingual-cased-finetuned-RQA \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_sv2_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_sv2_pipeline_xx.md new file mode 100644 index 00000000000000..573f25548a615b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_sv2_pipeline_xx.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_sv2_pipeline pipeline BertForQuestionAnswering from monakth +author: John Snow Labs +name: bert_base_multilingual_cased_sv2_pipeline +date: 2024-09-24 +tags: [xx, open_source, pipeline, onnx] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_sv2_pipeline` is a Multilingual model originally trained by monakth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_sv2_pipeline_xx_5.5.0_3.0_1727175396956.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_sv2_pipeline_xx_5.5.0_3.0_1727175396956.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_multilingual_cased_sv2_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_multilingual_cased_sv2_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_sv2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/monakth/bert-base-multilingual-cased-sv2 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..c8e9a8461e62ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_100_southern_sotho_false_fh_true_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_100_southern_sotho_false_fh_true_hs_0_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_100_southern_sotho_false_fh_true_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727176181678.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727176181678.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is a Spark DataFrame with the text columns this pipeline expects
+pipeline = PretrainedPipeline("bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_100_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is a Spark DataFrame with the text columns this pipeline expects
+val pipeline = new PretrainedPipeline("bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_100_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_100_southern_sotho_false_fh_true_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.0-b-32-lr-8e-07-dp-0.5-ss-100-st-False-fh-True-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_en.md new file mode 100644 index 00000000000000..c1f55cdc6675ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727163481076.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727163481076.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assumes `spark` is an active SparkSession with Spark NLP started
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+// Assumes an active SparkSession with Spark NLP started and spark.implicits._ imported
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
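+
+The predicted answers can be read back from `pipelineDF` once the pipeline has run. The snippet below is a minimal follow-up sketch, assuming the default Spark NLP annotation schema in which each output column set above (`document_question`, `document_context`, `answer`) is an array of annotation structs with a `result` field:
+
+```python
+# Show the question, the context and the extracted answer text side by side
+pipelineDF.selectExpr(
+    "document_question.result[0] AS question",
+    "document_context.result[0] AS context",
+    "answer.result[0] AS answer"
+).show(truncate=False)
+```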
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-2.69-b-32-lr-4e-06-dp-0.1-ss-0-st-True-fh-False-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..64da160f2c69da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727163506920.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727163506920.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is a Spark DataFrame with the text columns this pipeline expects
+pipeline = PretrainedPipeline("bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is a Spark DataFrame with the text columns this pipeline expects
+val pipeline = new PretrainedPipeline("bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_true_fh_false_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-2.69-b-32-lr-4e-06-dp-0.1-ss-0-st-True-fh-False-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..8c27c535706bd9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727175807534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727175807534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is a Spark DataFrame with the text columns this pipeline expects
+pipeline = PretrainedPipeline("bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is a Spark DataFrame with the text columns this pipeline expects
+val pipeline = new PretrainedPipeline("bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-3.11-b-32-lr-8e-07-dp-0.5-ss-700-st-False-fh-True-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_en.md new file mode 100644 index 00000000000000..714f6c8676e656 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_en_5.5.0_3.0_1727163199936.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_en_5.5.0_3.0_1727163199936.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assumes `spark` is an active SparkSession with Spark NLP started
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+// Assumes an active SparkSession with Spark NLP started and spark.implicits._ imported
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-3.44-b-32-lr-8e-07-dp-0.5-ss-0-st-False-fh-False-hs-800 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..848d56f9b9e697 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727163833689.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727163833689.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is a Spark DataFrame with the text columns this pipeline expects
+pipeline = PretrainedPipeline("bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is a Spark DataFrame with the text columns this pipeline expects
+val pipeline = new PretrainedPipeline("bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-4.87-b-32-lr-4e-07-dp-0.5-ss-0-st-True-fh-False-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_en.md new file mode 100644 index 00000000000000..920f39b1a2953d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_en_5.5.0_3.0_1727163793729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_en_5.5.0_3.0_1727163793729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assumes `spark` is an active SparkSession with Spark NLP started
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+// Assumes an active SparkSession with Spark NLP started and spark.implicits._ imported
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-0.5-lr-1e-05-wd-0.001-dp-0.2-ss-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_pipeline_en.md new file mode 100644 index 00000000000000..dd2821e259194f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_pipeline_en_5.5.0_3.0_1727163816652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_pipeline_en_5.5.0_3.0_1727163816652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is a Spark DataFrame with the text columns this pipeline expects
+pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is a Spark DataFrame with the text columns this pipeline expects
+val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_0_5_lr_1e_05_wd_0_001_dp_0_2_swati_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-0.5-lr-1e-05-wd-0.001-dp-0.2-ss-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_en.md new file mode 100644 index 00000000000000..73a80ded98edb9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_en_5.5.0_3.0_1727175641920.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_en_5.5.0_3.0_1727175641920.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assumes `spark` is an active SparkSession with Spark NLP started
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+// Assumes an active SparkSession with Spark NLP started and spark.implicits._ imported
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-0.9-lr-1e-06-wd-0.001-dp-0.99999-ss-160000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_en.md new file mode 100644 index 00000000000000..190936d8112520 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_en_5.5.0_3.0_1727175781415.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_en_5.5.0_3.0_1727175781415.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assumes `spark` is an active SparkSession with Spark NLP started
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+// Assumes an active SparkSession with Spark NLP started and spark.implicits._ imported
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-0.9-lr-1e-06-wd-0.001-dp-0.99999-ss-80000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_pipeline_en.md new file mode 100644 index 00000000000000..1715f6974faa7f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_pipeline_en_5.5.0_3.0_1727175802203.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_pipeline_en_5.5.0_3.0_1727175802203.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is a Spark DataFrame with the text columns this pipeline expects
+pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is a Spark DataFrame with the text columns this pipeline expects
+val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_80000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-0.9-lr-1e-06-wd-0.001-dp-0.99999-ss-80000 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_en.md new file mode 100644 index 00000000000000..c317e093abae86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_en_5.5.0_3.0_1727163262037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_en_5.5.0_3.0_1727163262037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assumes `spark` is an active SparkSession with Spark NLP started
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+// Assumes an active SparkSession with Spark NLP started and spark.implicits._ imported
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-10.0-lr-4e-07-wd-1e-05-dp-1.0-ss-0-st-False-fh-False-hs-1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline_en.md new file mode 100644 index 00000000000000..e13467368a96e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline_en_5.5.0_3.0_1727163284975.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline_en_5.5.0_3.0_1727163284975.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is a Spark DataFrame with the text columns this pipeline expects
+pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is a Spark DataFrame with the text columns this pipeline expects
+val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_1000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-10.0-lr-4e-07-wd-1e-05-dp-1.0-ss-0-st-False-fh-False-hs-1000 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_en.md new file mode 100644 index 00000000000000..7610ab309acbd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_en_5.5.0_3.0_1727163461473.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_en_5.5.0_3.0_1727163461473.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assumes `spark` is an active SparkSession with Spark NLP started
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+// Assumes an active SparkSession with Spark NLP started and spark.implicits._ imported
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-05-wd-0.001-dp-0.2-ss-700-st-False-fh-True-hs-666 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_pipeline_en.md new file mode 100644 index 00000000000000..c279529dde5d78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_pipeline_en_5.5.0_3.0_1727163482205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_pipeline_en_5.5.0_3.0_1727163482205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is a Spark DataFrame with the text columns this pipeline expects
+pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is a Spark DataFrame with the text columns this pipeline expects
+val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_2_swati_700_southern_sotho_false_fh_true_hs_666_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-05-wd-0.001-dp-0.2-ss-700-st-False-fh-True-hs-666 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_en.md new file mode 100644 index 00000000000000..15dd0420b096d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_en_5.5.0_3.0_1727163926396.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_en_5.5.0_3.0_1727163926396.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+# Assumes `spark` is an active SparkSession with Spark NLP started
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true","en") \
+    .setInputCols(["document_question", "document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+```
+```scala
+// Assumes an active SparkSession with Spark NLP started and spark.implicits._ imported
+val documentAssembler = new MultiDocumentAssembler()
+  .setInputCols(Array("question", "context"))
+  .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true", "en")
+  .setInputCols(Array("document_question", "document_context"))
+  .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-06-wd-0.001-dp-0.2-ss-700-st-True-fh-True \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_pipeline_en.md new file mode 100644 index 00000000000000..c1964e88d0f231 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_pipeline_en_5.5.0_3.0_1727163946848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_pipeline_en_5.5.0_3.0_1727163946848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is a Spark DataFrame with the text columns this pipeline expects
+pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is a Spark DataFrame with the text columns this pipeline expects
+val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_700_southern_sotho_true_fh_true_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-06-wd-0.001-dp-0.2-ss-700-st-True-fh-True + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_en.md new file mode 100644 index 00000000000000..51dc228f9b10cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_en_5.5.0_3.0_1727163343938.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_en_5.5.0_3.0_1727163343938.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.29-lr-4e-07-wd-1e-05-dp-1.0-ss-0-st-False-fh-False-hs-300 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_pipeline_en.md new file mode 100644 index 00000000000000..adc2a55a7e997d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_pipeline_en_5.5.0_3.0_1727163364437.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_pipeline_en_5.5.0_3.0_1727163364437.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
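+
+As above, `df` must already exist. A minimal sketch, assuming a `question`/`context` input schema and an `answer` output column (assumptions based on the standalone model example):
+
+```python
+# Hypothetical input DataFrame; column names are assumptions.
+df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]) \
+    .toDF("question", "context")
+annotations = pipeline.transform(df)
+annotations.select("answer.result").show(truncate=False)
+```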
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_29_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.29-lr-4e-07-wd-1e-05-dp-1.0-ss-0-st-False-fh-False-hs-300 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_4_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_4_en.md new file mode 100644 index 00000000000000..23d3170e08ec80 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_4_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_4 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_4 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_4` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_4_en_5.5.0_3.0_1727175347800.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_4_en_5.5.0_3.0_1727175347800.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_4","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_4", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-0.0001-wd-0.001-dp-0.4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_en.md new file mode 100644 index 00000000000000..e3cfcf21aca705 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_en_5.5.0_3.0_1727176190428.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_en_5.5.0_3.0_1727176190428.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-1e-06-wd-0.001-dp-0.99999-ss-50000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_en.md new file mode 100644 index 00000000000000..f36ec8ae280b3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_en_5.5.0_3.0_1727163618337.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_en_5.5.0_3.0_1727163618337.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-4e-05-wd-0.001-dp-0.999 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_pipeline_en.md new file mode 100644 index 00000000000000..8087939eaa1602 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_pipeline_en_5.5.0_3.0_1727163638907.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_pipeline_en_5.5.0_3.0_1727163638907.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
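+
+The `df` argument is not defined in the snippet; a minimal sketch, assuming the pipeline expects `question` and `context` columns (an assumption based on the standalone model example):
+
+```python
+# Hypothetical input DataFrame; the column names are assumed.
+df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]) \
+    .toDF("question", "context")
+pipeline.transform(df).printSchema()  # inspect the annotation columns produced
+```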
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_4e_05_wd_0_001_dp_0_999_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-4e-05-wd-0.001-dp-0.999 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_3_0_lr_1e_06_wd_0_001_dp_0_2_swati_8228_southern_sotho_false_fh_true_hs_666_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_3_0_lr_1e_06_wd_0_001_dp_0_2_swati_8228_southern_sotho_false_fh_true_hs_666_en.md new file mode 100644 index 00000000000000..e4074b65f8fa7e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_3_0_lr_1e_06_wd_0_001_dp_0_2_swati_8228_southern_sotho_false_fh_true_hs_666_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_3_0_lr_1e_06_wd_0_001_dp_0_2_swati_8228_southern_sotho_false_fh_true_hs_666 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_3_0_lr_1e_06_wd_0_001_dp_0_2_swati_8228_southern_sotho_false_fh_true_hs_666 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_3_0_lr_1e_06_wd_0_001_dp_0_2_swati_8228_southern_sotho_false_fh_true_hs_666` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_3_0_lr_1e_06_wd_0_001_dp_0_2_swati_8228_southern_sotho_false_fh_true_hs_666_en_5.5.0_3.0_1727175825323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_3_0_lr_1e_06_wd_0_001_dp_0_2_swati_8228_southern_sotho_false_fh_true_hs_666_en_5.5.0_3.0_1727175825323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_3_0_lr_1e_06_wd_0_001_dp_0_2_swati_8228_southern_sotho_false_fh_true_hs_666","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_3_0_lr_1e_06_wd_0_001_dp_0_2_swati_8228_southern_sotho_false_fh_true_hs_666", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_3_0_lr_1e_06_wd_0_001_dp_0_2_swati_8228_southern_sotho_false_fh_true_hs_666| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-3.0-lr-1e-06-wd-0.001-dp-0.2-ss-8228-st-False-fh-True-hs-666 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_news_2009_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_news_2009_en.md new file mode 100644 index 00000000000000..bbcb38d49d195b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_news_2009_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_news_2009 BertEmbeddings from sally9805 +author: John Snow Labs +name: bert_base_uncased_finetuned_news_2009 +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_news_2009` is a English model originally trained by sally9805. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_news_2009_en_5.5.0_3.0_1727177493599.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_news_2009_en_5.5.0_3.0_1727177493599.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_finetuned_news_2009","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_finetuned_news_2009","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_news_2009| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sally9805/bert-base-uncased-finetuned-news-2009 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_quac_nohistory_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_quac_nohistory_en.md new file mode 100644 index 00000000000000..1d20395d120dbd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_quac_nohistory_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_quac_nohistory BertForQuestionAnswering from Jellevdl +author: John Snow Labs +name: bert_base_uncased_finetuned_quac_nohistory +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_quac_nohistory` is a English model originally trained by Jellevdl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_quac_nohistory_en_5.5.0_3.0_1727163668616.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_quac_nohistory_en_5.5.0_3.0_1727163668616.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetuned_quac_nohistory","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetuned_quac_nohistory", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_quac_nohistory| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Jellevdl/bert-base-uncased-finetuned-quac-noHistory \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_issues_128_igory1999_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_issues_128_igory1999_en.md new file mode 100644 index 00000000000000..3597c8b204c8c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_issues_128_igory1999_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_issues_128_igory1999 BertEmbeddings from igory1999 +author: John Snow Labs +name: bert_base_uncased_issues_128_igory1999 +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_issues_128_igory1999` is a English model originally trained by igory1999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_igory1999_en_5.5.0_3.0_1727173511657.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_igory1999_en_5.5.0_3.0_1727173511657.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_igory1999","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_igory1999","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_issues_128_igory1999| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/igory1999/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_issues_128_igory1999_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_issues_128_igory1999_pipeline_en.md new file mode 100644 index 00000000000000..2996d93762f8af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_issues_128_igory1999_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_issues_128_igory1999_pipeline pipeline BertEmbeddings from igory1999 +author: John Snow Labs +name: bert_base_uncased_issues_128_igory1999_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_issues_128_igory1999_pipeline` is a English model originally trained by igory1999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_igory1999_pipeline_en_5.5.0_3.0_1727173532770.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_igory1999_pipeline_en_5.5.0_3.0_1727173532770.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_issues_128_igory1999_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_issues_128_igory1999_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
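+
+`df` here is a plain text DataFrame. A minimal sketch, assuming the pipeline's DocumentAssembler stage reads a `text` column (an assumption based on the standalone model example):
+
+```python
+# Hypothetical input DataFrame; the "text" column name is assumed.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+annotations.printSchema()  # inspect the embedding annotation columns produced
+```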
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_issues_128_igory1999_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/igory1999/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_qna_mlqa_dataset_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_qna_mlqa_dataset_en.md new file mode 100644 index 00000000000000..beaa6ad95f7e81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_qna_mlqa_dataset_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_qna_mlqa_dataset BertForQuestionAnswering from DunnBC22 +author: John Snow Labs +name: bert_base_uncased_qna_mlqa_dataset +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_qna_mlqa_dataset` is a English model originally trained by DunnBC22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qna_mlqa_dataset_en_5.5.0_3.0_1727163771530.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qna_mlqa_dataset_en_5.5.0_3.0_1727163771530.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_qna_mlqa_dataset","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_qna_mlqa_dataset", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_qna_mlqa_dataset| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/DunnBC22/bert-base-uncased-QnA-MLQA_Dataset \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_qna_mlqa_dataset_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_qna_mlqa_dataset_pipeline_en.md new file mode 100644 index 00000000000000..0d55d51eaced23 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_qna_mlqa_dataset_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_qna_mlqa_dataset_pipeline pipeline BertForQuestionAnswering from DunnBC22 +author: John Snow Labs +name: bert_base_uncased_qna_mlqa_dataset_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_qna_mlqa_dataset_pipeline` is a English model originally trained by DunnBC22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qna_mlqa_dataset_pipeline_en_5.5.0_3.0_1727163792312.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qna_mlqa_dataset_pipeline_en_5.5.0_3.0_1727163792312.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_qna_mlqa_dataset_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_qna_mlqa_dataset_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
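+
+A minimal sketch for constructing `df`, assuming the same `question`/`context` input schema as the standalone model example (an assumption, not confirmed by this card):
+
+```python
+# Hypothetical input DataFrame; column names are assumptions.
+df = spark.createDataFrame([["What framework do I use?", "I use spark-nlp."]]) \
+    .toDF("question", "context")
+annotations = pipeline.transform(df)
+annotations.select("answer.result").show(truncate=False)
+```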
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_qna_mlqa_dataset_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/DunnBC22/bert-base-uncased-QnA-MLQA_Dataset + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_scqa1_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_scqa1_en.md new file mode 100644 index 00000000000000..dbaa00d4a09aa4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_scqa1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_scqa1 BertForQuestionAnswering from CambridgeMolecularEngineering +author: John Snow Labs +name: bert_base_uncased_scqa1 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_scqa1` is a English model originally trained by CambridgeMolecularEngineering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_scqa1_en_5.5.0_3.0_1727163133858.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_scqa1_en_5.5.0_3.0_1727163133858.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_scqa1","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_scqa1", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_scqa1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/CambridgeMolecularEngineering/bert-base-uncased-scqa1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_finetuned_squad_delayedkarma_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_finetuned_squad_delayedkarma_en.md new file mode 100644 index 00000000000000..a8ea932cb033ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_finetuned_squad_delayedkarma_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_finetuned_squad_delayedkarma BertForQuestionAnswering from delayedkarma +author: John Snow Labs +name: bert_finetuned_squad_delayedkarma +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_squad_delayedkarma` is a English model originally trained by delayedkarma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_delayedkarma_en_5.5.0_3.0_1727175623331.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_delayedkarma_en_5.5.0_3.0_1727175623331.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_squad_delayedkarma","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_squad_delayedkarma", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_squad_delayedkarma| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/delayedkarma/bert-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_interview_nepal_bhasa_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_interview_nepal_bhasa_en.md new file mode 100644 index 00000000000000..1be27bd5981be0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_interview_nepal_bhasa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_interview_nepal_bhasa DistilBertForSequenceClassification from eskayML +author: John Snow Labs +name: bert_interview_nepal_bhasa +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_interview_nepal_bhasa` is a English model originally trained by eskayML. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_interview_nepal_bhasa_en_5.5.0_3.0_1727136875844.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_interview_nepal_bhasa_en_5.5.0_3.0_1727136875844.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_interview_nepal_bhasa","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_interview_nepal_bhasa", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_interview_nepal_bhasa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/eskayML/bert_interview_new \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_interview_nepal_bhasa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_interview_nepal_bhasa_pipeline_en.md new file mode 100644 index 00000000000000..77a1ec908b9616 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_interview_nepal_bhasa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_interview_nepal_bhasa_pipeline pipeline DistilBertForSequenceClassification from eskayML +author: John Snow Labs +name: bert_interview_nepal_bhasa_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_interview_nepal_bhasa_pipeline` is a English model originally trained by eskayML. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_interview_nepal_bhasa_pipeline_en_5.5.0_3.0_1727136888891.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_interview_nepal_bhasa_pipeline_en_5.5.0_3.0_1727136888891.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_interview_nepal_bhasa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_interview_nepal_bhasa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
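+
+`df` is the DataFrame to classify. A minimal sketch, assuming the pipeline's DocumentAssembler stage reads a `text` column and predictions land in a `class` column (assumptions carried over from the standalone model example):
+
+```python
+# Hypothetical input DataFrame; the "text" input column and "class" output
+# column are assumptions, not confirmed by this card.
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+annotations.select("class.result").show(truncate=False)
+```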
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_interview_nepal_bhasa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/eskayML/bert_interview_new + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_large_cased_squadscqa1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_large_cased_squadscqa1_pipeline_en.md new file mode 100644 index 00000000000000..358aaaa301eef6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_large_cased_squadscqa1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_large_cased_squadscqa1_pipeline pipeline BertForQuestionAnswering from CambridgeMolecularEngineering +author: John Snow Labs +name: bert_large_cased_squadscqa1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_cased_squadscqa1_pipeline` is a English model originally trained by CambridgeMolecularEngineering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_cased_squadscqa1_pipeline_en_5.5.0_3.0_1727175904915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_cased_squadscqa1_pipeline_en_5.5.0_3.0_1727175904915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_large_cased_squadscqa1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_large_cased_squadscqa1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
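The snippet above assumes `df` already exists. Question answering pipelines that start with a MultiDocumentAssembler typically read a question column and a context column; the sketch below makes that assumption explicit (the `question`/`context` input names and the `answer` output column are assumptions, not verified against this pipeline).

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

# Assumed input layout for a MultiDocumentAssembler-based QA pipeline:
# one question column and one context column per row.
df = spark.createDataFrame([
    ["Where is the institute located?", "The institute is located in Cambridge and focuses on molecular engineering."]
]).toDF("question", "context")

pipeline = PretrainedPipeline("bert_large_cased_squadscqa1_pipeline", lang="en")
result = pipeline.transform(df)

# Extracted answer span (output column name assumed to be "answer")
result.select("answer.result").show(truncate=False)
```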
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_cased_squadscqa1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/CambridgeMolecularEngineering/bert-large-cased-squadscqa1 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_large_uncased_sparse_90_unstructured_pruneofa_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_large_uncased_sparse_90_unstructured_pruneofa_en.md new file mode 100644 index 00000000000000..68a2bbcf160761 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_large_uncased_sparse_90_unstructured_pruneofa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_large_uncased_sparse_90_unstructured_pruneofa BertEmbeddings from Intel +author: John Snow Labs +name: bert_large_uncased_sparse_90_unstructured_pruneofa +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_sparse_90_unstructured_pruneofa` is a English model originally trained by Intel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_sparse_90_unstructured_pruneofa_en_5.5.0_3.0_1727173563921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_sparse_90_unstructured_pruneofa_en_5.5.0_3.0_1727173563921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_large_uncased_sparse_90_unstructured_pruneofa","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_large_uncased_sparse_90_unstructured_pruneofa","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
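To work with the vectors produced above, the token-level embeddings can be pulled out of the `embeddings` annotation column with plain Spark SQL functions. A minimal sketch, assuming the Python pipeline from the previous block has already been run (so `pipelineDF` exists):

```python
from pyspark.sql import functions as F

# Each row of "embeddings" is an array of annotations; the per-token vector lives in the
# nested "embeddings" field of every annotation, alongside the token text in "result".
tokens_with_vectors = (
    pipelineDF
    .select(F.explode("embeddings").alias("ann"))
    .select(
        F.col("ann.result").alias("token"),
        F.col("ann.embeddings").alias("vector"),
    )
)

tokens_with_vectors.show(truncate=80)
```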
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_sparse_90_unstructured_pruneofa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|361.7 MB| + +## References + +https://huggingface.co/Intel/bert-large-uncased-sparse-90-unstructured-pruneofa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_large_uncased_sparse_90_unstructured_pruneofa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_large_uncased_sparse_90_unstructured_pruneofa_pipeline_en.md new file mode 100644 index 00000000000000..b94516ed1b36b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_large_uncased_sparse_90_unstructured_pruneofa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_large_uncased_sparse_90_unstructured_pruneofa_pipeline pipeline BertEmbeddings from Intel +author: John Snow Labs +name: bert_large_uncased_sparse_90_unstructured_pruneofa_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_sparse_90_unstructured_pruneofa_pipeline` is a English model originally trained by Intel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_sparse_90_unstructured_pruneofa_pipeline_en_5.5.0_3.0_1727173623641.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_sparse_90_unstructured_pruneofa_pipeline_en_5.5.0_3.0_1727173623641.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_large_uncased_sparse_90_unstructured_pruneofa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_large_uncased_sparse_90_unstructured_pruneofa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_sparse_90_unstructured_pruneofa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|361.7 MB| + +## References + +https://huggingface.co/Intel/bert-large-uncased-sparse-90-unstructured-pruneofa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline_en.md new file mode 100644 index 00000000000000..aad8fe351cb506 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline pipeline BertEmbeddings from iMahdiGhazavi +author: John Snow Labs +name: bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline` is a English model originally trained by iMahdiGhazavi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline_en_5.5.0_3.0_1727161796011.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline_en_5.5.0_3.0_1727161796011.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|605.8 MB| + +## References + +https://huggingface.co/iMahdiGhazavi/bert-fa-base-uncased-nlp-course-hw2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_political_classification_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_political_classification_en.md new file mode 100644 index 00000000000000..c1565246e8099c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_political_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_political_classification BertForSequenceClassification from harshal-11 +author: John Snow Labs +name: bert_political_classification +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_political_classification` is a English model originally trained by harshal-11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_political_classification_en_5.5.0_3.0_1727149334675.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_political_classification_en_5.5.0_3.0_1727149334675.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("bert_political_classification","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("bert_political_classification", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
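Once the pipeline above has run, the predicted label sits in the `class` annotation column. The sketch below is one way to surface it together with the raw score metadata; the metadata keys vary by model, so inspect the map rather than relying on a fixed key.

```python
from pyspark.sql import functions as F

# "class" holds one annotation per document: "result" is the predicted label and
# "metadata" is a string map that usually carries per-label scores (keys are model-specific).
predictions = (
    pipelineDF
    .select(F.col("text"), F.explode("class").alias("pred"))
    .select(
        "text",
        F.col("pred.result").alias("label"),
        F.col("pred.metadata").alias("scores"),
    )
)

predictions.show(truncate=False)
```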
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_political_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/harshal-11/Bert-political-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_political_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_political_classification_pipeline_en.md new file mode 100644 index 00000000000000..38aca996ba1c32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_political_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_political_classification_pipeline pipeline BertForSequenceClassification from harshal-11 +author: John Snow Labs +name: bert_political_classification_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_political_classification_pipeline` is a English model originally trained by harshal-11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_political_classification_pipeline_en_5.5.0_3.0_1727149356671.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_political_classification_pipeline_en_5.5.0_3.0_1727149356671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_political_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_political_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_political_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/harshal-11/Bert-political-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_vllm_gemma2b_8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_vllm_gemma2b_8_pipeline_en.md new file mode 100644 index 00000000000000..4cc3884ba47157 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_vllm_gemma2b_8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_vllm_gemma2b_8_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_vllm_gemma2b_8_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_vllm_gemma2b_8_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_8_pipeline_en_5.5.0_3.0_1727154524127.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_8_pipeline_en_5.5.0_3.0_1727154524127.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_vllm_gemma2b_8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_vllm_gemma2b_8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_vllm_gemma2b_8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_vllm-gemma2b_8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bertin_roberta_base_spanish_es.md b/docs/_posts/ahmedlone127/2024-09-24-bertin_roberta_base_spanish_es.md new file mode 100644 index 00000000000000..a5e55df24c4b27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bertin_roberta_base_spanish_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bertin_roberta_base_spanish RoBertaEmbeddings from bertin-project +author: John Snow Labs +name: bertin_roberta_base_spanish +date: 2024-09-24 +tags: [es, open_source, onnx, embeddings, roberta] +task: Embeddings +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertin_roberta_base_spanish` is a Castilian, Spanish model originally trained by bertin-project. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertin_roberta_base_spanish_es_5.5.0_3.0_1727168816239.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertin_roberta_base_spanish_es_5.5.0_3.0_1727168816239.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("bertin_roberta_base_spanish","es") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("bertin_roberta_base_spanish","es") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertin_roberta_base_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|es| +|Size:|462.2 MB| + +## References + +https://huggingface.co/bertin-project/bertin-roberta-base-spanish \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-brwac_v1_5__checkpoint_27_100000_en.md b/docs/_posts/ahmedlone127/2024-09-24-brwac_v1_5__checkpoint_27_100000_en.md new file mode 100644 index 00000000000000..8269acf42c27f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-brwac_v1_5__checkpoint_27_100000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English brwac_v1_5__checkpoint_27_100000 RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: brwac_v1_5__checkpoint_27_100000 +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`brwac_v1_5__checkpoint_27_100000` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/brwac_v1_5__checkpoint_27_100000_en_5.5.0_3.0_1727169121903.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/brwac_v1_5__checkpoint_27_100000_en_5.5.0_3.0_1727169121903.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("brwac_v1_5__checkpoint_27_100000","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("brwac_v1_5__checkpoint_27_100000","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|brwac_v1_5__checkpoint_27_100000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|296.9 MB| + +## References + +https://huggingface.co/eduagarcia-temp/brwac_v1_5__checkpoint_27_100000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_symptemist_es.md b/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_symptemist_es.md new file mode 100644 index 00000000000000..59138f40e2ba2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_symptemist_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_symptemist RoBertaForTokenClassification from BSC-NLP4BIA +author: John Snow Labs +name: bsc_bio_ehr_spanish_symptemist +date: 2024-09-24 +tags: [es, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_symptemist` is a Castilian, Spanish model originally trained by BSC-NLP4BIA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_es_5.5.0_3.0_1727151462665.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_es_5.5.0_3.0_1727151462665.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_symptemist","es") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_symptemist", "es")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
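To line each token up with its predicted entity tag, the `token` and `ner` columns produced above can be zipped position by position. A minimal sketch, assuming the Python pipeline from the previous block has been run and that both arrays are aligned (which holds when the tags are produced from the tokenizer output, as here):

```python
from pyspark.sql import functions as F

# Align tokens with their predicted tags position by position, then explode into
# one (token, tag) row per token.
pairs = (
    pipelineDF
    .select(
        F.col("token.result").alias("tokens"),
        F.col("ner.result").alias("tags"),
    )
    .select(F.explode(F.arrays_zip("tokens", "tags")).alias("pair"))
    .select(
        F.col("pair.tokens").alias("token"),
        F.col("pair.tags").alias("ner_tag"),
    )
)

pairs.show(truncate=False)
```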
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_symptemist| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|441.8 MB| + +## References + +https://huggingface.co/BSC-NLP4BIA/bsc-bio-ehr-es-symptemist \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_en.md new file mode 100644 index 00000000000000..be50f578bae8c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa RoBertaEmbeddings from Erantr1 +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa` is a English model originally trained by Erantr1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_en_5.5.0_3.0_1727169074199.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_en_5.5.0_3.0_1727169074199.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/Erantr1/my_awesome_eli5_mlm_model_eran_t_imdb_new \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_boldirev_as_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_boldirev_as_pipeline_en.md new file mode 100644 index 00000000000000..c301a9f9249daf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_boldirev_as_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_boldirev_as_pipeline pipeline DistilBertForSequenceClassification from boldirev-as +author: John Snow Labs +name: burmese_awesome_model_boldirev_as_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_boldirev_as_pipeline` is a English model originally trained by boldirev-as. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_boldirev_as_pipeline_en_5.5.0_3.0_1727164849755.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_boldirev_as_pipeline_en_5.5.0_3.0_1727164849755.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_boldirev_as_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_boldirev_as_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_boldirev_as_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/boldirev-as/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_cobegreene_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_cobegreene_en.md new file mode 100644 index 00000000000000..9847a8234fb283 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_cobegreene_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_cobegreene DistilBertForSequenceClassification from cobegreene +author: John Snow Labs +name: burmese_awesome_model_cobegreene +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_cobegreene` is a English model originally trained by cobegreene. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_cobegreene_en_5.5.0_3.0_1727154733582.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_cobegreene_en_5.5.0_3.0_1727154733582.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_cobegreene","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_cobegreene", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_cobegreene| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cobegreene/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_cobegreene_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_cobegreene_pipeline_en.md new file mode 100644 index 00000000000000..fb375eb776f702 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_cobegreene_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_cobegreene_pipeline pipeline DistilBertForSequenceClassification from cobegreene +author: John Snow Labs +name: burmese_awesome_model_cobegreene_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_cobegreene_pipeline` is a English model originally trained by cobegreene. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_cobegreene_pipeline_en_5.5.0_3.0_1727154747838.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_cobegreene_pipeline_en_5.5.0_3.0_1727154747838.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_cobegreene_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_cobegreene_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_cobegreene_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cobegreene/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_hyungho_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_hyungho_en.md new file mode 100644 index 00000000000000..9bfa15d48fedf3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_hyungho_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_hyungho DistilBertForSequenceClassification from Hyungho +author: John Snow Labs +name: burmese_awesome_model_hyungho +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_hyungho` is a English model originally trained by Hyungho. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_hyungho_en_5.5.0_3.0_1727136938617.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_hyungho_en_5.5.0_3.0_1727136938617.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_hyungho","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_hyungho", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_hyungho| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Hyungho/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_sharadakatla_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_sharadakatla_en.md new file mode 100644 index 00000000000000..b27fe35bb00966 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_sharadakatla_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_sharadakatla DistilBertForSequenceClassification from sharadakatla +author: John Snow Labs +name: burmese_awesome_model_sharadakatla +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_sharadakatla` is a English model originally trained by sharadakatla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_sharadakatla_en_5.5.0_3.0_1727164826419.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_sharadakatla_en_5.5.0_3.0_1727164826419.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_sharadakatla","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_sharadakatla", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_sharadakatla| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sharadakatla/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-busu_model_small_ro.md b/docs/_posts/ahmedlone127/2024-09-24-busu_model_small_ro.md new file mode 100644 index 00000000000000..e567ef21cf321a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-busu_model_small_ro.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Moldavian, Moldovan, Romanian busu_model_small WhisperForCTC from iulik-pisik +author: John Snow Labs +name: busu_model_small +date: 2024-09-24 +tags: [ro, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ro +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`busu_model_small` is a Moldavian, Moldovan, Romanian model originally trained by iulik-pisik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/busu_model_small_ro_5.5.0_3.0_1727144260430.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/busu_model_small_ro_5.5.0_3.0_1727144260430.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

audioAssembler = AudioAssembler() \
    .setInputCol("audio_content") \
    .setOutputCol("audio_assembler")

speechToText = WhisperForCTC.pretrained("busu_model_small","ro") \
    .setInputCols(["audio_assembler"]) \
    .setOutputCol("text")

pipeline = Pipeline().setStages([audioAssembler, speechToText])
# "data" is a DataFrame with an "audio_content" column of float arrays (16 kHz mono audio)
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val audioAssembler = new AudioAssembler()
    .setInputCol("audio_content")
    .setOutputCol("audio_assembler")

val speechToText = WhisperForCTC.pretrained("busu_model_small", "ro")
    .setInputCols(Array("audio_assembler"))
    .setOutputCol("text")

val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
// "data" is a DataFrame with an "audio_content" column of float arrays (16 kHz mono audio)
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
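Both snippets above assume a DataFrame `data` with an `audio_content` column of float arrays sampled at 16 kHz. The sketch below shows one way to build it; `librosa` and the file path are illustrative assumptions, and any loader that yields mono 16 kHz float samples works.

```python
import sparknlp
import librosa  # illustrative: any loader producing 16 kHz mono float samples works

spark = sparknlp.start()

# Whisper expects 16 kHz mono audio; the file name here is a placeholder
samples, _ = librosa.load("sample_audio.wav", sr=16000)

# AudioAssembler reads a column of float arrays named "audio_content"
data = spark.createDataFrame([[samples.tolist()]]).toDF("audio_content")

# With the pipeline from the snippet above already fitted:
# pipelineModel.transform(data).select("text.result").show(truncate=False)
```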
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|busu_model_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ro| +|Size:|1.7 GB| + +## References + +https://huggingface.co/iulik-pisik/busu_model_small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-cat_ner_spanish_3_en.md b/docs/_posts/ahmedlone127/2024-09-24-cat_ner_spanish_3_en.md new file mode 100644 index 00000000000000..b9c9c53e4bb8c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-cat_ner_spanish_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cat_ner_spanish_3 RoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_ner_spanish_3 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_ner_spanish_3` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_ner_spanish_3_en_5.5.0_3.0_1727151167432.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_ner_spanish_3_en_5.5.0_3.0_1727151167432.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("cat_ner_spanish_3","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("cat_ner_spanish_3", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_ner_spanish_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|462.3 MB| + +## References + +https://huggingface.co/homersimpson/cat-ner-es-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-cat_ner_spanish_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-cat_ner_spanish_3_pipeline_en.md new file mode 100644 index 00000000000000..79baf3391b44b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-cat_ner_spanish_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cat_ner_spanish_3_pipeline pipeline RoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_ner_spanish_3_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_ner_spanish_3_pipeline` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_ner_spanish_3_pipeline_en_5.5.0_3.0_1727151191419.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_ner_spanish_3_pipeline_en_5.5.0_3.0_1727151191419.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cat_ner_spanish_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cat_ner_spanish_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
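For quick checks on a single string, `PretrainedPipeline` also exposes `annotate()`, which returns plain Python lists instead of a DataFrame. A minimal sketch; the `token` and `ner` result keys are assumed from the stages listed under Included Models.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline("cat_ner_spanish_3_pipeline", lang="en")

# annotate() skips the DataFrame round trip and returns a dict of result lists
result = pipeline.annotate("Juan vive en Barcelona y trabaja en el hospital.")

# Keys mirror the stage output columns (names assumed from the Included Models list)
for token, tag in zip(result["token"], result["ner"]):
    print(token, tag)
```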
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_ner_spanish_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|462.3 MB| + +## References + +https://huggingface.co/homersimpson/cat-ner-es-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-classification_tagging_en.md b/docs/_posts/ahmedlone127/2024-09-24-classification_tagging_en.md new file mode 100644 index 00000000000000..1b5b29e0e19f9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-classification_tagging_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English classification_tagging BertEmbeddings from kumarsonu +author: John Snow Labs +name: classification_tagging +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classification_tagging` is a English model originally trained by kumarsonu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classification_tagging_en_5.5.0_3.0_1727177672772.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classification_tagging_en_5.5.0_3.0_1727177672772.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("classification_tagging","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("classification_tagging","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classification_tagging| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/kumarsonu/Classification_Tagging \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_random_trimmed_en.md b/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_random_trimmed_en.md new file mode 100644 index 00000000000000..d5e7ca366fec6b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_random_trimmed_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English code_search_codebert_base_random_trimmed RoBertaForTokenClassification from DianaIulia +author: John Snow Labs +name: code_search_codebert_base_random_trimmed +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`code_search_codebert_base_random_trimmed` is a English model originally trained by DianaIulia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_random_trimmed_en_5.5.0_3.0_1727150954940.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_random_trimmed_en_5.5.0_3.0_1727150954940.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("code_search_codebert_base_random_trimmed","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("code_search_codebert_base_random_trimmed", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
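+Continuing from the snippet above, the predicted tags can be read straight off the transformed DataFrame; a minimal sketch, assuming the usual Spark NLP layout where each annotation's `result` field holds the token text or the predicted label.
+
+```python
+# Show the tokens and the NER tags predicted for each input row.
+pipelineDF.select("token.result", "ner.result").show(truncate=False)
+```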
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|code_search_codebert_base_random_trimmed| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DianaIulia/code_search_codebert_base_random_trimmed \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_up_down_1_trimmed_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_up_down_1_trimmed_pipeline_en.md new file mode 100644 index 00000000000000..426be3af133856 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_up_down_1_trimmed_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English code_search_codebert_base_up_down_1_trimmed_pipeline pipeline RoBertaForTokenClassification from DianaIulia +author: John Snow Labs +name: code_search_codebert_base_up_down_1_trimmed_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`code_search_codebert_base_up_down_1_trimmed_pipeline` is a English model originally trained by DianaIulia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_up_down_1_trimmed_pipeline_en_5.5.0_3.0_1727139395640.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_up_down_1_trimmed_pipeline_en_5.5.0_3.0_1727139395640.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("code_search_codebert_base_up_down_1_trimmed_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("code_search_codebert_base_up_down_1_trimmed_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
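+The `df` referenced above can be any DataFrame with a `text` column; the sketch below builds one and also uses the single-string `annotate` helper, assuming a Spark session started via `sparknlp.start()` (the exact keys of the returned dictionary depend on the stages bundled in the pipeline).
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("code_search_codebert_base_up_down_1_trimmed_pipeline", lang="en")
+
+# DataFrame input: one row per document in a column named "text".
+df = spark.createDataFrame([["def add(a, b): return a + b"]]).toDF("text")
+annotations = pipeline.transform(df)
+
+# Lightweight single-string inference returns a plain Python dict.
+print(pipeline.annotate("def add(a, b): return a + b"))
+```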
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|code_search_codebert_base_up_down_1_trimmed_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/DianaIulia/code_search_codebert_base_up_down_1_trimmed + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-cold_fusion_itr9_seed3_en.md b/docs/_posts/ahmedlone127/2024-09-24-cold_fusion_itr9_seed3_en.md new file mode 100644 index 00000000000000..56181b2f7aae36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-cold_fusion_itr9_seed3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cold_fusion_itr9_seed3 RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr9_seed3 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr9_seed3` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr9_seed3_en_5.5.0_3.0_1727171208554.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr9_seed3_en_5.5.0_3.0_1727171208554.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr9_seed3","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr9_seed3", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
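+Continuing from the snippet above, the predicted label for each row sits in the `class` output column; a minimal sketch, assuming the standard annotation schema where `result` holds the winning label and `metadata` holds the per-label scores.
+
+```python
+# One predicted label (plus its score map) per input row.
+pipelineDF.select("class.result", "class.metadata").show(truncate=False)
+```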
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr9_seed3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|467.9 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr9-seed3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-cold_fusion_itr9_seed3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-cold_fusion_itr9_seed3_pipeline_en.md new file mode 100644 index 00000000000000..f485406fc28310 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-cold_fusion_itr9_seed3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cold_fusion_itr9_seed3_pipeline pipeline RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr9_seed3_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr9_seed3_pipeline` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr9_seed3_pipeline_en_5.5.0_3.0_1727171232747.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr9_seed3_pipeline_en_5.5.0_3.0_1727171232747.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cold_fusion_itr9_seed3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cold_fusion_itr9_seed3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
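+To verify that the downloaded bundle really contains the stages listed under "Included Models" below, the wrapped PipelineModel can be inspected; the `model` attribute used here is the one commonly shown in Spark NLP examples and is an assumption.
+
+```python
+from sparknlp.pretrained import PretrainedPipeline
+
+pipeline = PretrainedPipeline("cold_fusion_itr9_seed3_pipeline", lang="en")
+
+# List the annotators bundled inside the pretrained pipeline.
+for stage in pipeline.model.stages:
+    print(type(stage).__name__)
+```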
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr9_seed3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr9-seed3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-credit_card_collection_intent_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-credit_card_collection_intent_classification_pipeline_en.md new file mode 100644 index 00000000000000..48a1fb5aa12a85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-credit_card_collection_intent_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English credit_card_collection_intent_classification_pipeline pipeline DistilBertForSequenceClassification from PabitraJiban +author: John Snow Labs +name: credit_card_collection_intent_classification_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`credit_card_collection_intent_classification_pipeline` is a English model originally trained by PabitraJiban. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/credit_card_collection_intent_classification_pipeline_en_5.5.0_3.0_1727137348683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/credit_card_collection_intent_classification_pipeline_en_5.5.0_3.0_1727137348683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("credit_card_collection_intent_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("credit_card_collection_intent_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|credit_card_collection_intent_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/PabitraJiban/Credit-card-collection-intent-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deeppolicytracker_200k_en.md b/docs/_posts/ahmedlone127/2024-09-24-deeppolicytracker_200k_en.md new file mode 100644 index 00000000000000..4cda7546e51c74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deeppolicytracker_200k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deeppolicytracker_200k RoBertaEmbeddings from flavio-nakasato +author: John Snow Labs +name: deeppolicytracker_200k +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deeppolicytracker_200k` is a English model originally trained by flavio-nakasato. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deeppolicytracker_200k_en_5.5.0_3.0_1727169009922.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deeppolicytracker_200k_en_5.5.0_3.0_1727169009922.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("deeppolicytracker_200k","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("deeppolicytracker_200k","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
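+When the vectors are needed as plain Spark ML vectors (for clustering or a downstream classifier), an `EmbeddingsFinisher` stage can be appended to the pipeline above; a minimal sketch, assuming the standard `sparknlp.base.EmbeddingsFinisher` API.
+
+```python
+from sparknlp.base import EmbeddingsFinisher
+
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, finisher])
+pipeline.fit(data).transform(data).select("finished_embeddings").show(truncate=80)
+```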
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deeppolicytracker_200k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|305.8 MB| + +## References + +https://huggingface.co/flavio-nakasato/deeppolicytracker_200k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deeppolicytracker_200k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-deeppolicytracker_200k_pipeline_en.md new file mode 100644 index 00000000000000..28fad635fbd3e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deeppolicytracker_200k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English deeppolicytracker_200k_pipeline pipeline RoBertaEmbeddings from flavio-nakasato +author: John Snow Labs +name: deeppolicytracker_200k_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deeppolicytracker_200k_pipeline` is a English model originally trained by flavio-nakasato. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deeppolicytracker_200k_pipeline_en_5.5.0_3.0_1727169025830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deeppolicytracker_200k_pipeline_en_5.5.0_3.0_1727169025830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deeppolicytracker_200k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deeppolicytracker_200k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deeppolicytracker_200k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|305.8 MB| + +## References + +https://huggingface.co/flavio-nakasato/deeppolicytracker_200k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deepset_bert_base_cased_squad2_orkg_what_5e_05_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-deepset_bert_base_cased_squad2_orkg_what_5e_05_pipeline_en.md new file mode 100644 index 00000000000000..f5e6af3642871e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deepset_bert_base_cased_squad2_orkg_what_5e_05_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English deepset_bert_base_cased_squad2_orkg_what_5e_05_pipeline pipeline BertForQuestionAnswering from Moussab +author: John Snow Labs +name: deepset_bert_base_cased_squad2_orkg_what_5e_05_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deepset_bert_base_cased_squad2_orkg_what_5e_05_pipeline` is a English model originally trained by Moussab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deepset_bert_base_cased_squad2_orkg_what_5e_05_pipeline_en_5.5.0_3.0_1727176104687.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deepset_bert_base_cased_squad2_orkg_what_5e_05_pipeline_en_5.5.0_3.0_1727176104687.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deepset_bert_base_cased_squad2_orkg_what_5e_05_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deepset_bert_base_cased_squad2_orkg_what_5e_05_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
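+This question-answering pipeline is assembled from a MultiDocumentAssembler and a BertForQuestionAnswering stage, so it expects a question and a context per row; the column names ("question", "context") and the output column ("answer") below follow the usual Spark NLP QA convention and are assumptions about this particular bundle.
+
+```python
+df = spark.createDataFrame(
+    [["What does ORKG stand for?", "The Open Research Knowledge Graph (ORKG) describes research contributions in a structured way."]]
+).toDF("question", "context")
+
+answers = pipeline.transform(df)
+answers.select("answer.result").show(truncate=False)
+```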
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deepset_bert_base_cased_squad2_orkg_what_5e_05_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Moussab/deepset_bert-base-cased-squad2-orkg-what-5e-05 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-delivery_balanced_distilbert_base_uncased_v1_en.md b/docs/_posts/ahmedlone127/2024-09-24-delivery_balanced_distilbert_base_uncased_v1_en.md new file mode 100644 index 00000000000000..e2786cb5ef3a4d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-delivery_balanced_distilbert_base_uncased_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English delivery_balanced_distilbert_base_uncased_v1 DistilBertForSequenceClassification from chuuhtetnaing +author: John Snow Labs +name: delivery_balanced_distilbert_base_uncased_v1 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`delivery_balanced_distilbert_base_uncased_v1` is a English model originally trained by chuuhtetnaing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/delivery_balanced_distilbert_base_uncased_v1_en_5.5.0_3.0_1727137364883.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/delivery_balanced_distilbert_base_uncased_v1_en_5.5.0_3.0_1727137364883.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("delivery_balanced_distilbert_base_uncased_v1","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("delivery_balanced_distilbert_base_uncased_v1", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
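+For one-off texts, transforming a full DataFrame is unnecessary overhead; the fitted model from the snippet above can be wrapped in a `LightPipeline`, a standard Spark NLP utility for in-memory inference (a minimal sketch under that assumption).
+
+```python
+from sparknlp.base import LightPipeline
+
+light = LightPipeline(pipelineModel)
+# Returns a dict mapping output columns to lists of results, e.g. {"class": ["..."], ...}
+print(light.annotate("My parcel arrived two weeks late and nobody replied to my emails."))
+```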
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|delivery_balanced_distilbert_base_uncased_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chuuhtetnaing/delivery-balanced-distilbert-base-uncased-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-delivery_balanced_distilbert_base_uncased_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-delivery_balanced_distilbert_base_uncased_v1_pipeline_en.md new file mode 100644 index 00000000000000..5385b601c2fd9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-delivery_balanced_distilbert_base_uncased_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English delivery_balanced_distilbert_base_uncased_v1_pipeline pipeline DistilBertForSequenceClassification from chuuhtetnaing +author: John Snow Labs +name: delivery_balanced_distilbert_base_uncased_v1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`delivery_balanced_distilbert_base_uncased_v1_pipeline` is a English model originally trained by chuuhtetnaing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/delivery_balanced_distilbert_base_uncased_v1_pipeline_en_5.5.0_3.0_1727137377799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/delivery_balanced_distilbert_base_uncased_v1_pipeline_en_5.5.0_3.0_1727137377799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("delivery_balanced_distilbert_base_uncased_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("delivery_balanced_distilbert_base_uncased_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|delivery_balanced_distilbert_base_uncased_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chuuhtetnaing/delivery-balanced-distilbert-base-uncased-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_en.md b/docs/_posts/ahmedlone127/2024-09-24-dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_en.md new file mode 100644 index 00000000000000..eb0ff465b8c88d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8 WhisperForCTC from rohitp1 +author: John Snow Labs +name: dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8 +date: 2024-09-24 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8` is a English model originally trained by rohitp1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_en_5.5.0_3.0_1727146591942.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_en_5.5.0_3.0_1727146591942.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# `data` is expected to be a DataFrame with an "audio_content" column of float
+# audio samples; one way to build it is sketched after this block.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// `data` is expected to be a DataFrame with an "audio_content" column of float audio samples.
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
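+As noted in the snippet above, `data` is assumed to be a DataFrame whose `audio_content` column holds the raw waveform as an array of float samples at 16 kHz; the sketch below builds one with librosa, which is just one possible loader (any code that yields float samples at the expected rate should work, and a cast to float may be needed depending on the inferred column type).
+
+```python
+import librosa
+
+# Load a local WAV file and resample to the 16 kHz rate Whisper models expect.
+samples, _ = librosa.load("sample.wav", sr=16000)
+
+data = spark.createDataFrame([[samples.tolist()]]).toDF("audio_content")
+
+pipelineModel = pipeline.fit(data)
+pipelineModel.transform(data).select("text.result").show(truncate=False)
+```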
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|646.8 MB| + +## References + +https://huggingface.co/rohitp1/dgx1_whisper_base_finetune_teacher_no_noise_mozilla_100_epochs_batch_8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_pipeline_en.md new file mode 100644 index 00000000000000..062a203579537e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_pipeline pipeline WhisperForCTC from rohitp1 +author: John Snow Labs +name: dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_pipeline` is a English model originally trained by rohitp1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_pipeline_en_5.5.0_3.0_1727146625621.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_pipeline_en_5.5.0_3.0_1727146625621.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dgx1_whisper_base_finetune_teacher_norwegian_noise_mozilla_100_epochs_batch_8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|646.9 MB| + +## References + +https://huggingface.co/rohitp1/dgx1_whisper_base_finetune_teacher_no_noise_mozilla_100_epochs_batch_8 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-dictabert_large_pipeline_he.md b/docs/_posts/ahmedlone127/2024-09-24-dictabert_large_pipeline_he.md new file mode 100644 index 00000000000000..67bb1e33d34b6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-dictabert_large_pipeline_he.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Hebrew dictabert_large_pipeline pipeline BertEmbeddings from dicta-il +author: John Snow Labs +name: dictabert_large_pipeline +date: 2024-09-24 +tags: [he, open_source, pipeline, onnx] +task: Embeddings +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dictabert_large_pipeline` is a Hebrew model originally trained by dicta-il. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dictabert_large_pipeline_he_5.5.0_3.0_1727174390044.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dictabert_large_pipeline_he_5.5.0_3.0_1727174390044.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dictabert_large_pipeline", lang = "he") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dictabert_large_pipeline", lang = "he") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dictabert_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|he| +|Size:|1.0 GB| + +## References + +https://huggingface.co/dicta-il/dictabert-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-discourse_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-discourse_model_pipeline_en.md new file mode 100644 index 00000000000000..70b3098c9e863f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-discourse_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English discourse_model_pipeline pipeline RoBertaForSequenceClassification from lightcarrieson +author: John Snow Labs +name: discourse_model_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`discourse_model_pipeline` is a English model originally trained by lightcarrieson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/discourse_model_pipeline_en_5.5.0_3.0_1727171842123.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/discourse_model_pipeline_en_5.5.0_3.0_1727171842123.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("discourse_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("discourse_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|discourse_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|422.8 MB| + +## References + +https://huggingface.co/lightcarrieson/discourse_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_5000_questions_gt_3_5epochs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_5000_questions_gt_3_5epochs_pipeline_en.md new file mode 100644 index 00000000000000..df67f58c1b660f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_5000_questions_gt_3_5epochs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_5000_questions_gt_3_5epochs_pipeline pipeline DistilBertForSequenceClassification from Abhibeats95 +author: John Snow Labs +name: distilbert_base_uncased_5000_questions_gt_3_5epochs_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_5000_questions_gt_3_5epochs_pipeline` is a English model originally trained by Abhibeats95. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_5000_questions_gt_3_5epochs_pipeline_en_5.5.0_3.0_1727137184801.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_5000_questions_gt_3_5epochs_pipeline_en_5.5.0_3.0_1727137184801.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_5000_questions_gt_3_5epochs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_5000_questions_gt_3_5epochs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_5000_questions_gt_3_5epochs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Abhibeats95/distilbert-base-uncased-5000_questions_gt_3_5epochs + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_fb_housing_posts_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_fb_housing_posts_pipeline_en.md new file mode 100644 index 00000000000000..d6e8a371d84b45 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_fb_housing_posts_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_fb_housing_posts_pipeline pipeline DistilBertForSequenceClassification from hoaj +author: John Snow Labs +name: distilbert_base_uncased_fb_housing_posts_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_fb_housing_posts_pipeline` is a English model originally trained by hoaj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_fb_housing_posts_pipeline_en_5.5.0_3.0_1727164375136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_fb_housing_posts_pipeline_en_5.5.0_3.0_1727164375136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_fb_housing_posts_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_fb_housing_posts_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_fb_housing_posts_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hoaj/distilbert-base-uncased-fb-housing-posts + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetune_six_emotions_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetune_six_emotions_en.md new file mode 100644 index 00000000000000..4731e1c0a5f381 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetune_six_emotions_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetune_six_emotions DistilBertForSequenceClassification from Logicloom44 +author: John Snow Labs +name: distilbert_base_uncased_finetune_six_emotions +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetune_six_emotions` is a English model originally trained by Logicloom44. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetune_six_emotions_en_5.5.0_3.0_1727136928209.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetune_six_emotions_en_5.5.0_3.0_1727136928209.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetune_six_emotions","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetune_six_emotions", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
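+Since the model name suggests a six-emotion classifier, it is often useful to score a small batch and inspect the per-label scores; a minimal sketch continuing from the fitted pipeline above, assuming the usual layout where `metadata` maps label names to probabilities.
+
+```python
+examples = spark.createDataFrame(
+    [["I can't stop smiling today"], ["This is the worst day I've had in months"]]
+).toDF("text")
+
+scored = pipelineModel.transform(examples)
+scored.select("text", "class.result", "class.metadata").show(truncate=False)
+```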
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetune_six_emotions| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Logicloom44/distilbert-base-uncased-finetune-six-emotions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_en.md new file mode 100644 index 00000000000000..1e9558b5c03723 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_nachikethmurthy666 DistilBertForSequenceClassification from nachikethmurthy666 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_nachikethmurthy666 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_nachikethmurthy666` is a English model originally trained by nachikethmurthy666. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_en_5.5.0_3.0_1727136820558.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_en_5.5.0_3.0_1727136820558.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_nachikethmurthy666","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_nachikethmurthy666", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_nachikethmurthy666| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/nachikethmurthy666/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_pbruna_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_pbruna_en.md new file mode 100644 index 00000000000000..6f9b123ad4788e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_pbruna_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_pbruna DistilBertForSequenceClassification from pbruna +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_pbruna +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_pbruna` is a English model originally trained by pbruna. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_pbruna_en_5.5.0_3.0_1727154503765.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_pbruna_en_5.5.0_3.0_1727154503765.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_pbruna","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_pbruna", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_pbruna| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/pbruna/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_pbruna_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_pbruna_pipeline_en.md new file mode 100644 index 00000000000000..feba3bfa24b155 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_pbruna_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_pbruna_pipeline pipeline DistilBertForSequenceClassification from pbruna +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_pbruna_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_pbruna_pipeline` is a English model originally trained by pbruna. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_pbruna_pipeline_en_5.5.0_3.0_1727154516524.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_pbruna_pipeline_en_5.5.0_3.0_1727154516524.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_pbruna_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_pbruna_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_pbruna_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/pbruna/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_againeureka_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_againeureka_en.md new file mode 100644 index 00000000000000..5e86f9bf4fa218 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_againeureka_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_againeureka DistilBertForSequenceClassification from againeureka +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_againeureka +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_againeureka` is a English model originally trained by againeureka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_againeureka_en_5.5.0_3.0_1727164479442.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_againeureka_en_5.5.0_3.0_1727164479442.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_againeureka","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_againeureka", "en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_againeureka| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/againeureka/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_robuved_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_robuved_pipeline_en.md new file mode 100644 index 00000000000000..79f9c72bcd42c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_robuved_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_robuved_pipeline pipeline DistilBertForSequenceClassification from robuved +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_robuved_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_robuved_pipeline` is a English model originally trained by robuved. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_robuved_pipeline_en_5.5.0_3.0_1727154854934.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_robuved_pipeline_en_5.5.0_3.0_1727154854934.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame holding the input text (typically in a "text" column)
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_robuved_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame holding the input text (typically in a "text" column)
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_robuved_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
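+
+For quick checks on raw strings, the downloaded pipeline can also be driven through `annotate` instead of `transform`; a small sketch (the keys of the returned dictionary are assumed to follow the stages listed under Included Models):
+
+```python
+# Annotate a single string without building a DataFrame first
+print(pipeline.annotate("I love Spark NLP"))
+```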
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_robuved_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/robuved/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_wy3106714391_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_wy3106714391_en.md new file mode 100644 index 00000000000000..47031ced5dda4e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_wy3106714391_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_wy3106714391 DistilBertForSequenceClassification from wy3106714391 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_wy3106714391 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_wy3106714391` is a English model originally trained by wy3106714391. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_wy3106714391_en_5.5.0_3.0_1727164141876.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_wy3106714391_en_5.5.0_3.0_1727164141876.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_wy3106714391","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_wy3106714391", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
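+
+To read the output back, the `class` annotations can be flattened with ordinary DataFrame operations; a brief sketch that assumes the column names from the example above:
+
+```python
+# The "result" field of each annotation holds the predicted label string
+pipelineDF.selectExpr("text", "class.result as prediction").show(truncate=False)
+```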
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_wy3106714391| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/wy3106714391/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_jlsurdilla_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_jlsurdilla_en.md new file mode 100644 index 00000000000000..e9d12dde9c0e78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_jlsurdilla_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jlsurdilla DistilBertForSequenceClassification from jlsurdilla +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jlsurdilla +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jlsurdilla` is a English model originally trained by jlsurdilla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jlsurdilla_en_5.5.0_3.0_1727137165572.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jlsurdilla_en_5.5.0_3.0_1727137165572.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jlsurdilla","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jlsurdilla", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
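+
+Once fitted, the same model can also score individual strings in memory through a LightPipeline, which skips the DataFrame round trip for ad-hoc tests; a short sketch based on the pipeline defined above:
+
+```python
+from sparknlp.base import LightPipeline
+
+# Annotate one string in memory; the "class" key carries the predicted label
+light = LightPipeline(pipelineModel)
+print(light.annotate("I love spark-nlp"))
+```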
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jlsurdilla| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jlsurdilla/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_ryli_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_ryli_en.md new file mode 100644 index 00000000000000..1febda74758e90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_ryli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ryli DistilBertForSequenceClassification from ryli +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ryli +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ryli` is a English model originally trained by ryli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ryli_en_5.5.0_3.0_1727137053294.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ryli_en_5.5.0_3.0_1727137053294.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ryli","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ryli", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
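+
+The transformed DataFrame keeps the predicted label inside the `class` annotation; one way to surface it, assuming the columns used in the example above:
+
+```python
+# Each row's "result" array contains the label chosen by the classifier
+pipelineDF.select("class.result").show(truncate=False)
+```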
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ryli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ryli/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_sapkpa1_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_sapkpa1_en.md new file mode 100644 index 00000000000000..0d90712e2670b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_sapkpa1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_sapkpa1 DistilBertForSequenceClassification from sapkpa1 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_sapkpa1 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_sapkpa1` is a English model originally trained by sapkpa1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_sapkpa1_en_5.5.0_3.0_1727136821822.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_sapkpa1_en_5.5.0_3.0_1727136821822.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_sapkpa1","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_sapkpa1", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
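+
+Besides the label in `class.result`, the annotation's `metadata` field typically carries the raw per-class scores; a sketch of inspecting both, under the column names assumed above:
+
+```python
+# result = predicted label, metadata = per-label scores (stored as strings)
+pipelineDF.select("class.result", "class.metadata").show(truncate=False)
+```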
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_sapkpa1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sapkpa1/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_yukky777_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_yukky777_en.md new file mode 100644 index 00000000000000..159ff06450342a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_yukky777_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_yukky777 DistilBertForSequenceClassification from yukky777 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_yukky777 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_yukky777` is a English model originally trained by yukky777. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yukky777_en_5.5.0_3.0_1727137467123.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yukky777_en_5.5.0_3.0_1727137467123.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_yukky777","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_yukky777", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
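+
+A compact way to compare inputs and predictions after `transform`, assuming the schema produced by the example above:
+
+```python
+from pyspark.sql.functions import col
+
+# Rename the extracted label array for readability
+pipelineDF.select(col("text"), col("class.result").alias("label")).show(truncate=False)
+```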
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_yukky777| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yukky777/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_tweet_eval_sentiment_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_tweet_eval_sentiment_en.md new file mode 100644 index 00000000000000..085d931739b686 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_tweet_eval_sentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_tweet_eval_sentiment DistilBertForSequenceClassification from HSIEN1009 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_tweet_eval_sentiment +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_tweet_eval_sentiment` is a English model originally trained by HSIEN1009. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_tweet_eval_sentiment_en_5.5.0_3.0_1727137570075.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_tweet_eval_sentiment_en_5.5.0_3.0_1727137570075.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_tweet_eval_sentiment","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_tweet_eval_sentiment", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
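+
+For interactive use, the fitted pipeline can be wrapped in a LightPipeline so single tweets are classified without building a DataFrame; a small sketch reusing `pipelineModel` from above:
+
+```python
+from sparknlp.base import LightPipeline
+
+# Returns a dict whose "class" entry holds the predicted sentiment label
+print(LightPipeline(pipelineModel).annotate("I love spark-nlp"))
+```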
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_tweet_eval_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/HSIEN1009/distilbert-base-uncased-finetuned-tweet_eval_sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_tweet_eval_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_tweet_eval_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..dab778374da199 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_tweet_eval_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_tweet_eval_sentiment_pipeline pipeline DistilBertForSequenceClassification from HSIEN1009 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_tweet_eval_sentiment_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_tweet_eval_sentiment_pipeline` is a English model originally trained by HSIEN1009. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_tweet_eval_sentiment_pipeline_en_5.5.0_3.0_1727137582597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_tweet_eval_sentiment_pipeline_en_5.5.0_3.0_1727137582597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame holding the input text (typically in a "text" column)
+pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_tweet_eval_sentiment_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame holding the input text (typically in a "text" column)
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_tweet_eval_sentiment_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
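+
+When the full annotation objects (including metadata such as scores) are needed for a single string, `fullAnnotate` can be used instead of `transform`; a brief sketch with the pipeline loaded above:
+
+```python
+# fullAnnotate returns Annotation objects rather than plain strings
+print(pipeline.fullAnnotate("I love Spark NLP"))
+```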
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_tweet_eval_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/HSIEN1009/distilbert-base-uncased-finetuned-tweet_eval_sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_mbib_2048_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_mbib_2048_en.md new file mode 100644 index 00000000000000..1be97635947410 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_mbib_2048_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_mbib_2048 DistilBertForSequenceClassification from ANGKJ1995 +author: John Snow Labs +name: distilbert_base_uncased_mbib_2048 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_mbib_2048` is a English model originally trained by ANGKJ1995. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_mbib_2048_en_5.5.0_3.0_1727154396211.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_mbib_2048_en_5.5.0_3.0_1727154396211.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_mbib_2048","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_mbib_2048", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
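+
+After `transform`, the predicted class sits in the `result` field of the `class` annotations; a minimal sketch, assuming the example's column names:
+
+```python
+from pyspark.sql.functions import expr
+
+# Take the first (and only) annotation result per row as the label
+pipelineDF.select("text", expr("class.result[0] as label")).show(truncate=False)
+```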
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_mbib_2048| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ANGKJ1995/distilbert-base-uncased-mbib-2048 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_mbib_2048_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_mbib_2048_pipeline_en.md new file mode 100644 index 00000000000000..3d9d76fde9dcc7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_mbib_2048_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_mbib_2048_pipeline pipeline DistilBertForSequenceClassification from ANGKJ1995 +author: John Snow Labs +name: distilbert_base_uncased_mbib_2048_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_mbib_2048_pipeline` is a English model originally trained by ANGKJ1995. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_mbib_2048_pipeline_en_5.5.0_3.0_1727154410370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_mbib_2048_pipeline_en_5.5.0_3.0_1727154410370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame holding the input text (typically in a "text" column)
+pipeline = PretrainedPipeline("distilbert_base_uncased_mbib_2048_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame holding the input text (typically in a "text" column)
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_mbib_2048_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
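+
+The same pipeline object also works on plain strings through `annotate`, which is convenient for smoke tests; a short sketch (output keys assumed to mirror the Included Models list):
+
+```python
+# Classify one string directly and print the resulting dictionary
+print(pipeline.annotate("I love Spark NLP"))
+```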
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_mbib_2048_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ANGKJ1995/distilbert-base-uncased-mbib-2048 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_en.md new file mode 100644 index 00000000000000..6f20a815b7f118 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_en_5.5.0_3.0_1727164142561.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_en_5.5.0_3.0_1727164142561.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
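+
+Reading the classifier's decision back out of `pipelineDF` only needs standard DataFrame selects; a sketch under the column names assumed in the example above:
+
+```python
+# Each row's "result" array holds the label assigned by the model
+pipelineDF.select("class.result").show(truncate=False)
+```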
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_6e4exps_0strandom42sd_ut72ut5_PLPrefix0stlarge42_simsp100_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_pipeline_en.md new file mode 100644 index 00000000000000..003ba90a8b59fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_pipeline_en_5.5.0_3.0_1727154400872.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_pipeline_en_5.5.0_3.0_1727154400872.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame holding the input text (typically in a "text" column)
+pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame holding the input text (typically in a "text" column)
+val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
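+
+For one-off inputs, `annotate` on the loaded pipeline returns a plain dictionary instead of a DataFrame; a minimal sketch, with the output keys assumed to mirror the Included Models:
+
+```python
+# Run the whole pipeline on a single string
+result = pipeline.annotate("I love Spark NLP")
+print(result)
+```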
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st30sd_ut72ut1large30PfxNf_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_coping_replies_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_coping_replies_en.md new file mode 100644 index 00000000000000..542714b32af327 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_coping_replies_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_coping_replies DistilBertForSequenceClassification from coping-appraisal +author: John Snow Labs +name: distilbert_coping_replies +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_coping_replies` is a English model originally trained by coping-appraisal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_coping_replies_en_5.5.0_3.0_1727154263199.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_coping_replies_en_5.5.0_3.0_1727154263199.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_coping_replies","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_coping_replies", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
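+
+To inspect what the model predicted for each reply, the rows can simply be collected and printed; a brief sketch assuming the columns from the example above:
+
+```python
+# Iterate over the scored rows and print text plus predicted label
+for row in pipelineDF.select("text", "class.result").collect():
+    print(row)
+```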
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_coping_replies| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/coping-appraisal/distilbert-coping-replies \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_ebit_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_ebit_pipeline_en.md new file mode 100644 index 00000000000000..5eb47a61ca777d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_ebit_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_ebit_pipeline pipeline DistilBertForSequenceClassification from lenguyen +author: John Snow Labs +name: distilbert_ebit_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_ebit_pipeline` is a English model originally trained by lenguyen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_ebit_pipeline_en_5.5.0_3.0_1727164544841.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_ebit_pipeline_en_5.5.0_3.0_1727164544841.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame holding the input text (typically in a "text" column)
+pipeline = PretrainedPipeline("distilbert_ebit_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame holding the input text (typically in a "text" column)
+val pipeline = new PretrainedPipeline("distilbert_ebit_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
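+
+A quick way to sanity-check the pipeline on a single sentence is `annotate`, which skips the DataFrame step; a small sketch with the pipeline loaded above:
+
+```python
+# Returns a dictionary of output columns to lists of results
+print(pipeline.annotate("I love Spark NLP"))
+```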
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_ebit_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.0 MB| + +## References + +https://huggingface.co/lenguyen/distilbert_EBIT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_emotion_patdj_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_emotion_patdj_en.md new file mode 100644 index 00000000000000..a838ccb8b29fb4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_emotion_patdj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotion_patdj DistilBertForSequenceClassification from PatDJ +author: John Snow Labs +name: distilbert_emotion_patdj +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_patdj` is a English model originally trained by PatDJ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_patdj_en_5.5.0_3.0_1727154388011.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_patdj_en_5.5.0_3.0_1727154388011.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_patdj","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_patdj", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
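+
+The emotion label predicted for each row is stored in the `class` annotation's `result` field; a minimal sketch of extracting it, assuming the example's column names:
+
+```python
+# Rename the extracted array of labels to "emotion" for readability
+pipelineDF.selectExpr("text", "class.result as emotion").show(truncate=False)
+```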
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_patdj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/PatDJ/distilbert-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_emotions_clf_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_emotions_clf_en.md new file mode 100644 index 00000000000000..b3cd03238129e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_emotions_clf_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotions_clf DistilBertForSequenceClassification from eduardo-alvarez +author: John Snow Labs +name: distilbert_emotions_clf +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotions_clf` is a English model originally trained by eduardo-alvarez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotions_clf_en_5.5.0_3.0_1727136819732.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotions_clf_en_5.5.0_3.0_1727136819732.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotions_clf","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+  .setInputCol("text")
+  .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+  .setInputCols(Array("document"))
+  .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotions_clf", "en")
+  .setInputCols(Array("document","token"))
+  .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
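+
+A LightPipeline built from the fitted model is handy for classifying ad-hoc strings during development; a short sketch reusing `pipelineModel` from the example above:
+
+```python
+from sparknlp.base import LightPipeline
+
+# Annotate a single string in memory and print the predicted emotion
+print(LightPipeline(pipelineModel).annotate("I love spark-nlp"))
+```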
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotions_clf| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/eduardo-alvarez/distilbert-emotions-clf \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_emotions_clf_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_emotions_clf_pipeline_en.md new file mode 100644 index 00000000000000..217a6b300019f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_emotions_clf_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_emotions_clf_pipeline pipeline DistilBertForSequenceClassification from eduardo-alvarez +author: John Snow Labs +name: distilbert_emotions_clf_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotions_clf_pipeline` is a English model originally trained by eduardo-alvarez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotions_clf_pipeline_en_5.5.0_3.0_1727136834081.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotions_clf_pipeline_en_5.5.0_3.0_1727136834081.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame holding the input text (typically in a "text" column)
+pipeline = PretrainedPipeline("distilbert_emotions_clf_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame holding the input text (typically in a "text" column)
+val pipeline = new PretrainedPipeline("distilbert_emotions_clf_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
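+
+Beyond `transform`, the pretrained pipeline exposes `fullAnnotate`, which returns annotation objects with their metadata; a brief sketch with the pipeline loaded above (field contents assumed):
+
+```python
+# Inspect the full annotations, including metadata, for one string
+print(pipeline.fullAnnotate("I love Spark NLP"))
+```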
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotions_clf_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/eduardo-alvarez/distilbert-emotions-clf + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_essays_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_essays_pipeline_en.md new file mode 100644 index 00000000000000..a98e94f4a4db4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_essays_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_essays_pipeline pipeline DistilBertForSequenceClassification from Bimarshad +author: John Snow Labs +name: distilbert_essays_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_essays_pipeline` is a English model originally trained by Bimarshad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_essays_pipeline_en_5.5.0_3.0_1727137592482.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_essays_pipeline_en_5.5.0_3.0_1727137592482.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# df is assumed to be a Spark DataFrame holding the input text (typically in a "text" column)
+pipeline = PretrainedPipeline("distilbert_essays_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// df is assumed to be a Spark DataFrame holding the input text (typically in a "text" column)
+val pipeline = new PretrainedPipeline("distilbert_essays_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
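+
+Single essays can be scored directly with `annotate`, without assembling a DataFrame first; a small sketch using the pipeline loaded above:
+
+```python
+# Classify one essay string and print the pipeline outputs
+print(pipeline.annotate("I love Spark NLP"))
+```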
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_essays_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Bimarshad/distilbert.essays + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_ethics_test_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_ethics_test_en.md new file mode 100644 index 00000000000000..0ccdc1d061b755 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_ethics_test_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_ethics_test DistilBertForSequenceClassification from harplyon +author: John Snow Labs +name: distilbert_ethics_test +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_ethics_test` is a English model originally trained by harplyon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_ethics_test_en_5.5.0_3.0_1727154263197.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_ethics_test_en_5.5.0_3.0_1727154263197.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assumes an active SparkSession `spark` and the standard Spark NLP imports
+# (sparknlp.base, sparknlp.annotator, pyspark.ml.Pipeline).
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_ethics_test","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_ethics_test", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
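+
+After `transform`, the predictions are regular Spark NLP annotations in the `class` column, so they can be inspected with ordinary DataFrame operations. A short, optional follow-up using the column names from the snippet above (the label strings themselves depend on the fine-tuned model and are not listed on this card):
+
+```python
+# Show each input text next to its predicted class label.
+pipelineDF.select("text", "class.result").show(truncate = False)
+```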
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_ethics_test| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/harplyon/distilbert-ethics-test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_ethics_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_ethics_test_pipeline_en.md new file mode 100644 index 00000000000000..802251342b16d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_ethics_test_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_ethics_test_pipeline pipeline DistilBertForSequenceClassification from harplyon +author: John Snow Labs +name: distilbert_ethics_test_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_ethics_test_pipeline` is a English model originally trained by harplyon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_ethics_test_pipeline_en_5.5.0_3.0_1727154277585.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_ethics_test_pipeline_en_5.5.0_3.0_1727154277585.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_ethics_test_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_ethics_test_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_ethics_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/harplyon/distilbert-ethics-test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_finetuned_emotion_pt_sk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_finetuned_emotion_pt_sk_pipeline_en.md new file mode 100644 index 00000000000000..e4976bb214e136 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_finetuned_emotion_pt_sk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_finetuned_emotion_pt_sk_pipeline pipeline DistilBertForSequenceClassification from pt-sk +author: John Snow Labs +name: distilbert_finetuned_emotion_pt_sk_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_emotion_pt_sk_pipeline` is a English model originally trained by pt-sk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_emotion_pt_sk_pipeline_en_5.5.0_3.0_1727137410016.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_emotion_pt_sk_pipeline_en_5.5.0_3.0_1727137410016.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_finetuned_emotion_pt_sk_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_finetuned_emotion_pt_sk_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_emotion_pt_sk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/pt-sk/distilbert-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_finetuned_hatespeech_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_finetuned_hatespeech_en.md new file mode 100644 index 00000000000000..ce001c2166459b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_finetuned_hatespeech_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_finetuned_hatespeech DistilBertForSequenceClassification from ayln +author: John Snow Labs +name: distilbert_finetuned_hatespeech +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_hatespeech` is a English model originally trained by ayln. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_hatespeech_en_5.5.0_3.0_1727164471587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_hatespeech_en_5.5.0_3.0_1727164471587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assumes an active SparkSession `spark` and the standard Spark NLP imports
+# (sparknlp.base, sparknlp.annotator, pyspark.ml.Pipeline).
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_finetuned_hatespeech","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_finetuned_hatespeech", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_hatespeech| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ayln/distilbert_finetuned_hatespeech \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_pipeline_en.md new file mode 100644 index 00000000000000..dc08538619a310 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_pipeline_en_5.5.0_3.0_1727137286655.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_pipeline_en_5.5.0_3.0_1727137286655.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|71.6 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_stsb_256 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_v1_b_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_v1_b_pipeline_en.md new file mode 100644 index 00000000000000..f5cb6debf8f5bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_v1_b_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_v1_b_pipeline pipeline DistilBertForSequenceClassification from sheduele +author: John Snow Labs +name: distilbert_v1_b_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_v1_b_pipeline` is a English model originally trained by sheduele. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_v1_b_pipeline_en_5.5.0_3.0_1727164425129.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_v1_b_pipeline_en_5.5.0_3.0_1727164425129.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_v1_b_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_v1_b_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_v1_b_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/sheduele/distilbert_v1_b + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_abr_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_abr_en.md new file mode 100644 index 00000000000000..21ed9fb3cb7c50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_abr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_finetuned_abr RoBertaEmbeddings from Transabrar +author: John Snow Labs +name: distilroberta_base_finetuned_abr +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_abr` is a English model originally trained by Transabrar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_abr_en_5.5.0_3.0_1727169121871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_abr_en_5.5.0_3.0_1727169121871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_finetuned_abr","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_finetuned_abr","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
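+
+The `embeddings` column produced above holds one annotation per token, each carrying the token text and its embedding vector. A small follow-up sketch for pulling the vectors out; the 768-dimensional size is an assumption based on the distilroberta-base architecture rather than something stated on this card:
+
+```python
+from pyspark.sql.functions import explode
+
+# One row per token: the token string and its (typically 768-dimensional) vector.
+tokens = pipelineDF.select(explode("embeddings").alias("emb")) \
+    .selectExpr("emb.result as token", "emb.embeddings as vector")
+tokens.show(truncate = 60)
+```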
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_abr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/Transabrar/distilroberta-base-finetuned-abr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_abr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_abr_pipeline_en.md new file mode 100644 index 00000000000000..0e48b35d2cdf20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_abr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_finetuned_abr_pipeline pipeline RoBertaEmbeddings from Transabrar +author: John Snow Labs +name: distilroberta_base_finetuned_abr_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_abr_pipeline` is a English model originally trained by Transabrar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_abr_pipeline_en_5.5.0_3.0_1727169141026.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_abr_pipeline_en_5.5.0_3.0_1727169141026.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_finetuned_abr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_finetuned_abr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_abr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/Transabrar/distilroberta-base-finetuned-abr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_wikitext2_0409nnn_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_wikitext2_0409nnn_pipeline_en.md new file mode 100644 index 00000000000000..dcb38da9572384 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_wikitext2_0409nnn_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_finetuned_wikitext2_0409nnn_pipeline pipeline RoBertaEmbeddings from ntust0 +author: John Snow Labs +name: distilroberta_base_finetuned_wikitext2_0409nnn_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_wikitext2_0409nnn_pipeline` is a English model originally trained by ntust0. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_0409nnn_pipeline_en_5.5.0_3.0_1727168803569.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_0409nnn_pipeline_en_5.5.0_3.0_1727168803569.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_finetuned_wikitext2_0409nnn_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_finetuned_wikitext2_0409nnn_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_wikitext2_0409nnn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/ntust0/distilroberta-base-finetuned-wikitext2-0409nnn + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_wikitext2_aekang12_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_wikitext2_aekang12_en.md new file mode 100644 index 00000000000000..1071bb9791913e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_wikitext2_aekang12_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_finetuned_wikitext2_aekang12 RoBertaEmbeddings from aekang12 +author: John Snow Labs +name: distilroberta_base_finetuned_wikitext2_aekang12 +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_wikitext2_aekang12` is a English model originally trained by aekang12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_aekang12_en_5.5.0_3.0_1727168669067.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_aekang12_en_5.5.0_3.0_1727168669067.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_finetuned_wikitext2_aekang12","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_finetuned_wikitext2_aekang12","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_wikitext2_aekang12| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/aekang12/distilroberta-base-finetuned-wikitext2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_ft_4chan_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_ft_4chan_en.md new file mode 100644 index 00000000000000..4a0b99c8212abf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_ft_4chan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_ft_4chan RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_4chan +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_4chan` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_4chan_en_5.5.0_3.0_1727168947325.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_4chan_en_5.5.0_3.0_1727168947325.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_4chan","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_4chan","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_4chan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-4chan \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilroberts_base_mrpc_glue_jeraldflowers_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilroberts_base_mrpc_glue_jeraldflowers_en.md new file mode 100644 index 00000000000000..06dd0a1ac89519 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilroberts_base_mrpc_glue_jeraldflowers_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberts_base_mrpc_glue_jeraldflowers RoBertaForSequenceClassification from jeraldflowers +author: John Snow Labs +name: distilroberts_base_mrpc_glue_jeraldflowers +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberts_base_mrpc_glue_jeraldflowers` is a English model originally trained by jeraldflowers. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberts_base_mrpc_glue_jeraldflowers_en_5.5.0_3.0_1727171242683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberts_base_mrpc_glue_jeraldflowers_en_5.5.0_3.0_1727171242683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assumes an active SparkSession `spark` and the standard Spark NLP imports
+# (sparknlp.base, sparknlp.annotator, pyspark.ml.Pipeline).
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberts_base_mrpc_glue_jeraldflowers","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberts_base_mrpc_glue_jeraldflowers", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberts_base_mrpc_glue_jeraldflowers| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/jeraldflowers/distilroberts-base-mrpc-glue-jeraldflowers \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-e4a_covid_question_answering_en.md b/docs/_posts/ahmedlone127/2024-09-24-e4a_covid_question_answering_en.md new file mode 100644 index 00000000000000..146fc37cdf7135 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-e4a_covid_question_answering_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English e4a_covid_question_answering BertForQuestionAnswering from racai +author: John Snow Labs +name: e4a_covid_question_answering +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`e4a_covid_question_answering` is a English model originally trained by racai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/e4a_covid_question_answering_en_5.5.0_3.0_1727175350127.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/e4a_covid_question_answering_en_5.5.0_3.0_1727175350127.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assumes an active SparkSession `spark` and the standard Spark NLP imports
+# (sparknlp.base, sparknlp.annotator, pyspark.ml.Pipeline).
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("e4a_covid_question_answering","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("e4a_covid_question_answering", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
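+
+Once fitted and transformed as above, the predicted answer span is available in the `answer` column. A brief, optional follow-up using the same column names:
+
+```python
+# Print the extracted answer for each question/context pair.
+pipelineDF.select("answer.result").show(truncate = False)
+```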
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|e4a_covid_question_answering| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/racai/e4a-covid-question-answering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-e4a_covid_question_answering_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-e4a_covid_question_answering_pipeline_en.md new file mode 100644 index 00000000000000..f16dc162e94f2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-e4a_covid_question_answering_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English e4a_covid_question_answering_pipeline pipeline BertForQuestionAnswering from racai +author: John Snow Labs +name: e4a_covid_question_answering_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`e4a_covid_question_answering_pipeline` is a English model originally trained by racai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/e4a_covid_question_answering_pipeline_en_5.5.0_3.0_1727175376922.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/e4a_covid_question_answering_pipeline_en_5.5.0_3.0_1727175376922.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("e4a_covid_question_answering_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("e4a_covid_question_answering_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
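+
+Because this pretrained pipeline begins with a MultiDocumentAssembler, the input DataFrame `df` needs both a question and a context column. A hypothetical sketch, reusing the column names from the standalone example above (the session start-up call and the sample strings are assumptions):
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+
+df = spark.createDataFrame(
+    [["What framework do I use?", "I use spark-nlp."]]
+).toDF("question", "context")
+
+pipeline = PretrainedPipeline("e4a_covid_question_answering_pipeline", lang = "en")
+pipeline.transform(df).select("answer.result").show(truncate = False)
+```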
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|e4a_covid_question_answering_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/racai/e4a-covid-question-answering + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-emotion_analysis_en.md b/docs/_posts/ahmedlone127/2024-09-24-emotion_analysis_en.md new file mode 100644 index 00000000000000..c446469975c882 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-emotion_analysis_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emotion_analysis DistilBertForSequenceClassification from erlend123 +author: John Snow Labs +name: emotion_analysis +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_analysis` is a English model originally trained by erlend123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_analysis_en_5.5.0_3.0_1727154263205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_analysis_en_5.5.0_3.0_1727154263205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assumes an active SparkSession `spark` and the standard Spark NLP imports
+# (sparknlp.base, sparknlp.annotator, pyspark.ml.Pipeline).
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("emotion_analysis","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("emotion_analysis", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_analysis| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/erlend123/emotion-analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-emotion_analysis_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-emotion_analysis_pipeline_en.md new file mode 100644 index 00000000000000..a963ba80c6dc40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-emotion_analysis_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emotion_analysis_pipeline pipeline DistilBertForSequenceClassification from erlend123 +author: John Snow Labs +name: emotion_analysis_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_analysis_pipeline` is a English model originally trained by erlend123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_analysis_pipeline_en_5.5.0_3.0_1727154284410.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_analysis_pipeline_en_5.5.0_3.0_1727154284410.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("emotion_analysis_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("emotion_analysis_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_analysis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/erlend123/emotion-analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-emotion_vangmayy_en.md b/docs/_posts/ahmedlone127/2024-09-24-emotion_vangmayy_en.md new file mode 100644 index 00000000000000..5b365671ac11ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-emotion_vangmayy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emotion_vangmayy DistilBertForSequenceClassification from Vangmayy +author: John Snow Labs +name: emotion_vangmayy +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_vangmayy` is a English model originally trained by Vangmayy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_vangmayy_en_5.5.0_3.0_1727137293163.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_vangmayy_en_5.5.0_3.0_1727137293163.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assumes an active SparkSession `spark` and the standard Spark NLP imports
+# (sparknlp.base, sparknlp.annotator, pyspark.ml.Pipeline).
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("emotion_vangmayy","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("emotion_vangmayy", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_vangmayy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Vangmayy/emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-fakenews_classifier_nela_gt_en.md b/docs/_posts/ahmedlone127/2024-09-24-fakenews_classifier_nela_gt_en.md new file mode 100644 index 00000000000000..541fc68cf64432 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-fakenews_classifier_nela_gt_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fakenews_classifier_nela_gt RoBertaForSequenceClassification from newsmediabias +author: John Snow Labs +name: fakenews_classifier_nela_gt +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fakenews_classifier_nela_gt` is a English model originally trained by newsmediabias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fakenews_classifier_nela_gt_en_5.5.0_3.0_1727171381995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fakenews_classifier_nela_gt_en_5.5.0_3.0_1727171381995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assumes an active SparkSession `spark` and the standard Spark NLP imports
+# (sparknlp.base, sparknlp.annotator, pyspark.ml.Pipeline).
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("fakenews_classifier_nela_gt","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("fakenews_classifier_nela_gt", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fakenews_classifier_nela_gt| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/newsmediabias/FakeNews-Classifier-NELA-GT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-fakenews_classifier_nela_gt_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-fakenews_classifier_nela_gt_pipeline_en.md new file mode 100644 index 00000000000000..cb585ce19927db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-fakenews_classifier_nela_gt_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fakenews_classifier_nela_gt_pipeline pipeline RoBertaForSequenceClassification from newsmediabias +author: John Snow Labs +name: fakenews_classifier_nela_gt_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fakenews_classifier_nela_gt_pipeline` is a English model originally trained by newsmediabias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fakenews_classifier_nela_gt_pipeline_en_5.5.0_3.0_1727171404886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fakenews_classifier_nela_gt_pipeline_en_5.5.0_3.0_1727171404886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fakenews_classifier_nela_gt_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fakenews_classifier_nela_gt_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fakenews_classifier_nela_gt_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/newsmediabias/FakeNews-Classifier-NELA-GT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finbert_ner_en.md b/docs/_posts/ahmedlone127/2024-09-24-finbert_ner_en.md new file mode 100644 index 00000000000000..99f0b8eae0067f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finbert_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finbert_ner BertForTokenClassification from Rupesh2 +author: John Snow Labs +name: finbert_ner +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finbert_ner` is a English model originally trained by Rupesh2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finbert_ner_en_5.5.0_3.0_1727196324312.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finbert_ner_en_5.5.0_3.0_1727196324312.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assumes an active SparkSession `spark` and the standard Spark NLP imports
+# (sparknlp.base, sparknlp.annotator, pyspark.ml.Pipeline).
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = BertForTokenClassification.pretrained("finbert_ner","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = BertForTokenClassification.pretrained("finbert_ner", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
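+
+The `ner` column then contains one IOB-style tag per token. A short, optional way to view the predictions side by side with the tokens (the exact tag set depends on the fine-tuned model and is not documented on this card):
+
+```python
+# Tokens and their predicted entity tags.
+pipelineDF.select("token.result", "ner.result").show(truncate = False)
+```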
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finbert_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/Rupesh2/finbert-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetune_whisper_tiny_malay_singlish_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetune_whisper_tiny_malay_singlish_en.md new file mode 100644 index 00000000000000..2131efe6e499ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetune_whisper_tiny_malay_singlish_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English finetune_whisper_tiny_malay_singlish WhisperForCTC from mesolitica +author: John Snow Labs +name: finetune_whisper_tiny_malay_singlish +date: 2024-09-24 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetune_whisper_tiny_malay_singlish` is a English model originally trained by mesolitica. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetune_whisper_tiny_malay_singlish_en_5.5.0_3.0_1727144205921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetune_whisper_tiny_malay_singlish_en_5.5.0_3.0_1727144205921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assumes an active SparkSession `spark`, the standard Spark NLP imports
+# (sparknlp.base, sparknlp.annotator, pyspark.ml.Pipeline), and a DataFrame
+# `data` with an "audio_content" column of raw audio samples.
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("finetune_whisper_tiny_malay_singlish","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("finetune_whisper_tiny_malay_singlish", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
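+
+Both snippets above assume a DataFrame `data` whose `audio_content` column holds the raw waveform as an array of floats. A hypothetical way to build it; the librosa dependency, the 16 kHz mono sample rate, and the file name are assumptions, not part of this card:
+
+```python
+import sparknlp
+import librosa
+
+spark = sparknlp.start()
+
+# Load a mono 16 kHz waveform and wrap it in a single-column DataFrame.
+waveform, _ = librosa.load("sample.wav", sr = 16000)
+data = spark.createDataFrame([[waveform.tolist()]]).toDF("audio_content")
+```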
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetune_whisper_tiny_malay_singlish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|378.7 MB| + +## References + +https://huggingface.co/mesolitica/finetune-whisper-tiny-ms-singlish \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetune_whisper_tiny_malay_singlish_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetune_whisper_tiny_malay_singlish_pipeline_en.md new file mode 100644 index 00000000000000..34f86b0f227b89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetune_whisper_tiny_malay_singlish_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English finetune_whisper_tiny_malay_singlish_pipeline pipeline WhisperForCTC from mesolitica +author: John Snow Labs +name: finetune_whisper_tiny_malay_singlish_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetune_whisper_tiny_malay_singlish_pipeline` is a English model originally trained by mesolitica. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetune_whisper_tiny_malay_singlish_pipeline_en_5.5.0_3.0_1727144232676.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetune_whisper_tiny_malay_singlish_pipeline_en_5.5.0_3.0_1727144232676.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetune_whisper_tiny_malay_singlish_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetune_whisper_tiny_malay_singlish_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
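+
+The snippet above assumes `df` already exists. As an illustrative sketch (not part of the original card), the pipeline's AudioAssembler stage expects a column of raw audio samples, assumed here to be named `audio_content`; `raw_floats` is a placeholder for float samples you load yourself (e.g. 16 kHz mono):
+
+```python
+# raw_floats: placeholder list of float audio samples loaded elsewhere
+df = spark.createDataFrame([[raw_floats]]).toDF("audio_content")
+annotations = pipeline.transform(df)
+```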
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetune_whisper_tiny_malay_singlish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|378.7 MB| + +## References + +https://huggingface.co/mesolitica/finetune-whisper-tiny-ms-singlish + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuned_demo_2_shardev_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuned_demo_2_shardev_en.md new file mode 100644 index 00000000000000..b6e507b5440b5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuned_demo_2_shardev_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_demo_2_shardev DistilBertForSequenceClassification from Shardev +author: John Snow Labs +name: finetuned_demo_2_shardev +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_demo_2_shardev` is a English model originally trained by Shardev. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_demo_2_shardev_en_5.5.0_3.0_1727164141723.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_demo_2_shardev_en_5.5.0_3.0_1727164141723.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_demo_2_shardev","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_demo_2_shardev", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_demo_2_shardev| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/Shardev/finetuned_demo_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuned_distilroberta_base_semeval_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuned_distilroberta_base_semeval_pipeline_en.md new file mode 100644 index 00000000000000..2ea0b8373bc7d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuned_distilroberta_base_semeval_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_distilroberta_base_semeval_pipeline pipeline RoBertaForSequenceClassification from Youssef320 +author: John Snow Labs +name: finetuned_distilroberta_base_semeval_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_distilroberta_base_semeval_pipeline` is a English model originally trained by Youssef320. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_distilroberta_base_semeval_pipeline_en_5.5.0_3.0_1727172137087.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_distilroberta_base_semeval_pipeline_en_5.5.0_3.0_1727172137087.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_distilroberta_base_semeval_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_distilroberta_base_semeval_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
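+
+The snippet above assumes an existing DataFrame `df`. A minimal sketch of building one (assuming an active SparkSession `spark` and that the pipeline's DocumentAssembler reads a `text` column; the sample sentence is only illustrative):
+
+```python
+# One-row DataFrame with the "text" column the pipeline reads from
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+annotations = pipeline.transform(df)
+```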
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_distilroberta_base_semeval_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.9 MB| + +## References + +https://huggingface.co/Youssef320/finetuned-distilroberta-base-SemEval + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetunedemotionmodel_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetunedemotionmodel_en.md new file mode 100644 index 00000000000000..651b6325330928 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetunedemotionmodel_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetunedemotionmodel DistilBertForSequenceClassification from Rishabh3108 +author: John Snow Labs +name: finetunedemotionmodel +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetunedemotionmodel` is a English model originally trained by Rishabh3108. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetunedemotionmodel_en_5.5.0_3.0_1727164242358.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetunedemotionmodel_en_5.5.0_3.0_1727164242358.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetunedemotionmodel","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetunedemotionmodel", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetunedemotionmodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Rishabh3108/finetunedemotionmodel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_analysis_model_3000_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_analysis_model_3000_en.md new file mode 100644 index 00000000000000..0527c202a11c96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_analysis_model_3000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_analysis_model_3000 DistilBertForSequenceClassification from gmvchile +author: John Snow Labs +name: finetuning_sentiment_analysis_model_3000 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_analysis_model_3000` is a English model originally trained by gmvchile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_model_3000_en_5.5.0_3.0_1727154517311.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_model_3000_en_5.5.0_3.0_1727154517311.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_analysis_model_3000","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_analysis_model_3000", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_analysis_model_3000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gmvchile/finetuning-sentiment-analysis-model-3000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_analysis_model_3000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_analysis_model_3000_pipeline_en.md new file mode 100644 index 00000000000000..25c5111d48de9f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_analysis_model_3000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_analysis_model_3000_pipeline pipeline DistilBertForSequenceClassification from gmvchile +author: John Snow Labs +name: finetuning_sentiment_analysis_model_3000_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_analysis_model_3000_pipeline` is a English model originally trained by gmvchile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_model_3000_pipeline_en_5.5.0_3.0_1727154531964.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_model_3000_pipeline_en_5.5.0_3.0_1727154531964.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_analysis_model_3000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_analysis_model_3000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_analysis_model_3000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gmvchile/finetuning-sentiment-analysis-model-3000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_dnzy_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_dnzy_en.md new file mode 100644 index 00000000000000..a2be4f3f3918c8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_dnzy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_dnzy DistilBertForSequenceClassification from DNZY +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_dnzy +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_dnzy` is a English model originally trained by DNZY. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_dnzy_en_5.5.0_3.0_1727164728881.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_dnzy_en_5.5.0_3.0_1727164728881.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_dnzy","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_dnzy", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_dnzy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DNZY/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_yudingwang_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_yudingwang_en.md new file mode 100644 index 00000000000000..56816ec02f03ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_yudingwang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_yudingwang DistilBertForSequenceClassification from YudingWang +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_yudingwang +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_yudingwang` is a English model originally trained by YudingWang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_yudingwang_en_5.5.0_3.0_1727137478682.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_yudingwang_en_5.5.0_3.0_1727137478682.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_yudingwang","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_yudingwang", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_yudingwang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/YudingWang/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3500_samples_train_kurtbadelt_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3500_samples_train_kurtbadelt_pipeline_en.md new file mode 100644 index 00000000000000..5c7d287a875ef1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3500_samples_train_kurtbadelt_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3500_samples_train_kurtbadelt_pipeline pipeline DistilBertForSequenceClassification from KurtBadelt +author: John Snow Labs +name: finetuning_sentiment_model_3500_samples_train_kurtbadelt_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3500_samples_train_kurtbadelt_pipeline` is a English model originally trained by KurtBadelt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3500_samples_train_kurtbadelt_pipeline_en_5.5.0_3.0_1727154284385.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3500_samples_train_kurtbadelt_pipeline_en_5.5.0_3.0_1727154284385.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3500_samples_train_kurtbadelt_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3500_samples_train_kurtbadelt_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3500_samples_train_kurtbadelt_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KurtBadelt/finetuning-sentiment-model-3500-samples-train + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_5000_samples_leonardosegurat_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_5000_samples_leonardosegurat_pipeline_en.md new file mode 100644 index 00000000000000..80a99b84638829 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_5000_samples_leonardosegurat_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_5000_samples_leonardosegurat_pipeline pipeline DistilBertForSequenceClassification from leonardosegurat +author: John Snow Labs +name: finetuning_sentiment_model_5000_samples_leonardosegurat_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_5000_samples_leonardosegurat_pipeline` is a English model originally trained by leonardosegurat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_samples_leonardosegurat_pipeline_en_5.5.0_3.0_1727137501296.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_samples_leonardosegurat_pipeline_en_5.5.0_3.0_1727137501296.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_5000_samples_leonardosegurat_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_5000_samples_leonardosegurat_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_5000_samples_leonardosegurat_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/leonardosegurat/finetuning-sentiment-model-5000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-frabert_distilbert_base_uncased_augmented_en.md b/docs/_posts/ahmedlone127/2024-09-24-frabert_distilbert_base_uncased_augmented_en.md new file mode 100644 index 00000000000000..27ff9a22e324df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-frabert_distilbert_base_uncased_augmented_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English frabert_distilbert_base_uncased_augmented DistilBertForSequenceClassification from Francesco0101 +author: John Snow Labs +name: frabert_distilbert_base_uncased_augmented +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`frabert_distilbert_base_uncased_augmented` is a English model originally trained by Francesco0101. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frabert_distilbert_base_uncased_augmented_en_5.5.0_3.0_1727164141989.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frabert_distilbert_base_uncased_augmented_en_5.5.0_3.0_1727164141989.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("frabert_distilbert_base_uncased_augmented","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("frabert_distilbert_base_uncased_augmented", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frabert_distilbert_base_uncased_augmented| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Francesco0101/FRABERT-distilbert-base-uncased-AUGMENTED \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-hate_hate_balance_random0_seed2_bertweet_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-hate_hate_balance_random0_seed2_bertweet_large_pipeline_en.md new file mode 100644 index 00000000000000..c561a7d782168e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-hate_hate_balance_random0_seed2_bertweet_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_balance_random0_seed2_bertweet_large_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random0_seed2_bertweet_large_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random0_seed2_bertweet_large_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random0_seed2_bertweet_large_pipeline_en_5.5.0_3.0_1727171820471.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random0_seed2_bertweet_large_pipeline_en_5.5.0_3.0_1727171820471.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hate_hate_balance_random0_seed2_bertweet_large_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hate_hate_balance_random0_seed2_bertweet_large_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random0_seed2_bertweet_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random0_seed2-bertweet-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-homophobicclassfication_roberta_large_finetuned_model2_en.md b/docs/_posts/ahmedlone127/2024-09-24-homophobicclassfication_roberta_large_finetuned_model2_en.md new file mode 100644 index 00000000000000..5161c18f6630dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-homophobicclassfication_roberta_large_finetuned_model2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English homophobicclassfication_roberta_large_finetuned_model2 RoBertaForSequenceClassification from conorgee +author: John Snow Labs +name: homophobicclassfication_roberta_large_finetuned_model2 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`homophobicclassfication_roberta_large_finetuned_model2` is a English model originally trained by conorgee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/homophobicclassfication_roberta_large_finetuned_model2_en_5.5.0_3.0_1727168146073.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/homophobicclassfication_roberta_large_finetuned_model2_en_5.5.0_3.0_1727168146073.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("homophobicclassfication_roberta_large_finetuned_model2","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("homophobicclassfication_roberta_large_finetuned_model2", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|homophobicclassfication_roberta_large_finetuned_model2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/conorgee/HomophobicClassfication_roberta-large_fineTuned_model2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-homophobicclassfication_roberta_large_finetuned_model2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-homophobicclassfication_roberta_large_finetuned_model2_pipeline_en.md new file mode 100644 index 00000000000000..4972b5c64c7c2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-homophobicclassfication_roberta_large_finetuned_model2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English homophobicclassfication_roberta_large_finetuned_model2_pipeline pipeline RoBertaForSequenceClassification from conorgee +author: John Snow Labs +name: homophobicclassfication_roberta_large_finetuned_model2_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`homophobicclassfication_roberta_large_finetuned_model2_pipeline` is a English model originally trained by conorgee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/homophobicclassfication_roberta_large_finetuned_model2_pipeline_en_5.5.0_3.0_1727168228310.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/homophobicclassfication_roberta_large_finetuned_model2_pipeline_en_5.5.0_3.0_1727168228310.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("homophobicclassfication_roberta_large_finetuned_model2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("homophobicclassfication_roberta_large_finetuned_model2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|homophobicclassfication_roberta_large_finetuned_model2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/conorgee/HomophobicClassfication_roberta-large_fineTuned_model2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-imperialism_ner_en.md b/docs/_posts/ahmedlone127/2024-09-24-imperialism_ner_en.md new file mode 100644 index 00000000000000..a7bc15efca3a43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-imperialism_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English imperialism_ner RoBertaForTokenClassification from matthewleechen +author: John Snow Labs +name: imperialism_ner +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imperialism_ner` is a English model originally trained by matthewleechen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imperialism_ner_en_5.5.0_3.0_1727150917558.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imperialism_ner_en_5.5.0_3.0_1727150917558.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("imperialism_ner","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("imperialism_ner", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imperialism_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/matthewleechen/imperialism-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-imperialism_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-imperialism_ner_pipeline_en.md new file mode 100644 index 00000000000000..c8c57103596a2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-imperialism_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English imperialism_ner_pipeline pipeline RoBertaForTokenClassification from matthewleechen +author: John Snow Labs +name: imperialism_ner_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imperialism_ner_pipeline` is a English model originally trained by matthewleechen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imperialism_ner_pipeline_en_5.5.0_3.0_1727151001241.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imperialism_ner_pipeline_en_5.5.0_3.0_1727151001241.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("imperialism_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("imperialism_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imperialism_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/matthewleechen/imperialism-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-italian_legal_bert_finetuned_squad_italian_it.md b/docs/_posts/ahmedlone127/2024-09-24-italian_legal_bert_finetuned_squad_italian_it.md new file mode 100644 index 00000000000000..abb426e2138266 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-italian_legal_bert_finetuned_squad_italian_it.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Italian italian_legal_bert_finetuned_squad_italian BertForQuestionAnswering from Decre99 +author: John Snow Labs +name: italian_legal_bert_finetuned_squad_italian +date: 2024-09-24 +tags: [it, open_source, onnx, question_answering, bert] +task: Question Answering +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`italian_legal_bert_finetuned_squad_italian` is a Italian model originally trained by Decre99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/italian_legal_bert_finetuned_squad_italian_it_5.5.0_3.0_1727163557917.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/italian_legal_bert_finetuned_squad_italian_it_5.5.0_3.0_1727163557917.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("italian_legal_bert_finetuned_squad_italian","it") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols("question", "context")
+    .setOutputCols("document_question", "document_context")
+
+val spanClassifier = BertForQuestionAnswering.pretrained("italian_legal_bert_finetuned_squad_italian", "it")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDS.toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|italian_legal_bert_finetuned_squad_italian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|it| +|Size:|408.9 MB| + +## References + +https://huggingface.co/Decre99/Italian-Legal-BERT-finetuned-squad-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-italian_legal_bert_finetuned_squad_italian_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-24-italian_legal_bert_finetuned_squad_italian_pipeline_it.md new file mode 100644 index 00000000000000..db39f2164297b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-italian_legal_bert_finetuned_squad_italian_pipeline_it.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Italian italian_legal_bert_finetuned_squad_italian_pipeline pipeline BertForQuestionAnswering from Decre99 +author: John Snow Labs +name: italian_legal_bert_finetuned_squad_italian_pipeline +date: 2024-09-24 +tags: [it, open_source, pipeline, onnx] +task: Question Answering +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`italian_legal_bert_finetuned_squad_italian_pipeline` is a Italian model originally trained by Decre99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/italian_legal_bert_finetuned_squad_italian_pipeline_it_5.5.0_3.0_1727163579981.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/italian_legal_bert_finetuned_squad_italian_pipeline_it_5.5.0_3.0_1727163579981.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("italian_legal_bert_finetuned_squad_italian_pipeline", lang = "it") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("italian_legal_bert_finetuned_squad_italian_pipeline", lang = "it") +val annotations = pipeline.transform(df) + +``` +
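+
+Here `df` is assumed to already exist. Because this pipeline starts with a MultiDocumentAssembler, a reasonable sketch is a question/context DataFrame (the column names and the Italian example text are assumptions for illustration):
+
+```python
+# Question/context pair for extractive QA; column names are an assumption
+df = spark.createDataFrame(
+    [["Qual è la capitale d'Italia?", "La capitale d'Italia è Roma."]]
+).toDF("question", "context")
+annotations = pipeline.transform(df)
+```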
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|italian_legal_bert_finetuned_squad_italian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|408.9 MB| + +## References + +https://huggingface.co/Decre99/Italian-Legal-BERT-finetuned-squad-it + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-jailbreak_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-24-jailbreak_classifier_en.md new file mode 100644 index 00000000000000..73c25d68b9f9fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-jailbreak_classifier_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English jailbreak_classifier BertForSequenceClassification from jackhhao +author: John Snow Labs +name: jailbreak_classifier +date: 2024-09-24 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jailbreak_classifier` is a English model originally trained by jackhhao. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jailbreak_classifier_en_5.5.0_3.0_1727149486052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jailbreak_classifier_en_5.5.0_3.0_1727149486052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+document_assembler = DocumentAssembler()\
+    .setInputCol("text")\
+    .setOutputCol("document")
+
+tokenizer = Tokenizer()\
+    .setInputCols("document")\
+    .setOutputCol("token")
+
+sequenceClassifier = BertForSequenceClassification.pretrained("jailbreak_classifier","en")\
+    .setInputCols(["document","token"])\
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val sequenceClassifier = BertForSequenceClassification.pretrained("jailbreak_classifier","en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+</div>
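+
+As a small usage note (an illustrative addition, not part of the original card): `class` is a Spark NLP annotation column, so the predicted labels can be inspected with a plain DataFrame select:
+
+```python
+# Show each input text next to its predicted label(s)
+result.select("text", "class.result").show(truncate=False)
+```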
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jailbreak_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +References + +https://huggingface.co/jackhhao/jailbreak-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-jailbreak_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-jailbreak_classifier_pipeline_en.md new file mode 100644 index 00000000000000..4806cdfc122de2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-jailbreak_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English jailbreak_classifier_pipeline pipeline BertForSequenceClassification from lordofthejars +author: John Snow Labs +name: jailbreak_classifier_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jailbreak_classifier_pipeline` is a English model originally trained by lordofthejars. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jailbreak_classifier_pipeline_en_5.5.0_3.0_1727149507802.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jailbreak_classifier_pipeline_en_5.5.0_3.0_1727149507802.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("jailbreak_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("jailbreak_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
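
Here `df` is any DataFrame with a raw text column. A minimal end-to-end sketch, assuming the pipeline reads a column named `text` and that Spark NLP is started with default settings:

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start (or reuse) a Spark session with Spark NLP on the classpath.
spark = sparknlp.start()

pipeline = PretrainedPipeline("jailbreak_classifier_pipeline", lang="en")

# DataFrame route: the input column name "text" is an assumption.
df = spark.createDataFrame([["PUT YOUR PROMPT HERE"]]).toDF("text")
annotations = pipeline.transform(df)

# Single-string route: annotate() returns a plain dict keyed by output column.
print(pipeline.annotate("PUT YOUR PROMPT HERE"))
```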
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jailbreak_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/lordofthejars/jailbreak-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-joe_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-24-joe_roberta_en.md new file mode 100644 index 00000000000000..92bdd208ac8b93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-joe_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English joe_roberta RoBertaForSequenceClassification from Gikubu +author: John Snow Labs +name: joe_roberta +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`joe_roberta` is a English model originally trained by Gikubu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/joe_roberta_en_5.5.0_3.0_1727167617417.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/joe_roberta_en_5.5.0_3.0_1727167617417.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("joe_roberta","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("joe_roberta", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|joe_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|444.0 MB| + +## References + +https://huggingface.co/Gikubu/joe_roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-khipu_finetuned_amazon_reviews_multi_cpiana_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-khipu_finetuned_amazon_reviews_multi_cpiana_pipeline_en.md new file mode 100644 index 00000000000000..c3a1eee62c0e96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-khipu_finetuned_amazon_reviews_multi_cpiana_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English khipu_finetuned_amazon_reviews_multi_cpiana_pipeline pipeline RoBertaForSequenceClassification from cpiana +author: John Snow Labs +name: khipu_finetuned_amazon_reviews_multi_cpiana_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`khipu_finetuned_amazon_reviews_multi_cpiana_pipeline` is a English model originally trained by cpiana. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/khipu_finetuned_amazon_reviews_multi_cpiana_pipeline_en_5.5.0_3.0_1727167329955.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/khipu_finetuned_amazon_reviews_multi_cpiana_pipeline_en_5.5.0_3.0_1727167329955.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("khipu_finetuned_amazon_reviews_multi_cpiana_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("khipu_finetuned_amazon_reviews_multi_cpiana_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|khipu_finetuned_amazon_reviews_multi_cpiana_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|428.9 MB| + +## References + +https://huggingface.co/cpiana/khipu-finetuned-amazon_reviews_multi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_en.md b/docs/_posts/ahmedlone127/2024-09-24-kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_en.md new file mode 100644 index 00000000000000..0835043838713f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3 RoBertaForSequenceClassification from RogerB +author: John Snow Labs +name: kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_en_5.5.0_3.0_1727171180880.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_en_5.5.0_3.0_1727171180880.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.3 MB| + +## References + +https://huggingface.co/RogerB/kinyaRoberta-large-kinte-finetuned-kin-sent3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_pipeline_en.md new file mode 100644 index 00000000000000..014eb6a44babdb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_pipeline pipeline RoBertaForSequenceClassification from RogerB +author: John Snow Labs +name: kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_pipeline` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_pipeline_en_5.5.0_3.0_1727171201410.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_pipeline_en_5.5.0_3.0_1727171201410.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kinyaroberta_large_kinte_finetuned_kinyarwanda_sent3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.3 MB| + +## References + +https://huggingface.co/RogerB/kinyaRoberta-large-kinte-finetuned-kin-sent3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-legal_bert_small_filtered_cuad_en.md b/docs/_posts/ahmedlone127/2024-09-24-legal_bert_small_filtered_cuad_en.md new file mode 100644 index 00000000000000..6e27fb91a9a4e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-legal_bert_small_filtered_cuad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English legal_bert_small_filtered_cuad BertForQuestionAnswering from alex-apostolo +author: John Snow Labs +name: legal_bert_small_filtered_cuad +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_bert_small_filtered_cuad` is a English model originally trained by alex-apostolo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_bert_small_filtered_cuad_en_5.5.0_3.0_1727175449083.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_bert_small_filtered_cuad_en_5.5.0_3.0_1727175449083.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = BertForQuestionAnswering.pretrained("legal_bert_small_filtered_cuad","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new MultiDocumentAssembler()
  .setInputCols(Array("question", "context"))
  .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = BertForQuestionAnswering.pretrained("legal_bert_small_filtered_cuad", "en")
  .setInputCols(Array("document_question","document_context"))
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
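
The extracted span lands in the `answer` annotation column configured above; a minimal sketch for reading it, reusing the column names from the example:

```python
# Sketch: show each question next to the extracted answer span.
pipelineDF.selectExpr(
    "document_question.result as question",
    "answer.result as answer"
).show(truncate=False)
```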
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_bert_small_filtered_cuad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|130.6 MB| + +## References + +https://huggingface.co/alex-apostolo/legal-bert-small-filtered-cuad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-lnmt15_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-lnmt15_pipeline_en.md new file mode 100644 index 00000000000000..ffb6fa284cda97 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-lnmt15_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lnmt15_pipeline pipeline DistilBertForSequenceClassification from carmenlozano +author: John Snow Labs +name: lnmt15_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lnmt15_pipeline` is a English model originally trained by carmenlozano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lnmt15_pipeline_en_5.5.0_3.0_1727154851421.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lnmt15_pipeline_en_5.5.0_3.0_1727154851421.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lnmt15_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lnmt15_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lnmt15_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/carmenlozano/lnmt15 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-malayalam_qa_model_pipeline_ml.md b/docs/_posts/ahmedlone127/2024-09-24-malayalam_qa_model_pipeline_ml.md new file mode 100644 index 00000000000000..dfd857f9b83fd0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-malayalam_qa_model_pipeline_ml.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Malayalam malayalam_qa_model_pipeline pipeline BertForQuestionAnswering from Anitha2020 +author: John Snow Labs +name: malayalam_qa_model_pipeline +date: 2024-09-24 +tags: [ml, open_source, pipeline, onnx] +task: Question Answering +language: ml +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`malayalam_qa_model_pipeline` is a Malayalam model originally trained by Anitha2020. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/malayalam_qa_model_pipeline_ml_5.5.0_3.0_1727163232993.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/malayalam_qa_model_pipeline_ml_5.5.0_3.0_1727163232993.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("malayalam_qa_model_pipeline", lang = "ml") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("malayalam_qa_model_pipeline", lang = "ml") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|malayalam_qa_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ml| +|Size:|890.5 MB| + +## References + +https://huggingface.co/Anitha2020/Malayalam_QA_model + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-mbert_argmining_abstrct_english_spanish_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-24-mbert_argmining_abstrct_english_spanish_pipeline_es.md new file mode 100644 index 00000000000000..c02f37279a3b05 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-mbert_argmining_abstrct_english_spanish_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish mbert_argmining_abstrct_english_spanish_pipeline pipeline BertForTokenClassification from HiTZ +author: John Snow Labs +name: mbert_argmining_abstrct_english_spanish_pipeline +date: 2024-09-24 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mbert_argmining_abstrct_english_spanish_pipeline` is a Castilian, Spanish model originally trained by HiTZ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mbert_argmining_abstrct_english_spanish_pipeline_es_5.5.0_3.0_1727195893132.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mbert_argmining_abstrct_english_spanish_pipeline_es_5.5.0_3.0_1727195893132.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mbert_argmining_abstrct_english_spanish_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mbert_argmining_abstrct_english_spanish_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mbert_argmining_abstrct_english_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|665.1 MB| + +## References + +https://huggingface.co/HiTZ/mbert-argmining-abstrct-en-es + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-memo_bert_wsd_memo_bert_danskbert_last_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-memo_bert_wsd_memo_bert_danskbert_last_pipeline_en.md new file mode 100644 index 00000000000000..8e6643f5f0d155 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-memo_bert_wsd_memo_bert_danskbert_last_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English memo_bert_wsd_memo_bert_danskbert_last_pipeline pipeline XlmRoBertaForSequenceClassification from yemen2016 +author: John Snow Labs +name: memo_bert_wsd_memo_bert_danskbert_last_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`memo_bert_wsd_memo_bert_danskbert_last_pipeline` is a English model originally trained by yemen2016. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/memo_bert_wsd_memo_bert_danskbert_last_pipeline_en_5.5.0_3.0_1727155837014.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/memo_bert_wsd_memo_bert_danskbert_last_pipeline_en_5.5.0_3.0_1727155837014.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("memo_bert_wsd_memo_bert_danskbert_last_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("memo_bert_wsd_memo_bert_danskbert_last_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|memo_bert_wsd_memo_bert_danskbert_last_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|428.3 MB| + +## References + +https://huggingface.co/yemen2016/MeMo_BERT-WSD-MeMo-BERT-DanskBERT_last + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-multi_label_classification_venkatarajendra_en.md b/docs/_posts/ahmedlone127/2024-09-24-multi_label_classification_venkatarajendra_en.md new file mode 100644 index 00000000000000..1a365ff8e0368a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-multi_label_classification_venkatarajendra_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English multi_label_classification_venkatarajendra RoBertaForSequenceClassification from venkatarajendra +author: John Snow Labs +name: multi_label_classification_venkatarajendra +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multi_label_classification_venkatarajendra` is a English model originally trained by venkatarajendra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multi_label_classification_venkatarajendra_en_5.5.0_3.0_1727171494775.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multi_label_classification_venkatarajendra_en_5.5.0_3.0_1727171494775.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("multi_label_classification_venkatarajendra","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("multi_label_classification_venkatarajendra", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
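
For single-document scoring without building a DataFrame, the fitted pipeline can be wrapped in a `LightPipeline`; a minimal sketch, assuming `pipelineModel` was fit as shown above:

```python
from sparknlp.base import LightPipeline

# In-memory, low-latency inference over the fitted PipelineModel.
light = LightPipeline(pipelineModel)

# annotate() returns a dict keyed by output column ("class" in the example above).
print(light.annotate("I love spark-nlp")["class"])
```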
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multi_label_classification_venkatarajendra| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|457.1 MB| + +## References + +https://huggingface.co/venkatarajendra/multi-label-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-multilingual_xlm_roberta_for_ner_yvzplay2_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-24-multilingual_xlm_roberta_for_ner_yvzplay2_pipeline_xx.md new file mode 100644 index 00000000000000..d4ed0efac246f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-multilingual_xlm_roberta_for_ner_yvzplay2_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual multilingual_xlm_roberta_for_ner_yvzplay2_pipeline pipeline XlmRoBertaForTokenClassification from yvzplay2 +author: John Snow Labs +name: multilingual_xlm_roberta_for_ner_yvzplay2_pipeline +date: 2024-09-24 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multilingual_xlm_roberta_for_ner_yvzplay2_pipeline` is a Multilingual model originally trained by yvzplay2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multilingual_xlm_roberta_for_ner_yvzplay2_pipeline_xx_5.5.0_3.0_1727160807499.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multilingual_xlm_roberta_for_ner_yvzplay2_pipeline_xx_5.5.0_3.0_1727160807499.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("multilingual_xlm_roberta_for_ner_yvzplay2_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("multilingual_xlm_roberta_for_ner_yvzplay2_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
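
As with the other pretrained pipelines, `annotate()` is a convenient alternative to `transform()` for a single string. A minimal sketch; the `token` and `ner` output keys are assumptions about this pipeline's column names:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("multilingual_xlm_roberta_for_ner_yvzplay2_pipeline", lang="xx")

# annotate() returns a dict of lists; pair each token with its predicted tag.
annotations = pipeline.annotate("John Snow Labs est basé à Wilmington")
print(list(zip(annotations["token"], annotations["ner"])))
```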
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multilingual_xlm_roberta_for_ner_yvzplay2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|853.8 MB| + +## References + +https://huggingface.co/yvzplay2/multilingual-xlm-roberta-for-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_en.md b/docs/_posts/ahmedlone127/2024-09-24-multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_en.md new file mode 100644 index 00000000000000..42096c49264ee2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English multipleqg_full_ctxt_only_filtered_0_15_pubmedbert BertForQuestionAnswering from LeWince +author: John Snow Labs +name: multipleqg_full_ctxt_only_filtered_0_15_pubmedbert +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multipleqg_full_ctxt_only_filtered_0_15_pubmedbert` is a English model originally trained by LeWince. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_en_5.5.0_3.0_1727175563624.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_en_5.5.0_3.0_1727175563624.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("multipleqg_full_ctxt_only_filtered_0_15_pubmedbert","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("multipleqg_full_ctxt_only_filtered_0_15_pubmedbert", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multipleqg_full_ctxt_only_filtered_0_15_pubmedbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|53.7 MB| + +## References + +https://huggingface.co/LeWince/MultipleQG-Full_Ctxt_Only-filtered_0_15_PubMedBert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_pipeline_en.md new file mode 100644 index 00000000000000..31878ea2460834 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_pipeline pipeline BertForQuestionAnswering from LeWince +author: John Snow Labs +name: multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_pipeline` is a English model originally trained by LeWince. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_pipeline_en_5.5.0_3.0_1727175566545.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_pipeline_en_5.5.0_3.0_1727175566545.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multipleqg_full_ctxt_only_filtered_0_15_pubmedbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|53.7 MB| + +## References + +https://huggingface.co/LeWince/MultipleQG-Full_Ctxt_Only-filtered_0_15_PubMedBert + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-mymodel_cased_en.md b/docs/_posts/ahmedlone127/2024-09-24-mymodel_cased_en.md new file mode 100644 index 00000000000000..36555058747233 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-mymodel_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mymodel_cased DistilBertForSequenceClassification from AkhilGTom +author: John Snow Labs +name: mymodel_cased +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mymodel_cased` is a English model originally trained by AkhilGTom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mymodel_cased_en_5.5.0_3.0_1727154613983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mymodel_cased_en_5.5.0_3.0_1727154613983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("mymodel_cased","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("mymodel_cased", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
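
A fitted `PipelineModel` can be saved once and reloaded in a separate scoring job through the standard Spark ML persistence API; a minimal sketch (the path is only a placeholder):

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline, then reload and score with it.
pipelineModel.write().overwrite().save("/tmp/mymodel_cased_spark_nlp")
reloaded = PipelineModel.load("/tmp/mymodel_cased_spark_nlp")
reloaded.transform(data).select("class.result").show(truncate=False)
```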
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mymodel_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/AkhilGTom/myModel_cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-mymodel_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-mymodel_cased_pipeline_en.md new file mode 100644 index 00000000000000..c56eb84806916a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-mymodel_cased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mymodel_cased_pipeline pipeline DistilBertForSequenceClassification from AkhilGTom +author: John Snow Labs +name: mymodel_cased_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mymodel_cased_pipeline` is a English model originally trained by AkhilGTom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mymodel_cased_pipeline_en_5.5.0_3.0_1727154627542.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mymodel_cased_pipeline_en_5.5.0_3.0_1727154627542.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mymodel_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mymodel_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mymodel_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/AkhilGTom/myModel_cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-mymodel_en.md b/docs/_posts/ahmedlone127/2024-09-24-mymodel_en.md new file mode 100644 index 00000000000000..3a84f89c60d9a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-mymodel_en.md @@ -0,0 +1,92 @@ +--- +layout: model +title: English mymodel BertEmbeddings from heima +author: John Snow Labs +name: mymodel +date: 2024-09-24 +tags: [bert, en, open_source, fill_mask, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mymodel` is a English model originally trained by heima. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mymodel_en_5.5.0_3.0_1727171496587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mymodel_en_5.5.0_3.0_1727171496587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

tokenizer = Tokenizer() \
    .setInputCols(["documents"]) \
    .setOutputCol("token")

embeddings = BertEmbeddings.pretrained("mymodel","en") \
    .setInputCols(["documents","token"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([document_assembler, tokenizer, embeddings])

data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipelineModel = pipeline.fit(data)

pipelineDF = pipelineModel.transform(data)
```
```scala
val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("documents")

val tokenizer = new Tokenizer()
  .setInputCols(Array("documents"))
  .setOutputCol("token")

val embeddings = BertEmbeddings
  .pretrained("mymodel", "en")
  .setInputCols(Array("documents","token"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, embeddings))

val data = Seq("I love spark-nlp").toDS.toDF("text")

val pipelineModel = pipeline.fit(data)

val pipelineDF = pipelineModel.transform(data)
```
</div>
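
Each token annotation carries its vector in the `embeddings` field of the `embeddings` column set above; a minimal sketch for flattening the output to one row per token:

```python
from pyspark.sql import functions as F

# Sketch: one row per token, with the token text and its embedding vector.
pipelineDF.select(F.explode("embeddings").alias("emb")) \
    .select(F.col("emb.result").alias("token"),
            F.col("emb.embeddings").alias("vector")) \
    .show(5)
```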
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mymodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.7 MB| + +## References + +References + +https://huggingface.co/heima/mymodel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-n_distilbert_sst5_padding0model_wyzhw_en.md b/docs/_posts/ahmedlone127/2024-09-24-n_distilbert_sst5_padding0model_wyzhw_en.md new file mode 100644 index 00000000000000..ef12632f5493c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-n_distilbert_sst5_padding0model_wyzhw_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_distilbert_sst5_padding0model_wyzhw DistilBertForSequenceClassification from wyzhw +author: John Snow Labs +name: n_distilbert_sst5_padding0model_wyzhw +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_sst5_padding0model_wyzhw` is a English model originally trained by wyzhw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding0model_wyzhw_en_5.5.0_3.0_1727136932805.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding0model_wyzhw_en_5.5.0_3.0_1727136932805.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst5_padding0model_wyzhw","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst5_padding0model_wyzhw", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_sst5_padding0model_wyzhw| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/wyzhw/N_distilbert_sst5_padding0model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-ndd_mantisbt_test_content_tags_en.md b/docs/_posts/ahmedlone127/2024-09-24-ndd_mantisbt_test_content_tags_en.md new file mode 100644 index 00000000000000..9b68af8d9b668f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-ndd_mantisbt_test_content_tags_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ndd_mantisbt_test_content_tags DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: ndd_mantisbt_test_content_tags +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ndd_mantisbt_test_content_tags` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ndd_mantisbt_test_content_tags_en_5.5.0_3.0_1727164635187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ndd_mantisbt_test_content_tags_en_5.5.0_3.0_1727164635187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("ndd_mantisbt_test_content_tags","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("ndd_mantisbt_test_content_tags", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ndd_mantisbt_test_content_tags| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/NDD-mantisbt_test-content_tags \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-ndd_mantisbt_test_content_tags_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-ndd_mantisbt_test_content_tags_pipeline_en.md new file mode 100644 index 00000000000000..fca31305b64250 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-ndd_mantisbt_test_content_tags_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ndd_mantisbt_test_content_tags_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: ndd_mantisbt_test_content_tags_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ndd_mantisbt_test_content_tags_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ndd_mantisbt_test_content_tags_pipeline_en_5.5.0_3.0_1727164648732.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ndd_mantisbt_test_content_tags_pipeline_en_5.5.0_3.0_1727164648732.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ndd_mantisbt_test_content_tags_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ndd_mantisbt_test_content_tags_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ndd_mantisbt_test_content_tags_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/NDD-mantisbt_test-content_tags + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-ner_ner_random2_seed2_roberta_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-ner_ner_random2_seed2_roberta_large_pipeline_en.md new file mode 100644 index 00000000000000..3b55f0f4b46b28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-ner_ner_random2_seed2_roberta_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ner_ner_random2_seed2_roberta_large_pipeline pipeline RoBertaForTokenClassification from tweettemposhift +author: John Snow Labs +name: ner_ner_random2_seed2_roberta_large_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_ner_random2_seed2_roberta_large_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_ner_random2_seed2_roberta_large_pipeline_en_5.5.0_3.0_1727151489651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_ner_random2_seed2_roberta_large_pipeline_en_5.5.0_3.0_1727151489651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ner_ner_random2_seed2_roberta_large_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ner_ner_random2_seed2_roberta_large_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_ner_random2_seed2_roberta_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/ner-ner_random2_seed2-roberta-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-nerubios_roberta_base_bne_training_testing_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-nerubios_roberta_base_bne_training_testing_pipeline_en.md new file mode 100644 index 00000000000000..3f91dfc23de7a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-nerubios_roberta_base_bne_training_testing_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nerubios_roberta_base_bne_training_testing_pipeline pipeline RoBertaForTokenClassification from ajtamayoh +author: John Snow Labs +name: nerubios_roberta_base_bne_training_testing_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerubios_roberta_base_bne_training_testing_pipeline` is a English model originally trained by ajtamayoh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerubios_roberta_base_bne_training_testing_pipeline_en_5.5.0_3.0_1727151576531.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerubios_roberta_base_bne_training_testing_pipeline_en_5.5.0_3.0_1727151576531.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nerubios_roberta_base_bne_training_testing_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nerubios_roberta_base_bne_training_testing_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerubios_roberta_base_bne_training_testing_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|437.6 MB| + +## References + +https://huggingface.co/ajtamayoh/NeRUBioS_RoBERTa_base_bne_Training_Testing + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-norwegian_repeats_en.md b/docs/_posts/ahmedlone127/2024-09-24-norwegian_repeats_en.md new file mode 100644 index 00000000000000..5f38dac92897bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-norwegian_repeats_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English norwegian_repeats XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: norwegian_repeats +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`norwegian_repeats` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norwegian_repeats_en_5.5.0_3.0_1727174648394.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norwegian_repeats_en_5.5.0_3.0_1727174648394.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("norwegian_repeats","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("norwegian_repeats", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
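+If grouped entity spans are preferred over per-token tags, a `NerConverter` stage can be appended to the Python pipeline above. This is a sketch of an optional extra step, not part of the original model card; it reuses the `documentAssembler`, `tokenizer`, `tokenClassifier`, and `data` objects defined above.
+
+```python
+from pyspark.ml import Pipeline
+from sparknlp.annotator import NerConverter
+
+# Merges the B-/I- tags emitted in the "ner" column into whole entity chunks
+nerConverter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("ner_chunk")
+
+chunkPipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier, nerConverter])
+chunkPipeline.fit(data).transform(data).select("ner_chunk.result").show(truncate=False)
+```
+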
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norwegian_repeats| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/no_repeats \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-nuner_v1_ontonotes5_en.md b/docs/_posts/ahmedlone127/2024-09-24-nuner_v1_ontonotes5_en.md new file mode 100644 index 00000000000000..95e2cd879f8fee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-nuner_v1_ontonotes5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nuner_v1_ontonotes5 RoBertaForTokenClassification from guishe +author: John Snow Labs +name: nuner_v1_ontonotes5 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nuner_v1_ontonotes5` is a English model originally trained by guishe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nuner_v1_ontonotes5_en_5.5.0_3.0_1727139834509.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nuner_v1_ontonotes5_en_5.5.0_3.0_1727139834509.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("nuner_v1_ontonotes5","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("nuner_v1_ontonotes5", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nuner_v1_ontonotes5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|453.8 MB| + +## References + +https://huggingface.co/guishe/nuner-v1_ontonotes5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-nuner_v1_ontonotes5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-nuner_v1_ontonotes5_pipeline_en.md new file mode 100644 index 00000000000000..663e00e4cda17e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-nuner_v1_ontonotes5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nuner_v1_ontonotes5_pipeline pipeline RoBertaForTokenClassification from guishe +author: John Snow Labs +name: nuner_v1_ontonotes5_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nuner_v1_ontonotes5_pipeline` is a English model originally trained by guishe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nuner_v1_ontonotes5_pipeline_en_5.5.0_3.0_1727139860961.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nuner_v1_ontonotes5_pipeline_en_5.5.0_3.0_1727139860961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nuner_v1_ontonotes5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nuner_v1_ontonotes5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nuner_v1_ontonotes5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|453.8 MB| + +## References + +https://huggingface.co/guishe/nuner-v1_ontonotes5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_arabic_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_arabic_english_pipeline_en.md new file mode 100644 index 00000000000000..3a1d084975eeae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_arabic_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_arabic_english_pipeline pipeline MarianTransformer from finnstrom3693 +author: John Snow Labs +name: opus_maltese_arabic_english_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_arabic_english_pipeline` is a English model originally trained by finnstrom3693. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_arabic_english_pipeline_en_5.5.0_3.0_1727166193635.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_arabic_english_pipeline_en_5.5.0_3.0_1727166193635.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_arabic_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_arabic_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
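+As an alternative to building a DataFrame, `PretrainedPipeline.annotate` can be called on a plain string. A small sketch follows; the sample sentence is illustrative, and the keys of the returned dictionary depend on this pipeline's output columns.
+
+```python
+import sparknlp
+from sparknlp.pretrained import PretrainedPipeline
+
+spark = sparknlp.start()
+
+pipeline = PretrainedPipeline("opus_maltese_arabic_english_pipeline", lang="en")
+
+# annotate() returns a dict mapping each output column to its list of results
+result = pipeline.annotate("مرحبا بالعالم")
+print(result)
+```
+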
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_arabic_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|336.1 MB| + +## References + +https://huggingface.co/finnstrom3693/opus-mt-ar-en + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_arabic_en.md b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_arabic_en.md new file mode 100644 index 00000000000000..d959dab5b5bb2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_arabic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_arabic MarianTransformer from finnstrom3693 +author: John Snow Labs +name: opus_maltese_english_arabic +date: 2024-09-24 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_arabic` is a English model originally trained by finnstrom3693. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_arabic_en_5.5.0_3.0_1727166110577.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_arabic_en_5.5.0_3.0_1727166110577.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_arabic","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_arabic","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
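+With the column wiring above, the translated text lands in the `translation` annotations. One way to read it back, assuming the Python pipeline was fit as shown:
+
+```python
+# Each row holds one translation annotation per detected sentence
+pipelineDF.selectExpr("explode(translation.result) as translated_text").show(truncate=False)
+```
+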
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_arabic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|336.6 MB| + +## References + +https://huggingface.co/finnstrom3693/opus-mt-en-ar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_arabic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_arabic_pipeline_en.md new file mode 100644 index 00000000000000..051fea77f9ecb4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_arabic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_arabic_pipeline pipeline MarianTransformer from finnstrom3693 +author: John Snow Labs +name: opus_maltese_english_arabic_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_arabic_pipeline` is a English model originally trained by finnstrom3693. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_arabic_pipeline_en_5.5.0_3.0_1727166203604.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_arabic_pipeline_en_5.5.0_3.0_1727166203604.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_arabic_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_arabic_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_arabic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|337.2 MB| + +## References + +https://huggingface.co/finnstrom3693/opus-mt-en-ar + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_indonesian_en.md b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_indonesian_en.md new file mode 100644 index 00000000000000..a86460d4c33ca1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_indonesian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_indonesian MarianTransformer from finnstrom3693 +author: John Snow Labs +name: opus_maltese_english_indonesian +date: 2024-09-24 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_indonesian` is a English model originally trained by finnstrom3693. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_indonesian_en_5.5.0_3.0_1727166492980.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_indonesian_en_5.5.0_3.0_1727166492980.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \
+    .setInputCols(["document"]) \
+    .setOutputCol("sentence")
+
+marian = MarianTransformer.pretrained("opus_maltese_english_indonesian","en") \
+    .setInputCols(["sentence"]) \
+    .setOutputCol("translation")
+
+pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val marian = MarianTransformer.pretrained("opus_maltese_english_indonesian","en")
+    .setInputCols(Array("sentence"))
+    .setOutputCol("translation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian))
+val data = Seq("I love spark-nlp").toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_indonesian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|307.3 MB| + +## References + +https://huggingface.co/finnstrom3693/opus-mt-en-id \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-platzi_distilroberta_base_mrpc_glue_will_mendoza_en.md b/docs/_posts/ahmedlone127/2024-09-24-platzi_distilroberta_base_mrpc_glue_will_mendoza_en.md new file mode 100644 index 00000000000000..7a9db78dfa106a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-platzi_distilroberta_base_mrpc_glue_will_mendoza_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English platzi_distilroberta_base_mrpc_glue_will_mendoza RoBertaForSequenceClassification from willmendoza +author: John Snow Labs +name: platzi_distilroberta_base_mrpc_glue_will_mendoza +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_mrpc_glue_will_mendoza` is a English model originally trained by willmendoza. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_will_mendoza_en_5.5.0_3.0_1727167752829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_will_mendoza_en_5.5.0_3.0_1727167752829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_will_mendoza","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_will_mendoza", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_mrpc_glue_will_mendoza| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/willmendoza/platzi-distilroberta-base-mrpc-glue-will-mendoza \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_en.md b/docs/_posts/ahmedlone127/2024-09-24-policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_en.md new file mode 100644 index 00000000000000..204ae166937096 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout RoBertaForTokenClassification from GiladH +author: John Snow Labs +name: policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout` is a English model originally trained by GiladH. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_en_5.5.0_3.0_1727150881095.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_en_5.5.0_3.0_1727150881095.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = RoBertaForTokenClassification.pretrained("policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = RoBertaForTokenClassification.pretrained("policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/GiladH/policy_pos_neg_2012_roberta_no_dropout \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_pipeline_en.md new file mode 100644 index 00000000000000..127a14dfdb1b03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_pipeline pipeline RoBertaForTokenClassification from GiladH +author: John Snow Labs +name: policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_pipeline` is a English model originally trained by GiladH. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_pipeline_en_5.5.0_3.0_1727150950330.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_pipeline_en_5.5.0_3.0_1727150950330.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|policy_sayula_popoluca_neg_2012_roberta_norwegian_dropout_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/GiladH/policy_pos_neg_2012_roberta_no_dropout + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-portuguese_up_xlmr_contextincluded_idiomexcluded_4_best_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-portuguese_up_xlmr_contextincluded_idiomexcluded_4_best_pipeline_en.md new file mode 100644 index 00000000000000..d6f3563ad95abf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-portuguese_up_xlmr_contextincluded_idiomexcluded_4_best_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English portuguese_up_xlmr_contextincluded_idiomexcluded_4_best_pipeline pipeline XlmRoBertaForSequenceClassification from harish +author: John Snow Labs +name: portuguese_up_xlmr_contextincluded_idiomexcluded_4_best_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`portuguese_up_xlmr_contextincluded_idiomexcluded_4_best_pipeline` is a English model originally trained by harish. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/portuguese_up_xlmr_contextincluded_idiomexcluded_4_best_pipeline_en_5.5.0_3.0_1727153550979.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/portuguese_up_xlmr_contextincluded_idiomexcluded_4_best_pipeline_en_5.5.0_3.0_1727153550979.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("portuguese_up_xlmr_contextincluded_idiomexcluded_4_best_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("portuguese_up_xlmr_contextincluded_idiomexcluded_4_best_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|portuguese_up_xlmr_contextincluded_idiomexcluded_4_best_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|788.2 MB| + +## References + +https://huggingface.co/harish/PT-UP-xlmR-ContextIncluded_IdiomExcluded-4_BEST + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-prueba4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-prueba4_pipeline_en.md new file mode 100644 index 00000000000000..68e3acd18496e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-prueba4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English prueba4_pipeline pipeline RoBertaForSequenceClassification from Saul98lm +author: John Snow Labs +name: prueba4_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`prueba4_pipeline` is a English model originally trained by Saul98lm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/prueba4_pipeline_en_5.5.0_3.0_1727172020392.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/prueba4_pipeline_en_5.5.0_3.0_1727172020392.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("prueba4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("prueba4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|prueba4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/Saul98lm/Prueba4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-python_code_comment_classification_en.md b/docs/_posts/ahmedlone127/2024-09-24-python_code_comment_classification_en.md new file mode 100644 index 00000000000000..ea87c9443966ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-python_code_comment_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English python_code_comment_classification BertEmbeddings from ZarahShibli +author: John Snow Labs +name: python_code_comment_classification +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`python_code_comment_classification` is a English model originally trained by ZarahShibli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/python_code_comment_classification_en_5.5.0_3.0_1727161834464.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/python_code_comment_classification_en_5.5.0_3.0_1727161834464.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("python_code_comment_classification","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("python_code_comment_classification","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
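+To turn the `embeddings` annotations into plain Spark ML vectors for downstream stages, an `EmbeddingsFinisher` can be appended. This is a sketch of an optional extra step, not part of the original card; it reuses the `documentAssembler`, `tokenizer`, `embeddings`, and `data` objects from the Python snippet above.
+
+```python
+from pyspark.ml import Pipeline
+from sparknlp.base import EmbeddingsFinisher
+
+# Converts the annotation structs in "embeddings" into Spark ML vectors
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+finishedPipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings, finisher])
+finishedPipeline.fit(data).transform(data).select("finished_embeddings").show(truncate=True)
+```
+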
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|python_code_comment_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|406.7 MB| + +## References + +https://huggingface.co/ZarahShibli/python-code-comment-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-results_deberta_en.md b/docs/_posts/ahmedlone127/2024-09-24-results_deberta_en.md new file mode 100644 index 00000000000000..531e4f17198f91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-results_deberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English results_deberta DeBertaForSequenceClassification from Siddartha10 +author: John Snow Labs +name: results_deberta +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`results_deberta` is a English model originally trained by Siddartha10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/results_deberta_en_5.5.0_3.0_1727162438786.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/results_deberta_en_5.5.0_3.0_1727162438786.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = DeBertaForSequenceClassification.pretrained("results_deberta","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DeBertaForSequenceClassification.pretrained("results_deberta", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
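+Once the pipeline has run, the predicted label for each row sits in the `class` column's `result` field. A brief usage sketch under the same assumptions as the snippet above:
+
+```python
+# One predicted label per input row
+pipelineDF.select("text", "class.result").show(truncate=False)
+```
+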
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|results_deberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|641.0 MB| + +## References + +https://huggingface.co/Siddartha10/results_deberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-rinna_roberta_qa_arcd1_en.md b/docs/_posts/ahmedlone127/2024-09-24-rinna_roberta_qa_arcd1_en.md new file mode 100644 index 00000000000000..d9c4500471af9c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-rinna_roberta_qa_arcd1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English rinna_roberta_qa_arcd1 BertForQuestionAnswering from Echiguerkh +author: John Snow Labs +name: rinna_roberta_qa_arcd1 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rinna_roberta_qa_arcd1` is a English model originally trained by Echiguerkh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rinna_roberta_qa_arcd1_en_5.5.0_3.0_1727163851135.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rinna_roberta_qa_arcd1_en_5.5.0_3.0_1727163851135.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = MultiDocumentAssembler() \
+    .setInputCols(["question", "context"]) \
+    .setOutputCols(["document_question", "document_context"])
+
+spanClassifier = BertForQuestionAnswering.pretrained("rinna_roberta_qa_arcd1","en") \
+    .setInputCols(["document_question","document_context"]) \
+    .setOutputCol("answer")
+
+pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
+data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new MultiDocumentAssembler()
+    .setInputCols(Array("question", "context"))
+    .setOutputCols(Array("document_question", "document_context"))
+
+val spanClassifier = BertForQuestionAnswering.pretrained("rinna_roberta_qa_arcd1", "en")
+    .setInputCols(Array("document_question","document_context"))
+    .setOutputCol("answer")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
+val data = Seq(("What framework do I use?", "I use spark-nlp.")).toDF("question", "context")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
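+The predicted answer span is carried in the `answer` annotations. A short sketch of reading it back, assuming the Python pipeline above was fit as shown:
+
+```python
+# The extracted answer text for each question/context pair
+pipelineDF.select("question", "answer.result").show(truncate=False)
+```
+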
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rinna_roberta_qa_arcd1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|222.1 MB| + +## References + +https://huggingface.co/Echiguerkh/rinna-roberta-qa-arcd1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-rinna_roberta_qa_arcd1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-rinna_roberta_qa_arcd1_pipeline_en.md new file mode 100644 index 00000000000000..d34756d5b47781 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-rinna_roberta_qa_arcd1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English rinna_roberta_qa_arcd1_pipeline pipeline BertForQuestionAnswering from Echiguerkh +author: John Snow Labs +name: rinna_roberta_qa_arcd1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rinna_roberta_qa_arcd1_pipeline` is a English model originally trained by Echiguerkh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rinna_roberta_qa_arcd1_pipeline_en_5.5.0_3.0_1727163862400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rinna_roberta_qa_arcd1_pipeline_en_5.5.0_3.0_1727163862400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rinna_roberta_qa_arcd1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rinna_roberta_qa_arcd1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rinna_roberta_qa_arcd1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|222.1 MB| + +## References + +https://huggingface.co/Echiguerkh/rinna-roberta-qa-arcd1 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_biomedical_spanish_finetunedemoevent_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_biomedical_spanish_finetunedemoevent_en.md new file mode 100644 index 00000000000000..9380c8a220648a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_biomedical_spanish_finetunedemoevent_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_biomedical_spanish_finetunedemoevent RoBertaForSequenceClassification from joancipria +author: John Snow Labs +name: roberta_base_biomedical_spanish_finetunedemoevent +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_biomedical_spanish_finetunedemoevent` is a English model originally trained by joancipria. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_biomedical_spanish_finetunedemoevent_en_5.5.0_3.0_1727171044796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_biomedical_spanish_finetunedemoevent_en_5.5.0_3.0_1727171044796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_biomedical_spanish_finetunedemoevent","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_biomedical_spanish_finetunedemoevent", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_biomedical_spanish_finetunedemoevent| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|438.8 MB| + +## References + +https://huggingface.co/joancipria/roberta-base-biomedical-es-FineTunedEmoEvent \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_bne_finetuned_amazon_reviews_multi_pdres_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_bne_finetuned_amazon_reviews_multi_pdres_en.md new file mode 100644 index 00000000000000..7b02bb3c39136a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_bne_finetuned_amazon_reviews_multi_pdres_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_amazon_reviews_multi_pdres RoBertaForSequenceClassification from PDRES +author: John Snow Labs +name: roberta_base_bne_finetuned_amazon_reviews_multi_pdres +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_amazon_reviews_multi_pdres` is a English model originally trained by PDRES. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_amazon_reviews_multi_pdres_en_5.5.0_3.0_1727171594652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_amazon_reviews_multi_pdres_en_5.5.0_3.0_1727171594652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_finetuned_amazon_reviews_multi_pdres","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_finetuned_amazon_reviews_multi_pdres", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_amazon_reviews_multi_pdres| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/PDRES/roberta-base-bne-finetuned-amazon_reviews_multi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_bne_finetuned_amazon_reviews_multi_pdres_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_bne_finetuned_amazon_reviews_multi_pdres_pipeline_en.md new file mode 100644 index 00000000000000..4bc8dfccb45968 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_bne_finetuned_amazon_reviews_multi_pdres_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_amazon_reviews_multi_pdres_pipeline pipeline RoBertaForSequenceClassification from PDRES +author: John Snow Labs +name: roberta_base_bne_finetuned_amazon_reviews_multi_pdres_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_amazon_reviews_multi_pdres_pipeline` is a English model originally trained by PDRES. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_amazon_reviews_multi_pdres_pipeline_en_5.5.0_3.0_1727171677197.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_amazon_reviews_multi_pdres_pipeline_en_5.5.0_3.0_1727171677197.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_bne_finetuned_amazon_reviews_multi_pdres_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_bne_finetuned_amazon_reviews_multi_pdres_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_amazon_reviews_multi_pdres_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/PDRES/roberta-base-bne-finetuned-amazon_reviews_multi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_24_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_24_en.md new file mode 100644 index 00000000000000..649e348b2dc974 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_24_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_24 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_24 +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_24` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_24_en_5.5.0_3.0_1727169325365.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_24_en_5.5.0_3.0_1727169325365.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_24","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_24","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
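
To inspect the token vectors produced by `pipelineDF` above, the `embeddings` annotation column can be exploded; a short sketch (the 768-dimension expectation for a RoBERTa-base checkpoint is an assumption, not stated on this card):

```python
from pyspark.sql import functions as F

# each annotation carries the token text in "result" and its vector in "embeddings"
tokens = pipelineDF.select(F.explode("embeddings").alias("emb"))
tokens.select(
    F.col("emb.result").alias("token"),
    F.size("emb.embeddings").alias("dims")  # typically 768 for a RoBERTa-base model
).show(truncate=False)
```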
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_24| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_24 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_83_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_83_en.md new file mode 100644 index 00000000000000..5e7fda55a61a2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_83_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_83 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_83 +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_83` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_83_en_5.5.0_3.0_1727169305070.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_83_en_5.5.0_3.0_1727169305070.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_83","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_83","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_83| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_83 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_finetuned_wallisian_manual_4ep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_finetuned_wallisian_manual_4ep_pipeline_en.md new file mode 100644 index 00000000000000..f4ec72d222f446 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_finetuned_wallisian_manual_4ep_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_manual_4ep_pipeline pipeline RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_manual_4ep_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_manual_4ep_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_4ep_pipeline_en_5.5.0_3.0_1727168712517.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_4ep_pipeline_en_5.5.0_3.0_1727168712517.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_wallisian_manual_4ep_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_wallisian_manual_4ep_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_manual_4ep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.6 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-manual-4ep + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_ours_rundi_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_ours_rundi_2_pipeline_en.md new file mode 100644 index 00000000000000..155d0a4c4891c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_ours_rundi_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_ours_rundi_2_pipeline pipeline RoBertaForSequenceClassification from SkyR +author: John Snow Labs +name: roberta_base_ours_rundi_2_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_ours_rundi_2_pipeline` is a English model originally trained by SkyR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ours_rundi_2_pipeline_en_5.5.0_3.0_1727172219611.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ours_rundi_2_pipeline_en_5.5.0_3.0_1727172219611.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_ours_rundi_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_ours_rundi_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ours_rundi_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|429.3 MB| + +## References + +https://huggingface.co/SkyR/roberta-base-ours-run-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_sst_2_32_13_smoothed_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_sst_2_32_13_smoothed_en.md new file mode 100644 index 00000000000000..0d55e2e8fd84be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_sst_2_32_13_smoothed_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_sst_2_32_13_smoothed RoBertaForSequenceClassification from simonycl +author: John Snow Labs +name: roberta_base_sst_2_32_13_smoothed +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_sst_2_32_13_smoothed` is a English model originally trained by simonycl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_sst_2_32_13_smoothed_en_5.5.0_3.0_1727167523794.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_sst_2_32_13_smoothed_en_5.5.0_3.0_1727167523794.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_sst_2_32_13_smoothed","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_sst_2_32_13_smoothed", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
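
Continuing from `pipelineDF` above, the predicted label and the per-label scores can be read from the `class` column; a brief sketch using the standard Spark NLP annotation schema:

```python
from pyspark.sql import functions as F

# "class.result" holds the predicted label; "class.metadata" maps each label to its score
pipelineDF.select(
    "text",
    F.col("class.result").alias("label"),
    F.col("class.metadata").alias("scores")
).show(truncate=False)
```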
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_sst_2_32_13_smoothed| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|424.9 MB| + +## References + +https://huggingface.co/simonycl/roberta-base-sst-2-32-13-smoothed \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_sst_2_32_13_smoothed_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_sst_2_32_13_smoothed_pipeline_en.md new file mode 100644 index 00000000000000..f4fa61e4eab5c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_sst_2_32_13_smoothed_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_sst_2_32_13_smoothed_pipeline pipeline RoBertaForSequenceClassification from simonycl +author: John Snow Labs +name: roberta_base_sst_2_32_13_smoothed_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_sst_2_32_13_smoothed_pipeline` is a English model originally trained by simonycl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_sst_2_32_13_smoothed_pipeline_en_5.5.0_3.0_1727167559838.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_sst_2_32_13_smoothed_pipeline_en_5.5.0_3.0_1727167559838.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_sst_2_32_13_smoothed_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_sst_2_32_13_smoothed_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_sst_2_32_13_smoothed_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|424.9 MB| + +## References + +https://huggingface.co/simonycl/roberta-base-sst-2-32-13-smoothed + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_sst_2_64_13_30_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_sst_2_64_13_30_en.md new file mode 100644 index 00000000000000..bd5a69fd9a1311 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_sst_2_64_13_30_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_sst_2_64_13_30 RoBertaForSequenceClassification from simonycl +author: John Snow Labs +name: roberta_base_sst_2_64_13_30 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_sst_2_64_13_30` is a English model originally trained by simonycl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_sst_2_64_13_30_en_5.5.0_3.0_1727167163017.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_sst_2_64_13_30_en_5.5.0_3.0_1727167163017.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_sst_2_64_13_30","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_sst_2_64_13_30", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_sst_2_64_13_30| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|425.6 MB| + +## References + +https://huggingface.co/simonycl/roberta-base-sst-2-64-13-30 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_8_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_8_en.md new file mode 100644 index 00000000000000..3a96e2f1a2784d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_conll_epoch_8 RoBertaForTokenClassification from ICT2214Team7 +author: John Snow Labs +name: roberta_conll_epoch_8 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_conll_epoch_8` is a English model originally trained by ICT2214Team7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_8_en_5.5.0_3.0_1727139356272.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_8_en_5.5.0_3.0_1727139356272.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_conll_epoch_8","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_conll_epoch_8", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
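
The `ner` column produced above holds one IOB tag per token; a small sketch for lining tokens up with their predicted tags, continuing from `pipelineDF`:

```python
# collect one row and pair each token with its tag on the driver
row = pipelineDF.select("token.result", "ner.result").first()
for token, tag in zip(row[0], row[1]):
    print(f"{token}\t{tag}")
```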
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_conll_epoch_8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/ICT2214Team7/RoBERTa_conll_epoch_8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_8_pipeline_en.md new file mode 100644 index 00000000000000..91da817aa8e7ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_conll_epoch_8_pipeline pipeline RoBertaForTokenClassification from ICT2214Team7 +author: John Snow Labs +name: roberta_conll_epoch_8_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_conll_epoch_8_pipeline` is a English model originally trained by ICT2214Team7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_8_pipeline_en_5.5.0_3.0_1727139372178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_8_pipeline_en_5.5.0_3.0_1727139372178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_conll_epoch_8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_conll_epoch_8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_conll_epoch_8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/ICT2214Team7/RoBERTa_conll_epoch_8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_9_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_9_en.md new file mode 100644 index 00000000000000..0db6092e4e92b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_9_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_conll_epoch_9 RoBertaForTokenClassification from ICT2214Team7 +author: John Snow Labs +name: roberta_conll_epoch_9 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_conll_epoch_9` is a English model originally trained by ICT2214Team7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_9_en_5.5.0_3.0_1727151113750.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_9_en_5.5.0_3.0_1727151113750.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_conll_epoch_9","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_conll_epoch_9", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_conll_epoch_9| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/ICT2214Team7/RoBERTa_conll_epoch_9 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_9_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_9_pipeline_en.md new file mode 100644 index 00000000000000..601a9f363805fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_conll_epoch_9_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_conll_epoch_9_pipeline pipeline RoBertaForTokenClassification from ICT2214Team7 +author: John Snow Labs +name: roberta_conll_epoch_9_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_conll_epoch_9_pipeline` is a English model originally trained by ICT2214Team7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_9_pipeline_en_5.5.0_3.0_1727151129521.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_9_pipeline_en_5.5.0_3.0_1727151129521.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_conll_epoch_9_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_conll_epoch_9_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_conll_epoch_9_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/ICT2214Team7/RoBERTa_conll_epoch_9 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_ganda_cased_malay_ner_v2_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_ganda_cased_malay_ner_v2_test_pipeline_en.md new file mode 100644 index 00000000000000..e4531a5d94c394 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_ganda_cased_malay_ner_v2_test_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_ganda_cased_malay_ner_v2_test_pipeline pipeline RoBertaForTokenClassification from nxaliao +author: John Snow Labs +name: roberta_ganda_cased_malay_ner_v2_test_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_ganda_cased_malay_ner_v2_test_pipeline` is a English model originally trained by nxaliao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_ganda_cased_malay_ner_v2_test_pipeline_en_5.5.0_3.0_1727151359082.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_ganda_cased_malay_ner_v2_test_pipeline_en_5.5.0_3.0_1727151359082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_ganda_cased_malay_ner_v2_test_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_ganda_cased_malay_ner_v2_test_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
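
For quick checks the same pipeline can also be driven without building a DataFrame first; a hedged sketch using `fullAnnotate` (the example sentence is illustrative, and the `ner` output key is an assumption based on the single-model cards):

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("roberta_ganda_cased_malay_ner_v2_test_pipeline", lang="en")

# fullAnnotate keeps offsets and metadata alongside each predicted tag
results = pipeline.fullAnnotate("John Snow Labs is based in Delaware.")
for annotation in results[0]["ner"]:
    print(annotation.result, annotation.metadata)
```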
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_ganda_cased_malay_ner_v2_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/nxaliao/roberta-lg-cased-ms-ner-v2-test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_large_bc4chemd_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_bc4chemd_pipeline_en.md new file mode 100644 index 00000000000000..cd3130c43c080c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_bc4chemd_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_bc4chemd_pipeline pipeline RoBertaForTokenClassification from CheccoCando +author: John Snow Labs +name: roberta_large_bc4chemd_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_bc4chemd_pipeline` is a English model originally trained by CheccoCando. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_bc4chemd_pipeline_en_5.5.0_3.0_1727150808360.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_bc4chemd_pipeline_en_5.5.0_3.0_1727150808360.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_bc4chemd_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_bc4chemd_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_bc4chemd_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/CheccoCando/roberta-large_bc4chemd + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_large_finetuned_ner_finetuned_ner_lionellow_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_finetuned_ner_finetuned_ner_lionellow_en.md new file mode 100644 index 00000000000000..b69087ec5d333f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_finetuned_ner_finetuned_ner_lionellow_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_finetuned_ner_finetuned_ner_lionellow RoBertaForTokenClassification from LionelLow +author: John Snow Labs +name: roberta_large_finetuned_ner_finetuned_ner_lionellow +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_ner_finetuned_ner_lionellow` is a English model originally trained by LionelLow. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_ner_finetuned_ner_lionellow_en_5.5.0_3.0_1727150718733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_ner_finetuned_ner_lionellow_en_5.5.0_3.0_1727150718733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_finetuned_ner_finetuned_ner_lionellow","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_finetuned_ner_finetuned_ner_lionellow", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_ner_finetuned_ner_lionellow| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/LionelLow/roberta-large-finetuned-ner-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_large_gest_pred_seqeval_partialmatch_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_gest_pred_seqeval_partialmatch_en.md new file mode 100644 index 00000000000000..a434bef64070e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_gest_pred_seqeval_partialmatch_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_gest_pred_seqeval_partialmatch RoBertaForTokenClassification from Jsevisal +author: John Snow Labs +name: roberta_large_gest_pred_seqeval_partialmatch +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_gest_pred_seqeval_partialmatch` is a English model originally trained by Jsevisal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_gest_pred_seqeval_partialmatch_en_5.5.0_3.0_1727139943371.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_gest_pred_seqeval_partialmatch_en_5.5.0_3.0_1727139943371.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_gest_pred_seqeval_partialmatch","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_gest_pred_seqeval_partialmatch", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
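
Since this is a roberta-large checkpoint (about 1.3 GB), inference throughput often benefits from tuning the annotator's batch size; a hedged variation of the classifier definition above (the batch size of 8 is only an illustrative starting point, not a recommendation from this card):

```python
tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_gest_pred_seqeval_partialmatch", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner") \
    .setBatchSize(8) \
    .setCaseSensitive(True)
```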
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_gest_pred_seqeval_partialmatch| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Jsevisal/roberta-large-gest-pred-seqeval-partialmatch \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_large_gest_pred_seqeval_partialmatch_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_gest_pred_seqeval_partialmatch_pipeline_en.md new file mode 100644 index 00000000000000..8c75f7cd3f5778 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_gest_pred_seqeval_partialmatch_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_gest_pred_seqeval_partialmatch_pipeline pipeline RoBertaForTokenClassification from Jsevisal +author: John Snow Labs +name: roberta_large_gest_pred_seqeval_partialmatch_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_gest_pred_seqeval_partialmatch_pipeline` is a English model originally trained by Jsevisal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_gest_pred_seqeval_partialmatch_pipeline_en_5.5.0_3.0_1727140019503.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_gest_pred_seqeval_partialmatch_pipeline_en_5.5.0_3.0_1727140019503.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_gest_pred_seqeval_partialmatch_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_gest_pred_seqeval_partialmatch_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_gest_pred_seqeval_partialmatch_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Jsevisal/roberta-large-gest-pred-seqeval-partialmatch + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_large_temp_classifier_bootstrapped_v2_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_temp_classifier_bootstrapped_v2_en.md new file mode 100644 index 00000000000000..e421fa410fe617 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_temp_classifier_bootstrapped_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_temp_classifier_bootstrapped_v2 RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_large_temp_classifier_bootstrapped_v2 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_temp_classifier_bootstrapped_v2` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_temp_classifier_bootstrapped_v2_en_5.5.0_3.0_1727171317924.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_temp_classifier_bootstrapped_v2_en_5.5.0_3.0_1727171317924.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_temp_classifier_bootstrapped_v2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_temp_classifier_bootstrapped_v2", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_temp_classifier_bootstrapped_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/research-dump/roberta_large_temp_classifier_bootstrapped_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_large_temp_classifier_bootstrapped_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_temp_classifier_bootstrapped_v2_pipeline_en.md new file mode 100644 index 00000000000000..a3b7d411fd0303 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_temp_classifier_bootstrapped_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_temp_classifier_bootstrapped_v2_pipeline pipeline RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_large_temp_classifier_bootstrapped_v2_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_temp_classifier_bootstrapped_v2_pipeline` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_temp_classifier_bootstrapped_v2_pipeline_en_5.5.0_3.0_1727171385064.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_temp_classifier_bootstrapped_v2_pipeline_en_5.5.0_3.0_1727171385064.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_temp_classifier_bootstrapped_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_temp_classifier_bootstrapped_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_temp_classifier_bootstrapped_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/research-dump/roberta_large_temp_classifier_bootstrapped_v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-robertalarge_finetuned_winogrande_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-robertalarge_finetuned_winogrande_pipeline_en.md new file mode 100644 index 00000000000000..a9028db5c85300 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-robertalarge_finetuned_winogrande_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robertalarge_finetuned_winogrande_pipeline pipeline RoBertaForSequenceClassification from Kalslice +author: John Snow Labs +name: robertalarge_finetuned_winogrande_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertalarge_finetuned_winogrande_pipeline` is a English model originally trained by Kalslice. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertalarge_finetuned_winogrande_pipeline_en_5.5.0_3.0_1727167709776.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertalarge_finetuned_winogrande_pipeline_en_5.5.0_3.0_1727167709776.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robertalarge_finetuned_winogrande_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robertalarge_finetuned_winogrande_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertalarge_finetuned_winogrande_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Kalslice/robertalarge-finetuned-winogrande + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-robertinha_gl.md b/docs/_posts/ahmedlone127/2024-09-24-robertinha_gl.md new file mode 100644 index 00000000000000..a38be62c597cd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-robertinha_gl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Galician robertinha RoBertaEmbeddings from mrm8488 +author: John Snow Labs +name: robertinha +date: 2024-09-24 +tags: [gl, open_source, onnx, embeddings, roberta] +task: Embeddings +language: gl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertinha` is a Galician model originally trained by mrm8488. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertinha_gl_5.5.0_3.0_1727169258285.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertinha_gl_5.5.0_3.0_1727169258285.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("robertinha","gl") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("robertinha","gl") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertinha| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|gl| +|Size:|311.7 MB| + +## References + +https://huggingface.co/mrm8488/RoBERTinha \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-robertinha_pipeline_gl.md b/docs/_posts/ahmedlone127/2024-09-24-robertinha_pipeline_gl.md new file mode 100644 index 00000000000000..4d10f023580d33 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-robertinha_pipeline_gl.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Galician robertinha_pipeline pipeline RoBertaEmbeddings from mrm8488 +author: John Snow Labs +name: robertinha_pipeline +date: 2024-09-24 +tags: [gl, open_source, pipeline, onnx] +task: Embeddings +language: gl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertinha_pipeline` is a Galician model originally trained by mrm8488. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertinha_pipeline_gl_5.5.0_3.0_1727169273729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertinha_pipeline_gl_5.5.0_3.0_1727169273729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robertinha_pipeline", lang = "gl") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robertinha_pipeline", lang = "gl") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertinha_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|gl| +|Size:|311.7 MB| + +## References + +https://huggingface.co/mrm8488/RoBERTinha + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_finetuned_sandouq_ar.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_finetuned_sandouq_ar.md new file mode 100644 index 00000000000000..283ed150378a34 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_finetuned_sandouq_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic sent_bert_base_arabertv02_finetuned_sandouq BertSentenceEmbeddings from AbdoMamdouh +author: John Snow Labs +name: sent_bert_base_arabertv02_finetuned_sandouq +date: 2024-09-24 +tags: [ar, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_arabertv02_finetuned_sandouq` is a Arabic model originally trained by AbdoMamdouh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabertv02_finetuned_sandouq_ar_5.5.0_3.0_1727157206623.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabertv02_finetuned_sandouq_ar_5.5.0_3.0_1727157206623.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_arabertv02_finetuned_sandouq","ar") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_arabertv02_finetuned_sandouq","ar") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_arabertv02_finetuned_sandouq| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ar| +|Size:|505.1 MB| + +## References + +https://huggingface.co/AbdoMamdouh/bert-base-arabertv02-finetuned-sandouq \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_finetuned_sandouq_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_finetuned_sandouq_pipeline_ar.md new file mode 100644 index 00000000000000..d31b6985a71a31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_finetuned_sandouq_pipeline_ar.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Arabic sent_bert_base_arabertv02_finetuned_sandouq_pipeline pipeline BertSentenceEmbeddings from AbdoMamdouh +author: John Snow Labs +name: sent_bert_base_arabertv02_finetuned_sandouq_pipeline +date: 2024-09-24 +tags: [ar, open_source, pipeline, onnx] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_arabertv02_finetuned_sandouq_pipeline` is a Arabic model originally trained by AbdoMamdouh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabertv02_finetuned_sandouq_pipeline_ar_5.5.0_3.0_1727157231886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabertv02_finetuned_sandouq_pipeline_ar_5.5.0_3.0_1727157231886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_arabertv02_finetuned_sandouq_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_arabertv02_finetuned_sandouq_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
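+
+Besides `transform` on a DataFrame, a `PretrainedPipeline` can also annotate raw strings directly, which is handy for quick checks. A hedged sketch using the pipeline object created above (the Arabic sample sentence is illustrative only):
+
+```python
+# Sketch: annotate a single string and list the output columns.
+result = pipeline.annotate("أحب معالجة اللغة العربية")
+print(result.keys())
+```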
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_arabertv02_finetuned_sandouq_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|505.6 MB| + +## References + +https://huggingface.co/AbdoMamdouh/bert-base-arabertv02-finetuned-sandouq + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_blbooks_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_blbooks_cased_pipeline_en.md new file mode 100644 index 00000000000000..3cb0c43d516dd6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_blbooks_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_blbooks_cased_pipeline pipeline BertSentenceEmbeddings from bigscience-historical-texts +author: John Snow Labs +name: sent_bert_base_blbooks_cased_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_blbooks_cased_pipeline` is a English model originally trained by bigscience-historical-texts. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_blbooks_cased_pipeline_en_5.5.0_3.0_1727157757584.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_blbooks_cased_pipeline_en_5.5.0_3.0_1727157757584.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_blbooks_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_blbooks_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_blbooks_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.9 MB| + +## References + +https://huggingface.co/bigscience-historical-texts/bert-base-blbooks-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_multilingual_uncased_finetuned_hp_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_multilingual_uncased_finetuned_hp_pipeline_xx.md new file mode 100644 index 00000000000000..3c098590225bbe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_multilingual_uncased_finetuned_hp_pipeline_xx.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Multilingual sent_bert_base_multilingual_uncased_finetuned_hp_pipeline pipeline BertSentenceEmbeddings from rman-rahimi-29 +author: John Snow Labs +name: sent_bert_base_multilingual_uncased_finetuned_hp_pipeline +date: 2024-09-24 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_multilingual_uncased_finetuned_hp_pipeline` is a Multilingual model originally trained by rman-rahimi-29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_uncased_finetuned_hp_pipeline_xx_5.5.0_3.0_1727157727859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_uncased_finetuned_hp_pipeline_xx_5.5.0_3.0_1727157727859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_multilingual_uncased_finetuned_hp_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_multilingual_uncased_finetuned_hp_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_multilingual_uncased_finetuned_hp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|626.1 MB| + +## References + +https://huggingface.co/rman-rahimi-29/bert-base-multilingual-uncased-finetuned-hp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_multilingual_uncased_finetuned_hp_xx.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_multilingual_uncased_finetuned_hp_xx.md new file mode 100644 index 00000000000000..f222d396ad4cee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_multilingual_uncased_finetuned_hp_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual sent_bert_base_multilingual_uncased_finetuned_hp BertSentenceEmbeddings from rman-rahimi-29 +author: John Snow Labs +name: sent_bert_base_multilingual_uncased_finetuned_hp +date: 2024-09-24 +tags: [xx, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_multilingual_uncased_finetuned_hp` is a Multilingual model originally trained by rman-rahimi-29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_uncased_finetuned_hp_xx_5.5.0_3.0_1727157695390.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_uncased_finetuned_hp_xx_5.5.0_3.0_1727157695390.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_multilingual_uncased_finetuned_hp","xx") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_multilingual_uncased_finetuned_hp","xx") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
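+
+To turn the `embeddings` annotations produced by the pipeline above into plain Spark ML vectors, an `EmbeddingsFinisher` stage can be applied to `pipelineDF`. A minimal sketch under that assumption:
+
+```python
+# Sketch: convert sentence-embedding annotations into Spark ML vectors.
+from sparknlp.base import EmbeddingsFinisher
+
+finisher = EmbeddingsFinisher() \
+    .setInputCols(["embeddings"]) \
+    .setOutputCols(["finished_embeddings"]) \
+    .setOutputAsVector(True)
+
+finished = finisher.transform(pipelineDF)
+finished.selectExpr("explode(finished_embeddings) AS sentence_vector").show(truncate=False)
+```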
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_multilingual_uncased_finetuned_hp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|xx| +|Size:|625.5 MB| + +## References + +https://huggingface.co/rman-rahimi-29/bert-base-multilingual-uncased-finetuned-hp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_finetuned_news_1973_1974_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_finetuned_news_1973_1974_en.md new file mode 100644 index 00000000000000..7807f5e2bec964 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_finetuned_news_1973_1974_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_news_1973_1974 BertSentenceEmbeddings from sally9805 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_news_1973_1974 +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_news_1973_1974` is a English model originally trained by sally9805. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_news_1973_1974_en_5.5.0_3.0_1727157152030.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_news_1973_1974_en_5.5.0_3.0_1727157152030.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_news_1973_1974","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_news_1973_1974","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_news_1973_1974| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sally9805/bert-base-uncased-finetuned-news-1973-1974 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_finetuned_news_1973_1974_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_finetuned_news_1973_1974_pipeline_en.md new file mode 100644 index 00000000000000..233b1211c3df65 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_finetuned_news_1973_1974_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_news_1973_1974_pipeline pipeline BertSentenceEmbeddings from sally9805 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_news_1973_1974_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_news_1973_1974_pipeline` is a English model originally trained by sally9805. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_news_1973_1974_pipeline_en_5.5.0_3.0_1727157172719.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_news_1973_1974_pipeline_en_5.5.0_3.0_1727157172719.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_news_1973_1974_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_news_1973_1974_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_news_1973_1974_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/sally9805/bert-base-uncased-finetuned-news-1973-1974 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_wikitext_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_wikitext_en.md new file mode 100644 index 00000000000000..e00de8895fb53b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_wikitext_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_wikitext BertSentenceEmbeddings from AiresPucrs +author: John Snow Labs +name: sent_bert_base_wikitext +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_wikitext` is a English model originally trained by AiresPucrs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_wikitext_en_5.5.0_3.0_1727157617538.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_wikitext_en_5.5.0_3.0_1727157617538.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_wikitext","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_wikitext","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_wikitext| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.0 MB| + +## References + +https://huggingface.co/AiresPucrs/bert-base-wikitext \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_wikitext_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_wikitext_pipeline_en.md new file mode 100644 index 00000000000000..2629719850ff4e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_wikitext_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_wikitext_pipeline pipeline BertSentenceEmbeddings from AiresPucrs +author: John Snow Labs +name: sent_bert_base_wikitext_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_wikitext_pipeline` is a English model originally trained by AiresPucrs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_wikitext_pipeline_en_5.5.0_3.0_1727157638652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_wikitext_pipeline_en_5.5.0_3.0_1727157638652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_wikitext_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_wikitext_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_wikitext_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.6 MB| + +## References + +https://huggingface.co/AiresPucrs/bert-base-wikitext + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_medium_mlsm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_medium_mlsm_pipeline_en.md new file mode 100644 index 00000000000000..3749aa035da864 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_medium_mlsm_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_medium_mlsm_pipeline pipeline BertSentenceEmbeddings from SzegedAI +author: John Snow Labs +name: sent_bert_medium_mlsm_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_medium_mlsm_pipeline` is a English model originally trained by SzegedAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_medium_mlsm_pipeline_en_5.5.0_3.0_1727178514079.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_medium_mlsm_pipeline_en_5.5.0_3.0_1727178514079.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_medium_mlsm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_medium_mlsm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_medium_mlsm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|157.7 MB| + +## References + +https://huggingface.co/SzegedAI/bert-medium-mlsm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_multilingial_geolocation_prediction_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_multilingial_geolocation_prediction_pipeline_en.md new file mode 100644 index 00000000000000..1000c44cd006f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_multilingial_geolocation_prediction_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_multilingial_geolocation_prediction_pipeline pipeline BertSentenceEmbeddings from k4tel +author: John Snow Labs +name: sent_bert_multilingial_geolocation_prediction_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_multilingial_geolocation_prediction_pipeline` is a English model originally trained by k4tel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_multilingial_geolocation_prediction_pipeline_en_5.5.0_3.0_1727157396347.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_multilingial_geolocation_prediction_pipeline_en_5.5.0_3.0_1727157396347.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_multilingial_geolocation_prediction_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_multilingial_geolocation_prediction_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_multilingial_geolocation_prediction_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|663.8 MB| + +## References + +https://huggingface.co/k4tel/bert-multilingial-geolocation-prediction + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_persian_farsi_base_uncased_finetuned_parsbert_fa.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_persian_farsi_base_uncased_finetuned_parsbert_fa.md new file mode 100644 index 00000000000000..280cf90db64cc9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_persian_farsi_base_uncased_finetuned_parsbert_fa.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Persian sent_bert_persian_farsi_base_uncased_finetuned_parsbert BertSentenceEmbeddings from Yasamansaffari73 +author: John Snow Labs +name: sent_bert_persian_farsi_base_uncased_finetuned_parsbert +date: 2024-09-24 +tags: [fa, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_persian_farsi_base_uncased_finetuned_parsbert` is a Persian model originally trained by Yasamansaffari73. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_persian_farsi_base_uncased_finetuned_parsbert_fa_5.5.0_3.0_1727178534063.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_persian_farsi_base_uncased_finetuned_parsbert_fa_5.5.0_3.0_1727178534063.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_persian_farsi_base_uncased_finetuned_parsbert","fa") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_persian_farsi_base_uncased_finetuned_parsbert","fa") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_persian_farsi_base_uncased_finetuned_parsbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|fa| +|Size:|606.5 MB| + +## References + +https://huggingface.co/Yasamansaffari73/bert-fa-base-uncased-finetuned-ParsBert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bertinho_galician_small_cased_gl.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bertinho_galician_small_cased_gl.md new file mode 100644 index 00000000000000..6b21d62bf8d16d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bertinho_galician_small_cased_gl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Galician sent_bertinho_galician_small_cased BertSentenceEmbeddings from dvilares +author: John Snow Labs +name: sent_bertinho_galician_small_cased +date: 2024-09-24 +tags: [gl, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: gl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertinho_galician_small_cased` is a Galician model originally trained by dvilares. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertinho_galician_small_cased_gl_5.5.0_3.0_1727178498873.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertinho_galician_small_cased_gl_5.5.0_3.0_1727178498873.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bertinho_galician_small_cased","gl") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bertinho_galician_small_cased","gl") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertinho_galician_small_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|gl| +|Size:|245.8 MB| + +## References + +https://huggingface.co/dvilares/bertinho-gl-small-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bertinho_galician_small_cased_pipeline_gl.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bertinho_galician_small_cased_pipeline_gl.md new file mode 100644 index 00000000000000..e1ecaf05ea9ffe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bertinho_galician_small_cased_pipeline_gl.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Galician sent_bertinho_galician_small_cased_pipeline pipeline BertSentenceEmbeddings from dvilares +author: John Snow Labs +name: sent_bertinho_galician_small_cased_pipeline +date: 2024-09-24 +tags: [gl, open_source, pipeline, onnx] +task: Embeddings +language: gl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertinho_galician_small_cased_pipeline` is a Galician model originally trained by dvilares. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertinho_galician_small_cased_pipeline_gl_5.5.0_3.0_1727178511734.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertinho_galician_small_cased_pipeline_gl_5.5.0_3.0_1727178511734.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bertinho_galician_small_cased_pipeline", lang = "gl") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bertinho_galician_small_cased_pipeline", lang = "gl") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertinho_galician_small_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|gl| +|Size:|246.4 MB| + +## References + +https://huggingface.co/dvilares/bertinho-gl-small-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_danish_bert_iolariu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_danish_bert_iolariu_pipeline_en.md new file mode 100644 index 00000000000000..7c119c34b5efa3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_danish_bert_iolariu_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_danish_bert_iolariu_pipeline pipeline BertSentenceEmbeddings from iolariu +author: John Snow Labs +name: sent_danish_bert_iolariu_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_danish_bert_iolariu_pipeline` is a English model originally trained by iolariu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_danish_bert_iolariu_pipeline_en_5.5.0_3.0_1727157509892.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_danish_bert_iolariu_pipeline_en_5.5.0_3.0_1727157509892.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_danish_bert_iolariu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_danish_bert_iolariu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_danish_bert_iolariu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.6 MB| + +## References + +https://huggingface.co/iolariu/DA_BERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_guaran_bert_tiny_cased_gn.md b/docs/_posts/ahmedlone127/2024-09-24-sent_guaran_bert_tiny_cased_gn.md new file mode 100644 index 00000000000000..d70eba96267028 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_guaran_bert_tiny_cased_gn.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Guarani sent_guaran_bert_tiny_cased BertSentenceEmbeddings from mmaguero +author: John Snow Labs +name: sent_guaran_bert_tiny_cased +date: 2024-09-24 +tags: [gn, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: gn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_guaran_bert_tiny_cased` is a Guarani model originally trained by mmaguero. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_guaran_bert_tiny_cased_gn_5.5.0_3.0_1727157602791.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_guaran_bert_tiny_cased_gn_5.5.0_3.0_1727157602791.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_guaran_bert_tiny_cased","gn") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_guaran_bert_tiny_cased","gn") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_guaran_bert_tiny_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|gn| +|Size:|34.5 MB| + +## References + +https://huggingface.co/mmaguero/gn-bert-tiny-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_guaran_bert_tiny_cased_pipeline_gn.md b/docs/_posts/ahmedlone127/2024-09-24-sent_guaran_bert_tiny_cased_pipeline_gn.md new file mode 100644 index 00000000000000..007e42c4cbd052 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_guaran_bert_tiny_cased_pipeline_gn.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Guarani sent_guaran_bert_tiny_cased_pipeline pipeline BertSentenceEmbeddings from mmaguero +author: John Snow Labs +name: sent_guaran_bert_tiny_cased_pipeline +date: 2024-09-24 +tags: [gn, open_source, pipeline, onnx] +task: Embeddings +language: gn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_guaran_bert_tiny_cased_pipeline` is a Guarani model originally trained by mmaguero. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_guaran_bert_tiny_cased_pipeline_gn_5.5.0_3.0_1727157605105.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_guaran_bert_tiny_cased_pipeline_gn_5.5.0_3.0_1727157605105.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_guaran_bert_tiny_cased_pipeline", lang = "gn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_guaran_bert_tiny_cased_pipeline", lang = "gn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_guaran_bert_tiny_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|gn| +|Size:|35.0 MB| + +## References + +https://huggingface.co/mmaguero/gn-bert-tiny-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_hm_model001_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_hm_model001_en.md new file mode 100644 index 00000000000000..9d8acf7ddf4fec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_hm_model001_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_hm_model001 BertSentenceEmbeddings from FAN-L +author: John Snow Labs +name: sent_hm_model001 +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hm_model001` is a English model originally trained by FAN-L. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hm_model001_en_5.5.0_3.0_1727178735714.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hm_model001_en_5.5.0_3.0_1727178735714.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_hm_model001","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_hm_model001","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hm_model001| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/FAN-L/HM_model001 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_jaberv2_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_jaberv2_en.md new file mode 100644 index 00000000000000..b46295b566d940 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_jaberv2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_jaberv2 BertSentenceEmbeddings from huawei-noah +author: John Snow Labs +name: sent_jaberv2 +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_jaberv2` is a English model originally trained by huawei-noah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_jaberv2_en_5.5.0_3.0_1727157347613.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_jaberv2_en_5.5.0_3.0_1727157347613.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_jaberv2","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_jaberv2","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_jaberv2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|504.8 MB| + +## References + +https://huggingface.co/huawei-noah/JABERv2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_jaberv2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_jaberv2_pipeline_en.md new file mode 100644 index 00000000000000..8043b2ba6c8a2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_jaberv2_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_jaberv2_pipeline pipeline BertSentenceEmbeddings from huawei-noah +author: John Snow Labs +name: sent_jaberv2_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_jaberv2_pipeline` is a English model originally trained by huawei-noah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_jaberv2_pipeline_en_5.5.0_3.0_1727157373264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_jaberv2_pipeline_en_5.5.0_3.0_1727157373264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_jaberv2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_jaberv2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_jaberv2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|505.4 MB| + +## References + +https://huggingface.co/huawei-noah/JABERv2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_model_mahmoud8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_model_mahmoud8_pipeline_en.md new file mode 100644 index 00000000000000..1f24d1a6c6a6fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_model_mahmoud8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_model_mahmoud8_pipeline pipeline DistilBertForSequenceClassification from Mahmoud8 +author: John Snow Labs +name: sentiment_analysis_model_mahmoud8_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_model_mahmoud8_pipeline` is a English model originally trained by Mahmoud8. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_mahmoud8_pipeline_en_5.5.0_3.0_1727154834425.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_mahmoud8_pipeline_en_5.5.0_3.0_1727154834425.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_analysis_model_mahmoud8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_analysis_model_mahmoud8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|sentiment_analysis_model_mahmoud8_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.5.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|249.5 MB|
+
+## References
+
+https://huggingface.co/Mahmoud8/sentiment_analysis_model
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- DistilBertForSequenceClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_with_distilbert_en.md b/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_with_distilbert_en.md
new file mode 100644
index 00000000000000..88852b01cf048d
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_with_distilbert_en.md
@@ -0,0 +1,94 @@
+---
+layout: model
+title: English sentiment_analysis_with_distilbert DistilBertForSequenceClassification from hdv2709
+author: John Snow Labs
+name: sentiment_analysis_with_distilbert
+date: 2024-09-24
+tags: [en, open_source, onnx, sequence_classification, distilbert]
+task: Text Classification
+language: en
+edition: Spark NLP 5.5.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: DistilBertForSequenceClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `sentiment_analysis_with_distilbert` is an English model originally trained by hdv2709.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_with_distilbert_en_5.5.0_3.0_1727137046808.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_with_distilbert_en_5.5.0_3.0_1727137046808.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_with_distilbert","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_with_distilbert", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
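
After `transform`, the predictions are stored as Spark NLP annotations in the `class` output column. The following is a minimal sketch of reading out the predicted labels, assuming the column names used in the example above.

```python
from pyspark.sql.functions import col

# Each row carries the input text plus the classifier's predicted label(s).
pipelineDF.select(
    col("text"),
    col("class.result").alias("prediction")
).show(truncate=False)
```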
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_with_distilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hdv2709/sentiment_analysis_with_DistilBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sesgo_genero_model_en.md b/docs/_posts/ahmedlone127/2024-09-24-sesgo_genero_model_en.md new file mode 100644 index 00000000000000..739ae8b193240a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sesgo_genero_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sesgo_genero_model RoBertaForSequenceClassification from bonzo1971 +author: John Snow Labs +name: sesgo_genero_model +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sesgo_genero_model` is a English model originally trained by bonzo1971. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sesgo_genero_model_en_5.5.0_3.0_1727171127915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sesgo_genero_model_en_5.5.0_3.0_1727171127915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("sesgo_genero_model","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sesgo_genero_model", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sesgo_genero_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|408.3 MB| + +## References + +https://huggingface.co/bonzo1971/sesgo_genero_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sesgo_genero_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sesgo_genero_model_pipeline_en.md new file mode 100644 index 00000000000000..c9092cb4a3d543 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sesgo_genero_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sesgo_genero_model_pipeline pipeline RoBertaForSequenceClassification from bonzo1971 +author: John Snow Labs +name: sesgo_genero_model_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sesgo_genero_model_pipeline` is a English model originally trained by bonzo1971. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sesgo_genero_model_pipeline_en_5.5.0_3.0_1727171148443.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sesgo_genero_model_pipeline_en_5.5.0_3.0_1727171148443.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sesgo_genero_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sesgo_genero_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sesgo_genero_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.3 MB| + +## References + +https://huggingface.co/bonzo1971/sesgo_genero_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sgppellow_en.md b/docs/_posts/ahmedlone127/2024-09-24-sgppellow_en.md new file mode 100644 index 00000000000000..00d288cd61abbe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sgppellow_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sgppellow RoBertaForSequenceClassification from SGPPellow +author: John Snow Labs +name: sgppellow +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sgppellow` is a English model originally trained by SGPPellow. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sgppellow_en_5.5.0_3.0_1727171053996.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sgppellow_en_5.5.0_3.0_1727171053996.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("sgppellow","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sgppellow", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sgppellow| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|416.2 MB| + +## References + +https://huggingface.co/SGPPellow/SGPPellow \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-spanish_sentiment_model_pysentiment_en.md b/docs/_posts/ahmedlone127/2024-09-24-spanish_sentiment_model_pysentiment_en.md new file mode 100644 index 00000000000000..9fc36b7544f0cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-spanish_sentiment_model_pysentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English spanish_sentiment_model_pysentiment RoBertaForSequenceClassification from der-emmanuel +author: John Snow Labs +name: spanish_sentiment_model_pysentiment +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spanish_sentiment_model_pysentiment` is a English model originally trained by der-emmanuel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spanish_sentiment_model_pysentiment_en_5.5.0_3.0_1727167265977.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spanish_sentiment_model_pysentiment_en_5.5.0_3.0_1727167265977.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("spanish_sentiment_model_pysentiment","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("spanish_sentiment_model_pysentiment", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spanish_sentiment_model_pysentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|408.3 MB| + +## References + +https://huggingface.co/der-emmanuel/es-sentiment-model-pysentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-spanish_sentiment_model_pysentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-spanish_sentiment_model_pysentiment_pipeline_en.md new file mode 100644 index 00000000000000..e18a1e2dc98e8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-spanish_sentiment_model_pysentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English spanish_sentiment_model_pysentiment_pipeline pipeline RoBertaForSequenceClassification from der-emmanuel +author: John Snow Labs +name: spanish_sentiment_model_pysentiment_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spanish_sentiment_model_pysentiment_pipeline` is a English model originally trained by der-emmanuel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spanish_sentiment_model_pysentiment_pipeline_en_5.5.0_3.0_1727167286848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spanish_sentiment_model_pysentiment_pipeline_en_5.5.0_3.0_1727167286848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("spanish_sentiment_model_pysentiment_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("spanish_sentiment_model_pysentiment_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
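
For quick, ad-hoc checks the pretrained pipeline can also be run on plain strings via `annotate`, without building a DataFrame first. Below is a small sketch, assuming the classifier's output column is named `class`; the Spanish review is only an illustrative input.

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("spanish_sentiment_model_pysentiment_pipeline", lang="en")

# annotate() returns a dict keyed by the pipeline's output columns.
result = pipeline.annotate("Me encanta esta película, es maravillosa.")
print(result["class"])
```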
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spanish_sentiment_model_pysentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.3 MB| + +## References + +https://huggingface.co/der-emmanuel/es-sentiment-model-pysentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sst2_roberta_large_seed_1_en.md b/docs/_posts/ahmedlone127/2024-09-24-sst2_roberta_large_seed_1_en.md new file mode 100644 index 00000000000000..47e95740e85feb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sst2_roberta_large_seed_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sst2_roberta_large_seed_1 RoBertaForSequenceClassification from utahnlp +author: John Snow Labs +name: sst2_roberta_large_seed_1 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sst2_roberta_large_seed_1` is a English model originally trained by utahnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sst2_roberta_large_seed_1_en_5.5.0_3.0_1727167866607.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sst2_roberta_large_seed_1_en_5.5.0_3.0_1727167866607.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("sst2_roberta_large_seed_1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sst2_roberta_large_seed_1", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sst2_roberta_large_seed_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/utahnlp/sst2_roberta-large_seed-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_pipeline_en.md new file mode 100644 index 00000000000000..0d4dc83f5aeed8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_pipeline_en_5.5.0_3.0_1727137406514.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_pipeline_en_5.5.0_3.0_1727137406514.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-0-2024-07-26_16-19-31 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-subtopics_bigbird_base_en.md b/docs/_posts/ahmedlone127/2024-09-24-subtopics_bigbird_base_en.md new file mode 100644 index 00000000000000..0499d864c90bcf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-subtopics_bigbird_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English subtopics_bigbird_base RoBertaForSequenceClassification from RogerKam +author: John Snow Labs +name: subtopics_bigbird_base +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`subtopics_bigbird_base` is a English model originally trained by RogerKam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/subtopics_bigbird_base_en_5.5.0_3.0_1727167881927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/subtopics_bigbird_base_en_5.5.0_3.0_1727167881927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("subtopics_bigbird_base","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("subtopics_bigbird_base", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|subtopics_bigbird_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|436.5 MB| + +## References + +https://huggingface.co/RogerKam/subTopics-bigBird-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-task_2_english_en.md b/docs/_posts/ahmedlone127/2024-09-24-task_2_english_en.md new file mode 100644 index 00000000000000..148b930dc62f95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-task_2_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English task_2_english RoBertaForTokenClassification from esacalderonru +author: John Snow Labs +name: task_2_english +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`task_2_english` is a English model originally trained by esacalderonru. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/task_2_english_en_5.5.0_3.0_1727150704327.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/task_2_english_en_5.5.0_3.0_1727150704327.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

tokenClassifier = RoBertaForTokenClassification.pretrained("task_2_english","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = RoBertaForTokenClassification.pretrained("task_2_english", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
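
The `ner` column produced above holds one predicted tag per token. Below is a minimal sketch of viewing tokens and their tags side by side, assuming the column names from the example.

```python
from pyspark.sql.functions import col

# Tokens and their predicted entity tags, one array per input row.
pipelineDF.select(
    col("token.result").alias("tokens"),
    col("ner.result").alias("ner_tags")
).show(truncate=False)
```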
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|task_2_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|445.1 MB| + +## References + +https://huggingface.co/esacalderonru/Task_2_en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-task_2_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-task_2_english_pipeline_en.md new file mode 100644 index 00000000000000..61c81b70d72fa9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-task_2_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English task_2_english_pipeline pipeline RoBertaForTokenClassification from esacalderonru +author: John Snow Labs +name: task_2_english_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`task_2_english_pipeline` is a English model originally trained by esacalderonru. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/task_2_english_pipeline_en_5.5.0_3.0_1727150733107.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/task_2_english_pipeline_en_5.5.0_3.0_1727150733107.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("task_2_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("task_2_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|task_2_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|445.1 MB| + +## References + +https://huggingface.co/esacalderonru/Task_2_en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-test_en.md b/docs/_posts/ahmedlone127/2024-09-24-test_en.md new file mode 100644 index 00000000000000..1bf999b8285888 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-test_en.md @@ -0,0 +1,88 @@ +--- +layout: model +title: English test RoBertaForQuestionAnswering from Nadav +author: John Snow Labs +name: test +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test` is a English model originally trained by Nadav. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_en_5.5.0_3.0_1727156654782.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_en_5.5.0_3.0_1727156654782.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = RoBertaForQuestionAnswering.pretrained("test","en") \
    .setInputCols(["document_question","document_context"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([documentAssembler, spanClassifier])
data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("question", "context")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)
```
```scala
val documentAssembler = new MultiDocumentAssembler()
    .setInputCols(Array("question", "context"))
    .setOutputCols(Array("document_question", "document_context"))

val spanClassifier = RoBertaForQuestionAnswering.pretrained("test", "en")
    .setInputCols(Array("document_question","document_context"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier))
val data = Seq(("What framework do I use?","I use spark-nlp.")).toDF("question", "context")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)
```
</div>
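
The predicted answer span ends up in the `answer` output column. Below is a minimal sketch of reading it back next to the question, assuming the column names from the example above.

```python
from pyspark.sql.functions import col

# One predicted answer per question/context pair.
pipelineDF.select(
    col("document_question.result").alias("question"),
    col("answer.result").alias("answer")
).show(truncate=False)
```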
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|877.1 MB| + +## References + +References + +https://huggingface.co/Nadav/test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tinybert_sentiment_amazon_en.md b/docs/_posts/ahmedlone127/2024-09-24-tinybert_sentiment_amazon_en.md new file mode 100644 index 00000000000000..e5edb90007ea96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tinybert_sentiment_amazon_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tinybert_sentiment_amazon BertForSequenceClassification from AdamCodd +author: John Snow Labs +name: tinybert_sentiment_amazon +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tinybert_sentiment_amazon` is a English model originally trained by AdamCodd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tinybert_sentiment_amazon_en_5.5.0_3.0_1727149426382.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tinybert_sentiment_amazon_en_5.5.0_3.0_1727149426382.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = BertForSequenceClassification.pretrained("tinybert_sentiment_amazon","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = BertForSequenceClassification.pretrained("tinybert_sentiment_amazon", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
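
For low-latency scoring of individual texts, the fitted pipeline can be wrapped in a `LightPipeline`, which skips the DataFrame machinery. A minimal sketch, assuming `pipelineModel` from the example above; the review sentence is illustrative and the `class` key mirrors the classifier's output column.

```python
from sparknlp.base import LightPipeline

# LightPipeline runs the fitted pipeline on the driver for small, ad-hoc inputs.
light = LightPipeline(pipelineModel)
print(light.annotate("This product exceeded my expectations!")["class"])
```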
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tinybert_sentiment_amazon| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|16.7 MB| + +## References + +https://huggingface.co/AdamCodd/tinybert-sentiment-amazon \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tmp0xmacdh7_en.md b/docs/_posts/ahmedlone127/2024-09-24-tmp0xmacdh7_en.md new file mode 100644 index 00000000000000..805d580b64e75a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tmp0xmacdh7_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tmp0xmacdh7 DistilBertForSequenceClassification from NikDiGio +author: John Snow Labs +name: tmp0xmacdh7 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tmp0xmacdh7` is a English model originally trained by NikDiGio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tmp0xmacdh7_en_5.5.0_3.0_1727154736193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tmp0xmacdh7_en_5.5.0_3.0_1727154736193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = DistilBertForSequenceClassification.pretrained("tmp0xmacdh7","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = DistilBertForSequenceClassification.pretrained("tmp0xmacdh7", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tmp0xmacdh7| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/NikDiGio/tmp0xmacdh7 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-transcript_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-transcript_classification_pipeline_en.md new file mode 100644 index 00000000000000..d3dca89ea32712 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-transcript_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English transcript_classification_pipeline pipeline DistilBertForSequenceClassification from aoshita +author: John Snow Labs +name: transcript_classification_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`transcript_classification_pipeline` is a English model originally trained by aoshita. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/transcript_classification_pipeline_en_5.5.0_3.0_1727154562472.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/transcript_classification_pipeline_en_5.5.0_3.0_1727154562472.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("transcript_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("transcript_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|transcript_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aoshita/transcript_classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-trial_model_djames62_en.md b/docs/_posts/ahmedlone127/2024-09-24-trial_model_djames62_en.md new file mode 100644 index 00000000000000..33102c4da9f1b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-trial_model_djames62_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trial_model_djames62 RoBertaForSequenceClassification from djames62 +author: John Snow Labs +name: trial_model_djames62 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trial_model_djames62` is a English model originally trained by djames62. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trial_model_djames62_en_5.5.0_3.0_1727167306603.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trial_model_djames62_en_5.5.0_3.0_1727167306603.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("trial_model_djames62","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("trial_model_djames62", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trial_model_djames62| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|416.4 MB| + +## References + +https://huggingface.co/djames62/trial-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-trial_model_djames62_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-trial_model_djames62_pipeline_en.md new file mode 100644 index 00000000000000..7d2c8cca47c05a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-trial_model_djames62_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trial_model_djames62_pipeline pipeline RoBertaForSequenceClassification from djames62 +author: John Snow Labs +name: trial_model_djames62_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trial_model_djames62_pipeline` is a English model originally trained by djames62. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trial_model_djames62_pipeline_en_5.5.0_3.0_1727167351420.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trial_model_djames62_pipeline_en_5.5.0_3.0_1727167351420.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("trial_model_djames62_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("trial_model_djames62_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trial_model_djames62_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|416.4 MB| + +## References + +https://huggingface.co/djames62/trial-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-trial_model_qstrats_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-trial_model_qstrats_pipeline_en.md new file mode 100644 index 00000000000000..c082306b56698f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-trial_model_qstrats_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trial_model_qstrats_pipeline pipeline RoBertaForSequenceClassification from qstrats +author: John Snow Labs +name: trial_model_qstrats_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trial_model_qstrats_pipeline` is a English model originally trained by qstrats. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trial_model_qstrats_pipeline_en_5.5.0_3.0_1727167523937.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trial_model_qstrats_pipeline_en_5.5.0_3.0_1727167523937.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("trial_model_qstrats_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("trial_model_qstrats_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trial_model_qstrats_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|416.3 MB| + +## References + +https://huggingface.co/qstrats/trial-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-trial_model_quant_chef_en.md b/docs/_posts/ahmedlone127/2024-09-24-trial_model_quant_chef_en.md new file mode 100644 index 00000000000000..fa0d5cbc1efacb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-trial_model_quant_chef_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trial_model_quant_chef RoBertaForSequenceClassification from quant-chef +author: John Snow Labs +name: trial_model_quant_chef +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trial_model_quant_chef` is a English model originally trained by quant-chef. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trial_model_quant_chef_en_5.5.0_3.0_1727167608768.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trial_model_quant_chef_en_5.5.0_3.0_1727167608768.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("trial_model_quant_chef","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("trial_model_quant_chef", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trial_model_quant_chef| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|416.1 MB| + +## References + +https://huggingface.co/quant-chef/trial-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-trial_model_quant_chef_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-trial_model_quant_chef_pipeline_en.md new file mode 100644 index 00000000000000..61f71f7d496d07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-trial_model_quant_chef_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trial_model_quant_chef_pipeline pipeline RoBertaForSequenceClassification from quant-chef +author: John Snow Labs +name: trial_model_quant_chef_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trial_model_quant_chef_pipeline` is a English model originally trained by quant-chef. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trial_model_quant_chef_pipeline_en_5.5.0_3.0_1727167651590.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trial_model_quant_chef_pipeline_en_5.5.0_3.0_1727167651590.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("trial_model_quant_chef_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("trial_model_quant_chef_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trial_model_quant_chef_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|416.1 MB| + +## References + +https://huggingface.co/quant-chef/trial-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-trial_model_vkattukolu3_en.md b/docs/_posts/ahmedlone127/2024-09-24-trial_model_vkattukolu3_en.md new file mode 100644 index 00000000000000..3010b990a6d208 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-trial_model_vkattukolu3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trial_model_vkattukolu3 RoBertaForSequenceClassification from vkattukolu3 +author: John Snow Labs +name: trial_model_vkattukolu3 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trial_model_vkattukolu3` is a English model originally trained by vkattukolu3. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trial_model_vkattukolu3_en_5.5.0_3.0_1727167855320.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trial_model_vkattukolu3_en_5.5.0_3.0_1727167855320.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = RoBertaForSequenceClassification.pretrained("trial_model_vkattukolu3","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")

val sequenceClassifier = RoBertaForSequenceClassification.pretrained("trial_model_vkattukolu3", "en")
    .setInputCols(Array("document","token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trial_model_vkattukolu3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|416.5 MB| + +## References + +https://huggingface.co/vkattukolu3/trial-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-trial_model_vkattukolu3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-trial_model_vkattukolu3_pipeline_en.md new file mode 100644 index 00000000000000..6c00a29f936bfc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-trial_model_vkattukolu3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trial_model_vkattukolu3_pipeline pipeline RoBertaForSequenceClassification from vkattukolu3 +author: John Snow Labs +name: trial_model_vkattukolu3_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trial_model_vkattukolu3_pipeline` is a English model originally trained by vkattukolu3. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trial_model_vkattukolu3_pipeline_en_5.5.0_3.0_1727167898668.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trial_model_vkattukolu3_pipeline_en_5.5.0_3.0_1727167898668.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is assumed to be an existing Spark DataFrame with the text to annotate (typically in a "text" column)
+pipeline = PretrainedPipeline("trial_model_vkattukolu3_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is assumed to be an existing Spark DataFrame with the text to annotate (typically in a "text" column)
+val pipeline = new PretrainedPipeline("trial_model_vkattukolu3_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trial_model_vkattukolu3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|416.5 MB| + +## References + +https://huggingface.co/vkattukolu3/trial-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-wav2vec2_base_igbo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-wav2vec2_base_igbo_pipeline_en.md new file mode 100644 index 00000000000000..d6e71025e408d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-wav2vec2_base_igbo_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English wav2vec2_base_igbo_pipeline pipeline WhisperForCTC from Msughterx +author: John Snow Labs +name: wav2vec2_base_igbo_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wav2vec2_base_igbo_pipeline` is a English model originally trained by Msughterx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wav2vec2_base_igbo_pipeline_en_5.5.0_3.0_1727145348500.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wav2vec2_base_igbo_pipeline_en_5.5.0_3.0_1727145348500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is assumed to be an existing Spark DataFrame with the raw audio floats to transcribe (typically in an "audio_content" column)
+pipeline = PretrainedPipeline("wav2vec2_base_igbo_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is assumed to be an existing Spark DataFrame with the raw audio floats to transcribe (typically in an "audio_content" column)
+val pipeline = new PretrainedPipeline("wav2vec2_base_igbo_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
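+
+Because this is a speech pipeline, `df` must carry audio rather than text. A minimal sketch, assuming the audio has already been decoded into a float array `raw_floats` and that the pipeline's AudioAssembler reads the conventional `audio_content` column:
+
+```python
+# Hypothetical input: one row holding the decoded waveform as raw floats
+df = spark.createDataFrame([[raw_floats]]).toDF("audio_content")
+```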
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wav2vec2_base_igbo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Msughterx/wav2vec2-base-igbo + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_ai_nomimode_en.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_ai_nomimode_en.md new file mode 100644 index 00000000000000..cf1e72c0dbb5f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_ai_nomimode_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_ai_nomimode WhisperForCTC from susmitabhatt +author: John Snow Labs +name: whisper_ai_nomimode +date: 2024-09-24 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_ai_nomimode` is a English model originally trained by susmitabhatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_ai_nomimode_en_5.5.0_3.0_1727142115819.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_ai_nomimode_en_5.5.0_3.0_1727142115819.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_ai_nomimode","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+# `data` is assumed to be a DataFrame of raw audio floats in an "audio_content" column,
+# e.g. data = spark.createDataFrame([[raw_floats]]).toDF("audio_content")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_ai_nomimode", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+// `data` is assumed to be a DataFrame of raw audio floats in an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
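+
+After the pipeline runs, the transcription ends up in the `text` output column. A short sketch for reading it back (assuming `data` was prepared as noted in the comments above):
+
+```python
+# Each row's recognised transcript sits in the annotation's `result` field
+pipelineDF.select("text.result").show(truncate=False)
+```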
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_ai_nomimode| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/susmitabhatt/whisper-ai-nomimode \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_ai_nomimode_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_ai_nomimode_pipeline_en.md new file mode 100644 index 00000000000000..4cc4366846f2f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_ai_nomimode_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_ai_nomimode_pipeline pipeline WhisperForCTC from susmitabhatt +author: John Snow Labs +name: whisper_ai_nomimode_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_ai_nomimode_pipeline` is a English model originally trained by susmitabhatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_ai_nomimode_pipeline_en_5.5.0_3.0_1727142206732.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_ai_nomimode_pipeline_en_5.5.0_3.0_1727142206732.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is assumed to be an existing Spark DataFrame with the raw audio floats to transcribe (typically in an "audio_content" column)
+pipeline = PretrainedPipeline("whisper_ai_nomimode_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is assumed to be an existing Spark DataFrame with the raw audio floats to transcribe (typically in an "audio_content" column)
+val pipeline = new PretrainedPipeline("whisper_ai_nomimode_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_ai_nomimode_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/susmitabhatt/whisper-ai-nomimode + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_base_basque_cv16_1_eu.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_base_basque_cv16_1_eu.md new file mode 100644 index 00000000000000..b14f84980d5dd9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_base_basque_cv16_1_eu.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Basque whisper_base_basque_cv16_1 WhisperForCTC from zuazo +author: John Snow Labs +name: whisper_base_basque_cv16_1 +date: 2024-09-24 +tags: [eu, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: eu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_basque_cv16_1` is a Basque model originally trained by zuazo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_basque_cv16_1_eu_5.5.0_3.0_1727141467311.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_basque_cv16_1_eu_5.5.0_3.0_1727141467311.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_base_basque_cv16_1","eu") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+# `data` is assumed to be a DataFrame of raw audio floats in an "audio_content" column,
+# e.g. data = spark.createDataFrame([[raw_floats]]).toDF("audio_content")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_base_basque_cv16_1", "eu")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+// `data` is assumed to be a DataFrame of raw audio floats in an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_basque_cv16_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|eu| +|Size:|641.4 MB| + +## References + +https://huggingface.co/zuazo/whisper-base-eu-cv16_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_base_basque_cv16_1_pipeline_eu.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_base_basque_cv16_1_pipeline_eu.md new file mode 100644 index 00000000000000..4eef3c84fef78e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_base_basque_cv16_1_pipeline_eu.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Basque whisper_base_basque_cv16_1_pipeline pipeline WhisperForCTC from zuazo +author: John Snow Labs +name: whisper_base_basque_cv16_1_pipeline +date: 2024-09-24 +tags: [eu, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: eu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_basque_cv16_1_pipeline` is a Basque model originally trained by zuazo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_basque_cv16_1_pipeline_eu_5.5.0_3.0_1727141501672.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_basque_cv16_1_pipeline_eu_5.5.0_3.0_1727141501672.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is assumed to be an existing Spark DataFrame with the raw audio floats to transcribe (typically in an "audio_content" column)
+pipeline = PretrainedPipeline("whisper_base_basque_cv16_1_pipeline", lang = "eu")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is assumed to be an existing Spark DataFrame with the raw audio floats to transcribe (typically in an "audio_content" column)
+val pipeline = new PretrainedPipeline("whisper_base_basque_cv16_1_pipeline", lang = "eu")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_basque_cv16_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|eu| +|Size:|641.4 MB| + +## References + +https://huggingface.co/zuazo/whisper-base-eu-cv16_1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_base_thai_project_6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_base_thai_project_6_pipeline_en.md new file mode 100644 index 00000000000000..d869eb63c939c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_base_thai_project_6_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_thai_project_6_pipeline pipeline WhisperForCTC from Varit +author: John Snow Labs +name: whisper_base_thai_project_6_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_thai_project_6_pipeline` is a English model originally trained by Varit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_thai_project_6_pipeline_en_5.5.0_3.0_1727145914176.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_thai_project_6_pipeline_en_5.5.0_3.0_1727145914176.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is assumed to be an existing Spark DataFrame with the raw audio floats to transcribe (typically in an "audio_content" column)
+pipeline = PretrainedPipeline("whisper_base_thai_project_6_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is assumed to be an existing Spark DataFrame with the raw audio floats to transcribe (typically in an "audio_content" column)
+val pipeline = new PretrainedPipeline("whisper_base_thai_project_6_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_thai_project_6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|642.3 MB| + +## References + +https://huggingface.co/Varit/whisper-base-th-project-6 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_small_divehi_cleandata_dv.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_divehi_cleandata_dv.md new file mode 100644 index 00000000000000..5edb5cd5937918 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_divehi_cleandata_dv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_cleandata WhisperForCTC from cleandata +author: John Snow Labs +name: whisper_small_divehi_cleandata +date: 2024-09-24 +tags: [dv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_cleandata` is a Dhivehi, Divehi, Maldivian model originally trained by cleandata. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_cleandata_dv_5.5.0_3.0_1727143368743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_cleandata_dv_5.5.0_3.0_1727143368743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_divehi_cleandata","dv") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+# `data` is assumed to be a DataFrame of raw audio floats in an "audio_content" column,
+# e.g. data = spark.createDataFrame([[raw_floats]]).toDF("audio_content")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_divehi_cleandata", "dv")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+// `data` is assumed to be a DataFrame of raw audio floats in an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_cleandata| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/cleandata/whisper-small-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_small_divehi_cleandata_pipeline_dv.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_divehi_cleandata_pipeline_dv.md new file mode 100644 index 00000000000000..8883835d9f3792 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_divehi_cleandata_pipeline_dv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_cleandata_pipeline pipeline WhisperForCTC from cleandata +author: John Snow Labs +name: whisper_small_divehi_cleandata_pipeline +date: 2024-09-24 +tags: [dv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_cleandata_pipeline` is a Dhivehi, Divehi, Maldivian model originally trained by cleandata. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_cleandata_pipeline_dv_5.5.0_3.0_1727143459591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_cleandata_pipeline_dv_5.5.0_3.0_1727143459591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is assumed to be an existing Spark DataFrame with the raw audio floats to transcribe (typically in an "audio_content" column)
+pipeline = PretrainedPipeline("whisper_small_divehi_cleandata_pipeline", lang = "dv")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is assumed to be an existing Spark DataFrame with the raw audio floats to transcribe (typically in an "audio_content" column)
+val pipeline = new PretrainedPipeline("whisper_small_divehi_cleandata_pipeline", lang = "dv")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_cleandata_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/cleandata/whisper-small-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hindi_abatula_en.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hindi_abatula_en.md new file mode 100644 index 00000000000000..346783fec99509 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hindi_abatula_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_hindi_abatula WhisperForCTC from abatula +author: John Snow Labs +name: whisper_small_hindi_abatula +date: 2024-09-24 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_abatula` is a English model originally trained by abatula. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_abatula_en_5.5.0_3.0_1727141617044.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_abatula_en_5.5.0_3.0_1727141617044.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_hindi_abatula","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+# `data` is assumed to be a DataFrame of raw audio floats in an "audio_content" column,
+# e.g. data = spark.createDataFrame([[raw_floats]]).toDF("audio_content")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_abatula", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+// `data` is assumed to be a DataFrame of raw audio floats in an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_abatula| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/abatula/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hindi_abatula_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hindi_abatula_pipeline_en.md new file mode 100644 index 00000000000000..5ba327c7a32961 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hindi_abatula_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_hindi_abatula_pipeline pipeline WhisperForCTC from abatula +author: John Snow Labs +name: whisper_small_hindi_abatula_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_abatula_pipeline` is a English model originally trained by abatula. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_abatula_pipeline_en_5.5.0_3.0_1727141715892.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_abatula_pipeline_en_5.5.0_3.0_1727141715892.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is assumed to be an existing Spark DataFrame with the raw audio floats to transcribe (typically in an "audio_content" column)
+pipeline = PretrainedPipeline("whisper_small_hindi_abatula_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is assumed to be an existing Spark DataFrame with the raw audio floats to transcribe (typically in an "audio_content" column)
+val pipeline = new PretrainedPipeline("whisper_small_hindi_abatula_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_abatula_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/abatula/whisper-small-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hk_en.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hk_en.md new file mode 100644 index 00000000000000..ff57556d789e5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hk_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_hk WhisperForCTC from PenguinbladeZ +author: John Snow Labs +name: whisper_small_hk +date: 2024-09-24 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hk` is a English model originally trained by PenguinbladeZ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hk_en_5.5.0_3.0_1727194164950.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hk_en_5.5.0_3.0_1727194164950.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_hk","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+# `data` is assumed to be a DataFrame of raw audio floats in an "audio_content" column,
+# e.g. data = spark.createDataFrame([[raw_floats]]).toDF("audio_content")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_hk", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+// `data` is assumed to be a DataFrame of raw audio floats in an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/PenguinbladeZ/whisper-small-hk \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hk_pipeline_en.md new file mode 100644 index 00000000000000..f5ad97415b35c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_hk_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_hk_pipeline pipeline WhisperForCTC from PenguinbladeZ +author: John Snow Labs +name: whisper_small_hk_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hk_pipeline` is a English model originally trained by PenguinbladeZ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hk_pipeline_en_5.5.0_3.0_1727194266210.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hk_pipeline_en_5.5.0_3.0_1727194266210.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is assumed to be an existing Spark DataFrame with the raw audio floats to transcribe (typically in an "audio_content" column)
+pipeline = PretrainedPipeline("whisper_small_hk_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is assumed to be an existing Spark DataFrame with the raw audio floats to transcribe (typically in an "audio_content" column)
+val pipeline = new PretrainedPipeline("whisper_small_hk_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/PenguinbladeZ/whisper-small-hk + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_small_telugu_4k_pipeline_te.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_telugu_4k_pipeline_te.md new file mode 100644 index 00000000000000..8413e95da35a94 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_telugu_4k_pipeline_te.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Telugu whisper_small_telugu_4k_pipeline pipeline WhisperForCTC from bnriiitb +author: John Snow Labs +name: whisper_small_telugu_4k_pipeline +date: 2024-09-24 +tags: [te, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: te +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_telugu_4k_pipeline` is a Telugu model originally trained by bnriiitb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_telugu_4k_pipeline_te_5.5.0_3.0_1727144406215.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_telugu_4k_pipeline_te_5.5.0_3.0_1727144406215.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is assumed to be an existing Spark DataFrame with the raw audio floats to transcribe (typically in an "audio_content" column)
+pipeline = PretrainedPipeline("whisper_small_telugu_4k_pipeline", lang = "te")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is assumed to be an existing Spark DataFrame with the raw audio floats to transcribe (typically in an "audio_content" column)
+val pipeline = new PretrainedPipeline("whisper_small_telugu_4k_pipeline", lang = "te")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_telugu_4k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|te| +|Size:|1.7 GB| + +## References + +https://huggingface.co/bnriiitb/whisper-small-te-4k + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_small_telugu_4k_te.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_telugu_4k_te.md new file mode 100644 index 00000000000000..be69d3bbe59aa1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_telugu_4k_te.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Telugu whisper_small_telugu_4k WhisperForCTC from bnriiitb +author: John Snow Labs +name: whisper_small_telugu_4k +date: 2024-09-24 +tags: [te, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: te +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_telugu_4k` is a Telugu model originally trained by bnriiitb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_telugu_4k_te_5.5.0_3.0_1727144311164.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_telugu_4k_te_5.5.0_3.0_1727144311164.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_small_telugu_4k","te") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+# `data` is assumed to be a DataFrame of raw audio floats in an "audio_content" column,
+# e.g. data = spark.createDataFrame([[raw_floats]]).toDF("audio_content")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_small_telugu_4k", "te")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+// `data` is assumed to be a DataFrame of raw audio floats in an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_telugu_4k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|te| +|Size:|1.7 GB| + +## References + +https://huggingface.co/bnriiitb/whisper-small-te-4k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_tiny_english_arielcerdap_en.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_tiny_english_arielcerdap_en.md new file mode 100644 index 00000000000000..f29d82356b56eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_tiny_english_arielcerdap_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_english_arielcerdap WhisperForCTC from arielcerdap +author: John Snow Labs +name: whisper_tiny_english_arielcerdap +date: 2024-09-24 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_arielcerdap` is a English model originally trained by arielcerdap. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_arielcerdap_en_5.5.0_3.0_1727142002434.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_arielcerdap_en_5.5.0_3.0_1727142002434.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+audioAssembler = AudioAssembler() \
+    .setInputCol("audio_content") \
+    .setOutputCol("audio_assembler")
+
+speechToText = WhisperForCTC.pretrained("whisper_tiny_english_arielcerdap","en") \
+    .setInputCols(["audio_assembler"]) \
+    .setOutputCol("text")
+
+pipeline = Pipeline().setStages([audioAssembler, speechToText])
+
+# `data` is assumed to be a DataFrame of raw audio floats in an "audio_content" column,
+# e.g. data = spark.createDataFrame([[raw_floats]]).toDF("audio_content")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.AudioAssembler
+import com.johnsnowlabs.nlp.annotators.audio.WhisperForCTC
+import org.apache.spark.ml.Pipeline
+
+val audioAssembler = new AudioAssembler()
+    .setInputCol("audio_content")
+    .setOutputCol("audio_assembler")
+
+val speechToText = WhisperForCTC.pretrained("whisper_tiny_english_arielcerdap", "en")
+    .setInputCols(Array("audio_assembler"))
+    .setOutputCol("text")
+
+val pipeline = new Pipeline().setStages(Array(audioAssembler, speechToText))
+
+// `data` is assumed to be a DataFrame of raw audio floats in an "audio_content" column
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_arielcerdap| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/arielcerdap/whisper-tiny-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_tiny_english_arielcerdap_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_tiny_english_arielcerdap_pipeline_en.md new file mode 100644 index 00000000000000..b7044c6a1c1427 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_tiny_english_arielcerdap_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_english_arielcerdap_pipeline pipeline WhisperForCTC from arielcerdap +author: John Snow Labs +name: whisper_tiny_english_arielcerdap_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_arielcerdap_pipeline` is a English model originally trained by arielcerdap. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_arielcerdap_pipeline_en_5.5.0_3.0_1727142022683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_arielcerdap_pipeline_en_5.5.0_3.0_1727142022683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is assumed to be an existing Spark DataFrame with the raw audio floats to transcribe (typically in an "audio_content" column)
+pipeline = PretrainedPipeline("whisper_tiny_english_arielcerdap_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is assumed to be an existing Spark DataFrame with the raw audio floats to transcribe (typically in an "audio_content" column)
+val pipeline = new PretrainedPipeline("whisper_tiny_english_arielcerdap_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_arielcerdap_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/arielcerdap/whisper-tiny-en + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-withinapps_ndd_ppma_test_content_cwadj_en.md b/docs/_posts/ahmedlone127/2024-09-24-withinapps_ndd_ppma_test_content_cwadj_en.md new file mode 100644 index 00000000000000..6c374505a54741 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-withinapps_ndd_ppma_test_content_cwadj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English withinapps_ndd_ppma_test_content_cwadj DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_ppma_test_content_cwadj +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_ppma_test_content_cwadj` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_ppma_test_content_cwadj_en_5.5.0_3.0_1727154623208.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_ppma_test_content_cwadj_en_5.5.0_3.0_1727154623208.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# the classifier consumes the assembler's "document" output together with the tokenizer's "token" output
+sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_ppma_test_content_cwadj","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_ppma_test_content_cwadj", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_ppma_test_content_cwadj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-ppma_test-content-CWAdj \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xddmodel_en.md b/docs/_posts/ahmedlone127/2024-09-24-xddmodel_en.md new file mode 100644 index 00000000000000..f4747f7ddcd376 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xddmodel_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xddmodel XlmRoBertaForTokenClassification from pushokay +author: John Snow Labs +name: xddmodel +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xddmodel` is a English model originally trained by pushokay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xddmodel_en_5.5.0_3.0_1727179961981.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xddmodel_en_5.5.0_3.0_1727179961981.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# the token classifier consumes the assembler's "document" output together with the tokenizer's "token" output
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xddmodel","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xddmodel", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
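+
+For token classification, the `ner` output column holds one annotation per token. Pairing each token with its predicted tag is a common next step; a sketch, assuming the pipeline above has been fitted:
+
+```python
+from pyspark.sql.functions import arrays_zip, col, explode
+
+# Hypothetical post-processing: line up every token with its predicted NER tag
+pipelineDF.select(explode(arrays_zip(col("token.result"), col("ner.result"))).alias("token_tag")) \
+    .show(truncate=False)
+```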
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xddmodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|388.5 MB| + +## References + +https://huggingface.co/pushokay/xddModel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xddmodel_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xddmodel_pipeline_en.md new file mode 100644 index 00000000000000..3b5143356189e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xddmodel_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xddmodel_pipeline pipeline XlmRoBertaForTokenClassification from pushokay +author: John Snow Labs +name: xddmodel_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xddmodel_pipeline` is a English model originally trained by pushokay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xddmodel_pipeline_en_5.5.0_3.0_1727179987730.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xddmodel_pipeline_en_5.5.0_3.0_1727179987730.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is assumed to be an existing Spark DataFrame with the text to annotate (typically in a "text" column)
+pipeline = PretrainedPipeline("xddmodel_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is assumed to be an existing Spark DataFrame with the text to annotate (typically in a "text" column)
+val pipeline = new PretrainedPipeline("xddmodel_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xddmodel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|388.5 MB| + +## References + +https://huggingface.co/pushokay/xddModel + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_balance_vietnam_aug_delete_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_balance_vietnam_aug_delete_en.md new file mode 100644 index 00000000000000..f376e31e2caac3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_balance_vietnam_aug_delete_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_balance_vietnam_aug_delete XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_balance_vietnam_aug_delete +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_balance_vietnam_aug_delete` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_delete_en_5.5.0_3.0_1727170140787.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_delete_en_5.5.0_3.0_1727170140787.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.base import *
+from sparknlp.annotator import *
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# the classifier consumes the assembler's "document" output together with the tokenizer's "token" output
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_balance_vietnam_aug_delete","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+import spark.implicits._
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_balance_vietnam_aug_delete", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_balance_vietnam_aug_delete| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|793.6 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Balance_VietNam-aug_delete \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_balance_vietnam_aug_delete_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_balance_vietnam_aug_delete_pipeline_en.md new file mode 100644 index 00000000000000..7b08fa4d2b15f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_balance_vietnam_aug_delete_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_balance_vietnam_aug_delete_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_balance_vietnam_aug_delete_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_balance_vietnam_aug_delete_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_delete_pipeline_en_5.5.0_3.0_1727170276044.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_delete_pipeline_en_5.5.0_3.0_1727170276044.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# `df` is assumed to be an existing Spark DataFrame with the text to annotate (typically in a "text" column)
+pipeline = PretrainedPipeline("xlm_roberta_base_balance_vietnam_aug_delete_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+
+// `df` is assumed to be an existing Spark DataFrame with the text to annotate (typically in a "text" column)
+val pipeline = new PretrainedPipeline("xlm_roberta_base_balance_vietnam_aug_delete_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_balance_vietnam_aug_delete_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|793.7 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Balance_VietNam-aug_delete + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_final_mixed_aug_backtranslation_1_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_final_mixed_aug_backtranslation_1_en.md new file mode 100644 index 00000000000000..b151236e6c9e81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_final_mixed_aug_backtranslation_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_final_mixed_aug_backtranslation_1 XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_final_mixed_aug_backtranslation_1 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_final_mixed_aug_backtranslation_1` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_aug_backtranslation_1_en_5.5.0_3.0_1727152826621.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_aug_backtranslation_1_en_5.5.0_3.0_1727152826621.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForSequenceClassification
from pyspark.ml import Pipeline

# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# input columns must match the outputs of the stages above: "document" and "token"
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_final_mixed_aug_backtranslation_1","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
// assumes an active SparkSession named `spark` (e.g. in spark-shell)
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// input columns must match the outputs of the stages above: "document" and "token"
val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_final_mixed_aug_backtranslation_1", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
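
To turn the `class` annotations produced above into a plain label column, one option is to explode the `result` array; a small sketch using standard PySpark only, with no additional Spark NLP calls:

```python
from pyspark.sql import functions as F

# "class.result" holds one predicted label per document
pipelineDF.select(
    F.col("text"),
    F.explode(F.col("class.result")).alias("predicted_label")
).show(truncate=False)
```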
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_final_mixed_aug_backtranslation_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|794.6 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Final_Mixed-aug_backtranslation-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_final_mixed_aug_backtranslation_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_final_mixed_aug_backtranslation_1_pipeline_en.md new file mode 100644 index 00000000000000..40a8186e5ed5e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_final_mixed_aug_backtranslation_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_final_mixed_aug_backtranslation_1_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_final_mixed_aug_backtranslation_1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_final_mixed_aug_backtranslation_1_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_aug_backtranslation_1_pipeline_en_5.5.0_3.0_1727152966363.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_aug_backtranslation_1_pipeline_en_5.5.0_3.0_1727152966363.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_final_mixed_aug_backtranslation_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_final_mixed_aug_backtranslation_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_final_mixed_aug_backtranslation_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|794.7 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Final_Mixed-aug_backtranslation-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_en.md new file mode 100644 index 00000000000000..3edf0f59e9003b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2 XlmRoBertaForSequenceClassification from vg055 +author: John Snow Labs +name: xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2` is a English model originally trained by vg055. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_en_5.5.0_3.0_1727152751701.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_en_5.5.0_3.0_1727152751701.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForSequenceClassification
from pyspark.ml import Pipeline

# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# input columns must match the outputs of the stages above: "document" and "token"
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
// assumes an active SparkSession named `spark` (e.g. in spark-shell)
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// input columns must match the outputs of the stages above: "document" and "token"
val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|894.7 MB| + +## References + +https://huggingface.co/vg055/xlm-roberta-base-finetuned-IberAuTexTification2024-7030-4epo-task1-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_en.md new file mode 100644 index 00000000000000..3751702e6ac84e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2 XlmRoBertaForSequenceClassification from RogerB +author: John Snow Labs +name: xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_en_5.5.0_3.0_1727170138983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_en_5.5.0_3.0_1727170138983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForSequenceClassification
from pyspark.ml import Pipeline

# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# input columns must match the outputs of the stages above: "document" and "token"
sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
// assumes an active SparkSession named `spark` (e.g. in spark-shell)
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

// input columns must match the outputs of the stages above: "document" and "token"
val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/RogerB/xlm-roberta-base-finetuned-kinyarwanda-kinre-finetuned-kin-sent2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_pipeline_en.md new file mode 100644 index 00000000000000..1fbb06cd57a998 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_pipeline pipeline XlmRoBertaForSequenceClassification from RogerB +author: John Snow Labs +name: xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_pipeline` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_pipeline_en_5.5.0_3.0_1727170198683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_pipeline_en_5.5.0_3.0_1727170198683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_kinyarwanda_kinre_finetuned_kinyarwanda_sent2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/RogerB/xlm-roberta-base-finetuned-kinyarwanda-kinre-finetuned-kin-sent2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_marc_tielupeng_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_marc_tielupeng_en.md new file mode 100644 index 00000000000000..160bcb71b8bb81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_marc_tielupeng_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_marc_tielupeng XlmRoBertaForSequenceClassification from tielupeng +author: John Snow Labs +name: xlm_roberta_base_finetuned_marc_tielupeng +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_marc_tielupeng` is a English model originally trained by tielupeng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_tielupeng_en_5.5.0_3.0_1727156592734.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_tielupeng_en_5.5.0_3.0_1727156592734.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_marc_tielupeng","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_marc_tielupeng", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_marc_tielupeng| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|835.1 MB| + +## References + +https://huggingface.co/tielupeng/xlm-roberta-base-finetuned-marc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_en.md new file mode 100644 index 00000000000000..08ea63e58c1427 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_0ppxnhximxr XlmRoBertaForTokenClassification from 0ppxnhximxr +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_0ppxnhximxr +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_0ppxnhximxr` is a English model originally trained by 0ppxnhximxr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_en_5.5.0_3.0_1727180196647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_en_5.5.0_3.0_1727180196647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForTokenClassification
from pyspark.ml import Pipeline

# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# input columns must match the outputs of the stages above: "document" and "token"
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_0ppxnhximxr","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
// assumes an active SparkSession named `spark` (e.g. in spark-shell)
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// input columns must match the outputs of the stages above: "document" and "token"
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_0ppxnhximxr", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
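
Using the column names from the snippet above (`token` and `ner`), the per-token predictions can be inspected directly from `pipelineDF`; a brief sketch:

```python
# each annotation column exposes its string values under `result`,
# so tokens and their predicted tags line up element by element
pipelineDF.selectExpr("token.result as tokens", "ner.result as tags").show(truncate=False)
```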
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_0ppxnhximxr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/0ppxnhximxr/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_pipeline_en.md new file mode 100644 index 00000000000000..7ac5021db0ce74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_pipeline pipeline XlmRoBertaForTokenClassification from 0ppxnhximxr +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_pipeline` is a English model originally trained by 0ppxnhximxr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_pipeline_en_5.5.0_3.0_1727180277781.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_pipeline_en_5.5.0_3.0_1727180277781.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
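
For quick experiments on single strings, `PretrainedPipeline` also offers `annotate()`, which skips the DataFrame round trip. A minimal sketch; the example sentence and the assumption that the stages of this pipeline write to keys named `token` and `ner` are illustrative:

```python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_pipeline", lang="en")

# annotate() returns a dict keyed by the output columns of the pipeline stages
result = pipeline.annotate("John Snow Labs is a company based in Delaware")
print(result["token"])
print(result["ner"])
```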
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_0ppxnhximxr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/0ppxnhximxr/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_amitjain171980_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_amitjain171980_en.md new file mode 100644 index 00000000000000..73b3e0a5caedad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_amitjain171980_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_amitjain171980 XlmRoBertaForTokenClassification from amitjain171980 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_amitjain171980 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_amitjain171980` is a English model originally trained by amitjain171980. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_amitjain171980_en_5.5.0_3.0_1727160303720.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_amitjain171980_en_5.5.0_3.0_1727160303720.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForTokenClassification
from pyspark.ml import Pipeline

# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# input columns must match the outputs of the stages above: "document" and "token"
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_amitjain171980","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
// assumes an active SparkSession named `spark` (e.g. in spark-shell)
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// input columns must match the outputs of the stages above: "document" and "token"
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_amitjain171980", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_amitjain171980| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|855.9 MB| + +## References + +https://huggingface.co/amitjain171980/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_hravi_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_hravi_en.md new file mode 100644 index 00000000000000..1f6358a3ae5ff1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_hravi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_hravi XlmRoBertaForTokenClassification from hravi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_hravi +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_hravi` is a English model originally trained by hravi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_hravi_en_5.5.0_3.0_1727180020333.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_hravi_en_5.5.0_3.0_1727180020333.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForTokenClassification
from pyspark.ml import Pipeline

# assumes an active Spark NLP session, e.g. spark = sparknlp.start()
documentAssembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# input columns must match the outputs of the stages above: "document" and "token"
tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_hravi","en") \
    .setInputCols(["document","token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
// assumes an active SparkSession named `spark` (e.g. in spark-shell)
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// input columns must match the outputs of the stages above: "document" and "token"
val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_hravi", "en")
  .setInputCols(Array("document","token"))
  .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```
</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_hravi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/hravi/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_k3lana_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_k3lana_en.md new file mode 100644 index 00000000000000..991f5f6df6c27a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_k3lana_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_k3lana XlmRoBertaForTokenClassification from k3lana +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_k3lana +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_k3lana` is a English model originally trained by k3lana. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_k3lana_en_5.5.0_3.0_1727174783070.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_k3lana_en_5.5.0_3.0_1727174783070.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_k3lana","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_k3lana", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_k3lana| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/k3lana/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_k3lana_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_k3lana_pipeline_en.md new file mode 100644 index 00000000000000..d0209a1ce071e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_k3lana_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_k3lana_pipeline pipeline XlmRoBertaForTokenClassification from k3lana +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_k3lana_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_k3lana_pipeline` is a English model originally trained by k3lana. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_k3lana_pipeline_en_5.5.0_3.0_1727174847821.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_k3lana_pipeline_en_5.5.0_3.0_1727174847821.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_k3lana_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_k3lana_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_k3lana_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/k3lana/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_maxnet_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_maxnet_en.md new file mode 100644 index 00000000000000..043428c379df01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_maxnet_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_maxnet XlmRoBertaForTokenClassification from Maxnet +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_maxnet +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_maxnet` is a English model originally trained by Maxnet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_maxnet_en_5.5.0_3.0_1727160602568.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_maxnet_en_5.5.0_3.0_1727160602568.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_maxnet","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_maxnet", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_maxnet| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|859.8 MB| + +## References + +https://huggingface.co/Maxnet/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_maxnet_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_maxnet_pipeline_en.md new file mode 100644 index 00000000000000..40cd70f02e51b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_all_maxnet_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_maxnet_pipeline pipeline XlmRoBertaForTokenClassification from Maxnet +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_maxnet_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_maxnet_pipeline` is a English model originally trained by Maxnet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_maxnet_pipeline_en_5.5.0_3.0_1727160669843.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_maxnet_pipeline_en_5.5.0_3.0_1727160669843.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_maxnet_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_maxnet_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_maxnet_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|859.8 MB| + +## References + +https://huggingface.co/Maxnet/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_english_pockypocky_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_english_pockypocky_en.md new file mode 100644 index 00000000000000..e8dd82c98740f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_english_pockypocky_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_pockypocky XlmRoBertaForTokenClassification from pockypocky +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_pockypocky +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_pockypocky` is a English model originally trained by pockypocky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_pockypocky_en_5.5.0_3.0_1727147476717.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_pockypocky_en_5.5.0_3.0_1727147476717.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_pockypocky","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_pockypocky", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_pockypocky| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/pockypocky/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_english_skr1125_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_english_skr1125_en.md new file mode 100644 index 00000000000000..b97d3b87fef1d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_english_skr1125_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_skr1125 XlmRoBertaForTokenClassification from skr1125 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_skr1125 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_skr1125` is a English model originally trained by skr1125. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_skr1125_en_5.5.0_3.0_1727180150112.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_skr1125_en_5.5.0_3.0_1727180150112.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_skr1125","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_skr1125", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_skr1125| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/skr1125/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_cyrildever_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_cyrildever_pipeline_en.md new file mode 100644 index 00000000000000..8ea065110376ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_cyrildever_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_cyrildever_pipeline pipeline XlmRoBertaForTokenClassification from cyrildever +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_cyrildever_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_cyrildever_pipeline` is a English model originally trained by cyrildever. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cyrildever_pipeline_en_5.5.0_3.0_1727148236718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cyrildever_pipeline_en_5.5.0_3.0_1727148236718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_cyrildever_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_cyrildever_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_cyrildever_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/cyrildever/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_ridealist_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_ridealist_en.md new file mode 100644 index 00000000000000..99d6e6a7491fdf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_ridealist_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_ridealist XlmRoBertaForTokenClassification from Ridealist +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_ridealist +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_ridealist` is a English model originally trained by Ridealist. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ridealist_en_5.5.0_3.0_1727147695062.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ridealist_en_5.5.0_3.0_1727147695062.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_ridealist","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_ridealist", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_ridealist| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/Ridealist/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_zebans_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_zebans_pipeline_en.md new file mode 100644 index 00000000000000..a1fc33c67227f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_zebans_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_zebans_pipeline pipeline XlmRoBertaForTokenClassification from zebans +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_zebans_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_zebans_pipeline` is a English model originally trained by zebans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_zebans_pipeline_en_5.5.0_3.0_1727160620635.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_zebans_pipeline_en_5.5.0_3.0_1727160620635.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark NLP session bound to `spark` (e.g. spark = sparknlp.start()).
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_zebans_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import spark.implicits._
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_zebans_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```

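
Beyond `transform()`, and assuming the `pipeline` object created above, `PretrainedPipeline` also exposes `annotate()` for quick single-string checks. The keys of the returned dictionary depend on the pipeline's output columns, so the sketch below simply prints whatever comes back.

```python
# annotate() runs the whole pretrained pipeline on one string and returns a dict of lists.
result = pipeline.annotate("George Washington est allé à Washington")

print(result.keys())
print(result)
```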
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_zebans_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/zebans/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_cramade_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_cramade_en.md new file mode 100644 index 00000000000000..2b25b99cec94f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_cramade_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_cramade XlmRoBertaForTokenClassification from cramade +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_cramade +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_cramade` is a English model originally trained by cramade. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_cramade_en_5.5.0_3.0_1727160712619.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_cramade_en_5.5.0_3.0_1727160712619.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Assumes an active Spark NLP session bound to `spark` (e.g. spark = sparknlp.start()).
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForTokenClassification

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_cramade", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_cramade", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

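
All of the Python and Scala snippets on these cards assume an already-running Spark session with Spark NLP on the classpath, bound to the name `spark`. A minimal way to obtain one in Python is sketched below.

```python
import sparknlp

# Starts (or reuses) a SparkSession with the Spark NLP jars attached.
spark = sparknlp.start()

print("Spark NLP version:", sparknlp.version())
print("Apache Spark version:", spark.version)
```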
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_cramade| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/cramade/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_cramade_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_cramade_pipeline_en.md new file mode 100644 index 00000000000000..aaeebedc093d91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_cramade_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_cramade_pipeline pipeline XlmRoBertaForTokenClassification from cramade +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_cramade_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_cramade_pipeline` is a English model originally trained by cramade. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_cramade_pipeline_en_5.5.0_3.0_1727160798955.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_cramade_pipeline_en_5.5.0_3.0_1727160798955.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark NLP session bound to `spark` (e.g. spark = sparknlp.start()).
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_cramade_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import spark.implicits._
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_cramade_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_cramade_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/cramade/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_bessho_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_bessho_en.md new file mode 100644 index 00000000000000..ee4e6cf19cf54e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_bessho_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_bessho XlmRoBertaForTokenClassification from bessho +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_bessho +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_bessho` is a English model originally trained by bessho. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_bessho_en_5.5.0_3.0_1727175116872.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_bessho_en_5.5.0_3.0_1727175116872.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Assumes an active Spark NLP session bound to `spark` (e.g. spark = sparknlp.start()).
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForTokenClassification

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_bessho", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_bessho", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_bessho| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/bessho/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_dasooo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_dasooo_pipeline_en.md new file mode 100644 index 00000000000000..b495880fc8e646 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_dasooo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_dasooo_pipeline pipeline XlmRoBertaForTokenClassification from daSooo +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_dasooo_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_dasooo_pipeline` is a English model originally trained by daSooo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_dasooo_pipeline_en_5.5.0_3.0_1727180378202.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_dasooo_pipeline_en_5.5.0_3.0_1727180378202.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark NLP session bound to `spark` (e.g. spark = sparknlp.start()).
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_dasooo_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import spark.implicits._
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_dasooo_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_dasooo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/daSooo/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_isaacp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_isaacp_pipeline_en.md new file mode 100644 index 00000000000000..2be52190488257 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_isaacp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_isaacp_pipeline pipeline XlmRoBertaForTokenClassification from Isaacp +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_isaacp_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_isaacp_pipeline` is a English model originally trained by Isaacp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_isaacp_pipeline_en_5.5.0_3.0_1727147501827.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_isaacp_pipeline_en_5.5.0_3.0_1727147501827.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark NLP session bound to `spark` (e.g. spark = sparknlp.start()).
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_isaacp_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import spark.implicits._
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_isaacp_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_isaacp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/Isaacp/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_rlpeter70_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_rlpeter70_en.md new file mode 100644 index 00000000000000..16c1e9df0ec5ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_rlpeter70_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_rlpeter70 XlmRoBertaForTokenClassification from rlpeter70 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_rlpeter70 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_rlpeter70` is a English model originally trained by rlpeter70. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_rlpeter70_en_5.5.0_3.0_1727160215681.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_rlpeter70_en_5.5.0_3.0_1727160215681.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Assumes an active Spark NLP session bound to `spark` (e.g. spark = sparknlp.start()).
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForTokenClassification

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_rlpeter70", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_rlpeter70", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_rlpeter70| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/rlpeter70/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_rlpeter70_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_rlpeter70_pipeline_en.md new file mode 100644 index 00000000000000..3997592a7e3a3d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_rlpeter70_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_rlpeter70_pipeline pipeline XlmRoBertaForTokenClassification from rlpeter70 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_rlpeter70_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_rlpeter70_pipeline` is a English model originally trained by rlpeter70. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_rlpeter70_pipeline_en_5.5.0_3.0_1727160282532.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_rlpeter70_pipeline_en_5.5.0_3.0_1727160282532.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark NLP session bound to `spark` (e.g. spark = sparknlp.start()).
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_rlpeter70_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import spark.implicits._
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_rlpeter70_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_rlpeter70_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/rlpeter70/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_xrchen11_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_xrchen11_en.md new file mode 100644 index 00000000000000..7171b5dea74df3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_xrchen11_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_xrchen11 XlmRoBertaForTokenClassification from xrchen11 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_xrchen11 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_xrchen11` is a English model originally trained by xrchen11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_xrchen11_en_5.5.0_3.0_1727147833906.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_xrchen11_en_5.5.0_3.0_1727147833906.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Assumes an active Spark NLP session bound to `spark` (e.g. spark = sparknlp.start()).
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForTokenClassification

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_xrchen11", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_xrchen11", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_xrchen11| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/xrchen11/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_hindi_urdu_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_hindi_urdu_en.md new file mode 100644 index 00000000000000..1c71b5c8f53e25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_hindi_urdu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_hindi_urdu XlmRoBertaForTokenClassification from DeepaPeri +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_hindi_urdu +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_hindi_urdu` is a English model originally trained by DeepaPeri. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_urdu_en_5.5.0_3.0_1727147870933.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_urdu_en_5.5.0_3.0_1727147870933.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Assumes an active Spark NLP session bound to `spark` (e.g. spark = sparknlp.start()).
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForTokenClassification

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_hindi_urdu", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_hindi_urdu", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_hindi_urdu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.1 MB| + +## References + +https://huggingface.co/DeepaPeri/xlm-roberta-base-finetuned-panx-hi-ur \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_ankit15nov_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_ankit15nov_en.md new file mode 100644 index 00000000000000..60b5496ce7b64c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_ankit15nov_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_ankit15nov XlmRoBertaForTokenClassification from Ankit15nov +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_ankit15nov +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_ankit15nov` is a English model originally trained by Ankit15nov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_ankit15nov_en_5.5.0_3.0_1727160899552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_ankit15nov_en_5.5.0_3.0_1727160899552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Assumes an active Spark NLP session bound to `spark` (e.g. spark = sparknlp.start()).
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForTokenClassification

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_ankit15nov", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("ner")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_ankit15nov", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("ner")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_ankit15nov| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/Ankit15nov/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_ankit15nov_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_ankit15nov_pipeline_en.md new file mode 100644 index 00000000000000..32e95354e577ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_ankit15nov_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_ankit15nov_pipeline pipeline XlmRoBertaForTokenClassification from Ankit15nov +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_ankit15nov_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_ankit15nov_pipeline` is a English model originally trained by Ankit15nov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_ankit15nov_pipeline_en_5.5.0_3.0_1727160987501.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_ankit15nov_pipeline_en_5.5.0_3.0_1727160987501.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark NLP session bound to `spark` (e.g. spark = sparknlp.start()).
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_ankit15nov_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import spark.implicits._
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_ankit15nov_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_ankit15nov_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/Ankit15nov/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_khadija267_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_khadija267_pipeline_en.md new file mode 100644 index 00000000000000..da219feab531db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_khadija267_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_khadija267_pipeline pipeline XlmRoBertaForTokenClassification from khadija267 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_khadija267_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_khadija267_pipeline` is a English model originally trained by khadija267. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_khadija267_pipeline_en_5.5.0_3.0_1727174972482.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_khadija267_pipeline_en_5.5.0_3.0_1727174972482.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark NLP session bound to `spark` (e.g. spark = sparknlp.start()).
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_khadija267_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import spark.implicits._
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_khadija267_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_khadija267_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/khadija267/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_en.md new file mode 100644 index 00000000000000..17e1b85d654aed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_en_5.5.0_3.0_1727156531764.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_en_5.5.0_3.0_1727156531764.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Assumes an active Spark NLP session bound to `spark` (e.g. spark = sparknlp.start()).
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForSequenceClassification

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

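
For the sequence-classification models the prediction lands in the `class` output column set above. The sketch below (assuming the pipeline above has been run) shows one way to pull the predicted label per input row.

```python
from pyspark.sql.functions import col, explode

# One predicted label per input document.
pipelineDF \
    .select(col("text"), explode(col("class.result")).alias("predicted_label")) \
    .show(truncate=False)
```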
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|810.5 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr5e-06_seed42_amh-esp-eng_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_pipeline_en.md new file mode 100644 index 00000000000000..98ce11a0c47b4a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_pipeline pipeline XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_pipeline` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_pipeline_en_5.5.0_3.0_1727156661585.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_pipeline_en_5.5.0_3.0_1727156661585.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark NLP session bound to `spark` (e.g. spark = sparknlp.start()).
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import spark.implicits._
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr5e_06_seed42_amh_esp_eng_train_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|810.5 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr5e-06_seed42_amh-esp-eng_train + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_mixed_aug_insert_vietnamese_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_mixed_aug_insert_vietnamese_en.md new file mode 100644 index 00000000000000..01fa458eafc04a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_mixed_aug_insert_vietnamese_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_mixed_aug_insert_vietnamese XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_mixed_aug_insert_vietnamese +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_mixed_aug_insert_vietnamese` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_mixed_aug_insert_vietnamese_en_5.5.0_3.0_1727155878058.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_mixed_aug_insert_vietnamese_en_5.5.0_3.0_1727155878058.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Assumes an active Spark NLP session bound to `spark` (e.g. spark = sparknlp.start()).
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForSequenceClassification

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_mixed_aug_insert_vietnamese", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_mixed_aug_insert_vietnamese", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_mixed_aug_insert_vietnamese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|794.9 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Mixed-aug_insert_vi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_mixed_aug_insert_vietnamese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_mixed_aug_insert_vietnamese_pipeline_en.md new file mode 100644 index 00000000000000..ecdad5514f3346 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_mixed_aug_insert_vietnamese_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_mixed_aug_insert_vietnamese_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_mixed_aug_insert_vietnamese_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_mixed_aug_insert_vietnamese_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_mixed_aug_insert_vietnamese_pipeline_en_5.5.0_3.0_1727156009421.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_mixed_aug_insert_vietnamese_pipeline_en_5.5.0_3.0_1727156009421.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark NLP session bound to `spark` (e.g. spark = sparknlp.start()).
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_mixed_aug_insert_vietnamese_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import spark.implicits._
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_mixed_aug_insert_vietnamese_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```

+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_mixed_aug_insert_vietnamese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|794.9 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Mixed-aug_insert_vi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_pharmaconer_kanansharmaa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_pharmaconer_kanansharmaa_pipeline_en.md new file mode 100644 index 00000000000000..fc81b88cceba40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_pharmaconer_kanansharmaa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_pharmaconer_kanansharmaa_pipeline pipeline RoBertaForTokenClassification from kanansharmaa +author: John Snow Labs +name: xlm_roberta_base_pharmaconer_kanansharmaa_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_pharmaconer_kanansharmaa_pipeline` is a English model originally trained by kanansharmaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pharmaconer_kanansharmaa_pipeline_en_5.5.0_3.0_1727139705988.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pharmaconer_kanansharmaa_pipeline_en_5.5.0_3.0_1727139705988.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark NLP session bound to `spark` (e.g. spark = sparknlp.start()).
df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")

pipeline = PretrainedPipeline("xlm_roberta_base_pharmaconer_kanansharmaa_pipeline", lang = "en")
annotations = pipeline.transform(df)

```
```scala

import spark.implicits._
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val df = Seq("I love spark-nlp").toDF("text")

val pipeline = new PretrainedPipeline("xlm_roberta_base_pharmaconer_kanansharmaa_pipeline", lang = "en")
val annotations = pipeline.transform(df)

```

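
For entity-level post-processing it can help to keep character offsets. The sketch below assumes the `pipeline` object loaded above and uses `fullAnnotate()`, which returns `Annotation` objects; the `ner` key is an assumption based on this pipeline's included token classifier, so inspect the returned keys first if it differs.

```python
# fullAnnotate() preserves begin/end offsets and metadata for every annotation.
result = pipeline.fullAnnotate("El paciente recibió 500 mg de paracetamol")[0]

for annotation in result["ner"]:  # "ner" is an assumed output column name
    print(annotation.begin, annotation.end, annotation.result)
```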
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_pharmaconer_kanansharmaa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|829.0 MB| + +## References + +https://huggingface.co/kanansharmaa/xlm-roberta-base-pharmaconer + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_en.md new file mode 100644 index 00000000000000..01d15a0134ae8e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_spanish_10000_xnli_spanish XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_spanish_10000_xnli_spanish +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_spanish_10000_xnli_spanish` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_en_5.5.0_3.0_1727170235468.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_en_5.5.0_3.0_1727170235468.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python

# Assumes an active Spark NLP session bound to `spark` (e.g. spark = sparknlp.start()).
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, XlmRoBertaForSequenceClassification

documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_spanish_10000_xnli_spanish", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_spanish_10000_xnli_spanish", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
val data = Seq("I love spark-nlp").toDS.toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```

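
Once fitted, the pipeline is a regular Spark ML `PipelineModel`, so it can be persisted and reloaded with the standard Spark API instead of being re-downloaded on every run; the path below is only an example.

```python
from pyspark.ml import PipelineModel

# Persist the fitted pipeline (example path) and load it back for reuse.
pipelineModel.write().overwrite().save("/tmp/xlmr_trimmed_es_xnli_pipeline")

restored = PipelineModel.load("/tmp/xlmr_trimmed_es_xnli_pipeline")
restored.transform(data).select("class.result").show(truncate=False)
```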
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_spanish_10000_xnli_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|353.7 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-es-10000-xnli-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_pipeline_en.md new file mode 100644 index 00000000000000..6272e45e6ad4e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_pipeline_en_5.5.0_3.0_1727170253798.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_pipeline_en_5.5.0_3.0_1727170253798.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# Assumes an active SparkSession `spark`; the input column is "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Assumes an active SparkSession `spark`; the input column is "text"
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
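+
+For ad-hoc checks on a single string, the same pretrained pipeline can be used without building a DataFrame; a minimal sketch using `annotate`, whose result keys follow the output columns of the included annotators.
+
+```python
+# Annotate one string in memory; returns a dict of output column name -> annotations
+result = pipeline.annotate("I love spark-nlp")
+print(result)
+```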
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_spanish_10000_xnli_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|353.7 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-es-10000-xnli-es + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_wnli_10_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_wnli_10_en.md new file mode 100644 index 00000000000000..0244a50500d924 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_wnli_10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_wnli_10 XlmRoBertaForSequenceClassification from tmnam20 +author: John Snow Labs +name: xlm_roberta_base_wnli_10 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_wnli_10` is a English model originally trained by tmnam20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_wnli_10_en_5.5.0_3.0_1727152300856.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_wnli_10_en_5.5.0_3.0_1727152300856.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assumes an active SparkSession `spark` (e.g. spark = sparknlp.start())
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, XlmRoBertaForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# Input columns must match the assembler and tokenizer outputs above
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_wnli_10","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// Assumes an active SparkSession `spark` (e.g. val spark = SparkNLP.start())
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+// Input columns must match the assembler and tokenizer outputs above
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_wnli_10", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_wnli_10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|772.2 MB| + +## References + +https://huggingface.co/tmnam20/xlm-roberta-base-wnli-10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_wnli_10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_wnli_10_pipeline_en.md new file mode 100644 index 00000000000000..97d74cb1533d94 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_wnli_10_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_wnli_10_pipeline pipeline XlmRoBertaForSequenceClassification from tmnam20 +author: John Snow Labs +name: xlm_roberta_base_wnli_10_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_wnli_10_pipeline` is a English model originally trained by tmnam20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_wnli_10_pipeline_en_5.5.0_3.0_1727152443513.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_wnli_10_pipeline_en_5.5.0_3.0_1727152443513.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# Assumes an active SparkSession `spark`; the input column is "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_base_wnli_10_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Assumes an active SparkSession `spark`; the input column is "text"
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_base_wnli_10_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_wnli_10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|772.3 MB| + +## References + +https://huggingface.co/tmnam20/xlm-roberta-base-wnli-10 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_emojis_iid_fed_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_emojis_iid_fed_en.md new file mode 100644 index 00000000000000..39048f2e2e5270 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_emojis_iid_fed_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_finetuned_emojis_iid_fed XlmRoBertaForSequenceClassification from Karim-Gamal +author: John Snow Labs +name: xlm_roberta_finetuned_emojis_iid_fed +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_finetuned_emojis_iid_fed` is a English model originally trained by Karim-Gamal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_iid_fed_en_5.5.0_3.0_1727152208462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_iid_fed_en_5.5.0_3.0_1727152208462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assumes an active SparkSession `spark` (e.g. spark = sparknlp.start())
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, XlmRoBertaForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# Input columns must match the assembler and tokenizer outputs above
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_finetuned_emojis_iid_fed","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// Assumes an active SparkSession `spark` (e.g. val spark = SparkNLP.start())
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+// Input columns must match the assembler and tokenizer outputs above
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_finetuned_emojis_iid_fed", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_finetuned_emojis_iid_fed| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Karim-Gamal/XLM-Roberta-finetuned-emojis-IID-Fed \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_emojis_iid_fed_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_emojis_iid_fed_pipeline_en.md new file mode 100644 index 00000000000000..14ed8ed53537fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_emojis_iid_fed_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_finetuned_emojis_iid_fed_pipeline pipeline XlmRoBertaForSequenceClassification from Karim-Gamal +author: John Snow Labs +name: xlm_roberta_finetuned_emojis_iid_fed_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_finetuned_emojis_iid_fed_pipeline` is a English model originally trained by Karim-Gamal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_iid_fed_pipeline_en_5.5.0_3.0_1727152260950.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_iid_fed_pipeline_en_5.5.0_3.0_1727152260950.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# Assumes an active SparkSession `spark`; the input column is "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_finetuned_emojis_iid_fed_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Assumes an active SparkSession `spark`; the input column is "text"
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_finetuned_emojis_iid_fed_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_finetuned_emojis_iid_fed_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Karim-Gamal/XLM-Roberta-finetuned-emojis-IID-Fed + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_semeval_2018_emojis_cen_1_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_semeval_2018_emojis_cen_1_en.md new file mode 100644 index 00000000000000..be956ab5531532 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_semeval_2018_emojis_cen_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_finetuned_semeval_2018_emojis_cen_1 XlmRoBertaForSequenceClassification from Karim-Gamal +author: John Snow Labs +name: xlm_roberta_finetuned_semeval_2018_emojis_cen_1 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_finetuned_semeval_2018_emojis_cen_1` is a English model originally trained by Karim-Gamal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_semeval_2018_emojis_cen_1_en_5.5.0_3.0_1727170435788.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_semeval_2018_emojis_cen_1_en_5.5.0_3.0_1727170435788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assumes an active SparkSession `spark` (e.g. spark = sparknlp.start())
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, XlmRoBertaForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# Input columns must match the assembler and tokenizer outputs above
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_finetuned_semeval_2018_emojis_cen_1","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// Assumes an active SparkSession `spark` (e.g. val spark = SparkNLP.start())
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+// Input columns must match the assembler and tokenizer outputs above
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_finetuned_semeval_2018_emojis_cen_1", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_finetuned_semeval_2018_emojis_cen_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Karim-Gamal/XLM-Roberta-finetuned-SemEval-2018-emojis-cen-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_semeval_2018_emojis_cen_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_semeval_2018_emojis_cen_1_pipeline_en.md new file mode 100644 index 00000000000000..c5ee7b90e08a20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_finetuned_semeval_2018_emojis_cen_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_finetuned_semeval_2018_emojis_cen_1_pipeline pipeline XlmRoBertaForSequenceClassification from Karim-Gamal +author: John Snow Labs +name: xlm_roberta_finetuned_semeval_2018_emojis_cen_1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_finetuned_semeval_2018_emojis_cen_1_pipeline` is a English model originally trained by Karim-Gamal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_semeval_2018_emojis_cen_1_pipeline_en_5.5.0_3.0_1727170486782.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_semeval_2018_emojis_cen_1_pipeline_en_5.5.0_3.0_1727170486782.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# Assumes an active SparkSession `spark`; the input column is "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlm_roberta_finetuned_semeval_2018_emojis_cen_1_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Assumes an active SparkSession `spark`; the input column is "text"
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlm_roberta_finetuned_semeval_2018_emojis_cen_1_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_finetuned_semeval_2018_emojis_cen_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Karim-Gamal/XLM-Roberta-finetuned-SemEval-2018-emojis-cen-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_longformer_base_4096_repnum_wl_rua_wl_3_classes_fr.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_longformer_base_4096_repnum_wl_rua_wl_3_classes_fr.md new file mode 100644 index 00000000000000..2f92903242f5d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_longformer_base_4096_repnum_wl_rua_wl_3_classes_fr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: French xlm_roberta_longformer_base_4096_repnum_wl_rua_wl_3_classes XlmRoBertaForSequenceClassification from waboucay +author: John Snow Labs +name: xlm_roberta_longformer_base_4096_repnum_wl_rua_wl_3_classes +date: 2024-09-24 +tags: [fr, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_longformer_base_4096_repnum_wl_rua_wl_3_classes` is a French model originally trained by waboucay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_longformer_base_4096_repnum_wl_rua_wl_3_classes_fr_5.5.0_3.0_1727170484761.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_longformer_base_4096_repnum_wl_rua_wl_3_classes_fr_5.5.0_3.0_1727170484761.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assumes an active SparkSession `spark` (e.g. spark = sparknlp.start())
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, XlmRoBertaForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# Input columns must match the assembler and tokenizer outputs above
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_longformer_base_4096_repnum_wl_rua_wl_3_classes","fr") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// Assumes an active SparkSession `spark` (e.g. val spark = SparkNLP.start())
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+// Input columns must match the assembler and tokenizer outputs above
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_longformer_base_4096_repnum_wl_rua_wl_3_classes", "fr")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
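+
+Because this model is French, French input is the intended use; a minimal sketch with a hypothetical French sentence, reusing the fitted `pipelineModel` from the example above.
+
+```python
+# Hypothetical French example input; replace with real French text
+data_fr = spark.createDataFrame([["J'adore les modèles multilingues."]]).toDF("text")
+pred_fr = pipelineModel.transform(data_fr)
+pred_fr.selectExpr("text", "class.result as prediction").show(truncate=False)
+```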
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_longformer_base_4096_repnum_wl_rua_wl_3_classes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|fr| +|Size:|1.1 GB| + +## References + +https://huggingface.co/waboucay/xlm-roberta-longformer-base-4096-repnum_wl-rua_wl_3_classes \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_v_base_trimmed_german_tweet_sentiment_german_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_v_base_trimmed_german_tweet_sentiment_german_en.md new file mode 100644 index 00000000000000..e76e97197ba8d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_v_base_trimmed_german_tweet_sentiment_german_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_v_base_trimmed_german_tweet_sentiment_german XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_v_base_trimmed_german_tweet_sentiment_german +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_v_base_trimmed_german_tweet_sentiment_german` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_v_base_trimmed_german_tweet_sentiment_german_en_5.5.0_3.0_1727152544394.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_v_base_trimmed_german_tweet_sentiment_german_en_5.5.0_3.0_1727152544394.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assumes an active SparkSession `spark` (e.g. spark = sparknlp.start())
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, XlmRoBertaForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# Input columns must match the assembler and tokenizer outputs above
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_v_base_trimmed_german_tweet_sentiment_german","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// Assumes an active SparkSession `spark` (e.g. val spark = SparkNLP.start())
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+// Input columns must match the assembler and tokenizer outputs above
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_v_base_trimmed_german_tweet_sentiment_german", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_v_base_trimmed_german_tweet_sentiment_german| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|750.9 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-v-base-trimmed-de-tweet-sentiment-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlmr_english_german_all_shuffled_1986_test1000_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlmr_english_german_all_shuffled_1986_test1000_en.md new file mode 100644 index 00000000000000..c0f52830f23431 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlmr_english_german_all_shuffled_1986_test1000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_english_german_all_shuffled_1986_test1000 XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_english_german_all_shuffled_1986_test1000 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_english_german_all_shuffled_1986_test1000` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_english_german_all_shuffled_1986_test1000_en_5.5.0_3.0_1727156076571.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_english_german_all_shuffled_1986_test1000_en_5.5.0_3.0_1727156076571.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assumes an active SparkSession `spark` (e.g. spark = sparknlp.start())
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, XlmRoBertaForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# Input columns must match the assembler and tokenizer outputs above
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_english_german_all_shuffled_1986_test1000","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// Assumes an active SparkSession `spark` (e.g. val spark = SparkNLP.start())
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+// Input columns must match the assembler and tokenizer outputs above
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_english_german_all_shuffled_1986_test1000", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_english_german_all_shuffled_1986_test1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|826.3 MB| + +## References + +https://huggingface.co/patpizio/xlmr-en-de-all_shuffled-1986-test1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlmr_english_german_all_shuffled_1986_test1000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlmr_english_german_all_shuffled_1986_test1000_pipeline_en.md new file mode 100644 index 00000000000000..6d957715ba62cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlmr_english_german_all_shuffled_1986_test1000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_english_german_all_shuffled_1986_test1000_pipeline pipeline XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_english_german_all_shuffled_1986_test1000_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_english_german_all_shuffled_1986_test1000_pipeline` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_english_german_all_shuffled_1986_test1000_pipeline_en_5.5.0_3.0_1727156194567.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_english_german_all_shuffled_1986_test1000_pipeline_en_5.5.0_3.0_1727156194567.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# Assumes an active SparkSession `spark`; the input column is "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlmr_english_german_all_shuffled_1986_test1000_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Assumes an active SparkSession `spark`; the input column is "text"
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlmr_english_german_all_shuffled_1986_test1000_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_english_german_all_shuffled_1986_test1000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.3 MB| + +## References + +https://huggingface.co/patpizio/xlmr-en-de-all_shuffled-1986-test1000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlmr_estonian_english_all_shuffled_1986_test1000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlmr_estonian_english_all_shuffled_1986_test1000_pipeline_en.md new file mode 100644 index 00000000000000..12d100bbfa19dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlmr_estonian_english_all_shuffled_1986_test1000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_estonian_english_all_shuffled_1986_test1000_pipeline pipeline XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_estonian_english_all_shuffled_1986_test1000_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_estonian_english_all_shuffled_1986_test1000_pipeline` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_estonian_english_all_shuffled_1986_test1000_pipeline_en_5.5.0_3.0_1727155957648.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_estonian_english_all_shuffled_1986_test1000_pipeline_en_5.5.0_3.0_1727155957648.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# Assumes an active SparkSession `spark`; the input column is "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlmr_estonian_english_all_shuffled_1986_test1000_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Assumes an active SparkSession `spark`; the input column is "text"
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlmr_estonian_english_all_shuffled_1986_test1000_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_estonian_english_all_shuffled_1986_test1000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|820.5 MB| + +## References + +https://huggingface.co/patpizio/xlmr-et-en-all_shuffled-1986-test1000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlmr_nepali_english_all_shuffled_1985_test1000_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlmr_nepali_english_all_shuffled_1985_test1000_en.md new file mode 100644 index 00000000000000..d84d98b95a687d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlmr_nepali_english_all_shuffled_1985_test1000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_nepali_english_all_shuffled_1985_test1000 XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_nepali_english_all_shuffled_1985_test1000 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_nepali_english_all_shuffled_1985_test1000` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_nepali_english_all_shuffled_1985_test1000_en_5.5.0_3.0_1727156451263.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_nepali_english_all_shuffled_1985_test1000_en_5.5.0_3.0_1727156451263.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+# Assumes an active SparkSession `spark` (e.g. spark = sparknlp.start())
+from sparknlp.base import DocumentAssembler
+from sparknlp.annotator import Tokenizer, XlmRoBertaForSequenceClassification
+from pyspark.ml import Pipeline
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+# Input columns must match the assembler and tokenizer outputs above
+sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_nepali_english_all_shuffled_1985_test1000","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("class")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+// Assumes an active SparkSession `spark` (e.g. val spark = SparkNLP.start())
+import com.johnsnowlabs.nlp.base._
+import com.johnsnowlabs.nlp.annotator._
+import org.apache.spark.ml.Pipeline
+import spark.implicits._
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("document"))
+    .setOutputCol("token")
+
+// Input columns must match the assembler and tokenizer outputs above
+val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_nepali_english_all_shuffled_1985_test1000", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("class")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+</div>
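+
+Besides the label itself, each `class` annotation typically carries per-label scores in its metadata; a minimal sketch for inspecting them, assuming the column names from the example above.
+
+```python
+# Explode the annotations to see the chosen label alongside its metadata scores
+pipelineDF.selectExpr("explode(class) as ann") \
+    .selectExpr("ann.result as prediction", "ann.metadata as scores") \
+    .show(truncate=False)
+```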
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_nepali_english_all_shuffled_1985_test1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|817.8 MB| + +## References + +https://huggingface.co/patpizio/xlmr-ne-en-all_shuffled-1985-test1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlmr_nepali_english_all_shuffled_1985_test1000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlmr_nepali_english_all_shuffled_1985_test1000_pipeline_en.md new file mode 100644 index 00000000000000..3aa36292c84891 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlmr_nepali_english_all_shuffled_1985_test1000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_nepali_english_all_shuffled_1985_test1000_pipeline pipeline XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_nepali_english_all_shuffled_1985_test1000_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_nepali_english_all_shuffled_1985_test1000_pipeline` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_nepali_english_all_shuffled_1985_test1000_pipeline_en_5.5.0_3.0_1727156573161.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_nepali_english_all_shuffled_1985_test1000_pipeline_en_5.5.0_3.0_1727156573161.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# Assumes an active SparkSession `spark`; the input column is "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlmr_nepali_english_all_shuffled_1985_test1000_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Assumes an active SparkSession `spark`; the input column is "text"
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlmr_nepali_english_all_shuffled_1985_test1000_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_nepali_english_all_shuffled_1985_test1000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|817.8 MB| + +## References + +https://huggingface.co/patpizio/xlmr-ne-en-all_shuffled-1985-test1000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlmr_semantic_textual_relatedness_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlmr_semantic_textual_relatedness_pipeline_en.md new file mode 100644 index 00000000000000..eebcf52cc4dff6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlmr_semantic_textual_relatedness_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_semantic_textual_relatedness_pipeline pipeline XlmRoBertaForSequenceClassification from kietnt0603 +author: John Snow Labs +name: xlmr_semantic_textual_relatedness_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_semantic_textual_relatedness_pipeline` is a English model originally trained by kietnt0603. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_semantic_textual_relatedness_pipeline_en_5.5.0_3.0_1727156263340.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_semantic_textual_relatedness_pipeline_en_5.5.0_3.0_1727156263340.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+from sparknlp.pretrained import PretrainedPipeline
+
+# Assumes an active SparkSession `spark`; the input column is "text"
+df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipeline = PretrainedPipeline("xlmr_semantic_textual_relatedness_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+import spark.implicits._
+
+// Assumes an active SparkSession `spark`; the input column is "text"
+val df = Seq("I love spark-nlp").toDF("text")
+val pipeline = new PretrainedPipeline("xlmr_semantic_textual_relatedness_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_semantic_textual_relatedness_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/kietnt0603/xlmr-semantic-textual-relatedness + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file