package dl
Type Members
- class LanguageDetectorDL extends AnnotatorModel[LanguageDetectorDL] with HasSimpleAnnotate[LanguageDetectorDL] with WriteTensorflowModel with HasEngine
Language Identification and Detection by using CNN and RNN architectures in TensorFlow.
LanguageDetectorDL is an annotator that detects the language of documents or sentences, depending on the inputCols. The models are trained on large datasets such as Wikipedia and Tatoeba. Depending on the language (how similar the characters are), LanguageDetectorDL works best with text longer than 140 characters. The output is a language code in Wiki Code style.

Pretrained models can be loaded with pretrained of the companion object:

val languageDetector = LanguageDetectorDL.pretrained()
  .setInputCols("sentence")
  .setOutputCol("language")

The default model is "ld_wiki_tatoeba_cnn_21" and the default language is "xx" (meaning multi-lingual) if no values are provided. For available pretrained models, please see the Models Hub.

For extended examples of usage, see the Examples and the LanguageDetectorDLTestSpec.
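As a minimal sketch, the defaults described above can be written out explicitly. This assumes the two-argument pretrained(name, lang) overload that the companion object inherits through HasPretrained, an active Spark session, and network access to download the model. With "sentence" as the input column, a sentence detector stage is assumed to run earlier in the pipeline.

```scala
import com.johnsnowlabs.nlp.annotators.ld.dl.LanguageDetectorDL

// Intended to match LanguageDetectorDL.pretrained(): the model name and
// language below are the documented defaults, written out explicitly.
val languageDetector = LanguageDetectorDL
  .pretrained("ld_wiki_tatoeba_cnn_21", "xx")
  .setInputCols("sentence")   // assumes an upstream stage produces "sentence"
  .setOutputCol("language")
```

Naming the model explicitly keeps a pipeline pinned to one model even if the library's default changes in a later release.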
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.ld.dl.LanguageDetectorDL
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val languageDetector = LanguageDetectorDL.pretrained()
  .setInputCols("document")
  .setOutputCol("language")

val pipeline = new Pipeline()
  .setStages(Array(
    documentAssembler,
    languageDetector
  ))

val data = Seq(
  "Spark NLP is an open-source text processing library for advanced natural language processing for the Python, Java and Scala programming languages.",
  "Spark NLP est une bibliothèque de traitement de texte open source pour le traitement avancé du langage naturel pour les langages de programmation Python, Java et Scala.",
  "Spark NLP ist eine Open-Source-Textverarbeitungsbibliothek für fortgeschrittene natürliche Sprachverarbeitung für die Programmiersprachen Python, Java und Scala."
).toDF("text")

val result = pipeline.fit(data).transform(data)

result.select("language.result").show(false)
+------+
|result|
+------+
|[en]  |
|[fr]  |
|[de]  |
+------+
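Because detection works best with text longer than 140 characters, it can help to separate short inputs before running the pipeline. The sketch below is plain Scala and not part of Spark NLP: LengthCheck, MinRecommendedLength, and splitByLength are hypothetical names introduced only for illustration.

```scala
// Hypothetical helper, not part of Spark NLP: splits inputs by the
// 140-character guideline so short texts can be reviewed separately.
object LengthCheck {
  val MinRecommendedLength = 140

  // Returns (texts longer than the guideline, remaining shorter texts),
  // preserving the input order within each group.
  def splitByLength(texts: Seq[String]): (Seq[String], Seq[String]) =
    texts.partition(_.trim.length > MinRecommendedLength)
}

val (longEnough, tooShort) = LengthCheck.splitByLength(Seq(
  "Hello world.",
  "Spark NLP is an open-source text processing library for advanced natural language processing for the Python, Java and Scala programming languages."
))
```

Short texts can still be passed to the annotator; the split only marks where predictions may deserve a second look.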
- trait ReadLanguageDetectorDLTensorflowModel extends ReadTensorflowModel
- trait ReadablePretrainedLanguageDetectorDLModel extends ParamsAndFeaturesReadable[LanguageDetectorDL] with HasPretrained[LanguageDetectorDL]
Value Members
- object LanguageDetectorDL extends ReadablePretrainedLanguageDetectorDLModel with ReadLanguageDetectorDLTensorflowModel with Serializable
This is the companion object of LanguageDetectorDL. Please refer to that class for the documentation.