Description
This Named Entity Recognizer is based on a Conditional Random Fields (CRF) algorithm.
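To give a rough intuition for what a CRF tagger does at prediction time, the sketch below runs Viterbi decoding over a linear chain: each token gets a per-tag score, each pair of adjacent tags gets a transition score, and the decoder picks the tag sequence with the highest total. This is only an illustration. The function `viterbi` and every score in `EMISSIONS` and `TRANSITIONS` are made up for this example; they are not the weights or the internal implementation of `ner_crf`.

```python
from typing import Dict, List, Tuple


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            tags: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence for a linear-chain model."""
    # best[i][t] holds the best total score of any path ending in tag t at token i.
    best = [{t: emissions[0][t] for t in tags}]
    back = []  # back[i][t] is the previous tag on that best path
    for em in emissions[1:]:
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[(p, cur)])
            scores[cur] = best[-1][prev] + transitions[(prev, cur)] + em[cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final tag and follow the back-pointers.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


TAGS = ["O", "I-PER"]

# Hypothetical transition scores: staying inside a person name is rewarded.
TRANSITIONS = {
    ("O", "O"): 0.5, ("O", "I-PER"): -0.5,
    ("I-PER", "I-PER"): 1.0, ("I-PER", "O"): 0.0,
}

# Hypothetical per-token scores for "Donald Trump and".
EMISSIONS = [
    {"O": 0.0, "I-PER": 2.0},   # Donald
    {"O": 0.6, "I-PER": 0.5},   # Trump: slightly prefers O when scored alone
    {"O": 2.0, "I-PER": -1.0},  # and
]

decoded = viterbi(EMISSIONS, TRANSITIONS, TAGS)
print(decoded)
```

In this toy setup the second token would be tagged `O` if scored in isolation, but the transition score from the preceding `I-PER` tips it to `I-PER`. That dependence on neighbouring tags is what separates a CRF from a per-token classifier.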
How to use
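# Python: imports and Spark session needed to run the pipeline
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, PerceptronModel, WordEmbeddingsModel, NerCrfModel
from pyspark.ml import Pipeline
spark = sparknlp.start()
# Stages: raw text -> document -> tokens -> POS tags and word embeddings -> NER tags
documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")
posTagger = PerceptronModel.pretrained().setInputCols(["token", "document"]).setOutputCol("pos")
embeds = WordEmbeddingsModel.pretrained().setInputCols(["token", "document"]).setOutputCol("embeddings")
nerCrf = NerCrfModel.pretrained().setInputCols(["document", "token", "pos", "embeddings"]).setOutputCol("ner")
pipeline = Pipeline(stages=[documentAssembler, tokenizer, posTagger, embeds, nerCrf])
df = spark.createDataFrame([['Donald Trump and Angela Merkel dont share many oppinions']], ["text"])
result = pipeline.fit(df).transform(df)
# Show the predicted tags only, then the full annotations
result.select("ner.result").show(truncate=False)
result.select("ner").show(truncate=False)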
// Scala: imports needed to run the pipeline (assumes a SparkSession named `spark`, as in spark-shell)
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._
val documentAssembler = new DocumentAssembler().setInputCol("text").setOutputCol("document")
val tokenizer = new Tokenizer().setInputCols(Array("document")).setOutputCol("token")
val posTagger = PerceptronModel.pretrained().setInputCols(Array("token", "document")).setOutputCol("pos")
val embeds = WordEmbeddingsModel.pretrained().setInputCols(Array("token", "document")).setOutputCol("embeddings")
val nerCrf = NerCrfModel.pretrained().setInputCols(Array("document", "token", "pos", "embeddings")).setOutputCol("ner")
val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, posTagger, embeds, nerCrf))
val df = Seq("Donald Trump and Angela Merkel dont share many oppinions").toDF("text")
val result = pipeline.fit(df).transform(df)
// Show the predicted tags only, then the full annotations
result.select("ner.result").show(false)
result.select("ner").show(false)
import nlu
nlu.load('ner.crf').predict("Donald Trump and Angela Merkel dont share many oppinions")
Results
+-------------------------------------------+
|result                                     |
+-------------------------------------------+
|[I-PER, I-PER, O, I-PER, I-PER, O, O, O, O]|
+-------------------------------------------+
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|ner |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[[named_entity, 0, 5, I-PER, [word -> Donald], []], [named_entity, 7, 11, I-PER, [word -> Trump], []], [named_entity, 13, 15, O, [word -> and], []], [named_entity, 17, 22, I-PER, [word -> Angela], []], [named_entity, 24, 29, I-PER, [word -> Merkel], []], [named_entity, 31, 34, O, [word -> dont], []], [named_entity, 36, 40, O, [word -> share], []], [named_entity, 42, 45, O, [word -> many], []], [named_entity, 47, 55, O, [word -> oppinions], []]]|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
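The output above has one tag per token, so a downstream step usually merges consecutive tagged tokens into whole entity mentions. The sketch below shows one way to do that in plain Python, using the tokens and tags from the result shown here. The helper `group_entities` is a hypothetical name introduced for this example and is not part of Spark NLP. It assumes that adjacent tokens with the same label belong to one entity unless a tag starts with `B-`.

```python
from typing import List, Tuple


def group_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge consecutive tokens that share an entity label into (text, label) pairs."""
    entities = []
    words, label = [], None  # tokens and label of the entity being built
    for token, tag in zip(tokens, tags):
        prefix, _, tag_label = tag.partition("-")
        new_label = None if tag == "O" else tag_label
        # A token extends the open entity only if the label matches and it is not a B- tag.
        if new_label is not None and new_label == label and prefix != "B":
            words.append(token)
            continue
        # Otherwise close the open entity, if any, and possibly start a new one.
        if words:
            entities.append((" ".join(words), label))
        words, label = ([token], new_label) if new_label is not None else ([], None)
    if words:
        entities.append((" ".join(words), label))
    return entities


# Tokens and tags copied from the result shown above.
tokens = ["Donald", "Trump", "and", "Angela", "Merkel", "dont", "share", "many", "oppinions"]
tags = ["I-PER", "I-PER", "O", "I-PER", "I-PER", "O", "O", "O", "O"]

entities = group_entities(tokens, tags)
print(entities)  # [('Donald Trump', 'PER'), ('Angela Merkel', 'PER')]
```

Joining tokens with a single space is a simplification. When the original spacing matters, the begin and end offsets in the full `ner` annotations can be used to slice the input text instead.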
Model Information
| Model Name:    | ner_crf                            |
| Compatibility: | Spark NLP 3.0.0+                   |
| License:       | Open Source                        |
| Edition:       | Official                           |
| Input Labels:  | [sentence, token, pos, embeddings] |
| Output Labels: | [ner]                              |
| Language:      | en                                 |