sparknlp.annotator.ner.zero_shot_ner_model
#
Module Contents#
Classes#
ZeroShotNerModel: implements zero shot named entity recognition by utilizing RoBERTa transformer models fine-tuned on a question answering task
- class ZeroShotNerModel(classname='com.johnsnowlabs.nlp.annotators.ner.dl.ZeroShotNerModel', java_model=None)[source]#
ZeroShotNerModel implements zero shot named entity recognition by utilizing RoBERTa transformer models fine-tuned on a question answering task.
Its input is a list of document annotations, and it automatically generates questions which are used to recognize entities. The definitions of the entities are given by a dictionary structure, specifying a set of questions for each entity. The model is based on RoBertaForQuestionAnswering.
For more extended examples see the Examples.
Pretrained models can be loaded with the pretrained method of the companion object:

zeroShotNer = ZeroShotNerModel.pretrained() \
    .setInputCols(["document"]) \
    .setOutputCol("zero_shot_ner")
Input Annotation types: DOCUMENT, TOKEN
Output Annotation type: NAMED_ENTITY
- Parameters:
- entityDefinitions
A dictionary with definitions of named entities. The keys of dictionary are the entity labels and the values are lists of questions. For example:
{
    "CITY": ["Which city?", "Which town?"],
    "NAME": ["What is her name?", "What is his name?"]
}
- predictionThreshold
Minimal confidence score to encode an entity (default: 0.01).
- ignoreEntities
A list of entity labels which are discarded from the output.
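To make the interplay of these parameters concrete, here is a minimal plain-Python sketch of the filtering semantics. This is an illustration only, not the library's internal code; the function name `filter_candidates` is hypothetical.

```python
# Conceptual sketch (NOT Spark NLP internals): candidates below the
# prediction threshold or with an ignored label are dropped.

def filter_candidates(candidates, prediction_threshold=0.01, ignore_entities=()):
    """candidates: list of (label, word, confidence) tuples.
    Keeps those that clear the threshold and whose label is not ignored."""
    return [
        (label, word, conf)
        for (label, word, conf) in candidates
        if conf >= prediction_threshold and label not in ignore_entities
    ]

candidates = [
    ("NAME", "Clara", 0.936),
    ("CITY", "Paris", 0.533),
    ("CITY", "London", 0.005),  # below the 0.01 threshold, dropped
]

kept = filter_candidates(candidates,
                         prediction_threshold=0.01,
                         ignore_entities=["NAME"])
# only ("CITY", "Paris", 0.533) survives: "London" falls under the
# threshold and "NAME" entities are on the ignore list
```
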
References
RoBERTa: A Robustly Optimized BERT Pretraining Approach: for details about the RoBERTa transformer
RoBertaForQuestionAnswering: for the Spark NLP implementation of RoBERTa question answering
Examples
>>> document_assembler = DocumentAssembler() \
...     .setInputCol("text") \
...     .setOutputCol("document")
>>> sentence_detector = SentenceDetector() \
...     .setInputCols(["document"]) \
...     .setOutputCol("sentence")
>>> tokenizer = Tokenizer() \
...     .setInputCols(["sentence"]) \
...     .setOutputCol("token")
>>> zero_shot_ner = ZeroShotNerModel.pretrained() \
...     .setEntityDefinitions(
...         {
...             "NAME": ["What is his name?", "What is my name?", "What is her name?"],
...             "CITY": ["Which city?", "Which is the city?"]
...         }) \
...     .setInputCols(["sentence", "token"]) \
...     .setOutputCol("zero_shot_ner")
>>> data = spark.createDataFrame(
...     [["My name is Clara, I live in New York and Hellen lives in Paris."]]
... ).toDF("text")
>>> Pipeline() \
...     .setStages([document_assembler, sentence_detector, tokenizer, zero_shot_ner]) \
...     .fit(data) \
...     .transform(data) \
...     .selectExpr("document", "explode(zero_shot_ner) AS entity") \
...     .select(
...         "document.result",
...         "entity.result",
...         "entity.metadata.word",
...         "entity.metadata.confidence",
...         "entity.metadata.question") \
...     .show(truncate=False)
+-----------------------------------------------------------------+------+------+----------+------------------+
|result                                                           |result|word  |confidence|question          |
+-----------------------------------------------------------------+------+------+----------+------------------+
|[My name is Clara, I live in New York and Hellen lives in Paris.]|B-CITY|Paris |0.5328949 |Which is the city?|
|[My name is Clara, I live in New York and Hellen lives in Paris.]|B-NAME|Clara |0.9360068 |What is my name?  |
|[My name is Clara, I live in New York and Hellen lives in Paris.]|B-CITY|New   |0.83294415|Which city?       |
|[My name is Clara, I live in New York and Hellen lives in Paris.]|I-CITY|York  |0.83294415|Which city?       |
|[My name is Clara, I live in New York and Hellen lives in Paris.]|B-NAME|Hellen|0.45366877|What is her name? |
+-----------------------------------------------------------------+------+------+----------+------------------+
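The output above uses BIO tags: B- marks the first token of an entity and I- a continuation, so "New" (B-CITY) and "York" (I-CITY) form one span. A hypothetical helper (not part of Spark NLP) that merges such tagged tokens into full entity spans could look like this:

```python
# Hypothetical post-processing helper (not a Spark NLP API): merges
# BIO-tagged tokens, like those in the example output, into entity spans.

def merge_bio(tagged_tokens):
    """tagged_tokens: list of (tag, word) pairs, e.g. ("B-CITY", "New").
    Returns a list of (label, text) entity spans."""
    spans = []
    for tag, word in tagged_tokens:
        if tag.startswith("B-"):
            # a B- tag always starts a new span
            spans.append((tag[2:], word))
        elif tag.startswith("I-") and spans and spans[-1][0] == tag[2:]:
            # an I- tag with a matching label extends the previous span
            label, text = spans[-1]
            spans[-1] = (label, text + " " + word)
    return spans

tokens = [("B-NAME", "Clara"), ("B-CITY", "New"), ("I-CITY", "York"),
          ("B-NAME", "Hellen"), ("B-CITY", "Paris")]
print(merge_bio(tokens))
# [('NAME', 'Clara'), ('CITY', 'New York'), ('NAME', 'Hellen'), ('CITY', 'Paris')]
```
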
- setPredictionThreshold(threshold)[source]#
Sets the minimal confidence score to encode an entity.
- Parameters:
- threshold : float
minimal confidence score to encode an entity (default is 0.01)
- setEntityDefinitions(definitions)[source]#
Sets the entity definitions.
- Parameters:
- definitions : dict[str, list[str]]
A dictionary mapping entity labels to lists of questions, as described above.
- static pretrained(name='zero_shot_ner_roberta', lang='en', remote_loc=None)[source]#
Downloads and loads a pretrained model.
- Parameters:
- name : str, optional
Name of the pretrained model, by default "zero_shot_ner_roberta"
- lang : str, optional
Language of the pretrained model, by default "en"
- remote_loc : str, optional
Optional remote address of the resource, by default None. Will use Spark NLP's repositories otherwise.
- Returns:
- ZeroShotNerModel
The restored model