SpanBert Coreference Resolution

Description

A coreference resolution model identifies expressions that refer to the same entity in a text. For example, given the sentence “John told Mary he would like to borrow a book from her.”, the model links “he” to “John” and “her” to “Mary”. This model is based on SpanBert and is fine-tuned on the OntoNotes 5.0 dataset.

How to use

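# Python: start a Spark NLP session and import the pipeline stages
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer, SpanBertCorefModel
from pyspark.ml import Pipeline

spark = sparknlp.start()

data = spark.createDataFrame([["John told Mary he would like to borrow a book from her."]]).toDF("text")
document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
sentence_detector = SentenceDetector().setInputCols(["document"]).setOutputCol("sentences")
tokenizer = Tokenizer().setInputCols(["sentences"]).setOutputCol("tokens")
# pretrained() is called on the class and downloads the model on first use
corefResolution = SpanBertCorefModel.pretrained("spanbert_base_coref").setInputCols(["sentences", "tokens"]).setOutputCol("corefs")
pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, corefResolution])

model = pipeline.fit(data)

# One row per mention; the metadata points to the mention's antecedent
model.transform(data).selectExpr("explode(corefs) AS coref").selectExpr("coref.result as token", "coref.metadata").show(truncate=False)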

// Scala: assumes a SparkSession named `spark` is in scope (for example in spark-shell)
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.annotators.coref.SpanBertCorefModel
import org.apache.spark.ml.Pipeline

val data = Seq("John told Mary he would like to borrow a book from her.").toDF("text")
val document = new DocumentAssembler().setInputCol("text").setOutputCol("document")
val sentencer = new SentenceDetector().setInputCols(Array("document")).setOutputCol("sentences")
val tokenizer = new Tokenizer().setInputCols(Array("sentences")).setOutputCol("tokens")
val corefResolution = SpanBertCorefModel.pretrained("spanbert_base_coref").setInputCols(Array("sentences", "tokens")).setOutputCol("corefs")

val pipeline = new Pipeline().setStages(Array(document, sentencer, tokenizer, corefResolution))

val result = pipeline.fit(data).transform(data)

// One row per mention; the metadata points to the mention's antecedent
result.selectExpr("explode(corefs) as coref").selectExpr("coref.result as token", "coref.metadata").show(truncate = false)

import nlu
nlu.load("en.coreference.spanbert").predict("""John told Mary he would like to borrow a book from her.""")

Results

+-----+------------------------------------------------------------------------------------+
|token|metadata                                                                            |
+-----+------------------------------------------------------------------------------------+
|John |{head.sentence -> -1, head -> ROOT, head.begin -> -1, head.end -> -1, sentence -> 0}|
|he   |{head.sentence -> 0, head -> John, head.begin -> 0, head.end -> 3, sentence -> 0}   |
|Mary |{head.sentence -> -1, head -> ROOT, head.begin -> -1, head.end -> -1, sentence -> 0}|
|her  |{head.sentence -> 0, head -> Mary, head.begin -> 10, head.end -> 13, sentence -> 0} |
+-----+------------------------------------------------------------------------------------+
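
In the table above, a mention whose head is ROOT starts a coreference chain, and every other mention points back to its antecedent through the head fields. The following is a minimal sketch of how such rows could be grouped into clusters in plain Python. It assumes the rows have already been collected from the DataFrame; the `rows` list is copied by hand from the table, metadata values are kept as strings, and `group_mentions` is a hypothetical helper, not part of Spark NLP.

```python
text = "John told Mary he would like to borrow a book from her."

# (token, metadata) pairs copied by hand from the results table above.
rows = [
    ("John", {"head": "ROOT", "head.begin": "-1", "head.end": "-1"}),
    ("he",   {"head": "John", "head.begin": "0",  "head.end": "3"}),
    ("Mary", {"head": "ROOT", "head.begin": "-1", "head.end": "-1"}),
    ("her",  {"head": "Mary", "head.begin": "10", "head.end": "13"}),
]


def group_mentions(rows):
    """Hypothetical helper: map each antecedent to the mentions that refer to it."""
    clusters = {}
    for token, meta in rows:
        if meta["head"] == "ROOT":
            # A ROOT mention is itself the antecedent of a cluster.
            clusters.setdefault(token, [])
        else:
            clusters.setdefault(meta["head"], []).append(token)
    return clusters


clusters = group_mentions(rows)
print(clusters)  # {'John': ['he'], 'Mary': ['her']}

# In this output head.begin and head.end are character offsets into the
# input text, and the end offset is inclusive.
for token, meta in rows:
    if meta["head"] != "ROOT":
        begin, end = int(meta["head.begin"]), int(meta["head.end"])
        assert text[begin:end + 1] == meta["head"]
```

Keying clusters by the antecedent's surface text is a simplification that works for this one-sentence example. On longer texts, where two different entities can share the same wording, a key built from head.sentence, head.begin, and head.end would be a safer choice.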

Model Information

Model Name:	spanbert_base_coref
Compatibility:	Spark NLP 4.0.0+
License:	Open Source
Edition:	Official
Input Labels:	[sentences, tokens]
Output Labels:	[corefs]
Language:	en
Size:	566.3 MB
Case sensitive:	true

References

OntoNotes 5.0

Benchmarking

label  score
f1     77.7

https://github.com/mandarjoshi90/coref
