Legal BERT Base Uncased Embedding

Description

LEGAL-BERT is a family of BERT models for the legal domain, intended to assist legal NLP research, computational law, and legal technology applications. To pre-train the different variations of LEGAL-BERT, we collected 12 GB of diverse English legal text from several fields (e.g., legislation, court cases, contracts) scraped from publicly available resources. Sub-domain variants (CONTRACTS-, EURLEX-, ECHR-) and/or the general LEGAL-BERT perform better on domain-specific tasks than BERT used out of the box. A lightweight model (33% of the size of BERT-BASE), pre-trained from scratch on legal data and offering competitive performance, is also available.

How to use

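from sparknlp.annotator import BertEmbeddings
from pyspark.ml import Pipeline

# document_assembler, sentence_detector, and tokenizer are upstream stages defined elsewhere.
embeddings = BertEmbeddings.pretrained("bert_base_uncased_legal", "en") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings])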
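
The Python snippet above refers to document_assembler, sentence_detector, and tokenizer without defining them. The following is a minimal sketch of one way to define those stages and run the pipeline end to end. The input column name "text" and the helper names build_pipeline and embed_texts are assumptions made for illustration and are not part of the model card.

```python
MODEL_NAME = "bert_base_uncased_legal"  # model name taken from this card
LANGUAGE = "en"


def build_pipeline():
    """Assemble document -> sentence -> token -> embeddings stages."""
    # Imports are local so this file can be loaded without Spark NLP installed.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import SentenceDetector, Tokenizer, BertEmbeddings

    # Wrap the raw "text" column (assumed name) into a document annotation.
    document_assembler = DocumentAssembler() \
        .setInputCol("text") \
        .setOutputCol("document")

    # Split each document into sentences.
    sentence_detector = SentenceDetector() \
        .setInputCols(["document"]) \
        .setOutputCol("sentence")

    # Split each sentence into tokens.
    tokenizer = Tokenizer() \
        .setInputCols(["sentence"]) \
        .setOutputCol("token")

    # Load the pretrained legal embeddings, as in the snippet above.
    embeddings = BertEmbeddings.pretrained(MODEL_NAME, LANGUAGE) \
        .setInputCols(["sentence", "token"]) \
        .setOutputCol("embeddings")

    return Pipeline(stages=[document_assembler, sentence_detector,
                            tokenizer, embeddings])


def embed_texts(spark, texts):
    """Return a DataFrame with one row per token and its embedding vector."""
    data = spark.createDataFrame([[t] for t in texts]).toDF("text")
    result = build_pipeline().fit(data).transform(data)
    # Each annotation in "embeddings" carries the token text in `result`
    # and the vector in `embeddings`.
    return result.selectExpr("explode(embeddings) AS e") \
        .selectExpr("e.result AS token", "e.embeddings AS vector")
```

With a session obtained from sparknlp.start(), a call such as embed_texts(spark, ["The lessee shall pay the rent monthly."]).show(5) would list the first few tokens alongside their vectors. The first call downloads the pretrained model.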

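import com.johnsnowlabs.nlp.embeddings.BertEmbeddings
import org.apache.spark.ml.Pipeline

// document_assembler, sentence_detector, and tokenizer are upstream stages defined elsewhere.
val embeddings = BertEmbeddings.pretrained("bert_base_uncased_legal", "en")
  .setInputCols("sentence", "token")
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings))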

import nlu
nlu.load("en.embed.bert.base_uncased_legal").predict("""Put your text here.""")

Model Information

Model Name:	bert_base_uncased_legal
Compatibility:	Spark NLP 3.2.2+
License:	Open Source
Edition:	Official
Input Labels:	[sentence, token]
Output Labels:	[bert]
Language:	en
Case sensitive:	true

Data Source

The model is imported from: https://huggingface.co/nlpaueb/legal-bert-base-uncased
