Description
Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. clause_classification is an English model originally trained by Preetiha.
Predicted Entities
Cooperation, No Conflicts, Payments, Confidentiality, Submission To Jurisdiction, Jurisdictions, Authority, Disability, Costs, Insurances, Consent To Jurisdiction, Tax Withholdings, Base Salary, Benefits, Construction, Solvency, Interpretations, Miscellaneous, Definitions, Qualifications, Liens, Erisa, Waiver Of Jury Trials, Financial Statements, Defined Terms, Integration, Modifications, Assignments, Existence, Arbitration, Successors, Applicable Laws, Venues, Specific Performance, Further Assurances, Amendments, Headings, Assigns, Non-Disparagement, Powers, Duties, Authorizations, Taxes, Counterparts, Terminations, Disclosures, Agreements, Notices, Books, Positions, Titles, Binding Effects, Change In Control, Closings, Capitalization, Entire Agreements, Representations, Compliance With Laws, Death, Anti-Corruption Laws, Litigations, Withholdings, Effective Dates, Adjustments, Approvals, Subsidiaries, General, Brokers, Severability, Remedies, Indemnifications, Indemnity, Forfeitures, Sanctions, Survival, Publicity, Vacations, Expenses, Fees, Waivers, Intellectual Property, Terms, Employment, Consents, Use Of Proceeds, Records, Governing Laws, Effectiveness, Transactions With Affiliates, Releases, Vesting, Interests, Organizations, Enforcements, Warranties, Participations, Sales, No Waivers, No Defaults, Enforceability
How to use
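# Python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification

# Assumes an active Spark NLP session named `spark` (e.g. spark = sparknlp.start())
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")
tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")
# Downloads the pretrained clause classifier on first use
sequenceClassifier_loaded = DistilBertForSequenceClassification.pretrained("distilbert_sequence_classifier_clause_classification", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")
pipeline = Pipeline(stages=[documentAssembler, tokenizer, sequenceClassifier_loaded])
data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
result = pipeline.fit(data).transform(data)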
// Scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for .toDF on a Seq; assumes a SparkSession named `spark`

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")
// Downloads the pretrained clause classifier on first use
val sequenceClassifier_loaded = DistilBertForSequenceClassification.pretrained("distilbert_sequence_classifier_clause_classification", "en")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")
val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier_loaded))
val data = Seq("PUT YOUR STRING HERE").toDF("text")
val result = pipeline.fit(data).transform(data)
# Python (NLU)
import nlu
nlu.load("en.classify.distil_bert.by_preetiha").predict("""PUT YOUR STRING HERE""")
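Once the pipeline has run, the predicted clause type for each input is stored under the classifier's output column ("class" in the snippets above). The following is a minimal sketch of tallying those labels across several clauses. It assumes the predictions have already been collected into plain dictionaries mapping each output column to a list of strings, which is the shape a Spark NLP LightPipeline's annotate call is understood to return. The helper names and the sample data are hypothetical and the labels shown are illustrative, not real model output.

```python
from collections import Counter
from typing import Dict, Iterable, List


def predicted_labels(annotated: Dict[str, List[str]], output_col: str = "class") -> List[str]:
    """Return the labels stored under the classifier's output column (empty if absent)."""
    return list(annotated.get(output_col, []))


def label_counts(annotated_docs: Iterable[Dict[str, List[str]]], output_col: str = "class") -> Counter:
    """Count how often each clause label was predicted across all documents."""
    counts: Counter = Counter()
    for doc in annotated_docs:
        counts.update(predicted_labels(doc, output_col))
    return counts


# Hypothetical stand-in for annotated pipeline output; labels are illustrative only.
sample_output = [
    {"document": ["clause one"], "token": ["clause", "one"], "class": ["Governing Laws"]},
    {"document": ["clause two"], "token": ["clause", "two"], "class": ["Notices"]},
    {"document": ["clause three"], "token": ["clause", "three"], "class": ["Governing Laws"]},
]

counts = label_counts(sample_output)
print(counts.most_common())
```

Keeping the column name as a parameter means the same helpers still work if the classifier's output column is renamed.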
Model Information
Model Name: | distilbert_sequence_classifier_clause_classification |
Compatibility: | Spark NLP 4.1.0+ |
License: | Open Source |
Edition: | Official |
Input Labels: | [document, token] |
Output Labels: | [class] |
Language: | en |
Size: | 250.0 MB |
Case sensitive: | true |
Max sentence length: | 128 |
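Because the maximum sentence length is 128, clauses longer than that may not be seen in full by the model. One possible workaround, sketched below, is to split long text into overlapping windows and classify each window separately. This is an assumption about how one might prepare input, not part of the model card: the function name, the 100-word window, and the 20-word overlap are all hypothetical, and whitespace-separated words are only a rough proxy for the subword tokens the model actually counts.

```python
from typing import List


def split_into_windows(text: str, max_words: int = 100, overlap: int = 20) -> List[str]:
    """Split text into overlapping windows of at most `max_words` whitespace-separated words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")

    words = text.split()
    if not words:
        return []
    if len(words) <= max_words:
        return [" ".join(words)]

    step = max_words - overlap  # how far each new window advances
    windows: List[str] = []
    for start in range(0, len(words), step):
        windows.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
    return windows


# Illustrative input: 250 placeholder words standing in for a long clause.
long_clause = " ".join(f"w{i}" for i in range(250))
windows = split_into_windows(long_clause)
print(len(windows), [len(w.split()) for w in windows])
```

Each window can then be passed to the pipeline as its own row of the "text" column, and the per-window labels combined (for example by majority vote) if a single label per clause is needed.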
References
- https://huggingface.co/Preetiha/clause_classification