package com.johnsnowlabs.nlp.pretrained

Type Members

  1. case class PretrainedPipeline(downloadName: String, lang: String = "en", source: String = ResourceDownloader.publicLoc, parseEmbeddingsVectors: Boolean = false, diskLocation: Option[String] = None) extends Product with Serializable

    Represents a fully constructed and trained Spark NLP pipeline, ready to be used. This way, a whole pipeline can be defined in one line. Additionally, the LightPipeline version of the model can be retrieved with the member lightModel.

    For more extensive examples, see the Pipelines page and, for available pipeline models, our GitHub Model Repository.

    Example

    import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
    import com.johnsnowlabs.nlp.SparkNLP
    
    // Start (or reuse) a Spark session with Spark NLP loaded
    val spark = SparkNLP.start()
    
    val testData = spark.createDataFrame(Seq(
    (1, "Google has announced the release of a beta version of the popular TensorFlow machine learning library"),
    (2, "Donald John Trump (born June 14, 1946) is the 45th and current president of the United States")
    )).toDF("id", "text")
    
    val pipeline = PretrainedPipeline("explain_document_dl", lang="en")
    
    val annotation = pipeline.transform(testData)
    
    annotation.select("entities.result").show(false)
    
    /*
    +----------------------------------+
    |result                            |
    +----------------------------------+
    |[Google, TensorFlow]              |
    |[Donald John Trump, United States]|
    +----------------------------------+
    */

    downloadName

    Name of the pipeline model

    lang

    Language of the pipeline (default: "en")

    source

    Source from which to download the pipeline model (default: ResourceDownloader.publicLoc)

  2. case class RepositoryMetadata(metadataFile: String, repoFolder: String, version: String, lastMetadataDownloaded: Timestamp, metadata: List[ResourceMetadata]) extends Product with Serializable

    Describes the state of a repository. A repository can be any S3 folder containing a metadata.json file that describes the list of resources inside it.

  3. trait ResourceDownloader extends AnyRef
  4. case class ResourceMetadata(name: String, language: Option[String], libVersion: Option[Version], sparkVersion: Option[Version], readyToUse: Boolean, time: Timestamp, isZipped: Boolean = false, category: Option[String] = ..., checksum: String = "", annotator: Option[String] = None) extends Ordered[ResourceMetadata] with Product with Serializable
  5. case class ResourceRequest(name: String, language: Option[String] = None, folder: String = ResourceDownloader.publicLoc, libVersion: Version = ResourceDownloader.libVersion, sparkVersion: Version = ResourceDownloader.sparkVersion) extends Product with Serializable
  6. class S3ResourceDownloader extends ResourceDownloader
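
The PretrainedPipeline entry above notes that the LightPipeline version of a pipeline can be retrieved through the lightModel member. The following is a rough sketch of annotating a single string that way, assuming a spark-shell style Scala session with Spark NLP on the classpath and network access (or a local cache) for explain_document_dl. It only uses calls shown in this page plus LightPipeline.annotate, which returns a map from output column names to annotation results.

```scala
import com.johnsnowlabs.nlp.SparkNLP
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// Downloading a pretrained pipeline requires an active Spark session
val spark = SparkNLP.start()

// Download (or load from the local cache) the same pipeline used in the example above
val pipeline = PretrainedPipeline("explain_document_dl", lang = "en")

// lightModel exposes the fitted stages as a LightPipeline, which annotates
// plain Scala strings without building a DataFrame first
val annotations: Map[String, Seq[String]] = pipeline.lightModel.annotate(
  "Google has announced the release of a beta version of the popular TensorFlow machine learning library"
)

// Keys are the output column names of the pipeline's stages, e.g. "entities"
annotations.get("entities").foreach(println)
```

A LightPipeline runs on the driver and is meant for small amounts of data, such as single requests; for large datasets, calling transform on a DataFrame, as in the PretrainedPipeline example, remains the usual route.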

Value Members

  1. object PretrainedPipeline extends Serializable
  2. object PythonResourceDownloader
  3. object ResourceDownloader
  4. object ResourceMetadata extends Serializable
  5. object ResourceType extends Enumeration
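
RepositoryMetadata holds a list of ResourceMetadata entries, while a ResourceRequest carries a name, an optional language and version constraints. To illustrate how such a request could be matched against that list, here is a minimal, hypothetical sketch. Req and Meta are simplified stand-ins invented for this example (versions as (major, minor) tuples, time as a Long), and the rule "newest compatible, ready-to-use entry wins" is an assumption for illustration, not a reproduction of the library's actual resolution code.

```scala
// Hypothetical, simplified stand-ins for ResourceRequest and ResourceMetadata;
// these are NOT the Spark NLP classes. Versions are (major, minor) tuples.
case class Req(name: String, language: Option[String], libVersion: (Int, Int))
case class Meta(
    name: String,
    language: Option[String],
    libVersion: (Int, Int),
    readyToUse: Boolean,
    time: Long
)

// Assumed rule: among ready-to-use entries with the requested name and language
// whose version does not exceed the requested one, pick the newest
// (highest version first, then latest time).
def resolve(candidates: List[Meta], request: Req): Option[Meta] = {
  val versionOrdering = Ordering[(Int, Int)]
  val compatible = candidates.filter { m =>
    m.name == request.name &&
    m.readyToUse &&
    (request.language.isEmpty || m.language == request.language) &&
    versionOrdering.lteq(m.libVersion, request.libVersion)
  }
  if (compatible.isEmpty) None
  else Some(compatible.maxBy(m => (m.libVersion, m.time)))
}

// Illustrative metadata entries, made up for this sketch
val metadata = List(
  Meta("explain_document_dl", Some("en"), (2, 4), readyToUse = true, time = 100L),
  Meta("explain_document_dl", Some("en"), (3, 0), readyToUse = true, time = 200L),
  Meta("explain_document_dl", Some("en"), (4, 0), readyToUse = true, time = 300L),
  Meta("explain_document_dl", Some("fr"), (3, 0), readyToUse = true, time = 250L)
)

// Request library version 3.1: the (4, 0) entry is too new and the French
// entry has the wrong language, so the (3, 0) English entry is chosen
val chosen = resolve(metadata, Req("explain_document_dl", Some("en"), (3, 1)))
println(chosen)
```

Treating an absent language as a wildcard and accepting any entry whose version is less than or equal to the requested one are design choices made for this sketch only.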