
class TextSplitter extends AnyRef

Splits text recursively into chunks that match a given length.

Linear Supertypes
AnyRef, Any

Instance Constructors

  1. new TextSplitter(chunkSize: Int, chunkOverlap: Int, keepSeparators: Boolean, patternsAreRegex: Boolean, trimWhitespace: Boolean, lengthFunction: (String) ⇒ Int = _.length)

    chunkSize

    Length of the text chunks, measured by lengthFunction

    chunkOverlap

    Overlap of the text chunks

    keepSeparators

    Whether to keep separators in the final chunks

    patternsAreRegex

    Whether to interpret split patterns as regex

    trimWhitespace

    Whether to trim the whitespace from the final chunks

    lengthFunction

    Function to measure chunk length
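
    The lengthFunction parameter decides what "length" means for chunkSize. As a minimal sketch, the default from the signature above (character count) can be compared with a word-counting alternative. The object name LengthFunctions and the value names byCharacters and byWords are hypothetical and chosen only for illustration; they are not part of this class.

```scala
object LengthFunctions {
  // Default from the constructor signature: length in characters.
  val byCharacters: String => Int = _.length

  // Hypothetical alternative: length in whitespace-separated words.
  // Empty tokens (from leading whitespace or an empty string) are ignored.
  val byWords: String => Int =
    text => text.split("\\s+").count(_.nonEmpty)
}
```

    Following the documented constructor, such a function would be passed as the last argument, for example new TextSplitter(chunkSize = 20, chunkOverlap = 0, keepSeparators = false, patternsAreRegex = false, trimWhitespace = true, lengthFunction = LengthFunctions.byWords), in which case chunkSize would presumably be read as a number of words rather than characters.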

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  8. def escapeRegexIfNeeded(text: String): String
  9. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  10. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  11. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  12. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  13. def joinDocs(currentDoc: Seq[String], separator: String): String
  14. def mergeSplits(splits: Seq[String], separator: String): Seq[String]

    Combines smaller text chunks into larger ones of approximately chunkSize.

    splits

    Splits from the previous separator

    separator

    The current separator

  15. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  16. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  17. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  18. def splitText(text: String, separators: Seq[String]): Seq[String]

    Splits a text into chunks of roughly the given chunk size. The separators are given in a list and are tried in order.

    Inspired by LangChain's RecursiveCharacterTextSplitter.

    text

    Text to split

    separators

    List of separators in decreasing priority

  19. def splitTextWithRegex(text: String, separator: String): Seq[String]

    Splits the given text with the separator.

    The separator is treated as a regular expression; a literal separator is expected to have been escaped beforehand (see escapeRegexIfNeeded).

    text

    Text to split

    separator

    Regex as String

  20. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  21. def toString(): String
    Definition Classes
    AnyRef → Any
  22. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  23. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  24. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
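
How escapeRegexIfNeeded, splitTextWithRegex, mergeSplits and splitText might fit together can be sketched in a few lines. This is an illustration of the general recursive splitting idea (as in LangChain's RecursiveCharacterTextSplitter), not this class's actual implementation. It measures length in characters and ignores chunkOverlap, keepSeparators and trimWhitespace. The object RecursiveSplitSketch and all of its method names are hypothetical.

```scala
import java.util.regex.Pattern

object RecursiveSplitSketch {
  // Literal separators are quoted so they are safe to use as a regex.
  def escapeIfNeeded(sep: String, isRegex: Boolean): String =
    if (isRegex || sep.isEmpty) sep else Pattern.quote(sep)

  // Splits on a regex, dropping the separators and any empty pieces.
  // An empty regex is taken to mean "split into single characters".
  def splitWithRegex(text: String, regex: String): Seq[String] =
    if (regex.isEmpty) text.toSeq.map(_.toString)
    else text.split(regex, -1).filter(_.nonEmpty).toSeq

  // Greedily packs consecutive pieces into chunks of at most chunkSize
  // characters, joining them with the separator (no overlap here).
  def merge(pieces: Seq[String], sep: String, chunkSize: Int): Seq[String] = {
    val chunks = Seq.newBuilder[String]
    var current = Vector.empty[String]
    var length = 0
    for (piece <- pieces) {
      val added = piece.length + (if (current.isEmpty) 0 else sep.length)
      if (current.nonEmpty && length + added > chunkSize) {
        chunks += current.mkString(sep)
        current = Vector(piece)
        length = piece.length
      } else {
        current :+= piece
        length += added
      }
    }
    if (current.nonEmpty) chunks += current.mkString(sep)
    chunks.result()
  }

  // Tries the separators in order: pieces that fit are merged, pieces
  // that are still too long are split again with the remaining separators.
  def split(text: String, separators: Seq[String], chunkSize: Int): Seq[String] =
    separators match {
      case Seq() => Seq(text) // nothing left to split on; keep as is
      case sep +: rest =>
        val pieces = splitWithRegex(text, escapeIfNeeded(sep, isRegex = false))
        val out = Seq.newBuilder[String]
        var small = Vector.empty[String]
        for (piece <- pieces) {
          if (piece.length <= chunkSize) small :+= piece
          else {
            out ++= merge(small, sep, chunkSize)
            small = Vector.empty
            out ++= split(piece, rest, chunkSize)
          }
        }
        out ++= merge(small, sep, chunkSize)
        out.result()
    }
}
```

With this sketch, RecursiveSplitSketch.split("aa bb cc\n\ndd ee", Seq("\n\n", " "), 6) yields the chunks "aa bb", "cc" and "dd ee". A piece that cannot be split further, such as a single long word, is returned as is, which is one reason the resulting chunk sizes are only approximate.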
