Ingest

  1. Document Loader

  2. Text Splitter

  3. SupabaseVectorStore

  4. vectorstore.addDocuments

Helpers#

Before we ingest data, let's set up some helper functions.

Create an api/vectorstore/collection/route.ts file.

Add a GET handler to retrieve all of the user's Collections.

Add a POST to create and index a new Collection.

This route calls the create_documents_table and create_hnsw_index SQL functions that we set up in the previous section.

Then it inserts a row into the collections table that we set up to track user Collections.
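The order of operations in the POST handler can be sketched as a plain list of database steps. This is a minimal sketch under stated assumptions, not the course's actual route: the `DbStep` type, the `planCreateCollection` helper, the `table_name` argument, and the `name` and `user_id` columns are all hypothetical. Only the two SQL function names and the collections table come from the text above.

```typescript
// Hypothetical description of one database operation the POST handler performs.
type DbStep =
  | { kind: "rpc"; fn: string; args: { table_name: string } }
  | { kind: "insert"; table: string; row: { name: string; user_id: string } };

// Build the ordered steps for creating and indexing a new Collection.
// Argument and column names are assumptions made for illustration.
function planCreateCollection(name: string, userId: string): DbStep[] {
  // Keep only characters that are safe in a SQL identifier.
  const tableName = name.toLowerCase().replace(/[^a-z0-9_]/g, "_");
  return [
    // 1. Create the table that will hold this Collection's documents.
    { kind: "rpc", fn: "create_documents_table", args: { table_name: tableName } },
    // 2. Index the new table for vector search.
    { kind: "rpc", fn: "create_hnsw_index", args: { table_name: tableName } },
    // 3. Record the Collection so the GET handler can list it later.
    { kind: "insert", table: "collections", row: { name: tableName, user_id: userId } },
  ];
}
```

In a real route, each step would be sent to the database by the client in this order, and a failure in an earlier step should stop the later ones from running.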

Diagram#

[Figure: ingest diagram]

Route#

First, let's create a separate POST route for ingesting at /api/vectorstore/route.ts.

Document Loaders#

Document Loaders read data from a file or Blob and transform it into LangChain's universal Document class. Once all the files are loaded, we can treat them the same way regardless of their original format.
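To make the idea of a universal document shape concrete, here is a toy sketch in plain TypeScript. It does not use LangChain's loaders: `Doc`, `loadText`, and `loadCsv` are hypothetical names, and `Doc` only mirrors the `pageContent` plus `metadata` shape of a LangChain Document. The point is that different file formats end up as the same kind of object.

```typescript
// Mirrors the shape of a LangChain Document: text plus metadata about its origin.
interface Doc {
  pageContent: string;
  metadata: { source: string; line?: number };
}

// Toy "loader" for plain text: the whole file becomes one document.
function loadText(source: string, contents: string): Doc[] {
  return [{ pageContent: contents, metadata: { source } }];
}

// Toy "loader" for CSV: each non-empty row after the header becomes one document.
function loadCsv(source: string, contents: string): Doc[] {
  const lines = contents.split(/\r?\n/);
  const docs: Doc[] = [];
  for (let i = 1; i < lines.length; i++) {
    if (lines[i].trim() === "") continue;
    // Line numbers are 1-based, so row index i sits on line i + 1.
    docs.push({ pageContent: lines[i], metadata: { source, line: i + 1 } });
  }
  return docs;
}

// Both formats produce Doc[], so later steps never need to know the file type.
const allDocs: Doc[] = loadText("notes.txt", "Jack saw Jill.").concat(
  loadCsv("people.csv", "name,age\nJill,30\nJack,31\n")
);
```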

Our application accepts .txt, .pdf, and .csv files. We'll add more Loaders in a second, but first, let's consider a Text Loader.

We only need to import the Loader and initialize it with a file.

Document Loaders are often third-party integrations, so you may need to install additional dependencies depending on which ones you choose.
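Because the application accepts three file types, the ingest route has to decide which loader to use for each upload. A minimal sketch of that dispatch follows, using string labels in place of real loader classes. The `LoaderKind` labels and the `pickLoader` helper are hypothetical. The actual loader classes and their import paths depend on the LangChain.js version and the integrations you install.

```typescript
// Hypothetical labels standing in for the loader classes the route would instantiate.
type LoaderKind = "text" | "pdf" | "csv";

// Map a file name to the loader that should read it, based on its extension.
function pickLoader(fileName: string): LoaderKind {
  const dot = fileName.lastIndexOf(".");
  const ext = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
  switch (ext) {
    case "txt":
      return "text";
    case "pdf":
      return "pdf";
    case "csv":
      return "csv";
    default:
      // Reject anything the application does not accept.
      throw new Error("Unsupported file type: " + (ext || "(none)"));
  }
}
```

Rejecting unknown extensions up front keeps unsupported files from reaching the splitter or the vector store.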

Splitting text#

We can use a TextSplitter to split documents into smaller chunks. This keeps unnecessary information from confusing the model and keeps our context small.

One parameter to consider is "overlap". Overlapping can help retain meaning in text while creating more chunks. Let's look at an example to see how it can be useful.

[Figure: chunk overlap]
  • The example text is a simple phrase: "Jack saw Jill run to see him sing."

  • And our constraint is that we have to split the text into chunks of 3 words each.

  • The chunks returned are "run to see" and "Jill run to".

  • The question is: who is running?
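The example above can be reproduced with a toy word-based splitter. This is a sketch for illustration only: `splitWords` is a hypothetical helper, not a LangChain class, and it counts words, whereas LangChain's text splitters typically measure chunk size and overlap in characters.

```typescript
// Split text into chunks of `chunkSize` words, where consecutive chunks
// share `overlap` words. A toy word-based splitter, not a LangChain class.
function splitWords(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("Require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const step = chunkSize - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    // Stop once a chunk reaches the end of the text.
    if (start + chunkSize >= words.length) break;
  }
  return chunks;
}

const sentence = "Jack saw Jill run to see him sing.";

const withoutOverlap = splitWords(sentence, 3, 0);
// ["Jack saw Jill", "run to see", "him sing."]

const withOverlap = splitWords(sentence, 3, 2);
// ["Jack saw Jill", "saw Jill run", "Jill run to",
//  "run to see", "to see him", "see him sing."]
```

Without overlap, "run to see" stands alone and nothing in it says who is running. With an overlap of 2 words, the neighboring chunk "Jill run to" keeps the subject next to the verb, at the cost of storing six chunks instead of three.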

Results#
