Ingest
The ingest flow: Document Loader → Text Splitter → SupabaseVectorStore → vectorstore.addDocuments.
Helpers#
Before we ingest data, let's set up some helper functions.
Create an api/vectorstore/collection/route.ts file.
import { createSupabaseClient } from '@/lib/serverUtils';
import { NextResponse } from "next/server";
Add a GET handler to retrieve all user Collections.
export async function GET() {
  const supabase = createSupabaseClient()

  // Only signed-in users can list their Collections
  const { data: { session } } = await supabase.auth.getSession()
  if (!session?.user?.id) {
    return new Response('Unauthorized', { status: 401 })
  }

  // Fetch the rows from the collections table
  const { data, error } = await supabase.from("collections").select()
  if (error) {
    return NextResponse.json({ error })
  }
  return NextResponse.json({ data })
}
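On the client, listing Collections is then a plain GET request. A minimal sketch (where you render the result is up to you):
// Client-side sketch: fetch the current user's Collections
const res = await fetch('/api/vectorstore/collection')
const { data, error } = await res.json()
if (!error) {
  console.log(data) // rows from the collections table
}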
Add a POST handler to create and index a new Collection.
export async function POST(req: Request) {
  const supabase = createSupabaseClient()

  // Only signed-in users can create Collections
  const { data: { session } } = await supabase.auth.getSession()
  if (!session?.user?.id) {
    return new Response('Unauthorized', { status: 401 })
  }

  const { tableName } = await req.json()

  // Create the documents table for this Collection
  const { error: createError } = await supabase.rpc('create_documents_table', { table_name: tableName })
  if (createError) {
    return NextResponse.json({ createError })
  }

  // Add an HNSW index to the new table for fast similarity search
  const { error: indexError } = await supabase.rpc('create_hnsw_index', { table_name: tableName })
  if (indexError) {
    return NextResponse.json({ indexError })
  }

  // Track the new Collection for this user
  const { error: collectionsError, data } = await supabase.from("collections").insert({ collection_name: tableName, user_id: session?.user?.id }).select()
  if (collectionsError) {
    return NextResponse.json({ collectionsError })
  }

  return NextResponse.json({ message: 'success', data }, { status: 201 })
}
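Creating a Collection from the client is then a POST with the desired table name. A quick sketch (my_docs is just a placeholder name):
// Client-side sketch: create and index a new Collection
const res = await fetch('/api/vectorstore/collection', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ tableName: 'my_docs' }),
})
const json = await res.json() // { message: 'success', data: [...] } on a 201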
This route calls the create_documents_table and create_hnsw_index SQL functions that we set up in the previous section, using:
supabase.rpc(functionName, { arguments })
Then it inserts a row into the collections table that we set up to track user Collections.
Route#
First, let's create a separate POST route for ingesting at /api/vectorstore/route.ts.
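The loading, splitting, and storing will fill this route in as we go. A bare-bones starting point might look like the following sketch; passing the file and tableName as FormData is an assumption, so adapt it to however your upload form sends data:
import { createSupabaseClient } from '@/lib/serverUtils';
import { NextResponse } from "next/server";

export async function POST(req: Request) {
  const supabase = createSupabaseClient()

  // Same auth guard as the collection route
  const { data: { session } } = await supabase.auth.getSession()
  if (!session?.user?.id) {
    return new Response('Unauthorized', { status: 401 })
  }

  // Assumption: the client uploads the file and target table name as FormData
  const formData = await req.formData()
  const file = formData.get('file') as File
  const tableName = formData.get('tableName') as string

  // Load -> split -> embed -> store goes here (covered in the sections below)

  return NextResponse.json({ message: 'success' }, { status: 201 })
}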
Document Loaders#
Document Loaders read data from a file or Blob and transform it into the universal langchain Document class. After we load all the files, we can treat them the same way.
Our application accepts .txt, .pdf, and .csv files. We'll add more Loaders in a second, but first, let's consider the TextLoader.
We only need to import the Loader and initialize it with a file.
import { TextLoader } from "langchain/document_loaders/fs/text";
const loader = loaderMap(file)
Document Loaders are often third-party integrations, and you may need to install a dependency depending on which ones you choose.
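The loaderMap helper isn't defined in this excerpt, but here is one possible sketch, assuming it switches on the uploaded file's MIME type (PDFLoader and CSVLoader require the pdf-parse and d3-dsv packages respectively):
import { TextLoader } from "langchain/document_loaders/fs/text";
import { PDFLoader } from "langchain/document_loaders/fs/pdf";
import { CSVLoader } from "langchain/document_loaders/fs/csv";

// Hypothetical loaderMap: pick a Loader based on the uploaded file's type
function loaderMap(file: File) {
  if (file.type === "application/pdf") return new PDFLoader(file)
  if (file.type === "text/csv") return new CSVLoader(file)
  // Fall back to plain text for .txt and anything else text-like
  return new TextLoader(file)
}

const docs = await loader.load() // every Loader returns an array of langchain Documents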
Splitting text#
We can use a TextSplitter to split documents into smaller chunks. This is useful to avoid confusing the model with unnecessary information and also to keep our context small.
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
One parameter to consider is "overlap". Overlapping can help retain meaning in text while creating more chunks. Let's look at an example to see how it can be useful.

The example text is a simple phrase: "Jack saw Jill run to see him sing."
Our constraint is that we have to split the text into chunks of 3 words each.
Without overlap, one of the chunks returned is "run to see"; with overlap, we also get the chunk "Jill run to".
The question is: who is running? From "run to see" alone we can't tell, but the overlapping chunk "Jill run to" keeps the answer.
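In code, the overlap is just a constructor option. A small sketch (note that RecursiveCharacterTextSplitter measures chunkSize and chunkOverlap in characters by default, so the numbers below are only illustrative):
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 15,   // maximum size of each chunk (in characters by default)
  chunkOverlap: 8, // how much neighbouring chunks share
})

// For plain strings
const chunks = await splitter.splitText("Jack saw Jill run to see him sing.")

// For the Documents returned by our Loader
const splitDocs = await splitter.splitDocuments(docs)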
Results#