Readme

Part of Val Town Semantic Search.

Generates OpenAI embeddings for all public vals and stores them in Turso using the sqlite-vss extension.

  • Create the vals_embeddings and vss_vals_embeddings tables in Turso if they don't already exist.
  • Get all val names from the database of public vals, made by Achille Lacoin.
  • Get all val names from the vals_embeddings table and compute the difference (which ones are missing).
  • Iterate through all missing vals, get their code, get embeddings from OpenAI, and store the result in Turso.
  • When finished, update the vss_vals_embeddings table so we can efficiently query them with the sqlite-vss extension.
    • This is blocked by a bug in Turso that doesn't allow VSS indexes past a certain size.
  • The stored embeddings can then be searched using janpaul123/semanticSearchTurso.
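
The "compute the difference" step in the list above can be sketched in isolation. This is a minimal illustration, not the val's exact code: the `ValRef` interface and the `missingVals` helper are hypothetical names, the sample rows are made up, and the `!!`-separated ID format is the one the val itself uses.

```typescript
// Hypothetical row shape; the fields mirror the columns selected from the public vals database.
interface ValRef {
  author_username: string;
  name: string;
  version: number;
}

// Same ID scheme as the val: author, name, and version joined by "!!".
function idForVal(val: ValRef): string {
  return `${val.author_username}!!${val.name}!!${val.version}`;
}

// Return the vals whose ID is not yet present among the stored embedding IDs.
function missingVals(allVals: ValRef[], existingIds: Set<string>): ValRef[] {
  return allVals.filter((val) => !existingIds.has(idForVal(val)));
}

// Made-up sample data standing in for the two database queries.
const allVals: ValRef[] = [
  { author_username: "alice", name: "hello", version: 1 },
  { author_username: "alice", name: "hello", version: 2 },
  { author_username: "bob", name: "cron", version: 5 },
];
const existingIds = new Set(["alice!!hello!!1", "bob!!cron!!5"]);

const todo = missingVals(allVals, existingIds);
console.log(todo.map(idForVal));
```

Because the version is part of the ID, a new version of a val counts as missing and gets its own embedding, while the row for the older version stays in the table.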
Runs every hour.
import { decode as base64Decode } from "https://deno.land/std@0.166.0/encoding/base64.ts";
import { createClient } from "https://esm.sh/@libsql/client@0.6.0/web";
import { sqlToJSON } from "https://esm.town/v/nbbaier/sqliteExportHelpers?v=22";
import { db as allValsDb } from "https://esm.town/v/sqlite/db?v=9";
import OpenAI from "npm:openai";
import { truncateMessage } from "npm:openai-tokens";

export default async function(interval: Interval) {
  const sqlite = createClient({
    url: "libsql://valsembeddings-jpvaltown.turso.io",
    authToken: Deno.env.get("TURSO_AUTH_TOKEN_VALSEMBEDDINGS"),
  });

  // Create both tables if needed. Awaiting ensures they exist before any later query runs.
  await sqlite.execute("CREATE TABLE IF NOT EXISTS vals_embeddings (id TEXT NOT NULL, embedding BLOB NOT NULL)");
  await sqlite.execute("CREATE VIRTUAL TABLE IF NOT EXISTS vss_vals_embeddings USING vss0(embedding(256))");

  // All public vals with non-trivial code.
  const allVals = await sqlToJSON(
    await allValsDb.execute("SELECT author_username, name, version FROM vals WHERE LENGTH(code) > 10 ORDER BY name"),
  ) as any;

  // IDs that already have an embedding stored in Turso.
  const existingEmbeddingsIds = new Set(
    (await sqlite.execute("SELECT id FROM vals_embeddings")).rows.map((row) => row[0]),
  );

  function idForVal(val: any): string {
    return `${val.author_username}!!${val.name}!!${val.version}`;
  }

  // Keep only the vals that do not have an embedding yet.
  const newVals = allVals.filter((val: any) => !existingEmbeddingsIds.has(idForVal(val)));

  const openai = new OpenAI();
  for (const val of newVals) {
    const code = (await allValsDb.execute({
      sql: "SELECT code FROM vals WHERE author_username = :author_username AND name = :name AND version = :version",
      args: val,
    })).rows[0][0] as string;

    const embedding = await openai.embeddings.create({
      model: "text-embedding-3-small",
      input: truncateMessage(code, "text-embedding-3-small"),
      encoding_format: "base64",
      dimensions: 256,
    });

    // 256 dimensions at 4 bytes (float32) each.
    const embeddingBinary = base64Decode(embedding.data[0].embedding as any);
    if (embeddingBinary.length !== 256 * 4) {
      throw new Error(`Invalid embeddingBinary.length: ${embeddingBinary.length}`);
    }

    const id = idForVal(val);
    console.log(id, embeddingBinary.length);
    // Await the insert so the log line below is only printed once the row is written.
    await sqlite.execute({
      sql: "INSERT INTO vals_embeddings (id, embedding) VALUES (:id, :embeddingBinary)",
      args: { id, embeddingBinary },
    });
    console.log(`Inserted ${id}`);
  }

  // Copy any new rows into the VSS index.
  // Per the README, this step is currently blocked by a Turso bug for large indexes.
  await sqlite.execute(
    "INSERT INTO vss_vals_embeddings (rowid, embedding) SELECT rowid, embedding FROM vals_embeddings WHERE rowid NOT IN (SELECT rowid FROM vss_vals_embeddings)",
  );
}
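
The length check in the code above (256 * 4 bytes) follows from the embedding being requested in base64 form with 256 dimensions. The sketch below shows how such a payload could be turned back into numbers, assuming it is a packed array of little-endian float32 values; that byte layout is an assumption here, and the helper names (`base64ToBytes`, `bytesToFloats`, `floatsToBase64`) are hypothetical. It uses a tiny 3-dimensional sample instead of a real OpenAI response.

```typescript
// Decode a base64 string into raw bytes using the standard atob global.
function base64ToBytes(b64: string): Uint8Array {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// Interpret the bytes as little-endian float32 values (assumed layout).
// Throws if the byte count does not match the expected number of dimensions.
function bytesToFloats(bytes: Uint8Array, dimensions: number): number[] {
  if (bytes.length !== dimensions * 4) {
    throw new Error(`Expected ${dimensions * 4} bytes, got ${bytes.length}`);
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out: number[] = [];
  for (let i = 0; i < dimensions; i++) out.push(view.getFloat32(i * 4, true));
  return out;
}

// Helper for the demo only: pack numbers as little-endian float32 and base64-encode them.
function floatsToBase64(values: number[]): string {
  const buffer = new ArrayBuffer(values.length * 4);
  const view = new DataView(buffer);
  values.forEach((v, i) => view.setFloat32(i * 4, v, true));
  const bytes = new Uint8Array(buffer);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Round-trip a small sample; these values are exactly representable as float32.
const sample = floatsToBase64([0.5, -1, 0.25]);
const decoded = bytesToFloats(base64ToBytes(sample), 3);
console.log(decoded);
```

Storing the raw bytes as a BLOB, as the val does, avoids this decoding step at write time; a sketch like this is only needed by code that wants to inspect or compare the vectors itself.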
v74
May 29, 2024