Public
Script
Forked from janpaul123/jadeMacaw
v38
May 30, 2024
Readme
Part of Val Town Semantic Search.
Uses Val Town's blob storage to search embeddings of all vals: it downloads every embedding and iterates through all of them, computing cosine similarity to the query. Slow and terrible, but it works!
- Get metadata from blob storage: `allValsBlob${dimensions}EmbeddingsMeta` (currently `allValsBlob1536EmbeddingsMeta`), which has a list of all indexed vals and where their embedding is stored (`batchDataIndex` points to the blob, and `valIndex` represents the offset within the blob).
  - The blobs have been generated by janpaul123/indexValsBlobs. It is not run automatically.
- Get all blobs with embeddings pointed to by the metadata, e.g. `allValsBlob1536EmbeddingsData_0` for `batchDataIndex` 0.
- Call OpenAI to generate an embedding for the search query.
- Go through all embeddings and compute cosine similarity with the embedding for the search query.
- Return list sorted by similarity.
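
The lookup and ranking steps above can be sketched on toy data, without any network calls. This is a minimal illustration, not the val's actual code: the `EmbeddingMeta` type and the helper names `embeddingAt`, `cosineSimilarity`, and `rankBySimilarity` are assumptions made for the example, and it assumes each batch blob is a tightly packed array of Float32 values, so that embedding number `valIndex` starts at byte `dimensions * 4 * valIndex`.

```typescript
// Hypothetical metadata shape, inferred from the README: each val id maps to
// the batch blob holding its embedding and its position within that blob.
type EmbeddingMeta = { batchDataIndex: number; valIndex: number };

// View one embedding inside a packed batch without copying it.
// Float32 values take 4 bytes, so embedding i starts at byte dims * 4 * i.
function embeddingAt(batch: ArrayBufferLike, valIndex: number, dims: number): Float32Array {
  return new Float32Array(batch, dims * 4 * valIndex, dims);
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: ArrayLike<number>, b: ArrayLike<number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every indexed val against the query and sort, most similar first.
function rankBySimilarity(
  meta: Record<string, EmbeddingMeta>,
  batches: ArrayBufferLike[],
  query: ArrayLike<number>,
  dims: number,
): { id: string; similarity: number }[] {
  const results = Object.entries(meta).map(([id, m]) => ({
    id,
    similarity: cosineSimilarity(embeddingAt(batches[m.batchDataIndex], m.valIndex, dims), query),
  }));
  return results.sort((x, y) => y.similarity - x.similarity);
}

// Toy data (made up): one batch holding two 3-dimensional embeddings.
const toyBatch = new Float32Array([1, 0, 0, 0, 1, 0]).buffer;
const toyMeta: Record<string, EmbeddingMeta> = {
  "alice!!first!!1": { batchDataIndex: 0, valIndex: 0 },
  "bob!!second!!2": { batchDataIndex: 0, valIndex: 1 },
};
const ranked = rankBySimilarity(toyMeta, [toyBatch], [0, 1, 0], 3);
console.log(ranked.map((r) => r.id));
```

With `dimensions` set to 1536, the same arithmetic gives 6144 bytes per embedding, so `valIndex` 2 would start at byte 12288 of its batch.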
import { decode as base64Decode, encode as base64Encode } from "https://deno.land/std@0.166.0/encoding/base64.ts";
import { createClient } from "https://esm.sh/@libsql/client@0.6.0/web";
import { sqlToJSON } from "https://esm.town/v/nbbaier/sqliteExportHelpers?v=22";
import { db as allValsDb } from "https://esm.town/v/sqlite/db?v=9";
import { blob } from "https://esm.town/v/std/blob";
import cosSimilarity from "npm:cos-similarity";
import _ from "npm:lodash";
import OpenAI from "npm:openai";
const dimensions = 1536;
export default async function semanticSearchPublicVals(query) {
  // Metadata maps each "author!!name!!version" id to { batchDataIndex, valIndex }.
  const allValsBlobEmbeddingsMeta = (await blob.getJSON(`allValsBlob${dimensions}EmbeddingsMeta`)) ?? {};
  const allBatchDataIndexes = _.uniq(Object.values(allValsBlobEmbeddingsMeta).map((item: any) => item.batchDataIndex));

  // Download every referenced embeddings blob in parallel, stored by batch index.
  const embeddingsBatches = [];
  const allBatchDataIndexesPromises = [];
  for (const batchDataIndex of allBatchDataIndexes) {
    const embeddingsBatchBlobName = `allValsBlob${dimensions}EmbeddingsData_${batchDataIndex}`;
    const promise = blob.get(embeddingsBatchBlobName).then((response) => response.arrayBuffer());
    promise.then((data) => {
      embeddingsBatches[batchDataIndex as any] = data;
      console.log(`Loaded ${embeddingsBatchBlobName} (${data.byteLength} bytes)`);
    });
    allBatchDataIndexesPromises.push(promise);
  }
  await Promise.all(allBatchDataIndexesPromises);

  // Embed the search query with the same model and dimensions as the index.
  const openai = new OpenAI();
  const queryEmbedding = (await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: query,
    dimensions: dimensions,
  })).data[0].embedding;

  // Score every indexed val against the query.
  const res = [];
  for (const id in allValsBlobEmbeddingsMeta) {
    const meta = allValsBlobEmbeddingsMeta[id];
    // View this val's embedding in place: 4 bytes per float, `dimensions` floats per val.
    const embedding = new Float32Array(
      embeddingsBatches[meta.batchDataIndex],
      dimensions * 4 * meta.valIndex,
      dimensions,
    );
    const [author_username, name, version] = id.split("!!");
    res.push({ author_username, name, version, similarity: cosSimilarity(embedding as any, queryEmbedding) });
  }

  // Most similar first; return the top 50.
  res.sort((a, b) => b.similarity - a.similarity);
  console.log(`Processed ${res.length} records`);
  return res.slice(0, 50);
}
const exampleQuery = "check dynamicland website for changes and email me";
console.log(await semanticSearchPublicVals(exampleQuery));
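
Two details in the script above are easy to miss: ids are split on `!!`, so `version` comes back as a string, and the `Float32Array` constructor throws a `RangeError` if the requested offset and length run past the end of the batch buffer. The following is a hedged sketch of defensive helpers for both. The names `parseValId` and `fitsInBatch` are hypothetical and do not exist in the val, and the sample id is made up.

```typescript
// Hypothetical helpers; only the "!!" separator and the field order
// (author_username, name, version) are taken from the script above.
type ValKey = { author_username: string; name: string; version: number };

// Split an "author!!name!!version" id and convert the version to a number.
function parseValId(id: string): ValKey {
  const parts = id.split("!!");
  if (parts.length !== 3) throw new Error(`Unexpected id format: ${id}`);
  const [author_username, name, version] = parts;
  return { author_username, name, version: Number(version) };
}

// True when embedding number valIndex fits entirely inside a batch
// that is byteLength bytes long.
function fitsInBatch(byteLength: number, valIndex: number, dims: number): boolean {
  const bytesPerEmbedding = dims * 4; // Float32 = 4 bytes per value
  return valIndex >= 0 && (valIndex + 1) * bytesPerEmbedding <= byteLength;
}

const key = parseValId("someUser!!someVal!!3");
console.log(key);

// A batch sized for exactly two 1536-dimensional embeddings.
const twoEmbeddings = 2 * 1536 * 4;
console.log(fitsInBatch(twoEmbeddings, 1, 1536));
console.log(fitsInBatch(twoEmbeddings, 2, 1536));
```

A check like `fitsInBatch` could run before constructing the typed-array view, so that stale metadata pointing past the end of a regenerated blob is skipped rather than aborting the whole search.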