This val shows how to deploy an app using Cerebras Inference on Val Town in seconds.
Cerebras is an American chip manufacturer that produces large wafer-scale chips that deliver mind-blowing LLM inference speeds. As of this writing on Jan 17, 2025, Cerebras Inference provides Llama 3.1 8b, 3.1 70b, and 3.3 70b at a jaw-dropping 2k tokens per second, which is 50x faster than what the frontier labs produce.
Llama 3.3 70b at 2k tokens per second is particularly noteworthy because it is a GPT-4-class model. This level of intelligence at that level of speed will unlock whole new classes of applications.
There are two ways to get started:
- Fork this app and customize it (or ask Townie AI to customize it)
- Start a new chat with Townie AI and copy & paste the following instructions:
Use Cerebras for AI on the backend like so:
// Cerebras exposes an OpenAI-compatible API, so the OpenAI client works as-is
const { OpenAI } = await import("https://esm.sh/openai");
const client = new OpenAI({
  apiKey: "YOUR_CEREBRAS_API_KEY",
  baseURL: "https://api.cerebras.ai/v1",
});
const response = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [], // the conversation so far, as { role, content } objects
});
const generatedText = response.choices[0].message.content;
For example, the val in this template was created by asking Townie AI to "Make a ChatGPT clone". I hit shift-enter twice, pasted in the instructions above on how to use Cerebras, and then hit enter. Townie built this app on its first try, in about 20 seconds.
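If you would rather see what the OpenAI client is doing under the hood, here is a minimal sketch of the same chat completion call expressed as a plain HTTP request. The base URL and model name come from the instructions above; the helper names (`buildChatRequest`, `extractText`, `chat`) and the `FetchLike` type are hypothetical and only there for illustration. It also assumes that an empty `messages` array is a mistake and rejects it up front.

```typescript
// Sketch only: these helpers are hypothetical, not part of any SDK.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

type HttpRequest = {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
};

// Shape of the part of the response we read (mirrors response.choices[0].message.content).
type ChatResponse = { choices?: { message?: { content?: string | null } }[] };

// Minimal fetch-compatible signature, so a stub can be swapped in for testing.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Base URL and model name are taken from the instructions above.
const CEREBRAS_BASE_URL = "https://api.cerebras.ai/v1";
const MODEL = "llama-3.3-70b";

// Build the request an OpenAI-compatible client would send for a chat completion.
function buildChatRequest(apiKey: string, messages: ChatMessage[]): HttpRequest {
  if (messages.length === 0) {
    // Assumption: an empty conversation is a bug, so fail before calling the API.
    throw new Error("messages must contain at least one entry");
  }
  return {
    url: `${CEREBRAS_BASE_URL}/chat/completions`,
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: MODEL, messages }),
  };
}

// Pull the generated text out of a response body; returns "" if it is missing.
function extractText(response: ChatResponse): string {
  return response.choices?.[0]?.message?.content ?? "";
}

// Send the request with the supplied fetch function and return the reply text.
async function chat(fetchFn: FetchLike, apiKey: string, messages: ChatMessage[]): Promise<string> {
  const { url, ...init } = buildChatRequest(apiKey, messages);
  const res = await fetchFn(url, init);
  if (!res.ok) {
    throw new Error(`Cerebras request failed with status ${res.status}`);
  }
  return extractText((await res.json()) as ChatResponse);
}
```

On Val Town you could call this as `await chat(fetch, Deno.env.get("CEREBRAS_API_KEY") ?? "", [{ role: "user", content: "Hello!" }])`, assuming you have stored your key in an environment variable with that (hypothetical) name instead of pasting it into the code.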
- Cerebras Searcher - a Perplexity clone that uses the SerpAPI to do RAG and summaries with Cerebras (requires a SerpAPI key)
- Cerebras Coder - an app that generates websites in a second with Cerebras
- Cerebras Debater - an app that truly shows Cerebras's speed: it's Cerebras talking to Cerebras in a debate