In a surprising twist on AI implementation, Maciej “YTM/Elysium” Witkowiak has ported a compact version of Meta’s LLaMA 2 to the Commodore 64. This isn’t a chatbot—it’s a story generator that runs entirely on vintage hardware, assuming that hardware includes at least a 2MB RAM Expansion Unit (REU). The project, titled llama2.c64, is featured in a new video from projectCD.Chronicles and is built using oscar64, a C cross-compiler targeting the C64.
What It Does
Think of it like giving a toddler the beginning of a bedtime story and letting them finish it with whatever limited vocabulary and sentence structure they can manage. That’s what llama2.c64 simulates. It uses a pre-trained AI model based on the TinyStories dataset—designed to reflect the linguistic capabilities of 3- to 4-year-olds. You provide a short prompt, and the system completes the story.
The model runs inference entirely on the C64, using a stripped-down LLaMA 2 model: the 260K-parameter TinyStories variant. At float32 precision, those 260K parameters occupy roughly 1MB, which is why a 2MB REU is enough to hold them. While the model isn’t capable of conversation, it’s fully capable of generating short, grammatically basic, but coherent tales.
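Generation itself follows the standard llama2.c autoregressive loop: run one transformer forward pass per token, sample the next token, print it, and feed it back in. Here is a schematic sketch, with the extern prototypes standing in for the real routines (the C64 port’s internals may differ):

```c
// Schematic of llama2.c-style generation. These declarations stand in
// for the real routines; the actual llama2.c64 code may differ.
extern float *forward(int token, int pos); // one transformer step: logits
extern int sample(const float *logits);    // pick the next token
extern void print_token(int token);        // decode token, print its text

void generate(int bos_token, int max_steps) {
    int token = bos_token;                   // start from the prompt
    for (int pos = 0; pos < max_steps; pos++) {
        float *logits = forward(token, pos); // score every vocabulary entry
        int next = sample(logits);           // choose one
        print_token(next);                   // stream it out
        token = next;                        // condition the next step on it
    }
}
```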
Requirements
To use llama2.c64, you’ll need:
- A Commodore 64 (real or emulated)
- A RAM Expansion Unit with at least 2MB of memory
- One of the following environments:
On VICE Emulator:
x64sc -warp -reu -reusize 2048 -reuimage weights.reu llama2c64.prg
On Ultimate II+:
- Enable REU in the Cartridge Settings.
- Set REU size to 2MB.
- Navigate to the llama2.c64 folder.
- Load `weights.reu` into REU.
- Run either `llama2c64.prg` or the compressed `llama2exo.prg`.
The optional `.exo` version is packed with Exomizer, shrinking the file so it loads more conveniently on real hardware.
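Behind the scenes, a port like this has to stream weight data from expansion memory into the C64’s 64KB address space, which the REU does via its DMA controller (the REC) mapped at $DF00. A minimal fetch sketch, assuming the standard Commodore 17xx register layout; the actual llama2.c64 routines may differ:

```c
#include <stdint.h>

// Fetch `len` bytes from REU memory into C64 RAM using the REC's
// DMA registers at $DF00. Layout follows the Commodore 17xx REU spec;
// this is an illustrative sketch, not the project's actual code.
#define REC ((volatile uint8_t *)0xDF00)

void reu_fetch(uint16_t c64_addr, uint32_t reu_addr, uint16_t len) {
    REC[2] = (uint8_t)(c64_addr & 0xFF);         // C64 base address, low
    REC[3] = (uint8_t)(c64_addr >> 8);           // C64 base address, high
    REC[4] = (uint8_t)(reu_addr & 0xFF);         // REU address, low
    REC[5] = (uint8_t)((reu_addr >> 8) & 0xFF);  // REU address, high
    REC[6] = (uint8_t)((reu_addr >> 16) & 0xFF); // REU bank (64KB page)
    REC[7] = (uint8_t)(len & 0xFF);              // transfer length, low
    REC[8] = (uint8_t)(len >> 8);                // transfer length, high
    REC[1] = 0x91;                               // execute: REU -> C64 fetch
}
```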
Building and Testing
The project includes a Makefile that supports:
- `make build` — compiles the code
- `make test` — launches the emulator with the correct settings
- `make clean` — clears build artifacts

Missing model files (`weights.reu`, `config.bin`, `tokenizer.bin`) are auto-generated from the source `stories260K.bin` and `tok512.bin`.
How It Works
The model setup includes:
- Tokenizer: Encoded as `tokenizer.bin`, this contains string lengths, vocabulary data, and dictionary offsets.
- Config File: Model parameters saved in `config.bin`.
- Weights File: Raw float32 weights saved in `weights.reu`, padded to standard REU sizes.

All preprocessing is handled by the provided Python script `generate-model-files.py`.
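For reference, upstream llama2.c reads its model parameters as a block of int32 fields, and `config.bin` presumably mirrors that layout; the `tok512.bin` name suggests a 512-entry vocabulary. A sketch of the field set, taken from the upstream struct rather than verified against the port:

```c
#include <stdint.h>

// Hyperparameters as laid out in upstream llama2.c's Config struct;
// the port's config.bin presumably carries fields along these lines.
typedef struct {
    int32_t dim;        // transformer embedding dimension
    int32_t hidden_dim; // feed-forward hidden dimension
    int32_t n_layers;   // number of transformer layers
    int32_t n_heads;    // number of attention heads
    int32_t n_kv_heads; // number of key/value heads
    int32_t vocab_size; // tokenizer vocabulary size
    int32_t seq_len;    // maximum sequence length
} Config;
```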
Performance and Limitations
It’s not fast. Each token is generated with the pace you’d expect from an 8-bit CPU doing matrix math. But it works. And it works without any need for cloud computing or even a GPU. The process is local, consistent, and surprisingly fun.
What you get:
- Low power use
- Fully offline generation
- Complete control over your prompts and results
- No external APIs or surveillance
What you give up:
- Speed. This isn’t for real-time applications.
- Model size. You’re limited to models that fit within REU constraints (roughly 8MB max usable space).
A Sample Output
When prompted with the word “Zoo”, the model continues:
“Zoo was a little girl named Lily. She loved to play outside in the park. One day, she saw a big, red ball. She wanted to play with it, but she didn’t want to play with…”
The same prompt given to the original `llama2.c` source (run with temperature=0.0) produces nearly identical output, confirming that the C64 implementation faithfully mirrors its modern counterpart, just far more slowly.
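That determinism is a direct consequence of the sampling setting: temperature 0.0 means greedy decoding, where the single highest-scoring token always wins, so any two faithful implementations of the same weights emit the same text. A minimal sketch of that selection step:

```c
// Greedy (temperature = 0.0) decoding: always return the index of the
// highest-scoring logit, making generation fully deterministic.
int sample_argmax(const float *logits, int vocab_size) {
    int best = 0;
    for (int i = 1; i < vocab_size; i++) {
        if (logits[i] > logits[best]) {
            best = i;
        }
    }
    return best;
}
```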
Technical Enhancements
To improve the mathematical accuracy of AI computations, Maciej implemented custom versions of sine, cosine, and exponentiation. These are more precise than oscar64’s built-ins and draw inspiration from algorithms found in the original C64 BASIC ROM.
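The ROM routines evaluate those functions from small coefficient tables using Horner’s rule, and a reimplementation usually takes the same shape. A sketch using plain Taylor coefficients for sin(x); the ROM, and presumably llama2.c64, use tuned constants and proper range reduction:

```c
// Polynomial evaluation in the spirit of the C64 BASIC ROM's series
// routines. Coefficients are the Taylor series for sin(x) around 0,
// shown for illustration; production code uses tuned (minimax-style)
// constants and reduces x into a small range first.
static const float SIN_COEFFS[] = {
    -1.0f / 5040.0f, // -x^7 / 7!
     1.0f / 120.0f,  //  x^5 / 5!
    -1.0f / 6.0f,    // -x^3 / 3!
     1.0f,           //   x
};

float sin_poly(float x) {
    float x2 = x * x;
    float acc = SIN_COEFFS[0];
    for (int i = 1; i < 4; i++) {
        acc = acc * x2 + SIN_COEFFS[i]; // Horner's rule in x^2
    }
    return acc * x; // odd function: multiply the even polynomial by x
}
```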