LLaMA2 on Commodore 64: AI Story Generator with REU


In a surprising twist on AI implementation, Maciej “YTM/Elysium” Witkowiak has ported a compact version of Meta’s LLaMA 2 to the Commodore 64. This isn’t a chatbot; it’s a story generator that runs entirely on vintage hardware, provided that hardware includes at least a 2MB RAM Expansion Unit (REU). The project, titled llama2.c64, is featured in a new video from projectCD.Chronicles and is built using oscar64, a modern C cross-compiler targeting the C64.

What It Does

Think of it as giving a toddler the beginning of a bedtime story and letting them finish it with whatever limited vocabulary and sentence structure they can manage. That is essentially what llama2.c64 does. It uses a pre-trained model based on the TinyStories dataset, which is designed to reflect the linguistic capabilities of 3- to 4-year-olds. You provide a short prompt, and the system completes the story.

The model runs inference entirely on the C64, using a stripped-down LLaMA 2 model: the 260K-parameter TinyStories variant. While it isn’t capable of conversation, it is fully capable of generating short, grammatically simple, but coherent tales.
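In outline, the generation loop is the same as in Karpathy’s llama2.c, which this port follows: encode the prompt, run one transformer forward pass per position, and greedily append the most likely next token. A minimal C sketch of that loop, with illustrative function names that are not the project’s actual API:

#define VOCAB_SIZE 512   /* per tok512.bin */
#define MAX_TOKENS 256

/* Illustrative declarations; the real project's interfaces will differ. */
int  tokenize(const char *prompt, int *tokens);               /* prompt -> token ids */
void transformer_forward(int token, int pos, float *logits);  /* one inference step */
int  sample_argmax(const float *logits, int vocab_size);      /* greedy pick */
void print_token(int token);                                  /* decode and print */

void generate(const char *prompt)
{
    static float logits[VOCAB_SIZE];
    int tokens[MAX_TOKENS];
    int n = tokenize(prompt, tokens);
    int next = 1;                       /* BOS token, as in llama2.c */
    int pos;

    for (pos = 0; pos < MAX_TOKENS; pos++) {
        /* Feed prompt tokens first, then the model's own output. */
        int cur = (pos < n) ? tokens[pos] : next;
        transformer_forward(cur, pos, logits);
        if (pos >= n - 1) {             /* past the prompt: sample */
            next = sample_argmax(logits, VOCAB_SIZE);
            print_token(next);
        }
    }
}

Greedy (argmax) sampling matches the temperature=0.0 comparison described later in this article; a temperature parameter would simply rescale the logits before sampling.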

Requirements

To use llama2.c64, you’ll need:

  • A Commodore 64 (real or emulated)
  • A RAM Expansion Unit with at least 2MB of memory
  • One of the following environments:

On VICE Emulator:

x64sc -warp -reu -reusize 2048 -reuimage weights.reu llama2c64.prg

On Ultimate II+:

  1. Enable REU in the Cartridge Settings.
  2. Set REU size to 2MB.
  3. Navigate to the llama2.c64 folder.
  4. Load weights.reu into REU.
  5. Run either llama2c64.prg or the compressed llama2exo.prg.

The optional .exo version is packed with Exomizer to cut the file size, which makes loading on real hardware more convenient.

Building and Testing

The project includes a Makefile that supports:

  • make build — compiles the code
  • make test — launches the emulator with the correct settings
  • make clean — clears build artifacts

Missing model files (weights.reu, config.bin, tokenizer.bin) are auto-generated from the source stories260K.bin and tok512.bin.

How It Works

The model setup includes:

  • Tokenizer: Encoded as tokenizer.bin, this contains string lengths, vocabulary data, and dictionary offsets.
  • Config File: Model parameters saved in config.bin.
  • Weights File: Raw float32 weights saved in weights.reu, padded to standard REU sizes.

All preprocessing is handled by the provided Python script generate-model-files.py.
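Since the port tracks llama2.c, config.bin presumably carries the same handful of hyperparameters that llama2.c reads into its Config struct. A plausible C layout (the exact field order and integer widths used by the C64 port are an assumption):

/* Hypothetical layout of config.bin, mirroring llama2.c's Config struct. */
struct Config {
    int dim;         /* transformer embedding dimension */
    int hidden_dim;  /* feed-forward hidden layer size */
    int n_layers;    /* number of transformer layers */
    int n_heads;     /* number of attention heads */
    int n_kv_heads;  /* number of key/value heads */
    int vocab_size;  /* tokenizer vocabulary size (512 here, per tok512.bin) */
    int seq_len;     /* maximum sequence length */
};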

Performance and Limitations

It’s not fast. Each token is generated at the pace you’d expect from an 8-bit CPU doing matrix math. But it works. And it works without any need for cloud computing or even a GPU. The process is local, consistent, and surprisingly fun.
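The reason is the matrix-vector multiplies: the float32 weights live in the REU, so each weight row has to be DMA-copied into main memory before the CPU can multiply it against the activation vector. A rough sketch of that streaming pattern, using the REU’s standard DMA registers at $DF00 (whether llama2.c64 chunks rows exactly this way is an assumption; the buffer size and function names are illustrative):

#define REU_COMMAND (*(volatile char *)0xDF01)
#define REU_C64_LO  (*(volatile char *)0xDF02)
#define REU_C64_HI  (*(volatile char *)0xDF03)
#define REU_REU_LO  (*(volatile char *)0xDF04)
#define REU_REU_HI  (*(volatile char *)0xDF05)
#define REU_BANK    (*(volatile char *)0xDF06)
#define REU_LEN_LO  (*(volatile char *)0xDF07)
#define REU_LEN_HI  (*(volatile char *)0xDF08)

/* Fetch len bytes from a 24-bit REU address into a C64 buffer via DMA.
   (Pointers are 16-bit on the C64, so the cast below is safe there.) */
void reu_fetch(char *dst, unsigned long reu_addr, unsigned len)
{
    REU_C64_LO = (unsigned)dst & 0xFF;
    REU_C64_HI = (unsigned)dst >> 8;
    REU_REU_LO = reu_addr & 0xFF;
    REU_REU_HI = (reu_addr >> 8) & 0xFF;
    REU_BANK   = reu_addr >> 16;
    REU_LEN_LO = len & 0xFF;
    REU_LEN_HI = len >> 8;
    REU_COMMAND = 0x91;  /* execute immediately, REU -> C64 transfer */
}

/* One output element of W*x: stream the weight row in, dot it with x. */
float matvec_row(unsigned long row_addr, const float *x, int n)
{
    static float row[64];    /* small row-chunk buffer in C64 RAM */
    float acc = 0.0f;
    int i, j;
    for (i = 0; i < n; i += 64) {
        int chunk = (n - i < 64) ? n - i : 64;
        reu_fetch((char *)row, row_addr + (unsigned long)i * 4, chunk * 4);
        for (j = 0; j < chunk; j++)
            acc += row[j] * x[i + j];
    }
    return acc;
}

Keeping the chunk buffer small leaves room in the C64’s 64KB of main RAM for activations and the attention cache; the DMA transfers themselves are fast, so the float arithmetic dominates.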

What you get:

  • Low power use
  • Fully offline generation
  • Complete control over your prompts and results
  • No external APIs or surveillance

What you give up:

  • Speed. This isn’t for real-time applications.
  • Model size. You’re limited to models that fit within REU constraints (roughly 8MB max usable space).

A Sample Output

When prompted with the word Zoo, the model continues:

“Zoo was a little girl named Lily. She loved to play outside in the park. One day, she saw a big, red ball. She wanted to play with it, but she didn’t want to play with…”

The same prompt fed to the original llama2.c source (run with temperature=0.0) produces nearly identical output, confirming that the C64 implementation closely mirrors its modern counterpart, just far more slowly.

Technical Enhancements

To improve the mathematical accuracy of AI computations, Maciej implemented custom versions of sine, cosine, and exponentiation. These are more precise than oscar64’s built-ins and draw inspiration from algorithms found in the original C64 BASIC ROM.
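The classic ROM approach is range reduction followed by a short polynomial: fold the argument into one period, then evaluate a fixed series. A hedged sketch of that idea for sine, using a truncated Taylor series rather than the ROM’s optimized constants (the port’s actual coefficients and term count may differ):

#define PI     3.14159265f
#define TWO_PI 6.28318531f

/* sin(x) via range reduction to [-pi, pi] plus a truncated Taylor series. */
float my_sin(float x)
{
    float x2;
    /* Fold x into [-pi, pi] by subtracting whole periods. */
    while (x >  PI) x -= TWO_PI;
    while (x < -PI) x += TWO_PI;
    /* x - x^3/3! + x^5/5! - x^7/7! + x^9/9!, in Horner form */
    x2 = x * x;
    return x * (1.0f + x2 * (-1.0f/6 + x2 * (1.0f/120
             + x2 * (-1.0f/5040 + x2 * (1.0f/362880)))));
}

Production routines (including the BASIC ROM’s) use minimax-style coefficients instead of raw Taylor terms to spread the error evenly across the whole range, which is exactly the kind of precision tuning the port’s custom functions aim for.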
