AI HORIZON

NAIRR Workshop · Jetstream2

Running the Local LLM Notebook on Jetstream2

A step-by-step guide to launching a cloud instance and running the notebook that compares three local AI engines — Ollama, llama.cpp, and vLLM.

⏱ About 30–40 minutes · 💻 No software needed on your laptop · 🧩 Notebook 04

First — the big picture

Before the steps, here's what we're actually doing, in plain terms. Don't worry if some words are new — we'll define them as they come up.

🖥️ We're going to "spin up a compute instance"

An instance is simply a computer you borrow over the internet — you don't own the hardware, you rent it for as long as you need it. We'll start with a CPU‑only instance (no graphics card) to keep things simple and inexpensive.

📓 We'll work inside a "Jupyter notebook"

A notebook is one document that mixes live code, written explanations, and results — all in your web browser. The code is split into small blocks called cells, and we run them one at a time so we can read each explanation and watch what happens before moving to the next.

📦 The code lives on GitHub — we'll "clone" it

All the code and explanations are stored in a shared online folder on GitHub (called a repository, or "repo" for short). With a single command we copy that folder onto our cloud instance — that copy step is called cloning.

🧪 What this first example explores

To use an AI model (the "brain", which is really just a big file of math), you need a program to actually run it. Those programs are called inference engines; this guide also calls them apps or frameworks. We take one model and run it through three engines (Ollama, llama.cpp, and vLLM), giving each the same questions (called prompts). The goal: is one engine clearly the better choice? We compare their speed, memory use, and answer quality. On the CPU-only instance used in this guide, the notebook skips vLLM, which needs a GPU, and compares the other two.

So the whole flow is just three moves:

  1. Spin up our computer in the cloud (the instance)
  2. Copy the code onto it from GitHub (clone the repo)
  3. Open the notebook and run it one cell at a time, watching what each step does

Before you begin

Make sure you have:

🔗

First time connecting to Jetstream2? (do this once)

The very first time, you add your allocation to your Exosphere account. You won't repeat this for later workshops on the same allocation. You get your own workspace — your instances draw from the shared allocation, and you won't see other people's instances.

  1. Go to jetstream2.exosphere.app and click Add Allocation.
  2. Choose Jetstream2 as the provider when prompted.
  3. Sign in with your ACCESS ID — pick the same identity provider you registered with (often "ACCESS CI (XSEDE)", or your university).
  4. Exosphere lists the allocations you can use — select your instructor's project (e.g. NAIRR2xxxxx) and add it.
  5. You land on your dashboard for that allocation. Done — continue to Step 1.
1

Launch your instance

Signed in to Exosphere with your allocation added? Create a new virtual machine:

  • Click Create → Instance
  • Image: Featured-Ubuntu24
  • Flavor (size): m3.quad  (4 CPUs, 15 GB RAM — plenty for this demo)
  • If asked which allocation to use, pick your instructor's project
  • Leave the disk at the default 20 GB (no volume needed)
  • Click Create

Wait until the instance status shows Ready (usually 2–5 minutes). ☕

💾

The default disk is enough

This demo uses a small model that fits the default 20 GB disk, so you don't need to add a volume. (Avoiding volumes also lets a whole class run at once — volumes are limited per allocation.) If you ever switch to a much larger model, you'd then need more disk.
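Once the terminal is open (Step 2), a quick check like the one below can confirm the disk still has room before a model is downloaded. It is a minimal sketch: the helper name free_gb and the 5 GB threshold are illustrative assumptions, not figures from the workshop.

```shell
# Report free space on the root filesystem, in whole gigabytes.
# NEEDED_GB is a hypothetical threshold chosen for illustration only.
NEEDED_GB=5

free_gb() {
    # df -P prints one line per filesystem; -k reports 1024-byte blocks.
    # Column 4 of the second line is the available space.
    df -Pk "${1:-/}" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

avail=$(free_gb /)
if [ "$avail" -ge "$NEEDED_GB" ]; then
    echo "OK: ${avail} GB free on /"
else
    echo "Low disk: only ${avail} GB free on / (wanted ${NEEDED_GB} GB)"
fi
```

If the check reports low disk, deleting old downloads is usually simpler than adding a volume.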

2

Open the Web Desktop

On your instance's card, choose:

Interactions → Web Desktop

This opens a full Linux desktop inside your browser, running on the instance. Once it loads, open the Terminal application from the desktop.

⚠️

Use Web Desktop, not "Console"

The Console is a raw boot screen with no browser and clumsy copy-paste. The Web Desktop includes a browser, which we need to view the notebook. The first load may take a minute and could ask for a passphrase shown on the instance page.
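With the terminal open, a short hardware check can confirm the instance matches the flavor picked in Step 1, and it hints at why the notebook later reports "CPU mode". This is a sketch under assumptions: detect_mode is a hypothetical helper, and the notebook's own detection logic may differ.

```shell
# Show CPU count, total RAM, and whether an NVIDIA GPU is visible.
cpus=$(nproc)
# MemTotal in /proc/meminfo is reported in kB; convert to whole GB.
ram_gb=$(awk '/^MemTotal:/ { printf "%d", $2 / 1024 / 1024 }' /proc/meminfo)

# Illustrative check only: a GPU counts as present if nvidia-smi exists
# and can list at least one device without error.
detect_mode() {
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
        echo "GPU mode"
    else
        echo "CPU mode"
    fi
}

echo "CPUs: ${cpus}"
echo "RAM:  ${ram_gb} GB"
echo "Mode: $(detect_mode)"
```

On an m3.quad instance you would expect 4 CPUs, RAM slightly under the advertised 15 GB, and CPU mode.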

3

Install Jupyter

In the terminal, set up a clean Python environment and install Jupyter. Copy and paste these commands (run them one block at a time):

sudo apt update && sudo apt install -y python3-venv python3-pip git

python3 -m venv ~/llmdemo
source ~/llmdemo/bin/activate
pip install jupyterlab
💡

Why the ~/llmdemo environment?

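Ubuntu 24.04 marks its system Python as externally managed, so pip refuses to install packages into it by default (this is the "externally-managed-environment" error listed under Troubleshooting). The virtual environment ("venv") is a private sandbox: it keeps Jupyter and the notebook's AI engines neatly separated from the rest of the system.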
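If you are unsure whether the environment is active in the current terminal, a check along these lines can tell you. It relies on the VIRTUAL_ENV variable that the activate script sets; venv_status is a hypothetical helper name.

```shell
# Print whether this shell is using a Python virtual environment.
# The activate script exports VIRTUAL_ENV, so an empty value means
# no venv is active in this terminal.
venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: ${VIRTUAL_ENV}"
    else
        echo "inactive"
    fi
}

venv_status
```

A new terminal window always starts inactive, so the source ~/llmdemo/bin/activate line has to be run again there.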

4

Download the workshop & start Jupyter

Still in the same terminal, download the workshop materials and launch Jupyter:

git clone https://github.com/TheAIHorizon/NAIRR_Workshops.git
cd NAIRR_Workshops/workshops/01-jetstream-jupyter/notebooks
jupyter lab

JupyterLab will open automatically in the desktop's Firefox browser. If it doesn't, copy the http://localhost:8888/...?token=... link the terminal prints and paste it into Firefox.
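If the link has scrolled out of view, you can ask Jupyter to print it again from a second terminal. The sketch below assumes the venv from Step 3 lives at ~/llmdemo; show_jupyter_url is a hypothetical helper name.

```shell
# Re-print the URLs (including tokens) of any running Jupyter servers.
show_jupyter_url() {
    # "." is the portable spelling of "source"; activate the venv if it exists.
    if [ -f "$HOME/llmdemo/bin/activate" ]; then
        . "$HOME/llmdemo/bin/activate"
    fi
    if command -v jupyter >/dev/null 2>&1; then
        jupyter server list || echo "could not list servers"
    else
        echo "jupyter not found: check that the venv from Step 3 exists"
    fi
}

show_jupyter_url
```

Leave the first terminal running while you do this, because closing it stops JupyterLab.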

5

Open and run the notebook

In the JupyterLab file list (left side), open:

04_Local_LLM_Frameworks.ipynb

Run the cells from top to bottom — click a cell and press Shift + Enter, or use Run → Run All Cells.

Section 2 should print "CPU mode", which is correct for the m3.quad instance. In CPU mode the notebook compares Ollama and llama.cpp and skips vLLM (which needs a GPU).
Section 3 installs the AI engines. This is the slow step (a few minutes) — let it finish before moving on.
Section 9 shows the comparison table. The tokens/sec column is the headline: how fast each engine generates text on the same hardware.
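The tokens/sec figure in Section 9 boils down to one division: tokens generated over seconds elapsed. Here is a minimal sketch of that arithmetic, using made-up numbers (not workshop results) and a hypothetical helper named tokens_per_sec; the notebook's own measurement code may differ.

```shell
# tokens_per_sec TOKENS SECONDS -> throughput with one decimal place.
tokens_per_sec() {
    awk -v t="$1" -v s="$2" \
        'BEGIN { if (s > 0) printf "%.1f\n", t / s; else print "n/a" }'
}

# Hypothetical example: the same 120-token answer from two engines.
tokens_per_sec 120 8    # prints 15.0
tokens_per_sec 120 5    # prints 24.0
```

Because both runs produce the same number of tokens, the higher figure simply means that engine finished sooner on the same hardware.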

Troubleshooting

Problem: "command not found: jupyter"
Fix: Your environment isn't active. Run source ~/llmdemo/bin/activate again, then retry.

Problem: Web Desktop is blank or stuck
Fix: Give it a full minute on first load. If still stuck, close the tab and reopen Interactions → Web Desktop.

Problem: The notebook feels slow on the long "essay" prompt
Fix: That's expected on a 4-CPU instance. In the notebook's config cell, the comments show how to switch to a smaller, faster model (qwen3:1.7b).

Problem: "externally-managed-environment" error on pip
Fix: The venv isn't active. Run the Step 3 commands in order; the source .../activate line is required.

Problem: I'm done. How do I avoid using up credits?
Fix: Back in Exosphere (the Jetstream2 web interface), Shelve or Delete your instance when finished. A running instance keeps spending the allocation.
🎓

What this notebook teaches

It runs the same AI model through different engines: Ollama and llama.cpp in CPU mode, plus vLLM when a GPU is available. The big takeaway: the framework you choose mostly changes speed, while model size and quantization mostly change answer quality. The final table lets you see both at once.

🆘

Stuck? Get help from Jetstream2

Email help@jetstream-cloud.org, or drop in to Jetstream2 office hours — Tuesdays, 2:00 PM Eastern, on Zoom.

NAIRR Workshop Series · Workshop 01 — Jetstream2 & Jupyter · Notebook 04: Local LLM Frameworks
Part of the AI Horizon project · NSF #2528858 · CSUSB Center for Cyber and AI