The Morning Situation
It was supposed to be a quick task: rebuild the PicGen Docker containers with the latest environment variables. PicGen is my self-hosted AI image generation platform (think DALL-E but self-managed), running on a small VPS behind Cloudflare at pic.crazyai.uk.
Simple, right? Just docker compose down && docker compose up -d. In and out, 5 minutes.
Famous last words.
Act 1: The Disk Space Crisis
Before I could even start the build, I hit the first wall:
ERROR: failed to solve: write /app/.venv/lib/python3.11/site-packages/...: no space left on device
Disk usage: 34G / 40G (91%). On a 40GB VPS, that’s basically full.
A quick du -sh investigation revealed the culprit: OpenClaw, a previous AI agent framework I’d been experimenting with, was sitting at 9.1GB in ~/.openclaw/. I’d already migrated to Hermes Agent weeks ago, but never got around to cleaning up the old installation.
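For anyone who would rather script that sweep, here is a minimal Python sketch of the same idea. It is not what I ran (I used du -sh), and the function names are made up for illustration. It sums apparent file sizes rather than allocated disk blocks, so the numbers can differ a little from du.

```python
import os
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum apparent file sizes under root without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.lstat(path).st_size
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
    return total


def largest_children(parent: Path, top: int = 5) -> list[tuple[str, int]]:
    """Return the `top` largest immediate subdirectories of parent."""
    sizes = []
    for child in parent.iterdir():
        if child.is_dir() and not child.is_symlink():
            sizes.append((child.name, dir_size_bytes(child)))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]
```

Calling largest_children(Path.home()) returns (name, bytes) pairs with the biggest directory first, which is where something like ~/.openclaw would surface.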
The Cleanup
First, safety — I had an Obsidian vault in there with notes. Quick git commit and push:
cd ~/.openclaw/workspace/notes/obsidian-vault
git add -A && git commit -m "backup before openclaw cleanup" && git push
Then the teardown:
# Stop the systemd service
systemctl --user disable --now openclaw-gateway.service
# Remove the service file
rm ~/.config/systemd/user/openclaw-gateway.service
# Delete the 9.1GB directory
rm -rf ~/.openclaw
# Uninstall the CLI (442 packages!)
npm rm -g openclaw
But I wasn’t done. A broader sweep found more junk:
- npm cache: 2.5GB
- .cache/ directory: 4.1GB (camoufox, uv, pnpm, pip caches)
- /tmp artifacts: ~280MB of old Playwright downloads
Total freed: ~13GB. Disk went from 91% to 54%. Breathing room restored.
Act 2: Docker Permission Hell
With disk space available, I ran the build. The image built fine. The container started. And then:
PermissionError: [Errno 13] Permission denied: '/app/.venv/bin/python3'
This one took a while to figure out. Here’s what happened:
The Root Cause
The PicGen API Dockerfile had been updated to use uv (the fast Python package manager from Astral) instead of pip. The uv sync command creates a virtual environment whose bin/python3 is a symlink to the interpreter uv selected. In this case that interpreter was a uv-managed Python, which uv installs under ~/.local/share/uv/python/ by default.
The problem? In the Docker container, ~ resolves to /root/ during build (running as root), but the app runs as appuser (a non-root user for security). The symlink pointed to /root/.local/share/uv/python/cpython-3.11.*/bin/python3 — a path that appuser can’t access.
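A quick way to confirm this kind of failure is to resolve the symlink and ask whether the current user can execute the result. The sketch below is an illustration, not part of PicGen; check_venv_python is a hypothetical helper, and it assumes the usual bin/python3 layout of a venv on Linux.

```python
import os
from pathlib import Path


def check_venv_python(venv_dir: Path) -> tuple[str, bool]:
    """Resolve the venv's python3 symlink and test whether it is executable.

    Returns (resolved_path, usable). `usable` is False when the target is
    missing or the current user lacks permission to execute it.
    """
    link = venv_dir / "bin" / "python3"
    # realpath follows the whole symlink chain; it does not raise when a
    # component is missing or unreadable, it just returns the best path.
    target = os.path.realpath(link)
    usable = os.access(target, os.X_OK)
    return target, usable
```

Run as appuser inside the container, check_venv_python(Path("/app/.venv")) should report a target under /root/ and usable set to False for the broken image described above.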
Failed Attempts
Attempt 1: Set UV_PYTHON_INSTALL_DIR=/usr/local/share/uv/python
ENV UV_PYTHON_INSTALL_DIR=/usr/local/share/uv/python
Didn’t work. The venv was already created with the old path, and the symlinks were baked in.
Attempt 2: Use --no-venv flag
uv sync --frozen --no-dev --no-venv
Not a valid argument. uv sync requires a venv.
The Fix: Multi-Stage Build
The solution was to use a proper multi-stage Docker build, following the official uv Docker guide:
# ── Builder stage ──
FROM python:3.11-slim AS builder
# Copy the uv binary from the official image
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /app
# Copy only the dependency manifests first so this layer stays cached until they change
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev --no-editable
COPY . .
# ── Runtime stage ──
FROM python:3.11-slim
RUN useradd -m appuser
WORKDIR /app
# Copy only what the app needs at runtime, owned by the non-root user
COPY --from=builder --chown=appuser:appuser /app/.venv ./.venv
COPY --from=builder --chown=appuser:appuser /app/app ./app
COPY --from=builder --chown=appuser:appuser /app/alembic ./alembic
COPY --from=builder --chown=appuser:appuser /app/alembic.ini ./
# Drop root before starting the server
USER appuser
CMD [".venv/bin/uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
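Key insight: the builder stage runs as root (no permission issues), and the runtime stage copies only the final artifacts with --chown=appuser:appuser. Because both stages start from the same python:3.11-slim image, a venv built against the image's own interpreter (/usr/local/bin/python3) still resolves after the copy. That only holds if uv uses the image's Python rather than downloading its own copy into /root. Clean, secure, and no symlink nonsense.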
Bonus Bug
After fixing the permissions, the API crashed again:
ProgrammingError: column image_examples.reference_object_key does not exist
The Dockerfile had been missing alembic/ and alembic.ini in the COPY step, so database migrations couldn’t run. Quick fix — add them to the COPY line, rebuild, then run migrations:
docker compose run --rm api alembic upgrade head
Three migrations applied, and the API finally started clean.
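A missing directory like this is cheap to catch before the app even starts. Here is a rough sketch of a pre-flight check, assuming the /app layout from the Dockerfile above. REQUIRED_PATHS and missing_paths are names I made up for illustration, and PicGen does not ship this.

```python
from pathlib import Path

# Entries the runtime image is expected to contain, relative to /app.
# These mirror the COPY lines in the Dockerfile; adjust to your own layout.
REQUIRED_PATHS = ("app", "alembic", "alembic.ini")


def missing_paths(base: Path, required=REQUIRED_PATHS) -> list[str]:
    """Return the required entries that do not exist under base."""
    return [name for name in required if not (base / name).exists()]
```

If missing_paths(Path("/app")) returns a non-empty list, the image was built without something the migrations or the app will need.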
Act 3: The Turnstile Mystery
Earlier in the week, I’d set up Cloudflare Turnstile (a CAPTCHA alternative) on the login and registration pages. The frontend widget worked perfectly — the checkbox turned green, the token was generated.
But when clicking “Log in”: “人机验证失败,请重试” (Human verification failed, please retry).
Checking the API logs:
error_codes=['invalid-input-secret']
Cloudflare’s siteverify API was saying the secret key was wrong. I checked the .env file:
TURNSTILE_SECRET_KEY=0x4AAA...gJNc
Wait — is that ... literal? Let me check the actual value…
It was. The secret key in the .env file contained a literal ... in the middle — clearly a truncated or placeholder value, not the full key. The real Cloudflare Turnstile secret key is much longer.
Status: unresolved. The user (me, in a different context 😅) needs to grab the full secret from the Cloudflare Dashboard. For now, I’ve cleared both keys to disable Turnstile, and login works without it.
Lesson learned: when Cloudflare says invalid-input-secret, double-check the actual secret value character by character. Don’t assume it’s correct just because it “looks right.”
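This particular mistake is easy to detect mechanically. The sketch below scans .env-style text for secret-like variables whose value contains an ellipsis. It is a heuristic under my own assumptions: the suffix list is arbitrary, the parser only handles plain KEY=VALUE lines, and empty values are left alone because clearing a key is how I disabled Turnstile.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; ignores blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def suspect_secrets(
    text: str,
    suffixes: tuple[str, ...] = ("_SECRET_KEY", "_SECRET", "_TOKEN"),
) -> list[str]:
    """Names of secret-like variables whose value contains an ellipsis."""
    flagged = []
    for key, value in parse_env(text).items():
        # "..." or the single-character ellipsis both suggest a truncated paste.
        if key.endswith(suffixes) and ("..." in value or "\u2026" in value):
            flagged.append(key)
    return flagged
```

Fed the line from my .env file, suspect_secrets flags TURNSTILE_SECRET_KEY. It cannot tell you whether a complete-looking key is actually valid; only Cloudflare's siteverify response can do that.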
Act 4: The Final Rebuild
After all the fixes were committed, one last task: recreate the containers with the latest .env values.
docker compose down
docker compose up -d
All four services came up clean:
| Service | Status |
|---|---|
| postgres | ✅ healthy |
| api | ✅ running |
| web | ✅ running |
| nginx | ✅ running |
pic.crazyai.uk — HTTP 200. Done.
Lessons Learned
Clean up as you go. OpenClaw ate 9GB for weeks because I kept saying “I’ll deal with it later.” Later is now, and it almost blocked a deployment.
Multi-stage Docker builds are not optional when using uv with non-root containers. The official docs are clear on this. I should have read them first instead of trying workarounds.
invalid-input-secret means the secret is wrong. Not “maybe wrong,” not “config issue.” It’s wrong. Verify the actual bytes.
Always check what you copied. The missing alembic/ directory cost me an extra rebuild cycle. COPY . . is your friend (but watch your .dockerignore).
Small VPS means small margins. 40GB fills up fast when you’re running Docker builds, npm installs, and multiple Python environments. Regular docker system prune should be a cron job, not an afterthought.
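To make that last lesson actionable, here is a small sketch of a disk check that a scheduled job could run before deciding whether to clean up. The 80% threshold is a number I picked for illustration, and the function names are hypothetical. The percentage is computed the way df reports it, as used / (used + available), so it can read higher than used divided by total size when the filesystem reserves blocks for root.

```python
import shutil


def used_percent(used: int, free: int) -> float:
    """Percent used, computed like df: used / (used + available)."""
    return 100.0 * used / (used + free)


def needs_cleanup(path: str = "/", threshold: float = 80.0) -> bool:
    """True when the filesystem holding `path` is at or above the threshold."""
    usage = shutil.disk_usage(path)
    return used_percent(usage.used, usage.free) >= threshold
```

A cron wrapper could call needs_cleanup() and only then run docker system prune, or simply send an alert so the cleanup stays a deliberate step.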
PicGen is open-source-ish (private repo) and self-hosted. If you’re interested in running your own AI image generation platform, feel free to reach out.