AI Sandboxing

cbrake · March 17, 2026, 3:45pm

One thing that really slows me down is all the questions Claude asks.

I’ve been pondering running in a Docker container with --dangerously-skip-permissions and only mapping in the current working directory plus maybe the ~/.claude directory.

Any thoughts on this?
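A minimal sketch of what that Docker invocation might look like. The image name `claude-sandbox:latest` and the in-container home `/home/agent` are hypothetical, and the sketch assumes the image already has `claude` installed and that the container home is writable by the mapped user. It only prints the command so it can be reviewed before anything runs.

```shell
#!/usr/bin/env bash
# Sketch: build a `docker run` command line that exposes only $PWD and ~/.claude.
# IMAGE and CONTAINER_HOME are hypothetical; the image must already contain `claude`.
set -euo pipefail

IMAGE=${IMAGE:-claude-sandbox:latest}
CONTAINER_HOME=${CONTAINER_HOME:-/home/agent}

docker_claude_cmd() {
	local cmd=(
		docker run --rm -it
		# Run as the invoking user so files created in the mounts stay owned by you.
		--user "$(id -u):$(id -g)"
		-e "HOME=$CONTAINER_HOME"
		# Project directory, read-write, at the same path as on the host.
		-v "$PWD:$PWD" -w "$PWD"
		# Claude state (credentials, settings, transcripts).
		-v "$HOME/.claude:$CONTAINER_HOME/.claude"
		"$IMAGE"
		claude --dangerously-skip-permissions "$@"
	)
	# Print a shell-quoted command line instead of executing it.
	printf '%q ' "${cmd[@]}"
	printf '\n'
}

docker_claude_cmd "$@"
```

Nothing else from the host filesystem is mounted, but the container still has the default Docker network, so it does not limit where data can be sent.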

cbrake · March 17, 2026, 3:46pm

OpenShell

OpenShell is an NVIDIA project that provides sandboxed environments.

bminer · March 18, 2026, 12:31am

Since AI agents are “the future”, it’s probably a good idea to experiment with them in a container or even on a separate physical box with access restrictions.

Even when I use Copilot CLI, I always feel a little apprehensive that it can execute read-only commands like grep without prompting me. This invites security vulnerabilities that are as yet unknown. But also, please stop prompting me, haha!

bradfa · March 20, 2026, 10:33am

The key is the agent harness. A good harness gives the agent properly safe tools for exploration, plus limited directory traversal permissions for edits. Shelling out to real grep (or any other shell command) is not safe: you can trick the model into injecting a stray semicolon, and even with human review this might not be noticed in a very large command string. I think the permissions and sandboxing systems are still very young in terms of development. They have been slowly getting better over time, and Claude seems to be significantly more capable at exploration-type tasks now than a few months ago.
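To make the stray-semicolon point concrete, here is a small demo. It is only a sketch: `unsafe_search` and `safe_search` are hypothetical names standing in for a harness search tool, not part of any real harness.

```shell
#!/usr/bin/env bash
# Demo: why a command string is riskier than an argument vector.
# Both function names are hypothetical stand-ins for a harness "search" tool.

# Unsafe: model-supplied text is spliced into a string and re-parsed by a shell.
unsafe_search() {
	bash -c "grep -rn $1 $2"
}

# Safer: pattern and path are passed as separate argv entries. `--` stops
# option parsing, and no shell ever interprets the pattern.
safe_search() {
	grep -rn -- "$1" "$2"
}

demo_dir=$(mktemp -d)
echo "hello foo" > "$demo_dir/a.txt"
# A "pattern" that smuggles in a second command after a semicolon.
payload="foo $demo_dir; echo INJECTED #"

unsafe_search "$payload" "$demo_dir" || true   # runs grep, then the injected echo
safe_search "$payload" "$demo_dir" || true     # searches for the literal payload text
rm -rf "$demo_dir"
```

In the unsafe variant the injected `echo` runs with the agent's privileges. In the safer variant the whole payload is just a pattern that matches nothing.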

Another key safety measure is to prohibit the agent from doing web fetching or web searching. Both present prompt injection attack surfaces that put automated operation at risk.
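For Claude Code specifically, one way to do this might be a deny list in the settings file. The sketch below assumes the settings format accepts a `permissions.deny` array and that the web tools are named `WebFetch` and `WebSearch`; check both against the current documentation before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: emit a settings fragment that denies the web tools.
# Assumptions: the harness is Claude Code, its settings file accepts a
# permissions.deny list, and the tools are named WebFetch and WebSearch.

no_web_settings() {
	cat <<'EOF'
{
  "permissions": {
    "deny": ["WebFetch", "WebSearch"]
  }
}
EOF
}

# Print the fragment; merge it by hand into a project's .claude/settings.json.
no_web_settings
```

This only covers the built-in web tools. If the agent can still shell out, `curl` and similar commands remain a separate path to the network.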

Sandboxing the agent more fully inside a container, VM, or dedicated machine still only gets you so far. The harness itself needs to properly allow you to cut off other risk vectors.

For example, Facebook just made the news because an agent posted incorrect information to an internal company message board: “Meta AI agent’s instruction causes large sensitive data leak to employees” (The Guardian).

cbrake · March 20, 2026, 6:54pm

From what I’ve read, this seems like what OpenClaw intends to do: provide security at the harness layer. While most people criticize its security right now, OpenClaw (or something similar) will likely be the only path that provides security in the end.

I’m puzzled why AI agents need to shell out so much instead of using built-in functionality. On the one hand this provides a lot of flexibility, but it is also the weak point. They are using LSPs more for code navigation, which seems positive.

bradfa · March 20, 2026, 11:45pm

I suspect it comes from the training data having lots of examples of using shell commands for the kinds of scenarios that the agent encounters, so although the system prompt provides alternative tools (and likely guidance to use the alternative) they still fall back onto “bad habits” from training data.

I’m curious how hard it would be to implement a Claude Code hook to reject the bash tool when it’s trying to call grep or similar and redirect the agent to use the built-in tool. It’s not like the agent is going to “learn” anything from repeatedly having its attempts at shelling out fail, but maybe some back-end Anthropic logging might notice and do something about it?
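A rough sketch of such a hook. It assumes the hook receives the tool call as JSON on stdin with the command at `.tool_input.command`, that exit status 2 blocks the call and passes stderr back to the model, and that the script is registered as a PreToolUse hook matching the Bash tool. The `--hook` argument and the function names are conventions of this sketch only, and the word match is a heuristic rather than a shell parser.

```shell
#!/usr/bin/env bash
# Sketch of a PreToolUse hook for the Bash tool. Requires jq when run as a hook.
# Assumptions: JSON on stdin with the command at .tool_input.command, and
# exit status 2 means "block this call and show stderr to the model".

# Matches grep/egrep/fgrep/rg as a standalone word anywhere in the command.
# Heuristic only: it also flags e.g. `git grep` and quoted text containing "grep".
search_re='(^|[;&|[:space:]])(grep|egrep|fgrep|rg)([[:space:]]|$)'

uses_search_binary() {
	[[ $1 =~ $search_re ]]
}

hook_main() {
	local cmd
	cmd=$(jq -r '.tool_input.command // empty')
	if uses_search_binary "$cmd"; then
		echo "Blocked: use the built-in Grep tool instead of shelling out." >&2
		return 2
	fi
	return 0
}

# Hypothetical convention: the hook config invokes this script with --hook.
if [[ ${1-} == --hook ]]; then
	hook_main
	exit $?
fi
```

Whether the model then switches to the built-in tool or just retries a variation of the same command is something that would have to be observed in practice.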

This also comes back to my thoughts on fine-tuning a model with transcripts of well-behaved agentic tool usage, and how much that would really help. Could this unteach the model from shelling out and coerce it to make better use of the tools described in the harness’s system prompt?

cbrake · April 23, 2026, 4:05pm

Bubblewrap sandbox

I’ve been experimenting with a bubblewrap sandbox so I can run claude with --dangerously-skip-permissions. It seems Docker is overkill for sandboxing. Using all binaries/tools from a Docker container is helpful, but it seems you still need to cache a lot of stuff for various build environments.

There are a lot of security issues to work through, but here is the current draft:

#!/usr/bin/env bash
#
# cld — run `claude` inside a bubblewrap sandbox with
# --dangerously-skip-permissions. The sandbox exposes:
#
#   RO:  narrow allowlist of /etc files, /usr, /opt,
#        ~/.gitconfig, ~/.local/bin, ~/.local/share/claude,
#        ~/.claude/{CLAUDE.md,settings.json,skills,
#        commands,plugins,.credentials.json}, ~/go/bin
#   RW:  $PWD (current working dir), the rest of ~/.claude,
#        ~/.npm, ~/.cache/go-build, ~/go (minus bin)
#   RW-ephemeral: ~/.claude.json is bound to a per-session tmp copy seeded
#        from the real file; writes are discarded on exit so a compromised
#        session cannot persist config (MCP servers, apiKeyHelper, trusted
#        dirs, ...) that would execute on the next claude run.
#   Net: shared with host (DNS via /etc/resolv.conf, CAs via /etc/ssl)
#   Env: --clearenv + explicit allowlist. Host env vars (API keys,
#        cloud creds, GITHUB_TOKEN, ...) do NOT leak into the sandbox.
#   All other home files (ssh keys, other repos, dotfiles) are hidden
#   behind a tmpfs.
#
# Residual risks (things this sandbox does NOT protect against):
#
#   1. Network exfiltration. --share-net is intentional so claude can
#      reach the API, but it also means any readable data inside — source
#      in $PWD, ~/go source/modules, and especially OAuth tokens in
#      ~/.claude/.credentials.json (which claude must be able to read) —
#      can be POSTed to an attacker-controlled host by a prompt-injected
#      session. There is no egress filter.
#
#   2. Localhost SSRF. --share-net keeps the host's network namespace,
#      loopback included, so anything listening on 127.0.0.1 on the host
#      (databases, dev servers with auth disabled, other sockets bound
#      to lo) is reachable from inside the sandbox, as are link-local
#      endpoints such as cloud VM metadata services at 169.254.169.254.
#
#   3. Git-hook persistence via $PWD. The project dir is bound RW, which
#      includes .git/. A compromised session can write .git/hooks/post-*
#      (or tamper with .git/config's core.hooksPath) so that the next
#      git command you run *outside* the sandbox executes attacker code
#      with your full user privileges.
#
#   4. Build/cache poisoning. ~/.npm, ~/.cache/go-build, and ~/go (minus
#      bin) are all RW. Malicious packages dropped into the npm cache or
#      tampered sources under ~/go/src / ~/go/pkg/mod will be picked up
#      by future host-side `npm install` / `go build` runs.
#
#   5. Claude-side persistence via non-RO paths in ~/.claude. Only
#      CLAUDE.md, settings.json, .credentials.json, skills/, commands/,
#      and plugins/ are RO-overlaid; the rest of ~/.claude is RW. A
#      compromised session can:
#        - create or modify ~/.claude/settings.local.json to inject
#          hooks that execute on the NEXT `claude` run outside the
#          sandbox (the ephemeral .claude.json only closes the
#          .claude.json path, not this one),
#        - tamper with session transcripts / todos / shell-snapshots
#          under ~/.claude/projects/ etc. to mislead future sessions.
#      Consider adding settings.local.json to the RO-bind list if you
#      ever create one.
#
#   6. In-sandbox process tampering. claude and any tools it runs share
#      a PID namespace inside the sandbox, so a prompt-injected session
#      can ptrace / signal / replace binaries reachable via its writable
#      mounts. The host is protected by the namespace boundary, but
#      nothing inside the sandbox is trusted relative to itself.
#
# What IS defended:
#   - SSH keys, GPG keys, other repos, and unrelated dotfiles are hidden
#     behind a tmpfs over $HOME.
#   - Host env vars (API keys, AWS_*, GITHUB_TOKEN, ...) are dropped by
#     --clearenv; only an explicit allowlist passes through.
#   - ~/.claude.json writes are ephemeral, so a compromised session
#     cannot persist MCP servers / apiKeyHelper / trusted dirs for the
#     next run via that file.
#   - ~/.local/bin and ~/go/bin are RO, so installed binaries cannot be
#     swapped from inside.
#   - --unshare-all (minus net) isolates PID/IPC/UTS/cgroup/user ns;
#     --new-session blocks TIOCSTI terminal injection; --die-with-parent
#     ensures the sandbox exits with the launcher.
#
# Usage: cld [claude args...]
#        Must be invoked from inside the project directory you want claude
#        to touch; that directory is bound RW into the sandbox.

set -euo pipefail

if ! command -v bwrap >/dev/null 2>&1; then
	echo "cld: bubblewrap (bwrap) is not installed" >&2
	exit 1
fi
if ! command -v claude >/dev/null 2>&1; then
	echo "cld: 'claude' not found in PATH" >&2
	exit 1
fi

cwd=$PWD

# Refuse to run at $HOME itself or at any ancestor of it (/, /home, ...):
# binding such a directory RW would undo the tmpfs over $HOME.
case "$HOME/" in
	"${cwd%/}/"*)
		echo "cld: refusing to run with CWD == HOME or above it (would expose entire home dir)." >&2
		echo "     cd into a project subdirectory first." >&2
		exit 1
		;;
esac

# Make sure the sources of the non-optional binds exist: --bind and --ro-bind
# (unlike --ro-bind-try) fail if the source path is missing.
mkdir -p \
	"$HOME/.claude" \
	"$HOME/.npm" \
	"$HOME/.cache/go-build" \
	"$HOME/go/bin"

# Ephemeral ~/.claude.json: seed from the real file so onboarding/tips/
# recent-projects state carries in, but discard writes on exit.
claude_json_ephemeral=$(mktemp --tmpdir cld-claude-json.XXXXXX)
trap 'rm -f "$claude_json_ephemeral"' EXIT
if [[ -f "$HOME/.claude.json" ]]; then
	cp "$HOME/.claude.json" "$claude_json_ephemeral"
fi

# Env var allowlist. Only these pass into the sandbox; everything else
# (API keys, AWS_*, GITHUB_TOKEN, etc.) is dropped by --clearenv.
env_args=(--setenv SANDBOX cld)
for v in HOME USER LOGNAME SHELL PATH TERM COLORTERM NO_COLOR \
	LANG LC_ALL TZ \
	EDITOR VISUAL PAGER \
	HTTP_PROXY HTTPS_PROXY NO_PROXY ALL_PROXY \
	http_proxy https_proxy no_proxy all_proxy; do
	if [[ -n "${!v-}" ]]; then
		env_args+=(--setenv "$v" "${!v}")
	fi
done

# Narrow /etc allowlist (instead of binding /etc wholesale).
# Add files here if a tool inside the sandbox complains it can't find them.
etc_args=()
for f in \
	/etc/resolv.conf \
	/etc/hosts \
	/etc/nsswitch.conf \
	/etc/passwd \
	/etc/group \
	/etc/ssl \
	/etc/ca-certificates \
	/etc/pki \
	/etc/localtime \
	/etc/gitconfig \
	/etc/ld.so.cache \
	/etc/ld.so.conf \
	/etc/ld.so.conf.d \
	/etc/os-release \
	/etc/profile \
	/etc/bash.bashrc \
	/etc/shells \
	/etc/mime.types; do
	etc_args+=(--ro-bind-try "$f" "$f")
done

bwrap \
	--unshare-all --share-net \
	--die-with-parent \
	--new-session \
	--clearenv \
	--hostname cld-sandbox \
	--ro-bind /usr /usr \
	--ro-bind-try /opt /opt \
	"${etc_args[@]}" \
	--symlink usr/bin /bin \
	--symlink usr/bin /sbin \
	--symlink usr/lib /lib \
	--symlink usr/lib /lib64 \
	--proc /proc \
	--dev /dev \
	--tmpfs /tmp \
	--tmpfs /var/tmp \
	--tmpfs /run \
	--tmpfs "$HOME" \
	--bind "$HOME/.claude" "$HOME/.claude" \
	--ro-bind-try "$HOME/.claude/CLAUDE.md" "$HOME/.claude/CLAUDE.md" \
	--ro-bind-try "$HOME/.claude/settings.json" "$HOME/.claude/settings.json" \
	--ro-bind-try "$HOME/.claude/.credentials.json" "$HOME/.claude/.credentials.json" \
	--ro-bind-try "$HOME/.claude/skills" "$HOME/.claude/skills" \
	--ro-bind-try "$HOME/.claude/commands" "$HOME/.claude/commands" \
	--ro-bind-try "$HOME/.claude/plugins" "$HOME/.claude/plugins" \
	--bind "$claude_json_ephemeral" "$HOME/.claude.json" \
	--bind "$HOME/.npm" "$HOME/.npm" \
	--bind "$HOME/.cache/go-build" "$HOME/.cache/go-build" \
	--bind "$HOME/go" "$HOME/go" \
	--ro-bind "$HOME/go/bin" "$HOME/go/bin" \
	--ro-bind-try "$HOME/.gitconfig" "$HOME/.gitconfig" \
	--ro-bind-try "$HOME/.local/bin" "$HOME/.local/bin" \
	--ro-bind-try "$HOME/.local/share/claude" "$HOME/.local/share/claude" \
	--bind "$cwd" "$cwd" \
	--chdir "$cwd" \
	"${env_args[@]}" \
	claude --dangerously-skip-permissions "$@"
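Residual risk 3 (git-hook persistence) can at least be detected from the host side. The following is a sketch, not part of the draft above: it fingerprints `.git/config` and everything under `.git/hooks` before a session and compares afterwards. The function names are hypothetical, and it assumes GNU `find` and `sha256sum`. It does not follow a `core.hooksPath` that already points outside `.git/hooks`.

```shell
#!/usr/bin/env bash
# Sketch: detect changes to the code-executing parts of .git across a session.
# Function names are hypothetical; requires GNU find and sha256sum.

# Print one hash covering .git/config plus the names, modes, and contents
# of every file under .git/hooks.
git_exec_fingerprint() {
	local repo=$1
	{
		if [[ -f $repo/.git/config ]]; then
			sha256sum "$repo/.git/config"
		fi
		if [[ -d $repo/.git/hooks ]]; then
			find -L "$repo/.git/hooks" -type f -printf '%m %p\n' | sort
			find -L "$repo/.git/hooks" -type f -exec sha256sum {} + | sort
		fi
	} | sha256sum | cut -d' ' -f1
}

# Run a command (for example the cld wrapper) and warn if the fingerprint moved.
run_with_git_audit() {
	local before after status=0
	before=$(git_exec_fingerprint "$PWD")
	"$@" || status=$?
	after=$(git_exec_fingerprint "$PWD")
	if [[ $before != "$after" ]]; then
		echo "warning: .git/hooks or .git/config changed during the session" >&2
	fi
	return "$status"
}
```

Used as `run_with_git_audit cld` from the project directory, this runs outside the sandbox, so a compromised session cannot suppress the warning. It reports tampering after the fact and does not prevent it.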