Codex CLI v0.106.0: installs get simpler, threads get realtime, and guardrails get sharper

By Edoardo L'Astorina in Development
Release notes are often short. Diffs are often huge.

This is the missing middle: what actually changed in v0.106.0, in plain terms, and who should care.

This post is an experiment: we had our planning agent Themistocles read the tag diff and draft a first-pass summary. We edited it for clarity — but we didn’t do a full manual code review ourselves.

If you need ground truth, jump to the diff in Sources. If you spot anything off, tell us — that feedback is part of the experiment.

Given the interest our original Reddit post generated, we’re turning this into a series.

Scope: 48 commits, 261 files changed.

This post is for developers and tool builders who:

  • use Codex CLI day-to-day, or
  • embed / integrate Codex via the app-server, or
  • care about operational safety (limits, sandboxing, audit trails).

TL;DR (what you’ll notice):

  • If you use an API key, gpt-5.3-codex is now available in the model list.
  • Installing is simpler and more repeatable (there’s an installer script as a release asset, and it installs rg too).
  • Integrators get a cleaner realtime story (thread-scoped realtime + a way to unsubscribe/unload threads).
  • Memory behavior is more configurable and less stale (new memories.max_unused_days + better forgetting/selection).
  • Safety/ops are more explicit: a hard user input cap with a clear error contract, plus sandbox and audit logging improvements.

1. Should you upgrade?

If you’ve been reading “what changed?” and still don’t know whether to upgrade, start here.

Upgrade to v0.106.0 if you:

  • use an API key and want gpt-5.3-codex,
  • onboard Codex to new machines often (or help teammates do it),
  • build anything on top of app-server (UIs, internal tooling, automation),
  • send large chunks of text to Codex (or accept user paste),
  • care about sandbox correctness and audit logs.

You can probably wait if you:

  • have a stable setup and you’re not using app-server integrations,
  • don’t rely on experimental features (realtime, JS REPL),
  • aren’t hitting limits or sandbox edge cases today.

If you’re shipping Codex in a product: treat this like any runtime upgrade. Test it in one workflow first, then roll it out.

2. Installing Codex is less of a scavenger hunt (macOS/Linux)

What changed: there’s now an installer script published as a GitHub release asset (built from scripts/install/install.sh).

What it does (plain English):

  • installs codex,
  • installs rg (ripgrep), because Codex expects it,
  • puts the binaries somewhere sensible (~/.local/bin by default, or CODEX_INSTALL_DIR),
  • and tries to make sure that directory is on your PATH (by editing a shell profile).

Why it matters: onboarding becomes a single, repeatable step instead of “install Codex… then also install ripgrep… then fix PATH… then try again”.

Windows note: the release points Windows users to install.ps1.

3. Two installer details that matter if you automate installs

If you’re using the installer in CI/devcontainers or a managed environment, there are two behaviors to know about:

  1. “latest” is resolved via the GitHub Releases API.
    If you don’t pass a version, it fetches the “latest release” metadata, reads tag_name, and installs that version (link in Sources).

  2. It installs from a platform-specific npm tarball asset.
    The asset includes the binaries for your platform (including rg), and its filename is derived from the npm tag and the resolved version.

Practical advice: if you need deterministic installs, don’t rely on “latest”. Pass an explicit version and pin it.

4. Realtime is now thread-scoped (app-server v2, experimental)

If you integrate Codex, this is the headline change.

Plain English: realtime now has its own lifecycle per thread (think “conversation”), not just per turn (think “one request/response”).

New JSON-RPC requests:

  • thread/realtime/start
  • thread/realtime/appendAudio
  • thread/realtime/appendText
  • thread/realtime/stop

Notifications you’ll receive during a realtime session:

  • thread/realtime/started — includes { threadId, sessionId }
  • thread/realtime/itemAdded — non-audio items (forwarded as raw JSON while upstream schemas stabilize)
  • thread/realtime/outputAudio/delta — streamed audio chunks
  • thread/realtime/error
  • thread/realtime/closed

Why you should care: if you’re building a UI, thread-scoped realtime makes it easier to model “this thread is live right now” without mixing it up with your normal turn state.
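The lifecycle above can be sketched as a sequence of JSON-RPC 2.0 requests. The envelope shape is standard JSON-RPC; the parameter names (`threadId`, `text`) and the thread id are illustrative assumptions, not the exact app-server schema:

```python
import json


def rpc(method: str, params: dict, req_id: int) -> dict:
    """Build a standard JSON-RPC 2.0 request envelope."""
    return {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}


# Hypothetical thread id; in practice you get it when the thread is created.
thread_id = "thr_example"

# The thread-scoped realtime lifecycle, in order. Between start and stop,
# you would receive the thread/realtime/* notifications listed above.
lifecycle = [
    rpc("thread/realtime/start", {"threadId": thread_id}, 1),
    rpc("thread/realtime/appendText", {"threadId": thread_id, "text": "hello"}, 2),
    rpc("thread/realtime/stop", {"threadId": thread_id}, 3),
]

for msg in lifecycle:
    print(json.dumps(msg))
```

Note that start/stop bracket the session per thread, so your UI can key its “live” indicator on the thread id rather than on individual turns.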

5. thread/unsubscribe: stop listening, free resources

thread/unsubscribe removes the current connection’s subscription to thread events.

The response status can be:

  • unsubscribed
  • notSubscribed
  • notLoaded

If your connection was the last subscriber, the server unloads the thread and emits:

  • thread/status/changed (with status notLoaded)
  • thread/closed

Why this matters: it gives integrators a clean “we’re done here” path without forcing an archive/cleanup policy decision in the same moment.

6. Default mode can use request_user_input (behind a flag)

Codex can now gate request_user_input in Default collaboration mode:

  • Feature key: default_mode_request_user_input
  • Wired via CollaborationModesConfig { default_mode_request_user_input: bool }
  • Default is disabled (stage: UnderDevelopment)

Plain English: if you’re building an agentic UI, you can choose to let Default mode ask 1–3 tight questions via a UI control, instead of forcing Plan mode or writing a big block of follow-up questions.

7. js_repl is explicitly experimental (and checks Node versions)

  • Feature: js_repl
  • Experimental menu name: JavaScript REPL
  • Requires Node >= v22.22.0
  • Adds startup compatibility checks with user-visible warnings

If you don’t use the JS REPL, you can ignore this section. If you do: the big win is that “it didn’t work” becomes “it told me why”.

8. Memory changes: a clearer “freshness” knob

This release adds:

  • diff-based forgetting
  • usage-aware memory selection
  • a concrete config lever: memories.max_unused_days (default 30, clamped 0..365)

Plain English: if you run with memory enabled, you can bias it toward recent, relevant memories instead of carrying stale context forever.
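As a config sketch, assuming the dotted key from the release notes maps onto a `[memories]` table in config.toml (we haven’t verified the exact table name or file location):

```toml
[memories]
# Forget memories that haven't been used in the last N days.
# Default is 30; values are clamped to the 0..365 range.
max_unused_days = 14
```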

9. A hard input cap with an explicit error contract

Codex now enforces a shared user text input limit across the TUI and app-server:

  • MAX_USER_INPUT_TEXT_CHARS = 1 << 20, i.e. 1,048,576 characters (~1 MB)

When you exceed it, the server returns a JSON-RPC invalid params error:

  • numeric code: -32602
  • data includes:
    • input_error_code: "input_too_large"
    • max_chars
    • actual_chars

Why you should care: integrators can preflight input size and return a friendly error locally instead of failing mid-request.
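A preflight check like the following lets a client fail fast with the same fields the server would return. The constant and the error fields are taken from the release notes; the function itself is an illustrative sketch, not Codex code:

```python
# The cap from the release notes: 1 << 20 characters (~1 MB of text).
MAX_USER_INPUT_TEXT_CHARS = 1 << 20


def preflight_user_input(text: str) -> None:
    """Reject oversized input locally, mirroring the server's error contract.

    The dict below follows the JSON-RPC invalid-params error shape
    described in the release notes (code -32602 plus a data payload).
    """
    if len(text) > MAX_USER_INPUT_TEXT_CHARS:
        error = {
            "code": -32602,  # JSON-RPC "invalid params"
            "data": {
                "input_error_code": "input_too_large",
                "max_chars": MAX_USER_INPUT_TEXT_CHARS,
                "actual_chars": len(text),
            },
        }
        raise ValueError(error)


preflight_user_input("short message")                      # passes silently
# preflight_user_input("x" * (MAX_USER_INPUT_TEXT_CHARS + 1))  # would raise
```

Checking `len(text)` counts Unicode characters, which matches a character-based cap; if the server counts differently (e.g. bytes), treat this as a conservative approximation.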

10. Reliability + safety: small changes that reduce “weird incidents”

The details here are less flashy, but they’re the stuff that keeps tools reliable when you run them in real environments:

  • Realtime websocket reliability: retry timeout-related HTTP 400 handshake failures; prefer WebSocket v2 when the model supports it.
  • Shell safety: a zsh-fork execution path that could drop sandbox wrappers (and bypass filesystem restrictions) is fixed.
  • TUI privacy polish: local file-link rendering hides absolute paths while still showing line/column.
  • Network proxy audit logging: embedded codex-network-proxy emits structured OTEL-compatible events (codex.network_proxy.policy_decision) with decision metadata, intentionally avoiding full URL/path/query logging.

The common theme: clearer limits, cleaner failure modes, and better audit trails.


11. Try it now (quick checklist)

If you want a fast sanity pass after upgrading:

  1. Install via the release installer and confirm both codex and rg are present on PATH.
  2. In the model picker / model/list, confirm gpt-5.3-codex is visible for API-key usage (and that the model(s) you expect are there).
  3. If you integrate app-server: start a thread, run thread/realtime/start, then appendText, then stop, then thread/unsubscribe; verify you see thread/closed.
  4. Paste an intentionally oversized message and confirm you get -32602 with input_error_code: "input_too_large" (and that your client handles it cleanly).
  5. If you previously relied on toggling “steer”: update assumptions — the Feature::Steer flag has been removed and the behavior is now effectively always on.

Conclusion

If you were struggling to tell what v0.106.0 actually includes: it’s mostly “make install easier”, “make realtime more usable for integrators”, and “make safety/limits more explicit”.

If this format helps, we’ll keep doing it for each release. Tell us what was still confusing — that feedback is the point of the series.

Sources

Edoardo L'Astorina
Edoardo L'Astorina has 15 years of experience in software development. Edoardo played a major role in the new Transport for London site and led the development of the new Royal Opera House site. He has developed sites and apps for Intuit, Stint, JPC, The Crocodile and Miura. Edoardo started Blu Frame to help companies develop sites that stand out, load fast and are easy for users to access. He is passionate about risotto, Terrence Malick movies, Oasis songs and rowing.