gstack Has 69K Stars — Is Garry Tan's Claude Code Setup Actually Worth Using?

69,000 stars on a repo created in March 2026 that has no formal releases. That number alone is enough to make any developer stop scrolling. Whether the star count reflects genuine utility or the gravitational pull of a high-profile founder's personal brand is exactly the question worth asking before you run that install command.

I spent time going through the repo, the commits, the structure, and the actual skill files. Here's my honest take.

What It Actually Does

At its core, gstack is a curated collection of Markdown-based slash commands for Claude Code. When you install it, you get 23 slash-command files that drop into your ~/.claude/skills/ directory. Each one is a structured prompt that tells Claude to behave like a specific role: a CEO questioning your product decisions, an engineering manager locking down architecture, a QA lead spinning up a headless browser against your staging URL, a security officer running OWASP and STRIDE audits.

There's also a headless browser component — a compiled Bun binary called browse — that lets Claude actually navigate web pages, interact with elements, and capture screenshots during QA runs. That part is more than just prompts; it's real infrastructure.

The mental model is: instead of typing freeform instructions at Claude Code and getting inconsistent results, you invoke a named role with a defined methodology. /review runs a code review with specific heuristics. /ship handles the PR workflow. /retro does a weekly engineering retrospective. /cso is a security officer that audits your codebase.

The install flow is unusual — you paste a natural language instruction into Claude Code and let Claude run the git clone and setup script itself. It's a clever dogfooding move, but it also means your first interaction with the tool is trusting an AI to execute shell commands on your machine based on a README snippet. Worth being aware of that.

Why This Exists and Why Now

The timing makes sense. Claude Code landed as a genuinely capable agentic coding tool, but it ships with a blank slate. You get a powerful model and no workflow. Most developers end up improvising prompts, getting inconsistent output quality, and slowly building up their own CLAUDE.md files with project-specific context.

gstack is an opinionated answer to that blank slate problem. Instead of every developer independently figuring out how to structure AI-assisted code review, or how to prompt for security audits, or how to run QA against a live URL — you get a pre-built methodology you can fork and adapt.

The ecosystem gap it fills is real. There's a lot of tooling for running AI agents but very little for structuring how they work across a full development lifecycle. Most repos in this space are either thin wrappers or academic experiments. gstack is being actively used by its author in production, which changes the character of the thing.

Features Worth Calling Out

The role-based command structure is genuinely well thought out. The separation between /plan-ceo-review (strategic challenge), /plan-eng-review (architecture), and /plan-design-review (UX) maps to how real teams operate. You're not just asking Claude to "review this" — you're invoking a specific lens with specific criteria. That consistency matters when you're using AI tooling across multiple projects.

The browser integration is the most technically interesting piece. The browse binary is a compiled Bun application with a Node.js server component. It supports cookie injection for authenticated QA runs, tab session isolation, and — per recent commits — a 4-layer prompt injection defense for when Claude is browsing pages that might try to hijack it. That last part is not a solved problem in the industry and the fact that they're actively working on it is a good sign.

The /retro command is underrated. A weekly retrospective that pulls git stats — lines added, commits, net LOC — and synthesizes them into a structured review is the kind of unglamorous tooling that actually changes how you work. The README shows Garry's own retro numbers (140K lines, 362 commits in one week), and whether or not you believe those numbers, having a command that generates that report automatically is useful.
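To make the idea concrete, here is a minimal sketch of how those three numbers (commits, lines added, net LOC) could be pulled out of plain git history. This is not gstack's implementation, which I have not reproduced here; the names RetroStats, parse_numstat, and weekly_stats are hypothetical, and the only thing assumed about git is the standard `git log --numstat` output format.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class RetroStats:
    commits: int
    added: int
    deleted: int

    @property
    def net(self) -> int:
        # Net LOC is simply lines added minus lines deleted.
        return self.added - self.deleted


def parse_numstat(log_text: str) -> RetroStats:
    """Tally commits and line counts from `git log --numstat --format=commit:%H` output."""
    commits = added = deleted = 0
    for line in log_text.splitlines():
        if line.startswith("commit:"):
            commits += 1
            continue
        parts = line.split("\t")
        # numstat rows look like "<added>\t<deleted>\t<path>"; binary files
        # report "-" for both counts and are skipped here.
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            added += int(parts[0])
            deleted += int(parts[1])
    return RetroStats(commits, added, deleted)


def weekly_stats(repo: str = ".") -> RetroStats:
    """Run git over the last seven days of history in `repo` and summarise it."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--since=7 days ago",
         "--numstat", "--format=commit:%H"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_numstat(out)
```

Calling weekly_stats() inside any repository gives you raw counts you can compare against a headline figure like the one in the README. It says nothing about the quality of those lines, which is the part a retro actually has to interpret.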

Multi-agent and multi-host support is broader than expected. The setup script auto-detects which AI coding agents you have installed and configures accordingly — Claude Code, Codex CLI, Cursor, OpenCode, Factory Droid. The OpenClaw integration for spawning sub-agents is particularly interesting if you're running orchestrated multi-agent workflows. This isn't just a Claude-specific tool.

The slop:diff quality check in the test suite is worth a look. The most recent commit message references "AI slop reduction with cross-model quality review," and there's a bun run slop:diff script that appears to run a quality check on generated output. I haven't verified what it actually does, but building automated checks against AI output quality is the right instinct.

Who Should Use This

Solo developers and technical founders who are already using Claude Code and want more structure. If you're spending time crafting the same review prompts repeatedly, this gives you a starting point that's been iterated on in production.

Teams adopting Claude Code where you want consistent behavior across developers. The team mode with auto-update is a reasonable solution to the "everyone has different CLAUDE.md files" problem.

People who want to understand how to structure agentic workflows. Even if you don't use gstack directly, reading through the skill files is educational. The methodology embedded in /plan-ceo-review or /investigate is worth stealing.

Who shouldn't bother: If you're not already using Claude Code, this adds nothing — it's not a standalone tool. If you have strong opinions about your own prompting workflow and a mature CLAUDE.md setup, the overhead of learning gstack's conventions probably isn't worth it. And if you're skeptical of founder-branded tooling in general, that skepticism is reasonable here.

Honest Concerns

This is almost entirely a one-person project. 204 commits from garrytan, 1 commit each from four other contributors. The "community security wave" commit that credits "8 PRs, 4 contributors" is doing a lot of work to make this look like a community project. It isn't, really. That's not inherently bad — many great tools start this way — but it means the bus factor is 1 and the roadmap is whatever Garry finds useful.

No formal releases. The version is in package.json (currently 0.16.2.0) but there are no GitHub releases. The install flow uses --depth 1 on main, which means you're always pulling whatever is on the default branch. The auto-update mechanism in team mode makes this more pronounced — your tooling can change under you silently. The "throttled to once/hour, network-failure-safe" description is reassuring but the lack of pinned versions is a real operational concern for teams.
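If silent changes worry you, one option that does not depend on gstack at all is to fingerprint the installed skill files and diff the fingerprints after each update. The sketch below is my own illustration, not something the repo ships; snapshot and diff_snapshots are hypothetical names, and the directory you point it at is whatever your install actually uses.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file under `root` (as a relative path) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Report which files were added, removed, or changed between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

Take a snapshot after you have audited the files, store it somewhere (JSON is fine), and run diff_snapshots against a fresh snapshot whenever the tooling updates. Anything in "changed" is a prompt your agents are now running that nobody on the team has read. Note that if the directory is a git clone, the walk will also pick up files under .git, so you may want to point it at the skill files only.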

The star count warrants skepticism. 69K stars with 0 gained in the last 7 days, created in March 2026. That trajectory — massive spike, then flatline — is a pattern associated with viral founder posts more than sustained community adoption. The fork count (9,641) is more meaningful and suggests real usage, but I'd want to see issue engagement and PR contributions before calling this a healthy open source project.

324 open issues with no triage visible. That's a lot for a repo this young. Without labels or milestones, it's hard to tell what's a bug versus a feature request versus noise.

The cookie picker auth token leak (CVE) patched in April is worth noting. When you're building tooling that handles browser sessions and auth tokens for AI agents, security surface area is real. They caught it and fixed it, but if you're using this in any sensitive context, audit the browser integration carefully.

The README is written by a founder doing personal branding. The Karpathy quote, the line count flexing, the contribution graph comparison — this is marketing copy. It doesn't mean the tool is bad, but it means you should read the actual code, not just the README.

Verdict

gstack is worth forking and adapting if you're a serious Claude Code user. The role-based command structure is a good pattern, the browser integration is genuinely useful infrastructure, and the methodology embedded in the skill files reflects real product and engineering thinking.

I would not adopt it wholesale for a team without auditing the skill files, pinning a version, and treating it as a fork rather than a dependency. The one-person authorship and lack of formal releases make it too unstable to trust as-is in shared workflows.

The star count is noise. The actual content is decent. Use it as a starting point, not a finished product.

View the repo on GitHub → garrytan/gstack
