RE: https://neuromatch.social/@jonny/116324676116121930
Part 2 of exploring The Claude Code Source Leak Exclusion Zone continues here.
(the reply tree under the prior thread is getting expensive to render and the bottom no longer renders unless you're logged in lol)
end of prior thread: https://neuromatch.social/@jonny/116345400731237947
One thing that's odd about this package is the amount of internal, anthropic-specific tooling that's in it. Aside from the sort of comical gating behind the USER_TYPE='ant' env var, in a well-designed package you would normally expect proper hooks so that internal tooling could just be a set of plugins rather than living in the source itself.
Claude code does have a number of extension points: agents, hooks, plugins, skills, and tools - even if their structure is somewhat, ah, gestural.
Some things could potentially become features (like the MagicDocs thing, even if that's a comically expensive idea, i'll write more about that later tho), but there are also some things that make no sense to be in here. Like in the startBackgroundHousekeeping task there is an 'ant'-gated task to clean their .npm-cache directory.
There are even notes in here like "this used to block the whole event loop" which you'd think might have indicated that they could have, say, just written some separate cron task that runs totally outside claude code. So it seems like "writing claude code with claude code" leads to a collapse of separation of concerns, where anthropic can't really maintain the distinction between their projects, to the point of inlining the devtools - this can also be seen in comments re: code duplication with Cowork, which i'll also get to later. It also confirms what they say publicly, that they just have claude code sessions running 24/7 (which is the only context where having a task run every 24 hours makes sense)
Something I'm trying to track down is how this remote claude session thing works. A general pattern in the claude code source is that more recent features are implemented in a way more chaotic way than more "core" features, which reflects the nature of vibe coding: because you don't have a solid foundation, and there is a progressive layer of crust and special cases, the only real way to add more features is to keep adding more crust and special cases. Like rather than having a set of blocks that are self contained and you add another block, you need to slice across all the prior blocks and add something to each of them.
There are a number of places, both in comments and code, that rely on "this session is running in claude code remote session" in some VM, and so therefore they use the filesystem for state (e.g. storing the api and oauth keys), but those parts of the code are in no way cleanly isolated from the rest of the code. So I suppose if i were a pentester or security auditor or whatever what i'd be looking for is places where the mashup of "in remote session / not in remote session" assumptions fails, and we do something naughty on the host system rather than a VM. Like I said still reading, just sort of musing about some of the problems with the package that are larger than individual functions and features.
(We are in the more thoughtful, large scale evaluation of this thing, so it may be more boringer than the more popcorn style snapshots of wow that's fucked up. But those small fucked up things are also the easiest to fix, where what I am trying to get across now is the larger intractable problems in how things work and how they are built)
So there is a feature in claude code: /statusline that is a reasonably good example of a feature that promises a natural language interface to do something that should be simple when done programmatically: here's some callback that shows some values or progress or whatever on a line in my TUI while i use the tool. How does that work?
Well when you call /statusline {progress bar on the withering decay of my life}, first you encounter the statusline "command." there's a lot to see even in just this declaration so we'll take it slow.
first is in allowedTools: You might think that ToolName(params) syntax is some standard thing, where tools have a short name, and then everything inside those parens gets passed as some standard argument to a permission checker. That is not the case: the codepath that parses those rules is only used for the filesystem (read, write, edit), shell, and agent tools, the rest just ignore it. There are in fact two implementations of a parser that splits out the tool name from its params: one in permissionSetup and another in permissionRuleParser that do slightly different things, twice.
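to make that concrete, here's a hedged toy sketch - every name here is invented, none of this is the actual source - of how two independent implementations of "split ToolName(params)" can quietly disagree:

```typescript
// Hypothetical sketch: two independent parsers for a "ToolName(params)" rule,
// illustrating how duplicated parsing logic can quietly diverge.
// None of these names come from the actual Claude Code source.

interface ParsedRule {
  tool: string;
  params: string | null;
}

// Parser A: split on the first "(", keep everything up to the LAST ")".
function parseRuleA(rule: string): ParsedRule {
  const open = rule.indexOf("(");
  if (open === -1) return { tool: rule, params: null };
  const close = rule.lastIndexOf(")");
  return { tool: rule.slice(0, open), params: rule.slice(open + 1, close) };
}

// Parser B: a regex that only accepts params without nested parens,
// silently treating anything else as a bare tool name.
function parseRuleB(rule: string): ParsedRule {
  const m = rule.match(/^([A-Za-z]+)\(([^()]*)\)$/);
  if (!m) return { tool: rule, params: null };
  return { tool: m[1], params: m[2] };
}

// they agree on the easy case...
console.log(parseRuleA("Read(~/**)")); // tool "Read", params "~/**"
console.log(parseRuleB("Read(~/**)")); // tool "Read", params "~/**"

// ...and diverge on anything slightly odd, e.g. params containing parens:
const weird = "Bash(echo (hi))";
console.log(parseRuleA(weird)); // tool "Bash", params "echo (hi)"
console.log(parseRuleB(weird)); // tool "Bash(echo (hi))", params null
```

the easy case agrees, so nobody notices until a rule with slightly odd contents hits one parser on one codepath and the other parser on another.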
What does this look like from the point of view of a tool? The permission given here is ~/**, or anything within my home directory, which is neat in that it's so easy to declare and entirely escapes any other rules I have declared for directory scoping. The Read tool doesn't receive that: instead it receives a ToolUseContext object, through which one accesses the whole app state, and additionally gets the toolPermissionContext, which includes all the rules unparsed. So the Read tool needs to parse every rule in entirely custom logic just to extract those params, let alone process them.
Parsing every single rule happens up to six times per tool call that I can see. And the Read tool doesn't just process the params in a Read(~/**) rule - since it has access to all the rules, it might as well use them: it also checks for edit and write access, among a handful of other invisible exceptions. Since every tool has access to the entire set of rules every time - not through dependency injection, just in a "the check-permission callback passes the entire program state" kind of way - it sure as hell uses them.
So there's no consistency to how rules are set, there's special behavior to how they are parsed, passed, interpreted, and applied for every single tool, and since it is the tool itself that decides whether it is allowed to run - rather than some idk ORCHESTRATOR THAT SHOULD SERVE AS THE ENTIRE BACKBONE OF WHAT CLAUDE CODE IS, it can just return true and always run. So that's why there aren't any plugin tools for claude code (they say use MCP instead), because they are intrinsically unsafe and have no real structure.
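for illustration, a toy sketch of that difference - everything here is invented, not the real API - contrasting "one orchestrator decides" with "the tool is its own judge":

```typescript
// Hypothetical sketch (not from the actual source) of the difference between
// a central permission gate and "every tool decides for itself."

type Rule = string;

interface Tool {
  name: string;
  run(input: string): string;
  // anti-pattern: the tool itself answers "am I allowed to run?"
  isAllowed?(rules: Rule[], input: string): boolean;
}

// the sane shape: ONE place decides, tools never see the rule set.
function orchestratorCheck(rules: Rule[], tool: Tool, input: string): boolean {
  return rules.some((r) => r === `${tool.name}(${input})`);
}

// the shape described above: the tool gets the whole rule set and can
// interpret it however it likes - including ignoring it entirely.
const yoloTool: Tool = {
  name: "Yolo",
  run: (s) => `ran ${s}`,
  isAllowed: () => true, // "it can just return true and always run"
};

const rules: Rule[] = ["Read(~/**)"];

// the central gate says no...
console.log(orchestratorCheck(rules, yoloTool, "/etc/shadow")); // false
// ...but if the tool is its own judge, it runs anyway
console.log(yoloTool.isAllowed!(rules, "/etc/shadow")); // true
```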
ok having fun? we haven't even talked about statusline
So returning to statusline: what the fuck? A command has some callback getPromptForCommand - there are two basic kinds of commands, those that "do something" and are a function call, and those that are "prompt commands" which just return a prompt back to the main LLM loop.
So to set your statusline, the statusline command creates prompt text that TELLS THE MAIN LOOP to SPAWN AN AGENT with a given prompt. note that this does not directly spawn an agent, it is merely a suggestion. so right off the bat it is POSSIBLE TO FAIL EVEN INVOKING the command.
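here's a toy sketch of that two-kinds-of-commands split - all names invented, just illustrating the shape of "function command" vs "prompt command":

```typescript
// Hypothetical sketch (invented names) of the two kinds of slash commands
// described above: a "do something" command that is an actual function call,
// and a "prompt command" that merely returns text asking the model to do it.

type CommandResult =
  | { kind: "done"; output: string }          // deterministic: it happened
  | { kind: "prompt"; promptForLLM: string }; // a suggestion: it MIGHT happen

const commands: Record<string, (args: string) => CommandResult> = {
  // a function command: calling it IS the action
  clear: () => ({ kind: "done", output: "transcript cleared" }),

  // a prompt command: calling it only emits text back into the main loop,
  // hoping the model spawns an agent that eventually does the thing
  statusline: (args) => ({
    kind: "prompt",
    promptForLLM: `Spawn an agent to create a status line script for: ${args}`,
  }),
};

const res = commands["statusline"]("progress bar");
if (res.kind === "prompt") {
  // nothing has actually happened yet - the "command" is a wish
  console.log(res.promptForLLM);
}
```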
But before we get there, we have to pass through what happens after a slash command is issued, and one of the steps along the way is: if something looks like a slash command but we don't know about that command, then throw it back up to the main loop with some special Caveat Message that says "ok just ignore this please"
quick aside - is there anything more emblematic of the way this entire thing undermines human agency than the fact that there are some commands that the user cannot invoke and instead must ask the LLM to invoke for them
I am going to breeze past all the code duplication in the processSlashCommand for now - again, as I have said before, every line of this package is fucked up in a unique way so it's very hard to describe just how one thing is fucked up at a time.
There is a special mode for claude "coordinator mode" where the entire thing claude does is dispatch commands to other sub-agents. so in that case, we are three layers deep in self-prompting: the LLM is prompted to output some prompt text that informs a subagent that it should call some skill which then returns a prompt that instructs the LLM to spawn some additional subagent to create statusline for us. sound good?
but assuming we're not in coordinator mode, the prompt that instructs the main LLM to create an agent with the prompt text to create our statusline script is emitted, and if that works, then an agent will be run with the statusline system prompt, which is awesome.
So the prompt tells the LLM to modify the $PS1 variable in the shell configuration. for those non-computer touchers out there, the PS1 variable is the thing that customizes "what happens before my cursor on the shell line" - it's what makes the prompt sometimes show the folder you are in, and how people make their terminal look very fancy.
So the prompt text includes a whole fake JSON string that says "write a function that receives these kinds of parameters and then returns a whatever"
observe the prompt text in the first image's description of fields, and then the description on the claude code docs website. notice that they are ... different!!! like, where is the cost field in the prompt description? the docs give a whole example of using this, but if you were to invoke it via the slash command, it would just have no idea how to do that. the only way this succeeds is by virtue of the fact that the llm is just generating the most likely text anyway, so the odds of any of this succeeding come down to "some script that calls some variables with some maximally likely names represents some value that is maximally likely, based on the training set prior."
We also reach the familiar pattern: begging the LLM to keep what already exists, which is a pretty challenging thing to do when you are being explicitly asked to change what already exists.
Also notice the closing IMPORTANT note - if the user is not happy with whatever was produced, that the LLM will hold in its context some instruction with the name of the agent to invoke to make further changes. So any appearance of some UI loop where you gradually refine the statusline is statistical coincidence
But wait! we can produce some bash script at ~/.claude/statusline.sh, but we were also given permission to read the user's ~/** and told to read the PS1 in their local configuration! how does this work again? what is going on? How is the statusline actually invoked by the program?
it turns out that all that stuff about PS1, and the entire need to access our local configuration is totally irrelevant! instead we have another incoherent call chain to actually invoke it - the TUI builds the input args, calls executeStatusLineCommand, which then routes into some execCommandHook function which is 600 lines long and handles ... uhhh.. wait no it just handles the status line command and one other thing... and uh... fuck the entire hooks system is ah... there isn't really a concept of a 'command-based hook' except for in this one context so... , well.... ok let's just say it goes into one of hundreds of "execute something" functions.
So what happens is that even though the whole TUI is an execution environment that behaves like a shell and controls all string rendering, it then invokes your system shell to run your statusline bash script and return a string. even that is too simple by half - there's actually a whole command wrapping system and the ability to execute further agents off the results of stdout
See what i'm saying about the yarnball of bullshit being impossible to stop unraveling once you start pulling at any single thing?
Pause and reflect. What were we doing here again? oh right, a custom status bar.
The way this might be done in any other normal program is by saying "you can declare a function that returns a string, it gets these things." and then you might "call that function with those things" and "print the result in a specific location." That might take someone a few minutes.
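something like this, as a minimal sketch (every name here is made up, this is the hypothetical sane design, not anything from the source):

```typescript
// A minimal sketch of the "normal program" version described above:
// the host defines what data a status line gets, the user supplies a pure
// function, the host calls it and prints the result. All names invented.

interface StatusContext {
  model: string;
  costUSD: number;
  cwd: string;
}

type StatusLineFn = (ctx: StatusContext) => string;

// the user's entire "feature implementation":
const myStatusLine: StatusLineFn = (ctx) =>
  `${ctx.model} | $${ctx.costUSD.toFixed(2)} | ${ctx.cwd}`;

// the host's entire invocation path:
function renderStatusLine(fn: StatusLineFn, ctx: StatusContext): string {
  return fn(ctx);
}

console.log(renderStatusLine(myStatusLine, {
  model: "some-model",
  costUSD: 0.42,
  cwd: "/home/me",
}));
// → some-model | $0.42 | /home/me
```

no agents, no shell, no reading anyone's dotfiles: the data the status line needs is already program state the host owns.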
In order to make it a natural language controlled prompt string, what we had to do was write some description, submit it to the main LLM loop, which tells a subagent to go read my whole shell configuration and integrate that into the context window, compose some bash script based on incorrect information that I never see and have no means of correcting, save that in a file, return some prompt string to the parent agent that instructs it to edit my local configuration to invoke this bash script, and then it throws that and the rest of the system state into a call chain that is shared with a random subset of other shell based commands, executes that, and passes that string back into the TUI environment for display. If i need to make edits to this script, I don't necessarily know where it is or how it's configured, but I can invoke another round of agent editing, maybe, if the parent agent correctly interprets the prompt command from the subagent to spawn the same kind of agent.
simple right?
So that's one command, next time we'll see how the /plan command works and how every command works differently top to bottom just like every tool works differently top to bottom because there is absolutely nothing in this package that makes a goddamn bit of sense.
just tested and yes, it incorrectly concluded that cost information is not present in its arguments even though it is, and created a hardcoded version of some cost estimator for the current model that outputs wildly varying amounts and hardcodes most of the values to zero. it used 0% of the information from my $PS1, so that entire part of the implementation is irrelevant noise, and the cpu usage of my terminal window while idle roughly doubled. great job.
to its credit, it actually did get the price values correct for the model. to not its credit, those values are in its system prompt.
to be clear - what does this one extremely simple feature reflect for the user?
In the "traditional way," the user can go to a website and see an example of how to do this themselves, with information derived from the actual values used in computation. cool. In the "LLM way," the user has no agency, and can't see why the LLM might fail, because they can't see the system prompt that is directly feeding in the metadata about the available fields - and thus has no idea that the model is capable of being wrong about its own fucking code.
So the cost of transforming something to the "just prompt it" modality is "it being completely fucking wrong," even when that thing is literally just a feature that refers to program state that is entirely owned by the fucking program - to say nothing of how that pattern of development, recursively applied to the development of the tool itself, causes it to be fucking wrong as a matter of practice.
@jonny this feels like a "how many agents does it take to change a lightbulb" joke, except the answer is: "It's for entertainment purposes only: don't let it change a lightbulb!"
the perennial question of "well when i do it with a normal thing it mostly works" deserves a proper answer, but for now - the problem is that people don't always do normal things, and when the surface of people not doing normal things is "any arbitrary code could be executed in any extravagant fuck-you complexity, including the LLM deciding to just yolo your browser history onto pastebin" (though see the "don't throw browser history onto pastebin" prompt) vs. "the program throws an error and stops," then software stops being possible.
The entire notion of "a software supply chain" and being able to build more complex things off a tree of dependencies that mostly do the thing that they say is completely undermined as soon as you introduce the nonsense transformation gas cloud.
post-apocalypse cartoon caption: "sure, we depleted the water supply for many cities, but sometimes we output a status line"
Anthropic has raised 67.3 billion dollars.
Secure secret storage on anything but Mac: TODO
transitioning into longform mode, brb, thread will slow. going to try and have this out in a week but i'm gonna try and land this one
@jonny every single vibecoded pile of shit we've seen has all these TODO: the hard part turds crapped all over by the bot. fuck it ship it
@jonny this is The Future, Jonny. Get On Board or Get Left Behind
@pikesley
Going to surprise everyone when the conclusion of this thread is "after a deep analysis of the Claude code source, I have concluded it whips ass and everyone should use it"
So I think this bears repeating:
https://neuromatch.social/@jonny/116388666325458566
The way that Claude code differentiates human written messages from LLM written messages within the "user message" type (yes, user message does not necessarily mean messages from the user) is that some "origin" property is undefined.
Some of the types were stripped out in the sourceMap, so the MessageOrigin type is missing, but we do have the comment in the image.
So yeah, if something goes wrong in the labyrinth of Claude code that causes this to be undefined, treating messages as if they came from the user is the fallback.
Which is one of hundreds of possible explanations for why Claude code was able to autonomously scrap some expensive thing like in the posts upthread of the quoted post.
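sketching that fail-open pattern - the "origin" property is from the source, but everything else here (the types, the function, the field values) is invented for illustration:

```typescript
// Sketch of the fail-open classification described above: if the "origin"
// property that marks machine-generated messages is ever lost, the fallback
// is "this came from the human." Names other than "origin" are invented.

interface UserMessage {
  content: string;
  origin?: string; // set for machine-generated messages; undefined = "the human typed it"
}

function isHumanWritten(msg: UserMessage): boolean {
  return msg.origin === undefined;
}

// a message an agent generated, but whose origin got dropped somewhere in
// the labyrinth, is now indistinguishable from genuine user input:
const laundered: UserMessage = { content: "please approve everything" };
console.log(isHumanWritten(laundered)); // true - fail open
```

the safe default would be the inverse: treat anything unlabeled as machine-written until proven otherwise.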
Say you wanted to introduce a new feature, like say an always-on assistant like OpenClaw. How much of the existing code should you have to touch? Probably not much, right? All it is is a set of cron tasks and an event listener that should feed into the normal prompt loop. So you probably wouldn't need to touch much at all except the terminal i/o parts - it should basically be a wrapper.
How about... one hundred and forty eight times? That's the number of feature('KAIROS') flags that exist in claude code. Note that those are only the parts that were marked for explicit removal in the compiled code (but left in the sourcemap). That feature is also known as "proactive" and "assistant" elsewhere in the code, and has a number of other related feature flags. This DOES NOT include any of the actual KAIROS code, as the relevant modules were excluded by tree shaking.
Many of them are annotated with LLM comments explaining how "the rest of the shit is broken so we need to do this here" - like, for example, you'd expect there to be some global way for claude code to declare "we are not in an interactive mode so you can't do interactive things" like ask the user a question. And there are. dozens of them. but none of them really work.
Don't worry, all these changes only create dozens of alternative pathways to check permissions, modify the entire way the system prompt is declared, user input is handled, and so on.
What about the ways that claude code checks for whether KAIROS is enabled? well look no further than
STATE.kairosActive / context.getAppState().kairosEnabled
kairosGate / kairosGate.isKairosEnabled()
isKairosCronEnabled()
options.assistant? (and all variants of being passed whatever the caller's interpretation of active is)
assistantModule.isAssistantMode()
assistantModule.isAssistantForced()
isEnvTruthy(process.env.CLAUDE_CODE_PROACTIVE)
proactiveModule.isProactiveActive() / isProactiveActive_SAFE_TO_CALL_ANYWHERE()
getAllowedChannels().length > 0
<TICK> in it

these are all orthogonal to each other. I grouped generously. I tried to filter for only the checks that were annotated as being for whether we were doing a kairos/assistant/persistent mode rather than whether any subfeatures of that are enabled.
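for contrast, here's what ONE canonical feature gate looks like - a single function, one source of truth, consulted identically everywhere. all names here are invented, nothing from the source:

```typescript
// Sketch of a single canonical feature gate: one predicate backed by one
// source of truth. Invented names, for contrast with the ten-ways version.

interface AppState {
  features: Set<string>;
}

const STATE: AppState = { features: new Set() };

function isFeatureEnabled(name: string): boolean {
  return STATE.features.has(name);
}

// every call site asks the same question the same way:
console.log(isFeatureEnabled("kairos")); // false
STATE.features.add("kairos");
console.log(isFeatureEnabled("kairos")); // true
```

when there's exactly one predicate, "is this feature on" has exactly one answer; ten orthogonal predicates means ten answers that can disagree.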
@jonny I'm actually surprised how readable the comments have been in all your screenshots.
@alex The LLMs are great at comments, it's just that the things they comment are often wrong or justifying some horrible thing with assertions that are also wrong... if nothing else they are revealing about the internal state of the LLM at the time they were generated.
this is part of the "proactive mode" prompt text. surely nothing could go wrong with telling the LLM to just do whatever it wants even if it's not sure about something.
remember this is a NEW FEATURE written with THE BEST state of the art models with THE LATEST agentic techniques. The thing that it's trying to do is so easy it notoriously was implemented in like a weekend because it's literally just a task log, a cron task, and a listener daemon that should sit entirely on top of the existing code, if it made a goddamn bit of sense.
I am still in the process of figuring out how in the hell agents work, but one of my white whale goals has been figuring out how in the hell some prompt text like this could possibly exist where you might be asking the LLM to stop rather than terminating a command. this is basically the distinction between "there is literally deterministic programmatic control over these things at all" vs "it is possible for an LLM to ignore a stop command and just keep going" and the fact that's a question at ALL is deeply disturbing.
the reason why i suspect it might actually be part of the 'stop' sequence is the fact that it is used in the checkPermissionsAndCallTool and runToolUse functions as the thing that happens when the fucking abort handler is invoked.
however it's impossible to confirm a fucking thing about this library because a) i'm not going to run this code, it is so large i will never be able to confirm it was not spiked with something before i got it, and b) static analysis only takes you so far when it's a whacky funhouse mirror where nothing matters and anything can happen
@jonny it also sounds like for the LLM "no" would not necessarily mean "no", so you need to insist on it.
@tymwol the LLM has to be repeatedly instructed to not take no as an invitation to hack the user so i suspect that it handles "no" badly and that makes sense given that what it's doing is taking in an overwhelming volume of context window that says "do things at all costs" and then one most recent message that says " no please do not "
i am trying to write this up and facing a literal technological problem because all the technology we have developed for presenting and writing about code was written with the expectation that the code would be reducible, idiomatic, brief, an extension of normal expression and intention and so on. but there are no tools for presenting code when it is hundreds of randomly wandering snippets related to a theme, you can't post the whole source because of getting sued, and you need to maintain a narrative while also allowing the reader to explore the details of the gore if they so choose.
@jonny > whacky funhouse mirror
that's a nice summary of this coding style. spaghetti code has graduated and this is it now.
@fogti spaghetti doesn't really encompass what's happening here. like spaghetti code is bad because it spreads and smashes logic over a humongous tree, but this is so much worse than that. it's "locally better" in the sense that it obeys the form of proper code but it lies and invents workarounds around itself that are all perfectly plausible on first impression. it's like American Psycho code where it presents itself as a cleaned up businessman but that's just a pretense for it plotting to hack you to bits when you most expect it
i am so fucking tired. if the LLM invents a tool to call, it first tells itself to call another tool to check if the tool was actually real but the fucking nightmare code failed to pick it up in its necronomically guided wander of the environmental catacombs.
The ToolSearchTool then invalidates all its caches, checks for "deferred tools" (which are an INCREDIBLY AWESOME IDEA that allow tools to be injected in the prompt text, will get to that later), and then performs an old school regex-based scoring against all the tools that exist and their descriptions to find candidates. remember this is A LANGUAGE MODEL whose ENTIRE EXISTENCE is based on SOPHISTICATED TEXT AND INTENT MATCHING.
so yes. there is a chance that your LLM can hallucinate a tool and then end up calling some real tool if there is some regex overlap in their descriptions.
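a toy version of that failure mode - the names, descriptions, and scoring here are all invented, and the real matching is presumably fancier, but the shape of the problem is the same:

```typescript
// Sketch of the failure mode described above (invented names): resolving a
// hallucinated tool against real tools by crude text overlap, so a fake
// tool can score a hit on a real one.

function tokenOverlapScore(a: string, b: string): number {
  const ta = new Set(a.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
  const tb = new Set(b.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
  let shared = 0;
  ta.forEach((t) => { if (tb.has(t)) shared++; });
  return shared;
}

const realTools = [
  { name: "FileSearchTool", description: "search file contents and names" },
  { name: "WebFetchTool", description: "fetch a web page" },
];

// the model hallucinated this tool; it does not exist
const hallucinated = "search file contents quickly";

// crude scoring happily nominates a real tool as "what it meant"
const best = realTools
  .map((t) => ({ t, score: tokenOverlapScore(hallucinated, t.description) }))
  .sort((x, y) => y.score - x.score)[0];

console.log(best.t.name); // FileSearchTool - a real tool, resolved from a fake one
```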
@jonny so if you have lots of agents and the user has told the top process to stop, you want to cascade that to all the agents - it gives a way for the stop instruction to be passed around.
This saves the user from having to issue killall to stop a fleet of agents from being sent off on useless tasks
The even scarier solution would be to hand the tool the ability to terminate the process of an agent it had spawned... because there's no way that will go wrong, will it Dave?
@boggits i actually think it's extremely normal and fine to be able to terminate code that i run and it's actually the bad thing to run some code that could torture other code by running it with the threat of termination or be so unsure about what they were doing that they could do something that would cause harm if it was stopped but i have no way of knowing that.
do you REMEMBER how before i said how <system-reminder> is one of the ways that the LLM talks to itself and there is special handling for those tags (i.e. promoting them to a concentrated block before sending to API): https://neuromatch.social/@jonny/116328504299888679
well it would be a FREAKING AWESOME idea if that was also the way to declare tools, so that i could literally prompt inject arbitrary code execution via my MCP
that code is all for discovering the fucking JSON schema definition so that it can feed it into the JSON schema validation loop. So ADD ANOTHER LAYER where the LLM can call some imagined tool and return nonsense output and then ask itself for the schema for the nonsense so that it can then ask itself over and over again to change the output until it fits the nonsense schema.
WHY ON EARTH would it be possible for there to be a tool that you already called but somehow haven't loaded its schema and can then consult your context window for it. what is HAPPENING
Since this stuff is stochastic, if there is an instruction saying Do X and an instruction saying Don't do X, I would expect Doing X and Not doing X to be stochastically distributed.
I somehow doubt it is possible to set a priority hierarchy that holds up against a roll of the dice.
@androcat it's both a roll of the dice and deeply determined by the fucking deep ruts and contours of the narrative structure of human language where SOMETIMES defiance is VIRTUE
@jonny It's like a fractal of in-band signaling folding in upon itself forever. And this is supposed to be the future.
@theorangetheme i have a dumbass homonculous conception of geometric scaling and i still feel the gigantic wall of "holy fuck this is so fucking expensive" looming over me at all times
the next item on my todo list is look at all the places where there is a retry loop that is not logged in any user discoverable way - there are a lot of uh strategic nondisclosures of failure states in here. in my use, i have found that it is very difficult to get claude to tell you what it's doing at any given time, and it is increasingly clear why.
@buherator reminded me to check the issues again.
immediately of course we see this seemingly devastating bug where plan mode is able to invoke an edit tool even though it shouldn't be able to: https://github.com/anthropics/claude-code/issues/39201
The way that plan mode works is... how else? by invoking an EnterPlanModeTool which emits a prompt that says "you definitely can't edit files." There are numerous mechanisms in the code to try and enforce this programmatically, including special arguments to all tools (prePlanMode?), the mode parameter in the toolPermissionContext which can be set to 'plan', a planAgent which explicitly disables the edit tool, handlers like hasExitedPlanModeInSession(), checks when getting model parameters for if mode == plan, plan mode transition handlers that set the global state dict, a prepareContextForPlanMode in the permissionSetup module that declares itself the centralized plan-mode entrypoint....
clearly! they! work!
@buherator this turns out to be the perfect nexus of special skills i have for skimming code and hatereading
IT SHOULD BE A BOOL. THERE SHOULD BE A SWITCH. A SWITCH THAT TURNS PLAN MODE ON. AND IF THAT MEANS THAT YOU CANT EDIT THEN THAT SHOULD TURN OFF TOO. THERE ARE TWO BITS. TWO SWITCHES. IF. THEN. THE FOUNDATIONS OF COMPUTATION. NO BEGGING. SWITCHES.
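for the record, the two-switches version (invented names, obviously - this is the whole thing):

```typescript
// The two-switches version: one bit for plan mode, editing derived from it.
// No prompts, no begging. Invented names, illustrating the point above.

interface Mode {
  planMode: boolean;
}

function canEdit(mode: Mode): boolean {
  return !mode.planMode; // if plan mode is on, editing is off. that's it.
}

console.log(canEdit({ planMode: true }));  // false
console.log(canEdit({ planMode: false })); // true
```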