@cR0w ugh. this is how “they” are trying to defend “agents” https://www.philschmid.de/why-engineers-struggle-building-agents
@hrbrmstr "Agent engineering" is not engineering. Wild how they try to treat deterministic behavior as outdated when it's what all engineering relies on. We're all so screwed.
@cR0w @hrbrmstr christ on a cracker. This isn't just absolutely fucking not engineering. This is shit that is doing catastrophic, long-lasting damage to vast swaths of critical infrastructure.
Like, this is not shit actual senior engineers can just fix our way out of. It's unfixable. The only solution is basically starting over from scratch. Which costs too much. So we'll be forced to try to keep the wheels on a bus that doesn't even have axles.
We're so fucked. So completely fucked.
Someone said the phrase three ifs in a trench-coat yesterday and it just resonates so hard whenever I read these articles.
>The transition from deterministic systems to probabilistic agents is uncomfortable. It requires us to trade certainty for semantic flexibility. We no longer know and own the exact execution path.
reeeee

@hrbrmstr is this satire? Because it sure reads like one.
@creativegamingname @rootwyrm @hrbrmstr Funny how it's the people who can't make things do what they want them to do who are trying to convince us that it's good, actually.
@hrbrmstr @cR0w saw this come up on the orange site. Not even sure where to begin. I feel like it argues against itself. It seems to admit that there still needs to be a deterministic real API somewhere. Or is my subscription state just going to be vibes? Or maybe we'd be better off if the banking system was some LLM context window.
So then is the argument that it's better for user-facing interaction? Non-determinism in this space doesn't seem great either. Clients ask my cool new SaaS app for something, it lies, and now either they're frustrated or I'm on the hook to provide something the LLM made up.
@rootwyrm @cR0w @hrbrmstr I believe the point of determinism in programming is to eliminate ambiguous results.
There’s nothing ambiguous about stopping and starting a train; transferring money; or making any decision that affects human life or safety.
Sure feels like a mistake to give control of everything to AI agents, but clearly it's not gonna stop.
@cR0w For real. We test employees for drugs, qualifications, loyalty, etc. But the managers/leadership/investors are allowed to express thoughts and opinions in areas they don't understand, have never understood, and in some cases have actually been proven wrong.
But they have money! 💰
OH! SWEET 💰 !
💰 doesn't buy happiness. But it will absolutely soften the edges of right/wrong.
@creativegamingname @cR0w @hrbrmstr oh, people saying that shit I have a legal and moral right to hit in the face with a large cinder block at Mach 4+.
Motherfuckers, we have HAD this shit, except done CORRECTLY, for over 25 YEARS. It's called eventual consistency! Except guess what? We know EXACTLY what steps MAY be taken. And we get a DETERMINISTIC result.
Gods, we are SO fucked. Can I just retire? Or just jump off a fucking cliff? I'm no longer picky.
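(For reference, a toy sketch of the "done correctly" pattern being pointed at above: a grow-only counter CRDT, a hypothetical minimal example, not any specific system. Replicas take a known set of steps, merges can arrive in any order and any number of times, and every replica still converges to the same deterministic value because the merge is commutative, associative, and idempotent:)

```python
# toy G-Counter CRDT: each replica tracks per-node increment counts,
# and merge is an element-wise max over those counts.
def merge(a: dict, b: dict) -> dict:
    # element-wise max: commutative, associative, idempotent
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def value(counter: dict) -> int:
    return sum(counter.values())

r1 = {"node-a": 3}   # node-a incremented 3 times
r2 = {"node-b": 2}   # node-b incremented 2 times

# merges in any order, applied any number of times...
left  = merge(merge(r1, r2), r2)
right = merge(r2, merge(r2, r1))
assert left == right and value(left) == 5   # ...same deterministic result
```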
@hrbrmstr @cR0w Laughing out loud at "trust, but verify." That does not mean what they think it means.
None of this shit works. Junior coders just don't have the experience to know that, or care if it's wrong, because they were raised on "move fast and break things."
In the end, this will all come to a full and complete crash and we'll be back to normal. It won't be gentle. But there are far more people who don't want this than people who do. If Windows 8 is any proof, that will show up in numbers eventually.
@FritzAdalis @wall_e @hrbrmstr Vibe inspections. Which is basically Boeing for the last couple of decades anyway.
@cR0w at least they are consistent?
@creativegamingname @FritzAdalis @wall_e @hrbrmstr We might not be thinking of the same Boeing then.
@bluestarultor @hrbrmstr @cR0w
The metaphor of driving across the sidewalk because it seemed faster seems like it belongs in an article arguing against using these systems. I'd rather have the deterministic system that doesn't mow down pedestrians.
@cR0w @FritzAdalis @wall_e @hrbrmstr
Well, I mean... if 10 out of 10 planes that take off burst into balls of fire, they consistently explode?

@scottwilson @rootwyrm @cR0w @hrbrmstr Not to be that person, because obviously the overall point is super correct.
I just keep seeing people say that LLMs are inherently non-deterministic. But you can actually make LLMs deterministic: set the temperature parameter to 0 and, in my experience, they will produce the same output each time.
That parameter scales the logits going into the softmax distribution that the next token is sampled from. Set it to 0 and you always get the most likely token (greedy decoding).
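(A toy sketch of that sampling step, illustrative math only, not any real inference stack's API; `sample_next_token` is a made-up helper:)

```python
import numpy as np

def sample_next_token(logits, temperature, rng=None):
    """Pick the next token id from raw logits (toy version of LLM sampling)."""
    logits = np.asarray(logits, dtype=np.float64)
    if temperature == 0:
        # degenerate case: greedy decoding, always the most likely token
        return int(np.argmax(logits))
    scaled = logits / temperature          # low temp sharpens, high temp flattens
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    rng = rng or np.random.default_rng()
    return int(rng.choice(len(probs), p=probs))

logits = [2.0, 1.0, 0.5]
print(sample_next_token(logits, 0))    # always 0
print(sample_next_token(logits, 1.0))  # varies from run to run
```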
@trashpanda @scottwilson @rootwyrm @hrbrmstr There's a difference between apparently consistent and deterministic. If it is truly deterministic, how is it any different than any other algorithm?
@cR0w @trashpanda @scottwilson @hrbrmstr because the very millisecond you feed it any variance in the training data whatsoever, your 'temperature 0' "model" now outputs something completely different.
They are not deterministic.
@trashpanda @scottwilson @rootwyrm @cR0w nope.
while it does make the model's sampling deterministic, you're still likely to get different results on different machines due to floating-point weirdness.
different quantization schemes (GGUF, GPTQ, AWQ, etc.) will also produce different outputs even at temp=0, because each approximates the original weights differently.
push the context size too high and that can shift the chosen probabilities even with temp=0.
these things really suck.
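(The floating-point point is easy to demo in isolation. Summing the same values in a different order, which is effectively what different GPUs, kernels, and batch sizes do, usually gives slightly different results; if two logits land within that error of each other, even temp=0 argmax can pick a different token on a different machine. A minimal sketch:)

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000).astype(np.float32)

a = x.sum()          # one summation order
b = x[::-1].sum()    # same values, opposite order
# usually False with a tiny difference: float addition isn't associative
print(a == b, abs(float(a) - float(b)))
```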
@rootwyrm @cR0w @trashpanda @scottwilson so many folks had really bad days/weeks this year due to OpenAI (et al.) retiring versioned models, because even temp=0 did not save them. that's an insane setup to base any non-joke process on.
@hrbrmstr @cR0w @trashpanda @scottwilson like I said: a single-sentence change at 'temp=0', without even changing the versioned model, and it blew things up.
Anyone claiming these things are even capable of determinism is on par with the people insisting LLMs are going to become sentient AGI.
It's a completely idiotic setup to base anything on that actually needs to function, since it gets the answer wrong more than 25% of the time in the best case. Much less something that needs to work reliably.
@hrbrmstr @cR0w @trashpanda @scottwilson but hey! At least we're gonna get a bunch of gigawatt scale data centers out of it!
Hang on, I've just been informed that there is no demand without the LLM bubble.
I'm also hearing that they're letting the LLM design and do safety checks on it.
... okay so we're getting a bunch of structurally unsound fire hazards and nuclear accidents out of it.
@rootwyrm @hrbrmstr @trashpanda @scottwilson With the end goal of pushing out and taking over for existing utilities.
@cR0w @rootwyrm @hrbrmstr @trashpanda I can see it coming…
“Nobody knows how it works, so just give SkyNet full control, it’ll be fine.”
@scottwilson @rootwyrm @hrbrmstr @trashpanda Imagine a few massive PG&Es being run by Copilot, Gemini, etc. and you'll see what some big tech companies see as their future.
@rootwyrm @hrbrmstr @cR0w @trashpanda @scottwilson
Don't forget all of the CO2 and environmental toxins from building a fuck ton of commercial real estate (buildings) that absolutely no one needs and that are designed in a way that would make them extremely difficult to convert into residential housing (or any other usable infrastructure)
Best case... a small subset of them become battery storage for renewables and the rest fall apart, poisoning the communities that they're being built in
@rootwyrm @hrbrmstr @cR0w @trashpanda @scottwilson
It's not an accident when the corp lets it melt down because it's no longer profitable.
@hrbrmstr @cR0w >The Trap: An agent might take 5 minutes and cost $0.50. If step 4 of 5 fails because of a missing or wrong input, crashing the whole execution is unacceptable.
>
>An error is just another input. Instead of crashing, we catch the error, feed it back to the agent, and try to recover.
"The agent takes five minutes to complete a query or task, costs 0.50 USD per run, and will take a guess at what to do when it fails."
I think we might be mistaking which part of this arrangement is unacceptable.
>Agents require verbose, "Idiot-Proof" semantic typing (e.g., "user_email_address" instead of "email") and highly descriptive docstrings that act as "context".
Wasn't the entire promise of NLP the capacity to process... natural... language? Requiring highly specified systems for the agent but tolerating pRoBabiLiSTiC output from the agent is the worst of both worlds.
This blog post has to be bait.
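(For the curious, this is roughly the shape of the "idiot-proof semantic typing" the article describes. The function, parameter name, and docstring below are hypothetical, but the pattern, verbose names plus a paragraph of docstring acting as "context" for the model, is what's being advocated:)

```python
# hypothetical tool definition in the verbose style the article describes;
# the name, parameter, and docstring all exist as "context" for the model.
def get_user_subscription_status(user_email_address: str) -> str:
    """Look up the current subscription status for a user.

    Args:
        user_email_address: The full email address of the user whose
            subscription should be looked up, e.g. "jane@example.com".
            Must be an email address, not a username or user ID.

    Returns:
        One of "active", "past_due", or "canceled".
    """
    # the actual lookup still has to be a boring deterministic API call
    return "active"  # stub for illustration
```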
@deliverator @hrbrmstr @cR0w “Welcome to Vibank, how may I help you today?”
“What’s my account balance?”
“You have $21.57 in your checking account.”
“What?! No, that can’t be right, my paycheck just landed this morning!”
“Oh, you’re right. You have $1,300,156.50 in your account.”
“Huh? No, that’s silly. Just tell me the real balance.”
“You owe us $150,000.”
@creativegamingname @rootwyrm @cR0w @hrbrmstr
And, like, it's still possible to do engineering with probabilistic systems but it requires a *theoretical framework for how the system operates* which is the real issue. These LLM systems are impenetrable black boxes, which makes it impossible to know what the risks and failure modes are.
@malcircuit @rootwyrm @cR0w @hrbrmstr
When I read that engineering students were using AIs, I just started laughing. People think it's funny when they see weird images and hallucinations from bots. I imagine they'll think it's much less funny when their elevator decides it has arrived, their car decides its maker isn't liable, or a CSE controller designed to maintain a specific temperature... doesn't.
Risk assessments become very different things when you have to count human lives. Non-serious industries have taken a very aggressive stance and demanded we take them seriously, and as a priority.
We are all gonna learn.
@creativegamingname @rootwyrm @cR0w @hrbrmstr
Yup, it's gonna be a mess
@cR0w @scottwilson @hrbrmstr @trashpanda even PG&E is more competent, but they really think it's all going to be magical AI. Like the "universal basic computing" bullshit. These are functionally illiterate people with severe mental illness who present a clear and present danger to the life and safety of others. By intent. They can't even conceive of what's needed to support it except in vague ideas.
@rootwyrm @scottwilson @hrbrmstr @trashpanda That's not stopping them from trying to do what the shareholders are pushing for.
@cR0w @scottwilson @hrbrmstr @trashpanda like the obsession with "gigawatt" data centers is literally people who have no concept of what a gigawatt actually is. It's just a big, cool number someone said in earshot. If it had been terawatt, they'd be parroting terawatt. They don't actually know or tolerate people who understand the power, the infra, etc., those get fired. Magical "AI" will figure it all out for them.
@rootwyrm @scottwilson @hrbrmstr @trashpanda And unfortunately, the real engineers keep getting assured by the IT "experts" that AI is legit. And you can't blame them for falsely assuming that computer "engineers" are as legitimate as licensed PEs.
@badsamurai @hrbrmstr So you're saying the agent is British. That checks out.
@badsamurai @hrbrmstr @cR0w All fun and games till you say "wow, this weather's really cooking, isn't it?" or "man, it's boiling today"....