@cR0w ugh. this is how “they” are trying to defend “agents” https://www.philschmid.de/why-engineers-struggle-building-agents
@hrbrmstr "Agent engineering" is not engineering. Wild how they try to treat deterministic behavior as outdated when it's what all engineering relies on. We're all so screwed.
@cR0w @hrbrmstr christ on a cracker. This isn't just absolutely fucking not engineering. This is shit that is doing catastrophic, long-lasting damage to vast swaths of critical infrastructure.
Like, this is not shit actual senior engineers can just fix our way out of. It's unfixable. The only solution is basically starting over from scratch. Which costs too much. So we'll be forced to try to keep the wheels on a bus that doesn't even have axles.
We're so fucked. So completely fucked.
Someone said the phrase three ifs in a trench-coat yesterday and it just resonates so hard whenever I read these articles.
>The transition from deterministic systems to probabilistic agents is uncomfortable. It requires us to trade certainty for semantic flexibility. We no longer know and own the exact execution path.
reeeee

@hrbrmstr is this satire? Because it sure reads like one.
@creativegamingname @rootwyrm @hrbrmstr Funny how it's the people who can't make things do what they want them to do who are trying to convince us that it's good, actually.
@hrbrmstr @cR0w saw this come up on the orange site. Not even sure where to begin. I feel like it argues against itself. It seems to admit that there still needs to be a deterministic real API somewhere. Or is my subscription state just going to be vibes? Or maybe we'd be better off if the banking system was some LLM context window.
So then is the argument that it's better for user-facing interaction? Non-determinism in this space doesn't seem great either. Clients ask my cool new SaaS app for something, it lies, and now either they're frustrated or I'm on the hook to provide something the LLM made up.
@rootwyrm @cR0w @hrbrmstr I believe the point of determinism in programming is to eliminate ambiguous results.
There’s nothing ambiguous about stopping and starting a train; transferring money; or making any decision that affects human life or safety.
Sure feels like a mistake to give control of everything to AI agents, but clearly it's not gonna stop.
@cR0w For real. We test employees for drugs, qualifications, loyalty, etc. But the managers/leadership/investors are allowed to express thoughts and opinions in areas they don't understand, have never understood, and in some cases have actually been proven wrong.
But they have money! 💰
OH! SWEET 💰 !
💰 doesn't buy happiness. But it will absolutely soften the edges of right/wrong.
@creativegamingname @cR0w @hrbrmstr oh, people saying that shit I have a legal and moral right to hit in the face with a large cinder block at Mach 4+.
Motherfuckers, we have HAD this shit, except done CORRECTLY, for over 25 YEARS. It's called eventual consistency! Except guess what? We know EXACTLY what steps MAY be taken. And we get a DETERMINISTIC result.
Gods, we are SO fucked. Can I just retire? Or just jump off a fucking cliff? I'm no longer picky.
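(For reference, a toy sketch of the "done correctly" pattern being pointed at above: a grow-only counter CRDT, a hypothetical minimal example, not any specific system. Replicas take a known set of steps, merges can arrive in any order and any number of times, and every replica still converges to the same deterministic value because the merge is commutative, associative, and idempotent:)

```python
# toy G-Counter CRDT: each replica tracks per-node increment counts,
# and merge is an element-wise max over those counts.
def merge(a: dict, b: dict) -> dict:
    # element-wise max: commutative, associative, idempotent
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def value(counter: dict) -> int:
    return sum(counter.values())

r1 = {"node-a": 3}   # node-a incremented 3 times
r2 = {"node-b": 2}   # node-b incremented 2 times

# merges in any order, applied any number of times...
left  = merge(merge(r1, r2), r2)
right = merge(r2, merge(r2, r1))
assert left == right and value(left) == 5   # ...same deterministic result
```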
@hrbrmstr @cR0w Laughing out loud at "trust, but verify." That does not mean what they think it means.
None of this shit works. Junior coders just don't have the experience to know that, or care if it's wrong, because they were raised on "move fast and break things."
In the end, this will all come to a full and complete crash and we'll be back to normal. It won't be gentle. But there are far more people who don't want this than people who do. If Windows 8 is any proof, that will show up in numbers eventually.
@FritzAdalis @wall_e @hrbrmstr Vibe inspections. Which is basically Boeing for the last couple of decades anyway.
@cR0w at least they are consistent?
@creativegamingname @FritzAdalis @wall_e @hrbrmstr We might not be thinking of the same Boeing then.
@bluestarultor @hrbrmstr @cR0w
The metaphor of driving across the sidewalk because it seemed faster seems like it belongs in an article arguing against using these systems. I'd rather have the deterministic system that doesn't mow down pedestrians.
@cR0w @FritzAdalis @wall_e @hrbrmstr
Well, I mean... if 10 out of 10 planes that take off burst into balls of fire, they consistently explode?

@scottwilson @rootwyrm @cR0w @hrbrmstr Not to be that person, because obviously the overall point is super correct.
I just keep seeing people say that LLMs are inherently non-deterministic. But you can actually make LLMs deterministic: set the temperature parameter to 0 and, in my experience, they will produce the same output each time.
That parameter scales the logits going into the softmax distribution that the next token is sampled from. Set it to 0 and you always get the most likely token (greedy decoding).
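(A toy sketch of that sampling step, illustrative math only, not any real inference stack's API; `sample_next_token` is a made-up helper:)

```python
import numpy as np

def sample_next_token(logits, temperature, rng=None):
    """Pick the next token id from raw logits (toy version of LLM sampling)."""
    logits = np.asarray(logits, dtype=np.float64)
    if temperature == 0:
        # degenerate case: greedy decoding, always the most likely token
        return int(np.argmax(logits))
    scaled = logits / temperature          # low temp sharpens, high temp flattens
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    rng = rng or np.random.default_rng()
    return int(rng.choice(len(probs), p=probs))

logits = [2.0, 1.0, 0.5]
print(sample_next_token(logits, 0))    # always 0
print(sample_next_token(logits, 1.0))  # varies from run to run
```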
@trashpanda @scottwilson @rootwyrm @hrbrmstr There's a difference between apparently consistent and deterministic. If it is truly deterministic, how is it any different than any other algorithm?
@cR0w @trashpanda @scottwilson @hrbrmstr because the very millisecond you feed it any variance in the training data whatsoever, your 'temperature 0' "model" now outputs something completely different.
They are not deterministic.
@trashpanda @scottwilson @rootwyrm @cR0w nope.
while it does make the model's sampling deterministic, you're still likely to get different results on different machines due to floating-point weirdness.
different quantization schemes (GGUF, GPTQ, AWQ, etc.) will also produce different outputs even at temp=0, because each approximates the original weights differently.
push the context size too high and that can shift the chosen probabilities even with temp=0.
these things really suck.
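(The floating-point point is easy to demo in isolation. Summing the same values in a different order, which is effectively what different GPUs, kernels, and batch sizes do, usually gives slightly different results; if two logits land within that error of each other, even temp=0 argmax can pick a different token on a different machine. A minimal sketch:)

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000).astype(np.float32)

a = x.sum()          # one summation order
b = x[::-1].sum()    # same values, opposite order
# usually False with a tiny difference: float addition isn't associative
print(a == b, abs(float(a) - float(b)))
```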
@rootwyrm @cR0w @trashpanda @scottwilson so many folks had really bad days/weeks this year due to OpenAI (et al.) retiring versioned models, because even temp=0 did not save them. that's an insane setup to base any non-joke process on.
@hrbrmstr @cR0w @trashpanda @scottwilson like I said: a single-sentence change at 'temp=0', without even changing the versioned model, and it blew things up.
Anyone claiming these things are even capable of determinism is on par with the people insisting LLMs are going to become sentient AGI.
It's a completely idiotic setup to base anything on that actually needs to function, since it gets the answer wrong more than 25% of the time in the best case. Much less something that needs to work reliably.
@hrbrmstr @cR0w @trashpanda @scottwilson but hey! At least we're gonna get a bunch of gigawatt scale data centers out of it!
Hang on, I've just been informed that there is no demand without the LLM bubble.
I'm also hearing that they're letting the LLM design and do safety checks on it.
... okay so we're getting a bunch of structurally unsound fire hazards and nuclear accidents out of it.
@rootwyrm @hrbrmstr @trashpanda @scottwilson With the end goal of pushing out and taking over for existing utilities.
@cR0w @rootwyrm @hrbrmstr @trashpanda I can see it coming…
“Nobody knows how it works, so just give SkyNet full control, it’ll be fine.”
@scottwilson @rootwyrm @hrbrmstr @trashpanda Imagine a few massive PG&Es being run by Copilot, Gemini, etc. and you'll see what some big tech companies see as their future.
@rootwyrm @hrbrmstr @cR0w @trashpanda @scottwilson
Don't forget all of the CO2 and environmental toxins from building a fuck ton of commercial real estate (buildings) that absolutely no one needs and that are designed in a way that would make them extremely difficult to convert into residential housing (or any other usable infrastructure)
Best case... a small subset of them become battery storage for renewables and the rest fall apart, poisoning the communities that they're being built in
@rootwyrm @hrbrmstr @cR0w @trashpanda @scottwilson
It's not an accident when the corp lets it melt down because it's no longer profitable.
@hrbrmstr @cR0w >The Trap: An agent might take 5 minutes and cost $0.50. If step 4 of 5 fails because of a missing or wrong input, crashing the whole execution is unacceptable.
>
>An error is just another input. Instead of crashing, we catch the error, feed it back to the agent, and try to recover.
"The agent takes five minutes to complete a query or task, costs 0.50 USD per run, and will take a guess at what to do when it fails."
I think we might be mistaking which part of this arrangement is unacceptable.
>Agents require verbose, "Idiot-Proof" semantic typing (e.g., "user_email_address" instead of "email") and highly descriptive docstrings that act as "context".
Wasn't the entire promise of NLP the capacity to process... natural... language? Requiring highly specified systems for the agent but tolerating pRoBabiLiSTiC output from the agent is the worst of both worlds.
This blog post has to be bait.
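(For the curious, this is roughly the shape of the "idiot-proof semantic typing" the article describes. The function, parameter name, and docstring below are hypothetical, but the pattern, verbose names plus a paragraph of docstring acting as "context" for the model, is what's being advocated:)

```python
# hypothetical tool definition in the verbose style the article describes;
# the name, parameter, and docstring all exist as "context" for the model.
def get_user_subscription_status(user_email_address: str) -> str:
    """Look up the current subscription status for a user.

    Args:
        user_email_address: The full email address of the user whose
            subscription should be looked up, e.g. "jane@example.com".
            Must be an email address, not a username or user ID.

    Returns:
        One of "active", "past_due", or "canceled".
    """
    # the actual lookup still has to be a boring deterministic API call
    return "active"  # stub for illustration
```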
@deliverator @hrbrmstr @cR0w “Welcome to Vibank, how may I help you today?”
“What’s my account balance?”
“You have $21.57 in your checking account.”
“What?! No, that can’t be right, my paycheck just landed this morning!”
“Oh, you’re right. You have $1,300,156.50 in your account.”
“Huh? No, that’s silly. Just tell me the real balance.”
“You owe us $150,000.”
@creativegamingname @rootwyrm @cR0w @hrbrmstr
And, like, it's still possible to do engineering with probabilistic systems but it requires a *theoretical framework for how the system operates* which is the real issue. These LLM systems are impenetrable black boxes, which makes it impossible to know what the risks and failure modes are.
@malcircuit @rootwyrm @cR0w @hrbrmstr
When I read that engineering students were using AIs, I just started laughing. People think it's funny when they see weird images and hallucinations from bots. I imagine they'll think it's much less funny when their elevator decides it has arrived, their car decides its maker isn't liable, or a CSE controller designed to maintain a specific temperature... doesn't.
Risk assessments become very different things when you have to count human lives. Non-serious industries have taken a very aggressive stance and demanded we take them seriously, and as a priority.
We are all gonna learn.
@creativegamingname @rootwyrm @cR0w @hrbrmstr
Yup, it's gonna be a mess
@cR0w @scottwilson @hrbrmstr @trashpanda even PG&E is more competent, but they really think it's all going to be magical AI. Like the "universal basic computing" bullshit. These are functionally illiterate people with severe mental illness who present a clear and present danger to the life and safety of others. By intent. They can't even conceive of what's needed to support it except in vague ideas.
@rootwyrm @scottwilson @hrbrmstr @trashpanda That's not stopping them from trying to do what the shareholders are pushing for.
@cR0w @scottwilson @hrbrmstr @trashpanda like the obsession with "gigawatt" data centers is literally people who have no concept of what a gigawatt actually is. It's just a big, cool number someone said in earshot. If it had been terawatt, they'd be parroting terawatt. They don't actually know or tolerate people who understand the power, the infra, etc., those get fired. Magical "AI" will figure it all out for them.
@rootwyrm @scottwilson @hrbrmstr @trashpanda And unfortunately, the real engineers keep getting assured by the IT "experts" that AI is legit. And you can't blame them for falsely assuming that computer "engineers" are as legitimate as licensed PEs.
@badsamurai @hrbrmstr So you're saying the agent is British. That checks out.
@badsamurai @hrbrmstr @cR0w All fun and games till you say "wow, this weather's really cooking, isn't it?" or "man, it's boiling today"....