Conversation

I put together some detailed notes showing how I use Claude and ChatGPT as part of my daily workflow - in this case describing how I used them for a 6-minute side quest to create myself a GeoJSON map of the boundary of the Adirondack Park in upstate New York.
https://simonwillison.net/2024/Mar/22/claude-and-chatgpt-case-study/
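
As an aside: checking a boundary like that can be as simple as rendering it on a local map. A minimal sketch, assuming folium and a made-up file name:

```python
# Hypothetical sketch: eyeball a GeoJSON boundary on a local map.
import json
import folium

with open("adirondack_park.geojson") as f:  # made-up file name
    boundary = json.load(f)

m = folium.Map(location=[43.9, -74.3], zoom_start=8)  # roughly the Adirondacks
folium.GeoJson(boundary).add_to(m)
m.save("map.html")  # open in a browser and check the shape looks right
```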

I wrote this up in part because I'm tired of hearing people complain that LLMs aren't useful. There are many valid criticisms of them as a technology, but "not being useful" should not be one of them https://simonwillison.net/2024/Mar/22/claude-and-chatgpt-case-study/#llms-are-useful

@simon Agree. Two things can be true: they are useful and they have bad UX.

@simon Agree 💯 . People shouldn't confuse "not as easy to use as I expected" with "this thing doesn't work."

@simon I agree, and thank you for writing this up. I'll be sure to check it out. I'd like to explore more reasonable uses with proper use cases, but there is just so much messianic chest-thumping that it's hard to take it seriously. My rant a few days ago got yet another "just wait, AGI is just around the corner" silly reply.

I'm NOT saying you have any part in this. It's just that when the bullshit is this thick, it's a bit hard to concentrate.

@simon To most people, the idea of computers helping them is centred on needing ever less expert knowledge and experience, so anything that flips that into reverse is going to look useless. There are quite a few parallels with spectrum analysers. To use one productively you need to understand a lot, both about the machine and about your intentions; you can also use the machine incorrectly and get seriously misleading results. The difference is that spectrum analysers mostly stay in the lab: they don't have tech giants integrating them with your kitchen microwave oven (hey, it's RF!) the way LLM chatbots started appearing on shopping websites.

@simon I’m glad you wrote this. I posted a thread making a similar argument on Bluesky last weekend, because I couldn’t understand why people seemed so keen to downplay their usefulness. https://bsky.app/profile/olihawkins.bsky.social/post/3knv4xmryic2q

@simon I started out thinking they’re useful but untrustworthy, due to the verification effort.

Then I realised that their usefulness is largely a function of the effort it takes to create external oracles to verify truth.

Generating a quick image? Useful. Generating precise data around something I need to be right? Not useful.

@synx508 I've been using "chainsaw" as a comparison (a tool that's only useful if you know what you're doing, and dangerous otherwise) - but "spectrum analysers" is much better for illustrating the skill needed to get great results

@garyfleming Right - the trick with these things is figuring out how to use them productively despite their enormous reliability problems

I love using them for code because it's very easy to check if it works or not - it's much easier to check that code at least runs and produces what looks to be the right output than it is to fact-check prose
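
That checking loop can be as lightweight as a couple of asserts. A hypothetical sketch, where slugify stands in for whatever the model generated:

```python
# Hypothetical stand-in for LLM-generated code.
def slugify(text):
    return "-".join(text.lower().split())

# Cheap smoke tests - far quicker than fact-checking prose:
assert slugify("Adirondack Park") == "adirondack-park"
assert slugify("  spaced   out  ") == "spaced-out"
print("looks right")
```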

@simon I agree with your first paragraph and strongly disagree with your second.

My experience of watching people do reviews of PRs tells me almost no-one can tell if non-trivial code is correct if there aren’t good tests in place.

Plenty of other domains where LLMs work well, though - principally where output is subjective.

@simon @synx508 it's an excellent analogy — and it raises the question of why that tech has stayed in the lab and what would happen (to inferential processes, to public conceptions of spectrography, to pseudoscientific uses & abuses) if it came with every microwave & without any guidance. In this analogy, if spectral analysts are outnumbered by susceptible masses, might utility not end up net negative?

@garyfleming Right: part of using these for real work is being incredibly effective at reviewing code and writing tests, both of which are uncommon skills

But if you're knocking out a GeoJSON boundary of a park for fun (a very low-stakes activity) the risks are pretty minimal

@simon agreed - that task calls for neither precision nor objectivity, given the needs/stakes. Good use case for an LLM.

@dingemansemark @simon I think it'd be viewed as a superfluous feature on an oven, similar to the quadrature scopes that Marantz put on their FM tuners/receivers about half a century ago. There was a tiny arms race, too: these scopes were adopted by a few other hifi brands, and there was a sense that the manufacturers were competing to add the most bells and whistles rather than doing something that most people found useful (there's that word again).

@simon @filippo I think the main area where LLMs are useful is as a sounding board. They can point you in a direction for further research based on conversational discovery, assuming the model was trained on a large enough corpus.

To me, what would be really useful is a model that can reproduce its sources when producing a response. That way you're sent more immediately on a productive research journey from that initial sounding-board conversation.

@simon "and it was clearly wrong" - Here's my theory: LLMs are useful if results are easy to verify.

In your example, eyeballing can easily tell whether the resulting shape is _similar_ to the input area. As I understand it, your use case doesn't require much precision, which is totally fine, but it's worth asking how much harder your problem would get if you needed the input and output shapes to match precisely. Would you use an LLM to write some verification code? How do you decide whether that code is correct? (I think in this particular case verification would actually be pretty easy, but I wanted to stick with the example.)
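
One way such a precise check might look, as a hypothetical sketch using shapely (the file names are made up):

```python
# Hypothetical sketch: quantify how closely a generated boundary
# matches a reference shape via intersection-over-union.
import json
from shapely.geometry import shape

def load_geometry(path):
    """Load the first geometry from a GeoJSON file."""
    with open(path) as f:
        data = json.load(f)
    if data.get("type") == "FeatureCollection":
        data = data["features"][0]
    if data.get("type") == "Feature":
        data = data["geometry"]
    return shape(data)

reference = load_geometry("adirondack_reference.geojson")  # made-up file
generated = load_geometry("adirondack_llm.geojson")        # made-up file

iou = reference.intersection(generated).area / reference.union(generated).area
print(f"IoU: {iou:.3f}")  # 1.0 would be a perfect match
```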

@buherator absolutely: the reason LLMs are so useful for code stuff is that code accuracy is easier to verify than prose, because you can run the code and see if it works

And it's still not easy! Using LLMs has encouraged me to really invest in improving my QA, code review and testing skills

@simon Now let me put on my Grumpy Security Guy Hat:

Verifying code is incredibly hard. One of the main dangers of LLMs I see is that it's really easy to conclude that the code is correct because it works in the general case, while it wreaks havoc in edge cases. Worse, you won't be able to reason about those edge cases because you don't know how the code works (you can figure it out, of course, but then there goes your claimed efficiency).

Now, for toy problems this is all well and good. On the other hand, we've all seen toy scripts end up in production...
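
A hypothetical toy illustration of that failure mode:

```python
# Looks correct in the general case...
def average_latency(samples):
    return sum(samples) / len(samples)

print(average_latency([120, 80, 100]))  # 100.0 - seems fine

# ...but edge cases only surface later:
# average_latency([])          -> ZeroDivisionError
# average_latency([120, None]) -> TypeError inside sum()
```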

@buherator that's true, but honestly it's not that different from reviewing PRs from human authors

If anything LLM code is easier to review: it's more likely to use the simplest approach, it comes with comments that actually match what the code is doing, and there's no ego: you don't have to think for a second about whether your feedback or requests for changes will offend the author!

But you do have to be very good at code review to use these things responsibly
