The Generative AI Bubble is Going to Pop Thread

Soriak

Ars Legatus Legionis
12,492
Subscriptor
That post really needs some editing... but the two big premises I got to, before giving up, are entirely wrong: NVIDIA's stock price is not based on an expectation that it will increase GPU sales quarter over quarter. That's not how asset pricing works. And looking at revenue today when projecting the return on capital expenses also doesn't make any sense. You need to look at future revenue. If we look at various measures of AI capability, they increase with available computing power (roughly a linear increase in performance with an exponential growth in computing power). Right now, Cursor makes for a pretty decent programmer. If you give it 10x the computing power, it's very likely to be a pretty good programmer. 10x more computing power, and it could be a very good programmer.

These services are also clearly priced for trial use. Cursor Ultra at $200/month is expensive relative to Microsoft Office, but ridiculously cheap relative to an entry-level programmer, despite being more capable. The article notes that they changed their pricing model, and yes, it's gotten more expensive to use advanced models... but barely, and so what? You can't look at the cost of something without also looking at the value: Cursor with Sonnet in Max mode is better than what was available a year ago. If Cursor were a very good programmer, paying $100,000/month for a subscription would be a bargain for a company: it's not replacing one programmer -- it's replacing ten of them. Or a hundred. And what you then need is a small number of (very highly paid) software engineers who can manage an army of AI agents, instead of an army of programmers. We're likely already at the stage where a programmer + Cursor is more productive than a programmer with outsourced support in India for routine tasks. (There's a paper finding that programmers with AI are less productive than programmers without AI... and the mechanism is that programmers browse the web/social media while they wait for the AI to finish, and stay distracted longer than it takes the AI to code. Everything has a learning curve. Build in a beep when the task is done and fire programmers who spend all day on social media.)

The other obvious application area that the article is missing (as far as I can tell) is manufacturing. Factories already rely on robots for a lot of tasks, but it's very expensive to train them because you need to go through all potential issues that can arise and manually feed them into the training data. GenAI can let robots address issues they haven't encountered in their training data. If you can cut down on the lengthy and expensive training phase, and also use robots in cases that are less predictable, you can substantially increase automation across factories. For reference, there are roughly 500 million people employed in factories globally. How can someone go on for more than 14,000 words and not talk about manufacturing? In fact, the whole article seems to be about individual user subscriptions... that's obviously not where the money is going to be.
 

w00key

Ars Tribunus Angusticlavius
7,352
Subscriptor
It did highlight OpenAI's success in selling to end users, and Anthropic's. I am gladly paying for the latter, and if I run into limits too often, I'll expense the $100/$200 a month; that's 5 or 10 parking sessions. Surely a coder that writes me a few major enhancements and bug fixes for Nokia's Python TWAMP implementation (fork here) in an afternoon of messing around is worth that. Including documentation (--help) and a block in the readme showing how the few new command-line arguments work.

In my own projects I might match Claude's pace, but in a completely foreign project - I git cloned it and just started asking for enhancements - there's no way I can read and understand the code before Claude has already finished and is ready for testing, in like a minute. You still need to guide it with wisdom and insight, but it is an amazingly productive junior-level developer.


In this AI economy, you need to have a unique selling point. Claude Code is one, Gemini CLI is another - not as polished yet, but the underlying model is better, with a huge context limit. No idea what OpenAI is doing; Copilot seems to be received more negatively, and I don't have a sub, nor a reason to try it. The others - they had better do something well, and better than everyone else. It can be just carving out a niche and doing that really well, even if it is "just" sales and support, so your product works for their users, not against them.



The article is very negative about value-added providers. Sure, they spend 150% of revenue on tokens now, but have you seen the price war? Token prices at parity capability are crashing quickly. If you don't need the best-of-the-best, high-end models for coding, you might as well use the newest Gemini Flash or GPT 4.1 mini and save a ton of money. Or Qwen3 or Deepseek V3, R1 via Fireworks.ai. A restaurant menu recommendation bot or a chatbot with limited options, basically a replacement for "press 1 for service, 2 for sales, then please enter your service tag ID," doesn't need o3 or Opus.
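To make that concrete, here's a minimal sketch of that kind of tiered routing. The model names and per-million-token prices are placeholders I made up, not anyone's actual price list; the point is just that trivial requests can go to a model costing a tiny fraction of the frontier one:

```java
// Toy model router: trivial requests go to a cheap tier, anything that needs
// real reasoning goes to the frontier tier. Names and prices are made up.
public class ModelRouter {

    record Model(String name, double usdPerMillionOutputTokens) {}

    static final Model CHEAP = new Model("small-flash-model", 0.60);
    static final Model FRONTIER = new Model("frontier-reasoning-model", 60.00);

    static Model route(String task) {
        // Crude heuristic: menu recommendations, FAQ lookups and
        // "press 1 for service" flows stay on the cheap tier.
        String t = task.toLowerCase();
        boolean hard = t.contains("code") || t.contains("debug") || t.contains("refactor");
        return hard ? FRONTIER : CHEAP;
    }

    static double cost(Model m, long outputTokens) {
        return m.usdPerMillionOutputTokens() * outputTokens / 1_000_000.0;
    }

    public static void main(String[] args) {
        Model m = route("recommend a dish from this menu");
        // 2k output tokens on the cheap tier: ~$0.0012 vs ~$0.12 on the frontier tier.
        System.out.printf("%s -> $%.4f%n", m.name(), cost(m, 2_000));
    }
}
```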


One thing it got right: there's no telling how token sellers will ever cover their costs with tokens. Their costs now seem to be mostly overhead and training, and they need to sell a lot more tokens to shift things toward the part with a gross margin, where inference cost per token sold is below 100% of the price. That is going to be insanely hard, but not impossible.

But if the millions of tokens I used over the last 2 weeks are any indication, this explosion of usage is about to happen now that it's finally becoming useful. You can only spend so many tokens chatting, but reading code, making enhancements, running unit tests, interpreting results, finding issues, all that in a loop, scales usage up by many zeros.

Before "agents", my peak Kagi Assistant token usage was 50k a day, most days near zero. My Claude Code token usage during the first week was 55 million a day, so add three zeros at least, on the $20 plan where I was forced to take breaks because I ran out of quota. And that includes weekends and a day of near zero usage. That's why I probably need to bump it up to $100 a month. When I get better at this, make bigger plans and let it run longer autonomously, I can multitask with agents and get more done.
 

Soriak

Ars Legatus Legionis
12,492
Subscriptor
In the last week, I used 160 million tokens of claude-4-sonnet to program two React-based apps, which Cursor tells me has a value of $130 (although I think there is some bug in the edit tool, because usage is 5x to 10x higher than when I just ask it to give me the code and hit "apply" manually, so long-term usage will probably be less). In my experience, getting a programmer who understands the basics of what you need in a research platform and getting them to do just one of these things would easily be a few months and $10k. And it'd probably take more of my time to explain it to someone in enough detail that they can run off and do it vs. just iterating with AI and asking it to make changes.

It's like the early days of Uber when you could get a ride across town for $5 and they burned billions of dollars. Now, they're making $20bn profit in a year and are the dominant player in rideshare... and their real value proposition (the pricing algorithm) is going to become even more valuable with self-driving taxis, when it's all about getting the supply/demand right. For example, you want EVs to go charge not only when they are low on battery, but also when you predict demand will be lower, so that all cars are fully charged and on the road during peak demand time when you can also charge more. (Waymo and Uber have a partnership already.)
 

AndrewZ

Ars Legatus Legionis
11,645
It's like the early days of Uber when you could get a ride across town for $5 and they burned billions of dollars. Now, they're making $20bn profit in a year and are the dominant player in rideshare... and their real value proposition (the pricing algorithm) is going to become even more valuable with self-driving taxis, when it's all about getting the supply/demand right. For example, you want EVs to go charge not only when they are low on battery, but also when you predict demand will be lower, so that all cars are fully charged and on the road during peak demand time when you can also charge more. (Waymo and Uber have a partnership already.)
Except that there are 4+ competitors, and several more in China, so pricing goes waaaaay down.
 
  • Like
Reactions: Xenocrates

AndrewZ

Ars Legatus Legionis
11,645
For example, you want EVs to go charge not only when they are low on battery, but also when you predict demand will be lower, so that all cars are fully charged and on the road during peak demand time when you can also charge more. (Waymo and Uber have a partnership already.)
The recent Ars article on Waymo seems to indicate that most robotaxi rides occur during peak hours and that demand drops off sharply after that. So plenty of off-peak time to charge.
 

Soriak

Ars Legatus Legionis
12,492
Subscriptor
The recent Ars article on Waymo seems to indicate that most robotaxi rides occur during peak hours and that demand drops off sharply after that. So plenty of off-peak time to charge.
True for all rideshare. The challenge is to predict at the level of granularity where you can decide whether to send a car to the nearest charging station or in the opposite direction to another neighborhood. The obvious one is commuting time, which is pretty easy to forecast. Also things like people coming home from bars. But you also need some way to automatically know where the bars are (so the right cars are in the right place), ideally with some real-time data on capacity (like what Google Maps displays), and some data on local events (road closures, people calling taxis after a concert, etc.). Uber can send drivers notifications to go from their suburban location to a specific downtown area 30 minutes away (for a "bonus"), which means the guy who is expected to call an Uber 30 minutes later will have a car magically nearby. You still need this for self-driving cars.
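Purely as a toy illustration of that decision - nothing Uber or Waymo actually runs, and the thresholds and forecast numbers are invented - the per-car logic could be as simple as:

```java
// Toy dispatch rule for an idle robotaxi: charge now, reposition toward a zone
// where demand is forecast to spike, or stay put. All numbers are invented.
import java.util.Map;

public class DispatchSketch {

    record Car(String id, double batteryFraction, String zone) {}

    enum Action { CHARGE, REPOSITION, STAY }

    static Action decide(Car car, Map<String, Double> forecastRidesNextHour) {
        double localDemand = forecastRidesNextHour.getOrDefault(car.zone(), 0.0);
        double peakDemand = forecastRidesNextHour.values().stream()
                .mapToDouble(Double::doubleValue).max().orElse(0.0);

        // Battery too low to cover another hour of driving: charge regardless.
        if (car.batteryFraction() < 0.15) return Action.CHARGE;

        // Lull everywhere: top up now so the car is full for the next peak.
        if (peakDemand < 20.0 && car.batteryFraction() < 0.80) return Action.CHARGE;

        // Demand is concentrated in another zone: drive toward it.
        if (peakDemand > 2.0 * Math.max(localDemand, 1.0)) return Action.REPOSITION;

        return Action.STAY;
    }

    public static void main(String[] args) {
        Map<String, Double> forecast = Map.of("suburb", 5.0, "downtown", 60.0);
        // Prints REPOSITION: the downtown forecast dwarfs the local one.
        System.out.println(decide(new Car("car-1", 0.55, "suburb"), forecast));
    }
}
```

The hard part is the forecast feeding that map, not the rule itself.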

Except that there are 4+ competitiors, and several more in China, so pricing goes waaaaay down.
Great. I love lower prices. Eventually, these companies can start to compete on service quality: the equivalent of first class on a flight. I'd gladly pay 2x if my next hour-long drive came with a way to connect my laptop to a display so I can do some work on a longer commute. Right now, these drives are still super expensive. But in mainland China, an hour-long Didi ride costs about $30. Presumably, a self-driving car would be a lot closer to that.

That's going to have substantial ripple effects, including on house prices. An hour-long commute in a nicely equipped self-driving car is a breeze. I can have my coffee, get started on emails, and show up to the office ready to work. Driving myself for an hour every morning sounds like a nightmare.
 

hanser

Ars Legatus Legionis
42,256
Subscriptor++
In the last week, I used 160 million tokens of claude-4-sonnet to program two React-based apps, which Cursor tells me has a value of $130 (although I think there is some bug in the edit tool, because usage is 5x to 10x higher than when I just ask it to give me the code and hit "apply" manually, so long-term usage will probably be less). In my experience, getting a programmer who understands the basics of what you need in a research platform and getting them to do just one of these things would easily be a few months and $10k. And it'd probably take more of my time to explain it to someone in enough detail that they can run off and do it vs. just iterating with AI and asking it to make changes.
This falls apart very quickly when you have an existing thing that you need to modify, which is most of the stuff in the world.

I've been using Claude Code a lot this week... it, uh, often does stupid things in software that already exists. It cannot, as the olds would say, "work effectively with legacy code". Where "legacy code" is "thing that exists and has value to customers".

It's great for prototyping net-new stuff, though, for sure.

--

Revisiting my post from the first page...

The marginal costs per token generation will fall
Happening pretty rapidly. A page or two back I posted a link to a 300-page PDF with some nice graphs showing the drop in marginal costs over the last 5 years.

It's painfully obvious to me what the killer application is: unemploying humans
Probably still true, but it's not so likely to be existing, competent programmers. It actually seems like it'll be other members of the Professional Managerial Class. If you have a bullshit job, you'll be automated, probably.

--

Jevons Paradox does raise its fascinating head, though.

Like maybe the token providers' businesses will end up like airlines, telecoms, and other commoditized businesses. It seems that way. Were those hype bubbles indicative of a fundamental technology that didn't live up to its promise? Not really. Those technologies worked fine, but it was also a race to the bottom, and many investors in those technologies lost their shirts. But, like, the technology changed the world and benefitted just about everyone in a practical sense.

Some additional reads:


View: https://www.threads.com/@benedictevans/post/DKb_G9yOaqh?xmt=AQF0bousp2iYddOIWXtNVJJ2t1gD4H39XRBODqjfgL-fNQ



View: https://www.threads.com/@benedictevans/post/DFZbNVrx7Ea?xmt=AQF0bousp2iYddOIWXtNVJJ2t1gD4H39XRBODqjfgL-fNQ



View: https://www.threads.com/@carnage4life/post/DGYrG5cyqCK?xmt=AQF0bousp2iYddOIWXtNVJJ2t1gD4H39XRBODqjfgL-fNQ
 

w00key

Ars Tribunus Angusticlavius
7,352
Subscriptor
I've been using Claude Code a lot this week... it, uh, often does stupid things in software that already exists. It cannot, as the olds would say, "work effectively with legacy code". Where "legacy code" is "thing that exists and has value to customers".

It's great for prototyping net-new stuff, though, for sure.
I moved on to maintaining old shit. Like proper old: replacing JUnit 3.8.2, from May 2007, old enough to drink, and refactoring / migrating all usages to the latest.

Make a plan, exhaustive search for all x extends TestCase, prep the change and implement.

Seems to work well.
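For anyone who hasn't touched JUnit 3 in a while, the change being applied over and over is roughly this (PriceCalculator is just a made-up stand-in, not the real code):

```java
// Before: a JUnit 3-style test, subclass of junit.framework.TestCase,
// test methods found by the "test" naming convention.
import junit.framework.TestCase;

public class PriceCalculatorTest extends TestCase {
    private PriceCalculator calculator;

    @Override
    protected void setUp() {
        calculator = new PriceCalculator();
    }

    public void testAppliesDiscount() {
        assertEquals(90.0, calculator.withDiscount(100.0, 0.10), 0.001);
    }
}
```

```java
// After: the same test on JUnit 5 (Jupiter) - no inheritance, annotations
// instead of naming conventions, assertions imported from the Jupiter API.
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

class PriceCalculatorTest {
    private PriceCalculator calculator;

    @BeforeEach
    void setUp() {
        calculator = new PriceCalculator();
    }

    @Test
    void appliesDiscount() {
        assertEquals(90.0, calculator.withDiscount(100.0, 0.10), 0.001);
    }
}
```

The mechanical part is exactly what the tool grinds through; the judgment is in the odd tests that quietly relied on JUnit 3 behavior.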
 

dzid

Wise, Aged Ars Veteran
1,243
Subscriptor
And this also means that their cognitive workload will decrease, creating space for more valuable tasks.
In terms of its impact on existing markets and ways of working, I largely agree with you. What I'm less sure of is how well we will manage the transition to ensure that we utilize the freed-up cognitive capacity in productive ways.

I think this will need to be taught, preferably integrated into educational curriculums that balance both the potential benefits and pitfalls. Many of these we are, for better or worse, becoming aware of while the tech is rapidly evolving and heavily marketed.

Better to build awareness and address issues than pretend a transformation isn't happening, though.
 

ramases

Ars Tribunus Angusticlavius
8,232
Subscriptor++
I moved on to maintaining old shit. Like proper old: replacing JUnit 3.8.2, from May 2007, old enough to drink, and refactoring / migrating all usages to the latest.

Make a plan, exhaustive search for all x extends TestCase, prep the change and implement.

Seems to work well.

This is orthogonal to the business function, and hence is not impaired by an imperfect understanding of the business-function-to-technical-implementation mapping.

When you add new functionality to existing code, you more or less need to:

0) Understand the implementation (how is existing stuff done?)
1) Map-tech-to-business: Understand what business purpose was behind it (why does it exist? and why in that particular form?)
2) Business requirements reconciliation: You need to reconcile the purpose of what is there to what should be added
3) Map-business-to-tech: You need to define what the new implementation that serves the reconciled business requirements should look like
4) Implementation reconciliation: Existing implementation needs to be transformed into the implementation outlined in 3)

And that is one of the hardest things in professional software development. The inability to perform this combined reverse/forward translation between technology and business, and reconciliation between the different steps, is why many developers, however technologically competent they may be, never truly become senior or principal developers.
 

w00key

Ars Tribunus Angusticlavius
7,352
Subscriptor
This is orthogonal to the business function, and hence is not impaired by an imperfect understanding of the business-function-to-technical-implementation mapping.

When you add new functionality to existing code, you more or less need to:

0) Understand the implementation (how is existing stuff done?)
1) Map-tech-to-business: Understand what business purpose was behind it (why does it exist? and why in that particular form?)
2) Business requirements reconciliation: You need to reconcile the purpose of what is there to what should be added
3) Map-business-to-tech: You need to define what the new implementation that serves the reconciled business requirements should look like
4) Implementation reconciliation: Existing implementation needs to be transformed into the implementation outlined in 3)

And that is one of the hardest things in professional software development. The inability to perform this combined reverse/forward translation between technology and business, and reconciliation between the different steps, is why many developers, however technologically competent they may be, never truly become senior or principal developers.
Well, it's basically fancy autocomplete, so you should still be the architect. It's not for nothing that I compare it to an eager, never-tired junior. The things you list aren't tasks for juniors; trying to outsource them to an AI is rather silly.

Gemini Pro, though, is usually smarter; it can ingest books' worth of code with its 1M-token context (~100k LOC), plus documents, and answer questions about it. Asking it about the implementation, or to search for a bug in the haystack given hints, sometimes works better than expected. If not, eh, alright, I'll do it myself.

I wouldn't use Claude for that, yet. Gemini can write plan.md, with me giving it the hints and insights it misses, and Claude can implement it. Although steps 1 to 3 are compatible with Claude: investigating requirements, the people and use cases behind them, writing stories, checking for differences between stories.md and requirements.md, that it can do just fine.
 

dzid

Wise, Aged Ars Veteran
1,243
Subscriptor

Xenocrates

Ars Tribunus Militum
2,192
Subscriptor++
  • Like
  • Angry
Reactions: bjn and dzid

dzid

Wise, Aged Ars Veteran
1,243
Subscriptor
What do you want to bet that they also will keep any revenue related to those AI generated songs, rather than paying out royalties to the artists they're listed under?
I'm betting the company's default position will be "sue us".

Yes, sue them. Whatever it takes to inflict the maximum financial pain on the company's leadership and shareholders and/or investors. Nothing else stands a chance of changing their behavior. Not shame, not a basic sense of decency. Deprive them of money, in every way conceivable.
 
  • Like
Reactions: bjn

Dmytry

Ars Legatus Legionis
11,044
Didn't chatgpt already pretty much talk someone into going after Sam Altman, of all people?

I think, as far as the bubble goes, there's absolutely nothing an LLM can do that would live up to the expectations management is getting after the Psycho (Przemysław Dębiak) vs. AI programming contest thing the other week.

(Honestly, I feel that if any of these expectations of how much that can generalize were true, in combination with the above, we'd be living in "I Have No Mouth, and I Must Scream" right now.)

I did some competitive programming a long time ago, and there are just so many devils in the details with that sort of thing that outsiders just wouldn't realize.

Firstly, there is a simulator provided to contestants, with the sources. Meaning that, at least in principle, these contests are winnable without great natural language capability (which would be needed for understanding their hard-to-read descriptions). Some generic ML self-play thing that examines the state of the simulator might be able to beat it a good fraction of the time.

Second is the format. That the problem is NP-hard doesn't make the contest any harder or easier. If anything, it saves you from looking for polynomial solutions. There are only 10 hours, and the state of the art for such problems is doing something like AlphaFold, which you just can't do in 10 hours on a shitty laptop.

Which brings me to the third point: "AI" has already moved these problems well past the point of humans having to directly invent heuristics.

And of course, in the real world such problems are rare, but when they do occur they are usually a very big deal and command thousands of hours of work standing on the shoulders of hundreds of thousands of hours of work. Relative to the sheer time expenditure of real work and the familiarity of experts with the problem, this is like 10-second chess on a randomized board vs. a normal chess match.

(I was even in a contest with Psycho once, a long time ago; I got 5th place and he got 8th.)

edit: to bring the point back to the "bubble", what makes it a bubble is that stunts like this create extremely over-inflated expectations. So inflated that any magic which could meet said expectations would end the existence of things like "stocks" altogether. Trading on these expectations is like making a humorous bet that the first nuclear bomb would ignite the atmosphere.
 
Last edited:
  • Like
Reactions: bjn and dzid

w00key

Ars Tribunus Angusticlavius
7,352
Subscriptor
Hmm, I think token sellers may be onto something. Last weekend it saved me serious time, just by asking pointed questions and pasting in a few links for Gemini in Deep Research mode to look at.

I had a very, very specific question that even the realtor didn't know off the top of her head. Here is the huge page with the zoning rules for this municipality, and this is the link to the rules governing how things must look (height restrictions, flat vs. pointy roofs, etc.); figure it out for this exact plot.

It wasn't a single-turn great success, but with each Q&A I got closer and figured it out "together" in record time. That was... impressive.

It was also a bit more stubborn than Gemini in CLI mode: no constant "apologies, my bad", a few "no, I'm right, see x", and "yes, I already included that in the time estimate". Which is great; I want the truth, not ass-kissing even when I'm wrong.


The economics of investing billions into something that seems interchangeable are still questionable. Claude also did alright: it helped with drafting an email and keeping a to-do list so I don't skip a point, but for the huge read commands, I don't trust it not to run out of context yet. If they all get to the point of good enough, I wonder what makes one unique vs. another. I see zero reason to buy OpenAI's offering either; these two are plenty smart, and it may become "whoever is the cheapest" or "bundled", a la Teams in Office 365, in a year.
 

Soriak

Ars Legatus Legionis
12,492
Subscriptor
It's possible that the models will all become commoditized and it just doesn't matter which one you use. But for now, I'm skeptical. Claude is better at coding than ChatGPT. ChatGPT is really good at researching and aggregating strengths and weaknesses of different options (and has an amazing Agent mode). Gemini's Veo 3/Flow is by far the best video generator.

It's possible that "being amazing at everything at the same time" simply is not possible, and training data that makes you good at one thing could make you worse at something else. In that case, we should expect to see an ecosystem of different models that are specialized on different skills. Definitely true with humans: high-performing academics are some of the least practical people in the world, whereas "practical" people don't make for innovative researchers. Same with some innovators/start-up founders: they have characteristics that make them really good at the one thing they're doing, but if you staffed a company with people like that, it would fail miserably.

Companies could also compete on how they fine-tune their training data. For example, companies have been hiring people to look at AI responses and vote for the ones they thought were better. That's how we ended up with some sycophantic models: raters had a preference for those messages that actual users don't want (or at least not the users who post online about how bad those models were). On the other hand, people naturally suffer from confirmation bias, and so anything based on these kinds of ratings is not likely to optimize for accuracy. Then you have the same problem as with news: people say they want accuracy, but they sort into belief-confirming news (there's a reason we don't just have one news channel that objectively reports everything). So you can have models that cater to different audiences. Some models could also be fine-tuned by domain experts, which would be extremely expensive, but also allow for very profitable niche deployments.
 
It's possible that the models will all become commoditized and it just doesn't matter which one you use. But for now, I'm skeptical. Claude is better at coding than ChatGPT. ChatGPT is really good at researching and aggregating strengths and weaknesses of different options (and has an amazing Agent mode). Gemini's Veo 3/Flow is by far the best video generator.
Can you expect that the current Claude will be better than the next GPT, or vice versa? After OpenAI ups the game to claim programming contest wins with just the AI by itself?

Or from another angle. Did Anthropic convince you not to hire people who don't use specifically Claude for coding? I don't think so; therefore it has roughly equivalent performance at the task that LLMs actually do perform well at (convincing you).

I think the models all have the same strength, convincing people. With the outcomes landing on the spectrum from typical "meh" reaction to attempting to marry it. This growth curve plateaus as they reach all the susceptible people.

As far as performing well at the actual tasks in question goes, that's the many-trillion-dollar question. Supplementing a worker for $200/month is small peanuts. Replacing a worker for half the salary is what justifies current valuations. It is the latter that they are going for with things like that programming contest, where to all outward appearances the AI coded a solution from a description, without having to be hand-held at all, and did it better than just about everyone except that one Polish guy who's really good at it.

The valuations are driven by an expectation that one of them will actually create a, say, $50,000/year programmer that is almost as good as hiring that Polish guy at >$200,000/year. They are not driven by "maybe Claude will be the best one for coding assistance even though Gemini scored best on benchmarks and OpenAI scored best on some shit they themselves sponsored".

There are a lot of problems with the notion of coding assistance. All I see people talk up is it doing things that they wouldn't be doing otherwise (and likely not to a production-level standard). When I look at people who I know to be decent programmers, I see an overwhelmingly meh reception - "I used it for refactoring, which was neat, but I am not sure it saved me any time because it broke a few things" or even "I used it for architecture and then I ended up looking all that up anyway" (where it competes with reading some of the same material it trained on).
 
Last edited:
  • Like
Reactions: bjn

Soriak

Ars Legatus Legionis
12,492
Subscriptor
Can you expect that the current Claude will be better than the next GPT, or vice versa? After OpenAI ups the game to claim programming contest wins with just the AI by itself?
I'm pretty confident next GPT will be better than current Claude, but next GPT vs. next Claude -- who knows? Opus is a beast, but way too expensive for me to use day-to-day. If the cost of that came down (by a LOT), I would already have a performance improvement, so no need to speculate about what's possible. I use o3-pro frequently, but not for programming.

Or from another angle. Did Anthropic convince you not to hire people who don't use specifically Claude for coding? I don't think so; therefore it has roughly equivalent performance at the task that LLMs actually do perform well at (convincing you).
I don't think any hiring requirement would involve experience with a specific AI because the skill of using AI is transferrable (but actually hard to acquire). It'd be like not hiring someone with extensive Google Sheet experience because your company runs on Excel... the difference between the two is trivial and easily learned, but knowing how to use a spreadsheet matters.

I think the models all have the same strength, convincing people. With the outcomes landing on the spectrum from typical "meh" reaction to attempting to marry it. This growth curve plateaus as they reach all the susceptible people.
Really depends on what you use them for. If you have random conversations with AI about everyday topics... I don't know, and don't care. But I can tell if one AI model writes code that works and one AI model writes code that introduces new bugs. (The problem with "non-existent functions" to me seems solved with the use of an MCP server like Context7. No idea why this ever seemed like a huge problem: you can just feed documentation to AI, and they're literally designed to process text. Similarly, you can have AI document your codebase for another AI agent: way too long/detailed for a human, but AI will process it in a second. Makes good documentation more important, but fortunately, AI can also help with that.) I can tell if one model can clearly walk me through different analysis plans. And I can watch the videos and images that come out of these tools. I'm not claiming some objective benchmark of what's "good," but for various tasks, doing them with one model just takes less time and leads to better output than using a different model. If the "best" model changes, I don't have a problem with switching.

As far as performing well at the actual tasks in question goes, that's the many-trillion-dollar question. Supplementing a worker for $200/month is small peanuts. Replacing a worker for half the salary is what justifies current valuations. It is the latter that they are going for with things like that programming contest, where to all outward appearances the AI coded a solution from a description, without having to be hand-held at all, and did it better than just about everyone except that one Polish guy who's really good at it.

The valuations are driven by an expectation that one of them will actually create a, say, $50,000/year programmer that is almost as good as hiring that Polish guy at >$200,000/year. They are not driven by "maybe Claude will be the best one for coding assistance even though Gemini scored best on benchmarks and OpenAI scored best on some shit they themselves sponsored".
Yes, except there's no reason to be cheaper than human labor. The Polish guy who gets paid $200k/year costs the company $400k/year with benefits, and he works 40 hours per week. The AI that replaces him and works 24/7 (roughly four times the hours) is worth $1.6m/year. No need to offer stock options, no extensive onboarding process, no risk that it takes trade secrets to a competitor who makes a better offer, and it gets better with a software update rather than extensive training. So $1.6m is a floor, not a ceiling. And in reality, that programmer is probably making at least $400k, so you can double the figure.

There are a lot of problems with the notion of coding assistance. All I see people talk up is it doing things that they wouldn't be doing otherwise (and likely not to a production-level standard). When I look at people who I know to be decent programmers, I see an overwhelmingly meh reception - "I used it for refactoring, which was neat, but I am not sure it saved me any time because it broke a few things" or even "I used it for architecture and then I ended up looking all that up anyway" (where it competes with reading some of the same material it trained on).
People aren't talking about internal uses publicly because at this point, that's a competitive advantage -- why would they disclose publicly how you can copy them? My success rate for research grant applications over the last year has been 100%, and I sure as hell don't tell other people how they can beat me on the next application. And if I don't do this, people employed by companies that compete for multi-billion-dollar markets aren't going to be sharing either. People are sharing toy examples to try and signal that they have some expertise with this.
 
I'm pretty confident next GPT will be better than current Claude, but next GPT vs. next Claude -- who knows? Opus is a beast, but way too expensive for me to use day-to-day. If the cost of that came down (by a LOT), I would already have a performance improvement, so no need to speculate about what's possible. I use o3-pro frequently, but not for programming.


I don't think any hiring requirement would involve experience with a specific AI because the skill of using AI is transferrable (but actually hard to acquire). It'd be like not hiring someone with extensive Google Sheet experience because your company runs on Excel... the difference between the two is trivial and easily learned, but knowing how to use a spreadsheet matters.
I'm talking about AI's performance at persuading people, e.g. persuading you to compare it to spreadsheets already, even though it was utterly and obviously counter-productive until sometime this year (and might still be counter-productive in a proper randomized trial).

As far as skills in using it go... I think the most important skill right now is accurate determination of the bounds of its effective use (also see the +20% / -19% study). It's rather unusual for a tool - you don't get handed a hammer that says "screws, bolts, nuts, electrical connections, ..." and have to figure out you need to use it on nails.

In the longer term, I think it's a bit like a horse using a steam engine. If this thing actually produces net mechanical power, then the horse gets replaced soon. If it's not then it gets pared down to a wheeled carriage with the dead-weight removed.

The trillion-dollar question of LLMs is whether their trillions of parameters are just a form of storage, or whether there's some form of intelligence being built on that substrate which needs those matrix multiplications to "think" and those parameters to store "how to think". If it's the former, as many (e.g. Yann LeCun) suspect, then extremely large models will not last as the state-of-the-art solution.

Really depends on what you use them for. If you have random conversations with AI about everyday topics... I don't know, and don't care. But I can tell if one AI model writes code that works and one AI model writes code that introduces new bugs. (The problem with "non-existent functions" to me seems solved with the use of an MCP server like Context7. No idea why this ever seemed like a huge problem: you can just feed documentation to AI, and they're literally designed to process text. Similarly, you can have AI document your codebase for another AI agent: way too long/detailed for a human, but AI will process it in a second. Makes good documentation more important, but fortunately, AI can also help with that.) I can tell if one model can clearly walk me through different analysis plans. And I can watch the videos and images that come out of these tools. I'm not claiming some objective benchmark of what's "good," but for various tasks, doing them with one model just takes less time and leads to better output than using a different model. If the "best" model changes, I don't have a problem with switching.
The field is moving so fast that even small models are ahead of where huge models were a year ago. This seems quite commoditized to me.

Yes, except there's no reason to be cheaper than human labor. The Polish guy who gets paid $200k/year costs the company $400k/year with benefits, and he works 40 hours per week. The AI that replaces him and works 24/7 (roughly four times the hours) is worth $1.6m/year. No need to offer stock options, no extensive onboarding process, no risk that it takes trade secrets to a competitor who makes a better offer, and it gets better with a software update rather than extensive training. So $1.6m is a floor, not a ceiling.
This is precisely why I say it's a bubble. The sort of magic that could live up to the expectations you just laid out would go full singularity.

The real thing: if you ran the contest for a week or a month, restricted Psycho and the other contestants to 40 hours a week, and let the AI do the full 168 hours a week... what would actually happen is that the AI's result would plateau, somewhere well below the state of the art for that sort of problem. People would keep climbing toward the state of the art and then surpass it (probably not until after months).

It's not even so much that AI can't possibly do that in principle (although I suspect LLMs can't). It's that the AI which can do that - whose time actually adds up like this, without human involvement - would end things like stock markets pretty soon.

And nobody would be selling access to it; Microsoft would just do the Microsoft-in-the-1990s thing all over again and take over the entire software market. And the entire "things that can be done with software" market. And the entire "things that can be done with robots they made" market.

And in reality, that programmer is probably making at least $400k, so you can double the figure.
I'm going with the world market for the figure. Most people don't want to move to Silicon Valley or some such to max out their salaries. And I'm assuming we are talking about AI as an individual contributor, while programmers in their 40s typically get paid a lot for their value apart from IC work.

People aren't talking about internal uses publicly because at this point, that's a competitive advantage -- why would they disclose publicly how you can copy them? My success rate for research grant applications over the last year has been 100%, and I sure as hell don't tell other people how they can beat me on the next application. And if I don't do this, people employed by companies that compete for multi-billion-dollar markets aren't going to be sharing either. People are sharing toy examples to try and signal that they have some expertise with this.
Well, the ante-AI state was that people were signaling their fitness to actually make good use of grant money by writing grant applications. AI is subverting this at the moment, even though I'm sure you think that better use of AI entitles you to the grant money.
 

dzid

Wise, Aged Ars Veteran
1,243
Subscriptor
Two AI stories from the last couple of days; I'm having a hard time deciding which is more disturbing:

People Are Becoming "Sloppers" Who Have to Ask AI Before They Do Anything

I suppose we already knew some segment of the population were going to become enthusiastic early adopters and probable casualties.

"... people who constantly use ChatGPT to do virtually anything have garnered the moniker of "sloppers."

For instance, Foster pointed to one viral video in which a man recounted going on a first date with a woman — but was surprised when she pulled out ChatGPT on her phone and asked it what she should order off the menu. ("There was no second date," he concluded.)
Of course, there are a few popular names for the same activity. They include:
  • 'botlicker'
  • 'second hand thinker' (this one sort of appeals to me)

Now on to the second, and I think decidedly more dystopian headline:

Man in Prison Gets Hired as Software Engineer at Silicon Valley Startup, Works Every Day From Cell

Now, I know these stories can be framed as those of redemption, but come on: this guy in his early 20s got a 15 to 30 year sentence for flipping some dark web drugs. Not a shining pillar of the community, maybe, but 15 to 30 is America's way to keep the private prison business raking in money.

The state wants a cut, too, though: they take 10% of his salary.

One could say it's the ultimate form of remote work. It makes me deeply uncomfortable.
 

dzid

Wise, Aged Ars Veteran
1,243
Subscriptor
This is a link to a post I made in the Observatory thread, where I normally post things about recent research I find of interest. The reason I'm referencing it here is that it got me thinking about how LLM research is sucking all the oxygen out of the room while other AI/ML-related projects or research get very little attention (like this one about its use in investigating how the brain's hippocampus categorizes object information as an optimization strategy when integrating object, spatial, and temporal information before/during memory storage).

I also comment a bit about how brute-force scaling appears to be a loser in terms of advancing the technology in any significant way. That may be OK for techbros trying to outdo one another in the big AI 'land grab', but for most of us and the environment, it sucks.

I've thought for a while now that if we're serious about really advancing AI, one part of that process has to be learning more about how our own brains work. And not only does research like this show us that more traditional non-LLM ML strategies have a place in subject-focused areas, the results could potentially feed into ongoing AI research.

Would that be a slower, more incremental strategy? Sure. But I also think that it would be to our (most people, anyway) benefit that reckless startup founders with dollar signs in their eyes had the brakes put on their operations. Any input or criticism is appreciated.
 
  • Like
Reactions: bjn

Skoop

Ars Legatus Legionis
32,884
Moderator
/// OFFICIAL MODERATION NOTICE ///

This is not a thread for AI news aggregation.

There's a technical thread in Programmer's Symposium for code talk. This thread is limited strictly to the economics and business model of the AI industry. That is narrow to make it appropriate for The Boardroom.

This other stuff is pretty much Lounge. Keep it topical, per piacere.
 
  • Like
Reactions: Pino90

dzid

Wise, Aged Ars Veteran
1,243
Subscriptor
/// OFFICIAL MODERATION NOTICE ///

This is not a thread for AI news aggregation.

There's a technical thread in Programmer's Symposium for code talk. This thread is limited strictly to the economics and business model of the AI industry. That is narrow to make it appropriate for The Boardroom.

This other stuff is pretty much Lounge. Keep it topical, per piacere.
Got it. My apologies. To be clear on my most recent post: is that acceptable in this thread? The research angle probably isn't, but I wanted to highlight the focused ML business opportunities getting sidelined or overshadowed, as well as what seems to me a dead end for LLM scaling.
 
I think ultimately the technical details about LLMs are no longer even relevant (edit: that is, to the bubble / finances). I'll contrast it with the VR / "AR" bubble to explain why.

VR headset adoption could have met expectations, in some alternate universe where people crave having a screen strapped to their face. And had it met those expectations, we'd just have a multi billion dollar VR industry, in an otherwise recognizable world with stock markets.

AI expectations are about the technology itself, and the expectations are so high that any technology which meets them would rapidly result in "singularity" straight from science fiction (from which the expectations derive).

The middle-of-the-road vision - no singularity, it's just a very useful tool - doesn't work out either. It is predicated on a wilful suspension of all understanding of how markets work. Compiler vendors have never managed to capture even a tiny fraction of the highly paid programmer's time saved by not hand-writing assembly. Automation of a formerly intellectual task simply devalues the task to nearly nothing - that is how it has always been. And yet for AI, enthusiastic supporters are driven to calculations that yield a huge multiple of what a programmer is paid a year.

Even taking OpenAI's contest performance entirely uncritically at face value (highly dubious), there is almost no money in that. A compiler vendor, too, could sponsor a contest where the top assembly optimization folks from AMD or NVidia (or, if it were in the past, Intel) have to translate a 2,000-line C++ program to assembly in 10 hours (if they win against all odds, just make it 3,000). And the compiler would beat them fair and square, without any question about its ability to generalize, in literally a fraction of a second. And you can make absolutely diddly squat from selling an optimizing compiler.
 
Last edited:

dzid

Wise, Aged Ars Veteran
1,243
Subscriptor
I think ultimately the technical details about LLMs are no longer even relevant. I'll contrast it with the VR / "AR" bubble to explain why.

VR headset adoption could have met expectations, in some alternate universe where people crave having a screen strapped to their face. And had it met those expectations, we'd just have a multi billion dollar VR industry, in an otherwise recognizable world with stock markets.

AI expectations are about the technology itself, and the expectations are so high that any technology which meets them would rapidly result in "singularity" straight from science fiction (from which the expectations derive).

The middle-of-the-road vision - no singularity, it's just a very useful tool - doesn't work out either. It is predicated on a wilful suspension of all understanding of how markets work. Compiler vendors have never managed to capture even a tiny fraction of the highly paid programmer's time saved by not hand-writing assembly. And yet for AI, enthusiastic supporters are driven to calculations that yield a huge multiple of what a programmer is paid a year.

Even taking OpenAI's contest performance entirely uncritically at face value (highly dubious), there is almost no money in that. A compiler vendor, too, could sponsor a contest where the top assembly optimization folks from AMD or NVidia (or, if it were in the past, Intel) have to translate a 2,000-line C++ program to assembly in 10 hours (if they win against all odds, just make it 3,000). And the compiler would beat them fair and square without any question about its ability to generalize. And you can make absolutely diddly squat from selling an optimizing compiler.
Given the AI companies and their financial backers have staked so much on this vision, though, which seems so limited technologically, you're saying they're committed to the path they've started down with its associated hype train driving as much investment as they can eke out of it?

If they've no intention of exploring other ways to improve the technology, what's stopping the Chinese or anyone else from doing so? I feel a sense of deja vu, because I feel much the same as I did when social media had run its initial explosive course: ok, this is clearly an immature technology with a poorly developed business ecosystem (for the public, anyway). We should spend some time doing further development and work on deploying a more mature product in a year or two.

But that of course didn't happen then, and it's not going to happen this time, I suppose.
 
Given the AI companies and their financial backers have staked so much on this vision, though, which seems so limited technologically, you're saying they're committed to the path they've started down with its associated hype train driving as much investment as they can eke out of it?

If they've no intention of exploring other ways to improve the technology, what's stopping the Chinese or anyone else from doing so?
Presumably the same things that prevented anyone from building Terminator-esque Skynet back in the 1980s. The whole notion may be silly. We don't know how to do it. And it is quite likely we simply don't have the computational power to pull it off, just as we didn't back in the 1980s.

They are trying to loosely approximate human brains with multiply-adds. Except human brains do orders of magnitude more of those multiply-adds a second. From the entire past history of computing we know that various underpowered approximations only outperform humans in very limited domains.

I feel a sense of deja vu, because I feel much the same as I did when social media had run its initial explosive course: ok, this is clearly an immature technology with a poorly developed business ecosystem (for the public, anyway). We should spend some time doing further development and work on deploying a more mature product in a year or two.

But that of course didn't happen then, and it's not going to happen this time, I suppose.
I think it's rather different from social media as well. Social media was built entirely on proven technology - internet, databases, etc. It wasn't a technological project at all. And it worked, to the extent that people actually were into it (the way people were not into VR/AR).

LLMs are very different in that regard.

Picture a very strange gold rush where large, multi-trillion-dollar-valuation gold mining conglomerates are buying very expensive shovels and digging holes, and then selling the holes at a loss, with the business model predicated on the idea that the greater fool, I mean, greater genius, will figure out how to extract value from the holes, buy more holes, and create an enormous demand for holes at last, so that the holes can then be sold at a profit.
 
Last edited:

dzid

Wise, Aged Ars Veteran
1,243
Subscriptor
Presumably the same things that prevented anyone from building Terminator-esque Skynet back in the 1980s. The whole notion may be silly. We don't know how to do it. And it is quite likely we simply don't have the computational power to pull it off, just as we didn't back in the 1980s.

They are trying to loosely approximate human brains with multiply-adds. Except human brains do orders of magnitude more of those multiply-adds a second.


I think it's rather different from social media as well. Social media was built entirely on proven technology - internet, databases, etc. It wasn't a technological project at all. And it worked, to the extent that people actually were into it (the way people were not into VR/AR).

LLMs are very different in that regard.

Picture a very strange gold rush where large, multi-trillion-dollar-valuation gold mining conglomerates are buying very expensive shovels and digging holes, and then selling the holes at a loss, with the business model predicated on the idea that the greater fool, I mean, greater genius, will figure out how to extract value from the holes and will buy more holes.
It's not that I'm comparing the technologies directly. It's that both of them have potential, and a great deal of that potential I feel has been squandered or at least not realized in the case of social media, and in fact has been detrimental to society in some ways.

I'm now getting the same sort of feeling with LLMs. A singular focus on grabbing market share with little regard for the myriad ways the technology is not yet ready, and no real efforts to build up new capability using, among other things, knowledge of our own brains, just brute-force scaling that cannot address underlying shortcomings.

The greater fool model you describe, I can understand but not respect.
 
  • Like
Reactions: blindbear

w00key

Ars Tribunus Angusticlavius
7,352
Subscriptor
What I see above is a bunch of blahblah from someone who hasn't invested time and a tiny bit of money, sub $200, into figuring out what the state of the art is in AI right now.

Do that and you will see enough niches to justify investments. For office work, it will be just as necessary as an Office 365 or Google Workspace subscription. For development work, even at current level of capability, it is easily worth $20-200 a month depending on how much you work with it.

For the last claim, see the thread in the programmer's symposium. We are all still getting used to these new tools, from multiple vendors, and not everything is a success, but given realistic expectations and some skill from the one driving the bus, it's a major time saver and significantly improves output quality. Most of us are still in the first month of usage and it's a paradigm change, much bigger than a new programming language or IDE; of course day/week 1 isn't all sunshine and rainbows.


But is $100/month/developer going to cover the investments, or is this an absurd overinvestment like laying fiber around 2000, so much that even 20 years later there is zero reason (except on the ocean floor) to add new fiber? That remains to be seen.
 
What I see above is a bunch of blahblah from someone who hasn't invested time and a tiny bit of money, sub $200, into figuring out what the state of the art is in AI right now.

Do that and you will see enough niches to justify investments. For office work, it will be just as necessary as an Office 365 or Google Workspace subscription. For development work, even at current level of capability, it is easily worth $20-200 a month depending on how much you work with it.
I was primarily responding to Soriak's 1.6 million dollars per year style of estimation.

$20 to maybe $200 per month is far more realistic, but it is too little in comparison to the money being spent on the bubble.

And ultimately - how much do you pay per month for your compiler? Your IDE? Your syntax highlighting? And a compiler clearly saves a lot of labor compared to writing in assembly.

This is an AI bubble thread, financials and such.

For the last claim, see the thread in the programmer's symposium. We are all still getting used to these new tools, from multiple vendors, and not everything is a success, but given realistic expectations and some skill from the one driving the bus, it's a major time saver and significantly improves output quality. Most of us are still in the first month of usage and it's a paradigm change, much bigger than a new programming language or IDE; of course day/week 1 isn't all sunshine and rainbows.
Let's just pretend for the sake of argument that I agreed that it is as effective as not writing in assembly (I disagree, and for that matter I doubt you believe that it is quite as significant as not writing in assembly, but it isn't relevant).

The effectiveness of not writing in assembly did not create a trillion-dollar compiler industry. That not writing in assembly could make a $200,000/year programmer 5x or more productive did not mean that there was some million-dollar-per-developer-per-year surplus that compiler vendors could capture.

And yet the AI bubble is predicated on that imagined surplus.
But is $100/month/developer going to cover the investments, or is this an absurd overinvestment like laying fiber around 2000, so much that even 20 years later there is zero reason (except on the ocean floor) to add new fiber? That remains to be seen.
2 million developers paying $100 per month is only $2.4 billion per year. Minus expenses. That does not come remotely close to justifying the investments (high tens of billions of dollars).
 

w00key

Ars Tribunus Angusticlavius
7,352
Subscriptor
And ultimately - how much do you pay per month for your compiler? Your IDE? Your syntax highlighting? And a compiler clearly saves a lot of labor compared to writing in assembly.
It depends on the offering. Right now, around zero, for the tools I use. If I were a serious C/C++ developer for the Windows platform, a VS Pro or Enterprise license at $100-500 a month would be an investment.

Adobe all product pack at $70 is another subscription I have.

If Claude or Gemini asks for $200 and it saves 10 hours of time, it's a 3x+ ROI, and that's fine.


2 million developers paying $100 per month is only $2.4 billion per year. Minus expenses. That does not come remotely close to justifying the investments (high tens of billions of dollars).

It's ~20M developers x ???; it can be $200, can be $2,000. Who knows. People pay for VS Enterprise and the most enterprisey SaaS tools; there's a market for everything.

People on the other thread are further along than I am, orchestrating / directing multiple agents at once, that scales up usage and revenue per developer.


And it's not just plain developers but also every office worker, which is a much better group. For those, it isn't demonstrated how much faster you work, and I have no first-hand experience with it, so I will refrain from wild guesses. But even the TAM of developer tools is large enough to become a serious industry on its own.

Just a back-of-the-envelope calculation: 20 million users x $200 a month = a $48B-a-year industry. And there's no saying that's the cap; people regularly run out of tokens on Claude Max 20x or switch to API pay-as-you-go (PAYG) billing for a 10x+ higher bill, with no limits as long as your card authorizes.
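As a quick sensitivity check on that number (the user counts and prices below are just illustrative points around the figures in this thread):

Code:
# Annual industry size for a few illustrative (users, price/month) combinations.
for users_millions in (2, 20, 50):          # millions of paying users, illustrative
    for price in (20, 100, 200):            # USD per month, illustrative
        annual_billions = users_millions * 1e6 * price * 12 / 1e9
        print(f"{users_millions:>3}M users x ${price:>3}/mo -> ${annual_billions:>6,.1f}B/year")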


https://thenewstack.io/claude-code-...opic-launches-enterprise-analytics-dashboard/ shows some Enterprise-only features, and that's PAYG only, per token. With my usage, it would be $500/month worth of revenue unless I got a big discount.
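For a sense of how a per-token PAYG bill gets to numbers like that, a sketch; the token volumes and per-million-token prices are placeholders I picked for illustration, not Anthropic's actual rates or my real usage:

Code:
# Rough monthly PAYG estimate from token usage.
# All volumes and prices below are placeholder assumptions for illustration.
input_tokens_per_day = 5_000_000      # assumed
output_tokens_per_day = 500_000       # assumed
price_in_per_million = 3.00           # USD, hypothetical rate
price_out_per_million = 15.00         # USD, hypothetical rate
working_days = 22

daily_cost = (input_tokens_per_day / 1e6) * price_in_per_million \
           + (output_tokens_per_day / 1e6) * price_out_per_million
print(f"Estimated monthly bill: ${daily_cost * working_days:,.0f}")   # ~ $495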
 
Last edited:
  • Like
Reactions: Pino90

Pino90

Ars Scholae Palatinae
1,045
Subscriptor
Just a back-of-the-envelope calculation: 20 million users x $200 a month = a $48B-a-year industry. And there's no saying that's the cap; people regularly run out of tokens on Claude Max 20x or switch to API pay-as-you-go (PAYG) billing for a 10x+ higher bill, with no limits as long as your card authorizes.
Not only do I burn through tokens, but I'd happily pay more for increased limits. If there were a €500 + VAT plan with double the tokens, I'd buy it immediately and probably upgrade all our licenses.

I don't think I'm the only one limited by token availability.

Put another way: even if the current Max plan cost €1,000 (a 5x price increase), it would still be incredibly cost-effective.

It performs like a junior developer but faster and better, requiring fewer iterations and less communication. It needs no mentoring, creates no managerial overhead, requires no meetings, and has no bad days or holidays.

We use AI tools for meeting notes, summaries, product manuals, and user documentation. I haven't written an email in months. I simply tell Claude what I want to communicate, and it handles it professionally. Then there's the actual coding, which I have documented extensively in the other threads (while true and adventures coding with AI).

It saves so much time for actual useful work.

The market potential is enormous, and these tools are becoming remarkably capable. I think the technology has matured faster than pricing models have adjusted. This temporary market inefficiency likely won't persist as AI companies optimize their revenue strategies.
 

Pino90

Ars Scholae Palatinae
1,045
Subscriptor
What I see above is a bunch of blahblah from someone who hasn't invested time and a tiny bit of money, sub $200, into figuring out what the state of the art is in AI right now.
This is so true. People formed opinions based on the first mediocre versions of ChatGPT and keep repeating the same talking points as if the technology hasn't evolved.

But it has evolved. It's become dramatically better and much cheaper.

Yet there are still people on their high horses delivering verdicts while the world is rapidly changing right before their eyes, completely unnoticed by them.

They have decided they don't like it, and the rest of the world should adjust to their desires. But it's not happening.

The annoying part is that there is not even the slightest effort to learn to use the tech and to understand it. It's the same talking points over and over and over (plagiarism anyone?).

It's honestly astonishing, and frankly, it's getting boring.

The market is there, and it's enormous. No one is saying that the technology is perfect. It has a lot of issues. But it's evolving and it's getting better. If people need to criticize it, it'd be nicer to actually read an informed opinion instead of the same, perpetual blah blah blah.

Sorry for the rant.
 
  • Like
Reactions: hanser
This is so true. People formed opinions based on the first mediocre versions of ChatGPT and keep repeating the same talking points as if the technology hasn't evolved.

But it has evolved. It's become dramatically better and much cheaper.
People just like you have been saying that about every previous version, starting from GPT-3.5. The same story: if only these ignorant luddites would check out the thing that came out a few months ago, because it is completely unlike anything that came before.
Yet there are still people on their high horses delivering verdicts while the world is rapidly changing right before their eyes, completely unnoticed by them.

They have decided they don't like it, and the rest of the world should adjust to their desires. But it's not happening.
Adjust to our desires? Sorry, but what are you even on about? As we speak, the world is being adjusted to your desires: additional CO2 that doesn't need to go there is being pumped into the atmosphere, and a trillion dollars is being redirected towards enormous datacentres. Government health experts have been replaced with AI. And so on and so forth.

The annoying part is that there is not even the slightest effort to learn to use the tech and to understand it. It's the same talking points over and over and over (plagiarism anyone?).
I tried the tech, multiple times, and each time it was simply nowhere near as good as its fans insisted at the time.

It is this gap that makes it a bubble. If it is useful but the expectation is that of a technological singularity around the corner, it is still a bubble.

If I were to guess, I think a milder version of the same phenomenon that makes people propose to AI girlfriends can happen to susceptible programmers in a professional context.
It's honestly astonishing, and frankly, it's getting boring.
It's equally boring when, for years now, you (plural you) keep making the same argument: "yeah, the previous versions sucked, but you've got to try the latest one."

The market is there, and it's enormous. No one is saying that the technology is perfect. It has a lot of issues. But it's evolving and it's getting better. If people need to criticize it, it'd be nicer to actually read an informed opinion instead of the same, perpetual blah blah blah.

Sorry for the rant.
I have an informed opinion: I tried it multiple times (the last was Gemini 2.5 Pro), at the insistence of people just like you (not literally you) who had the same reaction you are having now, but to previous versions that were, as even you would agree, utter garbage. What makes you different from them?

What I go with these days is the opinions of programmers I knew from before AI and whose abilities I respect. Some of them use AI and find it to be marginally useful (with big caveats). Many don't use AI. When a significant fraction of them find it useful, that will be the time to see whether it is useful in a specific context.
 

hanser

Ars Legatus Legionis
42,256
Subscriptor++
This week I used my $20 Personal plan with Claude Code and did 4 separate tasks in domains and repositories I am unfamiliar with, because I wanted to. It solved problems our customers care about, and it took me maybe... 2 hours to do all 4 tasks, spread across the week. It would have taken an order of magnitude longer to do those tasks without CC, if I could have done them at all, which I'm not confident I could have.

That alone was worth $500/mo or more.

I have been a professional developer with that title for over 10 years, and a developer without that title for an additional 5. I have shipped software used by millions of people before the advent of AI, and shipped software worth over $10bn in manufacturing efficiencies, also before the advent of AI.

Doing niche scientific programming that LLMs aren't good at, and then generalizing that experience to "LLMs aren't good at programming," is dumb. LLMs can and do perform very well on LOB (line-of-business) applications. I don't want to hear any no-true-Scotsman fallacies about that not being real programming, because companies pay programmers billions of dollars a year to do just that. As it happens, LLMs aren't great at GPU programming either, because training data is sparse. But I haven't seen CUDA experts proclaiming that LLMs as programming aids are overrated.

Your continual naysaying is getting boring, and it's frankly uninformed. You post reasonable takes on a variety of topics. On this topic, your takes aren't reasonable, and shouldn't be taken seriously by laypeople. Go post that crap in the programming forum, where it belongs.

It's reasonable to assert that the economics aren't there or whatever, or that valuations are out of whack, or any number of other things, but keep the programming assertions to the programming forums. (And I'll do the same.)
 
Last edited:
It's reasonable to assert that the economics aren't there or whatever, or that valuations are out of whack, or any number of other things,
That's what I keep saying. That it is worth $500 to you isn't a measure of how much cash can be extracted for it; it's clearly a very competitive market. (Then the fanboys go on their standard "you must not have tried it, you luddite!" shtick, I respond that I tried it and it wasn't good at what I do, and so on.)

Look around and you'll probably see a hundred things without which you wouldn't be able to work. If a common key on your keyboard stopped working and you couldn't get a new keyboard, that would cause hundreds of dollars in lost productivity, too.

Where I see it going is that in a year or two, far cheaper models (maybe even local ones, although that is less clear) will get about as good as the current top-of-the-line models. They will also have much faster response times, which can make them preferable even if they are "not as good" in some sense.

The industry has, in fact, hypothesized an answer to that conundrum: it suggests that in a year or two its top-of-the-line models will replace you entirely, while smaller models will still fall short of doing so.

but keep the programming assertions to the programming forums. (And I'll do the same.)
Fine, let's adopt "it's as good as not writing in assembly" for the purpose of discussing the economics. Edit: and for office work, perhaps "as good as not using a non-powered typewriter" or something.
 

mailseth

Ars Scholae Palatinae
1,395
Subscriptor
As it happens, LLMs aren't great at GPU programming either, because training data is sparse. But I haven't seen CUDA experts proclaiming that LLMs as programming aids are overrated.
Funny you should mention this, because GPU programming was my first effective use of ChatGPT. In my GPU work, LLMs have thus far been effective at:
  • Building pybind/CMake bindings around CUDA
  • Writing extensive documentation for GPU-naive Python-using coworkers
  • Validating examples in said documentation
  • Validating input parameters from Python/C++, with useful error messages passed back into Python-space
  • Debugging frameworks / test scripts
  • Extensive test cases / validation
  • Translating legacy OpenCL to CUDA
  • More efficient use of asynchronous CuPy data streams (see the sketch after this list)
  • and correctly one-shotting an on-GPU binary search routine I had previously written with a subtle bug.
You could make the case that most of this isn’t actually GPU programming, which is technically correct, but they are all things that would otherwise distract from the effort. All these things would have taken much more time without the AI assist, or not happened at all. Months of work took a few weeks.
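For the CuPy streams item above, a minimal sketch of the kind of overlap involved; it assumes CuPy on a CUDA-capable GPU, and the array sizes and operations are arbitrary stand-ins, not anything from my actual project:

Code:
# Minimal sketch: queueing independent work on two CUDA streams with CuPy.
# Assumes CuPy and a CUDA-capable GPU; shapes and ops are arbitrary.
import cupy as cp

stream_a = cp.cuda.Stream(non_blocking=True)
stream_b = cp.cuda.Stream(non_blocking=True)

with stream_a:                        # work below is queued on stream_a
    a = cp.random.random((2048, 2048))
    result_a = a @ a.T

with stream_b:                        # independent work queued on stream_b
    b = cp.random.random((2048, 2048))
    result_b = cp.fft.fft2(b)

# Block the host until both streams have finished their queued kernels.
stream_a.synchronize()
stream_b.synchronize()
print(result_a.shape, result_b.shape)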
 
Last edited:
  • Like
Reactions: Pino90