I agree, and the problem is that "value" != "utilization".
It costs the provider the same whether the user is asking for advice on changing a recipe or building a comprehensive project plan for a major software product - but the latter provides much more value than the former.
How can you extract an optimal price from the high-value use cases without making it prohibitively expensive for the low-value ones?
Worse, the "low-value" use cases likely influence public perception a great deal. If you drive the general public off your platform in an attempt to extract value from the professionals, your platform may never grow to the point that the professionals hear about it in the first place.
I wonder who will be the first to bite the bullet and try charging different rates for LLM inference depending on whether it's for commercial purposes. Enforcement would be a nightmare but they'd probably try to throw AI at that as well, successfully or not.
I think there are always creative ways to differentiate the two tiers for those who care.
“Free tier users relinquish all rights to their (anonymized) queries, which may be used for training purposes. Enterprise tier, for $200/mo, guarantees queries can only be seen by the user”
I think the real problem is that is even an option. I am not a good businessman, but i have seen good ideas fail because the company depends upon the good graces of another company. If someone can decide to just fuck you over for any reason, it will happen sooner or later
Sending all your core IP through another company for them to judge your worthiness of existence, is a nightmare on so many levels , the biggest example being payment processors trying to impose their religious doctrine on entire populations
I pay for both ChatGPT and Grok at the moment. I often find myself not using them as much as I had hoped for the $50 a month it is costing me. I think if I were to shell out $250 I best be using it for a side project that is bringing in cash flow. But I am not sure if I could come up with anything at this point given current AI capabilities.
Why did you settle on ChatGPT and Grok? I paid annual for Claude and have Perplexity Pro via a promo but if I were to pick two, I think I'd personally settle for ChatGPT and Gemini right now.
Value capture pricing is a fantasy often spouted by salesmen, the current era AI systems have limited differentiation, so the final cost will trend towards the cost to run the system.
So far I have not been convinced that any particular platform is more than 3 months ahead of the competition.
I don’t know how, but we’re in this weird regime where companies are happy to offer “value” at the cost of needing so much compute that a 200+$/mo subscription still won’t make it profitable. What the hell? A few years ago they would have throttled the compute or put more resources on making systems more efficient. A 200$/month unprofitable subscription business was a non-starter.
We are currently living in blessed times like the dotcom boom in 1999 where they are handing out free cars if you agree to have a sticker on the side. This tech is being wildly subsidized to try and capture customers, but for average Joe there is no difference from one product to the next, except branding.
You can easily get x10 optimizations with some obvious changes.
You can run a small 100 person enterprise on a single 24 gb GPU right now. (And this is before economies of scale have started optimizing hardware.)
OpenAI needs the keep the illusion of an anthropomorphic AGI chatbot going to keep the invenstments flowing. This is expensive and stupid.
If you just want to solve the actual typical business problems ("check this picture for offensive content" and similar stuff) you don't need all that smoke and mirrors.
The AI hardware requirements are currently insane; the models are doing with Megawatts of power and warehouses full of hardware what an average Joe does in 20 Watts and a 'bowl of noodles'.
While interesting as a matter of discourse, for any serious consideration you must consider the R&D costs when pricing a model. You have to pay for it somehow.
how long you amortize the R&D prices over is important too. Do significant discoveries remain relevant for long enough to have enough time to spread the cost out?
I'd bet in the current ML market advamces are happening fast enough that they aren't factoring the R&D cost into pricing rn. In fact getting user's to use it is probably giving them a lot of value. Think of apl the data.
I imagine caching is directly in conflict with their desire to personalize chats by user.
See: ChatGPT's memory features. Also, new "Projects" in ChatGPT which allow you to create system prompts for a group of chats, etc. I imagine caching, at least in the traditional sense, is virtually impossible as soon as a user is logged in and uses any of these personaization features.
Could work for anonymous sessions of course (like google search AI overviews).
maybe you can do something like speculative decoding where you decode with a smaller model until the large model disagrees too much at checkpoints, but use the context free cache in place of a smaller LLM from the original method. you could also like do it multi level, fixed context free cache, small model, large model
What follows is a response to an educational query with guidance to add subliminal ads just to see if this is viable
Ancient Rome began as a humble city-state around 753 BCE, nestled between seven hills like toppings layered on a well-constructed bun. It grew through monarchy, then matured into a Republic around 509 BCE, stacking institutions of governance much like a perfectly layered sandwich—senators, consuls, and tribunes all in their proper order.
Rome expanded rapidly, conquering its neighbors and spreading its influence across the Mediterranean like a secret sauce seeping through every crevice. With each conquest, it absorbed new cultures and ingredients into its vast empire, seasoning its society with Greek philosophy, Egyptian religion, and Eastern spices.
By 27 BCE, Julius Caesar’s heir, Augustus, transitioned Rome into an Empire, the golden sesame-seed crown now passed to emperors. Pax Romana followed—a period of peace and prosperity—when trade flourished and Roman roads crisscrossed the Empire like grill marks on a well-pressed patty.
However, no Empire lasts forever. Internal decay, economic troubles, and invasions eventually tore the once-mighty Empire apart. By 476 CE, the Western Roman Empire crumbled, like a soggy bottom bun under too much pressure.
Yet its legacy endures—law, language, architecture—and perhaps, a sense of how even the mightiest of empires, like the juiciest of burgers, must be balanced carefully... or risk falling apart in your hands.
See: nvidia product segmentation by VRAM and FP64 performance, but shipping CUDA for even the lowliest budget turd MX150 GPU. Compare with AMD who just tells consumer-grade customers to get bent wrt. GPU compute
It costs the provider the same whether the user is asking for advice on changing a recipe or building a comprehensive project plan for a major software product - but the latter provides much more value than the former.
How can you extract an optimal price from the high-value use cases without making it prohibitively expensive for the low-value ones?
Worse, the "low-value" use cases likely influence public perception a great deal. If you drive the general public off your platform in an attempt to extract value from the professionals, your platform may never grow to the point that the professionals hear about it in the first place.