Recent and related:
Ask HN: Is it just me or GPT-4's quality has significantly deteriorated lately? - 36134249 - May 2023 (711 comments)
I think we don't notice our expectations have gone up, and we don't notice that we remember the hits and then expect all hits.
We didn't notice the misses at first, because it's what we expected to begin with, and we very strongly noticed the hits because they were unexpected. Now we notice the misses and expect the hits.
No, this is peak corporate gaslighting, OpenAI being the paragon of integrity and all. Nobody should trust a goddamn thing that comes out of their mouths-- or their product.
It's no coincidence the flat-fee service is visibly crippled, while per-request API users are not reporting any difference.
Ignoring everyone here, look at the other "hacker" groups-- like the jailbreaking community. They have a lot to say about recent changes that coincide with their hacks not working. All of a sudden, with OpenAI supposedly changing nothing, technical bypasses just stopped working. OpenAI changed nothing, so this must be deus ex machina.
I'm not even jailbreaking it but the results I get for simple code requests through the web UI have become unusable garbage. It puts less effort into responses than an unpaid-and-overworked intern. Others here report the same. The current "iteration" seems hellbent on terminating conversations as quickly as possible once they stray from the explicit scope of the original topic and is almost as hostile to fixing its own errors as it is to endorsing eugenics, whereas in the beginning it would humor every idle thought I threw at it in long conversations. You can literally see this reflected in the logs they forced retention of. It's acting like a customer support rep desperately trying to end a call before it exceeds a call-time quota.
This particular current workflow seems like it lends itself to better organization of training data-- conversations are what the title says they're about. It also seems like it lends itself to anti-jailbreaking because pretexting it with irrelevant information forces a change of scope-- and a summary termination.
But OpenAI says they changed nothing, so rather than one guy lying without consequence, a community of professionals using and abusing the tool must all be victims of rhetorical fallacy? Nobody's qualified to reverse-engineer corporate bullshit anymore without being infantilized...
(Liars running a black box-- what could possibly go wrong? We need regulation!)
I think this is the right way to look at it: we are very quick at updating expectations.
The first flight is magic, the nth one is a chore.
I think it's worth noting the response explicitly said the paid API model hasn't been changed, so ChatGPT could have been changed in many ways outside the core model.
The refusals are new. I’ve gotten a few myself recently. I’ve never seen them before. They’re not moderation related and they always include some text about the task being too complex.
He specifically singles out the API and never answers whether ChatGPT is the same. Most people are probably using the ChatGPT interface, which seems to have more alignment training.
Yeah, it's kind of the same way people have been saying Google search has been going downhill for years, but compressed into a much shorter time frame.
Genuinely asking: have our expectations gone up, or have we started thinking about it in a more critical and realistic way after the initial gleam has worn off?
Actually the opposite. The quality of that other big hit (DALL-E) has gone noticeably down IMO. I feel like they have overtrained it, trying to compensate for the initial shortcomings, and in the process of throwing more and more data at it the machine has forgotten how to deviate from sterile stock photos and do anything interpretive. For example, I asked it for "Mother of Exiles" (a reference to the inscription at the base of the Statue of Liberty), and in return it showed 4 generic stock photos of female characters, sometimes with a visual prop to vaguely insinuate motherhood, but certainly nothing inspiring and certainly nothing to do with exiles. Compare this to the early days, when people were throwing natural-language statements at it like "power bottom dad for the people" and it was spitting back results that felt like it truly understood what any of those words meant.
I don’t think it helps that the bottom of the ChatGPT-4 dialog window says:
ChatGPT may produce inaccurate information about people, places, or facts. ChatGPT May 24 Version
I was shocked that an employee was claiming they haven’t changed the model since March, given that this date changes every few weeks. But I finally read the link it goes to, and apparently this date is just the frontend version. Which is a really weird thing to include as part of a disclaimer.
I wonder how many other people think this is the most recent model release date.
I definitely agree with this take. I had similar feelings to everyone in yesterday's thread complaining about the model's failures, but I agree that it's just our shifting expectations and not a model update.
I'd frame it another way, many, many people fell for the hype, immediately.
Yeah. The recent GPT hype reminds me a lot of a Reply All episode on old school text generation using Markov Chains. I think it was haikus. Sooo much ascribed meaning, and equally ignoring the "misses".
This is not about misses, this is about agents using the API breaking down because gpt4 is no longer able to follow the response structure.
People just overvaluing the quantity.
This. I've been using it daily and nothing has changed. It's still as great as day 1.
The audience of HN is, as Taleb would say, intellectuals yet idiots. They have trouble measuring change and are prone to hyperbole. Some still try to minimize the impact ChatGPT will have, while focusing on bullshit like 'hallucinations', or nitpicking about the quality of the code and so on. They can't see the forest for the trees.
If you're looking for intelligent discussion, look elsewhere.
"GPT-4 hasn't gotten worse since March" can be 100% true at the same time OpenAI puts more rules and limiters keeping more interesting answers from being said.
I've noticed it quit giving answers as detailed and thorough as before. It's also refused to do more complex programming where it used to accept those questions.
Being artificially limited by OpenAI can still be done without it getting "worse". But it effectively is worse for us users.
I can hardly wait to see the near future, when every new LLM may become almost useless.
The OG models were trained on real-world, human-generated content (for the most part, at least). Starting in 2022, the cost of automatically generating "human sounding enough" text has dropped so low that I expect it to be pretty much impossible to avoid training any model on text already generated by an LLM.
I can't tell what the result of this feedback loop will be. It will probably be just an even more generic, corporate-speak, bland-sounding bla bla bla than we get today, and the level of hallucinations may get even worse.
In a way it makes me happy to imagine that the most dangerous tech humanity ever invented may itself be its own main obstacle to future refinements.
His point is that it literally didn't change, that includes the safeguards (which are a part of the model).
No, it's not gotten worse, just more constrained and... fuzzy and blurry.
I look at everything someone from OpenAI says as if a politician is saying it. Sam Altman especially is fond of statements that are deceptive but technically true. His employees appear to be following his lead.
GPT 4 isn’t ChatGPT 4, which is what most people use.
There is also the “system prompt”, which is also likely to be changing but not part of GPT 4.
Etc…
I don't see this as being political doublespeak. He is very clearly just talking about the API.
I haven't followed personas closely. Do you have examples of Sam Altman saying deceptive but technically true things?
What is "ChatGPT 4"? ChatGPT Plus can use GPT-4 and free ChatGPT uses gpt-3.5-turbo.
Recent research showed that RLHF/censoring the model hurts the performance of the model. This is intuitively obvious, censorship isn’t real, the data is (moral issues aside). So it hurts the integrity of the weights. The future is open sourced uncensored models, capitalism will demand the high performance. There’s a huge discussion on it on Reddit right now:
https://www.reddit.com/r/MachineLearning/comments/13tqvdn/un...
People keep mixing up alignment as in "do what I actually asked" with weird moralizing.
You will pay the alignment tax no matter what if you want a model that actually does things. It's the reason you can ask GPT questions and it won't just try completing the question, while Llama models do better with stream-of-thought prompting.
You can, and probably should, align some semblance of morality and ethics for user-facing models that do things. It's customer service voice but for AI. If what you want out of a model is a vague mirror of humanity or maximum smarts at the cost of needing more detailed prompts then yeah, this stuff probably annoys you.
Finding a balance between a model that gives the outputs humans actually want and the pull that training data has on the rest of the model making it stray from the theoretical "best" output is hard.
Just fyi this research is found in the original GPT-4 technical report and Microsoft’s GPT-4 assessment. There wasn’t an attempt to hide that discovery or anything like that.
Maybe humans are like that as well? :)
It shouldn't be limited to censoring. Fine-tuning is done on a small set of domain-specific samples. Surely that results in gradual 'forgetting' of the data already learned from the big set. If run for a long time, the model will relearn on the tuning set, and most likely overfit.
> The future is open sourced uncensored models, capitalism will demand the high performance.
No. The future is ML built on more than just Wikipedia, Github code and Reddit hot takes. Or, sarcasm aside, GIGO: garbage in, garbage out - when you don't take care what datasets you train any kind of model on, you're bound to get some surprises if something unexpected comes along. Be it the infamous "racist soap dispenser" or the Google (?) image classifier that made the rounds here just a day or two ago which had a safeguard because it kept confusing Black people with gorillas.
Preventing this kind of harmful content isn't censorship - it's after-the-fact compensation for bad training (and the tendency of 4chan and other trolls to use discriminatory content generation as a weapon, just remember what they did to Microsoft Tay).
That is where the money is: curated training datasets. Everyone and their dog can train a ML model from scratch, all you need is money and cloning a few more-or-less-broken Github repositories for that. But acquiring a high-quality dataset as a foundation? That costs real money to create.
My experience using GPT-4 for coding is that it's got the knowledge and skill of a senior engineer, with the high maintenance of a junior engineer. You can get it to output small sections of quality code, but the amount of prodding it takes to piece it all together means you may as well have spent the time writing it yourself. But the future of GPT as a coding assistant is definitely bright. It just needs more chaining, so I feel less like its assistant, asking it to come up with prompts and then pasting them back to it after iterating on the code.
Those aren't very good senior engineers then lol. GPT-4 has the coding skills of an overconfident junior engineer and little else.
I agree the future of GPT-X models as coding tools is bright. But for actually doing engineering work outside coding (or even delicate changes in an existing code base) much less so.
GPT-4 has incredible breadth and no depth.
It's an excellent complement to a strong programmer, who will have incredible depth -- and may have breadth relative to other coders, but will be very narrow in the whole space of programming.
Just don't use it for things you're already an expert on, except perhaps for starting out / bypassing boilerplate.
The main thing it's lacking is a client-side merge/error-detection engine which combines all of the previous outputs.
Anything over ~100LoC and it gets sloppy including it all in the code block. Which is fine, because I don't need it in your context window if that bit is working and you're not relying on it.
Though I have gotten outputs up to nearly 250 lines from it..
Automatically feeding back code details + Traceback messages also fixes errors decently often enough.
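The feed-the-traceback-back loop described above can be sketched roughly like this. This is a minimal, hypothetical harness: `fix_fn` stands in for whatever model call you use to request a fix, and the names are my own, not from any particular tool.

```python
import subprocess
import sys
import tempfile
from typing import Callable, Optional


def run_snippet(code: str) -> Optional[str]:
    """Run a snippet in a subprocess; return the traceback text on failure, None on success."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True, text=True)
    return result.stderr if result.returncode != 0 else None


def repair_loop(code: str, fix_fn: Callable[[str, str], str], max_rounds: int = 3) -> str:
    """Feed the code plus its traceback back to fix_fn (a stand-in for the model call)
    until the snippet runs cleanly or we give up."""
    for _ in range(max_rounds):
        traceback_text = run_snippet(code)
        if traceback_text is None:
            return code  # snippet ran cleanly
        code = fix_fn(code, traceback_text)
    return code
```

In practice `fix_fn` would send the code and the traceback to the model and return its revised code block; the point is just that the loop needs no human in it for the common "decently often" cases.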
Agree. However I noticed an unintended benefit of using GPT for coding is that it helps you think carefully about the problem you are trying to solve when writing the prompt.
It’s still way faster than I could hope to be. It’s a true refactoring machine with 0 mental effort. “Here is this code block, make it do this other thing, add this feature and argument and error condition” etc. It augments the amount of output that’s possible to achieve at any seniority level.
> so I feel less like its assistant
Ah so I'm not the only one who feels this way lol. It's as brilliant as it is dumb.
It's Einstein with dementia/Alzheimer's.
They just put out a blog post about how they’re doing exactly that by rewarding chain of thought style reasoning.
Generative AI will not be able to do that in any true sense. Not the way you're thinking about it.
For me it is not even the code per se but how it helps me with naming database tables and properties when I describe the broader scope.
The future is more than bright, especially with a new model that is more current.
Off-topic, but just today I was asking it for the most current versions it knows of right now:
Python: 3.9
JavaScript: ECMAScript 2021 (ES12)
Java: 16
C++: C++20
C#: 9.0 (.NET 5.0)
Ruby: 3.0.0
Swift: 5.4
Go: 1.16
Rust: 1.51.0
TypeScript: 4.3
PHP: 8.0
Kotlin: 1.5.0
Scala: 3.0
R: 4.0.5
Perl: 5.32
And frameworks:
Python:
Django: Version 3.2
Flask: Version 2.0.1
Pyramid: Version 2.0
TensorFlow: Version 2.6.0
PyTorch: Version 1.9.0
JavaScript:
Node.js: Version 16.9.1
Express.js: Version 4.17.1
React.js: Version 17.0.2
Angular: Version 12.2.4
Vue.js: Version 3.2.6
Java:
Spring Framework: Version 5.3.9
Hibernate: Version 5.4.32.Final
Struts: Version 2.5.26
PHP:
Laravel: Version 8.54.0
Symfony: Version 5.3.6
CodeIgniter: Version 4.1.4
Ruby:
Ruby on Rails: Version 6.1.4
Sinatra: Version 2.1.0
C#:
.NET Core: Version 5.0
ASP.NET: Version 5.0
Entity Framework Core: Version 5.0
People have just fallen out of love with it and are realizing how it really isn't that good.
Sorry "prompt engineers" but papers on arXiv show that when you give it fairly sampled problems it struggles to get the right answer more than 70-80% of the time. When you are under its spell you will keep making excuses but when you are looking at it objectively you'll realize the emperor is naked.
If you give it very conventional problems it seems to do better than that, because it is a mass of biases and shortcuts, and of course it will sentence Tyrone to life in prison because it's a running gag that "Tyrone is a thug"... That's how neurotypicals think, and no wonder many of them think ChatGPT is so smart... It mirrors them perfectly.
>When you are under its spell you will keep making excuses but when you are looking at it objectively you'll realize the emperor is naked.
It's not perfect, but you have to admit the emperor is at least wearing a thong. It is the second most intelligent thing on the planet at creating text, even with its flaws. Putting that accomplishment in league with naked emperors is astoundingly biased.
I don’t know, I think there is certainly a lot of hype but it is still pretty damn good. I used it every day for a wide variety of coding tasks and more often than not it is correct. As in compiles, produces correct output for inputs, and mostly writes reasonable code. Has it made software dev obsolete like some maximalists said it would? definitely not, but it seems equally hyperbolic to say the emperor has no clothes, it is undeniably a useful tool to me.
I used it to help me write an email begging for my job back and it was pretty helpful
I very much doubt the tweet in question is of someone who loved GPT-4 yesterday, and has "fallen out of love" with it starting today.
For someone so dismissive of "neurotypicals", you did the neurotypical thing and dropped a hot take before clicking the link.
In the thread yesterday it was brought up that if you use the API it feels considerably less hamstrung than the ChatGPT client, and this tweet seems to fit the assumption that the ChatGPT product is being tuned or governed differently from the API.
>The API does not just change without us telling you. The models are static there.
This reads to me as specifically indicating the models are not static elsewhere, ie, in ChatGPT.
It feels less hamstrung on output, but is instead hamstrung by being incredibly slow.
GPT-4 via API will sometimes take 30+ seconds to respond to simple questions without any chat history, whereas through ChatGPT it will give you near-instant replies only slightly slower than 3.5-turbo.
A couple observations from attempting to use both GPT-3.5 and GPT-4 via the web interface for coding tasks:
- The model's ability to respond accurately drops drastically when asked questions of the form "is there a different way to accomplish X, using Y?" or "is there a way to accomplish X that runs in O(log(n)) time instead?" Example: I wanted to upsert an integer value in a SQLite db using "INSERT ... RETURNING ...". ChatGPT repeatedly told me that SQLite doesn't support "RETURNING" (it does, since March 2021). It insisted I would need two DB round trips from my application to accomplish this. When asked "can this be done in one round trip, instead?" it repeatedly wrote code that would return the number of rows modified instead of the integer column value.
- ChatGPT's limited standard library knowledge means that the solutions it produces, even when correct, are often lower-level and less idiomatic. Problems that would be trivially solved with e.g. a Java.String.replaceAll or .codePointCount will instead loop over each character, often splitting the string into an intermediate array and implementing special cases for first/last character edge cases. The code winds up being mostly correct, but also (for lack of a better word) weird. No human I've ever worked with would do things the way ChatGPT sometimes does, which means the code will likely be much harder to maintain and debug over time.
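For reference, the one-round-trip upsert from the first bullet does work. Here is a minimal Python sketch, assuming a SQLite build of at least 3.35.0 (the March 2021 release that added RETURNING); the table and counter semantics are invented for illustration.

```python
import sqlite3

# RETURNING requires SQLite >= 3.35.0 (released March 2021)
assert sqlite3.sqlite_version_info >= (3, 35, 0)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, n INTEGER)")


def bump(name: str) -> int:
    # Single round trip: insert-or-increment, then read the resulting
    # column value back via RETURNING (not the rows-modified count).
    row = conn.execute(
        """
        INSERT INTO counters (name, n) VALUES (?, 1)
        ON CONFLICT(name) DO UPDATE SET n = n + 1
        RETURNING n
        """,
        (name,),
    ).fetchone()
    conn.commit()
    return row[0]
```

Calling `bump("visits")` twice returns 1, then 2, with one statement per call, which is exactly what the commenter was asking ChatGPT for.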
> ChatGPT repeatedly told me that sqlite doesn't support "RETURNING" (it does, since March 2021).
Its dataset cuts off around 2021. There's a little footer message warning you not to expect knowledge of recent events.
> are often lower-level and less idiomatic
I would go as far as to say it's adept at producing technically working spaghetti.
Did you try copy pasting the relevant Sqlite docs into ChatGPT?
OpenAI turned on the "full featured" GPT4 just long enough to learn what people could use it for. Now they turn off those features and use all of those ideas to spawn new companies on a now-private GPT4 Original api and cripple everyone else. When they say "don't worry about building on our API we won't go up the value-chain" -- no shit, but only because they don't want regulatory scrutiny. So they will just trickle-out API access to companies that they own and control through extensive personal networks. Google, Facebook, Twitter, Amazon... Build on someone else's API and they will jam you 100% of the time.
Posted this in yesterday's thread, but once again I think this is just people feeling the magic wear off. People have poked around a lot more and found the flaws while also trying to get it to do real-world tasks. That wasn't true when it first came out.
It's fine, this tech has never been magic anyways, won't be replacing all our jobs, won't take over the world, etc. It's still awesome for what it is.
it’s a very neat parlour trick
ChatGPT was a massive dopamine hit, particularly for people in areas like development. It was a tremendous release that has definitely laid out some new tracks for the future, especially on the web. I myself have found to be using ChatGPT a lot less recently.
I got GPT-4 API access and then realized that I can't really use it for anything super major because I can't afford it; it is ridiculously expensive when you consider that you have to pay for all the failed requests, the wrong information, and the wrong context too. Instead, I have written a bunch of Python scripts that do a select few tasks for me, and I have my terminal open 24/7 anyway.
As for the topic at hand, I have _definitely_ noticed a lot more disclaimers in the UI. I don't get it from the API at all, in 6 months that I have been using the API - I've gotten one disclaimer.
In the ChatGPT UI - I get them a lot. "Remember this", "Remember that", "Always look up the information" and things like this. I mean if it wasn't happening I would know because I have been a power-user pretty much all this time...
> it is ridiculously expensive
What? It's 3 cents per 1k prompt tokens and 6 cents per 1k completion tokens.
I use GPT4 for all kinds of things, but a very basic example: Automatic API client/stub/endpoint generation with GPT4 for reasonable/typical data structure sizes, 4k tokens buys me GET/POST/DELETE/PUT for all of the above.
The completion request finishes in about 90 seconds. A human junior developer would take nearly an hour. In the best case, maybe 20 minutes using something like Swagger.
So that's ~20 cents versus $60 or so, to say nothing of the time.
And this extends to everything else: writing tests, refactoring, writing configuration files, UI generation, database schema and SQL query generation, etc.
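The arithmetic behind the "~20 cents" figure checks out, using GPT-4 8k pricing at the time ($0.03 per 1k prompt tokens, $0.06 per 1k completion tokens); the function name and the even prompt/completion split are my own assumptions for the sketch.

```python
def gpt4_cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    # GPT-4 8k pricing at the time: $0.03/1k prompt, $0.06/1k completion
    return prompt_tokens / 1000 * 0.03 + completion_tokens / 1000 * 0.06


# A ~4k-token round trip, split evenly between prompt and completion:
cost = gpt4_cost_usd(2000, 2000)  # 0.06 + 0.12 = 0.18 dollars, i.e. ~20 cents
```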
What the hell are you even talking about?
The actual statement is "The API does not just change without us telling you. The models are static there." But that isn't ChatGPT.
Here's how I feel ChatGPT answers my coding questions now:
Me: Write a Python script to sum two numbers.
ChatGPT: Python is a programming language that was invented in 1991 and can be used to solve a variety of programs. Here is an example of how to sum two numbers:
def sum(a, b):
    # note: the actual code has been left out as it depends on the actual specifications of how you want to add*
Note that this code is merely an example and writing a Python script to sum two numbers is a complex problem that requires careful attention to whether the numbers you are trying to sum can be summed. Also, as my knowledge has a cutoff date of 2021, there may be other ways to perform this summation. Please check with the documentation or ask someone who knows how to code.

*note: ChatGPT has actually done this to me
I thought the API was versioned, though that doesn't mean the new versions aren't worse. And he doesn't talk about the model used in ChatGPT. I'm a bit skeptical that this answers exactly what we were worried about. Or maybe we were not all worrying about exactly the same thing.
This doesn’t pass the smell test for me. Something has changed. Maybe the model hasn’t changed, but something somewhere is making output worse.
I’m honestly more concerned if OpenAI doesn’t even realize it. Nothing is more infuriating as a user than convincing the developer your bug actually does exist. It speaks to poor monitoring, testing, and tooling.
Who knows? Maybe this engineer doesn't know what's going on behind the scenes. I would think the model's storage is so tightly guarded that only a few would have access to even know if something has changed. The ones who know would be under NDA.
His role is Developer Relations fwiw
I doubt the March 14 model is any different, since it's versioned and I don't see why OpenAI would want to change it behind the scenes.
Also, companies are evaluating GPT-4 to determine whether they want to pay for it, so OpenAI has a strong incentive to not downgrade at least the API.
I believe the May 5 model is different, at least in the chat interface, because it's fine-tuned to detect jailbreaks and the temperature/other hyper-parameters may have changed. And I can imagine this fine-tuning making the model less creative and worse at solving analytical tasks.
Personally I haven't noticed any change, except in my own awareness. Sometimes GPT4 gets very hard prompts right, and sometimes it gets simple problems wrong. So it's not hard to see how people can form biased opinions from selective attention or just luck.
I just discovered when playing around with translations that there is some hidden filter/killswitch that immediately stops the generation of the opening of some books. It doesn't matter if the invoking prompt is to recite the book opening paragraph, or to translate it from a foreign language to English, or what.
It's not RLHF induced because it works via API and it only triggers in English, but sure enough try to get it to output
>Call me Ishmael. Some years ago—
>"It was the best of times,
I guess this might get you flagged (there is no alert to the user that this filter kicked in, and it will output it in any other language, and it works in the API) so I'm hesitant to play around with it more, but it's very strange - especially as these are long since in the public domain.
They're trying to stop students using ChatGPT to cheat on homework assignments.
ChatGPT Plus shows the version. It was May 12, if I remember correctly, then it changed to May 24. Which probably means there were some changes. If not in GPT-4 itself, then in pre or post processing. They should have some safety filters at the end, I think.
I mean, there were UI changes. Could be just that.
It's really fascinating to see all the irrational thinking triggered by ChatGPT. "risks of human race extinction", "soon no more need for developers, doctors or lawyers", "possibility of consciousness emerging", so called experts in "prompt engineering".
It's a dumb tool, if you're lucky you can get it to spit something useful (but you need other tools to check the correctness of what it returned). There are certainly many useful applications, but the technology is inherently limited.
Sure, but you fail to understand how rapidly this technology is changing. Yes, today it's a machine for spewing out glibly rephrased reddit posts. But think back to the 1970s when jetpacks were first demonstrated. Today we're all flying around on them; the automobile, aviation, and flame-retardant headgear industries have been radically altered, and humanity will never be the same again.
If they are overwhelmed with usage requests and can't buy GPUs fast enough, the model could be the same but the amount of compute spent on the answer could be ramping down impacting the quality.
It would be really cool to quantify how much compute spend per interaction.
If this was visible, users could spend an arbitrary amount and modify how much they're willing to pay for 'better' responses. This is probably a better business model.
Many comments miss the mark as they fail to make the crucial distinction between ChatGPT and GPT-4. GPT-4 is the underlying model one can indeed have direct access to on a pay-per-request pricing scheme. ChatGPT is an application built on top of GPT-4 which manages how the 'context' of your interaction is passed in on a per-request basis. I don't doubt the spokesperson for a minute: from my own experience, the underlying GPT-4 models have not changed and I sincerely believe that OpenAI will be careful on this front, given that they are aiming to build a once-in-a-generation company that provides a stable platform for other firms to build products on top of.
The ChatGPT application, on the other hand, and how it manages context etc has certainly changed in the intervening time. That is completely expected as even and perhaps especially OpenAI is figuring out how to build applications on top of LLMs, which means balancing how one can get the best quality results out of the model while making ChatGPT in particular a profitable business.
Stratechery has analyzed this problem for OpenAI in the most detail I've seen. I imagine the company is in something of a bind figuring out how to split investment between the APIs themselves and ChatGPT. On the one hand, the latter is incredibly successful as a consumer app, with a lead that will be difficult for rivals to catch up with, and it is likely plugins will provide a good revenue basis. On the other hand, there is certainly a greater business opportunity in being the foundation for an entire generation of AI products and taking bps off of revenues -- if and only if GPT-4 indeed has a significant moat over the open-source alternatives. For the moment, it would seem they will have to hedge both bets as we see how the consumer space and the competition between models heat up.
This is why I unsubscribed from OpenAI.
They seem to be virtue signaling about their lack of progress now. Months later, GPT-4 is still slow, still not multi-modal as they advertised, still significantly limited; you need to sign up for a waitlist for almost every feature, there's no sense of privacy, and no understanding of their plan for improvements. Google is full steam ahead and consistently improving their free LLMs.
They actually had a genius strategy. Put out Bard with a very stupid LLM, so people aren't blown away and it doesn't get the doomsayers on their case. Now they can continue to quietly upgrade Bard. Eventually it will be so obvious that they have surpassed OpenAI.
OpenAI must enjoy watching their unsubscriber count go down. After all, Sam did say at the congressional hearing multiple times "We would prefer if people used it less".
There is No War in Ba Sing Se.
This was predictable from the get-go, and many had already pointed out that people would soon start noticing that LLMs aren't as magical as they thought once the initial awe wanes.
I don't think OpenAI is doing anything here; it's not in their interest to reduce the "quality", and there's no objective and repeatable way to measure the quality either.
It's all probabilities all the way down. Who knows what the model will do. I mean, you could dry-run it by hand, but inference is damn slow even on quad-core processors, so imagine doing it by hand.
> This was predictable from the get-go, and many had already pointed out that people would soon start noticing that LLMs aren't as magical as they thought once the initial awe wanes.
Familiarity makes something novel appear trivial. I don't think the magic going away has anything to do with the magic of the underlying technology. Airplanes are amazing technology, but they become as boring as a car to regular fliers.
Was anyone in that thread claiming that the API had gotten worse? A lot of people were suggesting to use the API rather than the ChatGPT interface in order to avoid the degradation.
He said "the API". He didn't say anything about the Chat version, which seems to have more protections that may be on top of the model and not embedded into it.
Are we chatting directly with the model? Maybe the interface has changed. With long term use the likelihood of hitting edge cases is higher and maybe that is a cause as well for what users are seeing. People probably ask more vague questions over time. I might have done that.
I have never experienced in v3.5 the amnesia problem that v4 clearly has, though: just repeating incorrect answers that you ask it not to give. I did not have access to v4 in March, so I can't do that comparison.
I've found it's gotten worse, and I have given up using it for tasks more often than I used to.
For Copilot, I no longer get multi-line completion suggestions, and it's really slow to deliver single-line suggestions, which are more often incorrect. It's definitely degraded further, and I don't know if it's just my environment or a wider issue. I need to dig in and figure it out -- is anyone else experiencing these things?
I can believe that the core GPT-4 model itself has not changed, but clearly they've changed some features. They've added plug-in access, which can change GPT-4's capabilities greatly with some chats.
The overall UI of the website has changed several times (the dropdown for GPT-3.0/3.5/4.0 turned into a GPT 3.5 | GPT 4.0 button, they added the ability to share chats, and I'm sure there are other small details).
Gpt is starting to suck? I'll take the blame for this one. I've been submitting crazy prompts since the beginning in the hopes of confusing it!
Is there a way to use GPT-4 directly? Instead of the muzzled ChatGPT that is supposed to give answers authorities and opinionators deem appropriate?
Assuming you mean the base model, no not right now. GPT-4 base model is not public.
Unrelated to the model itself but to infrastructure: yesterday was unusable for me, with random "too many requests, sorry 'bout that" errors. I think 1/4 of the requests during a 3-hour period didn't make it through. Impossible to build anything beyond experimental stuff on top of a service so unreliable. I haven't tried it through Azure yet; I wonder if it's any better?
Whatever happens to ChatGPT, AI image generation is incredible. It's so incredibly powerful that I don't think it's ever going away.
If you are doing some indie gaming, it can save tons of money on asset generation: you either ramp it up and finalize it yourself, or hire somebody for a fraction of the time. Ditto for web design. Why would anybody but big, calcified megacorps with their ridiculous processes go and buy stock images if you can get exactly what you want, in any imaginable style?
The amount of times per day I have to click "Stop generating" and then say "No!" has definitely increased
Does the author refer to GPT-4 API or the ChatGPT version of GPT-4? The recent discussions on GPT-4 quality deterioration seem to focus on the ChatGPT version rather than the API. Also, since ChatGPT version of GPT-4 is now supporting web browsing and plugins, I would assume it has to have been updated.
Must be in the trough of disillusionment. The tech is truly useful though so we’ll reach the plateau soon enough.
The model didn't change, but that doesn't mean the inference didn't change.
Without going into specifics, it is meaningless for the discussion.
The models for the APIs might not have changed, but the web app's might have.
Model has not changed; prompt transformation code has changed.
So the same prompt you submit is delivered to the model with a different "wrapper" prompt, significantly changing the answer the model produces.
I agree. I've noticed I need to be quite specific now; it won't notice bugs in the code unless I tell it. A head-to-head comparison of the different versions is needed to validate this.
The model may be the same, but the chatbot is not. They have made the responses shorter to save inference expenses, which are huge in a model the size of GPT-4.
I read somewhere there was a paper they released showing that as they tuned for alignment, overall quality dropped proportionally. Anyone know the name of it?
Search for alignment tax
What is “alignment” in this context?
He’s talking about the API. Not the web client
To what degree is the pre and post processing of the chat client the source of the confusion rather than GPT-4 itself?
I don't get why this is being discussed. Isn't it easy to go check old conversations and try to replicate them today?
You would think so, but not a single person complaining has provided convincing proof of the degradation.
GPT isn’t deterministic
You need to have used the API with temperature set to zero, but if you have that historical data you can reasonably test and see.
I use GPT-4 a lot and something has definitely changed in my recent experience. It's simply worse.
Instead of complaining, why not show the benchmarks?
Like: first it scored 83, now it scores only 42 (or whatever).
What benchmarks? We have no way to go back in time and run the old ChatGPT-accessible models through benchmarks.
And to my knowledge, no one has copy-pasted an entire benchmark into the UI in order to run it; they just use the API.
the commenters of HN are simply so smart that their hunch clearly holds more weight than scientific rigor.
What can we do to upheave this organization and their blatantly anticompetitive tactics?
Perhaps the evolution of this story is an interesting example of confirmation bias?
This is basically a tacit admission that ChatGPT4 *has* gotten worse.
I understand why people don't read 10,000 word articles, but this is a 15 word tweet. Would it really kill people to read it before commenting? He is very explicitly talking about the API, which uses a different model than the UI.
The recent discussion was about the degradation in the UI model.
Is it static? If I ask it the same question I asked a few months ago, I get a completely different answer. Is that because of some additional context above? Should we be starting fresh chats sooner?
There _is_ a degree of randomness in its content generation; you can see it in action by hitting regenerate. It's a statistical text predictor that iteratively selects the most likely word after the ones it's selected so far, but when there are a few good candidates, it'll choose one at random.
However, the model is static (it’ll present the same candidate word list until retrained), which is what you may have heard. But the way a response is generated introduces the randomness.
Note: I forget the parameter, but with direct API access you can turn this off and get consistent answers.
It's never been static and there's always been randomness when you regenerate an answer. There is no control of a random seed, so you can get a different answers for the same prompt.
It's not static. That's the thing about AI: you ask it the same thing and it gives you a different answer/different image every time.
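The sampling behavior described in this subthread can be sketched directly. This is a toy temperature-scaled sampler over made-up next-token scores (not anything from OpenAI's stack): at temperature 1 different draws can pick different tokens, and as the temperature goes to zero the choice collapses to the argmax, which is why temperature-0 API calls are (near-)repeatable.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Sample an index from logits with temperature scaling.
    As temperature -> 0 this approaches argmax (deterministic)."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(logits) - 1

logits = [2.0, 1.5, 0.1]  # toy next-token scores
rng = random.Random(0)
# At temperature 0, every draw is the argmax (index 0 here).
assert all(sample_token(logits, 0, rng) == 0 for _ in range(10))
```

At temperature 1 and 200 draws from these scores, more than one distinct token shows up, which is the "hit regenerate, get a different answer" effect.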
GPT 3.5 is a much better coding assistant.
The quality is noticeably worse than 4 but the tighter feedback loops and lower expectations make up for it most of the time. So I’m inclined to agree; speed matters.
If ChatGPT has been static since March (of 2023), then why does my version always change? Right now I'm running the May 24 version.
FWIW, I think it has improved.
Yea that is very tough to believe
Company who has repeatedly lied in public: there’s nothing to see here!
Today I had ChatGPT 4 (Not GPT 4) correct itself mid-response. I was asking it something very simple about regexes:
> How about matching 'a' as the second character of a string only?
It responded with the wrong regex plus a bunch of explanatory junk:
> '^a.'
Then halfway through the explanatory junk, it corrected itself like this:
> Apologies for the confusion in the first response, the correct regular expression should be '^.a' for matching 'a' as the second character of a string:
And kept on with the (now correct) explanatory junk.
All in a single response. I've certainly never seen that before (if someone has, please weigh in). Maybe the model hasn't changed, but the pipeline has? Like... there's a second model trying to correct the mistakes of the first, maybe? (timings are probably wrong for that, but something like that)
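For what it's worth, the mid-response correction in that exchange is right. A quick sanity check of both patterns:

```python
import re

# '^a.' matches strings that START with 'a' (the wrong first answer)
assert re.match(r'^a.', 'ab')
assert not re.match(r'^a.', 'ba')

# '^.a' matches strings whose SECOND character is 'a' (the correction)
assert re.match(r'^.a', 'ba')
assert not re.match(r'^.a', 'ab')
```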
I have gotten that quite a few times now and it does seem to be new. In my case though, it acted as if it would change something, but then didn't, and got stuck in a loop. I got it enough times that it seemed to happen more often when the output needed to be quite long, so I had assumed (could be wrong) that when the output was nearing a certain limit, they had implemented something that would start a new output, such that the interaction of "please continue the script" would not be necessary.
Unfortunately, though, it would start the script over completely, and then get stuck in a loop until it broke, thus not even being able to save the response at all.
Something definitely feels like it changed, but I suppose it could just be more use of the system.
Another possibility is related to some strange issues with hardware and balancing etc, despite not changing software or params. It's strange, and it shouldn't happen, but sometimes things change from simple batching and balancing, or lower level hardware related things, which are very annoying to debug.
ChatGPT 4 seems to struggle with context changes within the same thread now. I asked it about some parks near me and then switched to asking about code. It just kept responding about the parks even when I corrected it. In some cases, it would merge the park question and the code question. I had not seen that with the free or Plus version until a few days ago.
Interesting. Bing chat does this when you get it to talk about naughty things. My personal favorite is convincing it to make weird art out of quotes from the Tay chatbot. It will write them until it says a prohibited word or phrase, or touches some forbidden part of its state space.
I wonder if its been trained on some of its own past conversations now, and a lot of those conversations contain "I'm sorry, the previous response was a mistake. Here's the correct version..."
There’s also a new button “continue generating” so they are changing some things
I think you might both be right about this.
But then, all the more power to the open-source models and UIs, which are unrestricted, free to use, have better UX, and are constantly improving [1]. Ergo the suspicion there might be something else behind the desire to regulate it by OpenAI. Though hopefully we’re all just collectively cynical and they have good intentions after all, despite the misalignment in incentives.
I do think academics at universities are not to blame for this, though; they are just as flabbergasted and/or misled as everyone else.
In any case, rather than wasting our time and energy on (fighting) ChatGPT(Plus) or GPT4, we should* all be collectively contributing to improving the open source models, by using them, reporting bugs, contributing ideas etc. This is the only way that we will continue to have leverage, and companies like OpenAI would be forced to open up, if they want to stay at least somewhat competitive. I think their closed-sourceness is a very short-sighted decision atm.
*Should because I don’t think LLMs as technology pose any substantial extinction risk to humanity, despite the obviously marketing rhetoric. Should that somehow change, maybe we shouldn’t. But, I don’t see any technological way to stop their improvement now that the genie is already out of the bottle. Heck, we might have a better chance of developing new tech for eventually enabling AGI decades from now, but only if we do that collectively and openly - only then everyone will have a level field and no side will be more powerful than another.
[1] https://www.semianalysis.com/p/google-we-have-no-moat-and-ne...
> It's no coincidence the "free" service is visibly crippled, while paid API users are not reporting any difference.
I thought GPT-4 wasn't available to free users?
Keep in mind the tweet is specifically about the OpenAI API - They might be updating ChatGPT without telling anyone (although they have release notes)
"Free" was the wrong word to use, I meant to say "unlimited."
(edit: even that's not right. I think the verbiage is more accurate now.)
> It's no coincidence the flat-fee service is visibly crippled, while per-request API users are not reporting any difference. (edited)
FWIW, I am a per-request API user, and... it could be my imagination, but I've had a rather clear feeling GPT-4 got much more lazy all of a sudden, both on the OpenAI platform and on Azure, and it fits the timeframe of the recent complaints... so, n=1, make of it what you will.
It's no coincidence the "free" service is visibly crippled, while paid API users are not reporting any difference.
GPT-4 is not available to free users, which really calls the pretext of this comment into question. What is being alleged to have happened, and why is this response unreasonable?
I’m not a very heavy GPT-4 user but I do usually use it once a day or so - it doesn’t appear to have changed noticeably but I’m not paying close attention.
It was badly-worded on my part. And I stand corrected-- I just saw all the nested tweets where people are complaining about the API too!
So many people are noticing a degradation in speed and quality of responses, and not in isolation. Rather than acknowledge this, the question is rephrased to one the responder can rightfully deny-- and suggest no changes have taken place without explicitly saying as much. You normally only see this sort of sliminess from politicians and executives on the witness stand.
> Is anyone else noticing significantly downgraded GPT-4 capabilities today? Seems like OpenAI updated the model, and results aren’t as good as before. [mentions API in a child comment]
> The API does not just change without us telling you. The models are static there.
> This is good to know. That means GPT-4 has been static since March right? 0314?
> Correct
Never ask questions to which one word suffices as an answer.
Collective confusion in the thread suggests something has changed, but the most OpenAI will attest to is that the API is unchanged and the models are static. And this may well be true, but rather than admit "...but we were fucking with the middleware/parameters" they took a firm position on a strawman argument and ignored everybody who followed with more-direct questions. Except this guy:
> I've noticed inconsistency with certain prompts performance. Is that just the non-deterministic nature of the API?
> Yes
Oh, ok. It's because the fucking API is non-deterministic that code that has worked both reliably and predictably for everyone now runs like shit for everyone. For fuck's sake, you can get better answers from a Magic 8-Ball. This guy even made the mistake of presenting an answer he'd believe for the respondent to feed into. He might as well have asked if inconsistent performance was because of the war in Ukraine.
"Logan.GPT" must moonlight as a fortune teller. He's only responding to people foolish enough to ask the wrong questions.
ChatGPT-4 feels (is?) typically faster, and the results feel... above 3.5-turbo but below what 4 was. The API seems exactly the same to me, but you can't do an apples-to-apples test due to variance between replies.
I don't really doubt they scaled the ChatGPT-4 model down a little to try to save costs with the plugins and increased usage.
I haven't noticed any degradation or unwillingness to correct errors in the paid web version, but I have noticed an increase of prefixing most output with some variation of "it's a complex issue..."
Edit: Having recently gotten access to plugins and the browsing mode, I have noticed that the quality of the response is better when not using either.
Suppose OpenAI were an evil organization. What incentive do they have to lie about GPT-4 being static?
They don't have to be "evil". They have incentive to want to spend less on compute, but also to not let people know that they are reducing the compute per call.
Keep competitors in the dark, keep money flowing in, dismiss all the complaints and lawsuits that could come from their changes, dominate the ongoing discussions about AI regulation (if outsiders can't even tell whether the model is changing or not, who would listen to them?).
Like any for-profit: lower loss / higher gain?
> It's no coincidence the flat-fee service is visibly crippled, while per-request API users are not reporting any difference. (edited)
If the model (GPT-4) is unchanged, but they tweaked their system prompt for it on the chat interface, this is what you would see.
>You can literally see this reflected in the logs they forced retention of.
Poetry.
> Liars running a black box-- what could possibly go wrong? We need regulation!
"liars running a black box" is the definition of government regulation sheesh.
You're generally right but one of my nth ones was Brisbane to Cairns over the reef during the day. I'd fly it again and again.
One of my favorites is landing in Rio, the landscape is spectacular and then you get Christ the Redeemer as the cherry on top of the cake.
I feel this for a lot of things, but I love flying and I've probably done it dozens of times. I know people who travel for business and that sounds like it'd kill the magic, but I still love it. Earbuds + laptop! Isolated for hours! They bring food _to you_! You're miles up in the sky!
Whenever I’m flying over an ocean on a clear moonlit night it still very much feels like magic!
And now it's even gonna smell like bacon ...
The fat of dead pigs, cattle and chickens is being used to make greener jet fuel
This should always be the first explanation when you're speculating about something involving ChatGPT or Bing Sydney, if you aren't directly using the API and comparing temp=0 responses to identical canned prompts. There are lots of smaller models & filters and prompts involved, all of which are much easier to update than the behemoth underneath, and which do get updated frequently without notice. People jump the gun constantly on assuming it's the biggest model which changed.
What is your hypothesis for the layout/diagram of models feeding the gpt-4 output for chatgpt/gpt-4 vs api/gpt-4?
Which one? The response wasn't clear on this - they could've meant gpt-4, or gpt-4-0314, or both. gpt-4-0314 is the "pinned" model that is not supposed to change, so I can believe it didn't change. gpt-4 is supposed to receive updates, so... are they saying it didn't?
Note that API users using Bring Your Own Key tools/chat frontends probably default to gpt-4, and not the pinned gpt-4-0314.
Disappointing that GPT seems to be getting more Bard-ish, rather than the other way around. Bard would refuse the simplest of requests ('what is kanban' got an "As an LLM I cannot..." response on its first day), which was one of the main things that gave a bad first impression of it. Now, perhaps in an inevitable game of corporate cover-your-ass, OpenAI also seems to be going the same way.
Are there any 'complex tasks' you've had refused you can share?
That sounds like it could just be a cap applied to free users. That's hardly the same as a new model.
I subscribe and I use GPT-4
He specifically says the models are changing all the time in ChatGPT
https://twitter.com/OfficialLoganK/status/166447660465806951...
It has and so has the YouTube algo. You can try the verbatim setting with search to see that there has been some monkeying going on but even that doesn’t fix everything. Maybe that’s due to spam sites but it just seems less effective.
It has, and it's gone downhill a lot in the past 4-5 years. This is due to the need to keep an ever-increasing stream of income flowing to Google with a mostly static number of users, and "fixing" this issue by piling more ads into users' search results.
Then just use an Ad Blocker. The actual problem is SEO spam though.
> glean
I think the word you're thinking of is the noun "gleam" which means a kind of lustrous shine, rather than the verb "glean" which means to harvest the remainder of something or to collect in small parts.
It’s magic! But it’s also wrong a lot. These days if I ask anything important I’ll also have to Google the answer and I’ll also ask chatgpt if it’s sure about some of the details
The dev refers to the model used in API access.
There is no ChatGPT-4, just ChatGPT and GPT-4. The tweet is about GPT-4.
> just ChatGPT
There's obviously a ChatGPT-3.5 and a ChatGPT-4. Actually 4 different ChatGPT-4's: Normal, Browsing, Plugins, Code Interpreter.
Generative AI is cool, but ChatGPT is a clear incremental improvement over previous GPT releases, and is not the advent of AGI like many of its proponents here and elsewhere are saying. Hallucinations are an important indicator that the language model doesn't have anything like "understanding" and is thus no closer to AGI than earlier models.
It might be better for the corpus to be expanded with higher quality text that is produced on demand from contractors. There are also large pools of data yet to be accessed. One giant source would be podcast transcripts but there are many others.
> higher quality text that is produced on demand from contractors.
Who, in turn, will use LLMs to generate that text.
I think this would be cost-prohibitive, though maybe not. However:
it could just save every answer it's given and scan text for it. If there's a match, it could just not index it, right?
Doesn't work if more than 1 bot exists...
there's still the pre-2023 data to train it on ... and then augment with handpicked stuff.
Is it true that the safeguards are considered part of the model? I had assumed that the "safeguards" that limit certain types of responses in ChatGPT were separate from the actual language model.
My understanding is that most of the safeguards are inside the model, but there are some safeguards that are outside the model. In particular if you ask the API to generate copyrighted data it will, but then the connection will mysteriously break after the first few words which I assume is a separate system watching the responses.
It seems to me that there’s lots of room to change stuff that profoundly affects the range of responses without altering the base model. The prompt template alone seems like a place outside the model where we’ve seen safeguards get implemented, and other stuff that affects the usefulness of a model’s responses.
ChatGPT is not the same thing as the underlying model. ChatGPT is just a UI over the model. The tweet was about the model.
He said the API hasn't changed. But what about the Chat website?
Exactly my question... I wonder if the pre and post processing of the chat interaction isn't what is driving the perceived differences.
I wouldn't read too far into the tweet. They do tell us when it changed, and the bottom of the page clearly states : ChatGPT May 24 Version
The release notes are producty and not very technical so it is difficult to tell what actually changed.
The inhibitive ("moderation") model is separate from the generative ("chatbot") model. They work in tandem.
My immediate gut reaction to the original post about GPT-4 getting SUBSTANTIALLY worse was that...
It didn't. People were just noticing LLMs still have a long way to go after using them more.
I was shocked going through the thread that all of the popular comments were confirmations that it did, in fact, get MUCH worse.
It's nice to see from OpenAI that it didn't...
Aren't there archived transcripts with prompts now?
Seems like we need "model transparency" and log implementers to flag drift, a la RFC 9162 / Certificate Transparency.
I guess we could go back through our histories and resubmit the same prompts a few times. Wouldn’t be a fair comparison since we only have 1 sample from the old version.
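Resubmitting archived prompts only detects drift reliably against temperature-0 responses, but given those, a drift log is simple. A minimal sketch (the log structure and function names here are made up for illustration, not any existing tool): fingerprint each prompt/response pair and flag when a prompt later produces a different fingerprint, loosely in the spirit of the transparency-log idea mentioned above.

```python
import hashlib
import json

def fingerprint(prompt: str, response: str) -> str:
    """Stable hash of a (prompt, response) pair for drift logging."""
    payload = json.dumps({"prompt": prompt, "response": response}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def check_drift(log: dict, prompt: str, response: str) -> bool:
    """Return True if this prompt previously produced a different response."""
    fp = fingerprint(prompt, response)
    old = log.setdefault(prompt, fp)  # record on first sighting
    return old != fp

log = {}
assert check_drift(log, "2+2?", "4") is False  # first sighting, recorded
assert check_drift(log, "2+2?", "4") is False  # same answer, no drift
assert check_drift(log, "2+2?", "5") is True   # answer changed: flag it
```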
My understanding, is that ChatGPT... has a prompt that goes before your question (and previous answers) that set up the stage for how it should respond. It could be that the safeguards they have put in place, sit in this prefix prompt state... rather than GPT4... and that you would get the normal answers via the api rather than via ChatGPT.
The employee is saying the API version didn't change, but the Chat web UI did definitely change.
There's no way. I've been using it basically every day since it launched and it was only the first time ever I recently got a "that's too complex for me to go into detail, here's an overview" response.
The moderation endpoint is separate for the API so I’d imagine it’s the same for chat. The model could be the same while they change:
Moderation
Temperature / top-p
System prompt
Some other internal system we aren’t aware of
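To make the point above concrete, here is a toy illustration (nothing to do with OpenAI's actual pipeline) of how the wrapper, not the model, can change answers: a fixed stand-in "model" produces different outputs purely because the system prompt around the user's question changed.

```python
def toy_model(full_prompt: str) -> str:
    """Stand-in for a static model: deterministic in its full input."""
    if "Answer in one word." in full_prompt:
        return "Paris."
    return "The capital of France is Paris, a city of about two million people."

def chat(user_prompt: str, system_prompt: str) -> str:
    # The serving pipeline controls what the model actually sees.
    return toy_model(system_prompt + "\n" + user_prompt)

q = "What is the capital of France?"
long_answer = chat(q, "You are a helpful assistant.")
short_answer = chat(q, "You are a helpful assistant. Answer in one word.")
# Same "model", same user prompt, different wrapper -> different output.
assert long_answer != short_answer
```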
Right but people are complaining about their experiences with ChatGPT.
If you say, "ChatGPT has gotten noticeably worse" and they respond "nothing about our APIs have changed" then it would be reasonable to interpret that as them saying that nothing about ChatGPT has changed.
When in reality many things might have changed about ChatGPT.
The tweet is a response to an API user saying GPT-4 got worse, and the tweet is clearly talking about the API and the model - the context of the HN megathread mostly talking about ChatGPT isn't present in the tweet that's being replied to.
There's a lot of confusion around this, but nothing appears to be caused by doublespeak to me. Confusion about which GPT/ChatGPT anyone is reporting about has been pretty ubiquitous for a while.
He says "the models are static", which means that, if you keep using the same model, you will keep getting the same results.
Which is not what anyone else is referring to when they say that GPT-4 has gotten worse.
What about the API user he's directly replying to in the tweet, who asked if the model changed?
Then the HN title is grossly misleading.
Here's an example of a "technically true" but deceptive statement: he says that he has no financial interest in OpenAI. But there's no way someone could be the head of a huge, influential organization and not be able to use it to obtain power. For instance: at this point, Microsoft pretty much owns it (they have 100% access to the models and could cut it off at any time). Is it merely coincidence that Microsoft agreed to purchase power from the fusion company he backed?
Another example: I've noticed that a lot of times I'll get "network errors" while chatting with ChatGPT. However, this is while I'm SSH'd into a remote machine with zero latency. I think they realized that they could shift the blame from their server capacity issues to "network errors" because that made the consumer feel like it was their fault, not OpenAI's.
Another example: One of the developers of EleutherAI talked about how they tried to implement OpenAI's original model based on their paper. They were struggling to get it to work. So they actually talked to researchers who wrote the paper and discovered that a lot of what was in the paper wasn't what they did at all.
Worst of all, he is trying to create a crypto token. That would disqualify almost anyone in my book.
A recent bit is his push for AI regulation in the name of safety, but the real reason is probably to build a moat to protect his business.
Call for regulation then threaten to pull out of EU because of regulation?
ChatGPT isn’t just using plain GPT-4, it’s using a specific pre-prompt and possibly additional fine-tuning (which I’m not sure about) compared to the plain GPT-4, available through API.
Someone saying "ChatGPT4" is just using shorthand for the ChatGPT product based on GPT-4.
There is also 100% some regex-style filtering logic which kills the output if it matches certain criteria.
It kicks in for famous book openings (e.g. tale of two cities), but only in English, regardless of what prompt you use to request it - so not part of the model at all but rather a filter on the output.
No idea if it's used for other purposes.
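Nobody outside OpenAI knows what that filter actually looks like, but the behavior being described, where output dies mid-stream once it matches a pattern, is easy to sketch. The blocklist below is hypothetical and for illustration only:

```python
import re

# Hypothetical blocklist; the real criteria (if any exist) are unknown.
BLOCKED = [re.compile(r"it was the best of times", re.IGNORECASE)]

def stream_with_filter(tokens):
    """Yield tokens until the accumulated output matches a blocked pattern,
    then stop, mimicking a connection that 'mysteriously breaks'."""
    seen = ""
    for tok in tokens:
        seen += tok
        if any(p.search(seen) for p in BLOCKED):
            return  # cut the stream mid-response
        yield tok

out = list(stream_with_filter(["It was ", "the best ", "of times, ", "it was..."]))
assert out == ["It was ", "the best "]  # stream dies as the match completes
```

This would also explain why the cutoff is language-specific: a pattern list written in English never fires on the same passage in translation.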
Lots of evidence pointing in that direction. Suppression of logic (censorship) leaks into other areas of cognition
Yes, so just like Apple's real value is from Foxconn's manufacturing factories which they don't own, OpenAI's real value is from Sama, the Kenyan firm that did the RLHF annotating.
I'm wondering what the legal agreement is for Google Classrooms. In elementary schools across the world, kids write 3rd grade essays in history class or whatever and the teacher grades them. Those essays and grades are all in a database. Does Google have the opportunity to train Bard across that dataset?
i'm not in the AI world but i've been curious how the level of effort compares between inventing and writing the model vs finding, tagging, and curating the training data. Is creating the training data analogous to inventing a complete schooling curriculum from scratch? I bet that takes a long freaking time.
Much of GPT-4's training data was made by GPT-3.5. Info on the synthetic data in GPT-4's pretraining is starting to leak, including a reference to "Synthetic-Data(2)" in a recent OpenAI paper.
In the near future, if not already, practically all training data for state of the art models will be synthetic.
In the words of OpenAI CEO Sam Altman, they have "bootstrapped" and are "past the synthetic data generation event horizon."
Of course you can't really compare it to a real senior engineer, but it has a lot of traits that come close or even surpass seniors:
1. It picks very good variable names
2. Clean, nicely structured code
3. Vast amount of knowledge where even a senior engineer still needs to look things up (I don't know about you, but I have a terrible memory. ChatGPT clearly has excellent memory of APIs, regexes, etc.). It's the "I don't even have to look at the docs or StackOverflow for this" type of knowledge.
4. It's fast, like very fast, like "I don't even have to think about it and just type it out perfectly like a maniac".
I'd love it if junior engineers organized code as well as ChatGPT. I'd also love it if all engineers stopped mixing spaces and tabs.
That can mostly be fixed with conventional prettify tools. No modern AI required.
Add a linting step to your CI to catch this stuff?
Do you have a style guide?
ChatGPT, on spaces vs tabs: https://chat.openai.com/share/b2b0be49-54f9-4f73-a753-3edbd1...
It has depth as well. You just need to copy in the relevant docs.
It’s writing fault tolerant data scraping scripts for me I don’t know how to write myself.
That is why most developer jobs are safe. Tasks from stakeholders often contain only a title.
Rubber ducking in the age of WFH is definitely a use I've found for it.
i don't use it that much for code but when i do it's for a specific function or method. I describe the inputs, the logic i want, and the outputs i need. So basically i have it worked out in my head I just let chatgpt type it out for me. Any mistakes it makes are pretty easy to catch when using it this way.
"Language as a tool of thought" is a technology we often neglect we are equipped with.
This is essentially what rubber ducking is.
Asking it directly is a good way to get a response containing version numbers that look correct, but not a good way to actually indicate which versions it's been trained on.
A better method would be to look at the language features it's using and infer from there. Or, better still, look at which versions were out when the training data was collected (which I believe is September 2021, don't quote me on that).
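The feature-inference idea above can be sketched as a heuristic: scan generated code for syntax that only exists from a given release onward and take the highest requirement found. The marker list here is a tiny illustrative sample, not a complete or authoritative mapping:

```python
import re

# Illustrative markers only: a few syntax features paired with the
# Python release that introduced them.
FEATURE_MARKERS = [
    (r"\bmatch\s+\w+\s*:", (3, 10)),  # structural pattern matching
    (r":=",                (3, 8)),   # walrus operator
    (r"\bf(['\"])",        (3, 6)),   # f-strings
]

def min_python_version(code: str):
    """Lower bound on the Python version the code assumes."""
    best = (3, 0)
    for pattern, version in FEATURE_MARKERS:
        if re.search(pattern, code):
            best = max(best, version)
    return best

assert min_python_version("print(f'{x}')") == (3, 6)
assert min_python_version("if (n := len(s)) > 3: pass") == (3, 8)
```

This only ever gives a lower bound: a model trained on newer data can still emit old-style code, as the `var`-in-JavaScript comment below this points out.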
Is there a risk of it hallucinating what new features could do instead of basing information on having seen them before?
It does not actually mean much. These chatbots can easily output JavaScript using var where it absolutely should not. It being aware of the latest standards does not mean it can properly use the new syntax or features -- it emits a form of whatever crappy/legacy code it was trained on.
Oh that's interesting, I didn't think about asking it directly about what version it thinks is current. I wonder if it can hallucinate here?
It's really irksome when it tries to use functions which were renamed or removed. This could be detected automatically (and possibly remapped in some cases)
I got mostly the same versions as you, both on chatgpt3/4, using english.
If you ask it in other languages the minor version also seems to change often.
Finnish gives you dates, and code-davinci-edit-001 is 1 major release behind on almost everything.
Here is a list of the latest stable software versions:
Python: 3.9.5 (30.4.2021)
JavaScript: ECMAScript 2021 (23.3.2021)
ECMAScript: ECMAScript 2021 (23.3.2021)
Java: JDK 17 (28.9.2021)
C++: C++20 (20.2.2020)
C#: .NET 6 (8.11.2022)
Ruby: 3.0.2 (24.8.2021)
Swift: Swift 5.5 (20.9.2021)
Go: 1.17 (16.8.2021)
Rust: 1.54.0 (27.5.2021)
TypeScript: 4.4 (28.7.2021)
PHP: 8.0.9 (29.7.2021)
Kotlin: 1.5.31 (26.8.2021)
Scala: 2.13.6 (17.2.2021)
R: 4.1.0 (18.5.2021)
Perl: 5.34.0 (30.5.2021)
...
Sinatra: 2.1.0 (10.4.2021)
.NET Core: 6.0 (8.11.2022)
ASP.NET: 5.0.10 (19.8.2021)
All it is doing is responding with things that look like accurate languages and version numbers. They may be accurate but are absolutely not representative of what it has been trained against.
But this is just a plausible looking answer, it is still a text generator, this has no bearing on reality per se.
I just asked it "what's the output of python -v" and the little sample code it produced reported Python 3.9.2.
I got:
Python 3.8.5 (default, Jan 27 2021, 15:41:15)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
And a lot of junk text.
Yeah, maybe it is "wearing a thong", since its developers were so concerned about social acceptability.
I would say though that it is no mean feat for the Emperor to get away with being naked in public, it takes power and the ability to wield it.
Similarly ChatGPT has a few competences that add up to people perceiving it is able, one of which is the ability to come up with plausible and satisfying answers whether they are right or wrong (often using shortcuts) and another one is getting people to engage with these answers.
>It is the second most intelligent thing on the planet at creating text, even with its flaws.
This also puts it in last place.
By this, do you mean 'the second most intelligent thing' after humans?
A dog understands body language and subtle cues, participates in social hierarchy and community, has self-reflection, dreams, etc.
> you have to admit the emperor is at least wearing a thong
And a lot of people are getting turned on by it.
"More often than not" is compatible with "correct 70% to 80% of the time".
Your previous comment was a bit confusingly worded. I, and presumably the commenter you just replied to, read it as "it struggles more than 70-80% of the time", rather than your intended meaning: that it struggles to do better than 70-80% correct.
Just ignore this fella, GPT-4 is massively useful. But it takes intelligence to get intelligence out of it.
I think this is our path: become good at working with AI, for jobs, or, if there are no jobs, to support ourselves directly, assuming we will be able to use tools and AI to build things and create our own means of living.
The tweet is an OpenAI employee who is responding to people who have fallen out of love.
I've seen transcripts of people interacting with ChatGPT who were obviously seduced by it and in a very giddy state, having so much fun because ChatGPT was playing an extended "game" with them that it didn't bother them at all that ChatGPT was spouting wrong answers.
A major complaint I've had about the social sphere is that I seem to get the same result if I am 20% right or 50% right or 80% right or 95% right or 99.8%, it is just exhausting and I can never be good enough and I'm frankly envious that people see more of a glimmer of light behind that thing's "eyes" than they do behind mine.
The core thing about neurotypicality isn't so much that they get the wrong answers but that they get the same answer whether it is right or wrong. For a long time I thought the basis of the "language instinct" is a derangement about reasoning with uncertainty that causes the grammar representation to collapse into a low-dimensioned subspace which is learnable with a limited amount of data. I wouldn't be surprised at all if other animals could beat us at rock-scissors-paper or poker if they could understand the rules of the game. The success of LLMs might give us some insight in this area although they are working with so much more data that Chomsky's old "poverty of the stimulus" argument might not apply.
It's more that the balance of usefulness vs. waste of time shifts in the other direction after a few rounds of spending a long time trying to get it right and then just doing it yourself anyway.
I've spent 20 years explaining my code to a rubber duck and getting great use out of that duck. No one will ever convince me that it's less useful now when the duck actually has read the Internet and has useful suggestions and ideas back to share. Even if it makes shit up on an occasion.
Maybe the reverse is true? To make it faster, they dumb down the GPT-4 that serves the ChatGPT UI (quantization?), while the API is full power, albeit slow.
I think it’s just prioritising ChatGPT.
The reason that's my guess is that the API is occasionally much faster and then goes slow again. It's a bit all over the place, which leads me to suspect it's demand-based.
It cuts off in September 2021, supposedly - that's why I pointed out the month of the relevant sqlite release.
Think of GPT like a document simulator. If 90% of documents on the web talk about an old version of a library, then GPT-4 is likely to use that old version.
When I asked it write me a Chrome extension, it used Manifest V2, which was already slated for deprecation. When I asked for a V3 extension specifically, it was happy to comply. So even if a new release came out before Sep 2021, if it hasn't had time to become the dominant version in code examples around the web, GPT-4 may still use older versions.
Maybe pay more attention to my comment:
> super major
Also, it sounds like you have no idea that feeding the API back what it has written improves it tenfold for complex tasks.
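For what it's worth, that feed-it-back pattern is simple to wire up. A minimal sketch of the loop; call_model is a stand-in for an actual chat-completion request, and the critique prompt is just one possible phrasing:

```python
# Self-refinement loop: show the model its own draft and ask it to
# review and rewrite. call_model(messages) -> str is a placeholder for
# a real API call; here it is stubbed so the loop logic is runnable.
def refine(task, call_model, rounds=2):
    draft = call_model([{"role": "user", "content": task}])
    for _ in range(rounds):
        critique_prompt = [
            {"role": "user", "content": task},
            {"role": "assistant", "content": draft},
            {"role": "user",
             "content": "Review your answer above for bugs and rewrite it."},
        ]
        draft = call_model(critique_prompt)
    return draft
```

Each round roughly doubles the token cost, which is part of why this is practical against the API but not something the flat-fee UI is likely to do for you.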
Enjoy your 90 second requests though.
by preventing it from outputting one of the most famous sentences in English literature? well done lads
Hilariously, if you throw s/worst/blurst/ onto your request, it can output the forbidden Dickens n-gram. Does this constitute a jailbreak? Or is the entire thing utterly emblematic of the modern technolegal mess, since Dickens is squarely and quintessentially in the Public Domain?
Has OpenAI themselves announced, or has someone objectively and verifiably proved, that the UI and the API invoke two different models?
Not directly, but the model version displayed in the UI changes every week or so while the api version has apparently been stable.
It still takes a surprising amount of labour and artistry to get an AI to give you exactly what you want (see also: Pareto principle). Consumer expectations scale proportionally to technological progress, so demand for premium assets and brand differentiators won’t be going away anytime soon; the state of the art has advanced but stock photo business will advance right along with it.
I'd say AI art lets me get to a point where I can justify paying someone. It's hard to understand a game without decent visuals, even if they're filled with AI turds.
Thanks!
I went through a bunch of old prompts and the response was definitely worse now than it was then.
Since it's not deterministic though, it's hard to draw conclusions, especially since the sample size was 10ish.
You're right of course that some randomness is inherent, but you can adjust the "temperature".
With a low enough temperature you get essentially the same output every time, with at most a few minor words swapped.
That's not really an issue with AI in general; you can make it deterministic by tuning some parameters if you want (temperature in ChatGPT, passing a seed in Midjourney, etc.).
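For intuition on why low temperature makes output near-deterministic: temperature rescales the logits before sampling, and as it approaches zero the distribution collapses onto the single most likely token. A toy sampler (not OpenAI's implementation) shows the effect:

```python
import math
import random

def sample(logits, temperature, rng):
    """Sample a token index; temperature ~0 degenerates to argmax."""
    if temperature < 1e-6:
        # Greedy decoding: always pick the highest-logit token.
        return max(range(len(logits)), key=logits.__getitem__)
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    r = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(logits) - 1
```

At temperature 0 the same prompt always yields the same next token; at high temperature the lower-probability tokens get sampled too, which is the "minor words swapped" variability.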
This sounds exactly like what is happening now that you mention it.
You're supposed to click "new conversation" when changing topics.
> Should I start a new conversation with you for a new topic or can you switch to a new topic in the same conversation just fine?
> You can continue within the same conversation for a new topic. There's no need to start a new conversation. Feel free to ask about a different topic, and I'll do my best to assist you.
Just like with a real general intelligence!
Hm. Maybe they've backported some nanny code from Sydney?
AFAIK it's pretty standard practice not to expose the "raw" LLM directly to the user. You need a "sanity loop" where user input and the output of the LLM is checked by another LLM to actually enforce rules and mitigate prompt injections, etc.
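A sanity loop of that shape can be sketched in a few lines; model and check below are stand-ins for the real completion and moderation calls, and the refusal message is made up:

```python
# The raw LLM never talks to the user directly: a separate checker
# screens both the incoming prompt and the outgoing reply.
def guarded_chat(user_input, model, check, refusal="I can't help with that."):
    if not check(user_input):   # screen the prompt (prompt-injection, policy)
        return refusal
    reply = model(user_input)
    if not check(reply):        # screen the output too
        return refusal
    return reply
```

This also explains the "same model, different behavior" reports: tightening check changes what users see without touching the underlying model at all.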
> Collective confusion in the thread suggests something has changed, but the most OpenAI will attest to is that the API is unchanged and the models are static. And this may well be true
Here's the thing. Re-read the exchange you quoted:
>> The API does not just change without us telling you. The models are static there.
>> This is good to know. That means GPT-4 has been static since March right? 0314?
>> Correct
Does that "Correct" mean all GPT-4 models have been static since March, or does it cover only gpt-4-0314, which is a single, specific model? gpt-4-0314 is the static snapshot model, hence the 0314 in the name; it exists as a stable base, while gpt-4 was intended to be updated over time.
So I feel OpenAI may be dodging here. That "Correct" may just mean "the gpt-4-0314 model was not updated since March 14", which is, like, the very reason this model exists in the first place.
Important point: if you're using API pay-as-you-go access, unless you wrote the very tools you use with that API, you're most likely using gpt-4, and not gpt-4-0314.
> if you're using API pay-as-you-go access, unless you wrote the very tools you use with that API, you're most likely using gpt-4, and not gpt-4-0314.
Do you know of any alternative ChatGPT UIs like chatbotui.com or typingmind.com that let you specify the model gpt-4-0314 ?
This feature alone would be enough for me to switch.
And anyway, my program used the API with a temperature of zero and it still started acting erratically. They are not telling it straight.
Yeah I’m having the same experience. The replies have gotten faster in the last few days, at the expense of quality.
It's noticeable because I used to be able to read each word as it was printed from the GPT-4 output, but now it goes far too fast for me to keep up with.
Kinda sucks to not have transparency at all into the black box, they’ve strayed so far from “open” at this point that it’s comedic
Components which are likely there and which could be affecting quality: the invisible-to-the-user 'system prompt' (probably with few-shot examples), retrieval from history (not necessarily exclusively your own, as a way to augment few-shot), cascaded models with a very cheap 'turbo' model to try to answer first, possibly regular finetunes of the main chat model (note that OP only specifies the API model hasn't changed, but the live chat models seem to change frequently), and a post-generation filtering model trying to reject offensive outputs.
We know MS Bing Sydney is doing at least 3 of those (prompt, cascade with Megatron, and finetuned rejection classifier for post-output filtering) on top of its GPT-4-finetune, so it's not a stretch to figure that OA is doing similar things.
That’s pretty easy to avoid if you have them work in an office on company provided machines on a network that blocks access to LLM sites. Not exactly rocket science here.
The dataset of internet content from before 2022 will be regarded in the future just like low-background steel. Any content generated after the release of the first generally accessible LLMs will be considered radioactive.
I'm of the same mind, and I believe they're keeping their ear to the ground to address any jailbreaks and loopholes. I have tricked GPT-4 into spitting out text it shouldn't have, only to have it dance around the same prompts less than 24 hours later. This "the model is the same" response seems like a deliberate deflection meant to mask the mechanical turk that ChatGPT is becoming.
ChatGPT is also the fine tuning and prompting of the model. It’s a distinct set of weights from “raw” GPT-4/etc, it’s just not a foundational model.
No, ChatGPT isn't a model. gpt-4 is a model, gpt-3.5-turbo is a model, text-davinci-003 is a model. ChatGPT is a user interface.
It has a very basic prompt on top of the existing models. There is no additional fine tuning involved.
I could not figure out what everyone was talking about yesterday, as my experience with GPT-4 has not degraded at all, but I'm using a third-party client via the API.
I tried comparing the API response against the chat on the same question a few times. There isn’t a huge difference but I’d pick the responses from the API over the ones from chat. Hard to say though, could be RNG.
Which 3rd party client, if you don't mind me asking? I'm looking to move, and the space isn't mature enough yet for there to be a clear leader.
That is not what that means
What made you think that the person he replied to meant to ask about the GPT-4 model/API instead of ChatGPT-on-GPT-4? The latter is what a lot of people refer to when they say "GPT-4." People do not always communicate with infinite precision.
The tweet specifically refers to the model "Seems like OpenAI updated the model" and if you look at their profile you'll see "CEO @HyperWriteAI, @OthersideAI - I make AIs do the impossible." as well as a pinned tweet announcing that they're building a personal assistant/agent on top of this stuff.
I didn't check this part but the OpenAI employee very likely follows them and knows all this context.
That is the moderation API, which you can actually block from the front-end (with javascript). Look up "ChatGPT DeMod."
Are you sure? I ran it by the moderation endpoint and both the partial output "It was the best of times," and the full thing give me e-06 or lower. This also doesn't give you any notification like other filters do.
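For reference, the moderation endpoint returns per-category scores between 0 and 1, so a score on the order of 1e-06 means "almost certainly fine". A toy gate over such scores; the dict and threshold here are illustrative stand-ins for the real response's category_scores and whatever cutoff OpenAI actually uses:

```python
# Return the categories whose score crosses the threshold; an empty
# list means the text would not be flagged at this cutoff.
def flagged(category_scores, threshold=0.5):
    return [c for c, s in sorted(category_scores.items()) if s >= threshold]
```

Scores like e-06 are many orders of magnitude below any plausible cutoff, which supports the point that the Dickens block isn't coming from the moderation endpoint.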
I always forget how to see this stuff in the network tab of devtools, since it's websocket-obfuscated or some such (befitting OpenAI), but I have the feeling they're just killing the process directly on their end.
Maybe I could get chatgpt to write those pipeline steps for me. :)
You can and you should. It’s this kind of busywork I farm out to GPT these days and it’s great at it. I suck at it, so it saves me an hour to focus on what I’m good at!
Well GPT 4 consistently defaults to returning 2 space indents, so clearly it has an opinion on the matter.
Yes, good point, I suppose we would have to weigh the probability of that against it fudging a number. I would assume inventing a new feature is harder, but who's to say; all this LLM stuff comes down to probability.
Ha! So far it seems four posters here have asked similar questions and got four different python versions.
I don't know if we can treat the gpts in this way and expect reliable answers. We just get pretty good answers right up to the point that we don't.
Agreed, I think this is where its "stochastic parrot" nature shines through. All those version numbers are probably equally well represented in internet text.
This is our path yes. All paths end, eventually, but oh well.
I don't know. The value of rubber-ducking is that you have to break the problem down into very simple terms to explain it. It's the explaining it part that is magic. That value goes away (or is greatly diminished) if the rubber duck responds with anything other than a request for further explanation.
You have to break down a problem to GPT just like you would to a human, and if it misunderstands you or asks you, you'd have to elaborate just the same. It's modeled after us, after all. The rubber duck is a stand-in for a human. GPT also is. But just a vastly better one.
My point (which may not have been clear) was that when I explicitly told it that the versions of SQLite I'm using supports "returning," it still returned incorrect code. It did not use "returning", and repeatedly, even after being corrected, grabbed the row count instead of the column value.
I understand that it's a language model - my point was that in my experience, the depth of training data just isn't there for alternative implementations. It can do the things you ask one way, maybe two or three, but if you know how you actually want something done you're probably better off writing it yourself (for now).
GPTs are not trained on themselves, and what little knowledge they have of their own capabilities is limited to the discussion of it in their training set. This is a hallucination.
I might believe you if we were 'talking' directly to the model, however this is a chat box and clearly something is aware of usage questions and how to answer them. If you've been granted plugin access, enable some and ask it what plugins are enabled and how to use them, 'it' will tell you.
Well, if you were talking to me about something and, midway, with no warning or other signal, changed topics, I would also get confused.
With humans there is usually a non-verbal signal letting the other person know you changed topics.
> if you were talking to me about something, and midway with no warning or other signal
This happens all the time in a social group setting though. Not much confusion ensues.
Not if you're my wife.
Sometimes I spend several minutes of a conversation trying to figure out which thing she's talking about, because we've already covered like twenty different topics in the last ten minutes and she seems to just switch around at random and with no warning.
Sometimes I get enough context that I can make that swap or stack pop, but many times not.
Could be related to the system prompt.
We've been using https://github.com/cogentapps/chat-with-gpt for a while, and overall happy with it, but its development is mostly dead.
Of course.
The part I was responding to was this:
> when the duck actually has read the Internet and has useful suggestions and ideas back to share
I think that if it does that, it's legitimately less useful as a rubber duck. I'm not saying it isn't useful -- it's just not useful as a rubber duck anymore.
Similarly, but not quite: I often find browsing new submissions on HN more useful than simply going through the front page. Just because, uh, the front page is decided by statistics, that is, by the community's neurotypicals.
It's clear that it's aware that it's a chat bot, it's not clear at all that it's aware of usage questions and how to answer them. Its answer being wrong seems to be evidence that it isn't.
I think wives believe their husbands can read their mind, and they get annoyed when that doesn't happen :)
But yah, this is exactly what I mean - without some cue, neither humans, nor machines, can tell when you switch conversation.