It'll be interesting to see where this goes. Amazon has had ML-generated garbage books for years now, and I assume they haven't taken them down because they make money even when they sell garbage.
Maybe there's so much garbage coming in now that they finally have to do something about it? I feel for people trying to learn about technical topics, who aren't aware enough of this issue to avoid buying ML-generated books with high ratings from fake reviews. The intro programming market is full of these scam books.
I was thinking about buying an air fryer. My search came up with cookbooks specific to that air fryer, and I was intrigued. I found a good 5-star book, but then I found that ALL the 5-star reviews were submitted the same day.
I complained, but Amazon defended the book as legitimate, and since I hadn't purchased it, they would not take any action. (to be honest, I assume frontline customer service reps don't have much experience or power)
So I purchased it, complained, got a refund and then they were able to accept my complaint (after passing the complaint higher in the food chain).
Seriously, how hard was it amazon? I guess they're starting to notice.
Take a look at air fryer cookbooks - there are books specific to most makes and models. But everything is ML copypasta all the way up and down - the title, the recipes and the reviews all seem to be generated garbage.
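For what it's worth, the "all the 5-star reviews landed on the same day" tell is easy to check mechanically. Here's a rough sketch of the heuristic; the function names and thresholds are made up for illustration, and this says nothing about how Amazon actually screens reviews.

```python
from collections import Counter
from datetime import date


def same_day_share(review_dates):
    """Fraction of reviews posted on the single most common day."""
    if not review_dates:
        return 0.0
    counts = Counter(review_dates)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(review_dates)


def looks_suspicious(review_dates, min_reviews=5, threshold=0.8):
    """Flag a listing when most of its reviews share one date.

    min_reviews and threshold are arbitrary, illustrative values.
    """
    return (
        len(review_dates) >= min_reviews
        and same_day_share(review_dates) >= threshold
    )


# Hypothetical data: six 5-star reviews, all submitted on the same day.
burst = [date(2023, 9, 1)] * 6
# Hypothetical data: six reviews spread over six months.
organic = [date(2023, month, 10) for month in range(1, 7)]

print(looks_suspicious(burst))
print(looks_suspicious(organic))
```

A burst like this is only a hint, not proof: a legitimate launch-day promotion could trip the same check.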
Garbage books are used for money laundering.
You buy books using stolen credit cards and such.
I knew a guy who made "generated" textbooks in 2010. He would absorb several articles and loosely stitch them into chapters, partly with computer scripts and partly from memory. In a week he would produce 400 pages on a new subject. It was mostly coherent and factual (it kept references). Usually it was the only book on the market about a given subject (like a rare disease).
Current auto generated garbage is very different.
> … I assume they haven't taken them down because they make money even when they sell garbage.
I’d be surprised if this is the case. The money they make is probably a rounding error compared even just to other Kindle sales. Much more likely is that they haven’t seen it as a big enough problem - and I’m willing to bet it’s increased multiple orders of magnitude recently.
Behind the Bastards did a 2 part episode on how these "books" are preying on children and busy parents: https://www.iheart.com/podcast/105-behind-the-bastards-29236...
> Maybe there's so much garbage coming in now that they finally have to do something about it?
It seems like this is preventative action rather than reactive, since they say there hasn't been an increase in publishing volume: "While we have not seen a spike in our publishing numbers..."
I thought it was more so filled with low quality mechanical turk garbage books.
In my opinion, all we learn over time is that we need gatekeepers (publishing houses in this case). The general public is a mess.
I think we will see tidal waves of 'not-so-good' AI-generated content. Not that AI can't generate or help generate 'good' content, but it will be faster and cheaper to generate 'not-so-good'.
These waves will mainly be in places in which we are the product. And those waves could make those places close to uninhabitable for folks who don't want to slosh through the waves of noise to find the signal.
And in turn that perhaps enables a stronger business model for high quality content islands (regardless of how the content is generated) - e.g. we will be more willing to pay directly for high quality content with dollars instead of time.
In that scenario, AI could be a_good_thing in helping to spin a flywheel for high quality content.
Assuming not too many people die eating mushrooms while we're waiting: https://www.theguardian.com/technology/2023/sep/01/mushroom-...
Except they shouldn't be islands. Unify/standardise the payment mechanism, make it frictionless and only for content consumed. There's no technical reason you shouldn't see an article on hn or wherever, follow the link and read it and pay for it without having to set up and pay for a subscription for the entire publication or jump through hoops. It should be a click at most.
There will always be a place for subscriptions, but people want the hypertext model of just following a link from somewhere and there is absolutely no technical reason for that to be incompatible with paying for content. The idea that ads are the only way to fund the web needs to be challenged, and generative AI might just provide the push for that to finally happen.
Or maybe there will be no such crisis and it'll just make the whole thing even more exploitative and garbage-filled.
While not exactly the same, the invention of the printing press caused a lot of controversy with the Catholic Church. With the printing press, people could mass produce and spread information relatively easily. I'm sure a lot of it was considered "low quality" (also heretical). Seems like we're going through similar growing pains now. Yes I know it's different, but it rhymes.
I really dislike the comparison. The printing press democratized knowledge. The LLM destroys it. LLM output is perfect white noise. Enough of it will drown out all signal. And the worst part is that it’s impossible to distinguish it from real human output.
I mean think about it. Amazon had to stop publishing BOOKS because it can no longer separate the signal from the noise. The printing press was the birth of knowledge for the people and the LLM is the death.
You're implying that what is being produced has actual value, the problem is they're acting in patently bad faith. Weep not for the spammers.
> the invention of the printing press caused a lot of controversy with the Catholic Church
The example is from the sixteenth century, but the printing press is from the seventh century.
I don't think the Catholic Church bothered to take any notice at all?
The rhyme has a lot to do with how existing power structures handle a sudden increase in the amount of written text generated. In this comparison, they both try to apply the brakes. Banned books didn't work well for the Catholic Church. I think increasing QA for Amazon might actually help their book business. Of course, a book seller has a greater responsibility to society than to make money.
The incentive system is completely different. The new AI generated content is for a quick buck, just spamming out content because $1 x 10,000 is a lot.
If it was written with the aid of AI, that's different. At least someone tried to make something good and just used available tools to enhance the quality.
>similar growing pains
For what it's worth, these 'growing pains' took the form of the wars of religion in Europe, which in Germany killed up to 30% of the population. In relative terms, that's significantly worse than the casualties of World War I and II. So maybe the Catholic Church had a point.
How do we...
I'm not entirely sure how to word this question.
How do we make sure that most of the people we talk to are at least humans if not necessarily the person we expect them to be? And I'm not saying that like a cartoonish bad guy in a movie who hates artificial intelligence and augmented humans.
How do I not get inundated by AI that's good at trolling? How do I keep the social groups I belong to from being trolled?
These questions keep drawing me back to the concept of Web of Trust we tried to build with PGP for privacy reasons. Unless I've solicited it, I really only want to talk to entities that pass a Turing Test. I'd also like it if someone actively breaking the law online were actually affected by the deterrence of law enforcement, instead of being labeled a glitch or a bug in software that can't be arrested, or even detained.
It feels like I want to talk to people I know to be human (friends, famous people - who might actually be interns posing as their boss online), and people they know to be human, and people those people suspect to be human.
I have long term plans to set up a Wiki for a hobby of mine, and I keep getting wrapped around the axle trying to figure out how to keep signup from being oppressive and keep bots from turning me into an SEO farm.
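To make the "people I know, people they know, people those people suspect" idea concrete: it amounts to a hop-limited walk over a graph of who vouches for whom. A minimal sketch, assuming a hypothetical `vouches` mapping and an arbitrary three-hop limit; this is not how PGP's Web of Trust actually computes key validity.

```python
from collections import deque


def trusted_within(vouches, me, max_hops=3):
    """Return {person: hops} for everyone reachable from `me` in at most max_hops.

    `vouches` maps a person to the people they vouch for as human.
    """
    hops = {me: 0}
    queue = deque([me])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not extend trust beyond the hop limit
        for other in vouches.get(person, ()):
            if other not in hops:
                hops[other] = hops[person] + 1
                queue.append(other)
    del hops[me]  # report only other people
    return hops


# Hypothetical vouching graph.
vouches = {
    "me": ["alice", "bob"],
    "alice": ["carol"],
    "carol": ["dave"],
    "dave": ["erin"],
}

print(trusted_within(vouches, "me"))
```

Here "erin" is four hops out and gets dropped. The hard part, of course, is not the graph walk but getting people to vouch honestly in the first place.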
This is only a problem for someone terminally online. The vast majority of people talk to their friends and coworkers in person.
Meet people in real life. This problem is trivially solved by just using meatspace.
Alternatively for sign ups, tell them to contact you and ask. Chat with them a moment. Ask them about their hobbies and family.
There is some irony in Sam Altman bringing us the cause (AI) and purported solution (Worldcoin) for your problem at the same time.
we don't, check Boltzmann brain https://en.m.wikipedia.org/wiki/Boltzmann_brain
See also: "Tom Lesley has published 40 books in 2023, all with 100% positive reviews"
I remember that one - interestingly the Amazon link it goes to shows only 3 books now, all of which look real, not the 40 that I remember seeing before.
So I guess Amazon is doing something, even though I regularly hear complaints from authors that they allow blatant piracy all the time.
It seems Amazon cares more about polluting search results in Kindle than polluting the search results in their own e-commerce business. I think low-effort books generated by AI are much less detrimental than sketchy physical products being shipped to your door in 2 days or less.
It's probably about volume rather than quality. Sketchy copycat product lines are still hard limited by the number of factories and shipping operations in existence, while sketchy AI-generated books can easily keep growing exponentially in number for a while.
The title of this story doesn’t seem to match the content. This seems like a proactive move to prevent individual publishers from spamming many many submissions - and even then, they’re willing to make exceptions.
> While we have not seen a spike in our publishing numbers, in order to help protect against abuse, we are lowering the volume limits we have in place on new title creations. Very few publishers will be impacted by this change and those who are will be notified and have the option to seek an exception.
Livestreams where artists show their creative process and use the streaming platform to immediately sell the thing they produced, just to prove it had human origins.
This is the future
We have realtime filters, avatars, translators, TTS, etc. Soon, all of this will be "good enough" to mimic the proposed solution.
You're only kicking the can down the road.
>> We require you to inform us of AI-generated content (text, images, or translations) when you publish a new book or make edits to and republish an existing book through KDP. AI-generated images include cover and interior images and artwork. You are not required to disclose AI-assisted content.
>AI-generated: We define AI-generated content as text, images, or translations created by an AI-based tool. If you used an AI-based tool to create the actual content (whether text, images, or translations), it is considered "AI-generated," even if you applied substantial edits afterwards.
> AI-assisted: If you created the content yourself, and used AI-based tools to edit, refine, error-check, or otherwise improve that content (whether text or images), then it is considered "AI-assisted" and not “AI-generated.” Similarly, if you used an AI-based tool to brainstorm and generate ideas, but ultimately created the text or images yourself, this is also considered "AI-assisted" and not “AI-generated.” It is not necessary to inform us of the use of such tools or processes.
This is really interesting. I imagine that AI-generated art / illustrations for mostly-text books are a pretty compelling thing for authors, for all the same reasons that AI-generated text is of value for non-authors. I wonder how this line will work out in practice.
This doesn’t seem surprising. Half of my YouTube ads these days are for some kind of AI+Kindle-based get rich quick scheme.
About time. YouTube is full of videos about making eBooks with ChatGPT, e.g. "Free Course: How I Made $200,000 With ChatGPT eBook Automation at 20 Years Old" https://www.youtube.com/watch?v=Annsf5QgFF8
Strategically, AI generated content is a boon for platforms like Amazon.
1. The more content there is, the more you can't reliably get good stuff without reviews, the more centralized distribution platforms with reviews and rankings are needed.
2. Even if people are making fake books for money laundering, Amazon gets a cut of all sales, laundered or not.
Just like Yahoo's directory once upon a time though, and Movie theaters, the party gets ruined when most people learn they can use AI to generate custom stories at home and/or converse with the characters and interact in far more ways than currently possible. Content is going from king to commodity.
Amazon's reviews and ratings are complete garbage and have been for some time.
This sounds like a commendable move by Amazon. I especially like the idea of requiring disclosure of use of "AI".
Here's a pretty good article about the problem with AI generated books. "AI Is Coming For Your Children" 
Why people read contemporary books is something I can’t really get my head around. There are so many classics to keep people busy for life, and they are 100% guaranteed to be insightful and pleasurable.
Should people stop telling new stories? A century from now the best books of today will be classics. Books can act as a time capsule of a certain time and place and mode of life. And that has value.
Contemporary books are just new classics. It is like asking why read :)
There’s a distinct demographic in the contemporary-fiction-reading community, as can be seen in corners of Goodreads or Instagram, that demands new fiction to tell the stories of groups not covered, or supposedly unfairly covered, in that classic literature: LGBT, BIPOC, the working class, etc. In fact, they might even deny that the classics are “insightful and pleasurable” due to these social concerns.
What's the name of the law where the longer something has already been around, the longer it will likely stay around in the future?
I've found that it definitely applies to books. Starting at a ~20 year horizon is a surprisingly good filter for quality.
Yes, and there's been a drop in quality since then too. The 1800-1940s really saw literature as the high water mark for quality media and it shows.
Finding deeply valuable and high quality books is much rarer in today's crop of authors. The best minds are rarely making the medium of literature their highest good, but are instead chasing dollars and relations with the rich and famous.
I think the risk of reading a suboptimal book is not greater than the risk of not allowing myself to be exposed to different voices.
One of the best books I read last year was the story of the rescue of the football team that was trapped in a flooded cave in 2018 – written by cave diver Rick Stanton, who found the team and led the rescue. How would that account have been written into a book before it happened?
This is just a tip of the iceberg, compared to what we are heading into with the web. Very concerning.
I would go long the value of genuine human writing, aka the 'small web'.
Gee, I sure hope people don't just lie about it...
It doesn’t matter. It’s garbage content and immediately recognizable as being AI generated.
It is absolutely possible to write a good article or even a good book with AI, but at least for now it’s just as hard, if not harder, than doing it without AI.
But of course people trying to make a quick buck won’t put in the required effort, and they likely don’t even have the ability to create great or even good content.
If KDP required an ISBN it would cut down on the garbage books. In the US at least, ISBNs cost money.
they're not that much but you can just get an Australian one for free
So, what are the actual limits?
Finally, I hope those garbage books will slightly decrease from there.
Mushroom picking guides on AI "what could possibly go wrong"
How do we even know this entire comment thread isn't polluted with AI?
Maybe it doesn't matter. The quality of the work matters more than the process of actualization.
In a practical sense: AI generated stuff is crappy and often subtly wrong and it can be generated faster than human generated content. So it becomes untenable to even search for good information.
It seems that as a society we are coming to realize that enabling anyone to do anything on their own and at any time isn't the best of ideas.
Verifiability and authenticity matter and are valuable. Amazon has long had a problem with fake reviews. This issue with Kindle books seems like an extension of that. Massive centralized platforms like Amazon make fraud more likely and are bad for the consumer.
The "decentralization" that we need as a society is not in the form of any crypto-based technical capability but simply for the size of the massive players to be reduced so competition can reemerge and give consumers more options on where and how to spend their dollars. If Amazon were forced to be broken up, other e-book stores might just pop up that develop relationships with publishers and disallow independent publishing.
I hope the FTC can begin finding a strategy to force some of these massive corporations to split, making it more likely for there to be more competition.
I'm the author of Python Crash Course, the best selling introductory Python book for some time now. Years ago, someone put out a book listing two authors: Mark Matthes and Eric Lutz. That's just a simple juxtaposition of my name and Mark Lutz, the author of O'Reilly's Learning Python. The subtitle is obviously taken from my book's subtitle as well. I assume the text is an ML-generated mess, but I haven't bought a copy to verify that.
I used to comment on reviews for books like these explaining what was happening, but Amazon turned off the ability to comment on reviews a long time ago.
I've spoken with other tech authors, and almost all of us get emails from people new to programming who have bought these kinds of books. If you're an experienced programmer, you probably know how to recognize a legitimate technical book. But people who are just starting to learn their first language don't always know what to look for. This is squarely on Amazon; they have blocked most or all of the channels for people to directly call out bad products, and they have allowed fake reviews to flourish and drown out authentic reviews.
I think the best way to recognize a legitimate tech book is... visit a Barnes and Noble. If it's a publisher or series you can find printed on the shelf, books are legit.
Unfortunately online market "platforms" are pretty much widely untrustworthy for any sort of informational purposes.
Why don't beginners start at Python.org, though? It's such a great resource to learn the language.
- it's free, unlike books
- always up-to-date, unlike even the best book after a few months
- easy to choose: heck, there's only one official documentation! No chance of making a mistake here!
I stopped frequenting the dev.to community because the average quality of articles just got so low it stopped being worth my time.
Ugh. I hate the, "You're not a customer yet so our CRM system won't let me talk to you."
And what happens when my problem is that your system won't let me place an order?
False Negatives and False Positives are always connected. On the other side of the equation, there are plenty of bad actors who will casually flag their competitors to score a quick win. Crime doesn't like to go uphill - raising the stakes for feedback lowers the prevalence of bad actors.
I think that's a different issue. Amazon has thorny problems with takedowns. Company A trying to get rival company B's listing taken down probably happens 100's of times a day. I believe Amazon uses "proof of purchase" kinda like a CAPTCHA or proof of work - an extra hoop to jump through to reduce the volume of these things they have to adjudicate.
CRM should never mean Sales Prevention as a Service.
> Seriously, how hard was it amazon? I guess they're starting to notice.
It's not hard. It's a cost center, and they're in the business of making money - not providing the best service.
Their biggest risk has always been the perception that they peddle fraudulent simulacrums of worthy products.
It’s the same across all big tech. The size/volume for complaint handling doesn’t scale. It’s either filtered out by some machine learning algorithm or some poor person in a 3rd world country getting paid next to nothing who reviews the complaints so quality isn’t of importance.
There’s been a recent influx of scammers on Facebook local groups: air con cleaning, car valeting. Everyone’s calling out the scammers in the comments, yet when you click report to FB the response is "we have reviewed the post and it has not breached our guidelines, would you like to block the user?"
If I don't get where I want to be with the front door customer service within a decent amount of time, I have always had good success contacting [email protected]. Their executive support team gets back quickly via email or phone and they really seem to care.
I wonder if that means the Feds made a phone call to Jeff on his private line and said we need to have a little chat.
We can track money laundering when there are X fake books. We can't when there are 10X fake books.
I wouldn't even consider that generated. That's like where useful content and copyright infringement overlap on a Venn diagram.
> That's like where useful content and copyright infringement overlap on a Venn diagram.
That sounds like a description of LLM-generated content to me ;-)
For several years now, Amazon KDP will block books whose content is already available on the web. I have printed a few books whose content was either CC-BY or public domain due to its age, and in each case my book was automatically blocked in the early stages. I had to submit an appeal that was reviewed by a person in order to proceed.
That explains the CouchDB book from O'Reilly from that time.
Do people still use CouchDB? Blast from the past!
I think what we’re seeing here is a symptom of the broader and more fundamental problem of trust in society. We’ve gone from a very high trust society to a very low trust society in just a few decades. We, as technology people, keep searching (desperately) for technical solutions to social problems. It’s not working.
Because technology never was the solution for social problems, it's a solution to the few people getting very rich problem.
The standards for filtering internet data have dropped badly.
Amazon and Google both abuse their filtering systems on a daily basis to effect social change.
We need new companies built with policies to keep the filtering systems rigid, effective and unchanging. We need filterkeepers.
I’m good with Amazon and Google over some unknown. I don’t want some right wing shit to be my gatekeepers.
Such systems just result in content that is terribly bland, or worse, intentionally limited to push specific political narratives.
I'd rather have a much more diverse and interesting set of content to choose from, even if some of it might not be to my liking, and even if I'd have to put some effort into previewing or filtering before I find something I want to consume.
Some people value their time, energy, and money more. I can appreciate that you do not as we all have choices but I imagine that most people would disagree.
Common foraging rhetoric is that you need two independent sources asserting that a wild food is edible. Ones that cite neither each other nor the same chain of citations. And preferably a human who says, "I've been eating these for years and no problems," or scientists who did recent blood work to make sure you aren't destroying your organs by eating them.
In a world with fake books, it would be quite easy for two books to contain the same misinformation or mis-identification (how many times have I found the wrong plant in a google image search? More times than I care to count). Two fake books putting the wrong mushroom picture next to a mushroom because they were contiguous on some other page and you have dead people.
In the ten years since I started working with indigenous plants, wild ginger (Asarum caudatum) has gone from quasi-edible to medicinal to don't eat. More studies show subtler wear and tear on the organs (Wikipedia lists it as carcinogenic!) and it is recommended now that you don't eat it at all, even for medicinal purposes. I'm not sure I own a foraging or native species book younger than 5 years, and many are older.
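The two-independent-sources rule can be stated precisely: two guides only count as independent if nothing in one's citation chain (including the guide itself) shows up in the other's. A small sketch of that check; the `cites` mapping and the source names are hypothetical, and real citation data is rarely this tidy.

```python
def citation_closure(cites, source):
    """Everything reachable from `source` by following citations, plus itself."""
    seen = set()
    stack = [source]
    while stack:
        work = stack.pop()
        if work in seen:
            continue
        seen.add(work)
        stack.extend(cites.get(work, ()))
    return seen


def independent(cites, a, b):
    """True if a and b neither cite each other nor share any upstream source."""
    return citation_closure(cites, a).isdisjoint(citation_closure(cites, b))


# Hypothetical citation graph: guide_a and guide_b both trace back to field_study_1.
cites = {
    "guide_a": ["field_study_1"],
    "guide_b": ["blog_post"],
    "blog_post": ["field_study_1"],
    "guide_c": ["field_study_2"],
}

print(independent(cites, "guide_a", "guide_b"))
print(independent(cites, "guide_a", "guide_c"))
```

Two generated books that copied the same web page would fail this test, except that they won't tell you what they copied, which is exactly the problem.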
Damn had no idea about wild ginger. That is a bummer.
> There's no technical reason you shouldn't see an article on hn or wherever, follow the link and read it and pay for it without having to set up and pay for a subscription for the entire publication or jump through hoops. It should be a click at most.
People have been saying this and building startups on this and having those startups crash and burn for decades.
It's not a technical problem. It's a psychology problem.
Paying after you've read an article doesn't provide the immediate post-purchase gratification to make it an impulse purchase. The upside of paying for an article you've already read is more like a considered purchase. But the amount of cognitive effort worth putting into deciding whether or not to pay for the article is often less than the value you got from the article itself. So it's very hard for people to force themselves to decide to commit to these kinds of microtransactions.
It's just a sort of cognitive dead zone where our primate heuristics don't work well for the technically and economically optimal solution. It's sort of like why you can't go into a store and buy a stick of gum.
I'm a bit confused here. I never said the click would be after reading the article. You would need to pay to read.
Edit: Ah, I did say
> see an article on hn or wherever, follow the link and read it and pay for it
That wasn't supposed to be a chronological sequence of events, but I see I accidentally implied that. Apologies for the confusion.
>It should be a click at most.
Welcome to the "new and interesting ways to defraud people over the internet for money" school of thought.
At least with Amazon it's a "one and done shop" of who I spent my money with when I bought something.
Imagine tomorrow with your click to pay for random links on the internet you suddenly have 60,000 1 cent charges. They all appear to go different places and to get a refund you need to challenge each one.
It sounds like the digital version of the CD scam. https://viewing.nyc/nyc-scams-101-dont-get-fooled-by-the-cd-...
I think you're imagining this would be open to random individual bloggers, but that wouldn't solve the quality / clickbait / AI generation problem. Sure, individuals could scam, but they could also produce clickbait, low effort crap.
The context of this discussion is the high quality, paid, edited writing that is currently behind site-wide subscription paywalls at sites like the New York Times, Wall Street Journal, Financial Times, Economist, etc. It would be great to lower the barrier to entry for individual writers as far as possible, and maybe even include some sites that are run more like blogging platforms, but there would always have to be content standards and some degree of editorial control for reasons other than avoidance of scams, and with those things in place avoidance of scams is a non-issue because you're dealing with organisations that are trading on reputation. The New York Times isn't going to be defrauding its readers (and neither is Medium if it comes to that).
> The printing press democratized knowledge
That's true, but it also allowed Protestant "heretics" to propagate an idea that caused a permanent schism with the Catholic Church, which led to centuries of wars that killed who-knows-how-many people, up to recent times with Northern Ireland.
(Or something like that, my history's fuzzy, but I think that's generally right?)
I thought it was a king wanting a divorce who, as he couldn't get it from the Catholic Church, created his own.
Long before that, in 1054, there was the East-West Schism that split the Catholic and Orthodox churches. https://en.wikipedia.org/wiki/East%E2%80%93West_Schism
> LLM output is perfect white noise.
Not even close to white noise. White noise, in the context of the token space, looks like this:
auceverts exceptionthreat."<ablytypedicensYYY DominicGT portaelight\- titular Sebast Yellowstone.currentThreadrition-zoneocalyptic
which is literally the result of "I downloaded the list of tokens and asked ChatGPT to make a python script to concatenate 20 random ones".
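A script along those lines is only a few lines long. Here is a sketch of the idea rather than the actual script: the vocabulary below is a tiny made-up stand-in, where the real run drew from a downloaded token list that isn't reproduced here.

```python
import random

# Hypothetical stand-in vocabulary (made-up fragments, not a real tokenizer's list).
VOCAB = ["auc", "everts", " exception", "threat", "ably", "typed",
         " Dominic", "GT", " porta", "elight", " titular", " Yellowstone"]


def random_token_noise(vocab, n=20, seed=None):
    """Concatenate n tokens drawn uniformly at random, with replacement."""
    rng = random.Random(seed)
    return "".join(rng.choice(vocab) for _ in range(n))


# Uniform sampling ignores all context, which is what makes this "white noise".
print(random_token_noise(VOCAB))
```

An LLM, by contrast, samples each token from a distribution conditioned on everything before it, which is why its output reads as language at all.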
No, the biggest problem with LLMs is that the best of them are simultaneously better than untrained humans and yet also nowhere near as good as trained humans — someone, don't remember who, described them as "mansplaining as a service", which I like, especially as it (sometimes) reminds me to be humble when expressing an opinion outside my domain of expertise, as it knows more than I do about everything I'm not already an expert at.
Specific example: I'm currently trying to use ChatGPT-3.5 to help me understand group theory, because the brilliant.org lessons on that are insufficient; unfortunately, while it knows infinitely more than I do about the subject, it is still so bad it might as well be guessing the multiple choice answers (if I let it, which I don't because that would be missing the point of using a MOOC like brilliant.org in the first place).
> Amazon had to stop publishing BOOKS because it can no longer separate the signal from the noise.
That's because they are trying very hard not to check what they are selling, hoping that their own users and a few ML algorithms can separate the signal from the noise for them. It seems to me that the approach is no longer working, and they should start doing it by themselves.
I really feel like you can't have used any advanced LLMs if you legitimately think the output is "perfect white noise". The results that you can get from an LLM like GPT-4 are incredibly useful and are providing an enormous amount of value to lots of people. It isn't just for generating phony information to spread or having it do your work for you.
I get the most value out of asking for examples of things or asking for basic explanations or intuitions about things. And I get so much value from this that I really think the printing press is the most apt comparison.
The problem is advanced LLMs are controlled by large corporations. Powerful local models exist (in part thanks to Meta's generosity oddly enough) and they're close to GPT-3.5, but GPT-4 is far ahead of them and by the time other models reach that point whatever OpenAI or Anthropic, Meta etc. have developed behind closed doors could be significantly better. In that case open models will be restricted to niche uses and most people will use the latest model from a giant corp.
So it is possible that LLMs will centralize the production and dissemination of knowledge, which is the opposite of what people think the printing press did. I hope I'm wrong and open models can challenge/overtake state of the art models developed by tech giants, that would be amazing.
What you say is not in conflict with AI-generated content being white noise. Even if you find some piece of AI-generated content useful, it is still white noise if it is merely combining pieces of information found in its dataset and the result is posted online or published elsewhere. There is no signal being added in that process, and it pollutes the space of content. Humans are also prone to doing this, but with the help of AI, it becomes a much larger issue.
"Signal" would mean new data, which is by definition not possible via LLMs trained on publicly available content, since that means the data is already out there, or new and meaningful ideas or innovations beyond just combining existing material. I have not seen LLMs accomplish the latter. I consider it at least possible that they are capable of such a feat, but even then the relevant question would be how often they produce such things compared to just rearranging existing content. Is the proportion high enough that unleashing floods of AI-generated content everywhere would not lower the signal-to-noise ratio from the pre-AI situation?
> the worst part is that it’s impossible to distinguish it from real human output
Doesn't that make human content look bad in the first place?
If we can't distinguish a Python book written by a human engineer or by ChatGPT, how can we demonstrate objectively that the machine-generated one is so much worse?
That argument might work for content which serves a purely informational purpose, such as books teaching the basics of programming languages, for instance, but it doesn't work for art (e.g. works of fiction) because most of the potential for a non-superficial reading of a work relies on being able to trust that there is an author that has made a conscious effort to convey something through that work, and that that something can be a non-obvious perspective on the world that differs from that of the reader. AI-generated content does not have any such intent behind it, and thus you are effectively limited to a superficial reading, or if you were to insist on assigning such intent to AI, then at most you would have one "author" per AI model, which additionally has no interesting perspectives to offer, simply those perspectives deemed acceptable in the culture of whatever group of people developed the model, no perspective that could truly surprise or offend the reader with something they had not yet considered and force them to re-evaluate their world view, just a bland average of their dataset with some fine tuning for PR etc. reasons.
The problem is not that no one can distinguish it. It's that the intended audience (beginners in Python in your example) can't distinguish it and are not able to easily find and learn from trusted sources.
We can distinguish it. That's what publishers and editors do. It's also what book buyers for book chains used to do. Reviewers, writing for reputable publications, with their own editors and publishers, as well.
Humans, examining things, and putting a reputation that matters on the line to vouch for it.
The fact that Amazon doesn't want to have smart, contextually aware humans look at and evaluate everything people propose to offer up for sale on their storefront doesn't mean it can't be done. Same as how Google doesn't want to look at every piece of content uploaded to YouTube to figure out if it's suitable for kids, or includes harmful information. That's expensive, so they choose not to do it.
The LLM does democratize knowledge, but you have to be the user of the LLM, not the target of the user of the LLM.
The LLM is the most powerful knowledge tool ever to exist. It is both a librarian in your pocket and an expert in everything: it has read everything, and can answer your specific questions on any conceivable topic.
Yes it has no concept of human value and the current generation hallucinates and/or is often wrong, but the responsibility for the output should be the user's, not the LLM's.
Do not let these tools be owned, crushed and controlled by the same people who are driving us towards WW3 and cooking the planet for cash. This is the most powerful knowledge tool ever. Democratize it.
Asking a statistics engine for knowledge is so unfathomable to me that it makes me physically uncomfortable. Your hyperbolic and relentless praise for a stochastic parrot or a "sentence written like a choose your own adventure by an RNG" seems unbelievably misplaced.
LLMs (Current-generation and UI/UX ones at least) will tell you all sorts of incorrect "facts" just because "these words go next to each other lots" with a great amount of gusto and implied authority.
> but the responsibility for the output is the user's, not the LLM's.
The current iteration of the internet (more specifically social media) has used the same rationale for its existence, but at some level society has proven itself too irresponsible and/or lazy to think for itself rather than be fed by the machine. What makes you think LLMs are going to do anything but make the situation worse? If anything, they’re going to reinforce whatever biases were baked into the training material, the use of which is now legally dubious.
> and can answer your specific questions on any conceivable topic
Yeah, I mean, so can I, as long as you don't care whether the answers you receive are accurate or not. The LLM is just better at pretending it knows quantum mechanics than I am.
For a librarian, they’re confidently asserting factual statements suspiciously often, and refer me to primary literature shockingly rarely.
If you asked the Church back then, they would tell you that the printing press was the death of truth, because to them only the word of god was truth, and only the church could produce it.
It's all just a matter of perspective.
Yes, right now it looks like white noise, just like back then it looked like white noise which could drown out the religious texts. But we managed to get past it then and I'm sure we'll manage now.
This is an astoundingly bad take. Surely you aren't trying to suggest that original, factual, human-authored content has no more inherent value than randomly generated nonsense?
I'd argue that giving a group with unique thoughts and ideas a voice is different than creating a noise machine.
The printing press made books cheap relative to hand copied books, but they were still expensive for most people.
Before the printing press two books cost around the same as a two-story cottage.
Afterwards a couple of books would be about a month of wages for a skilled worker.
That greatly limits one's ability to drown out anything with books.
> The printing press democratized knowledge.
Not for centuries. Due to the expense of the technology and the requirement in some locations for a royal patent to print books, the printing press just opened up knowledge a bit more from the Church and aristocracy to the bourgeoisie, but it did little for the masses until as late as the 1800s.
A big part of this is that literacy didn’t come to the masses until the 1800s. But in England and the Netherlands you had (somewhat) free press by the late 1600s and early 1700s.
I'm reminded of the Library of Babel
I was told publishers don't promote a good book anymore these days. They ask how many Instagram followers you have.
Maybe the self-publishing and BoD will decline in the long term due to ML white noise and publishers are a sign of quality again.
You could argue that speech is literally noise that drowns out the signals of your environment. If you just babbled, it would be useless, but instead you use it intelligently to communicate ideas. LLM output is a new palette with which humans can compose new signals. We just have to use it intelligently.
Prompt engineering is an example of this. A clever prompt by a domain expert can prime an LLM interaction to yield better information to the recipient in a way that the recipient themselves could not have produced on their own.
People comparing the AI bullshit spigot to the printing press are clowns.
It used to be that a scribe would painstakingly copy a manuscript, through the process absorbing the text at a deep level. This same scribe could then apply this knowledge to his own writing, or just understand and curate existing work. The manual labor required to copy at scale employed many scribes, who formed the next generation of thinkers.
With the press, a greasy workman can churn out hundreds of copies an hour, for whichever charlatan or heretic palms him enough coin. The people are flooded with falsehoods by men whose only interest in writing is how many words they can fit on a page, and where to buy the cheapest ink.
The worst part is that it is impossible to distinguish the work of a real thinker from that of a cheap sophist, since they are all printed on the same rough paper, and serve equally well as tomorrow's kindling.
Where are the good AI-generated books that serve as the positive side of this development?
>> So maybe the Catholic Church had a point
Is that really the take-away? If the Catholic Church had not been so belligerent, those wars would not have been needed. Now that we are past that time, we should surely be thanking those combatants who helped disseminate knowledge in spite of the Church whose interest was in hoarding it.
I think that's a pretty bad reading of history, frankly. The Church didn't hoard knowledge; in fact it was arguably the primary preserver and disseminator of knowledge, through the monastic tradition in Medieval Europe. Many thousands of monasteries were destroyed during the religious wars, which is a common theme as far as sectarian wars go. They are first and foremost destroyers of knowledge.
More importantly I certainly wouldn't want to live through that period for any reason, and much less repeat it. If an ordinary printing press caused that much chaos I'm not sure I want to figure out what one on steroids is going to do.
That was the solution that came to mind to me too, but it doesn't work either.
Even if you're never online and only talk to people in person... over time those people will be increasingly informed by LLM-generated pseudo-knowledge. We aren't just training the AIs. They're training us back.
If you want to live in a society where the people you interact with have brains mostly free of AI-generated pollution, then I'm sorry but that world isn't going to be around much longer. We are entering the London fog era of the Information Age.
I don't trust my friends for medical advice. Some of them trust me for plant advice, and they really probably shouldn't. I am very stove-piped.
We have two and a half generations of people right now most of whom think "I did the research" means "I did half as much reading as the average C student does for a term paper, and all of that reading was in Google."
And Alphabet fiddles while Google burns. This is going to end in chaos.
> "I did the research" means "I did half as much reading as the average C student does for a term paper
What's the alternative? No one who says that is saying they did original research, they're saying they searched around and got what they believe to be at least a consensus among the body of experts they trust.
Like I agree the problem sucks but I have no idea what a solution looks like. For fields someone is totally unfamiliar with, they simultaneously don't have enough knowledge to evaluate the truth of a claim nor the knowledge to evaluate whether someone is qualified and trustworthy enough to believe. It's turtles all the way down -- especially because for topics of any interest you can find as many experts as you care to, of whatever qualification you demand, making all sorts of contradictory claims.
> This is only a problem for someone terminally online.
Is it? Even those whose social life is entirely IRL still have to increasingly interact with various businesses, banks, healthcare providers, the government, and often more distant colleagues through online services. Do I want these to go through LLM chatbots? No. Can I ensure that I'm speaking to an actual human if the communication is text based? Not really.
This is a problem for anyone who is not actively vigilant about the information they consume. A family member (who I would not describe as "terminally online") came to me today in a panic talking about how some major event had just occurred and how social order was beginning to collapse. I quickly glanced at the headlines on a few major news outlets and realized that they just saw some incendiary content designed to elicit that reaction. I calmed them down and walked them through a process they could use to evaluate information like that in the future, and they were a little embarrassed.
The concern isn't necessarily for you. It's for the large swaths of people who are less equipped to filter through noise like this.
Using meatspace doesn't solve the problem, using meatspace exclusively solves the problem. And it's not a great one given, you know, how much of the world "happens" online now.
It's what ad men do. Point out there's a problem, offer you the solution.
>shows only 3 books now
Those appear to be by different authors with similar names: https://www.amazon.com/s?k=%22tom+lesley%22
Amazon has no reason to give a shit about piracy on KDP: they make money either way. But having a load of AI generated garbage on your platform makes it far less valuable. You want your stolen books to actually be good. :P
Possibly it's the author removing them at the first one star rating to keep their author score high?
Allowing the use of tools to modify the contents erases any clear distinction between the categories.
That’s really weird. People are making all kinds of books and stories. And stories are relevant to their time. The Matrix wouldn’t be written in 1900, A Tale of Two Cities wouldn’t be written in 1200, …
It is true though that if you have a culturally diverse set of friends and are open to their experiences and opinions, a lot of “the classics” start to smell bad. Imagine being black and reading Grapes of Wrath. You might think the situation of the main characters as humorous or infantile, considering how relatively fortunate they are.
> What's the name of the law where the longer something has already been around, the longer it will likely stay around in the future?
The Lindy effect.
> It’s garbage content and immediately recognizable as being AI generated.
It's also recognizable by its sheer volume. An "author" who submits several new books every day is clearly not doing their own writing. The AI publishing scam relies on volume -- they can't possibly win on quality, but they're hoping to make up for that by putting so many garbage books on the market that buyers can't find anything else.
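To make the volume signal concrete, here is a rough sketch of how one might flag it. Everything in it is an assumption for illustration: the author names and dates are invented, the `flag_high_volume_authors` helper is hypothetical, and the one-book-per-day threshold is an arbitrary starting point rather than anything Amazon is known to use.

```python
from collections import Counter
from datetime import date


def flag_high_volume_authors(listings, max_per_day=1):
    """Return authors who published more than max_per_day titles on any single day.

    listings is an iterable of (author, publication_date) pairs.
    """
    per_day = Counter((author, published) for author, published in listings)
    return sorted({author for (author, _), count in per_day.items() if count > max_per_day})


# Invented sample data: author_a lists three titles on the same day.
listings = [
    ("author_a", date(2023, 6, 1)),
    ("author_a", date(2023, 6, 1)),
    ("author_a", date(2023, 6, 1)),
    ("author_b", date(2023, 6, 1)),
    ("author_b", date(2023, 7, 15)),
]

print(flag_high_volume_authors(listings))
```

A flag like this would only be a prompt for a human to look closer, since a publisher uploading a back catalogue in one batch would trip it too.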
I'm not sure. Ghostwriting exists, and a person (or organization) with enough money could easily pay enough ghostwriters to output at a more than human pace.
> It doesn’t matter. It’s garbage content and immediately recognizable as being AI generated.
Is it? How do you immediately recognize a book as AI generated before buying it, if the author isn't doing something silly like releasing several books per day/month? And even after you buy a book, how can you distinguish between the book just being terrible and the book being written with extensive use of AI? I don't believe AI can write good books, but I would still like to distinguish those two cases, since the former is just a terrible book, which is perfectly fine, while the latter I would like to avoid. I don't want to waste my limited time reading AI content.
>It is absolutely possible to write a good article or even a good book with AI, but at least for now it’s just as hard, if not harder, than doing it without AI.
How hard is it though, to create a shitty book with AI, that Amazon can't detect was written with AI?
Yea, but the Turing Test is actively being assaulted. Soon we won't know the difference between an uninspired book written by an AI and an uninspired book written by a human.
Then it's good for fiction. Lots of demand for fiction.
> If it's a publisher or series you can find printed on the shelf, books are legit.
Not even that is a guarantee, there have been cases of rip-offs making it through a bunch of book-on-demand services.
All "marketplaces" allowing third parties unlimited, unmonitored access to product listings suffer from that issue.
Also, just doing your research on any platform other than Amazon helps.
Many beginners do start at python.org. However, if you don't know anything about programming, and you don't know someone who can answer all the little questions that come up, it's really hard to learn from documentation alone. Even the official Python tutorial is fairly inaccessible to many people who are trying to learn a language for the first time.
Almost every Python author I've spoken with recognizes that no one resource works best for everyone. We each write to offer our particular take on a subject, and hope to find an audience that our perspective resonates with. I've never steered people away from documentation; in fact one of my goals is to steer people to the sections of documentation that they're ready to make sense of. One of my end goals is that people no longer need me as a teacher. That was my goal as a classroom teacher, and it's one of my goals as an author.
The idea that there are no mistakes in official documentation is pretty unrealistic. Technical documentation has certainly improved over the last decade or so, but it will never be perfect. Most of us recognize that some areas of programming are better handled by third party libraries. In a similar way, there will always be room for learning resources that are maintained outside of official documentation sources.
I didn't claim the official docs have no mistakes.
Since there's only one set of official docs, beginners can't go wrong about which docs to use.
As opposed to books, which have tons of bad choices available (hence the current discussion).
Are you suggesting people just go read the documentation like an encyclopedia? I don’t know a single person who got their start programming by doing that - just about everyone wants some sort of guide to help lead them in good directions.
I did. On Windows, Python had (still has?) good offline help. And it included a nice getting started tutorial. The only book I had was “The C Programming Language”. But they ignited my interest enough to start researching, and I landed on the "Site du Zero" (now OpenClassrooms) platform. The web was sparser, but better, in those days (2010).
That's more or less exactly how I learned to program. From books, with a few friends. Only after I got to a certain level did I start frequenting more places where we met other people working with computers, some of whom were professional programmers.
I still have some of them. They've aged surprisingly well.
Do the official docs even have tutorials? I'd send beginners to Khan Academy instead.
Yeah, https://docs.python.org/3/tutorial/index.html. But I would say it is good for those who already know another programming language and not for complete beginners.
I guess book authors don't like my perspective...
dev.to is blocked on HN for this reason (try submitting a dev.to link; it won't appear under New.)
There's an old thread where dang explains that it's blacklisted (along with many many other sites) due to the consistently poor article quality.
> try submitting a dev.to link; it won't appear under New.
I think you'll see it if you're logged in and have showdead turned on.
Conversely if you post something sophisticated there it will likely bomb. A bunch of emojis and explaining JS closures for the hundredth time. Does well!
It should be a term of service that you’re not allowed to interfere with other customers’ listings.
If I found out one of the tenants on my multi tenant system was trying to mess with another’s, I would be livid.
The great thing about filtering is that you don't have to hear the screams.
These accidents play out in slow motion until someone corners you at a family reunion and asks why their friends can't create accounts and when you ask them how long they say "months".
You'd think... but in a growing b2b company, the CRM is where sales get prevented under a certain threshold. heh.
LLMs only ever accidentally generate useful content. They fundamentally can't know whether the things they're outputting are true, they just tend to be, because the training data also tends to be.
Yay, politics in my business soup. That'll generate a quality outcome for my customers!
The politics are ephemeral, the results matter.
Human decency transcends your asinine, barely disguised political talking points.
> Some people value their time, energy, and money more.
More highly "curated" media providers have almost always been the least-efficient, most-costly, and least-satisfying for me.
Buying physical books at a bookstore has typically been a costly waste of time, with the selection being poor, and it requiring time, money, vehicle wear, etc., to actually get to the store.
Public libraries are often worse in terms of selection, and thanks to the ones where I am being funded via taxation, I'm stuck paying for them even if I don't use them.
Online and ebook sellers are somewhat better, although they can still be costly, and the delivery of physical books can take some time.
I've had much better success finding fiction and non-fiction content by doing some searches and seeing which random websites, forums, and other less-"curated" online resources I happen to run across.
It has been the same for video media, too.
OTA TV is relatively cheap, but the selection is so limited as to make it useless.
Cable and satellite TV have upfront costs, and then ongoing costs, plus a relatively limited selection of content available at any given time.
Paid online streaming providers have a cost, obviously, and I've found the selection to be quite poor.
Movie theatres are extremely costly for what you get, have a tremendously limited selection, and also involve significant travel and time costs.
Tape and disc rentals no longer exist today where I am, aside from public libraries. They had per-rental costs, late fees, travel costs, and very limited selection. As stated before, I pay for the library even if I don't use it.
YouTube, on the other hand, gives me a much better experience than the more "curated" providers. With just a minute or two of searching, I can find hours and hours worth of content to watch each evening, I can view this content with almost no delay, the cost is minimal, and the content is far more entertaining and informative than the more "curated" options.
Avoiding "curated" media providers has saved me a lot of time, energy, and money, in addition to providing me with much more enjoyable and useful content.
Isn’t it. It went from somewhere around 10th on my planting list to, “when I get really bored”.
> You would need to pay to read.
Now you're talking about a paywall.
Many news organizations are going in that direction but with subscriptions.
Doing it where you pay for each article is also psychologically hard. Since you are actually spending some money, the choice requires some mental effort. But since you haven't read the article, it's a choice whose value is very hard to estimate.
1. The value is hard to predict.
2. The cost is low.
3. The perceived maximum value is also likely low.
Are also sort of a worst case for the heuristics our brains have evolved to apply. It's difficult to get our brains to even put the mental effort into making the choice whether or not to buy the article.
A paywall is around an entire site. You have to pay for the whole thing or not at all, and you have to do it separately for every site. Subscriptions to the NYT, WSJ, Economist and the FT would cost a fortune, and I would read a minute fraction of the content. As a result I don't have subscriptions for any of them, and none of them get a penny from me. With a common system, paid with a single click per article, I and many others would happily rack up significant tallies reading individual articles across all these publications. It's a win/win.
I don't buy the argument that people wouldn't be bothered. A popup with a balance, cost for the article and yes/no button would be far less mental effort than I already spend finding how to refuse consent on tracking popups, and about the same effort as required for those that simply click the button to grant consent. If that were too much to expect people to do nobody would be reading any articles in the EU at present.
Henry VIII created the Church of England in 1534 for the purposes of granting himself an annulment. Most histories count Martin Luther's 95 Theses as the beginning of the Reformation in 1517 (a crisp date for a less-than-crisp event; Luther did not originally see himself as protesting the Roman Catholic Church). The Protestant Reformation was a heterogeneous movement from the beginning.
Protestantism started in Germany with Martin Luther nailing his theses to a church door. Henry's reproductive problems came later and were only sort of related.
Not really, no. It was Luther who kick-started Protestantism. Henry VIII attempted to supplant the Pope, and kind of slid into Protestantism by accident.
That was the case just for the Anglican church, which is only one "part" of the Reformation.
Precisely. I spent weeks learning about cybersecurity when GPT-4 first came out, as I could finally ask as many stupid questions as I liked, get detailed examples and use-cases for different attacks and defenses, and generally actually learn how the internet around me works.
Now it refuses, because OpenAI's morals apparently don't include spreading openly available knowledge about how to defend yourself.
Scary. I have also been using it to generate useful political critiques (given a particular theoretical tradition, some style notes, and specific articles to critique, it's actually excitingly good). What if OpenAI decides that's a threat? What reason do we have to think that a powerful institution would not take this course of action, in the cold light of history?
How do you know what you learnt wasn't completely made-up gibberish?
Aren't there already bad Python books written by humans?
I bet ChatGPT can come up with above-average content to teach Python.
We should teach beginners how to prompt engineer in the context of tech learning. I bet it's going to yield better results than gate-keeping book publishing.
There are, but it used to take actual time and effort to produce a book (good or bad), meaning that the small pool of experts in the world could help distinguish good from bad.
Now that it’s possible to produce mediocrity at scale, that process breaks down. How is a beginner supposed to know whether the tutorial they’re reading is a legitimate tutorial that uses best practices, or an AI-generated tutorial that mashes together various bits of advice from whatever’s on the internet?
Another great contribution would be fine-tuning open source LLMs on less popular tech. I've seen ChatGPT struggling with htmx, for example (I presume the training dataset was small?), whereas it performs really well teaching React (huge training set, I presume)
If beginners in Python programming are not capable of visiting python.org, assuming they are genuinely interested in learning Python, it would be very questionable how good their knowledge on the subject can really be.
I've seen many developers using technologies without reading the official documentation. It's insane. They make mistakes and always blame the tech. It's ludicrous...
My mind is blown that someone gets so little value out of an LLM. I get over software engineering stumbling blocks much faster by interrogating an LLM's knowledge about the subject. How do you explain that added value? Are you skeptical that I am actually moving and producing things faster?
My mind is also blown by how much people seemingly get out of them.
Maybe they’re just orders of magnitude more useful at the beginning of a career, when it’s more important to digest and distill readily-available information than to come up with original solutions to edge cases or solve gnarly puzzles?
Maybe I also simply don’t write enough code anymore :)
This happened to me looking up an obscure C library. It just confidently made up a function that didn't actually exist in the library. It got me unstuck but you can really fuck yourself if you trust it blindly.
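For what it's worth, this particular failure is cheap to check mechanically. A minimal sketch of the idea in Python rather than C, assuming you have a list of names the LLM told you to call: `missing_names` is a hypothetical helper, and `fast_sqrt` is a deliberately made-up name standing in for a hallucinated function. For a C library, grepping the headers or just letting the compiler and linker complain does the same job.

```python
import importlib


def missing_names(module_name, names):
    """Return the names that the given module does not actually provide."""
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


# "fast_sqrt" is invented here to play the role of a hallucinated function.
suggested = ["sqrt", "floor", "fast_sqrt"]
print(missing_names("math", suggested))
```

This only tells you a name exists, not that it does what the LLM claimed, so reading the actual documentation for anything that survives the check is still on you.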
I agree with you but at what point does it change? Aren’t we all just stochastic parrots? How do we ourselves choose the next word in a sentence?
In my view, one big learning from LLMs is that yes, more often than not we are just stochastic parrots. And more often than not that's enough!
But sometimes we're more than that: Some types of deep understanding aren't verbal or language-based, and I suspect that these are the ones that LLMs will have the hardest time getting good at. That's not to say that no AI will get there at all, but I think it'll need something fundamentally different from LLMs.
For what it's worth, I've personally changed my mind here: I used to think that the level of language proficiency that LLMs demonstrate easily would only be possible using an AGI. Apparently that's not the case.
If you wish to make an apple pie, first you must make the universe from scratch. (Carl Sagan)
We can generate thoughts that are spatially coherent, time aware, validated for correctness and a whole bunch of other qualities that LLMs cannot do.
Why would LLMs be the model for human thought, when it does not come close to the thoughts humans can do every minute of every day?
Aren't we all just stochastic parrots, is the kind of question that requires answering an awful lot about the universe before you get to an answer.
We use languages to express ideas. Sentences are always subordinate to the ideas. It's very obvious when you try to communicate in another language you're not fluent in. You have the thought, but you can't find the words. The same thing happens when writing code, taking ideas from the business domain and translating it into code.
God dammit please stop comparing these things to brains. Stop it. It's not even close.
Even if a human expert responds about something in their domain of expertise, you have to think critically about the answer. Something that fails 1% of the time is often more dangerous than something that fails 10% of the time.
The best way to use an LLM for learning is to ask a question, assume it's getting things wrong, and use that to probe your knowledge which you can iteratively use to prove the LLM's knowledge. Human experts don't put up with that and are a much more limited resource.
In other words they behave like a human?
That's Wittgenstein's argument.
No not at all, I'm not sure why you would even think that.
As I read it, your parent comment suggests that the distinction in quality and utility between human-authored and AI-generated content is merely "a matter of perspective", i.e. that there is no real distinction, and that they're both equally valuable.
If you actually meant something else, you should probably clarify.
The discussion here is that we're not able to distinguish them.
If we cannot distinguish, I'd argue they have similar value.
They must have. Otherwise, how can we demonstrate objectively the higher value in the human output?
They can be distinguished. They are just becoming more difficult to tell apart. It's only slightly more difficult, but the amount of garbage is also overwhelming. AI can spit out entire books in moments that would take an individual months or years to write.
There are lots of fake recipe books on Amazon, for instance. But how can you really be sure without trying the recipes? It might look like a recipe at first glance, but if it's telling you to use the right ingredients in a subtly wrong way, it's hard to tell at first glance that you won't actually end up with edible food. Some examples are easy to point at, like the case of the recipe book that lists Zelda food items as ingredients, but they aren't always that obvious.
I saw someone giving programming advice on Discord a few weeks ago. Advice that was blatantly copy/pasted from ChatGPT in response to a very specific technical question. It looked like an answer at first glance, but the file type of the config file ChatGPT provided wasn't correct, and on top of that it was just making up config options in an attempt to solve the problem. I told the user this, they deleted their response and admitted it was from ChatGPT. However, the user asking the question didn't know the intricacies of "what config options are available" and "what file types are valid configuration files". This could have wasted so much of their time, dealing with further errors about invalid config files, or options that did not exist.
A piece of human-written content and a piece of AI-written content may have similar value if we cannot distinguish between them. But if you can add the information that the human-written content was written by a human to the comparison, the human-written content becomes significantly more valuable, because it allows for a much deeper reading of the text, since the reader can trust that there has been an actual intent to convey some specific set of ideas through the text. This allows the reader to take a leap of faith and put in the work required to examine the author's point of view, knowing that it is based on the desires and hopes of an actual living person with a lifetime of experience behind them instead of being essentially random noise in the distribution.
I'm not a native English speaker, but ChatGPT's answers in each interaction I've had with it sound bland. And I dislike its bite-sized format. I'm reading "Amusing Ourselves to Death" by Neil Postman, and while you may agree or disagree with his take, he developed it in a very coherent way, exploring several aspects. ChatGPT's output falls into the same uncanny valley as the robotic voice from text-to-speech software: understandable, but no human writes that way.
ChatGPT as an autocompletion tool is fine, IMO. As well as generating alternative sentences. But anything longer than a paragraph falls back to the uncanny valley.
If you ask an LLM something you know, you can distinguish noise from good output. If you ask an LLM something you don’t know, then how do you know if the output is correct?
There are cases where checking is easier than producing the result, e.g. when you ask for a reference.
I can't distinguish between pills that contain the medicine that I was prescribed and those that contain something else entirely. Therefore taking either should be just as good.
If they were of similar value would there be a problem with the deluge?
I think the jury is still out on whether an LLM produces ideas any more or less unique than most humans. :)
Even at their most prolific, a ghostwritten author still probably wouldn't publish more than one or two books a month. Beyond that point, you're just competing with yourself. (For instance, young adult series like Goosebumps, The Baby-Sitters Club, or Animorphs typically published a book every month or two.)
Publishing multiple books per day is out of the question. That's beyond even what's reasonable for an editor to skim through and rubber-stamp.
> That's more or less exactly how I learned to program. From books
What kind of books? The person you're replying to is arguing in favor of books, but saying that the documentation in particular is not a good one to start with.
I think you are assuming disagreement where there is none.
You may consider it asinine and political, but the results are measurable. Amazon delivers AI-generated whatever and a better filter system doesn't do that.
Politick and insult to your heart's content.
The results are quantifiable and qualitatively measurable. You know, like, science.
What do you want me to say?
If Amazon can fix the flood of garbage, then good, I don't care. I'll shut up.
All the politics is coming from your side of the table, and all the discussion about measurable results is coming from my side of the table. Whatever happens, let's make that distinction razor sharp and clear.
> Subscriptions to the NYT, WSJ, Economist and the FT would cost a fortune, and I would read a minute fraction of the content. As a result I don't have subscriptions for any of them, and none of them get a penny from me.
I would question whether that is actually the economically rational choice. Even if you read only a fraction of the content, the value you receive from the fraction you do read may still justify the cost. You probably only watch a fraction of all videos on Netflix yet still don't mind paying for it.
Also, there is a positive value in supporting a journalistic institution because you also derive benefit from other people reading their articles because it means you get to live in a better-informed world.
> With a common system, paid with a single click per article, I and many others would happily rack up significant tallies reading individual articles across all these publications. It's a win/win.
Now you're talking micropayments. People have also been trying those startups since the 90s. They have basically all failed. There is Twitch Bits, I guess, but I don't know how often that's used or whether I'd call the kinds of content that Twitch's structure incentivizes to be a good thing.
Not all problems can be solved with technology. Or, at least, with technology that only works if you assume that humans are rationally economical spherical actors in a vacuum.
The same way you know that the things you learn from a person aren't made-up gibberish: you see how well it explains a scenario, how well it lines up with your knowledge and experience, and you sample parts to verify.
How do you know they didn't learn from the garbage generator too?
Because it works.
Personally I don't subscribe to the "best practices" expression. It implies an absolute best choice, which, in my experience, is rarely sensible in tech.
There are almost always trade-offs and choosing one option usually involves non-tech aspects as well.
Online tutorials freely available very rarely follow, let's say, "good practices".
They usually omit the most instructive parts, either because they're wrapped in a contrived example or simplify for accessibility purposes.
I don't think AI-generated tutorials will be particularly worse at this to be honest...
I'm very far from the beginning of my career, but maybe I see a point in your comment, because I frequently try technologies that I am not an expert in.
Just yesterday, I asked if TypeScript has the concept of a "late" type, similar to Dart, because I didn't want to annotate a type with "| null" when I knew it would be bound before it was used. Searching for info would have taken me much longer than asking the LLM, and the LLM was able to frame the answer from a Dart perspective.
I would say that that information is neither "important to digest" nor "readily available."
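For reference, the closest TypeScript feature I'm aware of is the definite assignment assertion (a `!` after the property name). This is only a minimal sketch and not necessarily what the LLM answered; `Session`, `connect`, and `describe` are hypothetical names made up for illustration.

```typescript
// Hypothetical example: Session, connect and describe are made-up names.
class Session {
  // The `!` is a definite assignment assertion. It tells the compiler the
  // field will be assigned before use, so no `| null` and no initializer
  // are needed. It is compile-time only and adds no runtime check.
  private token!: string;

  connect(token: string): void {
    this.token = token;
  }

  describe(): string {
    return `token has ${this.token.length} characters`;
  }
}

const session = new Session();
session.connect("abc123");
console.log(session.describe());
```

One difference worth keeping in mind: Dart's `late` throws if the variable is read before it is assigned, whereas here reading `token` before `connect` is called just yields `undefined` at runtime.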
Ah yes, gathering information in a particular unfamiliar area probably describes it better.
For me, it's been able to give very good answers when they were within the first few Google results when searched for using the proper terms (but the value is in giving you these terms in the first place!).
For questions from my field, it's been wildly hallucinating and producing half-truths, outdated information, or complete nonsense. Which is also fair, because the documentation where the answers could be found is often proprietary, and even then it's either outdated or outright wrong half of the time :)
I am not the person to whom you replied. I understood their comment to be about paradigms shifting through social awareness of the limits and opportunities of new technology.
It can be both true that right now predominantly low quality content emanates from LLMs and at some future time the highest quality material will come from those sources. Or perhaps even right now (the future is already here, just unevenly distributed).
If that was their reasoning, I tend to agree. The equivalent of the Catholic Church in this metaphor is the presumption of human-generated content's inherent superiority.
LLMs are inherently approximations of collective knowledge. They will never be better than their training sets. It's a statistical impossibility.
Suggesting clarification to suit your imaginary inferences seems puzzling. The parent post pointed out that perspectives on authorship have a historical precedent; I didn’t see the value judgement your reading suggested.
> Some examples are easy to point at, like the case of the recipe book that lists Zelda food items as ingredients
As an aside, the case you're thinking of was a novel, not a recipe book. Still embarrassing, but at least it was just a bit of set dressing, not instructions to the reader.
> I saw someone giving programming advice on Discord a few weeks ago. Advice that was blatantly copy/pasted from ChatGPT in response to a very specific technical question.
This, on the other hand, is a very real and a very serious problem. I've also seen users try to get ChatGPT to teach them a new programming language or environment (e.g. learning to use a game development framework) and end up with some seriously incorrect ideas. Several patterns of failure I've seen are:
1) As you describe, language models will frequently hallucinate features. In some cases, they'll even fabricate excuses for why those features fail to work, or will apologize when called out on their error, then make up a different nonexistent feature.
2) Language models often confuse syntax or features from different programming languages, libraries, or paradigms. One example I've heard of recently is language models trying to use features from the C++ standard library or Boost when writing code targeted at Unreal Engine; this doesn't work, as UE has its own standard library.
3) The language model's body of "knowledge" tends to fall off outside of functionality commonly covered in tutorials. Writing a "hello world" program is no problem; proposing a design for (or, worse, an addition to) a large application is hopeless.
> The language model's body of "knowledge" tends to fall off outside of functionality commonly covered in tutorials. Writing a "hello world" program is no problem; proposing a design for (or, worse, an addition to) a large application is hopeless.
Hard disagree. I've used GPT-4 to write full optimizers from papers that were published long after the cutoff date and that use concepts that simply didn't exist in the training corpus. Trivial modifications were made afterwards to help with memory usage and whatnot, but more often than not, if I provide it the appropriate text from a paper, it'll spit something out that more or less works. I have enough knowledge in the field to verify the correctness.
Most recently I used GPT-4 to implement the paper Bayesian Flow Networks, a completely new concept that, as I recall, people in the HN comment section said was "way too complicated for people who don't intimately know the field" to make any use of.
I don't mind it when people don't find use with LLMs for their particular problems, but I simply don't run into the vast majority of uselessness that people find, and it really makes me wonder how people are prompting to manage to find such difficulty with them.
They can indeed distinguish them, I agree. So why the fuss?
I think the concern is that bad authors would game the reviews and lure audiences into bad books.
But aren't they already able to do so? Is it sustainable long term? If you spit out programming books with code that doesn't even run, people will post bad reviews, ask for refunds. These authors will burn their names.
It's not sustainable.
It doesn't need to be sustainable as one author or one book. These aren't real authors. It's people using AI to make a quick buck. By the time the fraud is found out, they've already made a profit.
They make up an author's name. Publish a bunch of books on a subject. Publish a bunch of fake reviews. Dominate the search results for a specific popular search. They get people to buy their book.
It's not even book-specific; it's been happening with actual products all over Amazon for years. People make up a company, sell cheap garbage, and make a profit. But with books, they can now make the cheap garbage look slightly convincing. And the cheap garbage is so cheap to produce in mass amounts that nobody can really sort through and easily figure out "which of these 10k books published today are real and which are made up by AI".
It takes time and money to produce cheap products at a factory. But once these scammers have the AI generation set up, they can just publish books on loop until someone ends up buying one. They might get found out eventually, and then they will have to pretend to be a different author, and they just repeat the process.
What’s the fuss about spam? You can distinguish it from useful mail? What’s the fuss about traffic jams? You’ll get there eventually.
LLMs enable a DDoS-style attack by raising the effort needed to check books for gibberish.
It’s not like this stream of low-quality content did not exist before, but the topic is hot and many grifters are trying LLMs to make a quick buck at the same time.
It’s sustainable if you can automate the creation of Amazon seller accounts. Based on the number of fraudulent Chinese seller accounts, I’d say it’s very likely automated or otherwise near zero cost.
I totally agree. So why are people so worried about books being written by ChatGPT?
These pseudo-authors will get bad reviews, will lose money in refunds, burn their names.
It's not sustainable. Some will try, for sure, but they won't last long.
There are too many names and it's too cheap to do this.
The equilibrium shifts to making it much harder to find good books, and that was already hard enough.
Book buyers should guide themselves primarily by who the author is, I think.
Choose a book from someone who has a hard-earned reputation to protect.
There is a bootstrapping process of learning which authors in a field have a good reputation before you know anything about the field. That is being disrupted by LLMs as well, though.
Really. Are you comparing a complex chemical analysis required to attest the contents of a pill to reading text?
It depends, is the text of a technical nature? How exactly is one to know they're being deceived if, to take one of the examples that has been linked in this discussion, they receive a mushroom foraging guide but the information is actually AI-generated?
Can't the deluge be delusional or an overreaction at best?
I wasn't assuming it, but I felt like directly asking wouldn't make my comment any clearer.
The problem is "That's more or less exactly how I learned to program." sounds like it's a response to the question in the comment you replied to. Which would mean there's either disagreement or you didn't understand the question. And the way you mentioned other people sounded like you might be interpreting "guide" as a human guide.
I guess instead you were skipping the question and talking about the second sentence?
In that case I recommend quoting which sentence you're replying to in a situation like this.
You are literally describing the fundamental problem of truth in philosophy and acting as if it's different because a computer is involved at one step in the chain.
You first check who published it. Is the author an expert in the matter with years, perhaps decades in the industry?
Heck, we always did that since before GPT.
Good authors will continue to publish good content because they have a reputation to protect. They might use ChatGPT to increase productivity, but will surely and carefully review it before signing off.
"We" certainly did not "always" do that before.
Really? You buy books without searching anything about who wrote it?
If yes, well, there's the problem then. It's not AI, but the lack of guidance and research skills in support of the process of choosing a book.
Is it about "me" specifically? Anyway, how do I know the biography of the author I find isn't also AI-generated at this rate? Or that the purported author actually wrote the book? Your solution still ultimately depends on there being non-generated information somewhere down the line.
I thought it would be pretty obvious that one should look for biographical data from external, independent sources. If the person has earned a reputation in any industry, they'll probably have articles with respectable publishers, will have presented at conferences, maybe even have patents, etc. Just Google their name. Then Google what's associated with them. If one doesn't find anything solid, discard the book. It takes no more than 5-10 minutes to recognize a solid reputation like that.