Are We Watching The Internet Die?

Edward Zitron 14 min read

Sometime this month, Reddit will go public at a valuation of $6.5bn. Select Redditors were offered the chance to buy stock at the initial listing price, which the company hasn't announced yet but which is expected to be in the range of $31 to $34 per share. Regardless of the actual price, I wouldn't be surprised if Reddit shares quickly fall below the IPO price, based on the fact that Reddit is an absolute dog of a company, losing $90.8 million on $804 million of revenue in 2023 and never having turned a profit. Reddit's S-1 (the initial registration form for taking a company public) laughably claims that advertising on the site is "rapidly evolving" and that it is "still in the early phases of growing this business," with "this business" referring to one that Reddit launched 15 years ago.

The Reddit IPO is one of the biggest swindles in corporate history, one where millions of unpaid contributors made billions of posts so that CEO Steve Huffman could make $193 million in 2023 while laying off 90 people and effectively pushing third-party apps off the platform by charging exorbitant rates for API access. That move prompted several prolonged "strikes" by users, with some of the most popular subreddits going silent for a short period of time. Reddit, in turn, effectively "couped" these subreddits, replacing their longstanding moderators with ones of its own choosing: people who would happily toe the party line and reopen them to the public.

None of the people who spent hours of their lives lovingly contributing to subreddits, or performing the vital-but-thankless work of moderation, will make a profit off of Reddit's public listing, but Sam Altman will make hundreds of millions of dollars from his $50 million investment in 2014. Reddit also announced that it had cut a $60 million deal to allow Google to train its models on Reddit's posts, once again offering users nothing in return for their hard work.

Huffman's letter to investors waxes poetic about Redditors' "deep sense of ownership over the communities they create," and justifies taking the company public by claiming that he wants "this sense of ownership to be reflected in real ownership" as he offers them a chance to buy non-voting stock in a company that they helped enrich. Huffman ends his letter by saying that Reddit is "one of the internet's largest corpuses of authentic and constantly updated human-generated experience" before referring to it as the company's "data advantage and intellectual property," describing Redditors' posts as "data [that] constantly grows and regenerates as users converse."

We're at the end of a vast, multi-faceted con of internet users, where ultra-rich technologists tricked their customers into building their companies for free. And while the trade once seemed fair, it's become apparent that these executives see users not as willing participants in some sort of fair exchange, but as veins of data to be exploitatively mined as many times as possible, given nothing in return other than access to a platform that may or may not work properly.

This is, of course, the crux of Cory Doctorow's Enshittification theory, where Reddit has moved from pleasing users to pleasing its business customers to, now, pleasing shareholders at what will inevitably be the cost of the platform's quality.

Yet what's happening to the web is far more sinister than simple greed: it's the destruction of the user-generated internet, where executives think they've found a way to replace human beings making cool things with generative monstrosities trained on datasets controlled and monetized by trillion-dollar firms.

Their ideal situation isn't one where you visit distinct websites with content created by human beings, but a return to the dark ages of the internet where most traffic ran through a series of heavily-curated portals operated by a few select companies, with results generated based on datasets that are increasingly poisoned by generative content built to fill space rather than be consumed by a customer.

The algorithms are easily tricked, and the tools used to trick them are becoming easier to use and to scale.

And it's slowly killing the internet.

Degenerative AI

After the world's governments began above-ground nuclear weapons tests in the mid-1940s, radioactive particles made their way into the atmosphere, permanently tainting all modern steel production and making it challenging (or impossible) to build certain machines, such as those that measure radioactivity. As a result, we have a limited supply of something called "low-background steel": pre-war metal that often has to be harvested from ships sunk before the first detonation of a nuclear weapon, including some dating back to the Roman Empire.

Generative AI models are trained on massive amounts of text scraped from the internet, meaning that the consumer adoption of generative AI has introduced a comparable radioactivity into its own dataset. As more internet content is created, either partially or entirely, with generative AI, the models will become increasingly inbred, training on content written by earlier models that are, on some level, permanently locked in 2023, before the advent of a tool specifically intended to replace content created by human beings.

This is a phenomenon that Jathan Sadowski calls "Habsburg AI": "a system that is so heavily trained on the outputs of other generative AIs that it becomes an inbred mutant, likely with exaggerated, grotesque features." In reality, a Habsburg AI will be one that is increasingly generic and empty, normalized into a slop of anodyne business-speak as its models are trained on increasingly-identical content.
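A toy sketch (my own illustration, not Sadowski's, with arbitrary numbers) of why this inbreeding narrows output: if each "generation" of a model can only reproduce what appeared in its training corpus, and each new corpus is sampled from the previous generation's output, then the diversity of the corpus can only shrink.

```python
import random

random.seed(42)
corpus = list(range(1000))  # 1,000 distinct "human-written" documents
unique_counts = [len(set(corpus))]

# Each generation, the "model" regurgitates items from its training set:
# the next corpus is drawn with replacement from the previous one, so any
# document lost in one generation can never reappear in a later one.
for _ in range(10):
    corpus = random.choices(corpus, k=1000)
    unique_counts.append(len(set(corpus)))

print(unique_counts)  # diversity only ever shrinks, generation over generation
```

Diversity never recovers because a resample's unique values are always a subset of its source's. Real model collapse is messier than this, but the one-way direction of travel is the point.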

LinkedIn, already a repository of empty-headed corpo-nonsense, lets users generate messages, profiles and job descriptions using AI, and anything created with these generative features is immediately fed back into the Azure OpenAI models run by its parent company Microsoft, which invested $10 billion in OpenAI in early 2023. While LinkedIn has yet to introduce fully-automated replies, Chrome extensions already exist to flood the platform with generic responses, feeding yet more genericism into the mouths of Microsoft's and OpenAI's models.

Generative AI also aligns naturally with the toxic incentives created by the largest platforms. Google's algorithmic catering to the Search Engine Optimization industry benefits those who can spin up large amounts of "relevant" content over those who create content for humans. While Google has claimed that its upcoming "core" update will help promote "content for people and not to rank in search engines," it has made this promise before, and I seriously doubt anything meaningfully changes. After all, Google handles more than 85% of all search traffic and pays Apple billions a year to make Google search the default on Apple devices.

And because these platforms were built to reward scale and volume far more often than quality, AI naturally rewards those who can find the spammiest ways to manipulate the algorithm. 404 Media reports that spammers are making thousands of dollars from TikTok's creator program by making "faceless reels" where AI-generated voices talk over spliced-together videos ripped from YouTube, and a cottage industry of automation gurus are cashing in by helping others flood Facebook, TikTok and Instagram with low-effort videos that are irresistible to algorithms.

Amazon's Kindle eBook platform has been flooded with AI-generated content that briefly dominated bestseller lists, forcing Amazon to limit authors to publishing three books a day. This hasn't stopped spammers from publishing awkward rewrites and summaries of other people's books, and because Amazon's policies don't outright ban AI-generated content, ChatGPT has become an inoperable cancer on the body of the publishing industry.

"Handmade" goods store Etsy has its own AI problem, with The Atlantic reporting last year that the platform was now pumped full of AI-generated art, t-shirts and mugs that, in turn, use ChatGPT to optimize listings to rank highly in Google search. As a profitable public company, Etsy has little incentive to change things, even if the artisanal products on the platform are being crowded out by generative art pasted on drop-shipped shirts. eBay, on the other hand, is leaning into the spam, offering tools to generate entire listings based on a single image using generative AI.

The Wall Street Journal reported last year that magazines are now inundated with AI-generated pitches for articles, and renowned sci-fi publisher Clarkesworld was forced to close submissions after receiving an overwhelming number of AI-generated stories. Help A Reporter Out used to be a way for journalists to find potential sources and quotes, but requests are now met with a deluge of AI-generated spam.

These stories are, of course, all manifestations of a singular problem: that generative artificial intelligence is poison for an internet dependent on algorithms.

There are simply too many users, too many websites and too many content providers to manually organize and curate the contents of the internet, making algorithms necessary for platforms to provide a service. Generative AI is a perfect tool for soullessly churning out content to match a particular set of instructions — such as those that an algorithm follows — and while an algorithm can theoretically be tuned to evaluate content as "human," scaled content can just as easily be tweaked to seem more human.

Things get worse when you realize that the sheer volume of internet content makes algorithmic recommendations a necessity to sift through an ever-growing pile of crap. Generative AI allows creators to weaponize the algorithms' weaknesses to monetize and popularize low-effort crap, and ultimately, what is a platform to do? Ban anything that uses AI-generated content? Adjust the algorithm to penalize videos without people's faces? How does a platform judge the difference between a popular video and a video that the platform made popular? And if these videos are made by humans and enjoyed by humans, why should it stop them?

Google might pretend it cares about the quality of search results, but nothing about search's decade-long decline suggests it's actually going to do anything. Google's spam policies have claimed for years that scraped content (outright ripping the contents of another website) is grounds for removal from Google, but even the most cursory glance at any news search shows how often sites thinly rewrite or outright steal others' content. And I can't express enough how bad (yet inevitable) the existence of the $40 billion Search Engine Optimization industry is, and how much of a boon it is to be able to semi-automate the creation and optimization of content to the standards of an algorithm that Google has explained in exhaustive detail. While it's plausible that Google might genuinely try to fight the influx of SEO-generated articles, one has to wonder why it would bother now after spending decades catering to the industry.

As we speak, the battle that platforms are fighting is against generative spam, a cartoonish and obvious threat of outright nonsense, meaningless chum that can and should (and likely will) be stopped. In the process, they're failing to see that this isn't a war against spam, but a war against crap, and the overall normalization and intellectual numbing that comes when content is created to please algorithms and provide a minimum viable product for consumers. Google's "useless" results problem isn't one borne of content that has no meaning, but of content that only sort of helps, that is the "right" result but doesn't actually provide any real thought behind it, like the endless "how to fix error code X" results full of well-meaning and plausibly helpful content that doesn't really help at all.

The same goes for Etsy and Amazon. While Etsy's "spam" is an existential threat to actual artisans building something with their hands, it's not actual spam — it's cheaply-made crap that nevertheless fulfills a need and sort of fits Etsy's remit. Amazon doesn't have any incentive to get rid of low-quality books that sell for the same reason that it doesn't get rid of its other low-quality items. People aren't looking for the best, they're looking to fulfill a need, even if that need is fulfilled with poorly-constructed crap.

Platforms likely conflate positioning with popularity, failing to see the self-fulfilling prophecy of an algorithm making stuff popular because that stuff is built to please the algorithm, which in turn creates more demand for content that pleases the algorithm. "Viral" content is no longer the result of lots of people deciding that they find something interesting; it's a condition created by algorithms manipulated by forces that are growing stronger and more nuanced thanks to generative AI.

We're watching the joint hyper-scaling and hyper-normalization of the internet, where all popular content begins to look the same to appeal to algorithms run by companies obsessed with growth. Quality control in AI models only exists to stop people from nakedly exploiting the network through unquestionably iniquitous intent, rather than people making shitty stuff that kind of sucks but gets popular because an algorithm says so.

This isn't a situation where these automated tools are giving life to new forms of art or interesting new concepts, but one of regurgitation, of an increasingly less unique internet, because these models are trained on data drawn from that same internet. Like a plant turning to capture sunlight, parts of the internet have already twisted toward the satisfaction of algorithms, and as others become dependent on generative AI (like Quora, which now promotes ChatGPT-generated answers at the top of results), so will the web become more dependent on, and dictated by, automated systems.

The ultimate problem is that this morass of uselessness will lead companies like Google to have their generative AIs "fix" the problem by generating answers that sift through the crap. Amazon now summarizes reviews using generative AI, legitimizing the thousands of faked and paid-for reviews on the platform by presenting them as verified and trusted information from Amazon itself. Google has already been experimenting with its "Search Generative Experience," which summarizes entire articles on iOS and Chrome, and Microsoft's Bing search has already integrated summaries from Copilot, with both basing their answers on a combination of search results and training data.

Yet in doing so, these platforms gain a dangerous hold on the world's information. Google's deal with Reddit also gave it real-time access to Reddit's content, allowing it to show Reddit posts natively in search (and to access Reddit post data directly for training purposes). At some point, these portals will generate an answer based on the data they have (or have access to, as in the case of Tumblr and WordPress) rather than linking you to a place where you can find an answer by reading something created by another person. There could be a future where the majority of web users experience the web through a series of portals, like Arc Search's "browse for me" feature, which visits websites for you and summarizes their information using AI.

Right now, the internet is controlled by a few distinct platforms, each one intent on interrupting the exploratory and creative forces that made the web great. I believe that their goal is to intrude on our ability to browse the internet, to further obfuscate the source of information while paying the platforms for content that their users make for free. Their eventual goal, in my mind, is to remove as much interaction with the larger internet as possible, summarizing and regurgitating as much as they can so that they can control and monetize the results as much as possible.

On some level, I fear that the current platforms intend to use AI to become something akin to an Internet Service Provider, offering "clean" access to a web that has become too messy and unreliable as a direct result of the platforms' actions, eventually finding ways to monetize your information's prominence in their portals, models and chatbots. As that happens, it will begin to rot out the rest of the internet, depriving media entities and social networks of traffic as executives like Steve Huffman cut further deals to monetize free labor with platforms that will do everything they can to centralize all internet traffic to two or three websites.

And as the internet becomes dominated by these centralized platforms and the sites they trawl for content, so begins the vicious cycle of the Habsburg AI. OpenAI's ChatGPT and Anthropic's Claude are dependent on a constant flow of training data to improve their models, to the point that it's effectively impossible for them to operate without violating copyright. As a result, they can't be too picky when it comes to the information they choose, meaning that they're more than likely going to depend on openly-available content from the internet, which as I've suggested earlier will become increasingly normalized by the demands of algorithms and the ease of automating the generic content that satisfies them.

I am not saying that user-generated content will disappear, but that human beings cannot create content at the scale that automation can, and when a large chunk of the internet is content for robots, that is the content that will inform tomorrow's models. The only thing that can truly make them better is more stuff, but when the majority of stuff being created isn't good, or interesting, or even written for a human being, ChatGPT or Claude's models will learn the rotten habits of rotten content. This is why so many models' responses sound so similar — they're heavily dependent on the stuff they're fed for their outputs, and so much of their "intelligence" comes from the same training data.

It's a different flavor of the same problem — these models don't really "know" anything. They're copying other people's homework.

As an aside, I also fear for the software code that's created by generative AI products like GitHub Copilot. A study by security firm Snyk found that GitHub Copilot and other AI-powered coding assistants, which were trained on publicly-available code (and draw on the user's own codebase), can replicate existing security issues, proliferating problems rather than fixing them. NYU's Center for Cybersecurity also found, in a 2023 study, that Copilot generated code with security vulnerabilities 40% of the time.
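As a hedged illustration (my own example, not one drawn from the Snyk or NYU studies), the kind of flaw such tools can replicate is SQL built by string interpolation, a pattern abundant in scraped public code. The parameterized form alongside it is the standard fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

def find_role_unsafe(name):
    # The pattern an assistant trained on old public code tends to emit:
    # user input interpolated straight into the SQL string (injectable).
    return conn.execute(
        f"SELECT role FROM users WHERE name = '{name}'"
    ).fetchall()

def find_role_safe(name):
    # Parameterized query: the driver binds the value; it is never parsed as SQL.
    return conn.execute(
        "SELECT role FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "' OR '1'='1"           # classic injection string
print(find_role_unsafe(payload))  # leaks every row: [('admin',)]
print(find_role_safe(payload))    # matches nothing: []
```

A model that has ingested millions of examples of the first function has no reason to prefer the second; it reproduces whichever pattern dominated its training data.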

These are also the hard limits you're going to see with generative images and video. While the internet offers a bottomless pit of text you can easily and cheaply consume for training, visual media requires significantly more complex data — and that's on top of the significant and obvious copyright issues. OpenAI's DALL-E (images) and Sora (video) products are, as I've noted, limited by the availability of material to teach them as well as by the limits of generative AI itself, meaning that video may continue to dominate the internet as text-based content finds itself crowded out by AI-generated content. This may be why Sam Altman is trying to claim that giant AI models are not the future — because there may not be enough fuel to grow them much further. After all, Altman claims that any one data source "doesn't move the needle" for OpenAI.

There's also no way to escape the fact that these hungry robots require legal plagiarism, and any number of copyright assaults could massively slow their progress. It's incredibly difficult to make a model forget information, meaning that there may, at some point, be steps back in the development of models if datasets have to be reverted to previous versions with copyrighted materials removed.

The numerous lawsuits against OpenAI could break the back of the company, and while Altman and other AI fantasists may pretend that these models are an inevitable path to the future of society, any force that controls (or makes them pay for) the data they use will kneecap the company and force it to find a way to build these models ethically.

Yet the world I fear is one where these people are allowed to run rampant, turning unique content into food for an ugly, inbred monster of an internet, one that turns everybody's information sources into semi-personalized versions of the same content. These people have names — Sam Altman of OpenAI, Sundar Pichai of Google, Mark Zuckerberg of Meta (which has its own model called LLaMA), Dario Amodei of Anthropic, and Satya Nadella of Microsoft — and they are responsible for trying to standardize the internet and turn it into a series of toll roads that all lead to the same place.

And they will gladly misinform and disadvantage billions of people to do so. Their future is one that is less colorful, less exciting, one that caters to the entitled and suppresses the creative. Those who rely on generative AI to create are no more creators than a person who commissions a portrait is an artist. Altman and his ilk believe they're the new Leonardo da Vincis, but they're little more than petty kings and rent-seekers trying to steal the world's magic.

They can, however, be fought. Don't buy their lies. Generative AI might be steeped in the language of high fantasy, but it's a tool, one that they will not admit is a terribly-flawed and unprofitable way to feed the growth-at-all-costs tech engine. Question everything they say. Don't accept that AI "might one day" be great. Demand that it is today, and reject anything less than perfection from men who make billions of dollars shipping you half-finished shit. Reject their marketing speak and empty fantasizing, interrogate the tools put in front of you, and be a thorn in their side when they try to tell you that mediocrity is the future.

You are not stupid. You are not "missing anything." These tools are not magic; they're fantastical versions of autocomplete that can't help but repeat the mistakes they've learned from the petabytes of information they've stolen from others.


