They're Looting The Internet

Last week, Meta revealed (in a motion trying to dismiss an FTC anti-monopoly lawsuit) that Instagram made an astonishing $32.4 billion in advertising revenue in 2021. That figure becomes even more shocking when you consider Google's YouTube made $28.8 billion in the same period. Bloomberg reports that Instagram made almost 30% of Meta's entire revenue in the early part of 2022. 96% of Meta's $40.1 billion Q4 2023 revenue came from advertising, and it’s made over a hundred billion dollars a year since 2021, a trend it’s likely to continue based on the fact that the only thing these platforms care about is revenue numbers increasing. Google made $86.3 billion in Q4 2023, with $48 billion of that coming from Google Search and related advertising, up 13% from the same quarter a year earlier.

In America, 83% of adults use YouTube, 68% use Facebook and 47% use Instagram. Each platform boasts over two billion users and, over the last three years, Meta and Google have made over half a trillion dollars in revenue from advertising on these platforms.

I now want you to go on Facebook, scroll down, and see how quickly you hit an advertisement. In my case, after one post from a friend, I was immediately hit with an advertisement for some sort of food supplement, then a series of reels, then a suggested group called "Walt Disney Magic," followed by an ad, followed by a post from a friend.

On Instagram, I saw one post from a person I followed, followed by an ad for the same food supplement, followed by two posts from people I followed, followed by another ad. When I clicked an Instagram story, I saw one post from my friend before an ad for the very same food supplement, another two posts from a friend, and then an ad for a game that features a regular trope of the genre — footage of gameplay that isn't actually in the game.

When I went to YouTube, my first result was an 11-minute-long Taiwanese news video and another video that appears to be in Chinese. In fact, several of the videos on the front of YouTube were in Chinese. I went to Google "why are my Youtube videos in Chinese" and the first result was a Reddit post where several users were, for whatever reason, being served random videos in Chinese. Researching the beginning of this article took me about 30 minutes, because every time I googled something — like what percentage of web traffic goes to Google — I kept being given "authoritative" sources like "Forbes Advisor" (the affiliate marketing arm of Forbes with nothing to do with the magazine) with sources ranging from "Blogging Wizard" to a literal list of website names, with no actual links.

This is the state of the modern internet — ultra-profitable platforms outright abdicating any responsibility toward the customer, no longer offering a "service" or a "portal," but instead cramming in as many ways as possible to interrupt the user and push them into doing things that make the company money. The greatest lie in tech is that Facebook and Instagram are for "catching up with your friends," because that's no longer what they do. These platforms are now pathways for the nebulous concept of "content discovery," a barely-personalized entertainment network that occasionally drizzles people or things you choose to see on top of sponsored content and groups that a relational database has decided are "good for you."

On some level, it's hard to even suggest we use these apps. The term "use" suggests a level of user control that Meta has spent over a decade destroying, turning Instagram and Facebook into tubes to funnel human beings in front of those who either pay for the privilege of visibility or have found ways to trick the algorithms into showing you their stuff.

It's the direct result of The Rot Economy, a growth-at-all-costs mindset built off the back of immovable monopolies where tech companies profitably punish users as a means of showing the markets eternal growth. In practice, this means twisting platforms from offering a service to driving engagement, which, in Facebook and Instagram's case, meant finding the maximum number of interruptions that a user will tolerate before they close the app. In Google's case, it meant making changes to search that made advertisements and sponsored links significantly harder to differentiate from "real" search results and allowing the quality of search results to decay to the point that users now rely on TikTok and Reddit instead.

Hi! Please download and listen to my weekly podcast Better Offline. Last week's episode was the best yet - a two-parter covering the four intractable problems holding back AI, and how the AI bubble might burst.

The Rot Machine

Underpinning these ultra-profitable torture-machines is an online advertisement industry built off the back of fucking advertisers and users alike. In the mid-2010s, Facebook "mistakenly" told online publishers that their videos were receiving more engagement than they actually did, leading multiple publishers to "pivot to video," a disastrous industry movement that cost hundreds of reporters their jobs and led to a massive class action suit against Meta. Meta is currently the subject of a class action suit led by Metroplex Communications, which claims that Meta’s inflated metrics lured advertisers away from competing platforms — something it’s been sued for before.

When you align your incentives around "bigger" and "more," you'll take just about anybody's money — like advertisers comparing the COVID-19 vaccine to the Holocaust, quack doctors and their phony cancer treatments, scammers selling counterfeit fishing equipment, scammers offering fake discounts for puzzles, and cryptocurrency cons. An investigation from late last year found that a third of advertisements on Facebook Marketplace in the UK were scams, and earlier in the year UK financial services authorities said they had banned more than 10,000 illegal investment ads across Instagram, Facebook, YouTube and TikTok in 2022 — a 1,500% increase over the previous year.

As the platforms begin to decay, things only get worse for the user. Elon Musk's acquisition of Twitter (and outright hostility toward blue chip advertisers) has turned it into the digital equivalent of Downtown Vegas, with seemingly every post replied to by a bot offering "NUDES IN BIO," something that's pretty funny until you realize they're a front for a series of online dating scams. Musk selling "verification" for eight dollars has allowed cryptocurrency scammers to make millions tricking users into connecting their wallets to fund-draining smart contracts, and they're even buying ads on the platform to do so using stolen credit cards. Musk's desperation for ad revenue has even led Twitter to start pushing ads that don't actually say they're ads, leading advertising watchdog Check My Ads to file a formal complaint with the FTC demanding that it investigate Twitter and enforce its Truth In Advertising standards.

Yet it's foolish to act as if the sorry state of Twitter is entirely different to that of the rest of the web. Instagram is flooded with its own porno bots that engage with regular posts (through likes and replies) as a means of pretending they're real to avoid Meta's automated content moderation measures, which mostly rely on a combination of automation and thousands of contractors in countries like Kenya who make as little as $2.20 an hour to view what WIRED calls "the most hideous content on the internet."

Meta is — much like every other major tech firm — half-assing its approach to moderation, committing human rights violations so that it can spend the smallest amount of money possible to stop the things it needs to stop while failing to maintain any consistent level of quality on the platform. As you'd expect, these lax standards have led to Facebook being flooded with content created with generative AI, with a study out of Stanford and Georgetown revealing how Facebook's algorithm is boosting spam content riddled with misinformation, sending hundreds of millions of impressions to pages that direct people to WordPress sites crammed full of spammy and scammy ads.

Yet perhaps the most obvious sign of digital decay is visiting most websites on a smartphone. IGN.com, a gaming website with over 300 million monthly views, immediately hits you with two giant ads, and on opening a story about the new Fallout TV show, covers the top quarter of your screen with an autoplaying video ad.

Reach PLC — a publicly-traded, multi-million dollar business that dominates local journalism in the UK, and owns three of the most widely-read national newspapers — is notorious for its aggressive approach to monetization. Its websites have been described as an “over-monetised mess” and “impossible to navigate,” with the “poor digital experience” named as a partial contributor to its declining financial fortunes. If you open a local UK news website (especially one owned by Reach) with the stock iPhone Safari app, you’ll be met with no shortage of page-covering ads that appear mid-article, and ads that redirect you to an external website without any warning. Again, while you’re mid-article.

Even the giants haven’t resisted the temptation to screw their users. CNN, one of the most influential news publications in the world, hosts both its own journalism and spammy content from "chum box" companies that make hundreds of millions of dollars driving clicks to everything from scams to outright disinformation. And you'll find them on CNN, NBC and other major news outlets, which by proxy endorse stories like "2 Steps To Tell When A Slot Is Close To Hitting The Jackpot."

These “chum box” companies are ubiquitous because they pay well, making them an attractive proposition for cash-strapped media entities that have seen their fortunes decline as print revenues evaporated. But they’re just so incredibly awful. In 2018, the (late, great) podcast Reply All had an episode that centered around a widower whose wife’s death had been hijacked by one of these chum box advertisers to push content that, using stolen family photos, heavily implied she had been unfaithful to him. The title of the episode — An Ad for the Worst Day of your Life — was fitting, and it was only after a massively popular podcast intervened that these networks banned the advert.

These networks are harmful to the user experience, and they’re arguably harmful to the news brands that host them. If I were working for a major news company, I’d be humiliated to see my work juxtaposed with specious celebrity bilge, diet scams, and get-rich-quick schemes. And they’re ultimately illustrative of where the internet is today.

The optimistic, respectful and trusting approach to legislation around online platforms has led to an internet riddled with decay and pain, one that incentivizes mining human beings like veins of ore. The modern internet was built on a social contract that said that big tech gave us services "for free" in exchange for the nebulous concept of "data," which largely took the form of the content and connections we made and the information that came out as a result. This contract was both assumed and extremely easy to enter into, meaning that there was never any attempt to regulate its terms — not simply what can and cannot be done to a user, but what the user will continue to receive in return for the experience itself.

As a result, these platforms were (and are) a form of bait-and-switch, the underpinning philosophy of Cory Doctorow's "Enshittification" theory, where platforms build massive monopolies based on offering good, useful services, and then slowly turn the screws on the customer to seek ever-growing profits. Yet as I've noted before, I feel that enshittification misses one crucial thing — that these companies aren't doing this out of a lack of profitability or failure of their business model, but because the modern internet has become somewhere between a social experiment and a human mining operation.

Charlie Warzel framed this well in a recent piece in The Atlantic, describing the overall techscape as a form of hostage negotiation. Interactions with tech companies are no longer a purchase or two-way contract, but a series of trades of information long after we've purchased the actual product. Every single interaction with tech now requires us to share our email address, to accept a subtle form of tracking (don't worry, it's "anonymous"!), or to share personal information that can and will be leaked. It's trite at this point to say that human beings themselves are now the product, but it's more obvious and painful than ever when you look at the state of our deeply dystopian internet.

Tech companies have found every imaginable way to monetize every imaginable thing we do, all based on the idea that they're providing us with something in return. And when you really think about it, they haven't provided a service at all. Twitter, Facebook, Instagram and Google are platforms that only have as much utility as the content they host, which is created by billions of (mostly) unsupported and unpaid users. The tradeoff was meant to be that these platforms would make creating and hosting this content easier, and help either surface it to a wider audience or to quickly get it to the people we cared about, all while making sure the conditions we created and posted it under were both interesting and safe for the user.

Yet the state of the internet is now far simpler: the cost of using free platforms is a constant war with the incentives and intentions of the platforms themselves. We negotiate with Instagram or Facebook to see content from the people we chose to follow, because these platforms are no longer built to show us things that we want to see. We no longer "search" Google, but barter with a seedy search box to try and coax out a result that isn't either a search engine-optimized half-answer or an attempt to trick us into clicking an ad. Twitter, in its prime, succeeded by connecting real people to real things at a time when the internet actively manufactures our experience and interactions with others.

The core problem lies in the fact that these platforms don't really create anything, and their only value exists in making an internet of billions of people small enough to comprehend. Like seemingly every problem with a capitalist society, the internet has become dominated by powerful forces that don't contribute to the product that enriches them. As a result, they have either no concept of nor interest in "quality," just "more," making them extremely poor arbiters of what "good" looks like. This inevitably leads to products that suck more as they become more profitable, because the machine they've built is a profit excavator dressed as a service.

I'd argue that this makes Google, and by extension executives like Sundar Pichai and Google Search lead Prabhakar Raghavan, some of the greatest villains in business history. While one can't forget about the damage done by Meta and Mark Zuckerberg's failure to maintain an honest platform, allowing Google Search to decay so severely for any reason — let alone a profit-centric one — is actively damaging to society, and was an entirely intentional act perpetrated by people like Raghavan, the former head of Google's ads division who took over search not long after his predecessor sounded a "code yellow" about Google's advertising encroaching on search results.

If you need to see exactly how bad things are getting, spend some time on Google News. For months I’ve been tracking Tech Gate, a site that Google News regularly cites for the biggest stories in the tech industry, growing it to a modest yet not insignificant 18,000 unique monthly visitors according to data from SimilarWeb. The problem is that Tech Gate isn’t a real site — it’s entirely made up of stolen articles, like this article that copies Wccftech, or this entirely plagiarized Cointelegraph-branded article. Hey, remember a month ago when Google promised it was updating search to surface better, human-authored content? Remember when I told you it was full of shit? I want to scream!

Anyway.

By allowing — and encouraging — search engine optimization (SEO), Google handed matches to arsonists and pointed to the most flammable parts of the internet. The existence of SEO is inevitable, but Google should never have encouraged these people — it should have only set clear standards about what not to do and punished failures to comply heavily, except doing so would mean less content on Google (since there wouldn’t be as much of an incentive to create cheap SEO-centric content, like the millions of “what time is the superbowl” articles that appear each year).

It’s not so much that Google is negligent or incompetent as that it is actively hostile to users, as demonstrated by the company’s decision to cancel its contract with Appen — an Australian business that employs a significant chunk of Google’s outsourced search quality raters. These are the people who tell Google if a search result is high quality. While they can’t influence individual page rankings, they’re an important quality control mechanism on something that’s largely impervious to public sentiment.

As an aside, it’s worth noting that these raters — of whom about one-third work in the US — are employed under genuinely deplorable conditions. Last year, Appen workers protested outside Google’s Mountain View campus demanding basic benefits, like health insurance, paid sick leave, and parental leave. These workers are often woefully underpaid, with one group only achieving a raise to $14.50 per hour in 2022.

As much as Google has historically liked to crow about the generous benefits and workplace perks offered to employees, its reliance on poorly-paid contractors is an open secret. In March, the company terminated its contract with Cognizant — a sprawling Indian IT contractor, one of the “Big Five” tech body shops, along with the likes of Wipro, Infosys, Tata, and HCL — which, in turn, led to the firing of roughly 50 YouTube Music workers in Austin, Texas.

These YouTube Music workers earned as little as $19 per hour and received “minimal benefits,” with many forced to work multiple jobs in order to survive. Illustrating the precarities of their employment, the workers only found out they were let go while addressing Austin City Council about their working conditions.

Curiously, both groups — the Appen raters and the YouTube Music staffers — were laid off after successfully unionizing and protesting against the company. I’m sure that’s just a coincidence. Google wouldn’t be evil, right?

We justifiably loathe Elon Musk for destroying Twitter, but we should have a hundred times the bile for Larry Page, Sergey Brin and Sundar Pichai. All three have unquestionably damaged our ability to access knowledge through their actively harmful approach to maintaining a portal that billions use to find answers to everything.

And after decades of profiting off of platforms that make billions of people create things for free, they've found their next big sting — stealing the content itself, and selling it back to us using artificial intelligence.

Eating The Internet

As I wrote in my last newsletter and went over in the last episode of my podcast Better Offline, the large language models (LLMs) underpinning the generative AI boom require incredible amounts of data. As a result, every single LLM-based application — ChatGPT, Google Gemini, Anthropic's Claude, Meta's LLaMA, to name but a few — is trained on an indeterminately-large portion of the internet made up of both Common Crawl (a 250-billion page repository of web content) and what appears to be as much of the rest of the internet as they're able to download, legally or otherwise. After profiting handsomely from being the middleman between content creation and internet users, big tech is in the process of looting the internet as a means of training models that it hopes can replace human beings themselves. And I'm not being remotely dramatic.

While Meta already makes over a hundred billion dollars mining and selling our data while constantly interrupting us with sponsored content, that just isn't enough, and it’s now training its generative AI models using billions of our Facebook and Instagram posts. Google has now paid Reddit $60 million to train on its data, and both OpenAI and Midjourney have struck deals with Tumblr and WordPress to train their models on those platforms' blogs. DocuSign is training its generative AI models with user data. OpenAI allegedly transcribed over a million hours of YouTube videos as a means of training its latest "GPT-4" model, and Google couldn't get angry about it because it was doing exactly the same thing.

A Washington Post investigation from last year found that Google's T5 and Meta's LLaMA models trained on Google's C4 data set had ingested 15 million different websites including everything from pirated eBook websites to the entirety of free blogging platform Medium. It's reasonable to assume that models like ChatGPT were trained on similar-sized datasets, and while we can't tell exactly what they trained on, I'd argue the New York Times' lawsuit against OpenAI successfully proves that GPT-4 and other models were trained on a great deal of its content.

While OpenAI, Google and Meta would like to claim that these are "publicly-available" works that they are "training on," the actual word for what they're doing is "stealing." These models are not "learning" or, let's be honest, "training" on this data, because that's not how they work — they're using mathematics to plagiarize it based on the likelihood that somebody else's answer is the correct one. If a human being did this — authoritatively quoting somebody else's figures without citing them — it would be considered plagiarism, especially if they represented the information as their own.

LLMs are a globally-perpetuated act of theft taking place in broad daylight, to the point that OpenAI has told the UK's House of Lords that "it would be impossible to train today's leading AI models without using copyrighted materials." Every single tech company making an LLM is stealing, justifying it by using the previous model of the internet where everything published online was there for the taking and conflating access to content with ownership in the process.

The generative AI boom has exactly the same stench as the metaverse, and it's happening for exactly the same reason — the people making the products are not building things for human beings, but to show the markets that they'd continue to grow. Google and Meta do not make great products. Sam Altman has founded one company, Loopt, which he somehow sold for $40 million despite it never gaining traction. Sundar Pichai's resumé includes a short stint at McKinsey and serving on the board of Magic Leap, a borderline-fraudulent augmented reality company. Meta is incapable of creating products without stealing or acquiring them, and has a far longer history of destroying every startup it touches. These people aren't innovators, or creators, or even service providers — they're thieves and landlords insulated by weak regulation and markets that have become disconnected from the concept of good business. OpenAI is no different. After all, it’s effectively a subsidiary of Microsoft.

And when you have an industry piloted by people who don't make products for people, you don't create anything useful, and you're doomed to make the same stupid, obvious mistakes.

Despite the billions of investment and media headlines, it is surprisingly difficult to describe what ChatGPT does. Generative AI allows you to generate lots of stuff from a prompt, allowing you to pretend to do the research much like LLMs pretend to know stuff. It's good for cheating at papers, or generating lots of mediocre stuff, which makes it, if we're honest, kind of like a very advanced calculator that can create words as well as numbers. Every time I write something like this I get sent emails telling me what "generative AI can do," and the answer is always some sort of contrived excuse to use tech rather than a useful integration with a specific purpose.

As another aside: Even the claims of what generative AI has done need to be taken with a grain — or a spoonful — of salt. You might have caught wind of Devin, the “world’s first AI software engineer,” which is purportedly able to tackle entire software engineering tasks on Upwork.

Except these claims fall apart at the first bit of scrutiny. First, AI can’t handle one of the most essential tasks of software engineering, which is collaboratively discussing the requirements of a project and how to implement the technology. Second, it appears Devin was provided a cherry-picked task, rather than finding one on its own, as the developers behind the tool claimed.

But more fundamentally, Devin just wasn’t all that good. Its code didn’t meet the basic requirements of the project — which included providing documentation on how the solution worked and how to deploy it — and was clunky, inelegant, and used outdated approaches to basic tasks. Worse, it took significantly longer to complete the task than an actual human would.

Admittedly, Devin took so long because it spent hours — literally hours — trying to identify and resolve bugs in its own sloppy code. I’d say it has the engineering ability of a first-year CompSci undergraduate, but that would be unfair to first-year CompSci undergraduates.

It's the hallmark of a tech industry dedicated to creating problems that it charges you to solve, a sickly beast born of venture capital and a lack of innovation. When you're rich and powerful, you no longer face real problems, and as a result fail to consider the solutions that would measurably improve a person's life. Sundar Pichai and Mark Zuckerberg don't worry about bills, or face actual busywork, or real challenges — they get paid hundreds of millions of dollars to come up with ways to express growth to hedge funds.

Yet I believe generative AI is going to lead several of these companies to ruin. As I wrote (and broadcast) recently, LLMs are running out of training data, and the desperation to find more will lead to significant legal consequences when it's revealed exactly how much they stole and from whom they stole it. As these generative AI companies become more desperate, they'll (intentionally or otherwise) ingest AI-generated content, which causes a degenerative training effect called "model collapse" (also known as Habsburg AI) that destroys the model's ability to deliver useful answers. By all accounts, this is already happening, with Adobe's Firefly AI accidentally ingesting generative art that found its way into the company's Adobe Stock pool of stock images.

LLMs also tend to hallucinate, a virtually-unsolvable problem in which they authoritatively make incorrect statements, creating horrifying results in generative art and rendering them too unreliable for any kind of mission critical work. Like I’ve said previously, this is a feature, not a bug. These models don’t know anything — they’re guessing, based on mathematical calculations, as to the right answer. And that means they’ll present something that feels right, even though it has no basis in reality. LLMs are the poster child for Stephen Colbert’s concept of truthiness.

Generative AI's energy demands are unsustainable, as are its compute demands, and even if you somehow got past all of these intractable issues, you face several far simpler questions: What use is all of this generative AI? Why is it necessary? Why does it have to happen? What is the killer app that makes any of this worthwhile? And why the fuck, for the third time in three years, is the tech industry trying to cram something down our throats that doesn't appear to help us?

The answer is obvious: because they believe it'll make them money.

Generative AI is catnip for big tech — a narratively-satisfying way of expressing how big their companies will grow for investors that don't invest in companies because they make things for real people, but for the signals they issue to Wall Street speculators. LLMs are a way to sell software to enterprises and people alike, while also driving theoretical billions into tech stocks, which in turn makes the tech industry feel special and important. It's a new "thing" that tech can do that feels like the future, that sort of makes sense, that allows people who barely read anything and don't do any real work to dream of ways to get rid of the people who do.

Yet I believe that they're all disconnected from any actual value creation. Google, Meta, Microsoft and OpenAI don't create anything — they're built off of supporting other people doing things, and have spent decades abstracting themselves away from any kind of labor. These aren't the companies that made Google Search, or Facebook, or Word. They're data brokers by different names that happen to sometimes sell software.

What's so dangerous about AI for these companies is that despite all the hype, AI neither solves any real problems nor actually makes them any money. OpenAI may make billions, but it has yet to turn any kind of profit, and its operating costs vastly surpass any revenue it gets from licensing its technology or selling premium memberships.

Google can't find a way to make money off of its generative search product (which still, by the way, hallucinates) and is thinking about charging for it — the actions of a company that knows that none of this generative nonsense actually matters. Further evidence that the company is doubling down on AI is its recent decision to elevate Liz Reid, who most recently led Google’s AI search team — called Search Generative Experience (SGE) — to the top search role in the company.

As part of this shuffle, Google has also moved Cheenu Venkatachary, one of its senior AI engineers, to lead the teams responsible for ranking and search quality. Nothing — and I repeat, nothing — good will come from this.

I think that society is turning on tech as a result. Everybody knows how bad the internet is, and everybody knows how much money these companies make. People are still angry about cryptocurrency and the metaverse, they're still deeply pissed off that they were lied to, and they're far more aware of when they're being fed a line of shit than Mark Zuckerberg, Sam Altman, Sundar Pichai, and the rest of their ilk realize.

Edward Zitron
