7/4/2024

How tech giants cut corners to harvest data for AI

The New York Times published a bombshell report on how, behind the scenes, large AI companies find (sometimes illegal) ways to harvest content to train their large language models, the basis of generative AI.

The funniest part is Google turning a blind eye to OpenAI transcribing 1 million hours (!) of YouTube videos to power GPT-4, because Google itself was doing the same for Gemini. (The practice violates YouTube’s terms of use.)

It gets better. Two days earlier, YouTube CEO Neal Mohan told Bloomberg that the use of videos by OpenAI to train Sora, its astonishing AI video generator, would be against the platform’s terms of use.

7/4/2024

Hiding my face on the internet

Two unrelated events (the birth of a baby to someone close to me and a site that generates random ugly avatars) led me to rethink the exposure to which I submit, almost always voluntarily, on the internet.

My face is in several places. Back then, before facial recognition algorithms and generative AI, I thought it would be good to show my face to convey… credibility? Confidence? I don’t know. Maybe it wasn’t even as necessary as it is today, because we didn’t have AIs that wrote convincing gibberish. Simpler times.

I started thinking about removing some of the photos of my face from the internet. I deleted some photos from obvious places that are under my control, such as social media and my website, and found that it takes time for search engines and some platforms to “notice” the update or even delete the images. I was reminded that, also in a very literal sense, the internet doesn’t forget.

Then I realized that I had posted dozens of videos on YouTube showing my face. Maybe it’s a lost cause.

Still, I decided to use a silly avatar where possible: an orange ball with a smiling face. You can see it on this site’s footer.

Before that, I tried to use one of the ugly avatars I mentioned above. I replaced my picture with it on Telegram and, right after, I got an email from someone who thought my account had been hacked.

5/4/2024

Almost no one cares whether your site is on social media

In March 2024, I ran an experiment on my Portuguese-language blog: I stopped distributing its content on social media (Mastodon, mostly) and messaging apps (Telegram and WhatsApp channels). It has a small following in a few places: ~2.9k on Telegram, ~450 on WhatsApp, and three Mastodon profiles (two with autopost) that add up to ~5k followers.

The result was that… little changed.

The blog got ~107k unique visitors who viewed ~172k pages. Compared to the average of the previous six months, the March figures were 33.7% and 30.3% higher, respectively.

The reason for this increase, however, was an uncontrollable external player: Google. On March 27th, I posted a link in our readers’ forum to a viral, anonymous Brazilian Google spreadsheet with reports of bad companies to work for. Google, for some inexplicable reason, put this link in front of many pairs of eyes, and almost 38k people arrived at my blog in the few days remaining in March.

(This created tragicomic situations, such as people posting anonymous reports of toxic companies in the comments of the blog and one person who threatened to sue me if I didn’t take down the spreadsheet.)

Continue reading »

2/4/2024

The end of Google Podcasts is in June if you're not in the US

In September 2023, Google announced the end of Google Podcasts in favor of YouTube Music. At some point after that, an exact date for the closure appeared: April 2nd, 2024.

It turns out that this deadline only applies to the United States. In a post on the Google forums for podcasters, dated March 18th, a company employee, Cory Peter, explains that Google Podcasts will be discontinued in the rest of the world on June 24th. Those who rely on Google’s app to listen to podcasts have almost three months to export their subscriptions to another app.

27/3/2024

Mastoot is my new favorite Mastodon app

I don’t expect much from a Mastodon app: just one that’s lean and stable; a no-frills approach.

For some reason I can’t explain, until recently I hadn’t tried Mastoot. (I suspect I confused it with Mast, which I tried and found horrible; people need to think of more original names for these apps.)

Mastoot, developed by Bei Li, is… simple, as its App Store description states. It has no advanced features, nor is it very customizable: you can change the icon and accent color, and customize the sideways swipe actions. And that’s it.

For a while, Mastoot was slightly neglected by its creator. Not anymore. Coincidence or not, Bei Li said that Mastoot’s development has resumed, and he’d “like to start with minimal and prioritize features driven by user feedback this time”. Also, he will “implement features at a relatively slower pace to ensure quality”.

As it is right now, Mastoot is a delight to use. Oh, and it’s free.

22/3/2024

Plain text email

In the mid-1990s, a war was waged in the email inboxes of those who were already online. It was at this time that HTML email arrived, sparking heated discussions on BBSs, mailing lists, and IRC channels.

It’s very likely that most people (maybe you included) don’t know what I’m talking about. Let’s go back, so we can all be on the same page.

Email messages can be sent in plain text (text/plain), just like files saved in Notepad, or in HTML (text/html). In this second format (or “MIME type”), the messages are created as if they were pages of a website, which opens a Pandora’s box, I mean… many possibilities, such as rich formatting and images mixed with the text.

HTML email has some obvious disadvantages, such as weaker security, because links can be disguised and remote media loaded. An incidental problem is that, unlike web browsers, email clients/apps do not follow web standards: each one renders HTML differently, which makes the design of newsletter layouts, for example, a hellish endeavor.

Another problem with HTML is that messages in this format are heavier, because they have invisible parts (headers and the HTML code itself) and visible ones (images, in particular) that their plain text counterparts don’t have.

Nowadays, this may not be a problem, thanks to ubiquitous fast internet connections. In the 1990s, over very slow dial-up links, it mattered.
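To make the difference between the two formats concrete, here is a minimal sketch using Python’s standard `email` library. The addresses, subject, body text, and function names are made up for illustration. The HTML version is built as a `multipart/alternative` message that carries both a text/plain and a text/html part, which is a common way of sending HTML email but only one possible arrangement.

```python
from email.message import EmailMessage


def build_plain(body: str) -> EmailMessage:
    """Build a message whose only body is text/plain."""
    msg = EmailMessage()
    msg["Subject"] = "Lunch"              # hypothetical subject
    msg["From"] = "alice@example.com"     # hypothetical addresses
    msg["To"] = "bob@example.com"
    msg.set_content(body)                 # a str body is stored as text/plain
    return msg


def build_html(body: str) -> EmailMessage:
    """Build the same message with an extra text/html alternative."""
    msg = build_plain(body)
    html = (
        "<html><body>"
        f'<p style="font-family: sans-serif">{body}</p>'
        "</body></html>"
    )
    # Converts the message to multipart/alternative and appends the HTML part.
    msg.add_alternative(html, subtype="html")
    return msg


plain = build_plain("See you at noon.")
rich = build_html("See you at noon.")

# Size of each message as it would travel over the wire.
plain_size = len(plain.as_bytes())
rich_size = len(rich.as_bytes())

print(plain.get_content_type(), plain_size, "bytes")
print(rich.get_content_type(), rich_size, "bytes")
print([part.get_content_type() for part in rich.iter_parts()])
```

Even with no images, the second message comes out larger: it repeats the text inside HTML markup and adds the MIME boundaries and per-part headers needed to hold both versions.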

Continue reading »
