Craigslist’s Devious Strategy to Prevent Scraping

The bell character is not your friend

In the early 2010s, screen scraping was all the rage. Everyone wanted to scrape data from websites, and I did a ton of scraping projects.

Lots of sites set up elaborate systems to prevent scraping, like populating the data on the page with JavaScript onload handlers, since Mechanize and other scraping tools of the time didn’t execute JS.

I remember that Craigslist, though, had the most devious strategy. If you tried to scrape their site and sent the wrong User-Agent header, they didn’t just send back an empty response.

Instead, they sent back megabytes of a single character repeated over and over: the bell character (ASCII BEL, 0x07)!

This meant that if you printed the resulting data to the console for debugging on Linux, it would ring your motherboard speaker over and over again, thousands of times. On many systems, ringing the bell was a blocking operation, so you would end up with a frozen computer making a constant “BONG BONG BONG BONG” sound from its internal speaker, which couldn’t be muted or switched off. You couldn’t even Ctrl+C out of it.

The only solution was to restart the whole thing, or wait out hours of dinging.

It was a devious strategy, and definitely discouraged scraping the site. You could find ways around it, but mess up once and print to the terminal, and you got…DING DING DING.
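If I were debugging a scraper today, I would never send raw response bytes to the terminal in the first place. Here is a minimal Python sketch of that idea. It is not what I used back then. `safe_preview` and `looks_like_bell_flood` are hypothetical helper names, and the 200-byte limit and 90% threshold are arbitrary assumptions.

```python
def safe_preview(data: bytes, limit: int = 200) -> str:
    """Return a terminal-safe preview of a response body.

    Non-printable characters, including BEL (0x07), are rendered as
    visible escape sequences instead of being passed through raw.
    """
    # Only decode the first `limit` bytes; a cut-off multibyte
    # character becomes a replacement character instead of an error.
    text = data[:limit].decode("utf-8", errors="replace")
    preview = "".join(
        ch if ch.isprintable() else ch.encode("unicode_escape").decode("ascii")
        for ch in text
    )
    if len(data) > limit:
        preview += f"... ({len(data)} bytes total)"
    return preview


def looks_like_bell_flood(data: bytes, threshold: float = 0.9) -> bool:
    """Heuristic (assumed threshold): is the body mostly BEL bytes?"""
    return len(data) > 0 and data.count(b"\x07") / len(data) >= threshold


if __name__ == "__main__":
    fake_body = b"\x07" * 5  # stand-in for a bell-filled response
    print(looks_like_bell_flood(fake_body))  # True
    print(safe_preview(fake_body))           # \x07\x07\x07\x07\x07
```

The point is simply that the bell only rings when the raw 0x07 byte reaches the terminal, so escaping it (or printing `repr(data)`) keeps the debugging output silent.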
