Ben Metcalfe Blog

January, 2011 Monthly archive

Poison by ˙Cаvin 〄

Ever since the first RSS feeds were published, there has been the problem of nasty, spammy people sucking up those feeds and reposting the content on their own nasty, spammy blogs (splogs). There are many approaches to dealing with the problem – friendly (emailing them to ask them to take the content down and desist), legal (eg DMCA notices, although these only work for US-based sites), technical (eg blocking based on blacklists, but that is a pain) and editorial (eg short-form RSS, which sucks).

One way not to deal with the problem is to remove your RSS feeds altogether – which, it is rumored, local blog network Gothamist (home of SFist) is considering doing in order to concentrate on the distribution of their proprietary content apps instead. I’m confident that is an extremely flawed strategy, but I digress.

My girlfriend Violet Blue runs a highly successful blog, tinynibbles.com (warning: content very NSFW), which suffers immensely from splogs republishing her content without permission. As I look after her server and the technical operations for her empire of sites, I decided to see if I could help solve this problem in a different way.

What I am about to go through is a tutorial on how you can really try to hurt someone who is leeching your RSS feed, to the extent that it damages and potentially destroys their splog operation. I am not a lawyer, but I do not believe any of what I am about to go through is illegal – although I’ll admit that it is naughty.

In a nutshell…

…what we are going to do is intercept the requests from the target’s server for our RSS feed and divert them to a ‘poisoned’ RSS feed that contains not only content warnings but also JavaScript that, when rendered on their website, will take over their page, rendering their site and advertising useless for anyone who comes to visit. If you wanted to go further, you could also use this method to try to execute shell commands on their server, although at this point things become legally murky and ethically questionable.

This tutorial assumes you have some basic site admin skills, can access your logs and can edit a .htaccess file.

So here goes…

Step 1: Identify your target

Chances are you’ve discovered someone republishing your content via a Google search or a trackback from the splog to your site. The first thing to do is to get the IP address of the site. Most splogs request your feed from the same server they serve their web pages from, which makes them easy to identify when they visit your site to pull down your RSS feed. I’m going to assume that my target has the IP address 123.123.123.123.
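If you only know the splog’s domain name, a lookup like the one below gives you candidate addresses to search your logs for. This is a minimal Python sketch, and `ipv4_addresses` is a hypothetical helper. The address a site’s domain resolves to is not necessarily the one its feed-fetching script connects from (for example, if it sits behind a CDN), so treat the result as a lead to confirm against your logs.

```python
import socket

def ipv4_addresses(hostname: str) -> list[str]:
    """Return the distinct IPv4 addresses that `hostname` resolves to (hypothetical helper)."""
    # getaddrinfo returns (family, type, proto, canonname, sockaddr) tuples;
    # for AF_INET, sockaddr is an (address, port) pair, so [4][0] is the address.
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET)
    # The same address appears once per socket type, so de-duplicate it.
    return sorted({info[4][0] for info in infos})
```

For example, `ipv4_addresses("splog.example.com")` would list the addresses for a (hypothetical) splog domain.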

Step 2: Search your logs

Search your logs for any access to your site by this ip address. You might want to try:

$ grep "123.123.123.123" /var/log/access_log

where 123.123.123.123 is the IP address of the splog and /var/log/access_log is the path and filename of your web server’s access log.

Hopefully you will have found some matches:

123.123.123.123 - - [16/Jan/2011:14:03:51 -0500] "GET /feed HTTP/1.1" 200 - "http://www.mysite.com/feed" "Mozilla/4.8 [en] (Windows NT 6.0; U) (880701279)"
123.123.123.123 - - [16/Jan/2011:15:57:13 -0500] "GET /feed HTTP/1.1" 200 - "http://www.mysite.com/feed" "Mozilla/4.8 [en] (Windows NT 6.0; U) (1416539927)"
123.123.123.123 - - [16/Jan/2011:20:31:40 -0500] "GET /feed HTTP/1.1" 200 - "http://www.mysite.com/feed" "Mozilla/4.8 [en] (Windows NT 6.0; U) (686799288)"
123.123.123.123 - - [16/Jan/2011:23:52:38 -0500] "GET /feed HTTP/1.1" 200 - "http://www.mysite.com/feed" "Mozilla/4.8 [en] (Windows NT 6.0; U) (2099013304)"
123.123.123.123 - - [17/Jan/2011:02:26:34 -0500] "GET /feed HTTP/1.1" 200 - "http://www.mysite.com/feed" "Mozilla/4.8 [en] (Windows NT 6.0; U) (1475562814)"
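Plain `grep` treats each dot in the pattern as “any character” and matches the address anywhere on the line, including inside a referrer. A slightly stricter pass is sketched below in Python. It assumes your server writes Apache’s standard “combined” log format, as the lines above appear to be. It keeps only feed requests whose client address is exactly the splog’s, and it works out how often they poll. The function names are hypothetical.

```python
import re
from datetime import datetime

# Apache "combined" log format (assumed):
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def feed_requests(lines, ip, feed_path="/feed"):
    """Yield (timestamp, user_agent) for requests to `feed_path` made by exactly `ip`."""
    for line in lines:
        m = LOG_RE.match(line)
        # Compare the whole client-address field rather than searching the line.
        if m and m["ip"] == ip and m["path"] == feed_path:
            ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
            yield ts, m["agent"]

def polling_intervals(timestamps):
    """Seconds between consecutive requests, a rough measure of how often the splog polls."""
    return [int((b - a).total_seconds()) for a, b in zip(timestamps, timestamps[1:])]

# Example usage, with the same log path and IP as the grep command:
# with open("/var/log/access_log") as log:
#     hits = list(feed_requests(log, "123.123.123.123"))
# print(polling_intervals([ts for ts, _ in hits]))
```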

It’s worth pointing out that this will not work if you link your RSS feeds directly to a 3rd-party service like Feedburner, because the request from the splog never reaches your server. Sadly, there is little you can do at that point, as Google (Feedburner’s parent company) does not give you control to serve different content to arbitrary IP addresses. If you want to use a service like Feedburner, consider publishing an RSS URL on your own server that 302-redirects to Feedburner – achieving the same result while keeping control of the requests.
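As a rough illustration of that redirect idea, here is a minimal Python WSGI sketch, not a drop-in replacement for your web server’s own configuration. Readers request `/feed` from your server and get a temporary (302) redirect to Feedburner, so every request still passes through code you control first. `FEEDBURNER_URL` is a hypothetical placeholder.

```python
# Hypothetical Feedburner address; substitute your own feed's URL.
FEEDBURNER_URL = "http://feeds.feedburner.com/your-feed-name"

def feed_redirect_app(environ, start_response):
    """Answer /feed with a 302 to Feedburner; anything else gets a 404.

    Because the request reaches your server first, environ["REMOTE_ADDR"]
    is available here to log or divert known splog addresses.
    """
    if environ.get("PATH_INFO") == "/feed":
        start_response("302 Found", [("Location", FEEDBURNER_URL)])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]
```

To try it locally, you could pass it to `wsgiref.simple_server.make_server("", 8000, feed_redirect_app)` and call `serve_forever()` on the result.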

Step 3: Build the poisoned RSS feed

We are going to create a separate RSS feed that we will redirect the splog’s requests to. If they are creating a new page/blog post for every item in your feed, our new poisoned RSS feed will force their server to generate pages containing what we want to say.

At this point you need to decide how far you want to take things:

  • Display a content warning explaining that they are reproducing your content without permission and you are unhappy about it
  • Display images from TubGirl and other shock sites
  • Hijack their page’s DOM and redisplay the page. Anyone accessing their site will only see your content, with all adverts and other links removed.
  • Attempt to run commands on their server – eg attempt to delete files, elevate user permissions, purge the database, etc.

In my case, I decided to go with the first three.

To create the poisoned RSS feed, you could save out your own current RSS feed and use that as a template. Replace the obvious text in each item with what you would like to say and save it back to your server. Alternatively, you could just use my poisoned PHP script on GitHub.

My script makes the request’s IP address and other HTTP details appear in the footer of each page, along with a tracking string so you can search Google for any other places they are publishing to. It also tries to inject JavaScript that manipulates the DOM so that when they or anyone else visits their site, only your message appears. Finally, the script outputs 10 identical items, each with a random GUID, so that more pages are created each time the splog revisits, because it will think every item is new each time.
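A minimal Python sketch of the content-warning side of such a feed follows. It is not the PHP script mentioned above, and it deliberately covers only the warning text, the requester footer, the tracking string and the random GUIDs; it does not include the JavaScript DOM manipulation. `SITE_URL`, the warning wording and the `splogtrack-` prefix are hypothetical values to replace with your own.

```python
import secrets
import uuid
import xml.etree.ElementTree as ET

# Hypothetical values for illustration; adjust to your own site and message.
SITE_URL = "http://www.mysite.com/"
WARNING = ("This content was republished from mysite.com without permission. "
           "The site you are reading is a scraper.")

def poisoned_feed(remote_addr, user_agent, tracking_id, n_items=10):
    """Build an RSS 2.0 feed of `n_items` identical warning items.

    Each item gets a fresh random GUID, so a scraper that de-duplicates on
    <guid> treats every item as new on every fetch.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Content republished without permission"
    ET.SubElement(channel, "link").text = SITE_URL
    ET.SubElement(channel, "description").text = WARNING
    # Footer recording who fetched the feed, plus a searchable tracking string.
    footer = f"Requested by {remote_addr} ({user_agent}). Tracking ID: {tracking_id}"
    for _ in range(n_items):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = "Content republished without permission"
        ET.SubElement(item, "link").text = SITE_URL
        # ElementTree escapes text content, so this is plain text, not markup.
        ET.SubElement(item, "description").text = f"{WARNING} {footer}"
        ET.SubElement(item, "guid", isPermaLink="false").text = str(uuid.uuid4())
    return ET.tostring(rss, encoding="unicode")

def new_tracking_id():
    """A random token to search Google for, e.g. 'splogtrack-' plus 12 hex digits."""
    return "splogtrack-" + secrets.token_hex(6)
```

Generating the tracking ID once per splog, rather than per request, makes it easier to find every page that copy ends up on.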

As a bonus, you can also set it to email you when someone accesses the poisoned feed.
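One way to do that notification is sketched below, using Python’s standard `email` and `smtplib` modules. It assumes an SMTP server is reachable (here `localhost` by default), and the function names and addresses are hypothetical.

```python
import smtplib
from email.message import EmailMessage

def poisoned_feed_alert(remote_addr, user_agent, to_addr, from_addr):
    """Build a notification email describing one fetch of the poisoned feed."""
    msg = EmailMessage()
    msg["Subject"] = f"Poisoned feed fetched by {remote_addr}"
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg.set_content(f"IP address: {remote_addr}\nUser-Agent: {user_agent}\n")
    return msg

def send_alert(msg, smtp_host="localhost"):
    """Send `msg` through an SMTP server assumed to be running on `smtp_host`."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

Keeping message construction separate from sending makes it easy to check the message without a mail server, or to swap in a different delivery method.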

Step 4: Intercept the splog request

The simplest way to intercept the splog’s requests for your RSS feed and divert them to the poisoned RSS feed is to put the following at the top of your .htaccess file:

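<IfModule mod_rewrite.c>
RewriteEngine On
# Only match requests coming from the splog's IP address
RewriteCond %{REMOTE_ADDR} ^123\.123\.123\.123$
# Leave requests for the poisoned feed itself alone so the rule cannot loop
RewriteCond %{REQUEST_URI} !^/poisonedrss\.php$
# Serve the poisoned feed instead, and stop processing further rules
RewriteRule ^(.*)$ http://www.mysite.com/poisonedrss.php [L]
</IfModule>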

Again, 123.123.123.123 stands for the splog’s IP address.
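The `RewriteCond` pattern is a regular expression, which is why the dots are escaped and the address is anchored with `^` and `$`. The quick Python check below shows what goes wrong without them. Python’s `re` behaves like Apache’s PCRE for a pattern this simple, and `apache_ip_pattern` is a hypothetical helper.

```python
import re

def apache_ip_pattern(ip):
    r"""Build an anchored, escaped pattern such as ^123\.123\.123\.123$."""
    # re.escape backslash-escapes the dots; ^ and $ forbid extra characters.
    return "^" + re.escape(ip) + "$"

LOOSE = "123.123.123.123"                      # each "." matches any character
STRICT = apache_ip_pattern("123.123.123.123")

for addr in ["123.123.123.123", "1123.123.123.1234", "123a123b123c123"]:
    print(addr, "loose:", bool(re.search(LOOSE, addr)),
          "strict:", bool(re.search(STRICT, addr)))
```

Only the exact address matches the strict pattern, while the loose one also catches the two look-alikes.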

Step 5: Sit back and wait

You can now sit back and wait until the splog requests your content again, at which point it will be directed to your poisoned feed and will ingest the poisoned content.

What will happen is that the splog will take each of the items in the feed and convert them into individual pages. Your poisoned content will get ingested into their pages, where, if they are not escaping characters correctly, the JavaScript and other code will be executed when an end user visits the page.
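Whether markup in a feed item ends up “live” depends entirely on how the splog renders it. The minimal Python sketch below shows the difference, using a harmless `<b>` tag as stand-in markup. The two render functions are hypothetical and are not any particular splog’s code.

```python
import html

def render_item_unsafely(description):
    # Copies feed text straight into the page: any markup in it becomes live HTML.
    return f"<div class='post'>{description}</div>"

def render_item_safely(description):
    # html.escape converts &, <, > and quotes to entities, so markup shows as text.
    return f"<div class='post'>{html.escape(description)}</div>"

sample = "<b>Republished without permission</b>"
print(render_item_unsafely(sample))
print(render_item_safely(sample))
```

A splog that escapes its input like `render_item_safely` would only display the tags as text, so the content warnings would still appear, but any code would not run.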


Back in October 2010 the BBC announced that BBC Backstage - the developer platform and open data project I had created with Tom Loosemore and James Boardwell back in 2004 - would be closing at the end of the year.

It was sad news, but one that was both expected and appropriate. The project set out to do big things:

  • introduce a large and bureaucratic media organization to the concepts of open data,
  • share that data with 3rd-party developers in order to let them find new and experimental uses for it,
  • foster internal and external innovation practices that were new, chaotic and sometimes challenging to an old incumbent.

But I think it’s fair to say that, on the whole, the project met its goals and expectations.

As a by-product, I think BBC Backstage and the community that formed around it also helped kick-start the London startup community we have today. What was then called “The London New Media Scene”, primarily because of the agency-oriented slant of the London industry at the time, influenced a generation of non-commercial hackers and NTK subscribers to become entrepreneurial and start building startups.

With BBC Backstage winding up, the BBC has produced a wonderful retrospective, “Hacking the BBC”, which I had the honour of being interviewed for. You can download a copy here (PDF).

The closure of BBC Backstage is certainly a sad day for me, but at the same time I’m confident that it was time to do it. The challenge for the BBC is maintaining the concept of open data and external innovation – and weaving it through the entire fabric of the organization. They claim that is something that is happening, and I think there are good people there championing the notion – but I think the BBC still has some way to go before that box can be really ticked.

You can read Jemima Kiss’s coverage on the Guardian’s website or you can check out a few photo memories I have of the project:

A very flush-faced looking me launching the project at OpenTech 2005 (photo by Natalie Downe)

The BBC Backstage Team winning a New Statesman Award for innovation, 2006

and of course, cheekily snapping Tom Loosemore in a suit:
