One way not to deal with the problem is to remove your RSS feeds altogether – which, it is rumored, local blog network Gothamist (home of SFist) is considering doing in order to concentrate on the distribution of their proprietary content apps instead. I’m confident that is an extremely flawed strategy, but I digress.
My girlfriend Violet Blue runs a highly successful blog, tinynibbles.com (warning: content very NSFW), which suffers immensely from splogs republishing her content without permission. As I look after her server and the technical operations for her empire of sites, I decided to see if I could help solve this problem in a different way.
What I am about to go through is a tutorial on how you can really try to hurt someone who is leeching your RSS feed – to the extent that it damages and potentially destroys their splog operation. I am not a lawyer but I do not believe any of what I am about to go through is illegal – although I’ll admit that it is naughty.
In a nutshell…
…what we are going to do is intercept the requests from the target’s server for our RSS feed and divert them to a ‘poisoned’ RSS feed that contains both content warnings and JavaScript that, when rendered on their website, will take over their page, rendering their site and advertising useless for anyone who comes to visit them. If you wanted to go further, you could also use this method to try to execute shell commands on their server, although at this point things become legally murky and ethically questionable.
This tutorial assumes you have some basic site admin skills, can access your logs and can edit a .htaccess file.
So here goes…
Step 1: Identify your target
Chances are you’ve discovered someone republishing your content via a Google search or a trackback from the splog to your site. The first thing to do is to get the IP address of the site. Most splogs request your feed from the same server that serves their webpages, which makes them easy to identify when they come to your site to pull down your RSS feed. I’m going to assume that my target has the IP address 123.123.123.123.
Step 2: Search your logs
Search your logs for any access to your site by this IP address. You might want to try:
$ grep "123.123.123.123" /var/log/access_log
where 123.123.123.123 is the IP address of the splog and /var/log/access_log is the path and filename of your web server’s access logs.
Hopefully you will have found some matches:
123.123.123.123 - - [16/Jan/2011:14:03:51 -0500] "GET /feed HTTP/1.1" 200 - "http://www.mysite.com/feed" "Mozilla/4.8 [en] (Windows NT 6.0; U) (880701279)"
123.123.123.123 - - [16/Jan/2011:15:57:13 -0500] "GET /feed HTTP/1.1" 200 - "http://www.mysite.com/feed" "Mozilla/4.8 [en] (Windows NT 6.0; U) (1416539927)"
123.123.123.123 - - [16/Jan/2011:20:31:40 -0500] "GET /feed HTTP/1.1" 200 - "http://www.mysite.com/feed" "Mozilla/4.8 [en] (Windows NT 6.0; U) (686799288)"
123.123.123.123 - - [16/Jan/2011:23:52:38 -0500] "GET /feed HTTP/1.1" 200 - "http://www.mysite.com/feed" "Mozilla/4.8 [en] (Windows NT 6.0; U) (2099013304)"
123.123.123.123 - - [17/Jan/2011:02:26:34 -0500] "GET /feed HTTP/1.1" 200 - "http://www.mysite.com/feed" "Mozilla/4.8 [en] (Windows NT 6.0; U) (1475562814)"
It’s worth pointing out this will not work if you link your RSS feeds directly to a 3rd party site like Feedburner, because the request from the splog never reaches your server. In that case, sadly, there is little you can do, as Google (Feedburner’s parent company) does not give you control to serve different content to arbitrary IP addresses. If you want to use a service like Feedburner, consider publicly offering an RSS URL on your own server that 302 redirects to Feedburner – achieving the same result while maintaining control of requests.
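If you are not sure which IP address is hitting your feed, or want to see how often it comes back, the log search in this step can also be scripted. The following is a minimal sketch, assuming your access log uses the common/combined log format shown in the matches above and that your feed lives at /feed; the function name feed_hits is my own invention for illustration.

```python
import re
from collections import Counter

# Start of a common/combined log format line:
# client IP, identity, user, [timestamp], "METHOD path protocol"
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*"')


def feed_hits(lines, feed_path="/feed"):
    """Count requests per client IP for the given feed path.

    feed_path is an assumption; change it to wherever your feed lives.
    """
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        ip, _timestamp, _method, path = match.groups()
        if path == feed_path:
            hits[ip] += 1
    return hits
```

You could pass it an open file, for example feed_hits(open("/var/log/access_log")), and then call .most_common(10) on the result to list the ten busiest feed readers. An IP address that polls every few hours and matches the splog’s web server is your target.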
Step 3: Build the poisoned RSS feed
We are going to create a separate RSS feed that we will redirect the splog’s requests to. If they are creating a new page/blog post for every item in your feed, our new poisoned RSS feed will force their server to generate pages containing what we want to say.
At this point you need to decide how far you want to take things:
- Display a content warning explaining that they are reproducing your content without permission and you are unhappy about it
- Display images from TubGirl and other Shock Sites
- Hijack their page’s DOM and redisplay the page. Anyone accessing their site will only see your content, with all adverts and other links removed.
- Attempt to run commands on their server – eg attempt to delete files, elevate user permissions, purge the database, etc.
For my situation I decided to go for the first 3.
To create the poisoned RSS feed, you could save out your own current RSS feed and use that as a template. Replace the obvious text in each item with what you would like to say and save it back to your server. Alternatively you could just use my poisoned PHP script on Github.
My script will make their request’s IP address and other HTTP details appear at the footer of each page, along with a tracking string so you can search in Google for any other places they are publishing to. It will also try to inject JavaScript that will manipulate the DOM so that when they or anyone else visits their site only your message will appear. Finally, the script outputs 10 identical items, each with a random GUID, so that the splog thinks every item is new and creates more pages each time it revisits.
As a bonus you can also set it to email you when someone accesses the poisoned feed.
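To make the random GUID trick concrete, here is a minimal sketch of the first option only: a feed of identical content-warning items, each with a fresh GUID. It is not the PHP script mentioned above and it leaves out the JavaScript entirely. The function name build_notice_feed and all of its parameters are hypothetical names chosen for illustration.

```python
import uuid
from xml.sax.saxutils import escape


def build_notice_feed(site_url, notice, requester_ip, tracking_tag, item_count=10):
    """Return an RSS 2.0 document of identical notice items with unique GUIDs."""
    # Footer details: who asked for the feed, plus a string you can search for later.
    body = f"{notice} (Requested from {requester_ip}; ref {tracking_tag})"
    items = []
    for _ in range(item_count):
        guid = uuid.uuid4()  # fresh GUID so an aggregator treats the item as new
        items.append(
            "<item>"
            f"<title>{escape(notice)}</title>"
            f"<link>{escape(site_url)}</link>"
            f'<guid isPermaLink="false">{guid}</guid>'
            f"<description>{escape(body)}</description>"
            "</item>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<rss version="2.0"><channel>'
        f"<title>{escape(notice)}</title>"
        f"<link>{escape(site_url)}</link>"
        f"<description>{escape(notice)}</description>"
        + "".join(items)
        + "</channel></rss>"
    )
```

Because the GUIDs change on every call, a splog that deduplicates by GUID will see ten "new" items on each visit. Pick a tracking string that is unlikely to appear anywhere else so that a web search for it turns up only republished copies.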
Step 4: Intercept the splog request
The simplest way to intercept the splog’s requests for your RSS feed and divert them to the poisoned RSS feed is to put the following at the top of your .htaccess file:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^123\.123\.123\.123$
RewriteRule ^(.*)$ http://www.mysite.com/poisonedrss.php
</IfModule>
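Again, where 123.123.123.123 is the splog’s IP address. Note that the pattern ^(.*)$ matches every path, so this rule diverts any request from that IP address, not just requests for the feed.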
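The backslashes and the ^ and $ anchors in the RewriteCond matter: an unescaped dot matches any character, and without the anchors the pattern would also match longer addresses that merely contain the target. As a rough illustration, here is the same pattern built and tried out in Python. Python’s regex engine stands in for Apache’s here, which is an assumption that holds for a pattern this simple, and ip_condition is a hypothetical helper name.

```python
import re


def ip_condition(ip):
    """Build an anchored, dot-escaped pattern that matches exactly one IPv4 address."""
    return re.compile("^" + re.escape(ip) + "$")


strict = ip_condition("123.123.123.123")      # same pattern as in the RewriteCond
loose = re.compile("123.123.123.123")         # unescaped and unanchored, for comparison

print(strict.pattern)
print(bool(strict.search("123.123.123.123")))    # the target itself
print(bool(strict.search("123.123.123.1234")))   # rejected by the $ anchor
print(bool(loose.search("123.123.123.1234")))    # the loose pattern lets this through
```

If the splog uses several addresses, one RewriteCond per address joined with the [OR] flag keeps each pattern this strict.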
Step 5: Sit back and wait
You can now sit back and wait until the splog requests your content again, at which point it will be directed to your poisoned feed and will ingest the poisoned content.
The splog will take each of the items in the feed and convert them into individual pages. Your poisoned content ends up in those pages, and if the splog does not escape it properly, any JavaScript or other markup it contains will be executed in the browser when an end-user visits the page.
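Whether this works depends entirely on how the splog writes feed content into its pages. The sketch below shows the difference escaping makes, using a harmless placeholder script. The render_item function is a hypothetical stand-in for whatever templating the splog uses, not code from any real aggregator.

```python
import html


def render_item(description, escape_content):
    """Wrap a feed item's description in page markup, optionally escaping it first."""
    text = html.escape(description) if escape_content else description
    return f'<div class="post">{text}</div>'


item = "<script>alert(1)</script>"

# Inserted as-is: the browser sees a real script element and runs it.
print(render_item(item, escape_content=False))
# prints: <div class="post"><script>alert(1)</script></div>

# Escaped: the browser shows the text literally and nothing executes.
print(render_item(item, escape_content=True))
# prints: <div class="post">&lt;script&gt;alert(1)&lt;/script&gt;</div>
```

A splog that escapes its input will simply publish your message as visible text, which still gets the content warning in front of its visitors.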