Ben Metcalfe Blog

(Ben Metcalfe is a founding member of the DataPortability Workgroup – which promotes and encourages the implementation of open-standards and open-access to data using technologies such as OpenID, OAuth, Microformats, APML and more)

Over at DataPortability.org, we’ve been sitting on some BIG news for the past few days that I can finally blog about…

Google, Facebook and Plaxo have joined the Data Portability Workgroup.

It’s a massive and exciting breakthrough that we’re thrilled about. Data Portability is about true interoperability and data exchange (both between social networks and other apps we use). It’s breathtaking to see these companies sign up and align themselves with that ideal.

I’m also stoked to have some amazing people represent each of these companies on the group. Joseph Smarr will represent Plaxo (who I also work with on the OpenSocial committee), Brad Fitzpatrick will represent Google (a major coup seeing as he helped create OpenID) and Benjamin Ling will represent Facebook (Benjamin is also ex-Google).

I’m on-site at MySpace today so can’t blog further reaction right now, but reaction can be found from Marshall Kirkpatrick at Read/WriteWeb and Duncan Riley at TechCrunch.

You can also join the public Google Group for Data Portability.

(disclosure: I am currently helping MySpace – a Facebook competitor – with their platform strategy)

Around this time last year Facebook released a feature on profile pages called “News Feeds”, which allowed a user to see all of the updates and interactions their friends were doing with the site.

Esteemed industry observers such as danah boyd (whose work I very much admire) immediately took issue with it. In her essay ‘Facebook’s “Privacy Trainwreck”: Exposure, Invasion, and Drama’ danah concluded:

“Facebook says that the News Feed is here to say. This makes me sad. I understand why they want to provide it, i understand what users are tempted by it. But i also think that it is unhealthy, socially disruptive, and far worse for the users than the lurking employers ready to strike down upon thee with great vengeance for the mere presence of a red plastic cup.

Facebook lost some of its innocence this week. Even when things return to “normal,” a scar will persist. Yet, the question remains: what will the long-term social effects of this “privacy trainwreck” be?”

Well, that ‘News Feed’ feature pretty much became the cornerstone of what we’ve all grown to love as the ‘social graph’. It’s the textbook example of how we view and interact with our social graphs and derive value from them.

It’s actually fair to say that danah’s views on the lost privacy from this feature were mainly accurate, and yet it’s a feature most of us enjoy using and consider positive and useful. Whole industries are forming to capitalize on making the most out of the data created with what was once called ‘a trainwreck’.

What’s so different about Beacon?

And so it is with the above example in mind that I am very curious about the pushback Facebook’s Beacon has received.

It’s been very interesting to read the reaction to it – mainly that people feel it is an invasion of privacy, especially online activist group MoveOn.org who described it as a “glaring violation of (Facebook’s) users’ privacy” (sounds familiar).

How is this any different to Facebook’s “News Feeds” feature (ie Social Graph)? In fact the social objects being modeled by Beacon are inanimate retail items rather than other people, as per “News Feeds”. With the “News Feed” feature both the friended and the profile owner potentially had their privacy lost. With Beacon it’s just the profile owner. Britney Spears’ latest album is not going to be embarrassed if the world (or just my friend circle) learns I just bought it.

The one thing which I do dislike is the way in which the data is collected, especially that it is off-site via remote javascript loads. But once the data is on my Facebook profile I find it hard to argue that it is any less deserving to be exposed (within my circle of friends) than who I just friended or which photos I just uploaded, etc.

And if the social graph has let me discover cool, relevant, new applications to add to my profile based on the apps my friends are adding then why can’t it let me discover new purchases I might like to buy based on what my friends are buying?

Facebook is a commercial company that needs to generate revenue, and in many ways this seems far more useful than plain-old blanket advertising coverage. I too want to dislike Beacon, yet the logic required seems difficult to muster.

The one thing that I would like to see from Facebook is a way to export this purchasing data, in the same way that I would like Facebook to be more open with the social graph data too. It’s my data after all. APML would be a great addition here.

I’m pleased to announce that the NewsGator family of products is going to support APML!

More on Read/WriteWeb. Welcome NewsGator to the workgroup!

Hot on the heels of my in-depth post addressing Tom Morris’ issues with APML, I’ve been meaning to ‘back up’ and write a higher level “introduction to APML”.

Well, Marjolein Hoekstra has written a superb introduction to APML and, I guess, done the hard work for me. :) Turns out Marshall Kirkpatrick has picked this up and linked to it from his article on recommendation engines over on Read/WriteWeb too.

If you’re still not up to speed on APML and attention profiles in general, please hop on over to her blog.

Also the APML.org website has just had a redesign + refresh too! Check it out.

(Disclosure: I am a member of the non-profit APML Workgroup, which facilitates the development of the APML specification)

It’s been great to see the momentum and interest build around APML – Attention Profiling Mark-up Language (if you don’t know what it is, check out the wiki).

Tom Morris has an interesting take on the space, primarily in response to a post by friend and former colleague (and current captain of the good ship backstage.bbc) Ian Forrester on his concept of an “APML Lite” (not currently connected with the APML Workgroup).

In his post, Tom raises a number of issues and concerns around the attention markup space – which I feel would be useful to address. However, just for the sake of those not completely across what APML is trying to do, let me define two key points:

  • Attention is the term given to the entire scope of what you consume and ‘pay’ interest to – be it websites, books, songs, etc.
  • Attention Profile is a metadata payload of that attention, in the form of keywords (or themes) and weightings, which help score how much attention you pay a given keyword. The idea is that a system tracking your attention could generate such a profile which could then be easily ported to another application and processed accordingly. (APML is a proposed XML format for this payload)
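To make the second definition concrete, here is a minimal Python sketch of an attention profile as a set of keywords with weightings, written out as a small XML payload. The element and attribute names (`AttentionProfile`, `Concept`, `key`, `value`) and the `tracker.example` source are illustrative assumptions only; they are not taken from the APML specification.

```python
import xml.etree.ElementTree as ET

def profile_to_xml(profile, source):
    """Serialize a {keyword: weight} mapping to a small XML payload.

    The element and attribute names are illustrative assumptions,
    not the official APML schema.
    """
    root = ET.Element("AttentionProfile", {"source": source})
    # Write the highest-weighted keywords first.
    for keyword, weight in sorted(profile.items(), key=lambda kv: -kv[1]):
        ET.SubElement(root, "Concept", {"key": keyword, "value": f"{weight:.2f}"})
    return ET.tostring(root, encoding="unicode")

# Hypothetical profile: keywords with weightings between 0 and 1.
profile = {"football": 0.9, "london": 0.6, "jazz": 0.25}
payload = profile_to_xml(profile, source="tracker.example")
print(payload)
```

The point of the sketch is the size of the payload: three keywords and three numbers, rather than the full history of everything that produced them.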

Ok, so Tom starts out with the fundamental question about the validity of Attention Profiles:

“The problem I see is that I am not sure what the point is of attention formats. I can see the point of attention, sure. That’s easy. But for me, attention is a set of algorithms which sit above the data layer. When building applications, you try hard to separate out the business process from the database.”

I’m going to assume Tom means to ask “what the point is of attention profile formats?” as the purpose of complete attention formats is to distribute entire attention payloads across systems (which he advocates throughout his post and implies straight off the bat by mentioning the concept of separating business process from database).

“…attention is a set of algorithms which sit above the data layer.”

Well, as mentioned above, technically attention is not the algorithms that sit above the data – it is the data itself. And that data tends to be heavy (imagine a file listing every website you ever visited or every song you ever listened to, each time it was played).

The primary purpose of attention profile formats is to empower the end-user with something of value that they can easily move around the ecosystem. Something that isn’t unmanageably huge.

APML is a way to reflect the product of the very algorithms he mentions. For example, different attention keepers who you allow to track your attention could come to very different conclusions about your attention interests based on the same data. Attention tracker #1 could conclude that you like “football” and “London”, attention tracker #2 could conclude from the same dataset that you actually like “Arsenal” (a specific football team) and “Islington” (a specific region of London).

And don’t forget that the granularity in this regard lies not just in the keyword itself but in the weighting too.

Now, assuming that attention tracker #2 has produced a better and more accurate profile for you, APML gives you the opportunity to export that higher-value profile elsewhere. If you had to export the entire dataset to another system you could end up with the new system using a similar inferior algorithm to attention tracker #1 and you would be stuck with a crappy profile and perhaps crappy recommendations.
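The two-tracker scenario can be sketched in a few lines of Python. Everything here is hypothetical: the visit log, the `example` domains, and the two lookup tables standing in for each tracker's "algorithm". Real trackers would be far more involved; the sketch only shows that the same raw data can yield different keywords, each with its own weighting.

```python
from collections import Counter

# Hypothetical raw attention log: one entry per page view.
visits = [
    "arsenal.example/fixtures",
    "arsenal.example/news",
    "arsenal.example/tickets",
    "islington.example/events",
    "islington.example/restaurants",
]

# Tracker #1 only knows broad categories; tracker #2 keeps the specific topic.
BROAD = {"arsenal.example": "football", "islington.example": "London"}
SPECIFIC = {"arsenal.example": "Arsenal", "islington.example": "Islington"}

def build_profile(visits, topic_for_site):
    """Count visits per topic and scale so the top topic has weight 1.0."""
    counts = Counter(topic_for_site[url.split("/")[0]] for url in visits)
    top = max(counts.values())
    return {topic: round(n / top, 2) for topic, n in counts.items()}

profile_1 = build_profile(visits, BROAD)
profile_2 = build_profile(visits, SPECIFIC)
print(profile_1)
print(profile_2)
```

Either dictionary is small enough to hand to another service, whereas the `visits` list grows with every page view.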

Tom questions this concept:

“A different attention tracker is meant to trust this, even though the process that is used to calculate it may as well have been Mystic Meg’s bloody tarot cards.”

Well, making reference to the example above, in terms of generating the profile it’s up to the user to pick and choose which services they feel produce the best quality results for them – just like you have to decide whether Google or MSN Search give you better search results. However if a user has decided that a given exported profile is accurate then, yes, a recipient attention tracker is meant to trust this file – after all it’s been given the user’s seal of approval.

Obviously APML is just a proposed format, and agnostic as to whether one provider is better than another, but it’s not unreasonable to assume that the user would know whether they’re going to be exporting a good profile or not – a service should be showing their profile in the primary interface and also making accurate recommendations. And if Mystic Meg ever produces an attention service and a user wants to export a profile from her then why should they not be able to do so (no matter how poor it might be)? There’s the wider, more common, issue here about the user’s right to data portability from silos.

“We can own our attention data all we like, but we need open attention algorithms too, if we want to do anything truly useful with it.”

I’m a proponent of open-source and open-data, and to a fair degree that extends to algorithms. But I’d have to disagree that attention data is only ‘truly useful’ if the algorithms that process that data are ‘open’. For a start, some of the most useful algorithms around – such as Google’s search algorithm – are anything but open yet highly useful.

But crucially, another key use of APML, as mentioned above, is to programmatically reflect the product of these algorithms – which gives you the benefit of them in an environment where the vendor maintains a proprietary secret sauce algorithm. The philosophical debate as to whether vendors should maintain secret sauce/proprietary anything is beyond the scope of this document, and frankly a notion we all have to work around regardless of whether we agree with it or not. So APML actually helps you when you are dealing with an ecosystem of proprietary algorithms.

Collaborative filtering vs keywords

All of this may, however, be missing Tom’s fundamental question – and that is the keyword approach.

“The problem with hitching data formats to specific use cases is that nobody knows what the use cases will be.”

He’s right, APML is assuming that the ingesting attention engine is going to be keyword based – but that’s because keywords are becoming a pretty common currency for attention profile data. I would beg to differ that we don’t know what the use cases are. Just thinking about the projects I am personally involved in, I am advising Orange on a personalized homepage and recommendation service which makes heavy use of keywords as part of its unique selling proposition. I’ve been involved in, and aware of, a fair degree of keyword-orientated work at the BBC too.
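As a rough illustration of what a keyword-based ingesting engine might do with an imported profile, the Python sketch below ranks tagged items by the summed weight of the keywords they share with the profile. The scoring rule, the item list and the function names are assumptions made for the example; they do not describe any of the services mentioned above.

```python
def score(item_keywords, profile):
    """Sum the profile weights of the keywords an item is tagged with."""
    return sum(profile.get(k, 0.0) for k in item_keywords)

def recommend(items, profile, limit=2):
    """Return titles of items that match the profile, best match first."""
    scored = [(score(item["keywords"], profile), item["title"]) for item in items]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in matches[:limit]]

# Hypothetical imported profile and a hypothetical catalogue of tagged items.
profile = {"Arsenal": 1.0, "Islington": 0.7, "jazz": 0.2}
items = [
    {"title": "Transfer window round-up", "keywords": ["Arsenal", "football"]},
    {"title": "New venue opens in Islington", "keywords": ["Islington", "jazz"]},
    {"title": "Gardening tips for spring", "keywords": ["gardening"]},
]
print(recommend(items, profile))
```

Note that the engine never sees the raw browsing history; the keywords and weightings are all it needs.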

“Ideally, an attention engine would be able to pull in data like who I’m talking to, what products I’ve bought on sites like Amazon, what music I’m listening to, who and when I add people to social networking services, and then make rules-based guesses as to how to direct my attention to further my goals.”

“… in RDF, we have a way to represent all the data in a format that could quite feasibly scale up. Through GRDDL, XSLT and microformats, we have a relatively straight-forward process to move data in. What we get for very little work is the potential of a relational database where all the relationships are URLs.”

From these two quotes I get the impression Tom is orientating his thoughts and aspirations around a different attention recommendation model – perhaps something like collaborative filtering (“people who bought book x also bought book y and book z”, “people who visited link a also visited link b and link x”, etc). To be fair, this is also yet-another, albeit different, use case and so if Tom won’t be drawn on any I’m slightly at a loss as to how this one is any more valid than any other.
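For contrast with the keyword model, here is a minimal Python sketch of the “people who bought book x also bought…” idea, using simple co-occurrence counts. The purchase histories are invented, and real collaborative filtering systems use considerably more sophisticated techniques; what matters here is that the input is everyone's raw history rather than a per-user keyword profile.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories, one set of items per person.
baskets = [
    {"book x", "book y", "book z"},
    {"book x", "book y"},
    {"book x", "book z"},
    {"book y", "book w"},
]

def also_bought(baskets):
    """Count, for every item, how often each other item appears alongside it."""
    together = {}
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            together.setdefault(a, Counter())[b] += 1
            together.setdefault(b, Counter())[a] += 1
    return together

pairs = also_bought(baskets)
# Items most often bought together with "book x".
print(pairs["book x"].most_common(2))
```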

However, I do have some thoughts on this.

Firstly, there is already a specification for exporting entire raw attention datasets of urls – Technorati’s attention.xml. The possibility to do a fair chunk of what Tom is advocating has already been around (with his proposed ‘full on data’ approach) for some time. And it’s fair to say no one has really done anything with it. From talking to various people involved with the specification, I think it’s fair to say that Technorati have moved on from it.

(In fact, their consumer proposition these days is about keywords, funnily enough.)

One of the aspirations, I believe, of the APML Workgroup is to produce something that is ready to be implemented in the consumer space rather than build specifications and formats for the sake of computer science.

Keep everything, including the kitchen sink

In many ways what Tom suggests is the ‘keep everything-and-the-kitchen-sink model’, the lossless model where nothing is lost or left behind – and I think his primary beef is actually not with APML but with the notion that a ‘lossy’ keyword model is a good (or at least valid) model in the attention space.

Only time will tell which is more successful, but so far there are no successful consumer-orientated implementations of attention.xml or anything like what he is describing. And I question whether consumer-orientated services will need a user’s entire raw attention data to give them an accurate recommendation.

There are more complicated debates, too, like traversing objects – deciding that I like “Arsenal” as an attention concept from my urls and then recommending me books or friends in a social network with similar interest – you can’t do that accurately with the kitchen-sink model (unless you convert to keywords, and then you have profiles and thus APML…)
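One way to picture this traversal: once everything is expressed as keywords and weightings, a single profile can be compared against any kind of object that also has keywords, whether a book or another person. The Python sketch below uses cosine similarity for that comparison, which is an assumption chosen for illustration (nothing above prescribes a particular measure), and all of the profiles are made up.

```python
import math

def similarity(a, b):
    """Cosine similarity between two {keyword: weight} profiles."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical profile derived from my urls.
me = {"Arsenal": 1.0, "Islington": 0.5}

# Hypothetical keyword profiles for other kinds of object.
book = {"Arsenal": 1.0, "football history": 0.5}
fellow_fan = {"Arsenal": 0.8, "Islington": 0.4}
stranger = {"gardening": 1.0}

for name, other in [("book", book), ("fellow_fan", fellow_fan), ("stranger", stranger)]:
    print(name, round(similarity(me, other), 2))
```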

It’s early days for APML, but already I can see many examples where such an approach has a far more likely chance of adoption and it is for that reason I am supporting APML.