about semantic web, software architecture and life in general

Category: Social Networks

2007-10-04

FOAF for Social Network Portability

When Dopplr announced that it could import your social network from other sites, the single most requested feature in the comments was FOAF import (with CSV import second).

Note: Please add your feedback. Comments are working, but may be held for moderation and will appear after being approved.

So - what’s the use of FOAF for Social Network Portability?

1. Existing FOAF data

LiveJournal is one of the largest sources of FOAF data: every user automatically gets a FOAF profile (http://username.livejournal.com/data/foaf/; the same works for communities). This produces a large amount of data that other applications can make use of. Exporting FOAF is a simple task, and other community sites may start exporting it should they see a good reason to do so (which is what this article aims to address).

You don’t have to look at the “raw” data, though. Let a computer do the work: one of the many RDF browsers can render FOAF in a more human-friendly way, and you can navigate the data by following the links marked with arrows. This shows you what data is in there, though it can still be quite nerdy.

In the end, it will be other applications and sites like Dopplr that use FOAF and other structured data to provide better services to their users. For example, Apple Safari’s RSS reader uses FOAF to display a list of a person’s friends when browsing her LiveJournal feed (a screenshot here).

P.S. If you want to know which web pages contain machine-readable FOAF, SIOC or DOAP information while you browse the web, the Semantic Radar extension for Firefox may be for you.

2. What about hCard + XFN?

Many may think that vCard or hCard + XFN is the only or the best choice for this. They are a good way to represent structured data, but they are only one option, and not always the best one.

Most of the entries in a list of services with hCard + XFN-supporting friends lists start with “login and …”.

A public hCard + XFN page usually does not contain enough data to identify your friends. It would need to provide information for linking their identities across sites, such as an email address. hCard expresses the address in clear text: spammers will like that, but your friends may not.

You could require the user to enter a login name and password to access a private profile with all the required details. (The alternative is to use only public information, but that is mostly limited to exact nickname matches.) Will you always trust a site enough to give it your passwords for every other site you want to port your data from? Really?

Also, once the sites are talking to each other directly, they don’t necessarily need an HTML-based data format and can just as well use the APIs these sites provide.

3. Useful properties of FOAF

FOAF has some useful properties that make it particularly interesting for social network portability.

FOAF was created with the identification of objects in mind. It is built on RDF, a generic format for linked data on the web, which gives us some flexibility in how pieces of data can be distributed across the web. We’ll use that later. (But enough about RDF.)

Take a look at any LiveJournal FOAF profile (or a human-readable rendering of it). You will notice some basic information about the page (title, feed, …) and quite a lot of information about its author. This includes core FOAF properties (name, homepage, birth date, image, interests, …) and LiveJournal’s extensions to it (city, country, school, bio, …). foaf:knows links people to their friends.

These vocabularies may use different namespaces, but FOAF and RDF allow them to be used together freely while still making sense of the combined data. What matters is that this is rich data that applications can work with.

4. Identifying objects in FOAF

We need a way to identify people (your friends) from a public FOAF profile. While FOAF can express an email address directly (foaf:mbox), we don’t want to publish it on a public web page. Take a look at foaf:mbox_sha1sum instead.

It is an effectively unique hash of the person’s email address, computed with the SHA1 algorithm over the mailbox URI (the address with a mailto: prefix). Computing the hash from an email address is easy, but the hash cannot be reversed to recreate the email address.
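As a minimal sketch of how such a hash can be computed, assuming Python’s standard hashlib module: the FOAF specification defines foaf:mbox_sha1sum over the mailbox URI, so the mailto: prefix is part of the hashed string. The helper name mbox_sha1sum() is our own; other normalisation (trimming, lower-casing) varies between implementations, which matters when comparing hashes produced by different sites.

```python
import hashlib


def mbox_sha1sum(email: str) -> str:
    """Return a foaf:mbox_sha1sum value for an email address.

    The SHA1 is taken over the mailto: URI, not the bare address.
    Only surrounding whitespace is stripped; no lower-casing is applied.
    """
    mbox = email.strip()
    if not mbox.startswith("mailto:"):
        mbox = "mailto:" + mbox
    return hashlib.sha1(mbox.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # 40 hex characters; publishing this does not reveal the address itself.
    print(mbox_sha1sum("alice@example.org"))
```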

This makes foaf:mbox_sha1sum very useful when you need to identify a person by her email address, but do not want to put this email on the web in clear text. Many LiveJournal profiles (but not all) contain a foaf:mbox_sha1sum. This information can be used by other sites to show you if your friends are already registered there. All they have to do is compare email SHA1 hashes of registered users (which can be easily calculated) with those of your friends.
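The comparison step could look like the following minimal sketch. The registered dict (username to email) and the list of friends’ hashes are hypothetical stand-ins for a site’s user table and the values read from a FOAF profile. Note that the site only hashes addresses it already knows; your friends’ addresses are never revealed to it.

```python
import hashlib
from typing import Dict, Iterable, List


def mbox_sha1sum(email: str) -> str:
    # SHA1 of the mailto: URI, as foaf:mbox_sha1sum is defined.
    mbox = email.strip()
    if not mbox.startswith("mailto:"):
        mbox = "mailto:" + mbox
    return hashlib.sha1(mbox.encode("utf-8")).hexdigest()


def find_registered_friends(registered: Dict[str, str],
                            friend_hashes: Iterable[str]) -> List[str]:
    """Return usernames whose email hash appears among the friends' hashes.

    `registered` maps username -> email address (hypothetical site data).
    """
    # Hash every registered user's address once, then look friends up.
    by_hash = {mbox_sha1sum(email): user for user, email in registered.items()}
    matches: List[str] = []
    for h in friend_hashes:
        user = by_hash.get(h.strip().lower())  # hex digests compared in lower case
        if user is not None and user not in matches:
            matches.append(user)
    return matches
```

Building the hash index once keeps the lookup cheap even when both the user table and the friends list are large.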

P.S. There are other ways to identify a person (e.g., via a homepage URL), but let’s concentrate on email and its hash now.

5. What’s stopping Dopplr?

If people are asking for FOAF import and the format is worth considering, what is stopping Dopplr and other social media and social network sites?

It would be best if they shared their insights into what the real problems are. While waiting for comments, here’s what I think: the issue is practical data access. It makes sense to start with the large sources of FOAF data and look at how they are structured.

LiveJournal is probably the largest of them. Notice that an LJ FOAF profile contains a lot of information about the person, but not much about their friends. Fear not: each reference to a friend contains a link (rdfs:seeAlso) to that friend’s full FOAF profile, which you can follow to retrieve all the details required.
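To make the structure concrete, here is a minimal sketch using only Python’s standard library that lists the friends referenced through foaf:knows together with their rdfs:seeAlso links. It assumes the simple nested RDF/XML layout of the hypothetical SAMPLE document below (the names and URLs are made up); real profiles may be serialised differently, and a proper RDF parser would be more robust.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"


def friends_from_foaf(xml_text: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (nick, seeAlso URL) for each foaf:knows/foaf:Person found."""
    root = ET.fromstring(xml_text)
    friends = []
    for knows in root.iter(f"{{{FOAF}}}knows"):
        person = knows.find(f"{{{FOAF}}}Person")
        if person is None:
            continue
        nick = person.findtext(f"{{{FOAF}}}nick")
        see_also = person.find(f"{{{RDFS}}}seeAlso")
        url = see_also.get(f"{{{RDF}}}resource") if see_also is not None else None
        friends.append((nick, url))
    return friends


# Hypothetical profile: one person who knows one friend.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:nick>alice</foaf:nick>
    <foaf:knows>
      <foaf:Person>
        <foaf:nick>bob</foaf:nick>
        <rdfs:seeAlso rdf:resource="http://bob.example.org/data/foaf"/>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>"""
```

Each returned URL is one more document to fetch before the friend’s details (such as the email hash) are known.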

And that is the problem. To check all your friends, a site needs to make n + 1 HTTP requests, where n is the number of your friends. LiveJournal’s policy for bots allows no more than 5 requests per second. Making that many requests takes time and bandwidth, which the sites involved in the migration may want to avoid.
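A rough sketch of what this costs, assuming the 5 requests per second limit mentioned above: with a hypothetical 299 friends, the 300 requests take about a minute at best. The fetch_throttled() helper is illustrative; the actual HTTP call is left to a caller-supplied fetch function so the pacing can be shown (and tested) without network access.

```python
import time
from typing import Callable, Iterable, List

# Per the post, LiveJournal asks bots to stay at or below 5 requests per second.
MAX_REQUESTS_PER_SECOND = 5


def min_crawl_seconds(num_friends: int, rate: float = MAX_REQUESTS_PER_SECOND) -> float:
    """Rough estimate of crawl time: 1 request for your own profile
    plus 1 per friend (n + 1 in total), at the given request rate."""
    return (num_friends + 1) / rate


def fetch_throttled(urls: Iterable[str],
                    fetch: Callable[[str], bytes],
                    rate: float = MAX_REQUESTS_PER_SECOND,
                    sleep: Callable[[float], None] = time.sleep) -> List[bytes]:
    """Fetch each URL in order, pausing 1/rate seconds between requests.

    `fetch` is a hypothetical caller-supplied function (e.g. a wrapper
    around an HTTP client); `sleep` is injectable for offline testing.
    """
    interval = 1.0 / rate
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(interval)  # fixed gap; ignores how long fetch() itself took
        results.append(fetch(url))
    return results
```

A fixed gap is the simplest policy that respects the limit; it errs on the slow side because the time spent inside fetch() is not subtracted.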

6. Solution

How to make LiveJournal FOAF data more useful?

Let’s simply take all the properties needed (foaf:mbox_sha1sum in this case) from friends’ FOAF profiles and copy them to where those friends are referenced in your FOAF profile. Remember the flexibility of FOAF and RDF? It makes this perfectly valid: just copy’n’paste.
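A minimal sketch of the proposed change, assuming Python’s xml.etree.ElementTree and a hypothetical helper knows_entry(): it emits a foaf:knows block in which the friend’s foaf:mbox_sha1sum sits next to the rdfs:seeAlso link, so a consumer can match friends without following the link. This only illustrates the idea; it is not LiveJournal’s actual output format.

```python
import hashlib
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"
# Use the familiar prefixes when serialising.
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("foaf", FOAF)):
    ET.register_namespace(prefix, uri)


def knows_entry(nick: str, see_also: str, sha1sum: str) -> ET.Element:
    """Build a foaf:knows block that carries the friend's mbox_sha1sum inline."""
    knows = ET.Element(f"{{{FOAF}}}knows")
    person = ET.SubElement(knows, f"{{{FOAF}}}Person")
    ET.SubElement(person, f"{{{FOAF}}}nick").text = nick
    # Copied from the friend's own profile, so no extra request is needed later.
    ET.SubElement(person, f"{{{FOAF}}}mbox_sha1sum").text = sha1sum
    # The link to the full profile stays, for consumers that want more detail.
    ET.SubElement(person, f"{{{RDFS}}}seeAlso", {f"{{{RDF}}}resource": see_also})
    return knows


if __name__ == "__main__":
    friend_hash = hashlib.sha1(b"mailto:bob@example.com").hexdigest()
    entry = knows_entry("bob", "http://bob.example.org/data/foaf", friend_hash)
    print(ET.tostring(entry, encoding="unicode"))
```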

This would require a simple change on LiveJournal’s side but would make using this data much more efficient: just 1 HTTP request would be needed to move a network of all your friends to a new site.

What about other FOAF data sources? Many already contain the information needed to identify their contacts. For example, every person in Tim Berners-Lee’s FOAF profile has an email address, its SHA1 hash, or a unique identifier (a URI, which is another way to identify objects) assigned. These sources are rich enough for our needs but may not provide the critical mass needed to get the ball rolling.
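Because profiles identify people in different ways (an email address, its hash, a URI or, as mentioned earlier, a homepage), a consumer has to compare whichever identifiers happen to be present. The sketch below assumes a hypothetical dict per person; foaf:mbox, foaf:mbox_sha1sum and foaf:homepage are inverse-functional properties in FOAF, so a shared value is taken as a sign of the same person. A plain address is hashed so it can be compared with published hashes.

```python
import hashlib
from typing import Dict, Set, Tuple


def identity_keys(person: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Collect comparable identifiers for one person record.

    `person` is a hypothetical dict with optional keys 'mbox',
    'mbox_sha1sum', 'uri' and 'homepage' taken from a FOAF profile.
    """
    keys = set()
    if person.get("mbox"):
        mbox = person["mbox"].strip()
        if not mbox.startswith("mailto:"):
            mbox = "mailto:" + mbox
        # Hash the address so it matches foaf:mbox_sha1sum values.
        keys.add(("sha1", hashlib.sha1(mbox.encode("utf-8")).hexdigest()))
    if person.get("mbox_sha1sum"):
        keys.add(("sha1", person["mbox_sha1sum"].strip().lower()))
    for field in ("uri", "homepage"):
        if person.get(field):
            keys.add((field, person[field].strip()))
    return keys


def same_person(a: Dict[str, str], b: Dict[str, str]) -> bool:
    # Two records are treated as the same person if any identifier is shared.
    return bool(identity_keys(a) & identity_keys(b))
```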

That’s why we need better FOAF data from large social media sites, and a clear understanding of how FOAF can best be used for social network portability.

Related links

“Thoughts on the Social Graph” by Brad Fitzpatrick

… will add suggested links here …

Comments

I would love to hear from you - is this information interesting or useful? Is it all wrong, perhaps? Do you want to add something or ask a question? Go ahead! :)

P.S. This is not an attack on microformats. Some of the things described (e.g., a hash of the email address) could easily be added to them if needed, and data can also be converted from one format to another. The goal of this article is to provide some information about using FOAF for social network portability. I hope you will find that it has some power we can use.

2007-10-01

23:17:28, Categories: Links, Software Development, Social Networks   English (EU)

TTL: Today's Top Links

More than 50% of “Second Life” inhabitants are from Europe

From a report by comScore, dated 4 May 2007, titled “comScore Finds that ‘Second Life’ Has a Rapidly Growing and Global Base of Active Residents”. Also by comScore: “Social Networking Goes Global”, which shows the growth of a number of the most popular social networking sites.

Linden Labs to European Union Residents: It’s a Tax Time

“Second Life” EU residents now have to pay VAT (~20%). That’s what many learned from direct emails sent to premium subscribers.

Gallery of Adobe remedies

eBooks are often distributed as PDFs with the bitter taste of DRM added. This means that an ebook you’ve bought may be locked to a particular device, and you most probably won’t be able to use it on Linux. Don’t we have the right to use the eBooks we purchased on any device we own? This gallery lists some remedies for Adobe PDF encryption and DRM.

If you can print a PDF to a PostScript file, these links may also help you convert it back to a DRM-free PDF (I did not succeed in applying these suggestions in my case, though; let me know if they work for you):
Anon: Adobe eBooks to PDF (anon-ebook-to-pdf.txt)
Making protected PS files distillable (convertable back to PDF) (200610_pdf_ps_hacking.html)


Programming Digital Media - Making and modifying digital media by writing custom software for Mac OS X

Educational materials. Found by stumbling upon An Introduction to PostScript, part of this course.

Scripting Tools for Scientific Computing

Course materials from Dept. of Informatics, Univ. of Oslo - Simula Research Laboratory - April 2003

Python module that parses palm files

Date: 04/08/2004 - Version: 0.5. It has been a while since I last used a Palm PDA, but those 10k+ applications built for PalmOS, mostly freeware and shareware, were something many other platforms could only wish for. One of my favourites is DateBk4 (and its follow-up versions).

2004-10-21

19:39:11, Categories: Social Networks   Latvian (LV)

Group Dynamics - Resources

Forsyth's Group Dynamics Resources Page:

Many interesting resources pointing to articles and research about social networks, groups and group dynamics.

del.icio.us: captsolo/social-networks

2004-10-15

18:39:53, Categories: Semantic Web, Social Networks   Latvian (LV)

Daily reading

Wired 12.10: "Point. Shoot. Kiss It Good-Bye" - an article on photo annotation in the October 2004 issue of Wired, by David Weinberger. The theme gets lots of attention in the Semantic Web community.

Your hard drive is overflowing with gazillions of digital pics. DSC00234.jpg might as well be labeled DON'T_KNOW_DON'T_CARE.jpg. The quest to build the photo archive of the future.

Life with Alacrity: Tracing the Evolution of Social Software - the history of "social software" from the 1940s to the present day, and its future.

Some more:
1. Urchin RSS aggregator [sf.net]
and a related announcement to www-rdf-interest

2004-09-30

18:14:47, Categories: Social Networks   Latvian (LV)

Orkut overload?

Looks like Orkut is lagging quite heavily. I get lots of errors and delays, and friends complain they cannot even register on the site.

Is Orkut facing scalability issues?

Honestly, I think it scales quite well for its size (compared with some local Friendster-like sites, which apparently were not designed even for tens of thousands of users). It would be interesting to know what Orkut’s architecture is and how it ensures scalability.

BTW, how many Orkut users are there at the moment?


captsolo weblog
