Web 2.0 and the Security of Your Data
There is a lot of blog buzz about Facebook shutting down Scoble’s account for running a script against Facebook in violation of the site’s ToS. It appears that an ill-conceived experimental Plaxo Pulse script that used screen-scraping to retrieve email addresses is the culprit. I empathize with Scoble, but given the facts, I also think Facebook was justified in suspending his account. This post, however, is about a bigger, related issue that the event highlights. (No, it’s not about who owns the data either.)
I am concerned about the complacency and casual attitude that people generally have about Web 2.0 data security. In this case, an extremely tech-savvy individual allowed a test script from another (supposedly) tech-savvy company to be executed against production data. That’s insane, no?
Not quite…it’s no different from downloading a beta app from a website and allowing it to run on your desktop. You really have no idea how it’s going to fudge up your machine and your data, but you probably do it anyway once you decide the risk is acceptable.
In both situations, it comes down to trust, ignorance, recklessness or some combination thereof. While there are now adequate protections available for desktop apps so your data can be reasonably protected, the same is not true for Web 2.0 apps. It’s the Wild West out there. With the proliferation of APIs, widgets and mash-ups, you have no idea where your data is being stored, who has access to it or what the apps accessing it are doing with it. None. And you don’t have a prayer of ever finding out.
We need standards not only for open, cross-site access to user data, but also for how that data is persisted by different sites. I don’t have the time to read the ToS for each site, and even if I did, I have no way of verifying that what they say about the privacy and security of my data is actually what is happening in the data center. In all likelihood, it is seldom as airtight as the ToS legalese would have you believe.
DataPortability.org appears to be a good start for enabling access to one’s data, but that is only half of the equation. In a distributed online world, we need standards that provide transparency about how the data is being stored, verifiable means of ensuring that sites adhere to their ToS, and an audit trail of when, by whom and how our data is accessed. We need bread crumbs associated with user data no matter where it is persisted, and this information needs to be accessible to us in one easy, centralized location. Better yet, we need a standard means of encrypting our data across multiple sites. This is a tall order, and it is unlikely to happen anytime soon, but it will happen. There will be a standard because users will demand it, and sites that don’t provide it will see their users leave en masse.
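To make the "bread crumbs" idea a little more concrete, here is a rough sketch of what a tamper-evident access trail could look like. Nothing here is an existing standard or any site's real API: the `AccessLog` class, its field names, and the example sites and timestamps are all made up for illustration. The only idea it demonstrates is chaining each entry to the previous one with a hash, so that quietly editing an old entry becomes detectable.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class AccessLog:
    """Hypothetical append-only trail of who touched a user's data.

    Each entry stores a SHA-256 hash over its own record plus the previous
    entry's hash, so altering an earlier record breaks the chain.
    """
    entries: List[dict] = field(default_factory=list)

    @staticmethod
    def _digest(record: dict, prev_hash: str) -> str:
        # sort_keys gives a stable serialization, so the hash is reproducible
        payload = json.dumps(record, sort_keys=True) + prev_hash
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def record(self, when: str, who: str, how: str, site: str) -> None:
        """Append one access event (when / by whom / how / where)."""
        rec = {"when": when, "who": who, "how": how, "site": site}
        prev_hash = self.entries[-1]["hash"] if self.entries else ""
        self.entries.append({"record": rec, "hash": self._digest(rec, prev_hash)})

    def verify(self) -> bool:
        """Recompute the chain; return False if any entry was altered."""
        prev_hash = ""
        for entry in self.entries:
            if entry["hash"] != self._digest(entry["record"], prev_hash):
                return False
            prev_hash = entry["hash"]
        return True


# Made-up events, purely illustrative
log = AccessLog()
log.record("2008-01-03T10:00:00Z", "example-widget", "api-read", "site-a.example")
log.record("2008-01-03T10:05:00Z", "example-mashup", "export", "site-b.example")
print(log.verify())  # True

# Someone rewrites history on the first entry
log.entries[0]["record"]["who"] = "someone-else"
print(log.verify())  # False
```

A chain like this only shows that a log was altered after the fact. It does nothing to guarantee that a site records every access in the first place, which is exactly why this needs to be a standard with outside verification rather than something each site does on the honor system.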
Today, it’s not a big deal for most people because their online information consists of photos and videos, while the more important stuff is still in silos. Our bank has some info, the credit card company has some and the travel company has some. How long before these silos also start having APIs? Wesabe is already doing it. Others will follow. Very soon, far more personal information than just pictures of your kid at the family picnic is going to be strewn across the web.
We either have to give up most or all of our privacy or figure out a way to protect it as the distributed web evolves.