Listed below are links to weblogs that reference Data Anonymization and Re-identification: Some Basics Of Data Privacy:

Comments

RAD

This is all very perturbing :-)

So can we be for stricter personal information laws yet support the release of the WikiLeaks documents?

Does harm not enter the equation? I think this is at the heart of Tim O'Reilly's lament. He sees demographic data as a tool for good.

KEE

You should also look at this:
http://www.ipc.on.ca/images/Resources/anonymization.pdf

A response to Ohm et al.'s argument.

tomslee

Thanks for that reference. It's good to see Canada giving privacy and security issues a high profile and I'll definitely be reading it.

Now if only we would actually collect some data to treat with privacy through... oh I don't know... a compulsory long-form census for example, maybe we'd get somewhere.

tomslee

KEE: I read the paper with interest. It is clear there is a valid concern here - that health professionals and researchers may be unnecessarily deprived of data they need to carry out important and socially beneficial work if unwarranted privacy concerns lead to panicked responses. In my day job we face the same issue: as a software company it is increasingly challenging to support customers in the medical field because privacy concerns make it difficult to reproduce problems.

That said, I felt that much of Paul Ohm's paper was concerned with "release and forget" situations, whereas institutions such as yours would obviously be treating the data under controlled and accountable circumstances - as you spell out. I do feel that there is more danger to legitimate research from mistaken and casual assurances of privacy by companies that should know better (Google's claim that IP addresses are not PII, for example) than there is from those who challenge PII-based approaches in the context of "release and forget" data sets.

The massive cross-linking of multiple large data sets by advertisers, by internet companies and by others (loyalty card information, for example) raises, to my mind, significant concerns about "PII-protected" information. The abuse of "PII-protected data" by such companies may lead to a backlash that could spill over to affect socially productive (and privacy-respecting) research efforts. You have chosen to focus on the danger of spill-over, while I have been writing more about the potential for abuse of PII-protected data sets in the context of advertising or "open data" initiatives. I don't think the goals are incompatible, but we are definitely looking for sources of trouble in different directions.

JGM

You're hard on Tim O'Reilly. Perhaps before feigning shock that he wasn't aware of the studies you mention, you should consider that non-academics rarely have free access to academic journals.

Perhaps that's the kind of data access Tim O'Reilly is most concerned with? Yes, anonymizing data sets is non-trivial -- but perhaps that's even more reason to be forgiving of (or at least more civil towards) someone who is researching this issue, contributing to an ongoing public debate and running a company. Frankly, some of your comments come off as snarky rather than enlightening.

tomslee

JGM: I'm surprised. First, I don't have access to firewalled academic journals either, and I have to work for a living too, so I have no advantage over Tim O'Reilly when it comes to the information available to me. And I too am in my own little way contributing to an ongoing public debate - and not being paid for it either. In fact, given that his promotion of Open Data is part of his work while for me it is of necessity a personal sideline, he should be in a much better position to find important information.

If Tim O'Reilly is going to evangelize the virtues of open data from his position of significant influence, he has a responsibility to think through the downsides of what he is promoting and I was actually (not feigning) surprised to see that he apparently hasn't.

And I'm sure he can withstand a little criticism from me - in fact, I'd be surprised if he even hears about it, so I'm not too worried about being hard on him.
