Data Strategy

August 30, 2007

Data mining and privacy

Filed under: Datamining, People and Data, Privacy — chucklam @ 4:09 pm

John Langford (at Yahoo) has a good post on The Privacy Problem in datamining at his Machine Learning (Theory) blog. The privacy issue has been getting a lot of attention in the datamining community lately. In fact, there’s a whole research area emerging around privacy-preserving datamining, although most results to date have tended to demonstrate how hard it is to guarantee privacy. The negative publicity surrounding datamining has prompted KDnuggets (a newsletter for dataminers) to poll its readers on whether “datamining” has become an inaccurate or misunderstood term for what they do, especially given that a lot of datamining doesn’t deal with data about individuals.

The main issue is really the trade-off between the benefits and the privacy problems of large-scale data collection and retention. Unfortunately, data collection has so many unintended consequences, both good and bad, that it’s hard to even fully discuss the pros and cons. People who’ve worked with data know the positive potential of finding new uses for data originally collected for other purposes. On the other hand, even well-intentioned efforts, such as AOL’s release of query data, are problematic if they are not handled carefully. Of course, it doesn’t help that some government datamining efforts are just outright bad ideas to begin with.
