Data Strategy

January 15, 2008

Jon Kleinberg to speak at Stanford tomorrow

Filed under: Datamining, Network analysis, People and Data, Privacy — chucklam @ 5:13 pm

Date & time: Wednesday, January 16 4:35-5:45 pm
Location: 380-380C
Speaker: Jon Kleinberg, Cornell University

Title: Computational Perspectives on Large-Scale Social Network Data


The growth of on-line information systems supporting rich forms of social interaction has made it possible to study social network data at unprecedented levels of scale and temporal resolution. This offers an opportunity to address questions at the intersection of computing and the social sciences, where algorithmic styles of thinking can help in formulating models of social processes and in managing complex networks as datasets.

We consider two lines of research within this general theme. The first is concerned with modeling the flow of information through a large network: the spread of new ideas, technologies, opinions, fads, and rumors can be viewed as unfolding with the dynamics of an epidemic, cascading from one individual to another through the network. This suggests a basis for computational models of such phenomena, with the potential to inform the design of systems supporting community formation, information-seeking, and collective problem-solving.
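To make the epidemic picture concrete, here is a minimal sketch of one common way to simulate such a spread, an independent-cascade style model. It is an illustration only, not the model from the talk: the function name `simulate_cascade`, the toy graph, and the single activation probability `p` are all assumptions made for this example.

```python
import random
from collections import deque


def simulate_cascade(graph, seeds, p, rng):
    """Spread an idea from the seed nodes through the graph.

    Each newly active node gets one chance to activate each of its
    inactive neighbors, succeeding with probability p (hypothetical
    parameter; real networks would not share a single value).
    """
    active = set(seeds)
    frontier = deque(active)
    while frontier:
        node = frontier.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in active and rng.random() < p:
                active.add(neighbor)
                frontier.append(neighbor)
    return active


# Hypothetical toy network: who passes ideas on to whom.
toy_graph = {
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "cat": ["dan"],
    "dan": ["eve"],
    "eve": [],
    "fay": [],  # isolated, so the cascade can never reach her
}

reached = simulate_cascade(toy_graph, ["ann"], p=0.5, rng=random.Random(0))
print(sorted(reached))
```

Passing in a seeded `random.Random` keeps a run repeatable. Averaging the size of `reached` over many seeds would give a rough estimate of how far an idea tends to travel for a given `p`.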

The second line of research we consider is concerned with the privacy implications of large network datasets. An increasing amount of social network research focuses on datasets obtained by measuring the interactions among individuals who have strong expectations of privacy. To preserve privacy in such instances, the datasets are typically anonymized — the names are replaced with meaningless unique identifiers, so that the network structure is maintained while private information has been suppressed. Unfortunately, there are fundamental limitations on the power of network anonymization to preserve privacy; we will discuss some of these limitations (formulated in joint work with Lars Backstrom and Cynthia Dwork) and some of their broader implications.
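The anonymization step described above can be sketched in a few lines. This is a toy illustration under my own assumptions (the names `anonymize` and `degree_sequence` and the edge list are made up), not the analysis from the talk. It only shows that swapping names for meaningless identifiers leaves structural features, such as how many contacts each node has, exactly as they were.

```python
import random
from collections import Counter


def anonymize(edges, rng):
    """Replace each name with a meaningless unique integer identifier."""
    names = sorted({name for edge in edges for name in edge})
    ids = list(range(len(names)))
    rng.shuffle(ids)  # identifiers carry no information about the names
    mapping = dict(zip(names, ids))
    anonymized = [(mapping[a], mapping[b]) for a, b in edges]
    return anonymized, mapping


def degree_sequence(edges):
    """Sorted list of how many edges touch each node."""
    degrees = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return sorted(degrees.values())


# Hypothetical interaction data.
toy_edges = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]

anon_edges, secret_mapping = anonymize(toy_edges, random.Random(0))
print(anon_edges)
print(degree_sequence(toy_edges) == degree_sequence(anon_edges))
```

The names are gone from `anon_edges`, but the shape of the network is untouched, and that surviving shape is what anyone holding the released data can still examine.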

Speaker bio:

Jon Kleinberg is a Professor in the Department of Computer Science at Cornell University. His research interests are centered around issues at the interface of networks and information, with an emphasis on the social and information networks that underpin the Web and other on-line media. He is a Fellow of the American Academy of Arts and Sciences, and the recipient of MacArthur, Packard, and Sloan Foundation Fellowships, the Nevanlinna Prize from the International Mathematical Union, and the National Academy of Sciences Award for Initiatives in Research.


October 28, 2007

Orkut on Orkut

Filed under: Network analysis, People and Data — chucklam @ 10:57 pm

Orkut Buyukkokten, founder of, will talk at Stanford today about “Who do you know: The Social Network Revolution”. The talk will be in Gates 498 at 5pm.


Online social networks fundamentally change the way we get connected.
The people we cross paths with have the biggest influence in our
lives. Now it’s easier to cross paths than ever as we are much closer
and so much more connected. In this talk I will discuss the
motivation behind the development of, touch on the social
and technical aspects of implementing and maintaining a system that
has over 60 million users and reflect on the lessons we learned.


Orkut Buyukkokten is a software engineer and product manager at
Google. He received his PhD in Computer Science from Stanford in
2002. He has been building and working on online communities for the past six years. His interests include social networks, interface design
and mobile applications.

October 16, 2007

On building the Facebook platform

Filed under: Network analysis, People and Data — chucklam @ 4:12 am

An upcoming presentation on the design decisions behind the Facebook platform.

Building the Facebook Platform

6:30 PM – 9:00 PM October 23, 2007
Tibco Software Inc.
3301 Hillview Avenue, Building #2
Palo Alto, CA

Presentation Title: Building the Facebook Platform
Presentation Summary: In building Facebook Platform, we tried to design a system that was incredibly powerful while still easy for developers to use. We also needed to protect our users’ privacy and ensure that the site remained valuable and enjoyable for users. In this talk, we’ll go over the technical decisions we made to accomplish these goals. We’ll talk about the design of the API, FQL, FBML, and the numerous ways in which applications can integrate with the Facebook site.

– Ari Steinberg, Facebook
– Charlie Cheever, Facebook

October 4, 2007

has interesting analyses of Facebook and MySpace

Filed under: Network analysis, People and Data — chucklam @ 6:02 pm

I just discovered a bunch of interesting posts analyzing Facebook and MySpace. The site behind them has a proprietary collection of Internet traffic data, so much of its analysis is unique. Links to and notes on the posts I’ve read:

  • 14 million people interacted with Facebook Applications in August
    That’s out of 22 million visitors to Facebook. In terms of activity, picture browsing (16M) and profile browsing (21M) have more visitors. The post also shows stats like average time spent per visit.
  • MySpace vs. Facebook: The Party Starter Showdown
    “…in terms of traffic, Facebook is where MySpace was a good two years ago.” The post also had an interesting breakdown of early MySpace and Facebook users.


    Very few (1%) of the early MySpace users have “abandoned” it for Facebook. In fact, as a percentage, more early Facebook users have abandoned Facebook for MySpace (6%). This contradicts the general Silicon Valley anecdote that “everyone” is leaving MySpace for Facebook.

    Granted, the charts above were made in May, before Facebook opened up its platform. But still, while Facebook has attracted a lot of developers, have those developers built apps that attract non-Facebook users to Facebook?

  • Top Social Networks: Facebook grows while MySpace slows
    This post provides data comparing the growth rates of Facebook and MySpace. That Facebook has a higher growth rate is well reported and, honestly, not surprising. After all, they’re in different stages of growth. The interesting info from this post is the plot of Facebook usage by state. It’s surprisingly dense on the East Coast.


Having access to interesting data, the way this site has traffic data from ISPs and toolbars, enables a lot of interesting analysis. However, clever joining of public data can also give interesting results, as my correlation of Facebook usage with high school quality shows.

September 21, 2007

Google to enable access to its social graph data

According to a TechCrunch post:

Google will announce a new set of APIs on November 5 that will allow developers to leverage Google’s social graph data. They’ll start with Orkut and iGoogle (Google’s personalized home page), and expand from there to include Gmail, Google Talk and other Google services over time.

Not much info yet, but can’t wait…

September 1, 2007

Debunking the “small world” myth

Filed under: Network analysis, People and Data — chucklam @ 3:22 pm

It’s pretty well ingrained in popular educated culture (at least in the U.S.) that “everyone” is connected by no more than six degrees of separation, that it is a “small world.” The promoters of the idea often point to Stanley Milgram’s experiment, in which “random” people in Kansas forwarded a letter to acquaintances until it reached a specific person in Boston, and to the claim that no more than five intermediaries were needed. The experiment has supposedly been repeated enough times to become solid “science.”

The fact that so many people would believe such a ridiculous idea is itself a pretty interesting phenomenon, especially since it’s the more educated people who tend to believe it. Today I finally came across an article, “Could It Be A Big World After All? The ‘Six Degrees of Separation’ Myth” by Judith S. Kleinfeld, who dug into Milgram’s archive at Yale and points out the paucity of evidence for the “small world” interpretation and the lack of experimental replication across subjects separated by any significant distance (i.e., in two different cities). She gives several possible explanations for the persistence of this “small world” myth.

As I listened to these descriptions of cherished small world experiences, I realized that these experiences had a different mathematical structure from the classic small world problem that Milgram and the mathematicians were investigating. The classic “small world problem” is expressed in such forms as: What are the chances that two people chosen at random from the population will have a friend in common? But the small world experiences I was hearing about would be expressed mathematically in a very different form: What is the probability that you will meet a friend from your past or a stranger who knows a friend from your past over the course of your lifetime?

How likely would it be, particularly for educated people who travel in similar social networks, never to meet anyone anywhere anytime who knew someone from their past? We have a poor mathematical, as well as a poor intuitive, understanding of the nature of coincidence.
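A quick back-of-the-envelope calculation shows why the second question almost always gets a “yes.” The sketch below is mine, not Kleinfeld’s: it assumes every encounter is independent and has the same small chance `p_per_encounter` of turning up a connection to your past, and both that probability and the encounter counts are made-up numbers. Under those assumptions, the chance of at least one coincidence over n encounters is 1 - (1 - p)^n.

```python
def chance_of_at_least_one(p_per_encounter, encounters):
    """Probability that at least one of several independent encounters
    is a 'small world' coincidence, given the per-encounter chance."""
    return 1.0 - (1.0 - p_per_encounter) ** encounters


# Hypothetical numbers: a 1-in-1000 chance per new person met.
for n in (10, 100, 1000, 5000):
    print(n, round(chance_of_at_least_one(0.001, n), 3))
```

Even a one-in-a-thousand event becomes close to certain once the number of encounters runs into the thousands, so never having such a coincidence would be the real surprise.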

A poor intuitive understanding of probability partly explains people’s willingness to believe the “small world” myth. I think a deeper explanation is that people simply refuse to believe how predictable their social lives are. Rather than doing the hard work of meeting interesting people and living an interesting life, educated people do what they do best–rationalize things. When your “random” friends happen to know each other, the best explanation is not that it’s a “small world.” The best explanation is that your friends simply aren’t random!
