Data Strategy

August 11, 2007

David Heckerman interview

Filed under: Bayesian networks, Pattern recognition — chucklam @ 11:31 pm

A little over a month ago CNet published an interview with David Heckerman, lead researcher of Microsoft’s Machine Learning and Applied Statistics Group. I haven’t heard much about David lately, and apparently he’s been busy developing open source analytical tools for HIV research.

Back in 1990, David won the ACM Doctoral Dissertation Award for his thesis “Probabilistic Similarity Networks”. He was a major influence in the ’90s in establishing Bayesian networks and Bayesian methodologies as practical tools in AI and machine learning. I remember my adviser lending me a copy of David’s thesis when I first started my PhD study. He told me to read it and to strive for a thesis of similar caliber. (Yes… the first couple of years of graduate study tend to be filled with optimism and ambition.) From David’s thesis I saw the potential of quality research.

In the last five years or so, I haven’t come across any publications by David Heckerman. It’s good to learn that he’s still doing great work, just now in a slightly different field.


July 24, 2007

Web seminars on data mining

Filed under: Bayesian networks, Datamining — chucklam @ 2:05 am

Via KDnuggets: ACM’s Special Interest Group on Knowledge Discovery and Data Mining has two interesting webinars coming up. One is on exploiting link data in data mining. The other is a tutorial on learning Bayesian networks. You can register for either event here. They’re both free and given by noted experts in those areas. More info below.

Exploring the Power of Links in Data Mining
Thursday, July 26, 2007 11:30 am ET (duration 1 hour)
Jiawei Han
University of Illinois at Urbana-Champaign Register at (free)

Algorithms like PageRank and HITS were developed in the late 1990s to explore links among Web pages to discover authoritative pages and hubs. Links have also been widely used in citation analysis and social network analysis. We show that the power of links can be explored thoroughly in data mining for classification, clustering, information integration, and other interesting tasks. Some recent results of our research that explore the crucial information hidden in links will be introduced, including (1) multi-relational classification, (2) user-guided clustering, (3) link-based clustering, and (4) object distinction analysis. The power of links in other analysis tasks will also be discussed in the talk.
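For readers who haven't seen how a link-based score is computed, here is a minimal sketch of PageRank by power iteration. It is my own toy illustration, not material from the talk: the `pagerank` function, the `toy_web` graph, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page gets a baseline share from the "random jump".
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by every other page.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph the page with the most incoming links ends up with the highest score, and the page nobody links to keeps only its baseline share.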

Jiawei Han, Professor, Department of Computer Science, University of Illinois at Urbana-Champaign. He has been working on research into data mining, data warehousing, database systems, data mining from spatiotemporal data, multimedia data, stream and RFID data, Web data, social network data, and biological data, with over 300 journal and conference publications.

He has chaired or served on over 100 program committees of international conferences and workshops, including PC co-chair of the 2005 (IEEE) International Conference on Data Mining (ICDM) and Americas Coordinator of the 2006 International Conference on Very Large Data Bases (VLDB). He is also serving as the founding Editor-In-Chief of ACM Transactions on Knowledge Discovery from Data. He is an ACM Fellow and has received the 2004 ACM SIGKDD Innovations Award and the 2005 IEEE Computer Society Technical Achievement Award. His book “Data Mining: Concepts and Techniques” (2nd ed., Morgan Kaufmann, 2006) has been widely used as a textbook worldwide.

Please register at (webcast is free)

The second webinar…

Learning Bayesian Networks
Wed, Aug 1, 2007, 1 pm PT, 4 pm ET (duration 1 hour)
Richard E. Neapolitan
Northeastern Illinois University Register at (free)

Bayesian networks are graphical structures for representing the probabilistic relationships among a large number of variables and doing probabilistic inference with those variables. The 1990s saw the emergence of excellent algorithms for learning Bayesian networks from passive data. In 2004 I unified this research with my text Learning Bayesian Networks. This tutorial is based on that text and my paper.

Neapolitan, R.E., and X. Jiang, “A Tutorial on Learning Causal Influences,” in Holmes, D. and L. Jain (Eds.): Innovations in Machine Learning, Springer-Verlag, New York, 2005.

I will discuss the constraint-based method for learning Bayesian networks using an intuitive approach that concentrates on causal learning. Then I will show a few real examples.
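To give a feel for what constraint-based learning relies on, here is a small sketch of my own (not taken from the tutorial). It builds the joint distribution of a hypothetical three-variable chain A -> B -> C from made-up conditional probability tables, then checks which independence statements hold. Constraint-based methods work in the opposite direction: they estimate such independencies from data with statistical tests and use them to rule out candidate structures. The variable names, the probabilities, and the `prob` and `independent` helpers are all assumptions for illustration.

```python
from itertools import product

# Hypothetical conditional probability tables for the chain A -> B -> C.
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}

# The network factorizes the joint as P(a) * P(b | a) * P(c | b).
joint = {
    (a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
    for a, b, c in product((0, 1), repeat=3)
}

NAMES = ("a", "b", "c")


def prob(**fixed):
    """Probability that the named variables take the given values."""
    return sum(
        p
        for values, p in joint.items()
        if all(values[NAMES.index(name)] == v for name, v in fixed.items())
    )


def independent(x, y, given=None, tol=1e-9):
    """Check whether x and y are independent, optionally given one variable."""
    conditions = [{}] if given is None else [{given: z} for z in (0, 1)]
    for cond in conditions:
        p_cond = prob(**cond)
        for vx, vy in product((0, 1), repeat=2):
            p_xy = prob(**{x: vx, y: vy}, **cond) / p_cond
            p_x = prob(**{x: vx}, **cond) / p_cond
            p_y = prob(**{y: vy}, **cond) / p_cond
            if abs(p_xy - p_x * p_y) > tol:
                return False
    return True


print("A independent of C:        ", independent("a", "c"))
print("A independent of C given B:", independent("a", "c", given="b"))
```

A and C are dependent on their own but become independent once B is known, which is exactly the kind of pattern a constraint-based learner uses to decide that there is no direct edge between A and C.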

Richard E. Neapolitan is Professor and Chair of Computer Science at Northeastern Illinois University. He has previously written three books, including the seminal 1990 Bayesian network text Probabilistic Reasoning in Expert Systems. More recently, he wrote the 2004 text Learning Bayesian Networks, and Foundations of Algorithms, which has been translated into three languages and is one of the most widely used algorithms texts worldwide. His books have the reputation of making difficult concepts easy to understand because of the logical flow of the material, the simplicity of the explanations, and the clear examples.

June 22, 2007

Just published: 2nd edition of Finn Jensen’s Bayesian Networks and Decision Graphs

Filed under: Bayesian networks, Datamining — chucklam @ 12:15 am

The second edition of Finn Jensen’s Bayesian Networks and Decision Graphs was just published this month. I haven’t read it yet, but I’m a fan of the first edition and Finn’s other book: An Introduction to Bayesian Networks. The books are very accessible and work for self-study.

Looking at the table of contents, the major addition to the 2nd edition is a set of chapters on learning from data, covering both parameter estimation and structure learning. Not dealing with learning was a major hole in the first edition. It made the book less useful for people from the machine learning community. Fixing that in the 2nd edition should make this a welcome introduction to Bayesian networks for all practitioners.

More info at their website.
