Data Strategy

January 17, 2008

The Google Online Marketing Challenge

Filed under: Advertising — chucklam @ 11:39 pm

Google is encouraging universities to teach their students “online” marketing (i.e. how to use AdWords) by hosting an Online Marketing Challenge:

Here’s how the Challenge works: Your students will receive US$ 200 in Google ads to drive traffic to a business website of their choosing. Students will compete with groups from their institution along with student teams from all over the world.

This is not a simulation; students gain real-world experience with a real client. You have a great student project; clients gain free advertising and Internet consulting. Google provides US $200 in vouchers, teaching materials and other resources.

November 19, 2007

IEEE Computer special issue on search

Filed under: Advertising, Collective wisdom, People and Data, Personalization, Search — chucklam @ 6:08 pm

I’m quite behind on my reading, so I only got around to IEEE Computer’s special issue on search (August) this weekend. (Abstracts are free, but the actual PDFs require an expensive subscription or an expensive purchase.) It includes the following articles:

  • Search Engines that Learn from Implicit Feedback
  • A Community-Based Approach to Personalizing Web Search
  • Sponsored Search: Is Money a Motivator for Providing Relevant Results?
  • Deciphering Trends in Mobile Search
  • Toward a PeopleWeb

The articles were written by a mix of university academics and researchers from Google and Yahoo. They seem targeted at giving the general practitioner a sampling of current research, rather than being comprehensive in any specific domain or deep in a particular research area.

For me, the most interesting article is “Search Engines that Learn from Implicit Feedback” by Thorsten Joachims and Filip Radlinski of Cornell University. It’s a very accessible summary of the research those two have been doing in the last few years. To start off their research, they used eye-tracking experiments to characterize how people react to search engine rankings. They found that the ranking order strongly biases what people view and therefore click on. The top-ranked result will often get more clicks than a better result ranked second or third, since some users may not even look beyond the first result. The straightforward assumption that a click is the equivalent of a positive vote is therefore naive. Instead, they examine results that were not clicked on even though their rank says they should have been. For example, if the results at ranks 3 and 4 are clicked on, but not the result at rank 2, then one can infer that the result at rank 2 is worse than the ones at ranks 3 and 4, and use that knowledge to improve the search engine. Note that if the result at rank 1 was clicked on, nothing new is learned. People are so biased towards clicking the first result that a click there is only informative when it is missing.
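To make the rule in the example concrete, here is a minimal Python sketch of my reading of it: every clicked result is taken to be preferred over each unclicked result ranked above it. The function name and the result labels are made up for illustration; this is not the authors' code.

```python
from typing import List, Set, Tuple


def skipped_above_preferences(ranking: List[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Return (preferred, less_preferred) pairs inferred from one result page.

    A clicked result is assumed to beat every unclicked result that was
    ranked above it. Clicks tell us nothing about results ranked below.
    """
    preferences = []
    for position, result in enumerate(ranking):
        if result not in clicked:
            continue
        for above in ranking[:position]:
            if above not in clicked:
                preferences.append((result, above))
    return preferences


page = ["r1", "r2", "r3", "r4"]

# Ranks 1, 3 and 4 clicked, rank 2 skipped: r3 and r4 both beat r2.
print(skipped_above_preferences(page, {"r1", "r3", "r4"}))

# Only the top result clicked: no preference can be inferred.
print(skipped_above_preferences(page, {"r1"}))
```

Pairs like these are relative judgments rather than absolute relevance scores, which is what makes them usable as training data for a pairwise ranking learner.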

Under that model, they can interleave the results from two different search engines (or algorithms) and evaluate which one is better based on users’ clickthroughs. This insight led them to develop a ranking SVM model to learn search engine rankings. The new algorithm was shown to create a better meta-search engine as well as a better domain-specific search engine.
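As a rough illustration of the interleaving idea, the sketch below alternates picks from two rankings, remembers which ranker contributed each result, and then counts clicks per ranker. It is a deliberately simplified scheme of my own, not the exact procedure from the article; always letting ranker A pick first, for instance, gives A a small advantage that a real experiment would have to randomize away.

```python
def interleave(ranking_a, ranking_b):
    """Merge two rankings by alternating picks, skipping duplicates.

    Returns the merged list and a dict mapping each result to the
    ranker ("A" or "B") that contributed it.
    """
    merged, credit = [], {}
    queues = {"A": list(ranking_a), "B": list(ranking_b)}
    turn = "A"
    while queues["A"] or queues["B"]:
        queue = queues[turn]
        # Drop results the other ranker has already placed.
        while queue and queue[0] in credit:
            queue.pop(0)
        if queue:
            result = queue.pop(0)
            merged.append(result)
            credit[result] = turn
        turn = "B" if turn == "A" else "A"
    return merged, credit


def count_clicks(credit, clicked):
    """Count how many clicked results each ranker contributed."""
    wins = {"A": 0, "B": 0}
    for result in clicked:
        if result in credit:
            wins[credit[result]] += 1
    return wins


merged, credit = interleave(["x", "y", "z"], ["y", "w", "x"])
print(merged)
print(count_clicks(credit, ["y", "w"]))
```

Aggregated over many queries, the ranker that collects more clicks is the one users seem to prefer.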

November 9, 2007

List of accepted papers for WSDM’08

The first ACM conference on Web Search and Data Mining (WSDM), to be held at Stanford on Feb. 11-12, has released its list of accepted papers. A total of 24 papers will be presented. The following ones already sound interesting based on their titles.

  • An Empirical Analysis of Sponsored Search Performance in Search Engine Advertising – Anindya Ghose and Sha Yang
  • Ranking Web Sites with Real User Traffic – Mark Meiss, Filippo Menczer, Santo Fortunato, Alessandro Flammini and Alessandro Vespignani
  • Identifying the Influential Bloggers – Nitin Agarwal, Huan Liu, Lei Tang and Philip Yu
  • Can Social Bookmarks Improve Web Search? – Paul Heymann, Georgia Koutrika and Hector Garcia-Molina
  • Entropy of Search Logs: How Hard is Search? With Personalization? With Backoff? – Qiaozhu Mei and Kenneth Church

October 18, 2007

How Acxiom uses offline data for behavioral targeting

Filed under: Advertising, Data Collection, Datamining, People and Data, Personalization — chucklam @ 12:37 am

A really fascinating piece at the WSJ yesterday: Firm Mines Offline Data To Target Online Ads (subscription req.). There’s a sidebar in particular that reveals how Acxiom uses offline data for behavioral targeting:

How Acxiom delivers personalized online ads:

  1. Acxiom has accumulated a database of about 133 million households and divided it into 70 demographic and lifestyle clusters based on information available from… public sources.
  2. A person gives one of Acxiom’s Web partners his address by buying something, filling out a survey or completing a contest form on one of the sites.
  3. In an eyeblink, Acxiom checks the address against its database and places a “cookie,” or small piece of tracking software, embedded with a code for that person’s demographic and behavioral cluster on his computer hard drive.
  4. When the person visits an Acxiom partner site in the future, Acxiom can use that code to determine which ads to show…
  5. Through another cookie, Acxiom tracks what consumers do on partner Web sites…

It’s an interesting approach. While Acxiom has offline information on “133 million households,” it’s not clear how many households actually have gotten the offline-tracking cookie.
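The flow in the sidebar can be sketched in a few lines of Python. Everything here is hypothetical: the table contents, the cluster number, the ad names and the function names are invented purely to show the shape of the lookup, and say nothing about how Acxiom actually implements it.

```python
# Hypothetical stand-ins for the household database and the ad inventory.
HOUSEHOLD_CLUSTERS = {"12 example st, springfield": 17}
ADS_BY_CLUSTER = {17: "minivan ad"}
DEFAULT_AD = "generic ad"


def normalize(address: str) -> str:
    """Lower-case and collapse whitespace so lookups are forgiving."""
    return " ".join(address.lower().split())


def cookie_for_address(address: str):
    """Steps 2-3: map a submitted address to a cluster code for the cookie.

    The cookie carries only the cluster code, not the address itself.
    """
    cluster = HOUSEHOLD_CLUSTERS.get(normalize(address))
    return None if cluster is None else {"cluster": cluster}


def choose_ad(cookie) -> str:
    """Step 4: pick an ad from the cluster code found in the cookie."""
    if not cookie:
        return DEFAULT_AD
    return ADS_BY_CLUSTER.get(cookie["cluster"], DEFAULT_AD)


cookie = cookie_for_address("12  Example St,  Springfield")
print(cookie, choose_ad(cookie))
print(choose_ad(cookie_for_address("unknown address")))
```

The sketch also makes the coverage question visible: a visitor who never hands an address to a partner site never gets a cluster cookie and only ever sees the default ad.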

One can imagine Google really taking advantage of an approach like this. The widespread use of Google Analytics across web sites already gives Google the potential to track your surfing habits. Specifying your home address when you use Google Maps or Checkout allows Google to match you with offline marketing databases. And we haven’t even talked about your query data and your emails yet…

October 14, 2007

Interesting job posting for a click fraud analyst

Filed under: Advertising, Datamining, Pattern recognition — chucklam @ 2:09 am

This is one of the most interesting job posts I’ve come across in a while. It’s a datamining contest in which contestants with interesting approaches get to apply for an analyst position. No word on whether the contest itself has any other prize 🙂

Data Mining Contest: Uncover Criminal Activity in a Real Fraud Case

The purpose of the contest is to identify a particular type of highly sophisticated fraud that appeared in a pay-per-click advertising program. The fraud targeted one advertiser only. Participants providing an interesting answer will be invited to apply for a Sr. Fraud Analyst position with Authenticlick. The small spreadsheet with the fraudulent click data can be downloaded here. Your answers should be emailed to vlg @

The two questions are

  • Why are these clicks fraudulent?
  • What type of click fraud is it?

You can check references such as our click scoring blog to help you answer the questions.

September 25, 2007

Social networks turning to targeted advertising

Filed under: Advertising, Personalization — chucklam @ 5:04 am

I saw a number of articles in the last couple of months on how MySpace and Facebook are turning to personalized ad targeting. Just thought I’d note them here.

September 10, 2007

How AdSense can be improved with NLP

Filed under: Advertising — chucklam @ 3:25 pm

I have this web site that I’ve been using for the last 10 months to study web advertising and monetization. I was surprised to find that begging users for money is as effective as AdSense. Partly it’s because users were more willing to send money than I had expected. Partly it’s also due to the fact that AdSense hasn’t worked well for my site.

To give a little background, RoundedCornr is a site for amateur web designers who want to put rounded corners in their web page design. The AdSense ads I see now are

  1. Corner Protectors
  2. Nissan Lights on Sale
  3. Corner Board
  4. Help Elect Barack Obama (banner ad)

Of course, you may see different ads than I do. (And I’ll see different ones if I refresh.) The point is that they’re generally pretty irrelevant.

When I first put up RoundedCornr and saw the useless ads, my first reaction was to ping my friends working at AdSense to see if they would suggest anything. Well, I can safely say that having friends at Google doesn’t help you much. (And their “we’re not clueless, we’re just secretive” stance has never really worked on me…) I was told that AdSense is not optimized for “this kind of web site,” by which I assume they mean it works better for blogs and news sites. I was also told to change some of the wording to avoid triggering some of the bad ads. Well… it’s hard to avoid using the word “corner” in describing my site, and some ads just seem totally unrelated to anything I’ve said on my site anyways. I was also told to wait until the system learns from user click-throughs. Well… it’s been 10 months now…

For things like banner ads, it’s pretty easy to figure out why they’re so bad: there just isn’t enough inventory. Unfortunately, the content of these banner ads is also what I have the most issues with. I haven’t decided on which presidential candidate to support yet, so it’s a bit misleading to have a Barack Obama ad. At some other time, the banner ad was soliciting support for more border patrols. (Maybe it was triggered by my description of rounded corners with “border”.) I tried to remove that banner ad since it doesn’t reflect my political beliefs, but AdSense doesn’t like to give its users much control.

For text ads, where inventory is not a problem, what can AdSense do to be more targeted? Personalization and behavioral targeting have been suggested before, but I think a little natural language processing will help a lot more. And unlike search, NLP for contextual advertising can help without needing much change in user behavior. The technology needed is also more achievable and should be within the grasp of companies such as Powerset (see their first public demo here). The idea is to increase the semantic understanding of page content so that advertising is more semantically relevant.

Specifically, there are two technologies that I’m thinking of. One is to use a language parser that picks out the main subject words in the sentences of a page. Instead of indexing keywords based on simple statistics, one only indexes nouns and noun phrases. (Verbs, adverbs, etc. are quite secondary in terms of semantic understanding, especially for advertising.)
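Here is a toy Python sketch of the noun-phrase idea. It assumes the page text has already been run through a part-of-speech tagger; the tags below are supplied by hand, and the tag names and function name are just illustrative choices. A real system would need a proper parser and a lot more care.

```python
def noun_phrases(tagged_tokens):
    """Extract runs of adjectives/nouns that end in a noun.

    tagged_tokens is a list of (word, tag) pairs, where the tag is
    assumed to come from some upstream part-of-speech tagger.
    """
    phrases, run = [], []
    # The sentinel token flushes the final run.
    for word, tag in tagged_tokens + [("", "END")]:
        if tag in ("ADJ", "NOUN"):
            run.append((word, tag))
            continue
        # Trim trailing adjectives so every phrase ends in a noun.
        while run and run[-1][1] != "NOUN":
            run.pop()
        if run:
            phrases.append(" ".join(w for w, _ in run))
        run = []
    return phrases


sentence = [
    ("generate", "VERB"), ("rounded", "ADJ"), ("corners", "NOUN"),
    ("for", "ADP"), ("your", "DET"),
    ("web", "NOUN"), ("page", "NOUN"), ("design", "NOUN"),
]
print(noun_phrases(sentence))
```

Indexing "rounded corners" and "web page design" as units, rather than "corner" on its own, is exactly what would keep ads for corner protectors off a page like mine.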

The other NLP technology to use is word sense disambiguation. A word like “jaguar” has many senses: one is a type of animal, another is a brand of cars. Automated techniques exist to figure out which sense is being used in a sentence. An AdSense advertiser should then be able to specify that she wants to advertise on pages that talk about “jaguar” in the car sense of the word, and not just any page that mentions “jaguar.”
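One of the simplest disambiguation techniques is in the spirit of the simplified Lesk algorithm: pick the sense whose associated words overlap most with the surrounding text. The Python sketch below uses a tiny hand-made sense inventory; the word lists and function names are mine, for illustration only, and a real system would draw on a proper lexical resource.

```python
import re

# Hand-made sense inventory: each sense has a set of telltale context words.
SENSES = {
    "jaguar": {
        "animal": {"cat", "jungle", "prey", "wild", "species", "habitat"},
        "car": {"car", "engine", "dealer", "sedan", "drive", "luxury"},
    }
}


def disambiguate(word, context):
    """Return the sense with the largest word overlap, or None if no overlap."""
    tokens = set(re.findall(r"[a-z]+", context.lower()))
    scores = {sense: len(tokens & clues) for sense, clues in SENSES[word].items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def should_show_car_ad(page_text):
    """An advertiser targeting only the car sense of "jaguar"."""
    return disambiguate("jaguar", page_text) == "car"


print(disambiguate("jaguar", "The new Jaguar sedan has a quieter engine."))
print(disambiguate("jaguar", "A jaguar stalks its prey in the jungle."))
print(should_show_car_ad("A jaguar stalks its prey in the jungle."))
```

When there is no overlap at all, the function declines to guess, which for advertising seems safer than showing an ad on the wrong sense.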

Granted, it’s a lot easier said than done. Word usage on the web requires algorithms to be more dynamic and scalable than most academic research has looked at. However, “semantic contextual advertising” is still simpler than “semantic search.” If I had the Powerset technologies, I’d seriously look at contextual advertising as another business model to go after.

September 8, 2007

AdSense versus just begging your users for money

Filed under: Advertising, Personalization — chucklam @ 4:46 pm

Last December I built a little web site for fun called RoundedCornr. The site is a simple way of automatically generating HTML/CSS code for creating rounded corners on a web page design. In addition to being a fun project, I also used the site as an opportunity to learn about several advertising/monetization schemes.

The site gets between 400 and 900 pageviews per day, with an average of 740 and a grand total of 205,000 to date. (As far as monetization is concerned, the site is really just a one-page site.) I have various monetization schemes on the page including Google AdSense, Amazon Omakase, some affiliate marketing run by Commission Junction, and just plain begging for a $5 “fee.” (I ran into trouble with the Google Checkout police when I called it a “contribution,” but that’s another story.) The affiliate marketing is chosen by me so it’s highly relevant, but it’s gotten no revenue at all in the last 10 months 😦 The Amazon Omakase program is Amazon’s contextual advertising/referral program. You can think of it as AdSense with “behavioral targeting,” except that it only promotes products on Amazon and you get paid for referrals instead of clicks. Personally I’m impressed with how well targeted the Amazon ads are, but they’re quite poor in terms of monetization. People don’t seem to be clicking on them much, and so far only one visitor has ended up purchasing something. I don’t know why such well targeted ads can do so poorly. My hope is that it’s just specific to my site’s audience. Maybe they already got their web design book or whatever they need from Amazon and have no need to purchase more…

The surprising thing is that AdSense has so far only made a few more dollars than begging. Begging has gotten me $120 in the last 10 months, and the users have to scroll all the way down the web page and read the details to even know that I’m begging for $5. A straightforward calculation would say that I’ll lose half my revenue if I take out all the ads, but my sense is that a lot more users will feel more comfortable paying the $5 “fee” when they don’t see any ads on the page. Will that completely compensate for the loss of AdSense revenue? I don’t know. Maybe, maybe not, but I do feel that users’ contributions are a better “quality” income than advertising revenue. In fact, I think AdSense advertising is so bad on my site that I’m surprised anyone has clicked on the ads at all, even though somehow there were 1500 clicks. Of course, in the grand scheme of things, the financial impact is so little that it’s not worth strategizing over.
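For anyone who wants to compare against their own site, here is the back-of-the-envelope arithmetic behind those numbers in Python. It assumes the pageview total, the $120 and the 1500 clicks all cover the same period, and that every payer sent exactly $5; neither assumption is exact.

```python
pageviews = 205_000        # grand total to date
begging_revenue = 120.0    # dollars from the $5 "fee"
fee = 5.0
adsense_clicks = 1_500

# Number of paying users, if every payment was exactly $5 (an assumption).
payers = begging_revenue / fee

# Begging revenue per thousand pageviews.
begging_rpm = begging_revenue / pageviews * 1000

# Share of pageviews that led to a payment, and to an ad click.
pay_rate = payers / pageviews
click_through_rate = adsense_clicks / pageviews

print(f"payers: {payers:.0f}")
print(f"begging revenue per 1000 pageviews: ${begging_rpm:.2f}")
print(f"pageviews per payment: {1 / pay_rate:.0f}")
print(f"ad click-through rate: {click_through_rate:.2%}")
```

So roughly one pageview in 8,500 turns into a $5 payment, while about one in 140 turns into an ad click, and the two end up worth about the same.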

Anyways, that’s one data point from me. Anyone else want to share their experience?

September 3, 2007

Google AdSense enters affiliate marketing

Filed under: Advertising — chucklam @ 10:43 pm

I don’t remember hearing this anywhere else, so I was surprised when I checked the AdSense site to find the AdSense Referrals program had expanded from promoting just Google products (e.g. Google Apps, Google Pack, Firefox) to third-party products.

Most people think of AdSense as “AdSense for Content,” in which you let Google put ads on your site and you earn money on a pay-per-click model. Referrals (aka affiliate marketing) are very similar, except you earn money through a pay-per-action model. The “action” is determined by the advertiser, and it can mean an actual purchase, filling out a form, a software download, etc. The main strategic factor differentiating pay-per-click and pay-per-action is “conversion risk,” or the risk of whether someone who’s clicked on an ad will actually convert into a buyer (or take some other action). Right now the advertisers are taking that risk, and judging from the number of advertisers signing up for Google, it’s a risk worth taking. The bet with pay-per-action is that Google is in an even better position to take on such risk, as Google can aggregate and smooth out the uncertainty and leverage its informational advantage to actually reduce it.
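A small worked example shows why aggregation matters. The Python sketch below uses made-up numbers (a $0.50 click price and a 2% conversion rate) and assumes clicks convert independently at the same rate, which real traffic will not do. It computes the pay-per-action price that matches a given pay-per-click price, and how uncertain the number of conversions is relative to its expected value.

```python
import math


def break_even_cpa(cost_per_click, conversion_rate):
    """Price per action that costs the advertiser the same, on average, as paying per click."""
    return cost_per_click / conversion_rate


def relative_uncertainty(clicks, conversion_rate):
    """Standard deviation of the conversion count divided by its mean.

    Treats each click as an independent coin flip (a binomial model).
    """
    return math.sqrt((1 - conversion_rate) / (clicks * conversion_rate))


# Hypothetical numbers, for illustration only.
print(break_even_cpa(0.50, 0.02))
print(round(relative_uncertainty(500, 0.02), 3))      # one small advertiser
print(round(relative_uncertainty(500_000, 0.02), 4))  # pooled across a network
```

With 500 clicks the conversion count swings by roughly 30% around its mean, while across 500,000 pooled clicks the swing is about 1%. That gap is the sense in which a large network is better placed to carry conversion risk than any single advertiser.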

Of course, other factors will be involved in the success of the program as well. It’s not clear that publishers (i.e. site owners) would care much for the PPA model, although Google may only be testing PPA on AdSense first before moving it to AdWords on the main search engine site. (The truth is, any publisher who cares much about advertising income would’ve gone beyond AdSense a long time ago, but that’s for another post.) Advertisers may be hesitant to share so much information with Google, especially if Google is also their main source of traffic. The amount of work needed to integrate all the tracking/accounting is also non-trivial, which may turn off many advertisers, although here Google Checkout may eventually play a role.

The best known affiliate marketing networks are Amazon, eBay, and Commission Junction. I don’t think any of them will be too happy about this development from Google.

AdSense Referrals description

AdSense Referrals categories

August 7, 2007

Advertising’s digital future

Filed under: Advertising, Datamining, Personalization, Statistical experimentation — chucklam @ 12:23 pm

The New York Times yesterday had an article on advertising’s digital future. It mostly discussed the view of David W. Kenny, chairman and chief executive of Digitas, the advertising agency in Boston that was acquired by the Publicis Groupe for $1.3 billion six months ago.

The plan is to build a global digital ad network that uses offshore labor to create thousands of versions of ads. Then, using data about consumers and computer algorithms, the network will decide which advertising message to show at which moment to every person who turns on a computer, cellphone or — eventually — a television.

“Our intention with Digitas and Publicis is to build the global platform that everybody uses to match data with advertising messages,” Mr. Kenny said.

That is, advertising in the future will be much more data driven. Now, if we take that vision for granted, then the interesting question is: who will end up controlling what data? No doubt Mr. Kenny would love to see advertising agencies being the central gateway, if not the outright owner, of all such data. However, privacy advocates, media companies, new “intermediaries”, and search engines like Google all have different ideas about their ownership of data and their place in this advertising future. It’s too early to tell how things will turn out, and everyone is making educated guesses.

“How do we see Google, Yahoo and Microsoft? It’s important to see that our industry is changing and the borders are blurring, so it’s clear the three of those companies will have a huge share of revenues which will come from advertising,” said Maurice Lévy, chairman and chief executive of the Publicis Groupe.

“But they will have to make a choice between being a medium or being an ad agency, and I believe that their interest will be to be a medium,” he added. “We will partner with them as we do partner with CBS, ABC, Time Warner or any other media group.”

I wonder if Mr. Lévy has considered the possibility that in this digital future, Google may in fact be CBS, ABC, and Time Warner combined.
