Machine learning is hard. It can be awfully tempting to try to skip the work. Can't we just download a machine learning package? Do we really need to understand what we are doing?
It is true that off-the-shelf algorithms are a fast way to get going and experiment. Just plug in your data and go.
The problem comes only if development stops there. By understanding the peculiarities of your data and what people want and need on your site, and by experimenting and learning, you can likely outperform a generic system.
A great example of how understanding the peculiarities of your data can help came out of the Netflix Prize. Progress on the $1 million prize largely stalled until Gavin Potter discovered peculiarities in the data, including that people interpret the rating scale differently.
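One simple way to account for people interpreting the rating scale differently is to measure how far each user's average rating sits from the overall average, then subtract that personal offset. The sketch below is a minimal illustration of that general idea, not a reconstruction of Potter's actual technique. The `(user, movie, stars)` tuple format and the function names are hypothetical.

```python
from collections import defaultdict

def user_offsets(ratings):
    """Return the global mean rating and each user's offset from it.

    ratings: list of (user, movie, stars) tuples (hypothetical format).
    """
    global_mean = sum(stars for _, _, stars in ratings) / len(ratings)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for user, _, stars in ratings:
        totals[user] += stars
        counts[user] += 1
    # A positive offset marks a generous rater, a negative one a harsh rater.
    offsets = {u: totals[u] / counts[u] - global_mean for u in totals}
    return global_mean, offsets

def normalize(ratings):
    """Remove each user's personal offset so all ratings share one scale."""
    _, offsets = user_offsets(ratings)
    return [(user, movie, stars - offsets[user]) for user, movie, stars in ratings]

if __name__ == "__main__":
    ratings = [("alice", "A", 5), ("alice", "B", 4), ("bob", "A", 3), ("bob", "B", 2)]
    print(normalize(ratings))
    # [('alice', 'A', 4.0), ('alice', 'B', 3.0), ('bob', 'A', 4.0), ('bob', 'B', 3.0)]
```

Here a generous rater and a harsh rater who both prefer movie A by one star end up with identical normalized ratings, which is what lets a model compare them. Real systems usually also shrink the offsets of users with few ratings toward zero; this sketch omits that.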
More recently, Yehuda Koren found additional gains by supplementing the models to allow for temporal effects, such as that people tend to rate older movies higher, that movies rated together in a short time window tend to be more related, and that people over time might start rating all the movies they see higher or lower.
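The last effect, where a person starts rating everything higher or lower over time, can be captured crudely by estimating each user's offset separately for each time window instead of once for all time. This is a minimal sketch, assuming hypothetical `(user, movie, stars, day)` tuples and a hypothetical 30-day window; Koren's published models are considerably more refined, and this is not his formulation.

```python
from collections import defaultdict

def windowed_user_offsets(ratings, window_days=30):
    """Estimate each user's rating offset separately per time window.

    ratings: list of (user, movie, stars, day) tuples (hypothetical format).
    window_days: hypothetical bucket width; ratings in one bucket share an offset.
    Returns {(user, window_index): offset from the global mean}.
    """
    global_mean = sum(r[2] for r in ratings) / len(ratings)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for user, _, stars, day in ratings:
        key = (user, day // window_days)  # which time window this rating falls in
        totals[key] += stars
        counts[key] += 1
    return {key: totals[key] / counts[key] - global_mean for key in totals}

if __name__ == "__main__":
    # A user who rates harshly at first and generously a month later.
    ratings = [("carol", "A", 3, 0), ("carol", "B", 3, 10),
               ("carol", "C", 5, 40), ("carol", "D", 5, 45)]
    print(windowed_user_offsets(ratings))
    # {('carol', 0): -1.0, ('carol', 1): 1.0}
```

A single all-time offset for this user would be zero and would hide the drift entirely; per-window offsets expose it.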
In both cases, looking closely at the data, better understanding how people behave, and then adapting the models yielded substantial gains. Combined with other work, that was enough to win the million-dollar prize.
The Netflix Prize followed a pattern you often see when people try to implement a feature that requires machine learning. Most of the early attempts threw off-the-shelf algorithms at the data, yielding something that works, but not with particularly impressive results.
Without a clear metric for success and a way to test against that metric, development usually stops there. The Netflix Prize, however, had both, much as Google and Amazon do with their ubiquitous A/B testing.
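The Netflix Prize's metric was root mean squared error (RMSE) between predicted and actual ratings on withheld data, and the grand prize required a 10% improvement over Netflix's own Cinematch system. A minimal sketch of that kind of test follows. The function names are hypothetical, and the scores in the usage example are made up for illustration, not actual contest results.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

def improvement(baseline_rmse, candidate_rmse):
    """Fractional RMSE reduction relative to a baseline (0.10 means 10%)."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse

if __name__ == "__main__":
    # Made-up scores for illustration only.
    baseline, candidate = 1.00, 0.89
    print(improvement(baseline, candidate) >= 0.10)  # True
```

Because every team was scored the same way on the same withheld ratings, any idea could be checked objectively against every other.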
There are many lessons to take from the Netflix contest, but a big one is the importance of constant experimentation and learning. By pitting algorithms against each other, looking carefully at the data, thinking about what people want and why they do what they do, and testing and experimenting continuously, you can reap big gains.
I just spent the afternoon working with teenagers at some of our summer school workshops. As luck would have it, we had two different sessions running on the same afternoon, and while I was galloping between labs, it occurred to me that some interesting things were going on. First, a bit about the workshops: both summer schools were for 17- and 18-year-olds, both were set up to encourage young people to study computer science, and both involved building virtual worlds. One of the workshops, on making computer games using the Neverwinter Nights 2 toolset, lasted for just two hours; the other was the final presentation session of an eight-week project on Second Life programming. Both went very well from the point of view of introducing young people to the fun aspects of computer science. Whether they pay off in terms of recruiting people to our CS degree courses remains to be seen. But you have to start somewhere, right? Here are some things I noticed that might be useful to others who are interested in schools' outreach and recruitment.
Rather than pushing our agenda of what we think is important and lecturing young people that they ought to find it interesting, we need to meet them halfway. We need to start from their interests, and then help them to see how computer science knowledge can help them achieve something that appeals to them. As in "You're interested in alcohol and The Simpsons. Ideal. How about you make a 3D Homer Simpson whose arm can move up and down to drink beer?" At that point you can start explaining the programming and math concepts needed to do the rotation in 3D space. Or even just admire what they have figured out by themselves. Once you have them hooked on programming or signed up for your degree program, you can build on it. I'm not saying we don't need to teach sober, serious, and worthy aspects of computer science. Of course we do. I'm just saying we don't need to push it immediately. It's kind of like when you have a new boyfriend and you know you have to introduce him to your weird family. Do you take him to meet the mad uncle with the scary eyebrows straight off? No, you introduce him to a friendly cousin who will make him feel at home and has something in common with him.
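The "rotation in 3D space" behind a beer-drinking arm comes down to translating the arm so the shoulder sits at the origin, applying a rotation matrix, and translating back. A minimal sketch in plain Python, assuming hypothetical shoulder and hand coordinates and a rotation about an axis parallel to x; the virtual-world toolsets mentioned above have their own rotation types, which this does not use.

```python
import math

def rotate_about_x(point, pivot, angle_deg):
    """Rotate a 3D point by angle_deg around an axis parallel to x through pivot."""
    theta = math.radians(angle_deg)
    # Shift coordinates so the pivot (the shoulder joint) is at the origin.
    x, y, z = (p - c for p, c in zip(point, pivot))
    # Apply the standard x-axis rotation matrix; the x coordinate is unchanged.
    y_rot = y * math.cos(theta) - z * math.sin(theta)
    z_rot = y * math.sin(theta) + z * math.cos(theta)
    # Shift back into world coordinates.
    return (x + pivot[0], y_rot + pivot[1], z_rot + pivot[2])

if __name__ == "__main__":
    shoulder = (0.0, 0.0, 1.5)  # hypothetical joint position
    hand = (0.0, 0.0, 0.9)      # arm hanging straight down, 0.6 units long
    # Swing the arm up toward the mouth and back down in 30-degree steps.
    for angle in (0, 30, 60, 90, 120, 150, 120, 90, 60, 30, 0):
        x, y, z = rotate_about_x(hand, shoulder, angle)
        print(f"{angle:3d} deg -> hand at ({x:.2f}, {y:.2f}, {z:.2f})")
```

Rotating about the shoulder rather than the world origin is the key step. Because rotation preserves distances, the hand always stays 0.6 units from the shoulder, so the arm never stretches.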
What I'm suggesting is not new: there are pockets of excellent outreach work with kids in various parts of the world. I think it's time we tried more of it, even though it is time-consuming. After all, we know we can recruit hardcore computer scientists to our degree programs with our current tactics (you know, the people who are born with silver Linux kernels in their mouths). But given that there aren't many of them, it's well worth the effort to reach out to the normal population. Unleash the inner computer scientist in everyone!
Mobile phones are a way of life in Japan, and this aspect of the culture manifests itself in many ways. Among the more remarkable are the ubiquitous quick response (QR) codes that adorn a sizable percentage of billboards, magazines, and other printed media. In brief, these two-dimensional bar codes offer camera phones with the appropriate software an opportunity to connect with Web-based resources relating to the product or service featured in an advertisement. Encoding a maximum of 4,296 alphanumeric characters, or 1,817 kanji, QR codes are a forerunner of ubiquitous computing technology and portend great things to come.
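Those capacity figures follow from the bit budget of the largest QR symbol, version 40 at the lowest error-correction level (L), which holds 2,956 data codewords of 8 bits each. A back-of-the-envelope check, assuming the field sizes the QR code specification (ISO/IEC 18004) gives for versions 27 to 40: a 4-bit mode indicator, then a character count of 13 bits (alphanumeric) or 12 bits (kanji), then the data, with alphanumeric mode packing two characters into 11 bits (6 bits for a leftover single character) and kanji mode using 13 bits per character. The function names are hypothetical.

```python
# Data capacity of a version 40 symbol at error-correction level L, in bits.
DATA_BITS_V40_L = 2956 * 8

MODE_INDICATOR_BITS = 4

def alphanumeric_bits(n):
    """Bits to encode n alphanumeric characters (13-bit count field, versions 27-40)."""
    return MODE_INDICATOR_BITS + 13 + 11 * (n // 2) + 6 * (n % 2)

def kanji_bits(n):
    """Bits to encode n kanji characters (12-bit count field, versions 27-40)."""
    return MODE_INDICATOR_BITS + 12 + 13 * n

def max_chars(bits_needed, budget):
    """Largest n whose encoding still fits within the bit budget."""
    n = 0
    while bits_needed(n + 1) <= budget:
        n += 1
    return n

if __name__ == "__main__":
    print(max_chars(alphanumeric_bits, DATA_BITS_V40_L))  # 4296
    print(max_chars(kanji_bits, DATA_BITS_V40_L))         # 1817
```

Both results match the quoted maximums. The sketch ignores the terminator and padding bits, which the specification allows to be shortened or omitted when the data already fills the symbol.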
What's remarkable to me is how widely American and Japanese urban cultures can diverge, for all our similarities. The market penetration numbers aren't strikingly different: a March 2008 study showed that more than 84% of Americans own a cell phone, while a Wolfram Alpha query shows that 83% of Japanese own one. The differences in practice, however, could not be more pronounced. In terms of mobile phone use, walking the streets of Japan is like being on a college campus all the time. It's not unreasonable to estimate that every fifth person is interacting in some way with a mobile device, and here's the rub: Americans make calls on their phones; the Japanese interact.
Ubiquitous Web access and widespread support for the mobile platform, in addition to vastly increased data-transfer capabilities, mean Japan is a society in which cell phones are a practical mobile computing platform. QR codes have blossomed in this culture not only because they're immensely useful to both organizations and consumers, but because the cultural soil is ripe for their adoption. QR codes have met with a lukewarm response in the U.S., and I fear they may be yet another mobile technology to which we get hip three to five years behind the curve.
Irrespective of this, the applications of QR codes in Japan are at times astounding. For many high-dollar corporations, such as Louis Vuitton and Coca-Cola, the QR code is the ad (art?) itself. Often the QR code is the actual content: it may be made of something unexpected or even serve as a medium for digital activism. Because of its robust digital format, creative marketers have a lot of wiggle room when it comes to creating eye-catching, market-driven applications of this technology and, as with ubiquitous translation technology, it's the widespread use of Internet-enabled phones that underlies this technological paradigm shift.
©2009 ACM 0001-0782/09/1000 $10.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.