
Communications of the ACM

BLOG@CACM

In Support of Open Reviews; Better Teaching Through Large-Scale Data Mining



Bertrand Meyer "Fixing the Process of Computer Science Refereeing"

http://cacm.acm.org/blogs/blog-cacm/100030
October 20, 2010

At ECSS 2010, the annual meeting of Informatics Europe, we heard a fascinating keynote by Moshe Vardi, editor-in-chief of Communications of the ACM, titled "The Tragedy of the Computing-Research Commons." Professor Vardi talked about the importance of engaging in activities that benefit the community even if they bring no huge immediate reward to the individuals who participate in them. He lamented the degradation of the computer science culture due, among other causes, to shoddy refereeing practices.

Lamenting the reviewing process is common, and everyone has horror stories. Yet the simple solution is almost never considered: turn the default refereeing mode to open, rather than anonymous.


BERTRAND MEYER: "Refereeing should be what it was before science publication turned into a business: scientists giving their polite but frank opinion on the work of other scientists."


Some cases may still justify anonymity, but they should be the exception, calling for a specific justification. Refereeing should be what it was before science publication turned into a business: scientists giving their polite but frank opinion on the work of other scientists. Anonymity just encourages power games, backstabbing, and, worst of all, poor quality: Since no one can call your bluff, you are not encouraged to be a good referee. Of course, many people do an excellent job anyway, but they do not necessarily prevail. In the highly competitive world of computer science publications—conference publication, in particular, with its schedule pressures—one incompetent but arrogant negative review typically outweighs five flattering and carefully considered analyses.

By revealing who you are, you force yourself to write reviews that you can defend.

More than two decades ago I started refusing to do anonymous reviews. This stance has not brought me only new friends (which may not be a big deal, as I am not sure people who hate you because you found flaws in one of their papers are worth having as scientific friends), but it has certainly made me a better reviewer. In fact, it did bring me "some" friends—people who are grateful for having gained new insights, positive or negative, into their own work.

A more complete discussion and rationale can be found on this Web page, http://se.ethz.ch/~meyer/publications/online/whysign, to which I regularly refer editors asking for reviews. That text, written several years ago, is verbose and should be rewritten, but it does include the basic analysis.

The decision to perform open refereeing was personal and, until now, I have always refrained from proselytizing. Seeing the degradation in refereeing, however, I believe such reserve is no longer appropriate. Establishing open refereeing as the default strategy is the first step toward fixing the flawed culture of computer science refereeing.

Back to Top

Greg Linden "Massive-Scale Data Mining for Education"

http://cacm.acm.org/blogs/blog-cacm/101489
November 10, 2010

Let's say, in the near future, tens of millions of students start learning math using online computer software. Our logs fill with a massive new data stream, millions of students doing billions of exercises, as the students work.

In these logs, we will see some students struggle with some problems, then overcome them. Others will struggle with those same problems and fail. There will be paths of learning in the data, some of which quickly reach mastery, others of which go off into the weeds.

At Amazon.com a decade ago, we studied the trails people made as they moved through our Web site. We looked at the probability that people would click on links to go from one page to another. We watched the trails people took through our site and where they went astray. As people shopped, we learned how to make shopping easier for others in the future.


GREG LINDEN: "Let's say we have massive new logs of what these students are doing and how well they are doing. What would a big Internet company do with this data?"


Similarly, Google and Microsoft learn from people using Web search. When people find what they want, Google notices. When other people do that same search later, Google has learned from earlier searchers, and makes it easier for the new searchers to get where they want to go.


GREG LINDEN: "Teachers might think one concept should always be taught before another, but what if the data shows us different? What if we reorder the problems and students learn faster?"


Beyond a single search, the search giants watch what people look for over time as they do many searches—what they eventually find or whether they find nothing, where they navigate to after searching—and learn to push future searchers onto the more successful paths trod by those before them.

So, let's say we have millions of students learning math on computers. Let's say we have massive new logs of what these students are doing and how well they are doing. What would a big Internet company do with this data? What would be the Googley thing to do with these logs? What would massive-scale data mining look like for students?

We could learn that students who have difficulty solving one problem would have trouble with another. For example, perhaps students who have difficulty with the problem (3x − 7 = 3) have difficulty with (2x − 13 = 5).
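
To make that concrete, here is a minimal Python sketch, not from the post, that estimates how often students who miss one problem also miss another; the log schema, student IDs, and problem labels are all invented for illustration:

    from collections import defaultdict

    # Hypothetical attempt log: (student_id, problem_id, solved).
    attempts = [
        ("s1", "3x-7=3", False), ("s1", "2x-13=5", False),
        ("s2", "3x-7=3", False), ("s2", "2x-13=5", False),
        ("s3", "3x-7=3", True),  ("s3", "2x-13=5", True),
    ]

    failed = defaultdict(set)  # problem_id -> students who missed it
    for student, problem, solved in attempts:
        if not solved:
            failed[problem].add(student)

    def p_fail_given_fail(a, b):
        """Estimate P(student fails b | student fails a) from the logs."""
        if not failed[a]:
            return 0.0
        return len(failed[a] & failed[b]) / len(failed[a])

    print(p_fail_given_fail("3x-7=3", "2x-13=5"))  # 1.0 on this toy log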

We could then learn of clusters of problems that will be difficult for someone to solve if they have the same misunderstanding of an underlying concept. For example, perhaps many students who cannot solve (3x − 7 = 3) and similar problems are confused about how to move the −7 to the other side of the equation.
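
One hypothetical way to find such clusters: treat each problem's set of failing students as its signature and group problems whose signatures overlap heavily. The similarity measure (Jaccard) and the 0.5 threshold below are arbitrary choices for the sketch:

    from itertools import combinations

    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    def cluster_problems(failed, threshold=0.5):
        """Group problems whose sets of failing students overlap heavily,
        on the guess that they share one underlying misunderstanding."""
        parent = {p: p for p in failed}
        def find(p):  # union-find with path compression
            while parent[p] != p:
                parent[p] = parent[parent[p]]
                p = parent[p]
            return p
        for a, b in combinations(failed, 2):
            if jaccard(failed[a], failed[b]) >= threshold:
                parent[find(a)] = find(b)
        clusters = {}
        for p in failed:
            clusters.setdefault(find(p), []).append(p)
        return list(clusters.values())

    # Toy data: two equation problems missed by the same students, plus
    # an unrelated geometry problem missed by someone else.
    failed = {"3x-7=3": {"s1", "s2"}, "2x-13=5": {"s1", "s2"}, "area": {"s4"}}
    print(cluster_problems(failed))  # [['3x-7=3', '2x-13=5'], ['area']]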

Also, we could discover the problems in that cluster that are particularly likely to teach that concept well, to break students out of the misunderstanding and then be able to solve all the problems they previously found so difficult. For example, perhaps students who have difficulty with (3x − 7 = 3) and similar problems are usually able to solve that problem when presented first with the easier problems (x − 5 = 0) and (2x − 3 = 1).
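
Here is one rough way such "bridge" problems might be scored, again a sketch under invented assumptions: compare success on the hard problem between students who solved the easier problem earlier in their trail and students who never did:

    def unlock_score(history, easy, hard):
        """Compare success on `hard` between students who solved `easy`
        earlier in their trail and students who never did.
        history: student_id -> ordered list of (problem_id, solved)."""
        def rate(xs):
            return sum(xs) / len(xs) if xs else 0.0
        after_easy, without = [], []
        for trail in history.values():
            solved_easy = False
            for problem, solved in trail:
                if problem == easy and solved:
                    solved_easy = True
                elif problem == hard:
                    (after_easy if solved_easy else without).append(solved)
        return rate(after_easy) - rate(without)

    history = {
        "s1": [("x-5=0", True), ("3x-7=3", True)],
        "s2": [("3x-7=3", False)],
    }
    print(unlock_score(history, "x-5=0", "3x-7=3"))  # 1.0: a promising bridge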

Then we could learn paths through clusters of problems that are particularly effective and rapid for students. Teachers might think one concept should always be taught before another, but what if the data shows us different? What if we reorder the problems and students learn faster?
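
A crude version of that test is easy to state in code. The sketch below (my framing, not Linden's) compares the average number of exercises to mastery under the two orderings actually observed in the logs; the concept names and counts are made up:

    def mean_exercises_to_mastery(trails, first, second):
        """Average total exercises for students who met `first` before
        `second`.  trails: per-student ordered lists of
        (concept, exercises_until_mastered)."""
        totals = []
        for trail in trails:
            order = [concept for concept, _ in trail]
            if (first in order and second in order
                    and order.index(first) < order.index(second)):
                totals.append(sum(n for _, n in trail))
        return sum(totals) / len(totals) if totals else float("inf")

    trails = [
        [("fractions", 12), ("ratios", 5)],   # fractions taught first
        [("ratios", 20), ("fractions", 15)],  # ratios taught first
    ]
    print(mean_exercises_to_mastery(trails, "fractions", "ratios"))  # 17.0
    print(mean_exercises_to_mastery(trails, "ratios", "fractions"))  # 35.0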

We could even learn personalized and individualized paths for effective and rapid learning. Some students might start on a generic path, show early mastery, and jump ahead. Others might struggle with one type of problem or another. Each time a student struggles, we will try them on problems that might be a path for them to learn the underlying concepts and succeed. We will know these paths because so many others struggled before, some of whom found success.

As we experiment, as millions of students try different exercises, we forget the paths that consistently led to continued struggles, remember the ones that led to rapid mastery, and, as new students come in, we put them on the successful paths we have seen before.
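
One plausible mechanism for "remember the good paths, forget the bad ones" is a simple epsilon-greedy bandit over candidate problem sequences. This is my framing, not something the post specifies; the sequences below are invented:

    import random

    class PathSelector:
        """Keep paths that lead to mastery, let bad ones fade: an
        epsilon-greedy bandit over candidate problem sequences."""

        def __init__(self, paths, epsilon=0.1):
            self.stats = {p: [0, 0] for p in paths}  # path -> [masteries, trials]
            self.epsilon = epsilon

        def choose(self):
            # Mostly exploit the best observed path; occasionally explore
            # so a promising newcomer can prove itself.
            if random.random() < self.epsilon:
                return random.choice(list(self.stats))
            return max(self.stats,
                       key=lambda p: self.stats[p][0] / (self.stats[p][1] or 1))

        def record(self, path, mastered):
            self.stats[path][0] += int(mastered)
            self.stats[path][1] += 1

    selector = PathSelector([("x-5=0", "2x-3=1", "3x-7=3"),
                             ("3x-7=3", "x-5=0")])
    path = selector.choose()              # assign a new student this sequence
    selector.record(path, mastered=True)  # fold the outcome back in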

It would be student modeling on a heretofore unseen scale. From tens of millions of students, we automatically learn tens of thousands of models, little trails of success for future students to follow. We experiment, try different students on different problems, discover which exercises cause similar difficulties, and which exercises help students break out of those difficulties. We learn paths in the data and models of the students. We learn to teach.

Back to Top

Authors

Bertrand Meyer is a professor at ETH Zurich.

Greg Linden is the founder of Geeky Ventures.


©2011 ACM  0001-0782/11/1100  $10.00

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from [email protected] or fax (212) 869-0481.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2011 ACM, Inc.


Comments


CACM Administrator

The following letter was published in the Letters to the Editor in the February 2012 CACM (http://cacm.acm.org/magazines/2012/2/145403).
--CACM Administrator

I agree with Bertrand Meyer's blog "Fixing the Process of Computer Science Refereeing" (Nov. 2011) and "Why I Sign My Reviews" (http://se.ethz.ch/~meyer/publications/online/whysign/) in favor of open reviewing but suggest we go further with the quality of refereeing by rewarding reviewers and encouraging their contribution. Reviewing papers and grant proposals is part of academic life but receives no reward in the publish-or-perish culture, yet writing a good review requires thought and time. Perhaps Meyer's open reviewing would mean fewer reviews, as referees could be reluctant to take on reviewing work for fear of (inadvertently) writing a low-quality review.

A simple response is to credit attributed reviewers. Though their contribution is relatively limited, it is vital, and attaching their names to a published work is a way to acknowledge a referee's place in the scientific community.

Indeed, Meyer said "Even honest people will produce bad-quality reviews out of negligence, laziness or lack of time because they know they will not be challenged." Giving reviewers credit would be a carrot rather than a stick, assuming, of course, academic management recognizes the need for referees and rewards their effort.

Moreover, inexperienced academics must learn to review just as they learn other research practices. Editors and program committee chairs play a vital role in the process of challenging and guiding referees to produce better reviews. More important, a better quality of scientific debate would likely prevail.

Most, if not all, academics have received reviews that were constructive and helpful, though they cannot easily contact anonymous reviewers to continue the discussion. Open reviews enable that discussion, leading to more valuable work in the future.

Phil Brooke
Middlesbrough, U.K.


CACM Administrator

The following letter was published in the Letters to the Editor in the February 2012 CACM (http://cacm.acm.org/magazines/2012/2/145403).
--CACM Administrator

Bertrand Meyer's blog (Nov. 2011) argued passionately for non-anonymous reviews, an idea that may sound revolutionary to computer scientists, as though it proposed to change the very way science is done, but that is not radical at all in the context of science in general. Computer scientists with experience in interdisciplinary collaboration know that, in many areas of science, non-anonymous reviews are the norm. For example, among geologists, it is up to reviewers to disclose their names to authors, and about half the time they do. In spite of this non-anonymity, many reviews are still harsh, and, at least in good-quality journals, the quality of accepted papers is equally high.

Vladik Kreinovich
El Paso, TX

