How to Get Me to Positively Review Your CHI Paper

Getting a paper accepted to CHI can really be a pain if you’re just starting out. I’ve been there and I sympathize. I try to be positive and constructive in my reviews, but I frequently find myself pointing out the same issues over and over again. I’m going through 7 reviews for CHI right now, and there is a lot of good stuff in there, but also a couple of papers making rookie mistakes that make it hard to give a “4” or “5” score. Obviously, other reviewers might be looking for something else, but if you do all of the things below, I would really have no reason to reject your paper.

  • Introduction: keep it brief and to the point, but most importantly, don’t overreach in saying what you’ve done. If you say that your system actually helped kids learn something, I’m going to expect to see an evaluation that supports that claim and I’m not going to be able to accept a paper that only actually measured engagement or preference. So, frame your introduction in a way that sets the expectations right.
  • Related Work: give me an overview of what has been done by others, how your work builds on that, and why what you did is different. At the end of this section, I need to have a good idea of the open problem you are working on, the gap you are addressing, or the improvement you are making. I actually find it helpful to draft this section before I start the study. If somebody has already addressed the same problem, it’s nice to find out before you start rather than when you’re writing up your results (or worse, from your reviewers). One final note: if I see a paper that only references work from one lab or one conference, I get suspicious that it may be missing a lot of relevant work. I rarely reject a paper based on that alone, but it makes me much more cautious when reading the rest, and I sometimes do my own mini lit review if I suspect that big stuff is missing.
  • Methods: give me enough information to actually evaluate your study design, otherwise I have to assume the worst. For example, if you don’t say that you counterbalanced for order effects, I will assume that you didn’t. If you don’t say how you recruited participants, I will assume that they are all your friends from your lab. If you don’t say how you analyzed your qualitative data, I will assume that you just cherry-picked quotes. The rule of thumb is: can another researcher replicate the study from your description? I will never reject a paper for small mistakes (e.g., losing one participant’s video data, using a slightly inappropriate stat test, limitations of sampling, etc.) as long as it’s honest about what happened and how that affects the findings, but I have said “no” if I just can’t tell what the investigators did.
  • Results: I basically want to see that the results fulfill the promises made in the intro, contribute to the problem/gap outlined in the related work, and are reported in a way that is appropriate to the methods. I’m not looking for the results to be “surprising” (I know some other CHI reviewers are, however), but I do expect them to be rigorously supported by the data you present. The only other note on this section is that I’m not looking for a data dump: I probably don’t need to see the answer to every question you asked and every measure you collected. Stick to the stuff that actually contributes to the argument you are making (both confirming and disconfirming) and the problem at hand.
  • Discussion: this is the section where I feel a lot of papers fall short. I won’t usually reject for this reason alone if the rest of the paper is solid, but a good discussion can lead me to check that “best paper nomination” box. Basically, tell me why the community should care about this work. If you have interesting implications for design, that’s good, but it’s not necessary, and there’s nothing worse than implications for design that just restate the findings (e.g., finding: “Privacy is Important,” implication: “Consider Privacy”). When I look at an implication for design (as a designer), I want to get an idea of how I should apply your findings, not just that I should do so. Alternatively, I would like to hear how your investigation contributes to the ongoing threads of work within this community. Did you find out something new about privacy in this context that might be interesting, or that might offer a new way of thinking about privacy in HCI? Does this work open up interesting new research directions and considerations (not just “in the future, we will build this system”)? If you used an interesting/new/unusual method in your work, that could be another thing to reflect on in the discussion, because your approach could be useful to other investigators.

Okay, I have given away all of my reviewing secrets, because I don’t like rating good work down when the paper fails to present it with enough detail, consistency, or reflection. I hope that this is helpful to somebody out there! I say “yes” to a lot of papers already, but I’d like to be able to accept even more.

What next, Ubicomp?

Gregory Abowd was my Ph.D. advisor at Georgia Tech. Those of you who know him will not be surprised that he is sharing his opinions loudly and looking to start a debate with others in the community. His vision paper this year provides an interesting look back (and forward) at the Ubicomp conference, and he asks us “What next, Ubicomp?”

You can read the whole paper and join in the Facebook discussion, but here are the main points of his paper, for the lazy:

  • Ubicomp (the paradigm) is so accepted as part of all computing that it is no longer a meaningful way to categorize computing research
  • Ubicomp (the conference) has many successes to celebrate, including: popularizing “living labs” style investigations, the “your noise is my signal” intellectual nugget, and bringing together researchers from diverse disciplines
  • Ubicomp (the conference) values both “application agnostic” novel technologies and “application driven” investigations of established technologies. And that’s good!
  • Ubicomp (the paradigm) embodies the “3rd generation” of computing. The next generation may bring a blurring of the lines between the human and the computer through cloud, crowd, nano, and wearable technologies.

During the presentation, he was also a bit incendiary to generate discussion, saying that the bad news is that (1) defining Ubicomp as the 3rd generation means the generation might be over and we have to move on, (2) a lot of people don’t think of submitting their relevant work there, but rather put it in other places, and (3) Ubicomp as a research area is dead because ubiquitous computing is now, in fact, ubiquitous. The paper generated quite a bit of discussion at the conference. I want to add my two cents and (hopefully) put this idea out to a wider audience.

The main point that I want to make is that there is a distinction between Ubicomp the conference and Ubicomp the paradigm. I make this distinction explicit in the summary above, but the two were a bit muddled in the paper and in the discussion. Yes, Ubicomp the paradigm is becoming so commonplace that it may no longer be an interesting way to categorize one’s work, but Ubicomp the conference seems to mostly have papers that focus on a very specific brand of that paradigm. My long name for Ubicomp would be “enabling cool sensors and applying cool sensors in the wild” (with inversely varying degrees of “cool” and “wild”). Pretty much all of the papers in this year’s proceedings fall into this category. From an informal survey of my colleagues, this also seems to be the general perception of the kind of paper you might think about submitting to Ubicomp, and it may help explain why work that Gregory views as relevant to the paradigm doesn’t get submitted to the conference. Would trying to change the perception of Ubicomp the conference to include more of the stuff that is touched upon in Ubicomp the paradigm revitalize the conference? I argue that it wouldn’t (because the paradigm is becoming less relevant to research) and that Ubicomp the conference should take an alternate approach:

  1. Embrace the perception that has developed of it in the community and strive to do (or rather, continue doing) good work in enabling and understanding the use of sensors even after the low-hanging fruit are picked. Essentially, Ubicomp’s new name should be SensorComp (I’m not arguing for a formal change, but you get the idea).
  2. By any definition, the 4th generation of computing will build upon the 3rd and SensorComp can contribute to that by building ties with other communities who will find SensorComp’s work relevant, including communities focusing on applications (e.g., health), technologies (e.g., wearables), and paradigms (e.g., social computing). I especially like the idea of collocating with relevant conferences once in a while.

So, rather than arguing that the conference should change or that it should attract a different kind of work (good luck, cat herder!), I say that the conference should embrace what it does well, become THE place to publish that sort of work, and be really in-your-face about it to other communities who would find that sort of work useful. Gregory says that Ubicomp is dead. I say long live SensorComp!

Now, please tear my ideas to bits, esteemed colleagues.

Review of Rethinking Statistical Analysis Methods for CHI

I’m starting to slog through the thick pile of papers that look interesting from CHI 2012. I wanted to start with the heavier stuff while I have the energy, so I began by looking at a paper discussing statistical methods for HCI. It helped that the first author was Maurits Kaptein, who is a great guy I met while studying abroad at TU/e in the Netherlands.

The basic gist of the paper is that HCI researchers frequently get 3 things wrong:

  1. We wrongly interpret the p-value as the probability of the null hypothesis being true
  2. Our studies often lack statistical power to begin with, making it impossible to draw meaningful conclusions when we fail to reject the null hypothesis (the short simulation after this list illustrates both of these first two points)
  3. We confuse statistical significance (e.g., p-values) with practical significance (i.e., a difference that is actually meaningful in the real world)

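To make points 1 and 2 concrete, here is a minimal simulation sketch in Python. All the numbers in it are assumptions for illustration (12 participants per group, a true effect of d = 0.5 present in half the experiments), not anything from the paper, but it shows two things: among the experiments that come out “significant,” the null hypothesis is still true far more often than the 5% threshold suggests, and real effects are detected only a small fraction of the time at this sample size.

```python
# Minimal simulation of many small two-group studies (illustrative numbers only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 20_000
n_per_group = 12        # assumed: a typical small HCI study
true_effect = 0.5       # assumed: Cohen's d when a real effect exists
prob_null_true = 0.5    # assumed: half of all tested hypotheses are truly null

p_values = np.empty(n_experiments)
null_was_true = rng.random(n_experiments) < prob_null_true

for i in range(n_experiments):
    a = rng.normal(0.0, 1.0, n_per_group)
    shift = 0.0 if null_was_true[i] else true_effect
    b = rng.normal(shift, 1.0, n_per_group)
    p_values[i] = stats.ttest_ind(a, b).pvalue

significant = p_values < 0.05
power = significant[~null_was_true].mean()          # how often real effects are found
fdr = null_was_true[significant].mean()             # how often "significant" means nothing
print(f"Power with {n_per_group} per group: {power:.2f}")
print(f"Share of 'significant' results where H0 was true: {fdr:.2f}")
```
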
Good stuff, and I’m sure I’ve been guilty of both #1 and #2 frequently (I am usually fairly careful with #3). The authors don’t just point out the problems, but also give 7 suggestions for addressing them. The main point of this post is to critique these suggestions and perhaps give a few resources:

  1. Make bolder predictions of direction and magnitude of effects AND
  2. Predict the size of the effect likely to be found — both of these are easily said, but the problem is that HCI frequently ventures into areas uncharted by previous studies. We often just don’t know what the effects might be. Piloting is always useful, but it frequently produces results that differ from the deployment, as most pilots are run with confederates (e.g., lab mates) to keep costs down. Even small differences in how a study is set up or positioned can lead to HUGE changes in a field deployment (for examples, see the “Into the Wild” paper from last year)
  3. Calculate the number of participants required ahead of time (a quick sketch of such a calculation appears after this list) — every time I have done this, the number has WAY exceeded what I actually had the resources for, but Maurits predicted this objection and suggests…
  4. Team up with other researchers to do multi-site experiments and pool results — I agree with this suggestion, though I wonder how to structure such collaborations in a community like CHI, which (in my humble opinion) values novelty over rigor. Maurits also suggests that we use valid and appropriate measurement instruments so that we can build on each others’ work. I agree with this SO hard that I’ve actually gone through the process of validating a questionnaire to use in evaluating the emotional aspects of communication technologies. It’s called the ABCCT and it is freely available (the final publication for it is still under review, but I can provide it upon request).
  5. Use Bayesian analysis if you need to calculate the probability of the hypothesis given the data — this is great and it’s definitely new to me! To help others who are trying to learn this new way of doing stats, here are a few resources I’ve found online: (1) the section on Bayesian methods in statspages (a great resource in its own right), (2) a Bayesian t-test tutorial for R, and (3) an online calculator for Bayes Factor. I still need to figure out how to put all this stuff together for the actual work that I do… What do I do with non-parametric data, for example? If somebody would write a step-by-step online tutorial for HCI researchers, I would give major kudos! (As a small start, I’ve sketched a rough Bayes-factor calculation after this list.)
  6. Encourage researchers, reviewers, etc. to raise the standard of reporting statistical results — my translation is “reject papers that get it wrong,” which is depressing. I think this would be a lot easier to do in the new CSCW-ish model of reviewing, where you have a revise cycle. That way you can actually encourage people to learn it rather than just take their (otherwise interesting) work elsewhere
  7. Interpret the non-standardized sizes of the estimated effect — with this I agree unequivocally, and I’d actually like to add one more point to this idea of “considering if saving 1 minute is actually significant to anybody.” As HCI researchers, we are usually the ones designing the intervention, so we have a fairly good idea of how difficult it would be to incorporate into existing practices. For example, fiddling slightly with the rankings produced by a search algorithm, changing the layout of a site, or adding a new widget to an existing system are all fairly low effort, so even if the effect size of the produced outcome is small, it may be worth adopting. Moving your company to a new email system, changing the workflow of an existing organization, or getting new hardware adopted are all really high effort, so those changes are only worth considering if the effect size of the produced outcome is quite large.
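
On the number-of-participants point (#3), here is the kind of a-priori power calculation the paper is asking for, sketched in Python with statsmodels. The effect size, alpha, and target power below are illustrative assumptions, not values from the paper.

```python
# A-priori sample size for a between-subjects t-test (illustrative values).
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,          # assumed Cohen's d (a "medium" effect)
    alpha=0.05,
    power=0.8,
    alternative='two-sided',
)
print(f"Participants needed per group: {n_per_group:.1f}")
```

With a medium effect assumed, this lands at roughly 64 participants per group, which is exactly the “WAY more than I have resources for” problem described in point 3.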

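On the Bayesian point (#5), since I haven’t found that step-by-step tutorial, here is one rough way to get at “evidence for the null vs. the alternative” for a simple two-group comparison: the BIC approximation to the Bayes factor. This is only a sketch under simplifying assumptions (roughly normal data, equal variances, and the default priors implied by BIC), the data below are hypothetical, and a full Bayesian analysis with explicit priors would still be the better answer to the paper’s suggestion.

```python
# BIC approximation to the Bayes factor for "no mean difference" (BF01).
# Sketch only: assumes roughly normal data and equal variances in both groups.
import numpy as np
from scipy import stats

def bic_bayes_factor_01(a, b):
    """Approximate BF01: evidence for a common mean vs. separate group means."""
    data = np.concatenate([a, b])
    n = data.size

    # H0: one common mean and one sd (2 free parameters).
    ll0 = stats.norm.logpdf(data, loc=data.mean(), scale=data.std()).sum()
    bic0 = -2 * ll0 + 2 * np.log(n)

    # H1: separate group means, shared sd (3 free parameters).
    resid = np.concatenate([a - a.mean(), b - b.mean()])
    sd1 = np.sqrt((resid ** 2).mean())
    ll1 = (stats.norm.logpdf(a, loc=a.mean(), scale=sd1).sum()
           + stats.norm.logpdf(b, loc=b.mean(), scale=sd1).sum())
    bic1 = -2 * ll1 + 3 * np.log(n)

    # BF01 ~ exp((BIC1 - BIC0) / 2); values > 1 favor H0, values < 1 favor H1.
    return float(np.exp((bic1 - bic0) / 2))

# Hypothetical task-completion times (seconds) for two interface conditions.
condition_a = np.array([41.0, 38.0, 45.0, 50.0, 39.0, 43.0, 47.0, 44.0])
condition_b = np.array([36.0, 40.0, 34.0, 38.0, 41.0, 33.0, 37.0, 35.0])
print(f"BF01 = {bic_bayes_factor_01(condition_a, condition_b):.3f}")
```

It doesn’t solve my non-parametric question, but it at least reports evidence in the direction the paper recommends: how much the data support one hypothesis over the other, rather than a p-value.
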
All in all, I really like this paper and its suggestions, but just to cause some intrigue I would like to point to a slightly different discussion issue. The main goal of this paper seems to be to lead HCI researchers toward doing better science. But is that really what we do? Do all HCI researchers consider themselves scientists? I know that for me it is not the most important part of my identity. I run studies as a designer. The goal for me is not to convince others that A is better than B (A and B shift and evolve so frequently that the comparison is usually meaningless 2 years after the study is run). Rather, I run studies to understand what aspects of A may make it better than B in what situations, and what future directions may be promising (and unpromising) for design. To me, the study is just an expensive design method. The consequence of “getting it wrong,” in the worst case, is spending time exploring a design direction that in the end turns out to be less interesting. There’s rarely an actual optimal design to be found. It’s all just me poking at single points in a large 3D space of possibilities. Should you reject my paper because I didn’t get the number of participants right (which I never will) even if it can inspire others to move toward promising new designs? Just because I didn’t prove it doesn’t mean that there isn’t something interesting there anyway. Maybe a large proportion of HCI studies are meant to be sketches rather than masterpieces.