Review of Rethinking Statistical Analysis Methods for CHI

I’m starting to slog through the thick pile of interesting-looking papers from CHI 2012. I wanted to start with the heavier stuff while I have the energy, so I began with a paper discussing statistical methods for HCI. It helped that the first author is Maurits Kaptein, a great guy I met while studying abroad at TU/e in the Netherlands.

The basic gist of the paper is that HCI researchers frequently get 3 things wrong:

  1. We wrongly interpret the p-value as the probability of the null hypothesis being true
  2. Our studies often lack statistical power to begin with, making it impossible to draw meaningful conclusions, especially when we fail to find a significant effect
  3. We confuse statistical significance (e.g., p-values) with practical significance (i.e., a difference that actually matters in the real world)

Good stuff, and I’m sure I’ve frequently been guilty of both #1 and #2 (I am usually fairly careful with #3). The authors don’t just point out the problems, but also give 7 suggestions for addressing them. The main point of this post is to critique these suggestions and perhaps add a few resources:

  1. Make bolder predictions of direction and magnitude of effects AND
  2. Predict the size of the effect likely to be found — both of these are easily said, but the problem is that HCI frequently bridges into areas uncharted by previous studies. We often just don’t know what the effects might be. Piloting is always useful, but it frequently leads to results that differ from the actual deployment, since most pilots are done with confederates (e.g., lab mates) to keep costs down. Even small differences in how a study is set up or positioned can lead to HUGE changes in a field deployment (for examples, see the “Into the Wild” paper from last year)
  3. Calculate the number of participants required ahead of time — every time I have done this, the number has WAY exceeded the resources I actually had (see the power-analysis sketch below, after this list), but Maurits predicted this objection and suggests…
  4. Team up with other researchers to do multi-site experiments and pool results — I agree with this suggestion, though I wonder how to structure such collaborations in a community like CHI, which (in my humble opinion) values novelty over rigor. Maurits also suggests that we use valid and appropriate measurement instruments so that we can build on each other’s work. I agree with this SO hard that I’ve actually gone through the process of validating a questionnaire for evaluating the emotional aspects of communication technologies. It’s called the ABCCT and it is freely available (the final publication for it is still under review, but I can provide it upon request).
  5. Use Bayesian analysis if you need to calculate the probability of the hypothesis given the data — this is great and it’s definitely new to me! To help others who are trying to learn this new way of doing stats, here are a few resources I’ve found online: (1) the section on Bayesian methods in statspages (a great resource in its own right), (2) a Bayesian t-test tutorial for R, and (3) an online calculator for Bayes factors. I’ve also put a small Bayes factor sketch below, after this list. I still need to figure out how to put all this together for the actual work that I do… What do I do with non-parametric data, for example? If somebody would write a step-by-step online tutorial for HCI researchers, I would give major kudos!
  6. Encourage researchers, reviewers, etc. to raise the standard of reporting statistical results — my translation is “reject papers that get it wrong,” which is depressing. I think this would be a lot easier to do in the new CSCW-ish model of reviewing, where there is a revision cycle. That way you can actually encourage people to learn it rather than just take their (otherwise interesting) work elsewhere.
  7. Interpret the non-standardized sizes of the estimated effect — with this I agree unequivocally, and I’d actually like to add one more point to this idea of “considering whether saving 1 minute is actually significant to anybody.” As HCI researchers, we are usually the ones designing the intervention, so we have a fairly good idea of how difficult it would be to incorporate into existing practices. For example, fiddling slightly with the rankings produced by a search algorithm, changing the layout of a site, or adding a new widget to an existing system is all fairly low effort, so even if the effect size of the produced outcome is small, it may be worth adopting. Moving your company to a new email system, changing the workflow of an existing organization, or getting new hardware adopted is really high effort, so it’s really only worth considering if the effect size of the produced outcome is quite large.
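
Since the sample-size problem in #3 comes up for me constantly, here is a minimal power-analysis sketch (in Python with statsmodels; the effect size, alpha, and power target are my own illustrative guesses, not numbers from the paper):

```python
from statsmodels.stats.power import TTestIndPower

# A-priori power analysis for a two-condition, between-subjects study.
# Assumed inputs: a "medium" effect (Cohen's d = 0.5), alpha = .05,
# and the conventional 80% power target.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n_per_group))  # roughly 64 participants per condition
```

Which is exactly the problem: roughly 64 participants per condition is far more than most of my studies can recruit, and a smaller expected effect pushes the number up even further.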

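For #5, here is my rough sketch of what (as far as I understand it) the Bayes factor calculator above computes: the JZS Bayes factor for an independent-samples t-test, following Rouder et al. (2009). The data are made up and I’m using the common default Cauchy prior scale; treat this as a learning aid rather than a validated implementation, and prefer the R tools linked above for real analyses.

```python
import numpy as np
from scipy import integrate, stats

def jzs_bayes_factor(t, n1, n2, r=np.sqrt(2) / 2):
    """JZS Bayes factor (BF10) for an independent-samples t-test,
    following my reading of Rouder et al. (2009). r is the Cauchy prior scale."""
    n_eff = n1 * n2 / (n1 + n2)  # effective sample size for two groups
    nu = n1 + n2 - 2             # degrees of freedom

    # Marginal likelihood under H1: integrate over the prior on g.
    def integrand(g):
        a = 1 + n_eff * (r ** 2) * g
        return (a ** -0.5
                * (1 + t ** 2 / (a * nu)) ** (-(nu + 1) / 2)
                * (2 * np.pi) ** -0.5 * g ** -1.5 * np.exp(-1 / (2 * g)))

    numerator, _ = integrate.quad(integrand, 0, np.inf)
    denominator = (1 + t ** 2 / nu) ** (-(nu + 1) / 2)  # likelihood under H0
    return numerator / denominator

# Made-up example data for two conditions (e.g., task times in minutes).
a = np.array([5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 5.2, 5.9])
b = np.array([4.2, 4.6, 4.1, 4.9, 4.4, 4.0, 4.7, 4.3])
t_stat, p_val = stats.ttest_ind(a, b)
bf10 = jzs_bayes_factor(t_stat, len(a), len(b))
print(f"t = {t_stat:.2f}, p = {p_val:.3f}, BF10 = {bf10:.1f}")
```

If I have this right, BF10 expresses how much more likely the data are under “there is an effect” than under “there is no effect,” which is much closer to the quantity we usually wish the p-value gave us.
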
All in all, I really like this paper and its suggestions, but just to cause some intrigue I would like to point to a slightly different discussion issue. The main goal of this paper seems to be to lead HCI researchers to doing better science. But is that really what we do? Do all HCI researchers consider themselves scientists? I know that for me it is not the most important part of my identity. I run studies as a designer. The goal for me is not to convince others that A is better than B (A and B so frequently shift and evolve that this is usually a meaningless comparison two years after the study is run). Rather, I run studies to understand what aspects of A may make it better than B in what situations, and what future directions may be promising (and unpromising) for design. To me, the study is just an expensive design method. The consequence of “getting it wrong,” in the worst case, is spending time exploring a design direction that in the end turns out to be less interesting. There’s rarely an actual optimal design to be found. It’s all just me poking at single points in a large 3D space of possibilities. Should you reject my paper because I didn’t get the number of participants right (which I never will), even if it can inspire others to move towards promising new designs? Just because I didn’t prove it doesn’t mean that there isn’t something interesting there anyway. Maybe a large proportion of HCI studies are meant to be sketches rather than masterpieces.

Family Tech Workshop @ CHI 2012

Yesterday, I got a chance to participate in the “Technology for Today’s Family” workshop at CHI 2012. Even though I arrived late (because I had to be at commencement the previous day), I felt like I got a lot out of it. Not only was it great to see some of the wonderful folks who do work in this space, but I also came away with plenty of ideas for things I want to try. These weren’t necessarily long discussions during the workshop, but here is what I jotted down:

  1. “Contrarian Design” — this is a technique for defamiliarization and eliciting design feedback. Instead of considering what might be a good design in the space, it focuses on eliciting bad designs. Looking at these ideas can point to issues of importance to families that might otherwise be difficult to elicit. For example, in the workshop Scott Mainwaring (Intel) role-played a grandmother and presented a contrarian design for a communication gadget: it had an inaccessible interface, presented meaningless shared information (Tweets), and required a service fee each time you used it. This is a fun exercise, and I think I’ll incorporate it into my interviewing practices in the future.
  2. Using Kickstarter as a way for quickly gauging interest in a design idea — just put the idea up on kickstarter.com to get the crowd to give you feedback on your design. In the future, I might put up two or three ideas simultaneously and let them battle it out!
  3. Disseminating ideas from a workshop — I think the difficulty every workshop faces is the question of “what next?” I’ve certainly been in my share of workshops where there was talk of special issues or books at the end, but this only rarely pans out. So, what is there between “see you around!” and “we’ll write a special issue!”? I thought we actually came up with some great ideas for keeping the momentum going. In the short term, each workshop member committed to posting about the workshop on their social media of choice. In the medium term, we will write a short article for interactions magazine discussing some of the biggest challenges in the space of designing for families. In the long term, we may pursue the more ambitious goal of a book and/or a special issue of a journal. We were particularly excited about giving this a more method-centered spin. But I’ll definitely be trying a similar structure for disseminating ideas at the next workshop I organize!