By David A. Ross, MD, PhD, assistant professor, Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut
“I felt the inspiration for the method (college hockey rankings?) to be poorly attributed.” – Reviewer B
Our reviewer was asking the same question that everyone asks about this research: “Seriously? Where did you get this idea from?”
The answer: Seriously. I got it from watching hockey.
In late winter and early spring of each year, I spend inordinate amounts of time thinking about two things: recruiting the next class of residents and how well our hockey team will perform in the post-season. I wish I could say that I neatly compartmentalize my work, but the truth is that I spend plenty of time at the rink mulling over residency issues.
This conflation of interests reached new heights in 2010-2011. It was my second year as Associate Program Director for our residency, we had recently discovered the “Bob Effect”, and we were struggling to figure out how to fix the system. Briefly, the “Bob Effect” involves two faculty members who share a first name. One is temperamentally cheery and tends to rate each applicant he sees a couple of points higher than the other faculty who assess that applicant. The other is a bit more… “down to earth”… and tends to rate applicants a couple of points lower than the other faculty who assess them. Each applicant is seen by 4 faculty members. Because of this temperamental difference, simply substituting one Bob for the other on an applicant’s interview schedule, a change that has nothing to do with the applicant him- or herself, could shift that applicant more than 40 places on our rank list. (We discuss the Bob Effect in more detail in our recent Academic Medicine report.)
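To make the arithmetic concrete, here is a toy model in Python. The numbers are hypothetical, not our data: it assumes each interviewer reports the applicant’s “true” score plus a fixed personal offset (cheery Bob +2, down-to-earth Bob -2, everyone else 0), and it invents a pool of 100 other applicants whose mean scores sit 0.02 points apart. With four interviewers per applicant, swapping one Bob for the other moves the applicant’s mean by (2 - (-2)) / 4 = 1 point.

```python
# Toy model of the "Bob Effect" (hypothetical numbers, not our data):
# each interviewer reports the applicant's true score plus a fixed personal offset.

def mean_score(true_score: float, rater_offsets: list[float]) -> float:
    """Mean of the scores an applicant receives from his or her interviewers."""
    return sum(true_score + offset for offset in rater_offsets) / len(rater_offsets)


def rank_position(score: float, pool: list[float]) -> int:
    """1-based rank of `score` against a pool of other applicants (higher is better)."""
    return 1 + sum(1 for other in pool if other > score)


# Assumed offsets: cheery Bob +2, down-to-earth Bob -2, the other three interviewers 0.
CHEERY_BOB, DOWN_TO_EARTH_BOB = 2.0, -2.0

# The same applicant (true score 7.0) seen by 4 interviewers; only the Bob differs.
with_cheery_bob = mean_score(7.0, [CHEERY_BOB, 0.0, 0.0, 0.0])
with_down_to_earth_bob = mean_score(7.0, [DOWN_TO_EARTH_BOB, 0.0, 0.0, 0.0])

# Hypothetical pool of 100 other applicants with means 6.005, 6.025, ..., 7.985.
pool = [6.005 + 0.02 * i for i in range(100)]

print(with_cheery_bob, rank_position(with_cheery_bob, pool))                # 7.5 26
print(with_down_to_earth_bob, rank_position(with_down_to_earth_bob, pool))  # 6.5 76
```

The 50-place jump in this sketch comes entirely from the assumed spacing of the pool; the general point is that one interviewer’s fixed offset moves the mean by a full point, which matters most when applicants’ scores are tightly clustered.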
Meanwhile, Yale’s hockey team spent most of the winter ranked #1 in the country. As the NCAA tournament approached, I was fixated on the question of how favorable a seed we would get, and I spent lots of time online studying the methods used to determine the bracket. (It would be disingenuous if at this point I did not acknowledge the extent to which my hockey obsession may have reciprocally intruded on my regular work hours.)
Studying these problems side by side, I realized that an elegant solution to our applicant ranking problem already existed. Over the next two years we gradually refined our efforts until they morphed into the ROSS-MOORE system that we describe in the paper*. Our hockey team did clinch the #1 overall seed for the 2011 tournament and, with it, what was ostensibly the easiest bracket, but they were eliminated in the second round (due to egregious malfeasance by the referees).
The 2012-2013 season was a different story. We had a younger group of players with far less experience and less flashy “star power”. Everyone regarded the season as a “rebuilding year” except for the players themselves, who rallied around a maxim from the Greek philosopher Heraclitus: “character is destiny”. They lost the last two games of the conference playoffs and secured an NCAA tournament bid only after Michigan lost in the very last game of the season. Yale was the #15 seed (out of 16 teams) and had a frighteningly difficult draw. No one outside of New Haven (and probably no one inside of New Haven) expected what happened next: the team played the four best games in the history of the program, defeating, in order, the #2, #7, #3, and #1 ranked teams in the country on their way to winning Yale’s first national championship.
The Future: Evidence-Based Recruitment
Our focus on increasing rank list accuracy is part of a broader initiative to optimize recruitment outcomes. By understanding the system more fully, we hope to make our model more sophisticated, to keep refining our algorithm, and, more generally, to apply the scientific method to ongoing inquiry. Because we have built a system that lets us run experiments on different aspects of the process, we can now use empirical data to practice Evidence-Based Recruitment.
Last year, we conducted a beta test of our new approach with a psychiatry program at another major university. They submitted de-identified data from their previous year’s recruitment. We analyzed the noise and bias contained in that data and recommended changes to their interview structure and process to improve the overall quality of their data. Their faculty continued to assess applicants using a traditional scale; faculty were also asked to submit an ordinal rank list of all the applicants they saw. We provided the program with a traditional rank list and a list based on our new algorithm, and we encouraged their rank committee to consider both lists as they determined their final rank order list (ROL).
Qualitatively, the program reported being extremely pleased with the overall process and felt that using the ROSS-MOORE list improved the quality and ease of their deliberations. Quantitatively, we assessed the similarity among the traditional rank list, the MOORE list, and the final ROL, and we found that our new algorithm predicted the final rank better than the traditional list did. Though this is only a first step, we are highly encouraged by this experience.
In discussing our work with colleagues over the past year, we have found strong demand among programs to improve the quality of their recruitment and ranking processes. Accordingly, this year we are conducting a multi-site study that will compare different ranking methods across a wide variety of residency programs. If you and your program are interested in participating or learning more about our work, please feel free to contact me directly (email@example.com).
Best wishes to all for this year’s Match!
*Various individuals have suggested to us, at times with an implicitly disparaging tone, that our choice of the PairWise Rankings (PWR), the system used to select and seed the NCAA hockey tournament, as the basis for our new system is somewhat arbitrary: there are many other approaches to ranking (ordinal and otherwise), so why choose this one?
I consider the choice less an arbitrary one than a serendipitous discovery. It is now common in many fields of applied science to turn to biological models for inspiration (e.g., super-strength adhesives based on the microstructure of gecko feet). These approaches are predicated on the idea that, over time, evolutionary pressures will create a product that is ideally “designed” for its environment.
The pairwise ranking system is also the result of an intense evolutionary process: the time scale may be brief by biological standards, but the system is the byproduct of many professional statisticians working for years under enormous financial and institutional pressures. The result is something that is very well designed for its niche.
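For readers unfamiliar with the pairwise idea, a minimal sketch in Python follows. It is a generic illustration of ranking by pairwise comparisons, not the ROSS-MOORE algorithm described in the paper and not the NCAA’s exact criteria. It assumes each interviewer submits an ordinal list of the applicants he or she saw (best first), treats every pair of applicants as a head-to-head “game” decided by the interviewers who saw both, and ranks applicants by comparisons won. The function name `pairwise_rank` and all interviewer and applicant names are hypothetical.

```python
from itertools import combinations


def pairwise_rank(rank_lists: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Rank applicants by the number of pairwise comparisons they win.

    `rank_lists` maps each interviewer to an ordinal list of the applicants
    that interviewer saw, best first (an assumed input format).
    """
    applicants = sorted({a for ranked in rank_lists.values() for a in ranked})
    wins = {a: 0 for a in applicants}

    for a, b in combinations(applicants, 2):
        votes_a = votes_b = 0
        for ranked in rank_lists.values():
            # Only an interviewer who saw both applicants can compare them.
            if a in ranked and b in ranked:
                if ranked.index(a) < ranked.index(b):
                    votes_a += 1
                else:
                    votes_b += 1
        # The applicant preferred by more interviewers wins the comparison;
        # ties and never-compared pairs award nothing.
        if votes_a > votes_b:
            wins[a] += 1
        elif votes_b > votes_a:
            wins[b] += 1

    # Most comparisons won first; break ties alphabetically for a stable order.
    return sorted(wins.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical interviewers and applicants, for illustration only.
rank_lists = {
    "cheery_bob": ["Ana", "Ben", "Cal"],
    "down_to_earth_bob": ["Ana", "Cal", "Dee"],
    "third_interviewer": ["Ben", "Dee", "Cal"],
}
result = pairwise_rank(rank_lists)
print(result)  # [('Ana', 3), ('Ben', 2), ('Cal', 0), ('Dee', 0)]
```

Because only the order within each interviewer’s list matters, a constant offset like the Bobs’ never enters the calculation. A real system would also need principled tie-breakers and a way to handle applicants whose interviewers barely overlap; this sketch does neither.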
Oh. And we’ve also tested our model against each of the ranking approaches that have been offered as alternatives. It works better. (Maybe the hockey guys got it right…)
**N.B.: Because the main article was submitted and accepted for publication months before the hockey post-season, we deeply regret having missed the opportunity to formally express our gratitude to Coach Allain and to Athletic Director Tom Beckett in the Acknowledgments section of the paper. We appreciate the reader’s indulgence as we do so now.