Transcript for Using Machine Learning in Residency Applicant Screening

Below is a transcript of the following Academic Medicine Podcast episode:

Using Machine Learning in Residency Applicant Screening
September 20, 2021

Read more about this episode and listen here.

Toni Gallo:

This episode is brought to you by Stanford Medicine. Are you an aspiring or early-career physician leader looking for the skills to help take your career to the next level? Look no further. At Stanford Medicine we recognize the need to foster physician leaders in health care. That’s why we developed the Physician Leadership Certificate Program. This six-month, cohort-based program includes live virtual sessions, self-paced learning modules, professional coaching, a capstone project, and so much more, providing C-suite education for the non-C-suite physician leader.

Toni Gallo:

This selective program provides evidence-based frameworks for personal growth, vital management and leadership skills, and helps you cultivate meaningful connections within your own teams. We encourage all early-career and aspiring physician leaders to apply. The application deadline is October 18, 2021. To find more information about this leadership program, and to apply, visit physicianleadership.stanford.edu.

Toni Gallo:

Hi, everyone, I’m Toni Gallo. I’m a staff editor with the journal. And every year, Academic Medicine publishes the proceedings of the annual Research in Medical Education sessions that take place at the AAMC’s Learn Serve Lead meeting.

Toni Gallo:

This year, there will be on-demand presentations of the RIME papers available through the Learn Serve Lead meeting platform and live Q&A sessions with some of the authors. The RIME papers themselves, including the one we’ll be talking about today, are all available to read for free now on academicmedicine.org. And the full RIME supplement will be available in November.

Toni Gallo:

Like last year, I’ll be talking to some of the RIME authors on this podcast about their medical education research and its implications for the field. So for the first of those conversations, I’m joined by RIME committee member, Dr. Mahan Kulasegaram. And we’ll be talking to Dr. Jesse Burk-Rafel who co-authored the paper “Development and Validation of a Machine Learning Based Decision Support Tool for Residency Applicant Screening and Review.”

Toni Gallo:

And I’ll put the link to that paper in the notes for this episode. Jesse is no stranger to this podcast: he served as the journal’s first assistant editor for trainee engagement, and he cohosted our episodes on wellbeing and the learning environment and on podcasts as educational tools. You can find those episodes in our archive. So with that, let’s start with introductions. Mahan, you want to go first?

Mahan Kulasegaram:

Yeah, let me say again, it’s a pleasure to be here with you, Toni and Jesse. I had the good fortune to edit Jesse’s paper as part of the RIME supplement, and it’s certainly one of the most exciting and innovative papers in the issue this year, and in many of the issues that I’ve read. I’m Mahan Kulasegaram, I’m a scientist at the Wilson Center and in the Temerty Faculty of Medicine at the University of Toronto, where I’m also the Temerty Chair in Learner Assessment and Evaluation. A lot of my work focuses on how we use assessment data to make decisions and improve education. And I think, Jesse, your paper is a wonderful example of how we use assessment data in admissions to do that. So I’m excited to discuss this work.

Jesse Burk-Rafel:

Awesome. I’m so thrilled to be here. I’m Jesse Burk-Rafel. I’m a second-year hospitalist at the NYU Grossman School of Medicine, in the Division of Hospital Medicine there, and I serve as the assistant director of UME-GME innovation in our Institute for Innovations in Med Ed, which is a very tech-savvy group of medical educators doing work at the junction of informatics, artificial intelligence, and classical medical education.

Jesse Burk-Rafel:

I’m not an artificial intelligence expert, or augmented intelligence expert if you want to use that term, but I have done a fair amount of work in the space now, and I’m super excited to talk about how these worlds can kind of collide, and some of the pitfalls and challenges but also the promise of this approach. So excited to be here. Thanks for having me, Toni.

Toni Gallo:

Thank you. So our discussion today is going to focus on the work that Jesse and his team did to develop and study a decision support tool that uses machine learning in residency applicant screening. And he’s going to tell us what exactly that means in just a minute. We’re going to talk about the residency application process and potential ways that AI might help mitigate some current challenges. But Mahan could you get us started and just tell us a little bit about what the residency application process looks like today? And what some of those challenges are that folks like Jesse are trying to work on.

Mahan Kulasegaram:

The process of getting into residency is probably the most stressful experience for medical students and for faculty in terms of program directors and leaders. One, it’s very high stakes. People are making decisions about their careers. And there is a lot of competition, whether we like to acknowledge it or not, between students for getting into the programs of their choice, between programs for selecting students who they think might be the most aligned with their program, or what we might term loosely the best and the brightest.

Mahan Kulasegaram:

And within this sort of environment, I think one of the things that residency programs have always struggled with is finding evidence-based approaches to selecting candidates. And admissions selection overall, in medical education, I think is still an emerging and developing field. There’s been a lot of work done in undergrad admissions around specific types of tools and processes, and less so in residency where the challenges of having many applicants for a limited number of spots is compounded by the fact that residency programs are often small.

Mahan Kulasegaram:

They don’t have the resources of a medical school, for example, in terms of faculty time and faculty numbers, or even financial resources, to select candidates. And in that context, you can imagine a program director from a very large program that’s very popular might receive up to 8,000 applications in a year, if they’re a nationally reputed program, and have to deal with winnowing that down to the number of residents they’re going to offer, which is often in the low triple digits.

Mahan Kulasegaram:

And in that context, I think one of the things we’ve always struggled with as an education community is balancing feasibility and resource efficiency with evidence-informed practice, along with a mandate to be socially accountable and responsible in how we use our tools to represent, and be representative of, the needs and composition of the societies that many of these institutions serve. And so Jesse’s paper on the role of machine learning in helping that process is very timely, because it promises to at least address some of those tensions and give us one more tool in the toolbox that residency programs can work with, and hopefully one that is very resource efficient.

Toni Gallo:

So Jesse, maybe you can start by just telling us a little bit, like the basics of what machine learning is, and how did you use it in the work that you describe in your paper?

Jesse Burk-Rafel:

Awesome. Yeah. So I second completely what Mahan said about the process and the challenge, and that was definitely the gap we identified. Machine learning is a subset of AI, augmented intelligence or artificial intelligence, basically where a computer is learning patterns. There are different types of machine learning; the type of machine learning we’re using is called supervised machine learning.

Jesse Burk-Rafel:

And that’s where you provide input data, which is called features, or what our community might know as variables, input variables, and output information, which in the machine learning world is called labels. Our community would call it the output variable or the response variable in some ways. And then it uses techniques like regression and branching techniques to basically find patterns through iterative, powerful approaches. And folks may be familiar with techniques like logistic regression; that is an algorithm used in some machine learning approaches, and we tested it in our approach, so it’s not completely foreign to many readers of Academic Medicine. But it’s really this conceptual model of allowing a machine to try to find patterns in data to predict an outcome based on that data.
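
To make that setup concrete, here is a minimal Python sketch of supervised learning as Jesse describes it: input variables ("features") and a known outcome ("label") are handed to an algorithm, such as logistic regression, that learns the mapping between them. The data, feature count, and model here are synthetic stand-ins, not the paper’s actual ERAS variables or pipeline.

```python
# A minimal supervised learning sketch: features (input variables) and labels
# (the outcome) are fed to an algorithm that learns the pattern between them.
# The data here are synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical numeric features for 1,000 applicants (e.g., publication count,
# number of experiences, an exam score); labels are past interview invites.
X = rng.normal(size=(1000, 3))                      # features / input variables
y = (X[:, 0] + 0.5 * X[:, 1]                        # labels / output variable
     + rng.normal(scale=0.5, size=1000) > 0).astype(int)

model = LogisticRegression().fit(X, y)              # learn the pattern
print(model.predict_proba(X[:5])[:, 1])             # predicted invite probabilities
```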

Toni Gallo:

So could you tell us a little bit about the study that you describe in your paper and sort of how you developed this decision support tool, and then maybe what you learned when you went to study it?

Jesse Burk-Rafel:

Sure. So we sought to develop a support tool that would complement a program director-driven, human review process for interview invitation. Really to address that gap that Mahan just mentioned, of this huge pile of applications needing to get down to a reasonable number of interview invites. But we also weren’t aiming to create a tool that we could just turn on and that would do the job of program directors, because we really felt like there were some major human review elements to this.

Jesse Burk-Rafel:

So we basically took the Electronic Residency Application Service, or ERAS, data from three years. It ended up being over 8,000 applicants. We took the Excel file, which you can get from your program director workstation, and it initially has something like 600 or 700 variables. And so there’s a fair amount of cleaning and processing of that data. That’s called feature engineering in machine learning speak, but it’s trying to get out the input variables you think matter.
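
To illustrate the feature engineering step Jesse mentions, here is a small sketch of deriving model-ready input variables from a wide raw export. The column names and values are hypothetical placeholders, not actual ERAS fields, and a tiny in-memory table stands in for the Excel file.

```python
# A sketch of "feature engineering": turning a wide, messy export into the
# input variables thought to matter. Column names are hypothetical.
import pandas as pd

raw = pd.DataFrame({                   # stand-in for the raw ERAS Excel export
    "Publications": [3, None, 7],
    "GoldHumanism": ["Yes", "No", "Yes"],
    "Step2CKScore": ["245", "238", None],
    # ...the real export has several hundred more columns
})

features = pd.DataFrame({
    "n_publications": raw["Publications"].fillna(0).astype(int),
    "gold_humanism": raw["GoldHumanism"].eq("Yes").astype(int),
    "step2_ck": pd.to_numeric(raw["Step2CKScore"], errors="coerce"),
})
print(features)
```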

Jesse Burk-Rafel:

And after that’s done, a very important thing is done in machine learning where the data set is split. You split it into a training data set and hold out what I’m going to call a validation set, in the words of medical education. We want to avoid the overfitting problem, this idea that maybe we’re just fitting a pattern that doesn’t mean anything and won’t replicate in the future. And so we split it into these two sets.

Jesse Burk-Rafel:

And on that training set, you run your algorithms to say, how can we detect patterns from those input variables, the 60 Electronic Residency Application Service features, that then predict invitation for interview. That was our outcome measure, or our label, because we felt like that was the biggest bottleneck in the process, getting from that large pile to interview invites, and then it’s a very human process from there.

Jesse Burk-Rafel:

So it iterates; there’s quite a bit of work to figure out what the best model is and to tune that model. And then, importantly, you test the performance of that model, not on the data you trained it on, not on that training set, but on that held-out data that it’s never seen before. That held-out data was randomly selected; I think we used about 20% of our overall set. And you look at the performance, and you say, “Hey, how well is it predicting who got an invite and who did not?” based on historic data.
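
Here is a sketch of the split-train-evaluate workflow just described: hold out roughly 20% of the data, fit only on the rest, and report performance on the held-out portion the model has never seen. The synthetic data, the ~60-feature shape, and the particular classifier are illustrative assumptions, not the paper’s exact pipeline.

```python
# Train/held-out split and evaluation, as described above. Data are synthetic;
# the model choice is illustrative, not necessarily the paper's final model.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
X = rng.normal(size=(8000, 60))                     # ~60 ERAS-style features
y = (X[:, 0] + 0.7 * X[:, 1]                        # invite label (minority class)
     + rng.normal(size=8000) > 1.8).astype(int)

# Hold out ~20% that the model never sees during training or tuning.
X_train, X_heldout, y_train, y_heldout = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = GradientBoostingClassifier().fit(X_train, y_train)

# Performance is reported on the held-out set only.
heldout_probs = model.predict_proba(X_heldout)[:, 1]
print("Held-out AUROC:", round(roc_auc_score(y_heldout, heldout_probs), 3))
```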

Jesse Burk-Rafel:

That model then, the final model that we found that was predicting that decision was then built into a dashboard. And this is where some of the data scientists on our team, Marina Marin and others, really took this work and elevated it to the next level by taking that algorithm and building it into a dashboard that incorporates both the ERAS data and now this machine learning information into a view that a program director can quickly go through and appraise applications in a way that helps support their decision. A decision support tool.

Jesse Burk-Rafel:

It doesn’t tell them what to do, doesn’t tell them who to select, but gives them features to be able to drill down on people where the algorithm says maybe this is someone who should get a look, or maybe this is someone who is not a good fit for our program. So that’s the summary of our approach. I’m certainly happy to talk about the results side, what we found, and then we can get into some of the nitty gritty of the challenges of these approaches.

Toni Gallo:

Yeah, maybe you can give us a quick overview of your key findings, and then we’ll ask some questions about the residency applicant review process and some of the challenges and all that kind of stuff. But yeah, some high-level findings would be great.

Jesse Burk-Rafel:

Awesome. So at a high level, again, we were predicting interview invite based on prior data, using about 60 different variables from ERAS. And we had pretty good performance. We look at measures like the area under the receiver operating characteristic curve, or AUROC, it’s a mouthful, and our value there was .95. You’ll see that presented in a lot of papers like this that try to find these relationships. But when you have a set where most people unfortunately do not get invited, it’s a little falsely high.

Jesse Burk-Rafel:

So actually, the measure we really like to look at is called the area under the precision-recall curve. And for the non-machine learning folks in the audience, that’s like your average positive predictive value. For us, that was about .75, and if we had done it by chance, that would have been our prevalence of invite, which was about 15% of our applicant pool. So you can see that it goes substantially above what that chance decision would be. But that’s not really the end goal.
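
The two summary measures Jesse names can be computed directly; the sketch below shows why the precision-recall measure is the more honest one for an imbalanced outcome: its chance-level baseline is simply the prevalence (about 15% of applicants invited in their pool), whereas AUROC can look impressive regardless. The data here are synthetic and illustrative.

```python
# AUROC vs. area under the precision-recall curve (approximated here by
# average precision, roughly the "average positive predictive value").
# Synthetic data; the ~15% prevalence mirrors the invite rate described above.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(2)
y_true = (rng.random(2000) < 0.15).astype(int)       # ~15% of applicants invited
scores = y_true * rng.normal(1.5, 1.0, 2000) + (1 - y_true) * rng.normal(0.0, 1.0, 2000)
random_scores = rng.random(2000)                     # an uninformative model

print("AUROC:", round(roc_auc_score(y_true, scores), 2))
print("Average precision:", round(average_precision_score(y_true, scores), 2))
print("Chance-level average precision:",
      round(average_precision_score(y_true, random_scores), 2),
      "~ prevalence", round(y_true.mean(), 2))
```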

Jesse Burk-Rafel:

What’s really interesting about these models is you can get the summary measures, okay, it seems to be working, but then you can select a threshold for it to choose whether we’re going to call this an invite or not. And you can change that threshold based on what your task is. So are you going to use this tool to support a screening-out decision? Are you going to use this tool to support a screening-in decision? For a screening-out decision, sensitivity becomes more important.

Jesse Burk-Rafel:

And so you might select, say, a sensitivity threshold; for us, the one we reported in our paper was 91%. And at that sensitivity threshold, there was 85% specificity. So how that would play out is, let’s say your program has a similar sort of invite ratio to ours and you got 2,000 applications. Using this tool would allow program directors to effectively screen out 1,500 applicants with a negative predictive value of 98%. So really very few false misses. Leaving about 500 applicants for 300 positions.
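
The screening-out arithmetic Jesse walks through can be reproduced from the reported operating point (91% sensitivity, 85% specificity) and an illustrative pool of 2,000 applications at a roughly 15% historical invite rate; it recovers the ~1,500 screened out, ~98% negative predictive value, and ~500 remaining that he quotes.

```python
# Back-of-the-envelope reproduction of the screening-out example above.
# The pool size and invite rate are illustrative; sensitivity and specificity
# are the operating point reported in the conversation.
n_applications = 2000
invite_rate = 0.15          # ~15% of applicants historically invited
sensitivity = 0.91
specificity = 0.85

would_invite = n_applications * invite_rate          # ~300 true "invites"
would_not_invite = n_applications - would_invite     # ~1,700 true "non-invites"

flagged_in = sensitivity * would_invite + (1 - specificity) * would_not_invite
screened_out = n_applications - flagged_in
missed_invites = (1 - sensitivity) * would_invite    # true invites screened out
npv = (screened_out - missed_invites) / screened_out

print(f"Screened out: ~{screened_out:.0f} of {n_applications}")
print(f"Negative predictive value: {npv:.0%}")
print(f"Left for holistic review: ~{flagged_in:.0f}")
```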

Jesse Burk-Rafel:

And so the human process could do deep, holistic review of those 500 to get to the 300. That’s just an example of a use case of how this tool can be used. In our process, we kind of took a two-stage approach. We wanted to be really careful that we weren’t introducing bias into the process, especially with a brand new tool. So the program directors used the tool only after they had reviewed the applications through their normal process, and ended up doing more of a screen-in process. So not using it to screen out, but actually to find sort of diamonds in the rough, folks we had maybe missed on that initial review.

Jesse Burk-Rafel:

And we reserved about 20 spots, over two years now actually. We reported one year in the paper, but we’ve done it two years now. And we filled those 20 spots with individuals who, for whatever reason, were not invited on the initial review, but the algorithm suggested, “Hey, this person could be an amazing fit for our program.” And what was probably the most interesting thing, and again, these are sort of qualitative findings, was that when you asked the program directors to look back at those folks who were not invited initially but whom the algorithm said we should give a consideration, it was often metrics-related issues.

Jesse Burk-Rafel:

They came from maybe a school that was not as prestigious, or their Step scores were not as high. Those are features used in our selection process, like many other programs’, no shocker there. But I am bullish that this kind of approach could give a second look to people who might actually be a very good fit for our program. So those are the high-level findings. The only last piece I would say is, we did do an analysis, what we’d call a sensitivity analysis, where we said, “Well, what if we took the same exact approach, but this time we leave out all USMLE scores, Step 1 and Step 2 CK.”

Jesse Burk-Rafel:

We kept whether someone passed or failed, we thought that was fair, given the change to pass/fail reporting for Step 1, but we left out the actual score. And indeed the performance dropped a teeny bit, but not by a clinically significant amount. The model could still predict with good precision who was likely to get an invite. And so that gives us hope that these kinds of models that incorporate multiple pieces of the application might alleviate some of the fears of program directors who are very reliant on Step scores, that they might not be able to… that they might have to use even worse heuristics, which we can talk about.

Jesse Burk-Rafel:

So I think that is an important finding. And it’s something that we were happy to see, but actually somewhat surprised to see: although Step scores were used in our process and were part of the first model, when they were removed, the model could compensate for that removal through the rest of the information in the model.
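
A sensitivity analysis of this kind can be sketched by training the same model twice, once with and once without the score columns, and comparing held-out performance; when the scores are correlated with other parts of the application, the model largely compensates. Everything below is synthetic and illustrative, not the paper’s features or results.

```python
# Sketch of a leave-out-the-scores sensitivity analysis. Columns 0-1 stand in
# for Step 1 and Step 2 CK scores; column 2 is a correlated non-score feature.
# Synthetic data, illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(3)
X = rng.normal(size=(8000, 60))
X[:, 2] = 0.6 * X[:, 0] + 0.8 * rng.normal(size=8000)   # correlated with "Step 1"
y = (0.8 * X[:, 0] + 0.8 * X[:, 2] + X[:, 3] + rng.normal(size=8000) > 2.0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)

with_scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
without_scores = LogisticRegression(max_iter=1000).fit(X_tr[:, 2:], y_tr)  # drop score columns

print("AP with scores:   ",
      round(average_precision_score(y_te, with_scores.predict_proba(X_te)[:, 1]), 3))
print("AP without scores:",
      round(average_precision_score(y_te, without_scores.predict_proba(X_te[:, 2:])[:, 1]), 3))
```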

Mahan Kulasegaram:

There’s a lot there to unpack. But what I really like about what you said, and I think the way you described it in your paper, is that methodologically it’s very easy to follow, and you introduce the approach and logic of how machine learning analysis works really well. But I want to come back to something you said you’re really interested in, the idea of the second look.

Mahan Kulasegaram:

And that’s fascinating to me, particularly because we’re in this moment in time where issues of bias and exclusion and structural practices that exclude people from training are really under the spotlight. And I think admissions is probably one of the biggest places where we have a lot of work to do. At the same time, we’ve also seen this sort of concern in the popular media and in academic circles about these biases, or structural practices, being built into our systems like artificial intelligence and machine learning. These systems are not immune from these processes. So can you talk a little bit about that issue, what we can do to either make that bias visible or mitigate it, and the role machine learning might play?

Jesse Burk-Rafel:

Yeah, it’s a major concern. It was a major concern of our group. I mean, the fact of the matter is, as fancy as machine learning and AI sound, they’re pretty stupid, actually. They just learn from old data and predict new things. We like to say that they’re really cool, and they are really cool techniques, and they’re very powerful techniques. But at the end of the day, they’re absolutely at risk of propagating the existing human biases that they’re trained on.

Jesse Burk-Rafel:

And we know from prior publications that there are racial, gender, sex, and other inequities in the review process across the national resident selection process. So there is no doubt about it that machine learning models built on some data, if that data has bias, could propagate that. What we found that was really interesting at a human level is that, again, I’m coming at it from a data lens, and then I was presenting these tools to our program directors, who are clinicians and educators, clinician educators, not on the data side.

Jesse Burk-Rafel:

The tool itself allowed us to make our process more explicit. We didn’t really track who was getting invited, what percentage relative to the overall expected percentage, who was missing out, what sorts of schools. Some of that was being followed. But the tool actually made it more explicit: what is our process, and are there biases in the process? I don’t want to impugn our profs, but there are biases across every program, and some may be metrics related, others may be regional or other biases.

Jesse Burk-Rafel:

So it made that more explicit. I think there’s still a lot of work to be done in the community to say, well, so it makes that explicit, what are you going to do about it? What’s the right thing to do? And let’s be honest, the line between bias and selection factors is a fine one. There are selection factors that we hold up as important. But when we look at the data behind them, for example USMLE scores, we know that there are issues with bias around who can perform well on some of these things.

Jesse Burk-Rafel:

And so it’s a complex discussion. I would really urge folks who are thinking about implementing artificial intelligence or machine learning approaches in high-stakes decisions like this to use a conservative, staged approach. Not to just turn something on and say, we’re going to use it off the shelf, it’s good to go, because the performance looks good.

Jesse Burk-Rafel:

You really need to do that prospective step of saying, “Okay, let’s complement the human process and see where those gaps are.” And that’s where the program directors stood back and said, “You know what, we weren’t inviting enough of this type of applicant. Let’s look, among those types of applicants, using our support tool, which allowed drill-down into different applicants’ features, at who the model says is also a great fit for our program and, based on prior data, should have been invited.” So I think it did actually help us mitigate bias in our process. But there’s still a long way to go to make these algorithms themselves anti-biased.

Mahan Kulasegaram:

That’s a wonderful unintended consequence, I guess, that using this data, you have more insights about what you’re doing. And I imagine for many programs, adopting this approach, sort of a data-driven insight model, is going to reveal some very surprising things. But let’s take your program as an example. And let’s think five years ahead now, and you’ve been using this approach for some time.

Mahan Kulasegaram:

What looks different? Will you have more or fewer applicants? Will you be interviewing more or fewer applicants? Will the pool of people who apply to your program look different, because you’re using this system and people are becoming aware that this system exists? How do you become adaptive to the changes in applicant behavior as they try to game this thing and try to get through that first filter?

Jesse Burk-Rafel:

Yeah, it’s a fascinating question. It is an adversarial use case, as we use that word. Spam detection is an example of an adversarial AI issue, where you develop this thing to prevent spam, and then the spam makers try to get past that tool. I’m less worried about that element of folks trying to game the system, because really, as you sort of mentioned, whether we want to talk about it or not, every single medical student across the country is trying to game the system. It is a game theory kind of thing: how can I get into the top program that I want to go to.

Jesse Burk-Rafel:

And for better or worse, programs use a lot of the same sorts of things for selection across different programs. So I think there’s always going to be that drive by applicants to make themselves as appealing as possible to programs. I will say that at the program level, a program-facing tool like this can affect how many applications you receive. That is a systems-level issue. And the Coalition for Physician Accountability just released preliminary recommendations on the UME-GME transition.

Jesse Burk-Rafel:

And I was on that committee, and we thought long and hard about how we are systematically going to address both advising on which programs might be a good fit for students and also how we reduce that application burden so program directors can do more holistic review. So I think there are definitely systems-level issues.

Jesse Burk-Rafel:

Five years from now, I do hope there are more applicant-facing artificial intelligence or augmented intelligence methods for helping them do, kind of, their Netflix selection of what movie you might like to watch. What program might you like to choose? It really seems like this could be a good fit, you haven’t chosen it; maybe let’s leave this other one out, they’re really unlikely to give you an interview, save the money.

Jesse Burk-Rafel:

So I hope there’s more work in that space. The AAMC has the Residency Explorer Tool, which is kind of moving into that space of giving applicants some compatibility information. The OB/GYN specialty and Maya Hammoud’s group have a large grant from the AMA working on an applicant compatibility index and providing both programs and applicants information.

Jesse Burk-Rafel:

So I think there’s emerging work in that space; it hasn’t yet engaged AI techniques. But I think that’s where we are in five years. And then for our program specifically, in five years, my hope is we’ve developed a way to give every applicant a really fair, holistic review. One piece we hadn’t included in our model yet was the narrative components of the application. Our model uses the structured components: the scores, the things that can be captured in fields, where they were from, their letters, who wrote them, things like that.

Jesse Burk-Rafel:

But it didn’t use that their experience description said, “Hey, I worked in a student-run free clinic,” and that’s very mission aligned with what our program wants. We want clinicians who are working in underserved areas coming to work at Bellevue Hospital here at NYU. And so that mission alignment piece, of how the AI could start to read those narrative components, that is where I hope we move to five years from now. And it’s something we’re actively working on.

Toni Gallo:

Maybe you could talk a little bit about that piece of it. And you mentioned holistic review, which I think has been used more on the undergraduate admissions side than in residency. But how could your decision support tool or machine learning really be used as part of a holistic review process where you are looking at all of the unique attributes of each applicant? And how does that fit in with your program? Describe maybe some of the work you’re doing or where you would like to see this going.

Jesse Burk-Rafel:

Yeah, I mean, there are different definitions of holistic review. And I think that’s important, because the term gets thrown around. I like the AAMC definition, which is sort of experiences, attributes, and metrics all together. The attributes and metrics have been the pieces that have loomed large in the current residency selection process, because they are easy, you can see them quickly, you as a human being can appraise them quickly.

Jesse Burk-Rafel:

And indeed, the machine learning can actually appraise that information more easily as well. And so, in the current system, many programs are using elements of the applications that do reflect individuals’ attributes and their metrics, but maybe at the expense of what their experiences have been and how that aligns with their local mission.

Jesse Burk-Rafel:

And so that idea of appraising all three of those elements, and then figuring out how it aligns with your mission as a program, is the holy grail, the North Star that folks like me are really excited about trying to help with. The fact of the matter is, half of applications never get read in the current state. So I hope we can use tools like this to more smartly appraise that pile and get more applications read, the right applications in the right place for the right program, versus the quick heuristics, like Step 1 scores, and now, with Step 1 going pass/fail, Step 2 CK scores being that quick heuristic in place of a holistic screen.

Jesse Burk-Rafel:

Which is, of course, not what they were designed for. So I think the challenge is this: if an AI process is learning on old data, it really depends on that data. If your existing process is not mission aligned and holistic, then, to our prior point, a model trained on that process will propagate it. So you may not want to train an AI process on an existing process that is already not holistic.

Jesse Burk-Rafel:

And I’m going to say that out loud, because there are a lot of programs that would love to have a tool that can just tell them who to invite, but I do not advocate that we train AI processes on broken processes. In contrast, you can train models on, say, a subset that you carefully review using the full holistic process. And maybe that’s not the whole pool, but you review a subset, and you say, “Hey, we’re going to holistically review these and get the invite decisions.”

Jesse Burk-Rafel:

Train it on that pile where you’ve done the proper experiences, attributes, metrics review. And so that’s where I see this going: trying to be a piece of the puzzle, aiding program directors and others, and also, on the applicant side, as we sort of talked about, making sure that fit is good, and it’s really fit around mission alignment. Not the nasty element of fit that’s being used as sort of a quick term for who’s got the best scores.

Jesse Burk-Rafel:

I really mean it in the truest sense of the word, the compatibility sense. I do also think AI can break big problems into smaller problems. So maybe a program says, “For us, the real bottleneck is reading through all the letters of recommendation and the experiences. That’s the hard part; I can deal with the rest of it, but that part really slows us down.”

Jesse Burk-Rafel:

Well, maybe we develop processes that then try to ascertain, out of that narrative data, what are the values of this person? What kind of a person is this? And how does that align with the kind of person who succeeds in my residency program? That’s what we’re looking for: someone who’s going to succeed in my program. Not necessarily the person who’s the best on day one, but the person who’s going to grow throughout their three, four, five, or six years of residency.

Jesse Burk-Rafel:

And so that’s a more narrow problem than who to invite, and I think if we start to break down the overall review process into these smaller problems and try to address those, maybe that’s where we start to have success. And there’s some great work coming out of Utah from Casey Gradick’s group there, their med-peds group, working on using natural language processing techniques, which is a subset of AI, to take those experiences fields and those letters and try to appraise them for the values of this person.

Jesse Burk-Rafel:

Are they a leadership person or a teamwork person? What are their communication skills, does that really come through? Are they focused on underserved populations? And then you can imagine overlaying that on the values of my program and of the people who succeed in our program. So I see that as the way forward, rather than some master AI, big brother thing that tells people this is where to apply and this is who to select. I really see it breaking down into smaller problems, complementing a human process that’s very mission aligned.

Mahan Kulasegaram:

Jesse, you’ve already sort of talked about the challenge of training these systems on data that we know has some problems, associated either with how we collect it or with the data source itself, USMLE being a big one. We are moving away from USMLE scores. In Canada, we don’t have that problem. In the United States, that’s a major change for many, many programs.

Mahan Kulasegaram:

But I think all programs still struggle with this idea of what data do we look at, and what data should we stop looking at. In your view, as someone working in this field, what data sources do you think still need to be in the process that might help us optimize how we use AI? And what data sources besides the USMLE do you think we should start ignoring or not including in our models, as we think about the larger goals of selecting trainees who are both competent and also responsible and accountable to their communities?

Jesse Burk-Rafel:

It’s a great question, and it’s actually a really deep question, because it kind of comes down to, well, what is success in residency? How do we define that, locally and nationally? If we can define what success in residency looks like, then we can say these are the residents who really did succeed in our program and these were the ones who didn’t.

Jesse Burk-Rafel:

If we can define that, you can use organizational psychology approaches to work your way backwards. And I really am advocating for our community to start thinking about the end product, maybe not even residency but actually out in practice. Who are the residents we graduated who provided exemplary care for the communities they cared for out in practice? What was their performance in training, but also out in practice?

Jesse Burk-Rafel:

Then let’s work backwards to say, okay, knowing what we know about who succeeded and who provided exemplary care to the communities that need it the most, how could we then use evidence-informed selection criteria to predict who we should be recruiting? And then it’s a pipeline issue as well: early on, who are we bringing into the pipeline, who’s getting into medical school, to supply those needs of the community?

Jesse Burk-Rafel:

So it’s actually a very deep question, because the cursory answer would be, well, I can say something about all this narrative data. But the fact of the matter is, nobody knows right now what elements in the application materials predict that success in residency and beyond. And that is a major gap. It may be that there are things we currently submit to programs, or, that is, medical students submit to programs when they’re applying, that predict their success in that program and also their success out in practice, their ability to grow during residency and provide exemplary care to communities in need.

Jesse Burk-Rafel:

But it may be that they don’t. It may be that we are not actually having them submit the right information, or collecting the right information and handing it off in a trustworthy fashion from the UME to the GME space. So there are deep issues in medical education, at least in America, of a lack of trust.

Jesse Burk-Rafel:

And you alluded to it initially when saying this is a competitive process. These are folks competing, sometimes within their own institution against peers, for the same spots. And medical schools similarly have hesitancy about handing off, in a full, let’s-lay-it-all-out-on-the-line way, all the information about their students. If there were professionalism concerns, communication concerns, or other concerns, those are, in a growth mindset, really important for programs to know, because they want to grow the person in those spaces. But in a competitive, zero-sum-game application process, they are problematic for the medical schools, because they want every single one of their medical students to match.

Jesse Burk-Rafel:

So there are very complex issues related to trust between UME and GME, what gets handed off, and how that information actually relates, based on real data rather than on what experts think, to performance in residency and beyond. So I’m deeply interested in these questions. I have other projects looking at CMS data across the continuum and at the national level to try to say, well, let’s start with that success question and work our way backwards. Because we can only design evidence-informed selection processes if we have the right outcome. And I’m not sure that… I love my program director team, but I’m not sure yet that their expert decision is necessarily always the right one. So there’s a lot of work in that space to be done.

Mahan Kulasegaram:

That’s an excellent point. Maybe if we look at it from the other perspective: as the new sort of wave of AI-enhanced technologies comes into the residency selection process, how do you think this changes the game for applicants? And if you were speaking to an applicant about preparing for residency selection, what type of advice would you give them in relation to how much they should worry about these processes and tools, or what they should do in terms of changing the way they apply as a result of the use of these technologies?

Jesse Burk-Rafel:

Yeah, I don’t want my prior answer about what we don’t know to suggest that I’m not completely bullish; I think these technologies are awesome and really could empower this process to work more efficiently and smarter. I am bullish on that; I think it really will help this process work better for everyone. I am also bullish that applicants will start to see tools that help them, that don’t make it harder but actually make it easier to figure out where they’re a good fit and where they have a reasonable chance of getting an interview and thus being ranked to match.

Jesse Burk-Rafel:

So I hope we see more of that, and I hope it doesn’t become more adversarial, to use that term again, because it’s already such a stressful process for all parties, and particularly for applicants. I care about the stress of our medical school advisors and our program directors, but frankly, I care most about the medical students’ stress and their wellbeing through that process. So I think, as each of these tools is developed and rolled out, we really have to view it through an applicant lens, first and foremost. Because at the end of the day, if it’s not serving the applicants, I don’t think it should be out there in the wild doing things.

Jesse Burk-Rafel:

And so I view it through a pilot lens. Ours locally is very much still a pilot. We’re not sure if this is going to keep going. We’re going to iterate it, sure, try to improve it; we’ve done initial work incorporating that narrative information into it. So we now have some NLP, or natural language processing, of the experiences fields, which improves the performance of the model and actually does end up being a very important feature in the model when we add that in. So that’s not published yet. But that’s coming out.
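
As a rough illustration of how narrative fields could feed a model like this, the sketch below turns short experience descriptions into numeric features with TF-IDF and fits a simple classifier. This is a generic natural language processing pattern, not the unpublished NYU approach, and the example texts and labels are invented.

```python
# Generic NLP sketch: convert free-text experience descriptions into features
# a classifier can use. Texts and labels are invented, purely illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

experience_texts = [
    "Volunteered weekly at a student-run free clinic serving uninsured patients",
    "Bench research on protein folding with two first-author publications",
    "Led a community health outreach program for recent immigrants",
    "Organized the departmental journal club and tutored preclinical students",
]
invited = [1, 0, 1, 0]   # hypothetical historical invite decisions

# TF-IDF turns the text into numeric features that could sit alongside the
# structured ERAS variables in a larger model.
text_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
text_model.fit(experience_texts, invited)
print(text_model.predict_proba(["Worked in an underserved free clinic"])[:, 1])
```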

Jesse Burk-Rafel:

But we’re still not sure that’s the right thing to do. We still have to step back and say, is this helping applicants? Is this helping not just the programs, but applicants, figure out what’s the right thing to do? So I really want to see these conversations out in the open. It’s challenging, because, as you mentioned, I think no program wants to share too much of their secret sauce with the applicants, because they’re worried about gaming the system and gamification.

Jesse Burk-Rafel:

And so it’s a balance there, of the transparency needed to understand, as a community, what we’re doing to help applicants find programs that are a great fit for them, really compatible, where they’re going to grow, again using fit in that sense, but at the same time recognizing that it’s a zero-sum game where we have more applicants nationally than there are positions. It should be said that the number of US applicants is less than the number of positions available, so a lot of that is driven by foreign medical grads. But nonetheless, it’s perceived as a very competitive process.

Jesse Burk-Rafel:

And so I think we need to start having those open conversations about, okay, it’s a competitive process, everyone’s acting in their own best interest, as it should be. As these new tools come out, what are we thinking about the bias side of this and the equity side of this? How does that fit in? So viewing it through that lens: how is this viewed from an applicant lens? Is this helping them? Or is this another hurdle for them to jump through? And so I am personally working on having our program better convey that to applicants, so they really have that transparency at our local level. But I think we need to start having that national conversation about these tools in this quite competitive process.

Toni Gallo:

Mahan, any final questions to ask Jesse?

Mahan Kulasegaram:

No, it’s just a wonderful conversation, I think, and I do hope people go and read the paper and continue the conversation with you, Jesse, and your team, and broadly in our community. Because this is such an important thing for us to grapple with. Congratulations on this work.

Jesse Burk-Rafel:

Thanks so much for having me. And again, yes, I encourage folks to read this paper and all the RIME papers, because I am a huge fan of the RIME supplement. There are just gems in the RIME supplement; it’s usually comprehensive work, and it’s all open access and available. So take a look at it. And certainly as well, my contact info is on the paper; it’s there with the email. So reach out to me if you’re interested in exploring this space, and I think we should have bigger discussions about this. So thank you, Toni, for having us on and getting this started. This is fantastic.

Toni Gallo:

Yeah. Thanks for joining us today, Jesse. Jesse’s paper and the rest of the RIME papers are on academicmedicine.org right now, and you can read them all for free. And I’ll include the link to the paper we talked about today again in the notes for this episode, if you’re interested in that one specifically.

Toni Gallo:

Remember to visit academicmedicine.org for the latest articles from the journal as well as our complete archive dating back to 1926. You can also access additional content, including free ebooks and article collections. Subscribe to academicmedicine.org through the subscription services link under the Journal Info tab, or visit shop.LWW.com and enter Academic Medicine in the search bar. Be sure to follow us and interact with the journal staff on Twitter at @acadmedjournal. And subscribe to this podcast through Apple Podcasts, Spotify, or wherever else you get your podcasts. While you’re there, leave us a rating and a review and let us know how we’re doing. Thanks so much for listening.