The recent JAMA Surgery publication “Association of surgeon-patient sex concordance with postoperative outcomes” is a population-based, retrospective cohort study of more than 1 million patients that concluded “sex discordance between surgeons and patients negatively affected outcomes following common procedures.”1 Furthermore, the finding is driven by “worse outcomes among female patients treated by male surgeons.”
My initial reaction upon reading these conclusions was: “How can this be possible?” The scientist in me was certain I could find something in the methodology that would easily explain these seemingly outrageous findings. Thus, I set off to meticulously review the article.
The authors used the Ontario Health Insurance Plan database, derived from the single government payer for Ontario, Canada. The strength of this dataset is that it is representative of the population of Ontario, more so than most administrative databases available in the U.S., such as Medicare claims, all of which have some limitation in the population captured. The authors linked the database to four other databases that provided follow-up data for hospitalizations, emergency department visits, patient demographic information and surgeon-level data. This looked like about as strong a dataset as you could get in the world of “big data.”
The investigators then imposed a number of inclusion criteria that I won’t bore you with. Suffice it to say, I was satisfied that they carefully considered situations that could represent data entry errors (e.g., a date of death preceding the date of surgery) or could skew the data (e.g., multiple surgical procedures on the same day and sex-specific procedures). These cases were excluded, leaving the authors with 1,320,108 unique patients, an impressive sample size.
Well, what about the statistical methods? I thought surely I could find something to pick apart there. The methods described in detail the sophisticated statistics used (multivariable generalized estimating equations with an independent correlation structure and clustering on the procedure performed). The investigators appeared to have accounted for as many covariates and potential confounders as the available data allowed. I found myself commending the authors for their appropriate, rigorous and robust statistical approach.
“Big data” analyses such as this are often criticized for being “fishing expeditions,” in which the investigators start out without a hypothesis, perform hundreds of comparisons looking for anything that might be statistically significant, and then assign it meaning. Wallis and colleagues were careful to indicate throughout the manuscript that this was not a “fishing expedition.” They presented the preliminary data to support their explicitly stated hypothesis, which was that “sex discordance between surgeons and patients may contribute to differences in postoperative outcomes, with worse outcomes in female patients treated by male surgeons.” They carefully outlined the “pre-planned” stratified and subgroup analyses in the methods.
The primary outcome was a composite of death, readmission or major complication (e.g., acute renal failure, stroke, myocardial infarction) within 30 days after surgery. The composite outcome rate was 14.9%. Sex discordance between patient and surgeon was independently associated with a 7% increase in the likelihood of the primary outcome. Analyses stratified by physician, patient and hospital factors demonstrated that this effect largely persisted. Further, among female patients, those treated by a male surgeon were 15% more likely to experience the primary outcome than those treated by a female surgeon. In contrast, among male patients, those treated by a female surgeon were no more likely to experience the primary outcome than those treated by a male surgeon.
Poking holes in the way the study was executed was proving to be difficult. Certainly, the study suffered from the limitations that are inherent to all administrative data studies, namely lack of granularity, which in this case, as the authors acknowledge, included the inability to distinguish between sex and gender. The authors also acknowledged that case complexity could contribute to the findings if male surgeons perform more high-risk cases. The authors did perform an analysis of low- vs. high-risk cases and found that the association of discordance with the primary outcome was robust to this analysis. However, low risk was defined as appendectomy, cholecystectomy and carpal tunnel release. All other operations were defined as high risk. Relevant to vascular surgery, femoropopliteal bypass and abdominal aortic aneurysm (AAA) repair were categorized as high risk. Clearly, within each of those operations, there are complexities, such as redo bypass or a short, angulated AAA neck, that cannot be captured by administrative data.
Nevertheless, my mind shifted to thinking, “Maybe they have something here.” The only thing these kinds of “big data” analyses can do is demonstrate an association. It is simply not possible to prove causality using a retrospective cohort study design. In a retrospective study, it is entirely possible for two variables to be associated with each other without one causing the other. That said, the investigators did a good enough job with this study for me to believe the observed association in this dataset was real. The obvious next question was “What is the underlying cause?” The authors speculated in the discussion that these findings may result from the way sex discordance between surgeon and patient can “adversely affect the physician-patient relationship.” There are abundant data, although largely in medical specialties, to demonstrate that the physician-patient relationship and communication can influence long-term health outcomes.2,3
This study generated a wide range of reactions on social media, with everything from “Of course women surgeons do a better job than men” to “This study is a lie.” Once we’ve processed our visceral reactions, as clinicians who prioritize our patients’ well-being above all else, we should be concerned that these results may represent a real disparity in health outcomes. The data scientists among us can and should do more work to investigate whether this association holds true in other populations. However, as surgeons, and even data scientists, we are poorly trained and poorly equipped to properly investigate whether the patient-surgeon relationship is the driver of these findings. We need to partner with those who have appropriate expertise, such as sociologists, to rigorously study this issue. In the meantime, this study reminds us that what all of us can do now, regardless of whether the findings are valid, is strive to treat each of our patients equally, with the utmost compassion, respect and diligence.
- Wallis CJD, Jerath A, Coburn N, et al. Association of surgeon-patient sex concordance with postoperative outcomes. JAMA Surgery. 2021. doi:10.1001/jamasurg.2021.6339.
- Coelho KR, Galan C. Physician cross-cultural nonverbal communication skills, patient satisfaction and health outcomes in the physician-patient relationship. International Journal of Family Medicine. 2012;2012.
- Street RL, Makoul G, Arora NK, Epstein RM. How does communication heal? Pathways linking clinician–patient communication to health outcomes. Patient Education and Counseling. 2009;74(3):295–301. https://doi.org/10.1016/j.pec.2008.11.015.
Karen Woo, MD, is associate professor of surgery at the University of California, Los Angeles (UCLA). She is also associate director of the Vascular Low Frequency Disease Consortium.