Confirmation Bias in Research on Gender Affirming Treatment

Bob Yentzer
Sep 7, 2023
Updated: Jun 16

September 2023 (Update note at the bottom)

As part of their anti-woke political agenda, Ron DeSantis and the Florida legislature passed a law that prohibits gender dysphoric minors from being treated with hormones or surgery.

Ordinarily I would side with the law's critics. But not this time.

The essential justification for subjecting minors to life-changing hormone manipulation is its effectiveness in treating the major depression and anxiety associated with gender dysphoria. However, contrary to endorsements by the AMA and other medical guilds, the efficacy and safety of Gender Affirming Hormone Treatment (GAHT) have not been established.

Before you dismiss the "not-effective" argument as right-wing anti-LGBTQ disinformation, consider this fact: recent history is replete with examples of the medical profession's systematic willingness to bill patients for ineffective or harmful nostrums.

For example, before 1973, the medical profession echoed the homophobic culture of the time by defining homosexuality as a disease, which they attempted to "cure" with "chemical castration" and aversion therapy, including electric shock. Then, around 1995, they managed to trigger the current opioid crisis by revving up Oxy prescriptions to outpatients on the dubious claim that the risk of addiction was small when the drug was used to treat pain. And today, they routinely prescribe Epidural Steroid Injections for conditions known to be unresponsive to that treatment.

So don't be surprised if the effectiveness and safety of GAHT are not what they are claimed to be.

The reason I am inclined to support the age restriction on GAHT is scientific: the evidence underpinning its safety and efficacy is not credible, at least not credible enough to subject children to the risk of long term harm.

That is the sobering conclusion of the National Institute for Health and Care Excellence (NICE). Its comprehensive review of published research prior to 2020 characterizes the studies as "mostly small uncontrolled observational studies, which are subject to bias." Bias is a major problem because the authors are often affiliated with clinics and hospitals already in the business of delivering GAHT to children.

A more recent (2023) review echoes the NICE findings, and politely concludes that the "long-term effects of hormone treatment on psychosocial health could not be evaluated."

These assessments have led Sweden, Finland and England to issue revised treatment guidelines for youth, which "prioritize noninvasive psychosocial interventions while sharply restricting the provision of hormones and surgery."

The NICE review's most damning indictment is that GAHT has never been tested in a Randomized Controlled Trial (RCT). Not once!

RCTs are the gold standard of proof when it comes to safety and effectiveness. The FDA generally will not approve a new drug without at least two positive RCTs. Meta-analyses have shown that rigorous RCTs are less likely to report positive outcomes than non-experimental studies of the same treatment. Perhaps that is why the GAHT Industry has avoided the former.

Is the dour assessment of past evidence challenged by the newest studies?

Far from it! The following analysis of two recent studies reveals that the finding of effectiveness is an artifact of wishful thinking and dishonest statistics. While neither is an RCT, the first excels in terms of sample size, time-span, and prestige of the journal that published it. The other is notable for its clever obfuscation.

Both have this in common: Their finding of support for GAHT can be characterized as putting lipstick on a pig.

Psychosocial Functioning in Transgender Youth after 2 Years of Hormones.

The New England Journal of Medicine, 2023

This study is touted as an improvement because of its long-term follow-up and large (315) cohort of transgender youth. The Beck Depression Inventory-II was administered to subjects at the start of GAHT, and repeated every 6 months thereafter. The same protocol was followed for the assessment of anxiety.

Displayed on the chart are the findings for depression. After two years of treatment the average depression score for the whole sample fell by a minuscule 2.54 points. All of this decline was contributed by biological females. Males' level of depression remained virtually unchanged.

The authors acknowledge that an "effect size" of 2.54 is regarded as small in statistical terms. But 'small' is an understatement. With a mean baseline score of 15 on a 50-point range, a 2.54 difference is clinically insignificant. In other words, it is indiscernible in real life to both patient and clinician.
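A small change and a tiny p-value are not in conflict, and a rough sketch shows why. The code below runs a plain z-test on a mean change of 2.54 points at two sample sizes. The standard deviation of 10 points and the comparison sample of 20 are my own assumptions, chosen only for illustration; they are not figures from the study, and the study's own statistical model was surely more elaborate.

```python
import math

def two_sided_p(mean_change, sd, n):
    """Two-sided p-value for a mean change under a normal (z-test) approximation."""
    standard_error = sd / math.sqrt(n)
    z = mean_change / standard_error
    return math.erfc(abs(z) / math.sqrt(2))

MEAN_CHANGE = 2.54   # average drop in depression score quoted in this article
ASSUMED_SD = 10.0    # hypothetical spread of individual changes (assumption)

p_small = two_sided_p(MEAN_CHANGE, ASSUMED_SD, 20)    # hypothetical small study
p_large = two_sided_p(MEAN_CHANGE, ASSUMED_SD, 315)   # cohort size quoted in this article

print(f"n=20:  p = {p_small:.3f}")
print(f"n=315: p = {p_large:.6f}")
```

Under those assumptions the identical 2.54-point change fails the usual 0.05 threshold with 20 subjects and passes it easily with 315. The p-value reflects the sample size as much as the size of the change.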

Yet thanks to a large sample, that trivial difference achieves statistical significance, which grants the authors license to legitimize the profitable practice of manipulating children's hormones:

our findings... support the use of GAHT as effective treatment for transgender and nonbinary youth.

Absolute bullshit! This conclusion is inferential malpractice. The authors' non-experimental research design is incapable of establishing causality, so they have no basis for attributing the 2.54 dip to the effect of treatment, and no way of controlling for other factors known to reduce scores on self-report inventories. The following "other factors" by themselves could easily produce that tiny drop in depression:

The most important is collateral medication: In addition to GAHT, an undisclosed number of subjects were prescribed medications that could have included psychotropic drugs. The authors didn't say, nor did they control for this factor. The failure to separate the effects of GAHT from other medications is not mentioned in the discussion of the study's limitations.
The statistical law of regression toward the mean: Subjects who score far above average on a subjective questionnaire, e.g., the Beck Depression Inventory, tend to score lower the next time around.
Natural remission: Without treatment, most episodes of major depression tend to resolve themselves within 12 months.
The placebo effect: An analgesic injected by a nurse is more effective than a drip-bag. Likewise, the affirmative atmosphere of a "multidisciplinary" GAHT program is likely to elevate mood and calm the nerves.
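Of the four, regression toward the mean is the least intuitive, so here is a minimal simulation of it. Every number in it (the population mean, the spreads, the enrollment cutoff of 15) is a hypothetical value I picked for illustration, not data from the study. Nobody in the simulation receives any treatment: each person has a stable underlying level, and each measurement adds random noise.

```python
import random

random.seed(0)  # fixed seed so the illustration is repeatable

def simulate(n=10000, true_mean=10.0, true_sd=5.0, noise_sd=5.0, cutoff=15.0):
    """Select people with a high first score, then compare first and second scores."""
    first_scores, second_scores = [], []
    for _ in range(n):
        stable_level = random.gauss(true_mean, true_sd)     # underlying level (hypothetical)
        first = stable_level + random.gauss(0, noise_sd)    # baseline measurement
        second = stable_level + random.gauss(0, noise_sd)   # follow-up, no treatment at all
        if first >= cutoff:                                 # "enrolled" because score was high
            first_scores.append(first)
            second_scores.append(second)
    mean = lambda xs: sum(xs) / len(xs)
    return mean(first_scores), mean(second_scores)

baseline_mean, followup_mean = simulate()
print(f"baseline {baseline_mean:.1f}, follow-up {followup_mean:.1f}")
```

The selected group's average drops at follow-up even though nothing was done to anyone, simply because part of each high baseline score was noise that does not repeat.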

What about Anxiety?

Exactly the same critique applies to the findings regarding anxiety. The chart shows a clinically insignificant dip, also a likely artifact of the above foursome.

My take...

Given the small size of observed changes and the inability to control for the fearsome foursome above, the claim that the "findings support the use of GAHT as effective treatment" is dishonest.

By the way, if the reader wants a complete account of the many other flaws in this study, and is willing to delve into the issues of research methodology, I recommend this deep critique by Jesse Singal.

Mental Health Outcomes in Transgender and Nonbinary Youths Receiving Gender-Affirming Care.

JAMA Network Open, 2022

Like the first study, this one assesses a cohort of candidates for hormone treatment at time-0 and then reassesses their levels of depression and anxiety 3, 6, and 12 months later. However, the recruits do not start treatment at the same time. They are treated serially: at 3 months, 53% have started treatment; at 6 months, 71%; and by year's end, 89%.

So, if GAHT is effective, the rate of serious mental disorder should decline over time as more and more kids undergo treatment.

Strangely, the data needed to test this proposition is not presented in the published paper, but stashed in an online supplement. From this stash I constructed the following chart and two others below.

At time-0, when only 7% of the cohort had received hormonal treatment, 59% were seriously depressed. By the 12th month, 59% of the cohort were still depressed even though 89% of them had undergone treatment. The rate of serious anxiety follows exactly the same pattern.

Conclusion? GAHT appears to make no difference.

But, for authors who are working at the behest of organizations already dedicated to GAHT and convinced of its righteousness, "no difference" is inadmissible. They want a confirmatory finding, not one that sows doubt.

So, by employing a methodology which uses the waiting-list as the comparison group, they contrived this favorable finding:

(a) "the odds of depression among kids undergoing treatment are 60% lower than among kids still waiting for treatment."

However, an equally accurate version of this finding is this:

(b) the odds of depression among kids still waiting are 150% higher than among those undergoing treatment.
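The two framings are the same odds ratio read in opposite directions, and the conversion is simple arithmetic. The sketch below is my own illustration (the function name is made up for the purpose): "60% lower" means an odds ratio of 0.40, and swapping the groups inverts it to 1 / 0.40 = 2.5, which is 150% higher.

```python
def percent_higher_for_other_group(percent_lower):
    """Re-express 'X% lower odds' for one group as 'Y% higher odds' for the other."""
    odds_ratio = 1 - percent_lower / 100   # e.g. 60% lower -> odds ratio 0.40
    inverse_ratio = 1 / odds_ratio         # same comparison with the groups swapped
    return (inverse_ratio - 1) * 100

print(round(percent_higher_for_other_group(60)))
```

Note that the inverse of a "60% lower" figure is not "60% higher"; percentages of odds are not symmetric around 1.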

So why did the authors choose to present the finding in terms of (a) and not (b)?

Well, (b) conveys the message that the odds differ because waiting for treatment increases depression, while (a) implies that the odds differ because GAHT effectively lowers depression. The authors want to prove that GAHT is effective, so naturally they choose (a). But the chart shows that the more credible and honest interpretation is (b).

The blue curve shows that among kids being treated, the rate of depression did not substantially improve from the 59% baseline, but remained stable at 56%. By contrast, the rate of depression among those still waiting for treatment soared from 59% at the start of the study to 76% at 3 months, and rebounded to 86% at 12 months (based on only 7 kids still waiting).

It's obvious that the two curves diverge because waiting for treatment increases the rate of depression, not because treatment lowers it. The accurate version is (b). But the authors are desperate to put lipstick on a pig, so they reported the finding in terms of (a).

So, in its attempt to make an ineffective treatment look effective, this study actually demonstrates that making kids wait for promised treatment is a stressor that aggravates depression.
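As a rough consistency check, the 12-month figures quoted above fit together as a weighted average of the two groups. The sketch assumes the 56% rate among treated kids applies at the 12-month mark and takes the 89% treated share and the 86% rate among those still waiting from the figures already cited; the function name is my own.

```python
def overall_rate(share_treated, rate_treated, rate_waiting):
    """Depression rate for the whole cohort as a weighted average of the two groups."""
    share_waiting = 1 - share_treated
    return share_treated * rate_treated + share_waiting * rate_waiting

# 89% treated at 12 months; 56% depressed if treated (assumed to hold at 12 months);
# 86% depressed among those still waiting.
rate_12_months = overall_rate(0.89, 0.56, 0.86)
print(f"{rate_12_months:.1%}")
```

The result comes out at about 59%, matching the whole-cohort rate at both the start and the end of the study.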

And what are the authors' findings regarding anxiety?

They had to admit that treatment had no effect. The ploy of comparing the still-waiting to the being-treated failed to produce a statistically significant odds ratio. The chart reveals why. The anxiety gap between the two curves at 3 months is only half as large as the corresponding depression gap above.

Last word on the two new studies...

Contrary to the authors' spin, an honest reading of their data actually discredits the claim that GAHT significantly reduces major depression and anxiety.

But, I have other reasons for doubting the wisdom of inflicting GAHT on children, all related to flimsy scientific information.

Depression and anxiety might not stem from Gender Dysphoria per se, but from a social environment of avoidance, ridicule and rejection. In that case, the appropriate therapy would involve something other than GAHT, yes?
When left to its natural course, a majority of youth who identify as transgender adopt a different identity (usually 'gay') as adults.
Compared to the general population, transgender youth are much more likely to suffer from depression, anxiety, ADHD, OCD, autism, obesity and binge eating. So, is Gender Dysphoria really a separate and distinct phenomenon?

UPDATE - Below is an example of the flip side of confirmation bias: the suppression of disconfirmation. It involves an author of the first study I critiqued, from the New England Journal of Medicine.

On October 23, 2024 the New York Times reported on the delayed publication of a $9.7 million federally funded study. The data showed that puberty blockers did NOT improve the mental health of 95 children ages 8 to 16 who were followed for two years. The lead researcher, Dr. Johanna Olson-Kennedy, blocked publication of the negative findings because they might be used by critics to discredit GAHT.

Also, Olson-Kennedy is now being sued for pushing a pre-teen girl into aggressive transitioning treatment, including a double mastectomy at age 14, without truthful justification.