Mental Health Parity Enforcement Made Simple
How can we possibly demonstrate parity? Science is here for us!
A nagging shortage of providers is worsened by large sections of the field opting out of insurance networks, because the rates are too low or not worth the administrative burden.
Welcome to the Frontier Psychiatrists Newsletter.
Mental Health Parity Law seems hard to enforce… how could anyone be expected to prove that two things are not different?
Parity means exactly that: medical care and behavioral health care should not be different.
Science can help! Two groups being different or not different to some degree of certainty in an uncertain world?
Welcome. Welcome to my world. That is what every statistical test seeks to answer. Two samples not being meaningfully different? That is the definition of the Null Hypothesis.
(in a statistical test) the hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error.
This is the argument of major payers. They argue that there is no difference between their network and what parity would require, and then they argue further that it's impossible to measure the difference between one thing and another thing.
Science has an answer to that second part of their argument, which is nope. Let me explain.
In a scientific study, we have accepted that when things are different enough, we can be relatively certain that there is a real difference.
We define the small probability we are comfortable with to say, “no, it's really not due to chance¹, it's because of the thing that we're addressing in our research².”
When health plans claim that there is no difference, and then claim it's impossible to measure whether there's a difference?
“It is simply too difficult!”
This is the very problem that science has been solving, more or less since the renaissance. I propose we use the very same standards for health plans that we use for science.
I want to make clear, it is very generous to use the standard of the null hypothesis in favor of health plans. If we use the standard P value from science for the evaluation of health plans, we will solve a lot of arguments!
Big Health is constantly using science-ish arguments against us.
They claim proven treatments are “experimental and unproven,” regardless of the scientific data behind them, arguing that statistics with FDA approval and research-grade p-values are not rigorous enough.
Insurance companies have been arguing that science is not sciencey enough for their balance sheets. We're going to give you the scientific standard that you have been arguing for, and we're gonna raise you applying it to your plans.
So here's how I would propose we do it.
Ahead, a paywall. No, I'm saying you have to pay to subscribe. Cheapskates. Kidding! It’s an advocacy piece!
Take every easily accessible data point.
Take every medical condition, by ICD-10 code. Sort each one into either general medical care or mental health and substance use disorder care.
Take every admitting diagnosis. Take every primary diagnosis in an outpatient setting.
If the insurance companies are providing care that meets a parity standard, then there should be no statistical difference between general medical care and behavioral health care. The null hypothesis will hold: there is no difference. That's what “parity” means.
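As a rough sketch of the sorting step: in ICD-10, codes beginning with “F” cover mental, behavioral, and neurodevelopmental disorders (F01–F99), including substance use disorders, so a first-pass classifier is only a few lines. The function name and bucket labels here are my own illustration, not an official grouping:

```python
def bucket(icd10_code: str) -> str:
    """First-pass sort: ICD-10 'F' codes are mental, behavioral,
    and neurodevelopmental disorders (F01-F99)."""
    if icd10_code.strip().upper().startswith("F"):
        return "behavioral_health"
    return "general_medical"

print(bucket("F32.1"))  # major depressive disorder -> behavioral_health
print(bucket("I21.9"))  # acute myocardial infarction -> general_medical
```

A production version would need to handle edge cases (e.g., codes that straddle settings), but the point stands: the split is mechanical.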
In science, we use the standard of accepting a 5% chance of being wrong when we declare that something is different; below that threshold, a difference is statistically significant. I'm arguing for the same standard here.
That is a p-value.
So in order to scientifically validate parity, there should be no statistically significant differences across the multiple comparisons. There are some correction factors here, but it's pretty simple.
Here is the math to compare the mean, median, and mode of healthcare spending across every ICD-10 diagnosis:
# NumPy has no mode function, so we use scipy.stats for that one
import numpy as np
from scipy import stats

# Calculate the mean healthcare spending for each ICD-10 diagnosis
mean_healthcare_spending = np.mean(healthcare_spending_per_diagnosis)

# Calculate the median healthcare spending for each ICD-10 diagnosis
median_healthcare_spending = np.median(healthcare_spending_per_diagnosis)

# Calculate the mode of healthcare spending for each ICD-10 diagnosis
mode_healthcare_spending = stats.mode(healthcare_spending_per_diagnosis, keepdims=False).mode
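On toy numbers, the three summaries come out as follows. The spending figures are made up for illustration; a real run would use actual claims data:

```python
import numpy as np
from scipy import stats

# Hypothetical per-claim spending for one diagnosis, in dollars
healthcare_spending_per_diagnosis = np.array([120, 80, 120, 300, 80, 120])

mean_spend = np.mean(healthcare_spending_per_diagnosis)      # ~136.67
median_spend = np.median(healthcare_spending_per_diagnosis)  # 120.0
mode_spend = stats.mode(healthcare_spending_per_diagnosis, keepdims=False).mode  # 120
```

Note that `keepdims=False` (scipy 1.9+) makes `stats.mode` return a scalar rather than a one-element array.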
To calculate a p-value for whether there is a difference between all general medical diagnoses versus all mental health/substance use diagnoses, we can use a two-sample t-test.
The null hypothesis is that there is no difference in mean healthcare spending between the two groups. The alternative hypothesis is that there is a difference in mean healthcare spending between the two groups.
# Conduct a two-sample t-test (ttest_ind comes from scipy)
from scipy.stats import ttest_ind

ttest_results = ttest_ind(
    healthcare_spending_for_general_medical_diagnoses,
    healthcare_spending_for_mental_health_diagnoses
)

# Extract the p-value from the t-test results
p_value = ttest_results.pvalue
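A self-contained version on simulated data, where behavioral health spending is deliberately set lower so the test has something to find. The distributions and dollar amounts are invented:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)

# Simulated per-diagnosis spending; behavioral health deliberately ~15% lower
healthcare_spending_for_general_medical_diagnoses = rng.normal(1000, 200, size=500)
healthcare_spending_for_mental_health_diagnoses = rng.normal(850, 200, size=500)

ttest_results = ttest_ind(
    healthcare_spending_for_general_medical_diagnoses,
    healthcare_spending_for_mental_health_diagnoses,
)
p_value = ttest_results.pvalue
print(f"p = {p_value:.2e}")  # well below 0.05: the gap is unlikely to be chance
```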
The p-value is the probability of obtaining results at least as extreme as the ones observed, if the null hypothesis is true. A p-value of less than 0.05 is generally considered to be statistically significant. Keen observers will note that this involves multiple comparisons, and to be really sure that these things are different, and not a fluke, in science we would do a Bonferroni correction³ for multiple comparisons. Here is the math for that!
# Calculate the number of comparisons
number_of_comparisons = len(healthcare_spending_per_diagnosis) - 1

# Calculate the Bonferroni-corrected alpha value
alpha_corrected = alpha / number_of_comparisons

# Conduct the two-sample t-test as before; ttest_ind takes no alpha
# argument, so the Bonferroni correction is applied to the threshold
ttest_results_corrected = ttest_ind(
    healthcare_spending_for_general_medical_diagnoses,
    healthcare_spending_for_mental_health_diagnoses
)

# Extract the p-value and compare it to the corrected threshold
p_value = ttest_results_corrected.pvalue
is_significant = p_value < alpha_corrected
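Across many diagnosis-level comparisons, the correction is applied in one pass: run every test, then judge each p-value against the shared corrected threshold. A sketch on simulated data, where both buckets are drawn from the same distribution (the `dx_` names and sample counts are invented):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
alpha = 0.05

# Hypothetical spending samples for 20 diagnoses, identical in both buckets
# (i.e., true parity), 100 claims per bucket per diagnosis
diagnoses = {
    f"dx_{i}": (rng.normal(1000, 200, 100), rng.normal(1000, 200, 100))
    for i in range(20)
}

number_of_comparisons = len(diagnoses)
alpha_corrected = alpha / number_of_comparisons  # 0.05 / 20 = 0.0025

p_values = {dx: ttest_ind(a, b).pvalue for dx, (a, b) in diagnoses.items()}
failures = [dx for dx, p in p_values.items() if p < alpha_corrected]
# Under true parity, `failures` should almost always come back empty
```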
Department of Labor, this is just one way to address Mental Health Parity, but it's pretty scientifically ironclad, and if plans fail this test, it's really hard to argue they succeed in any other manner absent compelling data. This is also relatively easy data to pull from healthcare payers, because they know what they pay for things. That's the whole business.
Repeat the above math for both median and mode, and you have a pretty robust mental health parity check that is not a heavy lift for the Department of Labor.
If they fail the above test, I think it's on them to prove they have better outcomes and access to those at lower cost.
Which would of course qualify for a safe harbor, but I think we have to see the data.
Big Health, thank me later.
¹ Every scientific study in biomedical science accepts a small possibility that the results are not due to our intervention, but to chance.
² And then we demand replication of those studies, because the same conclusion being due to chance, but twice? The risk is tiny.
³ The Bonferroni correction adjusts the alpha value for the number of comparisons being made. This helps to control the false positive rate, which is the probability of incorrectly rejecting the null hypothesis.
In this case, the Bonferroni-corrected alpha value is alpha divided by the number of comparisons.
So, if the original alpha value was 0.05, then the Bonferroni-corrected alpha value would be 0.05/(len(healthcare_spending_per_diagnosis) - 1).
Equivalently, the correction can be applied to the p-value itself by multiplying rather than dividing: if the p-value from the uncorrected t-test was 0.01, then the Bonferroni-corrected p-value would be min(1, 0.01 × (len(healthcare_spending_per_diagnosis) - 1)).
A Bonferroni-corrected p-value of less than 0.05 is still considered to be statistically significant.
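For concreteness, a worked number. The standard Bonferroni convention either divides alpha by the number of comparisons or, equivalently, multiplies the p-value by it; the counts below are hypothetical:

```python
alpha = 0.05
number_of_comparisons = 100  # hypothetical count of diagnosis-level tests

# Option 1: correct the threshold
alpha_corrected = alpha / number_of_comparisons  # 0.0005

# Option 2: inflate the p-value and keep alpha at 0.05
p_uncorrected = 0.01
p_corrected = min(1.0, p_uncorrected * number_of_comparisons)  # 1.0

# Both conventions reach the same verdict: an uncorrected p of 0.01
# is NOT significant when spread across 100 tests
assert (p_uncorrected < alpha_corrected) == (p_corrected < alpha)
```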