Last month (February 2017), the journal BMC Psychiatry published a study by James Christian Jakobsen et al. The study is titled “Selective serotonin reuptake inhibitors versus placebo in patients with major depressive disorder. A systematic review with meta-analysis and Trial Sequential Analysis.”
The research was a meta-analysis – i.e. it combined the findings from several earlier studies. Here are the authors’ conclusions:
“SSRIs might have statistically significant effects on depressive symptoms, but all trials were at high risk of bias and the clinical significance seems questionable. SSRIs significantly increase the risk of both serious and non-serious adverse events. The potential small beneficial effects seem to be outweighed by harmful effects.”
The authors point out that there have been previous meta-analyses assessing “the effects of SSRIs in adults with major depressive disorder”, the general conclusions of which have been that SSRIs have a statistically significant effect on depression. But Jakobsen et al point out that the previous studies had a number of limitations, including one or more of the following:
- failure to use predefined Cochrane methodology in the selection of studies;
- only including subgroups of depressed individuals;
- failure to search all relevant databases;
- failure to systematically assess harmful effects;
- failure to systematically assess risk of bias in the source studies.
The authors’ objective was:
“…to conduct a comprehensive systematic review assessing the beneficial and harmful effects of SSRIs versus placebo, ‘active’ placebo, or no intervention in adult participants with major depressive disorder using our eight-step procedure for assessing evidence in systematic reviews.”
The authors’ protocol, detailing the methodology to be used, was published in advance, and can be seen here. “The methodology was not changed after the analysis of the review results began.”
. . . . . . . . . . . . . . . .
Here are some quotes from the article, interspersed with my comments/observations.
“Independent investigators searched for eligible trials published before January 2016 in The Cochrane Library’s CENTRAL, PubMed, EMBASE, PsychLIT, PsycINFO, clinicaltrials.gov., and Science Citation Index Expanded…Trials were included irrespective of language, publication status, publication year, and publication type. To identify unpublished trials, we searched clinical trial registers of Europe and USA, websites of pharmaceutical companies, websites of U.S. Food and Drug Administration (FDA) and European Medicines Agency, and we requested the U.S. Food and Drug Administration (FDA) to provide all publicly releasable information about relevant clinical trials of SSRIs that were submitted for marketing approval.
Participants had to be 18 years or older and have a primary diagnosis of major depressive disorder based on standardised criteria, such as DSM III, DSM III-R, DSM IV, DSM V, or ICD 10.”
Obviously this was a very comprehensive search for source studies.
“Primary outcomes
- Depressive symptoms measured on the 17-item or 21-item Hamilton Depression Rating Scale (HDRS), the Montgomery-Asberg Depression Rating Scale (MADRS), or the Beck’s Depression Inventory (BDI).
- Remission (Hamilton <8 points; BDI <10 points; MADRS <10 points).
- Adverse events during the intervention period which were classified as serious and non-serious adverse events. Serious adverse events were defined as medical events that were life threatening, resulted in death, disability, or significant loss of function, or caused hospital admission or prolonged hospitalization. The remaining events were classified as non-serious adverse events.
Secondary outcomes
- Suicides, suicide attempts, and suicide ideation during the intervention period.
- Quality of life (scale used by the trialists).
The time point of primary interest was end of treatment (defined by trialist). We also planned to report results assessed at maximum follow-up.”
. . . . . . . . . . . . . . . .
“Using our strict inclusion and exclusion criteria, a total of 195 publications/unpublished trials were identified and included. Due to multiple publications of single trials and lack of useful data, only 131 randomised clinical trials enrolling a total of 27,422 participants were included in our analyses.”
. . . . . . . . . . . . . . . .
“Estimating a meaningful threshold for clinical significance is difficult and an assessment of clinical significance should ideally not only include a threshold on an assessment scale. Major depressive disorder affects daily functioning, increases the risk of suicidal behaviour, and decreases quality of life. Some adverse events might therefore be acceptable if SSRIs have clinically significant beneficial effects. We therefore both predefined a threshold for clinical significance and assessed the balance between beneficial and harmful effects.
As threshold for clinical significance, we chose a drug-placebo difference of 3 points on the 17-item HDRS (ranging from 0 to 52 points) or an effect size of 0.50 standardised mean difference. This has been recommended by the National Institute for Clinical Excellence (NICE) in England and has been chosen in other reviews. Nevertheless, these recommendations are not universally accepted and have been questioned. Others have suggested the following ‘rules of thumb’ regarding the standardised mean difference: 0.2 a small effect, 0.5 a moderate effect, and 0.8 a large effect. One study has shown that a SSRI-placebo mean difference of up to three points on the HDRS corresponds to ‘no clinical change’ [186]. Another valid study has shown that a SSRI-placebo difference of 3 points is undetectable by clinicians, and that a mean difference of 7 HDRS points, or a standardized mean effect size of 0.875, is required to correspond to a rating of ‘minimal improvement’[187]”
In studies of this kind, it is necessary to distinguish clinical significance from statistical significance. This is well illustrated with a simple analogy. Imagine a coin that has a 1% bias in favor of heads. What this means essentially is that if one conducts a very large number of trials, each of 200 tosses, the average result per trial will be 101 heads and 99 tails: a discrepancy of 2 per 200, or 1%. In ordinary practice, however, this coin could be considered fair for use in sporting events and other simple decision-making situations. Given enough tosses, the 1% discrepancy is statistically significant (i.e., distinguishable from a random fluctuation), but it is not practically significant.
Similarly, in clinical trials involving a large number of participants, a treatment effect might be statistically significant (i.e., probably not a random fluctuation), but may nevertheless be so small that it has no clinical significance.
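The coin analogy can be put into numbers with a short sketch. It assumes the observed share of heads is exactly 50.5% (101 per 200) and uses the normal approximation to the binomial; the function names are my own, chosen for illustration only.

```python
import math

def z_for_biased_coin(n_tosses, p_heads=0.505):
    """z-statistic when the observed share of heads equals p_heads exactly."""
    # Standard error of the share of heads under the fair-coin hypothesis
    standard_error = math.sqrt(0.5 * 0.5 / n_tosses)
    return (p_heads - 0.5) / standard_error

def two_sided_p(z):
    """Two-sided p-value from the normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))

for n in (200, 10_000, 1_000_000):
    z = z_for_biased_coin(n)
    print(f"{n:>9} tosses: z = {z:.2f}, p = {two_sided_p(z):.3g}")
```

With 200 tosses the bias is lost in the noise; with a million tosses it is unmistakable. The size of the bias is the same in both cases. Only the amount of data has changed.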
For the purposes of the present meta-analysis, Jakobsen et al chose a 3-point difference on the 17-item Hamilton Depression Rating Scale as the threshold for clinical significance. Their discussion of this in the above quote is self-explanatory. Reference 187 in the above quote is to a 2015 paper by Joanna Moncrieff and Irving Kirsch. The paper is a discussion of a 2013 study by Leucht et al, What does the HAMD mean?, which is reference 186 in the above quote.
The Leucht et al study was a meta-analysis embracing 7131 participants. The methodology used was to compare the improvement scores that people obtained on the HDRS questionnaire with their improvement scores on the Clinical Global Impression scale (CGI-I). The CGI is a rating scale based on a face-to-face interview and is used frequently by psychiatrists and other mental health workers in clinical and research settings to assess client improvement or deterioration. Leucht et al found:
“The results were consistent for all assessment points examined. A CGI-I score of 4 (‘no change’) corresponds with a slight reduction on the HAMD-17 of up to 3 points.” [Note: the HAM-D and the HDRS are the same questionnaire.]
What this means essentially is that trained psychiatrists and other clinicians, conducting face-to-face assessments, could not detect an improvement of 3 points on the HDRS. People who achieved an improvement of 3 points on the HDRS questionnaire were rated as “no change” on the face-to-face CGI-I.
Incidentally, five of the six authors of Leucht et al report close links to pharma.
“HF, MK and AS work full-time for Merck & Co, parent company of Organon, which provided the data for this study. AS is also a shareholder with Merck & Co. SL has received honoraria for consulting/advisory boards from Alkermes, Bristol-Myers Squibb, Eli Lilly, Janssen, Johnson & Johnson, Lundbeck, Medavante, Roche, lecture honoraria from AstraZeneca, Bristol-Myers Squibb, Eli Lilly, Essex Pharma, Janssen, Johnson & Johnson, Lundbeck Institute, Pfizer, Sanofi-Aventis, and Eli Lilly has provided medication for a trial with SL as the primary investigator. PL has received honoraria for educational talks on medical ethics from AstraZeneca and Eli Lilly.”
So it seems unlikely that they would be biased towards downplaying the efficacy of antidepressant drugs.
Also incidentally, here’s an interesting quote from a general discussion of this matter in the Moncrieff and Kirsch paper:
“The small differences detected between antidepressants and placebo may represent drug-induced mental alterations (such as sedation or emotional blunting) or amplified placebo effects rather than specific ‘antidepressant’ effects.”
. . . . . . . . . . . . . . . .
Primary Outcomes
“…meta-analysis of the results of all 92 trials [that used the HDRS] showed that SSRIs versus placebo significantly reduced the HDRS score (mean difference −2.25 points…).”
In other words, there was a statistically significant reduction in average depressive “symptoms” as measured by the HDRS, but the reduction was only 2.25 points, which as discussed above, is of no clinical significance, i.e., would be rated as “no change” on the CGI-I scale.
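For readers who like to see the comparison laid out, here is a minimal sketch that sets the pooled difference against the two benchmarks quoted earlier (3 points for “no change”, about 7 points for “minimal improvement”). The function name and the label for the band between 3 and 7 points are my own assumptions, not terms from the papers.

```python
# Benchmarks quoted earlier in this post (HDRS points)
NO_CHANGE_CEILING = 3.0     # up to 3 points corresponds to CGI-I "no change"
MINIMAL_IMPROVEMENT = 7.0   # about 7 points needed for "minimal improvement"

def interpret_hdrs_difference(points):
    """Map a drug-placebo HDRS difference onto the benchmarks above."""
    if points <= NO_CHANGE_CEILING:
        return "no change"
    if points < MINIMAL_IMPROVEMENT:
        return "below minimal improvement"  # illustrative label, not from the papers
    return "minimal improvement or better"

pooled_difference = 2.25  # mean difference reported by Jakobsen et al
print(interpret_hdrs_difference(pooled_difference))
print(f"{pooled_difference / NO_CHANGE_CEILING:.0%} of the 3-point threshold")
print(f"{pooled_difference / MINIMAL_IMPROVEMENT:.0%} of the 7-point benchmark")
```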
Risk of Bias
“All trials were at high risk of bias per several bias risk domains and especially the risk of incomplete outcome data, selective outcome reporting, and insufficient blinding bias may bias our review results. Our GRADE assessments show that due to the high risks of bias the quality of the evidence must be regarded as very low. The high risks of bias question the validity of our meta-analysis results as high risk of bias trials tend to overestimate benefits and underestimate harms. The ‘true’ effect of SSRIs might not even be statistically significant.”
In other words, the small, but statistically significant results found in the meta-analysis may simply reflect various biases that were present in the source studies, and which would inevitably bias the results of the meta-analysis.
GRADE (Grading of Recommendations, Assessment, Development and Evaluation) is a protocol, developed by the GRADE Working Group and used by Cochrane, for assessing the quality of a body of evidence.
Selective outcome reporting is a particularly important source of bias in these kinds of trials. Consider a drug vs. placebo trial in which the authors have gathered ten different pieces of outcome data on each participant. Let’s say that four of these outcome measures favor the drug and six favor the placebo. It is very tempting for the researchers, particularly those who have a vested interest in the drug, to report the four drug-favorable measures and omit the other six.
I leave it to the reader to decide whether psychiatric research might be prone to this kind of selective reporting.
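To see why this matters, here is a rough sketch of how easily a trial with many outcome measures can produce at least one “significant” result by chance alone. It assumes that the drug has no true effect, that the outcome measures are statistically independent, and that each is tested at the conventional 5% level; real outcome measures are usually correlated, so treat the figures as illustrative only.

```python
def chance_of_spurious_win(n_outcomes, alpha=0.05):
    """Probability that at least one of n independent null outcomes
    reaches p < alpha purely by chance."""
    return 1 - (1 - alpha) ** n_outcomes

for k in (1, 5, 10):
    print(f"{k:>2} outcomes: {chance_of_spurious_win(k):.0%} chance of a spurious 'win'")
```

Under these assumptions, a trial that measures ten outcomes has roughly a 40% chance of finding something favorable to report, even for an inert drug.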
Subgroup Analyses
In trials involving people with relatively higher levels of depression (HDRS score higher than 23 points), the improvement with SSRIs was higher than in the trials involving lower levels of depression (less than or equal to 23 points). The mean drug-placebo difference was 2.69 points in the higher-depression trials vs. 1.29 points in the lower-depression trials, neither of which crosses the threshold for clinical significance (3 points).
Other tests for sub-group differences showed no significant differences. These tests were:
- comparison of different SSRIs
- older vs. younger participants
- trials with washout period vs. those without
- trials with drug/alcohol dependent participants vs. those without
- short “treatment” period (less than 8 weeks) vs. other trials
- dose of SSRI, below median vs. equal to or above median
Long-term Follow-up
Only six of the 131 trials reported long-term follow-up results. The meta-analysis of all six trials showed a mean difference of 1.30 HDRS points in favor of the SSRI participants. This result is only marginally significant statistically: there is a 7% probability that a difference this large would arise by chance alone if the drugs had no effect. Additionally, it is considerably below the threshold for clinical significance.
The fact that the mean difference in long-term follow-up is lower than the short-term results suggests that whatever tenuous benefits came from the SSRIs were short-lived. It is also interesting that only 6 of 131 trials pursued any kind of long-term follow-up.
Remission and No Response
Thirty-four of the 131 trials assessed participants for remission, and 70 trials assessed for no response. Remission was usually defined as achieving a score below 8 on the HDRS. Response was usually defined as a 50% reduction of HDRS score. On each of these measures, the SSRI group had a better outcome.
Jakobsen et al, however, point out that the results on remission and response need to be interpreted cautiously for the following reasons:
“1) the assessments of remission and response were primarily based on single HDRS scores and it is questionable whether single HDRS scores are indications of full remission or adequate response to the intervention; 2) information is lost when continuous data are transformed to dichotomous data and the analysis results can be greatly influenced by the distribution of data and the choice of an arbitrary cut-point; 3) even though a larger proportion of participants cross the arbitrary cut-point in the SSRI group compared with the control group (often HDRS below 8 for remission and 50% HDRS reduction for response), the effect measured on HDRS might still be limited to a few HDRS points (e.g., 3 HDRS points) or less; 4) by only focusing on how many patients cross a certain line for benefit, investigators ignore how many patients are deteriorating at the same time. If results, e.g., show relatively large beneficial effects of SSRIs when remission and response are assessed but very small averaged effects (as our results show) – then it must be because similar proportions of the participants are harmed (increase on the HDRS compared to placebo) by SSRIs. Otherwise the averaged effect would not show small or no difference in effect. The clinical significance of our results on ‘no remission’ and ‘no response’ should therefore be questioned.”
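Points 2 and 3 in the above quote can be illustrated with a small sketch. Everything here except the 2.25-point difference is a hypothetical assumption of mine: improvement scores are taken to be normally distributed with a standard deviation of 8 points, the placebo group is taken to improve by 8 points on average, and “response” is taken to mean an improvement of at least 12 points.

```python
from statistics import NormalDist

SD = 8.0                             # hypothetical spread of improvement scores
PLACEBO_MEAN = 8.0                   # hypothetical mean improvement on placebo
DRUG_MEAN = PLACEBO_MEAN + 2.25      # shifted by the pooled drug-placebo difference
RESPONSE_CUTOFF = 12.0               # hypothetical improvement counted as "response"

def share_of_responders(mean_improvement):
    """Share of participants whose improvement exceeds the cut-off."""
    return 1 - NormalDist(mean_improvement, SD).cdf(RESPONSE_CUTOFF)

placebo_rate = share_of_responders(PLACEBO_MEAN)
drug_rate = share_of_responders(DRUG_MEAN)
print(f"placebo responders: {placebo_rate:.0%}")
print(f"drug responders:    {drug_rate:.0%}")
```

Under these assumptions, a shift of only 2.25 points moves about ten more people per hundred across the line, which looks far more impressive than the underlying change in scores. Move the cut-off and the gap changes, which is what makes the cut-point arbitrary.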
Serious Adverse Events
Forty-four trials reported data on serious adverse events. The meta-analysis odds ratio for the SSRI vs. the placebo group was 1.37. What this means is that “31 per 1000 SSRI participants will experience a serious adverse event compared to 22 per 1000” in the placebo group: an excess of about nine serious adverse events per 1000 SSRI users. And it should be stressed that in this study, serious adverse events were defined as “…medical events that were life threatening, resulted in death, disability, or significant loss of function, or caused hospital admission or prolonged hospitalisation.”
It should also be remembered that the great majority of the trials collected no long-term data. The nine-per-thousand excess serious adverse events occurred during the relatively brief period of “treatment” duration, and it is reasonable to suspect that such events would be more frequent with long-term use.
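The per-1000 figures quoted by the authors can be turned into an absolute excess and a “number needed to harm” with simple arithmetic. The sketch below uses only the two quoted rates; the variable names are mine.

```python
ssri_per_1000 = 31      # serious adverse events per 1000 SSRI participants (quoted)
placebo_per_1000 = 22   # serious adverse events per 1000 placebo participants (quoted)

# Absolute excess of serious adverse events attributable to the drug
excess_per_1000 = ssri_per_1000 - placebo_per_1000

# Number of people treated for one additional serious adverse event to occur
number_needed_to_harm = 1000 / excess_per_1000

print(f"excess: {excess_per_1000} per 1000")
print(f"number needed to harm: about {number_needed_to_harm:.0f}")
```

In other words, on these figures, roughly one additional serious adverse event would be expected for every 111 people given an SSRI over the trial period.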
It is also noteworthy that only 44 of the 131 trials reported data on serious adverse events.
Non-Serious Adverse Events
“Meta-analyses showed that the participants randomized to SSRIs versus placebo had a significantly increased risk of several [non-serious] adverse events.”
These included: abnormal ejaculation, tremor, anorexia, nausea, somnolence, sweating, asthenia, diarrhea, constipation, insomnia, dizziness, dry mouth, libido decreased, sexual dysfunction, appetite decreased, fatigue, vomiting or upset stomach, flu syndrome, drowsiness, blurred/abnormal vision or dry eyes, nervousness, headache, dyspepsia, weight loss, central or peripheral nervous system problems, lightheadedness/faint feeling, agitation, impotence, taste perversion, etc.
Suicides, Suicide Attempts, and Suicide Ideation
“There were almost no data on suicidal behaviour, quality of life, and long-term effects.”
The limited amount of information on these matters showed no significant differences between participants randomized to SSRIs vs. placebo on number of suicides, number of suicide attempts, or suicidal ideation. This is particularly noteworthy in that the prescription of antidepressants is routinely defended on the grounds that they provide protection against suicide.
Readers may find this no-difference result puzzling, given the accumulation of anecdotal information linking SSRI use to suicides and even murder-suicides. But it has to be borne in mind that, compared to the enormous numbers of people taking SSRIs, suicide is still a very rare event and is unlikely to occur in sufficient numbers in short studies of this kind to provide useful comparative information.
General Points
“All trials used placebo as control intervention and due to the large number of adverse events, some patients might have figured out if they received an ‘active’ intervention or not, which might question the blinding of the trials.”
“Multiple previous reviews and meta-analyses have, as mentioned in our Background, assessed the effects of SSRIs and have generally concluded that SSRIs have significant effects on depressive symptoms. However, the estimated results (and not the conclusions the review authors made) of these reviews and meta-analyses actually are in agreement with our present results and show that SSRIs do not seem to benefit patients more than a few HDRS points. This increases the validity of our present results. Furthermore, we assessed in detail the risks of serious adverse events and of non-serious adverse events and found that both were significantly increased by SSRIs.”
“Our HDRS mean differences were averaged effects. Hence, it cannot be concluded that SSRIs do not have clinically significant effects on all depressed participants. E.g., certain severely depressed patients compared with lightly depressed patients… might benefit from SSRIs even though there is no evidence backing this hypothesis. However, any clinical research result will have this ‘limitation’. Specific patients might benefit from any given intervention even though valid research results have shown that this intervention ‘on average’ is ineffective or even harmful.”
In addition, it needs to be recognized that there is no way to identify in advance which individuals will be harmed by these drugs. So, in prescribing these drugs, psychiatrists are effectively playing Russian Roulette with one essential difference: they’re pointing the gun at someone else’s head.
“SSRIs versus placebo seem to have statistically significant effects on depressive symptoms, but the clinical significance of these effects seems questionable and all trials were at high risk of bias. Furthermore, SSRIs versus placebo significantly increase the risk of both serious and non-serious adverse events. Our results show that the harmful effects of SSRIs versus placebo for major depressive disorder seem to outweigh any potentially small beneficial effects.”
“Per our results, we now believe that there is valid evidence for a public concern regarding the effects of SSRIs. We agree with Andrews et al. that antidepressants seem to do more harm than good. We have clearly shown that SSRIs significantly increase the risks of both serious and several non-serious adverse events. The observed harmful effects seem to outweigh the potential small beneficial clinical effects of SSRIs, if they exist. Our results confirm the findings from other studies questioning the effects of SSRIs, but are in contrast to the results of other reviews concluding that SSRIs are effective interventions for depression. However, our present analyses represent the most comprehensive systematic review on the topic and we hope it may guide clinical practice.” [Hyperlink added]
. . . . . . . . . . . . . . . .
DISCUSSION
If psychiatry were a bona fide medical field, a meta-analysis of this quality yielding these results would send Richter 9 shock waves through the profession. The prescription of SSRIs for depression is probably the single most frequent activity performed by psychiatrists in their day-to-day work. The unambiguous revelation that this activity is doing more harm than good should be generating enormous concern for psychiatry’s leadership and for the rank and file.
But the publication of this study on February 8 generated no discernible concern within the profession. This is because psychiatry has very little interest in the scientific assessment of its “treatments”. Psychiatry is drug-pushing, pure and simple. The primary criteria of success are customer retention, and volume of product sold. As long as the “patients” keep coming back for more, psychiatrists can convince themselves that they are doing good.
The facts that the “patients” have been duped by unsubstantiated assertions of chemical imbalances and are, in many cases, addicted to the pills are inconvenient truths that can readily be eclipsed by self-deception and in-group mutual reassurance.
. . . . . . . . . . . . . . . .
AN INTERESTING “REBUTTAL”
On February 21, 2017, Emil Karlsson, who describes himself as “a debunker of pseudoscience and steamroller of miscellaneous nonsense”, critiqued the Jakobsen et al paper on the grounds that the HDRS 3-point cut-off for clinical significance is arbitrary and excessively high. But in fact, as we saw earlier, the threshold is based on the empirical findings that a “minimal improvement” on the Clinical Global Impression scale corresponds to an improvement of about 7 on the HDRS, and a CGI rating of “no change” corresponds to a 3-point HDRS shift. So the threshold of 3 points, far from being too high, is actually excessively generous. Nevertheless, the average HDRS improvement in the trials reviewed by Jakobsen et al was only 2.25 points, and the improvement in those trials that contained long-term follow-up was only 1.30 points.
FUNDING
“The work was supported by The Copenhagen Trial Unit, Centre for Clinical Intervention Research, in Denmark.”