The ethics of COVID-19 treatment studies: too many are open, too few are double-masked

June 30, 2020

Jeffrey K Aronson, Nicholas DeVito, Robin E Ferner*†, Kamal R Mahtani, David Nunan, Annette Plüddemann 

On behalf of the Oxford COVID-19 Evidence Service Team
Centre for Evidence-Based Medicine, Nuffield Department of Primary Care Health Sciences
University of Oxford

Correspondence to

*University of Birmingham; †University College London


A WHO panel, convened during the 2014 Ebola virus epidemic, assessed the ethical implications of using medicines that have shown promising results in the laboratory and in animal models, but before they have been evaluated for safety and efficacy in humans. They concluded that researchers have a moral duty to evaluate unproven interventions (for treatment or prevention) in clinical trials that are of the best possible design in exceptional circumstances, such as those that apply today.

In our view, the description “clinical trials that are of the best possible design”, where therapeutic interventions are concerned, implies adequately masked randomized controlled trials.

But more trials of pharmacological interventions are being conducted in the treatment of COVID-19 without blinding (or masking) of interventions than are being conducted masked. Some single-blind trials mask only the participant rather than the investigator. A minority are placebo-controlled. Trials of non-pharmacological interventions are being even less well served, relying largely on retrospective studies and mathematical models of uncertain value.

We believe this to be a potential waste of participants’ and investigators’ time and therefore unethical. Biased results can distort therapeutic decision-making, public perceptions, investment in healthcare, and the standing and value of medical research. They may falsely suggest a lack of equipoise, discouraging investigators from performing well-designed clinical trials and making recruitment to such trials difficult. They may result in more harms than benefits.

The example of the RECOVERY trial shows that it is possible to carry out large-scale high-quality clinical trials that yield reliable and useful results in a completely ethical framework.

Tom Chalmers’s battle-cry encapsulates what he had written earlier, in 1975: “The concept that the first sick patient to receive a new drug, procedure, or operation should be randomized stems from recognition of the pernicious influence of pilot trials in postponing or eliminating a definitive therapeutic trial”, and earlier still, in 1968: “I am firmly convinced that the first patient to receive a new agent should be randomised.” It is a message that is often forgotten or ignored. Even trials that are registered as randomized often fail to specify the method of randomization or whether and how random allocation will be concealed.

Effective randomization to the intervention being studied or its comparator(s) is an essential component of reliable clinical trials, because it ensures that any differences between a treatment group and a comparison group in characteristics that may influence the outcome arise by chance. The biases that result from inadequate randomization have been recognized at least since the Lanarkshire School Milk experiment reported in 1930. But while the error of “comparing apples and oranges” is commonly recognized, the COVID-19 epidemic has seen a series of uncontrolled or ineffectively randomized trials whose results are unlikely to be reliable and may be adversely influencing healthcare policy. Proper randomization (of the schedule by which treatment and comparison interventions are allocated), combined with concealment of the allocation schedule, forms the bedrock of a well-designed clinical trial.
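
As a minimal sketch of what a randomized allocation schedule can look like, the following Python generates a permuted-block list. The block size, arm labels, and function name are illustrative assumptions, not details taken from any trial discussed here. In practice the schedule would be generated and held by someone independent of recruitment, so that the allocation remains concealed from those enrolling participants.

```python
import random

def permuted_block_schedule(n_participants, arms=("treatment", "comparator"),
                            block_size=4, seed=None):
    """Return an allocation list built from randomly shuffled balanced blocks.

    Hypothetical helper for illustration only. block_size must be a
    multiple of the number of arms so that every block is balanced.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # seeded so the schedule can be reproduced for audit
    schedule = []
    while len(schedule) < n_participants:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)  # random order within each balanced block
        schedule.extend(block)
    return schedule[:n_participants]

schedule = permuted_block_schedule(12, seed=2020)
print(schedule)
```

Blocking keeps the arms balanced as recruitment proceeds; simple (unblocked) randomization would be an equally valid choice for a large trial.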

The choice of comparator against which to judge the experimental treatment or treatments is important. The null hypothesis is that the treatments being compared are equally efficacious, and this is easiest to judge if the comparator has no effect, i.e. a placebo. However, difficulties arise if the comparator is “standard of care”, without a placebo (so that those caring for patients are not masked); or if it is of unknown efficacy; or if a placebo is used when there is an alternative that is known to be efficacious.

During a trial, blinding (also known as masking, i.e. concealing knowledge of the intervention to which participants have been assigned after randomization), to minimize the risk of biases, is equally important. However, blinding is even more commonly, and inexcusably, ignored in trials in COVID-19 than randomization and allocation concealment. Trials of the use of the 4-aminoquinolines, chloroquine and hydroxychloroquine, in the treatment of people with COVID-19, exemplify the problem. The data from 142 registered protocols for such trials are shown in Table 1. Similar data from currently registered trials of remdesivir, tocilizumab, and various corticosteroids are shown in Table 2. 

Table 1. The numbers of single-, double-, triple-, or quadruple-masked studies* of chloroquine, or hydroxychloroquine, or both, either as primary drugs of investigation or as comparators, currently included in clinical trials registers in 30 countries as being in progress or planned

Country group (number of countries)   Number of trials   Single-masked   Double-, triple-, or quadruple-masked   Total number of masked trials (%)
Europe (13)                           40                 0               13                                      13 (33%)
Far East & Australia (8)              37                 5               4                                       9 (24%)
Middle East (4)                       26                 2               8                                       10 (38%)
North America (2)                     26                 2               12                                      14 (54%)
Latin America (3)                     13                 1               4                                       5 (38%)
Total                                 142                10              41                                      51 (36%)

*  Single-masked (10 studies): masking variably stated—mostly participants or outcomes assessors; primary investigators in only one case (but four not stated)
Double-masked (17 studies): almost always participants, usually with primary investigators, but sometimes with assessors and occasionally with caregivers
Triple-masked (7 studies): always participants, almost always with investigators and caregivers
Quadruple-masked (17 studies): always participants, investigators, caregivers, and assessors 
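
The percentages in Table 1 follow directly from the counts, and the arithmetic can be checked with a short sketch such as the one below. The row values are transcribed from the table; the helper name and the round-half-up convention are our assumptions about how the percentages were rounded.

```python
# Rows transcribed from Table 1:
# (country group, trials, single-masked, double-/triple-/quadruple-masked)
table1 = [
    ("Europe", 40, 0, 13),
    ("Far East & Australia", 37, 5, 4),
    ("Middle East", 26, 2, 8),
    ("North America", 26, 2, 12),
    ("Latin America", 13, 1, 4),
]

def masked_share(trials, single, multi):
    """Return (number of masked trials, percentage of all trials).

    Rounds half up (assumed convention), so that 13/40 = 32.5% is
    reported as 33%, as in the table.
    """
    masked = single + multi
    return masked, int(100 * masked / trials + 0.5)

for group, trials, single, multi in table1:
    masked, pct = masked_share(trials, single, multi)
    print(f"{group}: {masked}/{trials} masked ({pct}%)")

total_trials = sum(row[1] for row in table1)
total_single = sum(row[2] for row in table1)
total_multi = sum(row[3] for row in table1)
total_masked, total_pct = masked_share(total_trials, total_single, total_multi)
print(f"Total: {total_masked}/{total_trials} masked ({total_pct}%)")
```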

Table 2. The numbers of masked studies of remdesivir, tocilizumab, and corticosteroids currently included in clinical trials registries, although not all are recruiting

Medication         Number of trials   Single-masked*   Double-, triple-, or quadruple-masked   Total number of masked trials (%)   Number of patients to be studied in masked trials
Remdesivir         11                 1                3                                       4 (36%)                             2061/22,437 (9.2%)
Tocilizumab        34                 1                4                                       5 (15%)                             1590/10,396 (15%)
Corticosteroids†   49                 3                9                                       12 (24%)                            2636/22,306 (12%)

*Excluding participant-only masked studies
†Methylprednisolone (22), dexamethasone (6), ciclesonide (6), prednisone or prednisolone (5), budesonide (5), hydrocortisone (1), unidentified (4)

The registered protocols are difficult to assess, because different registries log different degrees of blinding, but it is clear that masked trials are in a minority. Combining the trials of all four types of medications, about 120,000 individuals are going to be taking part in trials that are likely to give us little or no useful information or, worse still, potentially misleading information. Hydroxychloroquine exemplifies this: it has been shown in a large well-designed randomized trial to be no different from placebo, and may even be harmful. This is a significant ethical lapse, considering that participants often agree to altruistically expose themselves to risks of harms in order to help advance the clinical evidence base. For example, in one study of 168 patients, 84% said that they would be happy to participate in clinical trials research and 58% endorsed the statement “I believe results could help other patients in the future”.

In some institutions chloroquine or hydroxychloroquine have been regarded as standard treatments in advance of the evidence. And some trialists, in order to avoid using placebos, have assigned to a control group those who have not consented to take part, thus introducing selection bias.

Unethical trials
Here are some of the ways in which trials may fall short of being regarded as ethical:

  • Failure to randomize allocation or to conceal it during the trial
  • Failure to mask (blinding)
    • Not masked
    • Patient only masked
    • Some investigators not masked
  • Using inappropriate comparators, for example:
    • Choosing a comparator whose efficacy is unknown
    • Choosing a placebo when there is a recognized effective treatment
    • Choosing as a comparator an effective treatment in an inappropriate dosage
  • Changing a prespecified protocol without good reason and without explaining why
    • Changing the endpoint to report
    • Changing the drugs used
    • Changing dosage regimens
    • Reporting statistical analyses (e.g. subgroup analyses) that were not pre-specified
    • Failing to use a prespecified data monitoring committee
  • Prematurely terminating a trial without good reason 

Trials at a high risk of bias, because of design faults, are failing participants from the start. 

Can unmasked trials be ethical?
The importance of masking has been highlighted in a review that showed that studies that did not mask both participants and investigators gave treatment effect size estimates that were, on average, 19% higher (95% CI = 6-32%) than those from masked studies. Another study suggested that there was no difference between masked and unmasked trials, particularly for objective outcomes. In that study a ratio of odds ratios (ROR) below 1 would have indicated exaggerated effect estimates in trials without blinding, but the values of ROR in different analyses hovered around 1; for example, in 14 meta-analyses with outcomes reported by blinded observers the ROR was 0.98 (95% Bayesian credible interval = 0.69 to 1.39). The authors concluded that their results needed to be replicated and that in the meantime blinding of studies should continue. It has also been argued that blinding is not always necessary and may, in some cases, be harmful, for example in discouraging recruitment; this is good polemic, but we believe it to be poor science in treating acute infections.

The need for high-quality trials
We all clearly want to be able to give those who have the disease today the benefits of effective therapies. Which is presumably why so many trialists think that it is acceptable to do small, open, randomized or non-randomized, comparative trials or just uncontrolled case series observations, hoping for a quick fix. But the data that have emerged and been rapidly published, incomplete and without peer review, haven’t helped patients or clinicians. Some have been retracted, but not without having affected clinical practice. There are several problems:

  1. Most of the trials that have been published are inadequate, and the results cannot be relied on. Some early, poorly designed, very small trials on hydroxychloroquine, for example, seemed to show small beneficial effects on COVID-19 disease status, but did not report effects on mortality or adverse events. In at least two cases the trials as performed differed markedly from the studies that were specified in the pre-trial protocols, raising doubts about the standards of practice, reporting, and analysis and increasing the possibility of biases.

For example, in one study, the reported outcome measures were time to clinical recovery, the body temperature recovery time, and the cough remission time. These clinical outcomes were not mentioned in the protocol. Nor were the prespecified virological and haematological outcomes mentioned in the study report. In an unmasked non-randomized study, forming a control group from patients who declined to take part would have introduced selection bias.

Other similarly poor trials on hydroxychloroquine showed no benefits, leaving us wondering, until the results of a large well-designed trial showed clear evidence of no efficacy, and even a possible trend towards harm; the hazard ratio for increased mortality was 1.11 (95% CI = 0.98-1.26; P=0.10).
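
As a rough consistency check on these figures, a two-sided P value can be approximately recovered from a ratio and its 95% confidence interval by working on the log scale. The sketch below assumes a normal approximation with an interval symmetric about the log hazard ratio; it is not the trial's own analysis, and the function name is hypothetical.

```python
import math

def p_value_from_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate z statistic and two-sided P value for a ratio from its 95% CI.

    Assumes the interval is symmetric on the log scale (normal
    approximation), which is the usual construction for hazard ratios.
    """
    log_est = math.log(estimate)
    # The CI spans 2 * z_crit standard errors on the log scale.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_est / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p

z, p = p_value_from_ci(1.11, 0.98, 1.26)
print(f"z = {z:.2f}, P = {p:.2f}")
```

Under these assumptions the recovered P value is close to the reported 0.10, which is what one would expect when the interval only just includes 1.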

Poorly designed trials can lead us to believe that an intervention is beneficial, making us uncertain whether well-designed trials would have demonstrated no benefit, or even harm. So we can’t know if we will benefit people at all or perhaps even harm more people by using the intervention than by not using it.

  2. In some places hydroxychloroquine was adopted early on as part of a standard treatment for COVID-19. In some countries hydroxychloroquine and lopinavir/ritonavir were included early on as part of standard treatment, for comparison with other untested medications, often in open (unmasked) trials. The results of such studies will be impossible to interpret sensibly and will not be helpful in deciding whether the drugs under trial are truly effective and whether the benefit to harm balance is favourable. Hydroxychloroquine has now been shown to be ineffective and possibly harmful.
  3. If an intervention doesn’t work or, worse, is on balance harmful, we shall have wasted the time of both participants and investigators, possibly losing lives through the intervention, when we could have been doing properly controlled randomized trials. The authors of an assessment of waste in research concluded that “an important burden of wasted research is related to inadequate methods. This waste could be partly avoided by simple and inexpensive adjustments.” It is a well-established principle that it is unethical to do wasteful trials. Although people who take part in well-designed clinical trials may not themselves benefit from having done so, their time is not wasted, because they thereby help others.

The 4-aminoquinolines hydroxychloroquine/chloroquine have also undoubtedly become victims of “hot stuff bias”, as their prominence in global politics and social media has brought them to the forefront of the COVID-19 research agenda. This has caused not only an influx of lower quality trials, but an oversaturation in the quantity of trials. The research on these treatments is far more extensive than is necessary to confidently assess whether there is any treatment or prophylactic benefit. Skilled researchers, suitable research sites, willing participants, and funding for studies are all natural limits on the number of trials that can be conducted at any one time; an excess number of trials on a given “hot” treatment presents a significant opportunity cost to the global research community, diverting time and resources that could be better used examining other promising treatments.

  4. The publication of a poor-quality drug trial, whose results suggest benefit, may discourage investigators from embarking on rigorous trials; or it may encourage widespread use of the drug, as has happened with hydroxychloroquine. This may make it difficult to recruit drug-naive participants into such trials.
  5. Meanwhile, when the news gets out that a new drug is beneficial, the general public will start to look for the treatment. Early on we heard about a US couple who decided to treat themselves with chloroquine after their President proclaimed it to be effective. They took a formulation intended for cleaning fish tanks. The husband died and the wife became seriously unwell. The FDA later saw fit to issue a warning to the American public not to rush out and buy ivermectin formulated for treating parasitic infections in animals, following the publication of a preprint describing the effects of ivermectin on SARS-CoV-2 in a laboratory petri dish.
  6. A rush to use supplies of a drug that is being used for other purposes puts at risk those who are taking it regularly. Shortages of medicines, particularly hydroxychloroquine, have been reported during the COVID-19 pandemic.
  7. There is also a risk that poorly designed trials will be subjected to inappropriate meta-analysis, whereby incorrect conclusions will be promulgated. For example, a claim that hydroxychloroquine is beneficial in COVID-19, with reported effect sizes, was based on a meta-analysis of a heterogeneous mixture of publications, including articles published in peer-reviewed journals, non-peer-reviewed preprints, and articles available on the internet. It was premised on a comparison of the numbers of apparently positive and negative studies, similar to vote counting, from which an effect size cannot properly be calculated.
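
To illustrate why vote counting can mislead, the sketch below contrasts a simple tally of study directions with fixed-effect inverse-variance pooling. The four studies are entirely hypothetical and are not drawn from any real trial or from the meta-analysis criticized above; they are chosen only to show how several small, imprecise studies can outvote one large, precise one.

```python
import math

# Entirely hypothetical studies: (log odds ratio, standard error).
# A negative log odds ratio means apparent benefit. None of these
# numbers come from real trials.
studies = [
    (-0.60, 0.50),  # small study, apparently positive
    (-0.45, 0.55),  # small study, apparently positive
    (-0.70, 0.60),  # small study, apparently positive
    (0.08, 0.07),   # large study, near-null
]

# Vote counting: tally the direction of each point estimate, ignoring precision.
votes_for_benefit = sum(1 for log_or, _ in studies if log_or < 0)

# Fixed-effect inverse-variance pooling: weight each study by 1 / SE^2.
weights = [1 / se ** 2 for _, se in studies]
pooled_log_or = sum(w * log_or for w, (log_or, _) in zip(weights, studies)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
ci_low = math.exp(pooled_log_or - 1.96 * pooled_se)
ci_high = math.exp(pooled_log_or + 1.96 * pooled_se)

print(f"Votes for benefit: {votes_for_benefit} of {len(studies)}")
print(f"Pooled OR: {math.exp(pooled_log_or):.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

In this made-up example most studies "vote" for benefit, yet the pooled estimate is dominated by the large study and its confidence interval includes 1. Pooling is only meaningful, of course, when the included studies are themselves comparable and at low risk of bias.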

We need well-designed trials
We are in the grip of panicky short-termism, striving to manage the current crisis and losing sight of the equally important future. 

Current patient experience and professional expertise are being wasted. Retrospective studies, and complex observational studies of poor design that use inadequate statistical techniques to attempt to mitigate or disguise deficiencies inherent in non-randomized studies, may not convincingly show either benefits or harms of an intervention, such as hydroxychloroquine. They yield small effects, if any, with wide confidence intervals, followed by banal statements that randomized controlled trials are necessary. Such trials, however, when they are eventually carried out, are often of poor quality, as we have outlined here, and their results unreliable.

Those who ignore, or worse decry, the use of well-masked, randomized, carefully controlled clinical trials have done damage to our ability to treat the effects of the coronavirus adequately. Those who are in a position to do such trials should be ensuring that they are well designed and likely to give reliable answers. Such trials take time to perform; some are starting to report, but others will not report until after the current outbreak has passed. That is the price that we pay for reliable evidence. We should be concentrating on finding effective treatments for those who are affected when the next outbreak occurs, not looking for an ineffective and possibly harmful patch for those who are currently being affected, for whom we must do the best we can with what we currently know.

A standard argument when short-term policies are criticized is the well-worn observation of the economist John Maynard Keynes, that in the long run we are all dead, a statement from his treatise A Tract on Monetary Reform (1923). What Keynes meant in economic terms is generally misunderstood. However, if we do not initiate more well-designed studies of not just the 4-aminoquinolines but all drugs that are being newly tested in the treatment of COVID-19, we may indeed all be dead, sooner than we would like.

In 2014, during the Ebola virus epidemic, the WHO convened a panel “to consider and assess the ethical implications for clinical decision-making of use of unregistered interventions that have shown promising results in the laboratory and in animal models but that have not yet been evaluated for safety and efficacy in humans.” They concluded that “Researchers have a moral duty to evaluate unproven interventions (for treatment or prevention) in clinical trials that are of the best possible design in the current exceptional circumstances, … in order to establish the safety and efficacy of the interventions or to provide evidence to stop their use.” In our view, there are no barriers, even during an outbreak such as the current one, to carrying out “clinical trials that are of the best possible design”, namely double-masked, randomized, controlled trials of any agent, whether new or repurposed, preferably with a placebo comparator. Kirchoff & Pierson have given good examples of how this has been done with drugs in development in previous epidemics. And others have commented that “large long-term trials that establish standards for drug treatments are so important to the health of the public that every effort, including investigator blinding, should be built into the trial design to produce valid results in studies of the highest reliability and the clearest interpretation”.

The example of the RECOVERY trial shows that it is possible to carry out large-scale high-quality clinical trials that yield reliable and useful results in a completely ethical framework.

So, we echo the battle-cries of eminent trialists and statisticians: