ROCKET AF Investigators Say New Analysis Supports Original Trial Results

–Test of stored blood may help answer troubling questions about the trial.
A new analysis of stored blood by the ROCKET AF trial investigators may help resolve lingering questions about the trial.

The questions about ROCKET AF, which compared rivaroxaban (Xarelto, Johnson & Johnson) to warfarin in patients with atrial fibrillation, emerged last November, when it was revealed that the portable device used to monitor and calibrate warfarin usage in the trial had been defective and had been the subject of a serious FDA recall. Since then two separate investigations, one from the trial investigators and one from the European Medicines Agency, concluded that the problem with the device did not appear to have substantially altered the reliability of the trial’s findings.

In the new report, published online in the New England Journal of Medicine, the investigators present a second effort to understand the impact of the defective device. In this case they compared INR scores of blood samples obtained from trial subjects at 12 and 24 weeks with INR scores obtained at the same time from the portable device used in the trial. However, they acknowledge that this method only allowed them to compare a small percentage (6%) of all the point-of-care INR tests used in the trial.

In 13% of the samples the two INR values were significantly different, according to international standards, at either week 12 or week 24. In 4% of the samples the values were discordant at both time points. These discrepancies did not occur more often in people with the conditions, including inflammation, infections, and anemia, that were highlighted in the FDA recall notice. About 60% of the subjects would have had no change in their treatment criteria, which were based on INRs of below 2, 2-3, and over 3. More than one third of patients had a lower INR level in the stored sample while 4% had a higher level.

The investigators reported that bleeding and stroke rates were higher in subjects with discrepant values. But, they explained in a statement to the press, “if potentially underestimated INR values from the POC device testing led to clinical events, then more bleeding but not higher stroke rates would be expected in the warfarin-treated patients.” In addition, they found that rivaroxaban-treated patients with discrepant results also had higher bleeding rates. “This suggests that this discordance may be a marker for a different type of patient,” said Manesh Patel (Duke University), a co-author of the NEJM letter and a member of the ROCKET AF executive committee, in an interview.

In another analysis the investigators compared the outcome of warfarin and rivaroxaban patients in only those with nondiscrepant values at both time points. The rate of stroke or systemic embolism was 1.46 per 100 patient-years for rivaroxaban compared with 1.37 for warfarin (HR 1.06, CI 0.78-1.45, p=0.70 for superiority, p=0.03 for noninferiority). For major and nonmajor clinically relevant bleeding there were 11.36 events per 100 patient-years in the rivaroxaban group compared with 12.42 in the warfarin group (HR 0.92, CI 0.82-1.03, p=0.13).
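
As a rough check on these figures, the crude ratio of the event rates can be compared with the reported hazard ratios, and a standard error can be backed out of the confidence interval. This is a minimal sketch, not the investigators' method: it assumes the reported interval is a 95% interval that is symmetric on the log scale, and it uses a noninferiority margin of 1.46 for the hazard ratio, which is the margin prespecified in the original ROCKET AF trial but is not stated in this article. The function names are made up for illustration.

```python
import math


def rate_ratio(rate_a, rate_b):
    """Crude ratio of two event rates (events per 100 patient-years)."""
    return rate_a / rate_b


def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(HR), assuming a symmetric 95% CI on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


# Figures reported in the article for patients with nondiscrepant values.
stroke_ratio = rate_ratio(1.46, 1.37)    # rivaroxaban vs warfarin
bleed_ratio = rate_ratio(11.36, 12.42)   # rivaroxaban vs warfarin

hr, ci_low, ci_high = 1.06, 0.78, 1.45
se = se_from_ci(ci_low, ci_high)

# Two-sided test of HR = 1 (superiority).
z_sup = math.log(hr) / se
p_superiority = 2 * (1 - normal_cdf(abs(z_sup)))

# One-sided test of HR >= margin (noninferiority).
# ASSUMPTION: margin of 1.46, taken from the original trial design.
margin = 1.46
z_ni = (math.log(hr) - math.log(margin)) / se
p_noninferiority = normal_cdf(z_ni)

print(f"crude stroke rate ratio: {stroke_ratio:.3f}")
print(f"crude bleeding rate ratio: {bleed_ratio:.3f}")
print(f"approximate p (superiority): {p_superiority:.2f}")
print(f"approximate p (noninferiority): {p_noninferiority:.3f}")
print(f"upper CI bound below margin: {ci_high < margin}")
```

Under those assumptions the crude rate ratios sit close to the reported hazard ratios, and the back-calculated p values are roughly in line with the published ones. They will not match exactly, because the published inputs are rounded and the investigators' model is not reproduced here.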

The investigators concluded that their results “are consistent with the originally reported overall trial results,” though they “acknowledge the limitations of these analyses.” In his interview Patel said that the trial investigators have no current plans to perform additional studies to address this issue.

  1. David Grainger says

    Not sure this analysis really moves the debate forward.

    It's presented as “evidence” that any inaccuracy in INR did not affect the conclusion of the trial, but the methodology completely lacks power.

    For a start, the real question that needs to be addressed is whether there is evidence the two groups of INRs from the different methods are essentially the same, not whether there is evidence they are different. The lack of evidence for a difference (what is in this new paper) is a long way from being the same as evidence for no difference.

    This is more important because so few of the observations from ROCKET-AF (6%) were available for re-analysis. So the power to detect a difference is negligible.

    The correct methodology to address this question was a Bland-Altman test. It would be interesting to see a Bland-Altman analysis of the same data. Did the team not do this because they are ignorant of the correct methodology, or because they preferred the answer from doing this powerless analysis?

    The other approach in the paper is also flawed. Picking those individuals where there was a significant difference in INR between the methods and asking if there was any difference in outcomes is also hopelessly lacking in power. Even if data had been available for 100% of the ROCKET-AF patients, it would STILL have been flawed by a lack of power: because only a proportion of the individuals crossed the threshold to show a difference between the methods, and because the original study needed all the observations to be powered to detect the effect size seen (the original study p value on the primary end-point was not very far below 0.05), then any subset of the data is too small to allow significant detection of a different effect size.

    All in all, this paper seems more like an attempt to apply a veneer of statistical respectability over the situation, and thereby bolster a conclusion which is likely correct but no longer absolutely robust, rather than a material contribution to the quest for true knowledge of the superiority of DOACs vs warfarin.
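
For readers unfamiliar with the Bland-Altman analysis mentioned in the comment above, a minimal sketch follows. It computes the mean difference (bias) between paired measurements and the 95% limits of agreement, taken as the bias plus or minus 1.96 standard deviations of the differences. The INR values below are invented purely for illustration and are not trial data, and the function name is hypothetical.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias   = mean of the pairwise differences (method_a - method_b)
    limits = bias +/- 1.96 * sample standard deviation of the differences
    """
    if len(method_a) != len(method_b):
        raise ValueError("measurements must be paired")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# HYPOTHETICAL paired INR readings, not data from ROCKET AF.
poc_inr = [1.8, 2.4, 2.7, 3.2]   # point-of-care device
lab_inr = [2.0, 2.5, 3.0, 3.5]   # laboratory measurement of stored sample

bias, lower, upper = bland_altman(poc_inr, lab_inr)
print(f"bias: {bias:.3f}")
print(f"95% limits of agreement: {lower:.3f} to {upper:.3f}")
```

Whether two methods agree well enough is then a clinical judgment about whether limits of that width are acceptable, rather than a question of whether a significance test detects a difference.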
