First Case of FDA Using AI/ML to Identify Patients for a Life-Saving Drug with a Missing Diagnostic
In the high-pressure environment of the COVID-19 pandemic, the interleukin-1 receptor antagonist anakinra (Kineret) represented a paradox. While it held the potential to save lives, it was a drug that had struggled to prove its worth; earlier trials had failed to show consistent efficacy because they lacked a way to target the right patients.
The breakthrough finally came with the SAVEMORE trial. By using a specific biomarker—soluble urokinase plasminogen activator receptor (suPAR)—researchers were able to identify a high-risk subpopulation that responded dramatically to the treatment. The data were striking: patients treated with anakinra saw a 63% reduction in the odds of worse clinical outcomes (based on an odds ratio of 0.37). To achieve these results, clinicians needed to identify patients with a baseline suPAR level ≥ 6 ng/mL; however, the suPAR assay was commercially unavailable in the United States.
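The headline figure follows directly from the reported odds ratio; a one-line arithmetic check:

```python
# Reported odds ratio for worse clinical outcomes (anakinra vs. comparator)
odds_ratio = 0.37

# Percent reduction in the odds of worse outcomes
reduction = 1 - odds_ratio
print(f"{reduction:.0%} reduction in odds")  # prints "63% reduction in odds"
```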
Traditionally, a drug’s Emergency Use Authorization (EUA) depends on the availability of a physical diagnostic test. In this case, the FDA faced a diagnostic gap. To bridge access to this life-saving drug for the patients who could benefit from it (i.e., patients who best reflect the clinical trial population), the FDA turned to an alternative: a "digital diagnostic".
A Credibility Assessment Using the FDA Risk-Based Framework
Let us attempt a credibility assessment of this algorithm-based patient-selection scoring rule, following the risk-based credibility assessment framework the FDA released in its 2025 draft guidance.
Define the Question of Interest (QoI)
The clinical challenge centered on the successful implementation of the enrichment strategy used in the SAVEMORE trial. While studies of other immunomodulators, such as the IL-6 receptor antagonists tocilizumab and sarilumab, yielded inconsistent results in unselected COVID-19 populations, SAVEMORE demonstrated that anakinra significantly reduced the risk of clinical progression. This benefit, however, was strictly observed in patients with baseline suPAR levels ≥ 6 ng/mL.
Given the unavailability of a cleared suPAR assay in the U.S., the Question of Interest was defined as:
"Which clinical characteristics and laboratory tests can reliably identify patients with baseline suPAR levels ≥ 6 ng/mL?"
The objective was to develop a robust predictive tool using common clinical variables to identify the population most likely to benefit, thereby resolving the diagnostic bottleneck that would otherwise prevent timely access to life-saving therapy.
Define the Context of Use (COU)
The Context of Use (COU) defines the specific role and scope of the AI/ML model within this regulatory framework:
Model Role: The model serves as a clinical stratification tool to identify hospitalized adults with COVID-19 pneumonia who are at high risk of progressing to severe respiratory failure (SRF).
Model Scope: The scoring rule is the alternative patient identification method described in Section 1.1 of the EUA Fact Sheet for Healthcare Providers. It identifies the population authorized for treatment with anakinra (100 mg daily for 10 days).
Human-AI Team Consideration: Consistent with the FDA’s 2025 Draft Guidance on AI, the model does not act as a standalone gatekeeper. While the scoring rule identifies the eligible population, the final clinical decision to initiate therapy remains with the healthcare provider, who must evaluate the patient within the broader clinical context of the EUA.
Determining Model Risk
Applying the Agency's Risk-Based Credibility Assessment Framework, the model risk is classified as High based on two independent factors:
Decision Consequence: High. The decision concerns the treatment of a life-threatening disease. An incorrect decision (e.g., a false negative) could deny a patient a potentially life-saving intervention, leading to progression to SRF or death.
Model Influence: High. Because no cleared suPAR assay exists in the U.S., the AI-driven scoring rule is the primary determinant for treatment eligibility within the authorized population.
Overall Risk Finding: With both high decision consequence and high model influence, the overall risk is High. This classification necessitates the most stringent level of credibility evidence, including independent external validation and high performance standards for specificity and positive predictive value.
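The two-factor logic above can be summarized as a simple lookup. This is an illustrative simplification of the draft guidance's risk matrix, not the Agency's exact tiering:

```python
def model_risk(decision_consequence: str, model_influence: str) -> str:
    """Illustrative two-factor risk tiering inspired by FDA's 2025 draft
    guidance: risk rises with both decision consequence and model influence.
    The numeric mapping is a sketch, not the guidance's own scale."""
    levels = {"low": 1, "medium": 2, "high": 3}
    combined = levels[decision_consequence] * levels[model_influence]
    if combined >= 6:
        return "High"
    if combined >= 3:
        return "Medium"
    return "Low"

# The anakinra scoring rule: high consequence, high influence
print(model_risk("high", "high"))  # prints "High"
```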
Model Description & Development Process
The FDA review team independently employed two AI/ML algorithms to predict suPAR levels ≥ 6 ng/mL based on 30 baseline variables:
Elastic Net Regression: Utilized for feature selection by exploring model penalties to determine the hierarchy of feature importance.
Artificial Neural Network (ANN): This model utilized the Gumbel-softmax technique, which allows for an end-to-end training strategy. This enabled the model to simultaneously select the most important features and determine optimal cutoff values for continuous variables.
The loss function was specifically designed to maximize sensitivity while maintaining a Positive Predictive Value (PPV) ≥ 0.95.
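The elastic-net step described above can be sketched with scikit-learn: sweeping the penalty strength traces which coefficients survive shrinkage, giving a rough feature hierarchy. The data here is synthetic placeholder data, not the SAVEMORE variables:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: 500 patients x 30 baseline variables, with a binary
# label for suPAR >= 6 ng/mL driven by the first two variables only.
X = rng.normal(size=(500, 30))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)
X = StandardScaler().fit_transform(X)

# Elastic-net-penalized logistic regression; smaller C means stronger
# shrinkage. The order in which coefficients become nonzero along the
# penalty path suggests a hierarchy of feature importance.
for C in (0.01, 0.1, 1.0):
    model = LogisticRegression(
        penalty="elasticnet", l1_ratio=0.5, C=C, solver="saga", max_iter=5000
    ).fit(X, y)
    selected = np.flatnonzero(model.coef_[0])
    print(f"C={C}: {len(selected)} features retained")
```

With this construction, the truly informative variables tend to be the last to be shrunk away as the penalty tightens.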
The Eight Scoring Criteria
Both independent models converged on the same eight criteria. A patient is "Score Positive" (likely to have suPAR ≥ 6 ng/mL) if they meet at least 3 of the following:
Age ≥ 75 years
Severe pneumonia (by WHO criteria*)
Current or previous smoking status
Sequential Organ Failure Assessment (SOFA) score ≥ 3
Neutrophil-to-lymphocyte ratio (NLR) ≥ 7
Hemoglobin ≤ 10.5 g/dL
Medical history of ischemic stroke
Blood urea ≥ 50 mg/dL and/or medical history of renal disease
*Note: Severe pneumonia is defined according to the 11-point WHO Clinical Progression Ordinal Scale (WHO-CPS).
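The eight criteria translate directly into a checklist function; a minimal sketch, with field names of my own invention rather than from the EUA Fact Sheet:

```python
def score_positive(patient: dict) -> bool:
    """Return True if the patient meets >= 3 of the 8 criteria, i.e. is
    predicted to have baseline suPAR >= 6 ng/mL. Field names are illustrative."""
    criteria = [
        patient["age"] >= 75,
        patient["severe_pneumonia_who"],        # per WHO-CPS definition
        patient["current_or_prior_smoker"],
        patient["sofa_score"] >= 3,
        patient["nlr"] >= 7,                    # neutrophil-to-lymphocyte ratio
        patient["hemoglobin_g_dl"] <= 10.5,
        patient["history_ischemic_stroke"],
        patient["urea_mg_dl"] >= 50 or patient["history_renal_disease"],
    ]
    return sum(criteria) >= 3

example = {
    "age": 68, "severe_pneumonia_who": True, "current_or_prior_smoker": True,
    "sofa_score": 4, "nlr": 5.2, "hemoglobin_g_dl": 12.1,
    "history_ischemic_stroke": False, "urea_mg_dl": 38,
    "history_renal_disease": False,
}
# Meets 3 criteria (severe pneumonia, smoking history, SOFA >= 3)
print(score_positive(example))  # prints "True"
```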
Data Sourcing: Training vs. Testing
To ensure robustness, the model was trained and tested on independent datasets with no overlapping patients. The inclusion of "screened-out" patients makes the data more representative of real-world clinical environments.
Model Performance Evaluation
Performance metrics were prioritized to ensure that the identified population strictly aligned with the high-benefit trial population. The Agency prioritized high PPV (0.94–0.95) and Specificity over Sensitivity. While this means the rule may miss some potentially eligible patients (low sensitivity), this trade-off was a deliberate regulatory choice to maximize the probability of benefit in the treated population.
Exploratory efficacy results confirmed that score-positive patients treated with anakinra had lower odds of more severe disease (Odds Ratio: 0.37–0.39) at Day 28.
While mortality results showed numerical improvement, there remains "considerable uncertainty" (wide confidence intervals) regarding mortality due to small subgroup sizes and low event counts.
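The trade-off described above is easiest to see in the standard confusion-matrix definitions. The counts below are made up for illustration, not the trial's actual numbers:

```python
def ppv_specificity_sensitivity(tp: int, fp: int, tn: int, fn: int):
    """Standard confusion-matrix metrics for a binary scoring rule."""
    ppv = tp / (tp + fp)          # of score-positive patients, fraction truly suPAR >= 6
    specificity = tn / (tn + fp)  # fraction of suPAR < 6 patients correctly excluded
    sensitivity = tp / (tp + fn)  # fraction of suPAR >= 6 patients captured
    return ppv, specificity, sensitivity

# Illustrative counts: a rule tuned for high PPV and specificity
# typically leaves many false negatives, i.e. low sensitivity.
ppv, spec, sens = ppv_specificity_sensitivity(tp=95, fp=5, tn=120, fn=80)
print(round(ppv, 2), round(spec, 2), round(sens, 2))  # prints "0.95 0.96 0.54"
```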
Model Life Cycle Management & Mitigating Drift
In accordance with Step 4.b of the 2025 Draft Guidance, credibility must be maintained throughout the model life cycle through the following requirements:
Monitoring and Data Drift: Model performance must be monitored on an ongoing basis to detect "data drift"—where performance degrades due to differences between deployment environments and training data.
Uncertainty Quantification: Sponsors should utilize repeatability and reproducibility studies to quantify the uncertainty associated with model outputs, ensuring they remain fit for use.
Change Management within PQS: Any intentional or model-directed changes must be managed within a Pharmaceutical Quality System (PQS). Changes impacting model performance are considered significant and must be reported to the Agency in accordance with regulatory requirements (e.g., 21 CFR 314.70 or 601.12).
Credibility Evidence Re-execution: If performance metrics shift significantly, steps in the Credibility Assessment Plan—including retraining and retesting—must be re-executed.
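For a continuously deployed model, the drift monitoring described in Step 4.b could be approximated with a statistic such as the population stability index (PSI). This is a minimal sketch of one common approach, not something the guidance prescribes:

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population stability index between a training-era distribution
    (`expected`) and a deployment-era distribution (`actual`).
    Common rule of thumb: PSI > 0.2 signals meaningful drift."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip to avoid log(0) when a bin is empty in one distribution
    e_frac = np.clip(e_frac, 1e-6, None)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(1)
baseline = rng.normal(7, 2, size=5000)   # e.g. training-era NLR values (synthetic)
shifted = rng.normal(9, 2, size=5000)    # deployment-era values, shifted upward

print(psi(baseline, baseline[:2500]) < 0.2)  # same distribution: low PSI
print(psi(baseline, shifted) > 0.2)          # shifted distribution: high PSI
```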
In this case, the AI/ML model was used during the review process to identify 8 static patient characteristics, resulting in a fixed, easy-to-implement scoring rule for doctors to use. Because the AI itself is not deployed in hospitals to predict individual cases on an ongoing basis, it is not considered "self-evolving" and is not directly exposed to real-time variations in daily data inputs.
Therefore, it does not require the intensive, continuous software life cycle maintenance (such as constant retraining or algorithmic adjustment) that the FDA requires for continuously deployed models used in areas like pharmaceutical manufacturing or postmarketing pharmacovigilance.
However, performance monitoring is still required, and real-world applicability must be evaluated.
For the anakinra scoring rule specifically, the FDA acknowledged that the AI model's derivation was exploratory and resulted in a low sensitivity—meaning it might wrongly exclude some patients who could benefit from the drug.
To address this uncertainty and monitor the rule's real-world validity, the FDA included a provision in the Letter of Authorization requiring an additional study to further evaluate the predictive performance of the scoring rule in hospitalized COVID-19 patients.
Why the Model Credibility Evidence Is Commensurate with the Model Risk
In the anakinra missing-diagnostic case, the complex machine-learning output was converted into a transparent eight-point scoring checklist designed to identify patients who meet the suPAR ≥ 6 ng/mL threshold.
The model offers high transparency and interpretability, so both regulators and clinicians can understand the rationale behind patient selection.
Commensurate with its high-risk context of use, the FDA used two independent modeling approaches that converged on the same eight-point checklist; rigorous validation on independent test data was performed, and the relevant metrics, such as PPV and specificity, were high.
Furthermore, the scoring rule is effectively integrated into the standard clinical flow, maintaining the human element in clinical decision-making while leveraging the power of an algorithm.
CONCLUSION
"AI/ML can be powerful tools for drug development and precision medicine, due to their capability to identify patterns in complicated high-dimensional data."
In a decade, will we still be waiting for hours for specialized lab results, or will our clinical eligibility be determined by a validated algorithm the moment we enter the ER?
SOURCE:
https://www.fda.gov/drugs/spotlight-cder-science/using-machine-learning-identify-suitable-patient-population-anakinra-treatment-covid-19-under accessed Feb 26, 2026
https://www.fda.gov/media/163075/download?attachment, accessed Feb 26, 2026
Liu Q, Nair R, Huang R, Zhu H, Anderson A, Belen O, Tran V, Chiu R, Higgins K, Chen J, He L, Doddapaneni S, Huang SM, Nikolov NP, & Zineh I (2024). Using Machine Learning to Determine a Suitable Patient Population for Anakinra for the Treatment of COVID-19 Under the Emergency Use Authorization. Clin Pharmacol Ther, 115(4): 890–895. https://doi.org/10.1002/cpt.3191