Evidence-based medicine tools

INTERACTIVE Tools for EBM

Evidence-based medicine can feel abstract or intimidating when reduced to formulas alone. This page offers quick, reliable calculators for core evidence-based medicine metrics, including sensitivity, specificity, predictive values, likelihood ratios, number needed to treat/harm, and post-test probability. By making these relationships visible and interactive, the tools are intended to support both learning and clinical reasoning—helping users move from study results to meaningful bedside interpretation.

The goal is not just faster calculation, but deeper understanding and greater confidence using evidence in practice.

2x2 Contingency table & Fagan Nomogram

  • A contingency table is a grid that compares test results (positive/negative) against the true disease status (present/absent).
    It organizes outcomes into true positives, false positives, false negatives, and true negatives, allowing calculation of sensitivity, specificity, predictive values, and likelihood ratios.

    Also known as:

    • Diagnostic 2×2 Table

    • Diagnostic Test Performance Table

    • Confusion Matrix

    Key measures:

    • Sensitivity: How often the test correctly identifies people with the condition
      (True positive rate)

    • Specificity: How often the test correctly identifies people without the condition
      (True negative rate)

    • Positive Likelihood Ratio (LR+): How much more likely a positive test is in someone with the condition than without it
      (Higher values strengthen “rule-in” confidence)

    • Negative Likelihood Ratio (LR−): How much less likely a negative test is in someone with the condition than without it
      (Lower values strengthen “rule-out” confidence)

    • Pre-test probability (prevalence): Baseline probability of disease before testing

    • Post-test probability: Probability of disease after accounting for test results
      (Derived using likelihood ratios)
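As a minimal sketch of how these measures follow from the four cells of the table, the function below computes sensitivity, specificity, and both likelihood ratios. The function name and the example counts are hypothetical and chosen only for illustration.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute core diagnostic accuracy measures from 2x2 cell counts.

    Assumes specificity is neither 0 nor 1; otherwise a likelihood
    ratio would require division by zero.
    """
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (fp + tn)   # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical counts: 100 people with the condition, 200 without.
metrics = diagnostic_metrics(tp=90, fp=20, fn=10, tn=180)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these counts, sensitivity and specificity are both 0.90, LR+ is about 9, and LR− is about 0.11.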

  • Using PICO to search for evidence

    PICO can also be used in reverse—to formulate a focused clinical question that guides literature searching. Clearly defining each PICO element improves search precision, reduces irrelevant results, and helps identify studies that directly address a clinical uncertainty.
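One way to picture this is to treat each PICO element as a group of search terms, joining synonyms with OR and joining the elements with AND. The sketch below is only an illustration: the class name, method name, and search terms are hypothetical, and real databases have their own syntax and field tags.

```python
from dataclasses import dataclass, field


@dataclass
class PicoQuestion:
    """Hypothetical container for the search terms of one clinical question."""
    population: list[str]
    intervention: list[str]
    comparison: list[str] = field(default_factory=list)
    outcome: list[str] = field(default_factory=list)

    def to_boolean_query(self) -> str:
        # Synonyms within one element are ORed; elements are ANDed.
        groups = []
        for terms in (self.population, self.intervention,
                      self.comparison, self.outcome):
            if terms:  # skip elements with no terms (e.g. no comparator)
                joined = " OR ".join(f'"{term}"' for term in terms)
                groups.append(f"({joined})")
        return " AND ".join(groups)


question = PicoQuestion(
    population=["elevated hs-CRP"],
    intervention=["rosuvastatin", "statin"],
    comparison=["placebo"],
    outcome=["cardiovascular events"],
)
query = question.to_boolean_query()
print(query)
```

Dropping an element (for example, leaving the comparison empty) broadens the search, while adding synonyms inside an element widens that element without loosening the others.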

  • Population / Problem (P)

    What it is:
    The specific group of patients or clinical condition being studied.

    Why it matters:
    Determines whether the study applies to your patient or population.

    Where to find it in an article:

    • Title

    • Abstract

    • Methods → Participants / Study Population

    • Inclusion / Exclusion Criteria

    Common keywords/phrases:

    • “patients with…”

    • “adults aged…”

    • “diagnosed with…”

    • “inclusion criteria”

    • “baseline characteristics”

  • Intervention (I)

    What it is:
    The treatment, test, exposure, or action being evaluated.

    Why it matters:
    Defines what is being done differently from usual care or placebo.

    Where to find it in an article:

    • Abstract

    • Methods → Intervention / Treatment Protocol

    • Study Design section

    Common keywords/phrases:

    • “treated with…”

    • “received…”

    • “intervention group”

    • “administered”

    • “exposure”

  • A Fagan nomogram is a visual tool that uses Bayes’ theorem to convert a pre-test probability and a likelihood ratio into a post-test probability.
    It shows how diagnostic test results change the estimated probability of disease, helping clinicians translate test performance into clinical decision-making.
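The arithmetic behind the nomogram can be sketched in three steps: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back to a probability. The function name and example values below are assumptions made for illustration.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply the odds form of Bayes' theorem.

    Assumes 0 <= pre_test_probability < 1.
    """
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Hypothetical example: 20% pre-test probability, positive test with LR+ = 9.
after_positive = post_test_probability(0.20, 9)
print(f"Post-test probability: {after_positive:.1%}")
```

Here the pre-test odds are 0.25, the post-test odds are 2.25, and the post-test probability is roughly 69%.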

    • Absolute Risk Reduction (ARR): Difference in outcome risk between treatment and control groups

    • Absolute Risk Increase (ARI): Increase in outcome risk caused by an intervention

    • Number Needed to Treat (NNT): Number of patients who need treatment to prevent one adverse outcome (1 / ARR)

    • Number Needed to Harm (NNH): Number of patients who need treatment for one to experience harm (1 / ARI)
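A small sketch of how these four measures relate: the sign of the risk difference decides whether the result is read as ARR/NNT or ARI/NNH. The function name, the returned keys, and the example risks are all hypothetical.

```python
def risk_difference_summary(control_risk, treated_risk):
    """Summarize an absolute risk difference as ARR/NNT or ARI/NNH."""
    difference = control_risk - treated_risk
    if difference == 0:
        # No difference in risk: NNT and NNH are undefined.
        return {"measure": "none", "absolute_difference": 0.0,
                "number_needed": None}
    if difference > 0:
        # Fewer events with treatment: benefit.
        return {"measure": "ARR/NNT", "absolute_difference": difference,
                "number_needed": 1 / difference}
    # More events with treatment: harm.
    return {"measure": "ARI/NNH", "absolute_difference": -difference,
            "number_needed": 1 / -difference}


# Hypothetical benefit: outcome risk falls from 10% to 8% (NNT about 50).
benefit = risk_difference_summary(control_risk=0.10, treated_risk=0.08)
# Hypothetical harm: side-effect risk rises from 2% to 3% (NNH about 100).
harm = risk_difference_summary(control_risk=0.02, treated_risk=0.03)
print(benefit["measure"], round(benefit["number_needed"]))
print(harm["measure"], round(harm["number_needed"]))
```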

  • Accuracy

    • Accuracy: Overall proportion of correct test results, (true positives + true negatives) / all results
      (Can be misleading when prevalence is very high or very low)
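To see why accuracy can mislead at extreme prevalence, consider a deliberately useless "test" that calls everyone negative. The counts below are hypothetical and assume 1% prevalence in 1,000 people.

```python
def accuracy(tp, fp, fn, tn):
    """Proportion of all results that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


# 10 people have the condition, 990 do not; the test is negative for all.
tp, fp, fn, tn = 0, 0, 10, 990
overall_accuracy = accuracy(tp, fp, fn, tn)
sensitivity = tp / (tp + fn)

print(f"Accuracy: {overall_accuracy:.0%}")     # 99%
print(f"Sensitivity: {sensitivity:.0%}")       # 0%
```

The test is "99% accurate" yet detects no one with the condition, which is why sensitivity, specificity, and likelihood ratios are usually more informative.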

  • Let’s use the JUPITER statin article to practice using the contingency table and nomogram tools.

    STEP 1 — Define the clinical question

    Population: Apparently healthy adults with normal LDL but elevated hs-CRP
    Intervention: Rosuvastatin 20 mg daily
    Comparison: Placebo
    Outcome: Major cardiovascular events

    STEP 2 — Extract absolute event counts (Results section)

    From the NEJM paper:

    • Rosuvastatin group: 142 events / 8901

    • Placebo group: 251 events / 8901

    These raw counts matter more than hazard ratios for this exercise.

    STEP 3 — Populate the contingency table

    Treat statin use as the “test” and cardiovascular event as the outcome.

    A: Statin Event (+) n=142

    B: Statin Event (-) n=8759

    C: Placebo Event (+) n=251

    D: Placebo Event (-) n=8650

    This allows calculation of:

    • Absolute risk reduction

    • Relative risk reduction

    • Number needed to treat (NNT)

    STEP 4 — Output

    From these inputs, your calculator will show:

    • ARR ≈ 1.2% over ~2 years

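    • NNT ≈ 82 over ~2 years from these raw counts (1 / 0.0122); the commonly cited 2-year NNT of 95 reflects the trial's time-to-event estimates rather than raw proportions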

    • Much larger relative reductions than absolute reductions
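The calculation in Steps 2 to 4 can be reproduced from the raw event counts listed above. This is a sketch using simple proportions; it ignores follow-up time, so its results can differ from time-to-event estimates reported in a paper. Variable names are illustrative.

```python
# Event counts from Step 2.
statin_events, statin_total = 142, 8901
placebo_events, placebo_total = 251, 8901

statin_risk = statin_events / statin_total      # about 1.6%
placebo_risk = placebo_events / placebo_total   # about 2.8%

arr = placebo_risk - statin_risk   # absolute risk reduction
rrr = arr / placebo_risk           # relative risk reduction
nnt = 1 / arr                      # number needed to treat

print(f"ARR: {arr:.2%}")   # about 1.2%
print(f"RRR: {rrr:.1%}")   # about 43%
print(f"NNT: {nnt:.0f}")   # about 82
```

The relative reduction (about 43%) is far larger than the absolute reduction (about 1.2 percentage points), which is the contrast this exercise is meant to highlight.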

    STEP 5 — Use the Fagan nomogram conceptually

    Although JUPITER is a treatment trial (not a diagnostic test), the nomogram analogy helps interpretation:

    • Pre-test probability: Low baseline event risk (healthy population)

    • Likelihood of benefit: Statin modestly shifts probability

    • Post-test probability: Still low absolute event risk

    Visually, even a strong “line” does not cross into large absolute benefit territory.

    INTERPRETATION:

    1. Relative risk reduction exaggerates perceived benefit

    • 44% relative risk reduction sounds dramatic

    • Absolute risk reduction was ~0.6% per year

    2. NNT is large in low-risk populations

    • ~169 treated for 1 year to prevent one composite event

    • ~500 treated for 1 year to prevent one MI

    • This is easy to miss without a 2×2 table

    3. Early trial termination may overestimate benefit

    • Trial stopped at 1.9 years instead of planned 4

    • Long-term harms and durability of benefit remain uncertain

    4. Harms matter when baseline risk is low

    • Increased incidence of diabetes observed

    • When NNT is high, even small harms can shift net benefit

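EBM Tools

Inputs (2×2 table cells):

  • A: True Positive (TP)

  • B: False Positive (FP)

  • C: False Negative (FN)

  • D: True Negative (TN)

  • Total A + C (everyone with the condition)

  • Total B + D (everyone without the condition)

Outputs:

  • Sensitivity = TP / (TP + FN)

  • Specificity = TN / (FP + TN)

  • Positive Predictive Value = TP / (TP + FP)

  • Negative Predictive Value = TN / (FN + TN)

  • Positive likelihood ratio (+LR)

  • Negative likelihood ratio (−LR)

Fagan Nomogram input: Pre-Test Probability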
For educational use. Not a substitute for clinical judgment.
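Predictive values computed from raw cell counts inherit the prevalence of the study sample. The sketch below, with a hypothetical function name and hypothetical test characteristics, recomputes them for other prevalences from sensitivity and specificity alone.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a given prevalence.

    Works with expected proportions of a population rather than counts.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same hypothetical test (90% sensitive, 90% specific) in two settings.
ppv_low, npv_low = predictive_values(0.90, 0.90, prevalence=0.01)
ppv_high, npv_high = predictive_values(0.90, 0.90, prevalence=0.50)
print(f"Prevalence 1%:  PPV {ppv_low:.1%}, NPV {npv_low:.1%}")
print(f"Prevalence 50%: PPV {ppv_high:.1%}, NPV {npv_high:.1%}")
```

At 1% prevalence the PPV is only about 8%, while at 50% prevalence it is 90%, even though the test itself has not changed.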

PICO Method

PICO can be used both to deconstruct a study and to construct a focused clinical question. When reading, it helps identify the key elements of a paper. When searching, it helps define the terms needed to find relevant evidence efficiently.

  • Using PICO to read a journal article

    The PICO framework provides a structured way to extract what matters most from a journal article. By identifying the patient population, intervention, comparison, and outcomes, readers can quickly determine whether a study is relevant, how it was designed, and whether its findings apply to their clinical context.

  • Comparison (C)

    What it is:
    The alternative to the intervention (placebo, standard care, another treatment, or no intervention).

    Why it matters:
    Provides the reference point needed to interpret effect size and clinical relevance.

    Where to find it in an article:

    • Abstract

    • Methods → Control / Comparator

    • Randomization section

    Common keywords/phrases:

    • “compared with…”

    • “control group”

    • “placebo”

    • “standard of care”

    • “usual care”

    Note: Some studies (especially observational ones) may not have a formal comparator.

  • Outcome (O)

    What it is:
    The clinical result(s) measured to determine effectiveness or harm.

    Why it matters:
    Ensures the study measures outcomes that are meaningful for patients or decision-making.

    Where to find it in an article:

    • Abstract

    • Methods → Outcomes

    • Results

    • Tables and Figures

    Common keywords/phrases:

    • “primary outcome”

    • “secondary outcome”

    • “measured by…”

    • “incidence of…”

    • “mortality, symptom score, risk reduction”

Critical Appraisal - Diagnostic Studies

  • Big picture: what this worksheet is asking

    This worksheet helps you answer two core questions:

    1. Are the study methods valid?

    2. If valid, are the results useful for clinical decision-making?

    You are not being asked to judge whether the test is “good” in absolute terms — only whether the study design allows the results to be trusted and interpreted.

  • This worksheet helps you decide whether a diagnostic accuracy study is trustworthy and clinically meaningful. The first section focuses on study validity—whether the right patients were studied, whether a proper reference standard was used, and whether bias was minimized. The second section focuses on interpreting the results using 2×2 tables, sensitivity, specificity, and likelihood ratios to understand how test results change disease probability.

  • General rule

    👉 Almost all answers come from the Methods
    If you can’t find it in Methods → often “Unclear” is the correct answer.

    R — Representative spectrum of patients

    (Page 1 of worksheet)

    What this is really asking

    “Do the patients in the study look like the patients you’d actually use this test on?”

    Where to look in the article

    • Methods → Participants / Study population

    • Inclusion / exclusion criteria

    • Setting (ED, outpatient, ICU, screening population)

    KEYWORDS:

    • “consecutive patients”

    • “random sample”

    • “suspected [condition]”

    • “referred for evaluation”

    • “inclusion criteria”

    • “exclusion criteria”

    How to decide

    • Yes → broad, real-world population; consecutive or random enrollment

    • No → highly selected, extreme cases only, obvious disease vs obvious non-disease

    • Unclear → patient selection not described

    📌 Tip: If the study only includes very sick patients or clear positives, sensitivity/specificity may be inflated.

    A — Reference standard applied regardless of index test

    (Page 1 of worksheet)

    What this is really asking

    “Did everyone get the gold standard diagnosis — or only people with certain test results?”

    Where to look

    • Methods → Reference standard

    • Flow diagram (if present)

    Keywords

    • “all patients underwent…”

    • “reference standard performed on…”

    • “follow-up used as reference”

    • “verification bias”

    How to decide

    • Yes → everyone had the same reference standard (or adequate follow-up)

    • No → only test-positive patients got the reference standard

    • Unclear → not stated

    📌 Why this matters: If negatives aren’t verified, false negatives can be missed.
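A worked example with made-up numbers shows the effect. Suppose the true table has 80 TP, 20 FN, 90 FP, and 810 TN, but only 10% of test-negative patients receive the reference standard while every test-positive patient does. All values here are hypothetical.

```python
def sens_spec(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from 2x2 cell counts."""
    return tp / (tp + fn), tn / (fp + tn)


# Everyone verified: the "true" performance of the test.
full = {"tp": 80, "fp": 90, "fn": 20, "tn": 810}

# Partial verification: all test-positives verified, but only a
# fraction of test-negatives (assumed 10% here).
verified_fraction_of_negatives = 0.10
partial = {
    "tp": full["tp"],
    "fp": full["fp"],
    "fn": full["fn"] * verified_fraction_of_negatives,
    "tn": full["tn"] * verified_fraction_of_negatives,
}

true_sens, true_spec = sens_spec(**full)
seen_sens, seen_spec = sens_spec(**partial)
print(f"Full verification:    sens {true_sens:.0%}, spec {true_spec:.0%}")
print(f"Partial verification: sens {seen_sens:.0%}, spec {seen_spec:.0%}")
```

Under these assumptions the apparent sensitivity rises from 80% to about 98% because most false negatives are never detected, while the apparent specificity falls from 90% to about 47%.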

    Mbo — Masked (blinded) comparison

    (Page 1 of worksheet)

    What this is really asking

    “Were the test results interpreted independently?”

    Where to look

    • Methods → Blinding

    • Radiology / pathology interpretation description

    Keywords

    • “blinded”

    • “masked”

    • “interpreted independently”

    • “read without knowledge of…”

    How to decide

    • Yes → index test and reference standard read without knowledge of each other

    • No → interpreters knew the other result

    • Unclear → blinding not mentioned

    📌 Tip: If blinding isn’t stated, “Unclear” is acceptable.

    Replication question (Page 3)

    “Are methods described in enough detail to repeat the test?”

    Look for

    • Test protocol details

    • Cutoffs / thresholds

    • Timing of tests

    If the test is vaguely described → No or Unclear

  • Are test characteristics presented?

    (Page 2 of worksheet)

    What this is asking

    “Can I build or see a 2×2 table?”

    LOOK FOR:

    • Sensitivity / specificity

    • Likelihood ratios

    • Predictive values

    • Raw numbers (TP, FP, FN, TN)

    If raw counts aren’t given but can be derived → that still counts.
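When a paper reports only sensitivity, specificity, prevalence, and sample size, the four cells can usually be back-calculated. The following sketch uses a hypothetical function name and hypothetical study values; rounding means the result may differ by a patient or two from the authors' actual table.

```python
def reconstruct_two_by_two(sensitivity, specificity, prevalence, total):
    """Back-calculate approximate 2x2 cell counts from summary statistics."""
    with_condition = prevalence * total
    without_condition = total - with_condition
    tp = sensitivity * with_condition
    fn = with_condition - tp
    tn = specificity * without_condition
    fp = without_condition - tn
    # Round to whole patients; reported percentages are themselves rounded.
    return {"tp": round(tp), "fp": round(fp), "fn": round(fn), "tn": round(tn)}


# Hypothetical study: 500 patients, 20% prevalence,
# 85% sensitivity, 90% specificity.
cells = reconstruct_two_by_two(0.85, 0.90, 0.20, 500)
print(cells)
```

For this example the table comes out as 85 true positives, 40 false positives, 15 false negatives, and 360 true negatives.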

    How to approach the 2×2 table (don’t panic)

    The worksheet walks you through a worked example using dementia prevalence and explicitly shows how to complete the table step by step.

    KEY MINDSET

    You are not expected to memorize formulas.
    You are expected to:

    • understand what the numbers represent

    • know how they relate to clinical decisions

    Interpreting likelihood ratios (clinically)

    From the worksheet language:

    • LR+ > 1 → increases probability of disease (rule-in)

    • LR− < 1 → decreases probability of disease (rule-out)

    Rough intuition:

    • LR+ ≥ 10 → strong rule-in

    • LR− ≤ 0.1 → strong rule-out
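To get a feel for why 10 and 0.1 are treated as "strong", the sketch below applies both thresholds to a few pre-test probabilities. The pre-test values and the helper name are hypothetical.

```python
def shift_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, apply the LR, convert back."""
    odds = pre_test_probability / (1 - pre_test_probability)
    new_odds = odds * likelihood_ratio
    return new_odds / (1 + new_odds)


# Illustrative pre-test probabilities: low, intermediate, high.
shifts = {}
for pre_test in (0.05, 0.20, 0.50):
    shifts[pre_test] = {
        "after_lr_10": shift_probability(pre_test, 10),
        "after_lr_0.1": shift_probability(pre_test, 0.1),
    }
    print(f"Pre-test {pre_test:.0%}: "
          f"LR 10 -> {shifts[pre_test]['after_lr_10']:.0%}, "
          f"LR 0.1 -> {shifts[pre_test]['after_lr_0.1']:.1%}")
```

Starting from a 20% pre-test probability, an LR of 10 raises the estimate to about 71%, and an LR of 0.1 lowers it to about 2%. Note that the same LR produces a much smaller absolute shift when the pre-test probability is already very low.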

    How to answer “Yes / No / Unclear” without stress

    Use this rule:

    • Yes → explicitly stated

    • No → explicitly violated

    • Unclear → not described

    “Unclear” is not a wrong answer — it often reflects poor reporting.

Additional Resources

Here are some additional resources for understanding and evaluating research so it can be applied to evidence-based medicine.

Users' Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice, 3rd ed