Evidence based medicine
tools
INTERACTIVE Tools for EBM
Evidence-based medicine can feel abstract or intimidating when reduced to formulas alone. This page offers quick, reliable calculators for core evidence-based medicine metrics, including sensitivity, specificity, predictive values, likelihood ratios, number needed to treat/harm, and post-test probability. By making these relationships visible and interactive, the tools are intended to support both learning and clinical reasoning—helping users move from study results to meaningful bedside interpretation.
The goal is not just faster calculation, but deeper understanding and greater confidence using evidence in practice.
2x2 Contingency table & Fagan Nomogram
-
A contingency table is a grid that compares test results (positive/negative) against the true disease status (present/absent).
It organizes outcomes into true positives, false positives, false negatives, and true negatives, allowing calculation of sensitivity, specificity, predictive values, and likelihood ratios.
Also known as:
Diagnostic 2×2 Table
Diagnostic Test Performance Table
Confusion Matrix
-
Sensitivity – How often the test correctly identifies people with the condition
(True positive rate)
Specificity – How often the test correctly identifies people without the condition
(True negative rate)
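As a minimal sketch of how these two rates come out of a 2×2 table, the snippet below uses made-up counts (90 true positives, 10 false negatives, 30 false positives, 170 true negatives). The function names are illustrative and do not come from the calculators on this page.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: diseased patients the test correctly flags."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: non-diseased patients the test correctly clears."""
    return tn / (tn + fp)


# Hypothetical counts, chosen only for illustration.
tp, fn, fp, tn = 90, 10, 30, 170

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)

print(f"Sensitivity: {sens:.1%}")
print(f"Specificity: {spec:.1%}")
```

Note that sensitivity uses only the diseased column (TP and FN) and specificity uses only the non-diseased column (TN and FP), which is why neither depends on prevalence.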
-
Positive Likelihood Ratio (LR+) – How much more likely a positive test is in someone with the condition than without it
(Higher values strengthen “rule-in” confidence)
Negative Likelihood Ratio (LR−) – How much less likely a negative test is in someone with the condition than without it
(Lower values strengthen “rule-out” confidence)
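Both ratios can be computed directly from sensitivity and specificity. The sketch below assumes a hypothetical test with 90% sensitivity and 85% specificity; the function name is illustrative.

```python
def likelihood_ratios(sens: float, spec: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    # LR+ = P(test positive | disease) / P(test positive | no disease)
    lr_pos = sens / (1 - spec) if spec < 1 else float("inf")
    # LR- = P(test negative | disease) / P(test negative | no disease)
    lr_neg = (1 - sens) / spec if spec > 0 else float("inf")
    return lr_pos, lr_neg


# Hypothetical test characteristics, for illustration only.
lr_pos, lr_neg = likelihood_ratios(sens=0.90, spec=0.85)

print(f"LR+ = {lr_pos:.2f}")  # about 6
print(f"LR- = {lr_neg:.2f}")  # about 0.12
```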
-
Pre-test probability (prevalence) – Baseline probability of disease before testing
Post-test probability – Probability of disease after accounting for test results
(Derived using likelihood ratios)
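The conversion from pre-test to post-test probability goes through odds. A minimal sketch, assuming a hypothetical 20% pre-test probability and likelihood ratios of 6 and 0.12 (illustrative values, not taken from any study):

```python
def post_test_probability(pre_test_prob: float, lr: float) -> float:
    """Apply a likelihood ratio to a pre-test probability (both as proportions)."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)  # probability -> odds
    post_odds = pre_odds * lr                       # Bayes' theorem in odds form
    return post_odds / (1 + post_odds)              # odds -> probability


pre_test = 0.20  # hypothetical pre-test probability

after_positive = post_test_probability(pre_test, lr=6.0)
after_negative = post_test_probability(pre_test, lr=0.12)

print(f"After a positive test: {after_positive:.1%}")
print(f"After a negative test: {after_negative:.1%}")
```

With these inputs a positive result raises the probability from 20% to 60%, while a negative result lowers it to roughly 3%.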
-
Using PICO to search for evidence
PICO can also be used in reverse—to formulate a focused clinical question that guides literature searching. Clearly defining each PICO element improves search precision, reduces irrelevant results, and helps identify studies that directly address a clinical uncertainty.
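One way to turn a PICO question into a search string is to join synonyms within an element with OR and join the elements with AND. The sketch below is a generic Boolean builder; the function name and the example terms are hypothetical, and real databases add their own field tags and controlled vocabulary on top of this.

```python
def build_pico_query(population, intervention, comparison, outcome):
    """Combine PICO term lists into one Boolean search string.

    Synonyms inside an element are OR-ed; the elements are AND-ed.
    Empty elements (for example, no comparator) are skipped.
    """
    groups = []
    for terms in (population, intervention, comparison, outcome):
        if not terms:
            continue
        # Quote multi-word phrases so they are searched as a unit.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Hypothetical search terms, for illustration only.
query = build_pico_query(
    population=["healthy adults", "primary prevention"],
    intervention=["rosuvastatin"],
    comparison=["placebo"],
    outcome=["cardiovascular events", "myocardial infarction"],
)
print(query)
```

Dropping an element, such as the comparison, broadens the search; adding synonyms inside an element broadens it without losing focus on the question.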
-
Population (P)
What it is:
The specific group of patients or clinical condition being studied.
Why it matters:
Determines whether the study applies to your patient or population.
Where to find it in an article:
Title
Abstract
Methods → Participants / Study Population
Inclusion / Exclusion Criteria
Common keywords/phrases:
“patients with…”
“adults aged…”
“diagnosed with…”
“inclusion criteria”
“baseline characteristics”
-
Intervention (I)
What it is:
The treatment, test, exposure, or action being evaluated.
Why it matters:
Defines what is being done differently from usual care or placebo.
Where to find it in an article:
Abstract
Methods → Intervention / Treatment Protocol
Study Design section
Common keywords/phrases:
“treated with…”
“received…”
“intervention group”
“administered”
“exposure”
-
A Fagan nomogram is a visual tool that uses Bayes’ theorem to convert a pre-test probability and a likelihood ratio into a post-test probability.
It shows how diagnostic test results change the estimated probability of disease, helping clinicians translate test performance into clinical decision-making.
-
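The relationship a Fagan nomogram draws can be written out in a few lines. This is the standard odds form of Bayes' theorem; the symbols $p_{\text{pre}}$, $p_{\text{post}}$, and $\mathrm{LR}$ are notation chosen here, not labels from the tool.

```latex
\begin{align*}
\text{odds}_{\text{pre}}  &= \frac{p_{\text{pre}}}{1 - p_{\text{pre}}} \\
\text{odds}_{\text{post}} &= \text{odds}_{\text{pre}} \times \mathrm{LR} \\
p_{\text{post}}           &= \frac{\text{odds}_{\text{post}}}{1 + \text{odds}_{\text{post}}} \\[4pt]
\log(\text{odds}_{\text{post}}) &= \log(\text{odds}_{\text{pre}}) + \log(\mathrm{LR})
\end{align*}
```

The last line is why a straight edge works on the nomogram: on log-odds scales, applying a likelihood ratio is simple addition, so the three values always fall on one line.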
Absolute Risk Reduction (ARR) – Difference in outcome risk between treatment and control groups
Absolute Risk Increase (ARI) – Increase in outcome risk caused by an intervention
Number Needed to Treat (NNT) – Number of patients who need treatment to prevent one adverse outcome
Number Needed to Harm (NNH) – Number of patients who need treatment for one to experience harm
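These four measures all come from the same subtraction. A minimal sketch with made-up event counts follows; the function names are illustrative, and rounding NNT and NNH up to the next whole patient is a common convention assumed here.

```python
import math


def risk_difference(events_treated, n_treated, events_control, n_control):
    """Control risk minus treated risk.

    Positive: treatment lowers risk (absolute risk reduction, ARR).
    Negative: treatment raises risk (absolute risk increase, ARI).
    """
    return events_control / n_control - events_treated / n_treated


def number_needed(difference: float) -> int:
    """NNT (for an ARR) or NNH (for an ARI): 1 / |risk difference|, rounded up."""
    # round() first so floating-point noise does not push ceil() up by one.
    return math.ceil(round(1 / abs(difference), 9))


# Hypothetical benefit outcome: 10/100 events on treatment vs 20/100 on control.
arr = risk_difference(10, 100, 20, 100)
nnt = number_needed(arr)

# Hypothetical harm outcome: 6/100 events on treatment vs 2/100 on control.
ari = -risk_difference(6, 100, 2, 100)
nnh = number_needed(ari)

print(f"ARR = {arr:.1%}, NNT = {nnt}")
print(f"ARI = {ari:.1%}, NNH = {nnh}")
```

Both NNT and NNH only make sense alongside a time frame, because the underlying risks are measured over a specific follow-up period.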
-
Accuracy
Accuracy – Overall proportion of correct test results: (true positives + true negatives) ÷ all results
(Can be misleading when prevalence is very high or very low)
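To see why accuracy can mislead, consider a hypothetical condition with 1% prevalence and a useless "test" that calls every patient negative. The numbers below are invented for illustration.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Proportion of all results that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


# 1,000 hypothetical patients, 10 with disease; the test is always negative.
tp, fp, fn, tn = 0, 0, 10, 990

acc = accuracy(tp, fp, fn, tn)
sens = tp / (tp + fn)  # proportion of diseased patients detected

print(f"Accuracy:    {acc:.1%}")
print(f"Sensitivity: {sens:.1%}")
```

The test is 99% accurate yet detects no one with the disease, which is why sensitivity, specificity, and likelihood ratios are reported alongside accuracy.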
-
Let’s use the JUPITER statin article to practice using the contingency and nomogram tools.
STEP 1 — Define the clinical question
Population: Apparently healthy adults with normal LDL but elevated hs-CRP
Intervention: Rosuvastatin 20 mg daily
Comparison: Placebo
Outcome: Major cardiovascular events
STEP 2 — Extract absolute event counts (Results section)
From the NEJM paper:
Rosuvastatin group: 142 events / 8901
Placebo group: 251 events / 8901
These raw counts matter more than hazard ratios for this exercise.
STEP 3 — Populate the contingency table
Treat statin use as the “test” and cardiovascular event as the outcome.
A: Statin Event (+) n=142
B: Statin Event (-) n=8759
C: Placebo Event (+) n=251
D: Placebo Event (-) n=8650
This allows calculation of:
Absolute risk reduction
Relative risk reduction
Number needed to treat (NNT)
STEP 4 — Output
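From these inputs, your calculator will show:
ARR ≈ 1.2% over ~2 years
NNT ≈ 82 over ~2 years from the raw counts (the paper reports 95 over 2 years, based on Kaplan-Meier estimates)
Relative risk reduction ≈ 43%, far larger than the absolute reduction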
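The arithmetic behind these outputs can be checked by hand or with a short script. The sketch below uses only the event counts from Step 2; the variable names are illustrative. Crude counts give an NNT near 82, whereas the 2-year NNT of 95 published with the trial was derived from Kaplan-Meier estimates, so the two figures need not match.

```python
# Event counts from Step 2 (primary end point).
statin_events, statin_n = 142, 8901
placebo_events, placebo_n = 251, 8901

risk_statin = statin_events / statin_n
risk_placebo = placebo_events / placebo_n

arr = risk_placebo - risk_statin  # absolute risk reduction
rrr = arr / risk_placebo          # relative risk reduction
nnt = 1 / arr                     # number needed to treat (crude)

print(f"Risk on rosuvastatin: {risk_statin:.2%}")
print(f"Risk on placebo:      {risk_placebo:.2%}")
print(f"ARR: {arr:.2%}")
print(f"RRR: {rrr:.1%}")
print(f"NNT: {nnt:.0f}")
```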
STEP 5 — Use the Fagan nomogram conceptually
Although JUPITER is a treatment trial (not a diagnostic test), the nomogram analogy helps interpretation:
Pre-test probability: Low baseline event risk (healthy population)
Likelihood of benefit: Statin modestly shifts probability
Post-test probability: Still low absolute event risk
Visually, even a strong “line” does not cross into large absolute benefit territory.
INTERPRETATION:
1. Relative risk reduction exaggerates perceived benefit
44% relative risk reduction sounds dramatic
Absolute risk reduction was ~0.6% per year
2. NNT is large in low-risk populations
~169 treated for 1 year to prevent one composite event
~500 treated for 1 year to prevent one MI
This is easy to miss without a 2×2 table
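The per-year figures in point 2 follow from annualized event rates rather than crude counts. A minimal sketch, assuming the rates published in the trial report (0.77 vs 1.36 per 100 person-years for the primary end point, 0.17 vs 0.37 for myocardial infarction); these rates are not quoted on this page and should be checked against the paper before reuse.

```python
def nnt_per_year(rate_treated: float, rate_control: float) -> float:
    """NNT for one year from event rates expressed per 100 person-years."""
    annual_arr = (rate_control - rate_treated) / 100
    return 1 / annual_arr


# Assumed rates per 100 person-years (verify against the trial publication).
nnt_composite = nnt_per_year(rate_treated=0.77, rate_control=1.36)
nnt_mi = nnt_per_year(rate_treated=0.17, rate_control=0.37)

print(f"Composite end point: about {nnt_composite:.0f} treated for 1 year")
print(f"Myocardial infarction: about {nnt_mi:.0f} treated for 1 year")
```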
3. Early trial termination may overestimate benefit
Trial stopped at 1.9 years instead of planned 4
Long-term harms and durability of benefit remain uncertain
4. Harms matter when baseline risk is low
Increased incidence of diabetes observed
When NNT is high, even small harms can shift net benefit
PICO Method
PICO can be used both to deconstruct a study and to construct a focused clinical question. When reading, it helps identify the key elements of a paper. When searching, it helps define the terms needed to find relevant evidence efficiently.
-
Using PICO to read a journal article
The PICO framework provides a structured way to extract what matters most from a journal article. By identifying the patient population, intervention, comparison, and outcomes, readers can quickly determine whether a study is relevant, how it was designed, and whether its findings apply to their clinical context.
-
Comparison (C)
What it is:
The alternative to the intervention (placebo, standard care, another treatment, or no intervention).
Why it matters:
Provides the reference point needed to interpret effect size and clinical relevance.
Where to find it in an article:
Abstract
Methods → Control / Comparator
Randomization section
Common keywords/phrases:
“compared with…”
“control group”
“placebo”
“standard of care”
“usual care”
Note: Some studies (especially observational ones) may not have a formal comparator.
-
Outcome (O)
What it is:
The clinical result(s) measured to determine effectiveness or harm.
Why it matters:
Ensures the study measures outcomes that are meaningful for patients or decision-making.
Where to find it in an article:
Abstract
Methods → Outcomes
Results
Tables and Figures
Common keywords/phrases:
“primary outcome”
“secondary outcome”
“measured by…”
“incidence of…”
“mortality, symptom score, risk reduction”
Critical Appraisal - Diagnostic Studies
-
Big picture: what this worksheet is asking
This worksheet helps you answer two core questions:
Are the study methods valid?
If valid, are the results useful for clinical decision-making?
You are not being asked to judge whether the test is “good” in absolute terms — only whether the study design allows the results to be trusted and interpreted.
-
This worksheet helps you decide whether a diagnostic accuracy study is trustworthy and clinically meaningful. The first section focuses on study validity—whether the right patients were studied, whether a proper reference standard was used, and whether bias was minimized. The second section focuses on interpreting the results using 2×2 tables, sensitivity, specificity, and likelihood ratios to understand how test results change disease probability.
-
General rule
👉 Almost all answers come from the Methods
If you can’t find it in Methods → often “Unclear” is the correct answer.
R — Representative spectrum of patients
(Page 1 of worksheet)
What this is really asking
“Do the patients in the study look like the patients you’d actually use this test on?”
Where to look in the article
Methods → Participants / Study population
Inclusion / exclusion criteria
Setting (ED, outpatient, ICU, screening population)
KEYWORDS:
“consecutive patients”
“random sample”
“suspected [condition]”
“referred for evaluation”
“inclusion criteria”
“exclusion criteria”
How to decide
Yes → broad, real-world population; consecutive or random enrollment
No → highly selected, extreme cases only, obvious disease vs obvious non-disease
Unclear → patient selection not described
📌 Tip: If the study only includes very sick patients or clear positives, sensitivity/specificity may be inflated.
A — Reference standard applied regardless of index test
(Page 1 of worksheet)
What this is really asking
“Did everyone get the gold standard diagnosis — or only people with certain test results?”
Where to look
Methods → Reference standard
Flow diagram (if present)
Keywords
“all patients underwent…”
“reference standard performed on…”
“follow-up used as reference”
“verification bias”
How to decide
Yes → everyone had the same reference standard (or adequate follow-up)
No → only test-positive patients got the reference standard
Unclear → not stated
📌 Why this matters: If negatives aren’t verified, false negatives can be missed.
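A small numeric illustration of verification bias, using invented numbers: suppose every test-positive patient receives the reference standard but only 10% of test-negative patients do. The counts and the 10% figure are assumptions chosen for illustration.

```python
# Hypothetical "true" 2x2 table if every patient had been verified.
tp, fp, fn, tn = 80, 90, 20, 810

true_sens = tp / (tp + fn)
true_spec = tn / (tn + fp)

# Partial verification: all positives verified, only 10% of negatives.
verified_fraction_neg = 0.10
fn_seen = fn * verified_fraction_neg  # most false negatives are never found
tn_seen = tn * verified_fraction_neg

apparent_sens = tp / (tp + fn_seen)
apparent_spec = tn_seen / (tn_seen + fp)

print(f"Sensitivity: true {true_sens:.1%}, apparent {apparent_sens:.1%}")
print(f"Specificity: true {true_spec:.1%}, apparent {apparent_spec:.1%}")
```

In this example the unverified negatives hide most false negatives, so sensitivity looks far better than it is while specificity looks worse.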
Mbo — Masked (blinded) comparison
(Page 1 of worksheet)
What this is really asking
“Were the test results interpreted independently?”
Where to look
Methods → Blinding
Radiology / pathology interpretation description
Keywords
“blinded”
“masked”
“interpreted independently”
“read without knowledge of…”
How to decide
Yes → index test and reference standard read without knowledge of each other
No → interpreters knew the other result
Unclear → blinding not mentioned
📌 Tip: If blinding isn’t stated, “Unclear” is acceptable.
Replication question (Page 3)
“Are methods described in enough detail to repeat the test?”
Look for
Test protocol details
Cutoffs / thresholds
Timing of tests
If the test is vaguely described → No or Unclear
-
Are test characteristics presented?
(Page 2 of worksheet)
What this is asking
“Can I build or see a 2×2 table?”
LOOK FOR:
Sensitivity / specificity
Likelihood ratios
Predictive values
Raw numbers (TP, FP, FN, TN)
If raw counts aren’t given but can be derived → that still counts.
How to approach the 2×2 table (don’t panic)
The worksheet walks you through a worked example using dementia prevalence and explicitly shows how to complete the table step by step.
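When a paper reports only prevalence, sensitivity, and specificity, the 2×2 cells can be rebuilt for a notional cohort. The sketch below uses hypothetical numbers (1,000 patients, 10% prevalence, 80% sensitivity, 90% specificity) rather than the worksheet's dementia figures, and the function name is illustrative.

```python
def build_two_by_two(n: int, prevalence: float, sens: float, spec: float):
    """Return (TP, FP, FN, TN) for a notional cohort of n patients."""
    diseased = round(n * prevalence)
    healthy = n - diseased
    tp = round(diseased * sens)  # diseased patients who test positive
    fn = diseased - tp           # diseased patients who test negative
    tn = round(healthy * spec)   # healthy patients who test negative
    fp = healthy - tn            # healthy patients who test positive
    return tp, fp, fn, tn


# Hypothetical inputs, for illustration only.
tp, fp, fn, tn = build_two_by_two(n=1000, prevalence=0.10, sens=0.80, spec=0.90)

ppv = tp / (tp + fp)  # positive predictive value
npv = tn / (tn + fn)  # negative predictive value

print(f"TP={tp}  FP={fp}  FN={fn}  TN={tn}")
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Filling the column totals first (diseased vs. not diseased) and then splitting each column by test result mirrors the usual order for completing the table by hand.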
KEY MINDSET
You are not expected to memorize formulas.
You are expected to:
understand what the numbers represent
know how they relate to clinical decisions
Interpreting likelihood ratios (clinically)
From the worksheet language:
LR+ > 1 → increases probability of disease (rule-in)
LR− < 1 → decreases probability of disease (rule-out)
Rough intuition:
LR+ ≥ 10 → strong rule-in
LR− ≤ 0.1 → strong rule-out
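These rules of thumb can be captured in a small helper. The thresholds are the ones listed above; the wording of the in-between categories and the function name are illustrative choices, not worksheet language.

```python
def interpret_lr(lr: float) -> str:
    """Map a likelihood ratio to a rough clinical interpretation."""
    if lr < 0:
        raise ValueError("A likelihood ratio cannot be negative.")
    if lr >= 10:
        return "strong rule-in"
    if lr > 1:
        return "raises probability of disease"
    if lr == 1:
        return "no change in probability"
    if lr > 0.1:
        return "lowers probability of disease"
    return "strong rule-out"


for value in (12, 3, 1, 0.4, 0.05):
    print(f"LR = {value}: {interpret_lr(value)}")
```

The labels are only a rough guide: how far the probability actually moves also depends on the pre-test probability.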
How to answer “Yes / No / Unclear” without stress
Use this rule:
Yes → explicitly stated
No → explicitly violated
Unclear → not described
“Unclear” is not a wrong answer — it often reflects poor reporting.
Additional Resources
Here are some additional resources for understanding and evaluating research so that it can be applied in evidence-based medicine.
Users' Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice, 3rd ed