Database Credentialed Access
Antibiotic Resistance Microbiology Dataset Mass General Brigham (ARMD-MGB)
Published: Dec. 5, 2025. Version: 1.0.0
When using this resource, please cite:
Wei, Z., & Kanjilal, S. (2025). Antibiotic Resistance Microbiology Dataset Mass General Brigham (ARMD-MGB) (version 1.0.0). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/2r5k-b955
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.
Abstract
The Antibiotic Resistance Microbiology Dataset – MGB (ARMD-MGB) is a de-identified resource derived from electronic health records (EHR) that facilitates research in antimicrobial resistance (AMR). ARMD-MGB encompasses data collected over 10 years from more than 225,000 adult patients at hospitals in the Mass General Brigham healthcare system. The data center on more than 970,000 microbiological cultures and their associated antibiotic susceptibilities, along with clinical and demographic features of the patients from whom the samples were collected. Key attributes include organism identity, semi-quantitative antibiotic susceptibility results, susceptibility phenotypes, and de-identified clinical metadata.
The ARMD-MGB dataset is designed to complement the ARMD-Stanford dataset. Cohort inclusion and exclusion criteria, feature descriptions, and encoding are similar across sites. These datasets support studies on antimicrobial stewardship, causal inference, and clinical decision-making, and are designed to be reusable and interoperable, promoting collaboration and innovation in combating AMR.
Background
Antimicrobial resistance (AMR) arises from the multiscale interaction between evolutionary forces, microbiology, the built environment, and human behavior, making it one of the great challenges of the 21st century [1-3]. In 2019, AMR was linked to nearly 5 million deaths worldwide, including at least 1.27 million directly due to resistant infections [4]. In the U.S., over 2.8 million AMR infections occur annually, resulting in more than 35,000 deaths [5] and costs of more than $2 billion [6].
Machine learning (ML) is a subfield of artificial intelligence (AI) that has emerged as one potential avenue by which to address this complex phenomenon. ML is primarily concerned with the development of algorithms that are able to build a predictive model using a training data set, with little to no human input [7]. Interest in applying ML to electronic health record (EHR) data to understand AMR has intensified over the past decade, reflecting the exponential increase in biological and medical data availability, massive improvements in computational power, and critical breakthroughs in algorithm development [8].
Prior ML studies with EHR data have yielded insight into AMR epidemiology [9,10], optimal treatment selection [11], and AI-supported decision support tools [12,13], among many other topics. However, wider adoption of these algorithms has been limited by reliance on single-center datasets, which restricts generalizability, by the small range of model types explored, and by sparse clinical metadata. To fully harness ML, there is a need for broadly generalizable datasets that capture diverse clinical practice patterns, patient populations, and patterns of AMR.
The ARMD-MGB and ARMD-Stanford [14] projects represent a pioneering effort to close this gap by releasing fully de-identified, population-based, multi-center, harmonized EHR datasets that contain rich metadata for hundreds of thousands of patients and are specifically engineered for the study of AMR. Both ARMD datasets differ from previously available public datasets in this space, which focus only on genetic and genomic determinants of resistance [15,16], lack deep clinical metadata [17], and/or lack sufficient power for individual-level prediction across a broad population [13].
ARMD-MGB uses the same inclusion and exclusion criteria and the same feature engineering pipeline as ARMD-Stanford, but the two datasets differ in several important ways:
- ARMD-MGB is specific to patients who received care within the Mass General Brigham healthcare system, a network of 12 academic and community hospitals in New England, and contains data from 2015 through 2024. ARMD-Stanford is specific to Stanford Health Care in California and contains data from 1999 through 2024.
- ARMD-MGB has granular microbiology data, including fields such as the susceptibility test method, minimum inhibitory concentrations and disk diameters, presence of beta-lactamase enzymes, and uniform breakpoint interpretations (from the CLSI M-100 document), which are not present in the ARMD-Stanford dataset.
- ARMD-MGB categorizes ward type into inpatient/outpatient/emergency room/urgent care/day surgery, while ARMD-Stanford categorizes ward type into inpatient/ICU/outpatient/emergency room.
Methods
Cohort Selection
- Inclusion: Adult patients (age >= 18) who submitted a urine, blood, or respiratory culture for clinical and surveillance purposes to the clinical microbiology laboratory.
- Exclusion: Microbiology cultures with no previous culture within the last 14 days.
Data de-identification
- Patient ages are binned into categories (defined below in the data dictionary).
- Patient ID, Patient Encounter ID, and Microbiology Culture Order ID are mapped to a random integer.
- All dates are randomly shifted for each patient, encounter, and microbiology culture. They remain internally consistent for a given patient and across datasets.
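Because dates are shifted but remain internally consistent for a given patient, intervals between events of the same patient can be used as features, whereas absolute calendar dates (and derived quantities such as month or season) probably should not be interpreted literally. A minimal pandas sketch of one such within-patient interval, using column names from the data dictionary below; the function name is hypothetical, and the de-duplication step assumes the microbiology file may hold several rows per culture:

```python
import pandas as pd

def days_since_previous_culture(micro: pd.DataFrame) -> pd.Series:
    """Days between each culture and the same patient's previous culture.

    Assumes the date shift is constant within a patient, so within-patient
    differences are meaningful even though the calendar dates are not real.
    """
    cols = ["anon_id", "order_proc_id_coded", "order_time_jittered_utc_shifted"]
    # Assumption: several rows per culture are possible; keep one per culture
    df = micro[cols].drop_duplicates(subset=["order_proc_id_coded"])
    df = df.assign(t=pd.to_datetime(df["order_time_jittered_utc_shifted"]))
    df = df.sort_values(["anon_id", "t"])
    # diff() runs within each patient; a patient's first culture gets NaN
    gap = df.groupby("anon_id")["t"].diff().dt.days
    return pd.Series(gap.to_numpy(), index=df["order_proc_id_coded"].to_numpy(),
                     name="days_since_prev_culture")

# Toy rows for illustration only (not real data)
toy = pd.DataFrame({
    "anon_id": [1, 1, 1, 2],
    "order_proc_id_coded": [10, 10, 11, 20],
    "order_time_jittered_utc_shifted": ["2075-01-01", "2075-01-01", "2075-01-19", "2080-06-01"],
})
print(days_since_previous_culture(toy))
```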
Derivation of susceptibility phenotype labels
The medical record contains microbiological testing results for all specimens sent to the clinical microbiology laboratories of the Mass General Brigham healthcare system, which encompasses 12 hospitals in the New England region. The data include the body site of collection, the identity of the pathogen, and its susceptibility testing results against standard panels of antibiotics. For each test, the dataset records the metric used (e.g., minimum inhibitory concentration (MIC) vs. disk diameter (DD)) and the numeric value of the result, as well as the date and location category of specimen collection (both anonymized but internally consistent).
The microbiology data contain both the susceptibility phenotype labels reported by the laboratory under its internal protocols and phenotypes derived from the clinical breakpoints published by the Clinical and Laboratory Standards Institute (CLSI) in the 2022 edition of the M-100 document [18]. Applying a single set of breakpoints facilitates comparisons across time periods in which changes to clinical breakpoints would otherwise cause discrepancies in reporting.
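To see how often the laboratory-reported label differs from the CLSI 2022-derived label, the two phenotype columns (AST_pheno and CLSI_2022_pheno, described in the data dictionary below) can be cross-tabulated. A minimal sketch on made-up rows; the function name is hypothetical:

```python
import pandas as pd

def phenotype_concordance(micro: pd.DataFrame) -> pd.DataFrame:
    """Cross-tabulate lab-reported vs. CLSI 2022-derived phenotypes."""
    # Compare only rows where both labels are present
    both = micro.dropna(subset=["AST_pheno", "CLSI_2022_pheno"])
    return pd.crosstab(both["AST_pheno"], both["CLSI_2022_pheno"])

# Toy rows for illustration only (not real data)
toy = pd.DataFrame({
    "AST_pheno":       ["Susceptible", "Resistant", "Intermediate", None],
    "CLSI_2022_pheno": ["Susceptible", "Resistant", "Resistant", "Susceptible"],
})
table = phenotype_concordance(toy)

# Count results whose label changes under the 2022 breakpoints
both = toy.dropna(subset=["AST_pheno", "CLSI_2022_pheno"])
n_discordant = int((both["AST_pheno"] != both["CLSI_2022_pheno"]).sum())
print(table)
print("discordant results:", n_discordant)
```

Off-diagonal cells of the table are the discordant results; on the real file they would be the place to look for breakpoint revisions.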
Data Description
File descriptions
Identifying feature descriptions (common to all files)
anon_id
- Description: De-identified patient ID
- Type: Integer
- Encoding: random number
- Units: N/A
- Range: 1 - 226,659
pat_enc_csn_id_coded
- Description: De-identified patient encounter ID, assigned every time a patient visits the hospital for any reason
- Type: Integer
- Encoding: random ID de-identification
- Units: N/A
- Range: 1 - 497,096
order_proc_id_coded
- Description: De-identified microbiology culture ID, assigned for every unique sample received by the lab
- Type: Integer
- Encoding: random ID de-identification
- Units: N/A
- Range: 1 - 970,165
order_time_jittered_utc_shifted
- Description: Microbiology culture order timestamp (anonymized but internally consistent)
- Type: DateTime
- Encoding: N/A
- Units: UTC
- Range: Anonymized
microbiology_cohort_deid_tj.csv
Description: Microbiology culture data for the study cohort.
- Empty values indicate that the field is N/A for that culture (e.g., the culture was negative)
culture_description
- Description: Body site of culture
- Type: Categorical
- Encoding: N/A
- Units: N/A
- Range: N/A
organism
- Description: Organism name assigned by the clinical microbiology laboratory
- Type: Categorical
- Encoding: N/A
- Units: N/A
- Range: N/A
neg_cx
- Description: Flag if the culture was negative
- Type: Binary
- Encoding: N/A
- Units: N/A
- Range: X / NA
mult_org_ast
- Description: Flag if multiple organisms were present in the sample AND had antimicrobial susceptibility testing (AST) performed
- Type: Binary
- Encoding: N/A
- Units: N/A
- Range: X / NA
has_AST
- Description: Flag if AST was performed
- Type: Binary
- Encoding: N/A
- Units: N/A
- Range: X / NA
prelim_AST
- Description: Flag if reported results were preliminary
- Type: Binary
- Encoding: N/A
- Units: N/A
- Range: X / NA
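The X / NA flags (neg_cx, mult_org_ast, has_AST, prelim_AST) are easier to filter on as booleans. A minimal sketch, assuming a set flag is stored as the literal string "X" and an unset flag as an empty or NA cell, which pandas reads as missing by default; the function name is hypothetical:

```python
import pandas as pd

FLAG_COLUMNS = ["neg_cx", "mult_org_ast", "has_AST", "prelim_AST"]

def flags_to_bool(micro: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the X / NA flag columns converted to True / False."""
    out = micro.copy()
    for col in FLAG_COLUMNS:
        # Assumption: "X" means the flag is set; missing (NaN/None) means it is not
        out[col] = out[col].eq("X")
    return out

# Toy rows for illustration only (not real data)
toy = pd.DataFrame({
    "neg_cx": [None, "X"],
    "mult_org_ast": [None, None],
    "has_AST": ["X", None],
    "prelim_AST": [None, None],
})
flags = flags_to_bool(toy)
# e.g. keep only rows with susceptibility testing: flags[flags["has_AST"]]
print(flags)
```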
AST_panel
- Description: Type of AST method used (broth microdilution, disk diameter, gradient diffusion, PCR, etc.)
- Type: Categorical
- Encoding: N/A
- Units: N/A
- Range: N/A
enzyme_class
- Description: Class of enzyme or mutation tested for (ESBL, PBP2a, carbapenemase, etc.)
- Type: Categorical
- Encoding: ESBL (extended-spectrum beta-lactamase); PBP (penicillin-binding protein); beta_lactamase; carbapenemase
- Units: N/A
- Range: N/A
enzyme
- Description: Specific enzyme or resistance gene tested for
- Type: Categorical
- Encoding: blaZ; mecA; NDM; KPC; IMP; OXA; VIM
- Units: N/A
- Range: N/A
AST_code
- Description: Three-letter code for the antibiotic tested in the AST panel, derived from the American Society for Microbiology abbreviation list for manuscripts [19]
- Type: Categorical
- Encoding: N/A
- Units: N/A
- Range: N/A
antibiotic
- Description: Full name of the antibiotic tested in AST panel
- Type: Categorical
- Encoding: N/A
- Units: N/A
- Range: N/A
AST_inequality
- Description: Comparison operator qualifying the reported AST value
- Type: Categorical
- Encoding: N/A
- Units: N/A
- Range: < / <= / > / >= / =
AST_val1
- Description: Semi-quantitative value determined by AST panel
- Type: Numerical
- Encoding: N/A
- Units: varies
- Range: N/A
AST_val2
- Description: Second value if the drug tested has two components (e.g., trimethoprim-sulfamethoxazole)
- Type: Numerical
- Encoding: N/A
- Units: varies
- Range: N/A
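Because AST_val1 is qualified by AST_inequality, a result such as "<= 0.25" only bounds the true value. One common way to keep that information is to treat each result as an interval, as in this sketch. Treating strict and non-strict operators alike, and ignoring AST_val2, are simplifying assumptions; the function name is hypothetical:

```python
import numpy as np
import pandas as pd

def ast_bounds(inequality, value):
    """Return (lower, upper) bounds on the true value implied by one AST result.

    '<' and '<=' give only an upper bound, '>' and '>=' only a lower bound,
    and '=' (or a missing operator) an exact value. Strict and non-strict
    operators are treated alike, which is a simplification.
    """
    if inequality in ("<", "<="):
        return (-np.inf, value)  # unbounded below
    if inequality in (">", ">="):
        return (value, np.inf)   # unbounded above
    return (value, value)

# Toy rows for illustration only (not real data)
toy = pd.DataFrame({"AST_inequality": ["<=", "=", ">"], "AST_val1": [0.25, 2.0, 32.0]})
bounds = [ast_bounds(op, v) for op, v in zip(toy["AST_inequality"], toy["AST_val1"])]
toy["lower"] = [lo for lo, _ in bounds]
toy["upper"] = [hi for _, hi in bounds]
print(toy)
```

The resulting (lower, upper) columns can feed interval-censored models, or be reduced to a single value when censoring is judged unimportant.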
AST_pheno
- Description: Phenotypic label (Susceptible, Intermediate, Resistant, etc.) determined by lab internal protocols
- Type: Categorical
- Encoding: N/A
- Units: N/A
- Range: Susceptible / Intermediate / Resistant / Other
CLSI_2022_pheno
- Description: Phenotypic label determined by the 2022 version of CLSI M-100 [18]
- Type: Categorical
- Encoding: N/A
- Units: N/A
- Range: Susceptible / Intermediate / Resistant / Other
ADI_deid_tj.csv
Description: Area Deprivation Index (ADI) scores based on the patient's ZIP code [20, 21].
- Empty values indicate field is missing
adi_score
- Description: Actual ADI score
- Type: Numerical
- Encoding: N/A
- Units: N/A
- Range: 1.32 - 78.7
adi_state_rank
- Description: Decile rank of the ADI score within the state
- Type: Integer
- Encoding: N/A
- Units: N/A
- Range: 1 - 10
comorbidity_deid_tj.csv
Description: Patient comorbidities
- Empty values indicate field is missing
ICD10
- Description: ICD-10 code of diagnosis
- Type: String
- Encoding: N/A
- Units: N/A
- Range: N/A
category
- Description: Elixhauser comorbidity category of diagnosis [22]
- Type: Categorical
- Encoding: N/A
- Units: N/A
- Range: N/A
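For modeling, this long table (one row per diagnosis) is often pivoted into one presence indicator per Elixhauser category. A minimal sketch, assuming the file carries order_proc_id_coded as listed under the identifying features; the category strings in the toy rows are placeholders, not the dataset's actual labels, and the function name is hypothetical:

```python
import pandas as pd

def comorbidity_indicators(comorb: pd.DataFrame, all_orders) -> pd.DataFrame:
    """One row per culture, one 0/1 column per Elixhauser category."""
    present = comorb.dropna(subset=["category"])
    # crosstab counts diagnoses per category; clip turns counts into 0/1 flags
    wide = pd.crosstab(present["order_proc_id_coded"], present["category"]).clip(upper=1)
    # Cultures without any recorded comorbidity get a row of zeros
    return wide.reindex(all_orders, fill_value=0)

# Toy rows for illustration only; category labels are placeholders
toy = pd.DataFrame({
    "order_proc_id_coded": [10, 10, 10, 11],
    "ICD10": ["E11.9", "E11.65", "I50.9", "N18.6"],
    "category": ["diabetes", "diabetes", "heart_failure", "renal_failure"],
})
wide = comorbidity_indicators(toy, all_orders=[10, 11, 12])
print(wide)
```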
demographics_deid_tj.csv
Description: Patient demographics
- Empty values indicate field is missing
age
- Description: Age category of patient at time of microbiology culture collection
- Type: Categorical
- Encoding: N/A
- Units: years
- Range: 18-24 / 25-34 / 35-44 / 45-54 / 55-64 / 65-74 / 75-84 / 85-89 / ≥90
gender
- Description: Gender identity of the patient at the time of microbiology culture collection
- Type: Categorical
- Encoding: N/A
- Units: N/A
- Range: Male / Female / Other / Unknown
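Since age is released only as bins, an ordinal encoding that preserves bin order can be convenient for models. A minimal sketch, assuming the CSV labels match the ranges above exactly (including the "≥90" character); the function name is hypothetical:

```python
import pandas as pd

# Age bins in the order listed in the data dictionary (assumed exact labels)
AGE_ORDER = ["18-24", "25-34", "35-44", "45-54", "55-64",
             "65-74", "75-84", "85-89", "≥90"]

def encode_age(demo: pd.DataFrame) -> pd.Series:
    """Ordinal code 0-8 per age bin; missing or unrecognized labels become -1."""
    cat = pd.Categorical(demo["age"], categories=AGE_ORDER, ordered=True)
    return pd.Series(cat.codes, index=demo.index, name="age_ordinal")

toy = pd.DataFrame({"age": ["65-74", "≥90", "18-24", None]})
print(encode_age(toy).tolist())  # [5, 8, 0, -1]
```

Checking for -1 codes after loading the real file is a quick way to catch labels that do not match the assumed spelling.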
nursing_home_visits_deid_tj.csv
Description: Nursing home visits relative to culture time
- Empty values indicate field is missing
nursing_home_visit_culture
- Description: Time interval between the last nursing home visit and the current microbiology culture
- Type: Numerical
- Encoding: N/A
- Units: days
- Range: 0 - 160
visit_date_shifted
- Description: Date of last nursing home visit relative to culture (anonymized but internally consistent)
- Type: DateTime
- Encoding: N/A
- Units: UTC
- Range: Anonymized
prior_abx_deid_tj.csv
Description: Prior exposures to antimicrobials relative to the current microbiology culture
- Empty values indicate field is missing
last_dose_to_culture
- Description: Time interval between last antibiotic dose and current microbiology culture
- Type: Numerical
- Encoding: N/A
- Units: days
- Range: 0 - 3,402
drug_code
- Description: Three-letter code for the prior antibiotic exposure, derived from the American Society for Microbiology abbreviation list for manuscripts [19]
- Type: Categorical
- Encoding: N/A
- Units: N/A
- Range: N/A
medication_name
- Description: Full name of the antibiotic taken or ordered
- Type: Categorical
- Encoding: N/A
- Units: N/A
- Range: N/A
drug_class
- Description: Antibiotic class (anti-staph beta-lactam, fluoroquinolone, etc.)
- Type: Categorical
- Encoding: N/A
- Units: N/A
- Range: N/A
last_dose_DT_shifted
- Description: Timestamp of the last dose of the prior antibiotic (anonymized but internally consistent)
- Type: DateTime
- Encoding: N/A
- Units: UTC
- Range: Anonymized
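Two common features from this table are the number of days since the most recent dose of each drug class and whether such a dose fell within a look-back window. A minimal sketch; the 90-day window is an arbitrary illustrative choice, and the function name and toy values are hypothetical (the class names reuse the examples in the drug_class description):

```python
import pandas as pd

def days_since_last_class_dose(prior_abx: pd.DataFrame) -> pd.DataFrame:
    """Rows: cultures; columns: drug classes; values: days since the most recent dose."""
    # The smallest interval per culture and class is the most recent exposure
    return prior_abx.pivot_table(index="order_proc_id_coded", columns="drug_class",
                                 values="last_dose_to_culture", aggfunc="min")

# Toy rows for illustration only (not real data)
toy = pd.DataFrame({
    "order_proc_id_coded": [10, 10, 10, 11],
    "drug_class": ["fluoroquinolone", "fluoroquinolone", "anti-staph beta-lactam", "anti-staph beta-lactam"],
    "last_dose_to_culture": [5, 40, 200, 30],
})
days = days_since_last_class_dose(toy)
# Illustrative 90-day window; NaN (never exposed) compares as False
exposed_90d = days.le(90).astype(int)
print(days)
print(exposed_90d)
```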
prior_micro_deid_tj.csv
Description: Full prior microbiology history for patient relative to current microbiology culture
- Empty values
  - AST_pheno: N/A for that culture (i.e., no breakpoints exist or were reported)
  - All other fields (organism, prior_AST_DTS_shifted, etc.): Patient had no prior microbiology data
organism
- Description: Name of the organism identified in a prior culture
- Type: Categorical
- Encoding: N/A
- Units: N/A
- Range: N/A
drug_code
- Description: Three-letter code for the antibiotic tested in the AST panel for the prior organism, derived from the American Society for Microbiology abbreviation list for manuscripts [19]
- Type: Categorical
- Encoding: N/A
- Units: N/A
- Range: N/A
antibiotic
- Description: Full name of the antibiotic tested in AST panel for the prior organism
- Type: Categorical
- Encoding: N/A
- Units: N/A
- Range: N/A
prior_AST_time_to_culture
- Description: Time interval between the prior AST result and the current microbiology culture
- Type: Numerical
- Encoding: N/A
- Units: days
- Range: 1 - 3,332
prior_AST_DTS_shifted
- Description: Timestamp of prior AST result (anonymized but internally consistent)
- Type: DateTime
- Encoding: N/A
- Units: UTC
- Range: Anonymized
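One way to summarize this history is the most recent prior phenotype for each culture and antibiotic. A minimal sketch, assuming the file carries an AST_pheno column (as the empty-value note above implies); CIP (ciprofloxacin) and NIT (nitrofurantoin) are used purely as toy drug codes, and the function name is hypothetical:

```python
import pandas as pd

def most_recent_prior_pheno(prior_micro: pd.DataFrame) -> pd.DataFrame:
    """Latest prior phenotype per culture (rows) and drug code (columns)."""
    # Assumption: AST_pheno is present; rows without a phenotype are skipped
    tested = prior_micro.dropna(subset=["AST_pheno"])
    # Smallest interval to the current culture = most recent prior result
    tested = tested.sort_values("prior_AST_time_to_culture")
    latest = tested.drop_duplicates(subset=["order_proc_id_coded", "drug_code"], keep="first")
    return latest.pivot(index="order_proc_id_coded", columns="drug_code", values="AST_pheno")

# Toy rows for illustration only (not real data)
toy = pd.DataFrame({
    "order_proc_id_coded": [10, 10, 10, 11, 11],
    "drug_code": ["CIP", "CIP", "NIT", "CIP", "CIP"],
    "AST_pheno": ["Resistant", "Susceptible", "Susceptible", None, "Susceptible"],
    "prior_AST_time_to_culture": [30, 200, 30, 5, 100],
})
latest = most_recent_prior_pheno(toy)
print(latest)
```

This collapses across prior organisms; grouping additionally by organism would keep organism-specific histories separate.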
prior_org_deid_tj.csv
Description: Prior organism history for patient relative to current microbiology culture
- Empty values indicate field is missing
prior_org_days_to_culture
- Description: Time interval between prior organism and current microbiology culture
- Type: Numerical
- Encoding: N/A
- Units: days
- Range: 1 - 3,417
prior_org
- Description: Name of the organism in prior microbiology culture
- Type: Categorical
- Encoding: N/A
- Units: N/A
- Range: N/A
prior_org_specific
- Description: Organism category for the organism in prior microbiology culture (categorization schema shown below)
- Type: Categorical
- Encoding: N/A
- Units: N/A
- Range: N/A
prior_org_recorded_time_shifted
- Description: Recorded time of prior microbiology culture (anonymized but internally consistent)
- Type: DateTime
- Encoding: N/A
- Units: UTC
- Range: Anonymized
prior_procedures_deid_tj.csv
Description: Prior procedures relative to current microbiology culture. Each row corresponds to one recorded day of a procedure; the underlying timestamps are not included. Multiple rows with the same procedure therefore indicate its duration. For example, the following snippet indicates the patient was mechanically ventilated for 3 days, 62 to 64 days prior to a culture ordered on 2075-01-19:
| anon_id | pat_enc_csn_id_coded | order_proc_id_coded | procedure_description | procedure_days_culture | order_time_jittered_utc_shifted |
|---|---|---|---|---|---|
| 54816 | 239397 | 156168 | mech_vent | 64 | 2075-01-19 |
| 54816 | 239397 | 156168 | mech_vent | 63 | 2075-01-19 |
| 54816 | 239397 | 156168 | mech_vent | 62 | 2075-01-19 |
- Empty values indicate field is missing
procedure_description
- Description: Type of procedure (surgical, central line, dialysis, ventilation, etc.), inferred from CPT codes
- Type: Categorical
- Encoding: mech_vent (mechanical ventilation); surgical_procedure (of any type); cvc (central venous catheter); dialysis; urethral_catheter
- Units: N/A
- Range: N/A
procedure_days_culture
- Description: Time interval between procedure and current microbiology culture
- Type: Numerical
- Encoding: N/A
- Units: days
- Range: 1 - 4,663
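Following the encoding described above, where repeated rows for the same procedure indicate a duration, consecutive days can be collapsed into episodes. A minimal sketch, assuming one row per procedure day and treating any gap of more than one day as the start of a new episode; the function and output column names are hypothetical, and the toy rows extend the mechanical ventilation example above with a second, made-up episode:

```python
import pandas as pd

def procedure_episodes(proc: pd.DataFrame) -> pd.DataFrame:
    """Collapse one-row-per-day procedure records into contiguous episodes."""
    keys = ["order_proc_id_coded", "procedure_description"]
    # Assumption: one row per procedure day; drop exact same-day repeats
    df = (proc.drop_duplicates(subset=keys + ["procedure_days_culture"])
              .sort_values(keys + ["procedure_days_culture"]))
    # A new episode starts at each group's first row or after a gap > 1 day
    gap = df.groupby(keys)["procedure_days_culture"].diff()
    df["new_episode"] = (gap.isna() | (gap > 1)).astype(int)
    df["episode"] = df.groupby(keys)["new_episode"].cumsum()
    return (df.groupby(keys + ["episode"])["procedure_days_culture"]
              .agg(duration_days="count", start_days_before="max", end_days_before="min")
              .reset_index())

# Toy rows: days 62-64 from the example above plus a made-up episode on days 10-11
toy = pd.DataFrame({
    "order_proc_id_coded": [156168] * 5,
    "procedure_description": ["mech_vent"] * 5,
    "procedure_days_culture": [64, 63, 62, 11, 10],
})
eps = procedure_episodes(toy)
print(eps)
```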
ward_type_deid_tj.csv
Description: Setting for the clinical encounter when the microbiology culture was taken.
- Empty values indicate field is missing
hosp_ward_IP
- Description: Flag for inpatient encounter
- Type: Binary
- Encoding: 1: yes / 0: no
- Units: N/A
- Range: 0 / 1
hosp_ward_OP
- Description: Flag for outpatient encounter
- Type: Binary
- Encoding: 1: yes / 0: no
- Units: N/A
- Range: 0 / 1
hosp_ward_ER
- Description: Flag for emergency room encounter
- Type: Binary
- Encoding: 1: yes / 0: no
- Units: N/A
- Range: 0 / 1
hosp_ward_UC
- Description: Flag for urgent care encounter
- Type: Binary
- Encoding: 1: yes / 0: no
- Units: N/A
- Range: 0 / 1
hosp_ward_day_surg
- Description: Flag for day surgery encounter
- Type: Binary
- Encoding: 1: yes / 0: no
- Units: N/A
- Range: 0 / 1
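The five ward flags can be folded into a single setting label. The data dictionary does not say whether several flags can be set for one encounter, so this sketch marks such rows as "multiple" (and rows with no flag as "none") rather than guessing; it also assumes missing flags can be treated as 0. The label strings and function name are illustrative:

```python
import pandas as pd

WARD_FLAGS = {
    "hosp_ward_IP": "inpatient",
    "hosp_ward_OP": "outpatient",
    "hosp_ward_ER": "emergency room",
    "hosp_ward_UC": "urgent care",
    "hosp_ward_day_surg": "day surgery",
}

def ward_label(ward: pd.DataFrame) -> pd.Series:
    """Single setting label per row from the 0/1 ward flags."""
    # Assumption: a missing flag means "not set"
    flags = ward[list(WARD_FLAGS)].fillna(0).astype(int)
    n_set = flags.sum(axis=1)
    label = flags.idxmax(axis=1).map(WARD_FLAGS)
    fallback = n_set.map(lambda n: "none" if n == 0 else "multiple")
    # Keep the single-flag label only where exactly one flag is set
    return label.where(n_set == 1, fallback)

# Toy rows for illustration only (not real data)
toy = pd.DataFrame({
    "hosp_ward_IP": [1, 0, 0],
    "hosp_ward_OP": [0, 0, 0],
    "hosp_ward_ER": [0, 1, 0],
    "hosp_ward_UC": [0, 1, 0],
    "hosp_ward_day_surg": [0, 0, 0],
})
print(ward_label(toy).tolist())  # ['inpatient', 'multiple', 'none']
```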
Definition of organism categories:
| prior_org_specific | genus | species | resistance_profile |
|---|---|---|---|
| Burkholderia sp | Burkholderia | | |
| C_diff | Clostridioides | difficile | |
| Candida_non_albicans | Candida | non-albicans | |
| DR_A_baumannii | Acinetobacter | baumannii | SAM |
| DR_P_aeruginosa | Pseudomonas | aeruginosa | CAZ, FEP, TZP |
| DR_S_maltophilia | Stenotrophomonas | maltophilia | SXT, MIN |
| DS_A_baumannii | Acinetobacter | baumannii | |
| DS_C_freundii | Citrobacter | freundii | |
| DS_C_koseri | Citrobacter | koseri | |
| DS_E_cloacae | Enterobacter | cloacae | |
| DS_E_coli | Escherichia | coli | |
| DS_K_aerogenes | Klebsiella | aerogenes | |
| DS_K_oxytoca | Klebsiella | oxytoca | |
| DS_K_pneumoniae | Klebsiella | pneumoniae | |
| DS_M_morganii | Morganella | morganii | |
| DS_P_aeruginosa | Pseudomonas | aeruginosa | |
| DS_P_mirabilis | Proteus | mirabilis | |
| DS_Providencia | Providencia | | |
| DS_S_maltophilia | Stenotrophomonas | maltophilia | |
| DS_S_marcescens | Serratia | marcescens | |
| ESBL_C_freundii | Citrobacter | freundii | CRO, CAZ, FEP, TZP |
| ESBL_C_koseri | Citrobacter | koseri | CRO, CAZ, FEP, TZP |
| ESBL_E_cloacae | Enterobacter | cloacae | CRO, CAZ, FEP, TZP |
| ESBL_E_coli | Escherichia | coli | CRO, CAZ, FEP, TZP |
| ESBL_K_aerogenes | Klebsiella | aerogenes | CRO, CAZ, FEP, TZP |
| ESBL_K_oxytoca | Klebsiella | oxytoca | CRO, CAZ, FEP, TZP |
| ESBL_K_pneumoniae | Klebsiella | pneumoniae | CRO, CAZ, FEP, TZP |
| ESBL_M_morganii | Morganella | morganii | CRO, CAZ, FEP, TZP |
| ESBL_P_mirabilis | Proteus | mirabilis | CRO, CAZ, FEP, TZP |
| ESBL_Providencia | Providencia | | CRO, CAZ, FEP, TZP |
| ESBL_S_marcescens | Serratia | marcescens | CRO, CAZ, FEP, TZP |
| MRSA | Staphylococcus | aureus | OXA, FOX |
| MSSA | Staphylococcus | aureus | |
| P_vulgaris | Proteus | vulgaris | |
| S_mitis_oralis | Streptococcus | mitis / oralis | |
| S_pneumoniae | Streptococcus | pneumoniae | |
| Salmonella sp | Salmonella | | |
| Shigella sp | Shigella | | |
| VRE_faecalis | Enterococcus | faecalis | VAN |
| VRE_faecium | Enterococcus | faecium | VAN |
| VSE_faecalis | Enterococcus | faecalis | |
| VSE_faecium | Enterococcus | faecium | |
Drug codes used for the categorization:
- CAZ - Ceftazidime
- CRO - Ceftriaxone
- FEP - Cefepime
- FOX - Cefoxitin
- MIN - Minocycline
- OXA - Oxacillin
- SAM - Ampicillin-sulbactam
- SXT - Trimethoprim-sulfamethoxazole
- TZP - Piperacillin-tazobactam
- VAN - Vancomycin
Usage Notes
Getting Started
The ARMD-MGB dataset is provided as a collection of de-identified CSV files. Each table can be linked using the anon_id (de-identified patient identifier) and, where applicable, the pat_enc_csn_id_coded (encounter identifier) or order_proc_id_coded (microbiology accession identifier).
Below is a minimal example demonstrating how to load and interpret key variables:
```python
import pandas as pd

# Load microbiology culture data
micro = pd.read_csv("microbiology_cohort_deid_tj.csv")
print(micro.head())

# Interpret antibiotic susceptibility phenotype
phenotype_counts = micro["CLSI_2022_pheno"].value_counts()
print(phenotype_counts)
```
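To combine files, a culture-level table can be built from the microbiology file and then left-joined with other tables on the shared identifiers. A minimal sketch, assuming (per the identifying features listed in the data dictionary) that demographics_deid_tj.csv carries all three ID columns and has at most one row per culture; validate="many_to_one" makes pandas raise an error if that assumption is violated instead of silently duplicating rows. The function name and the culture_description values in the toy rows are hypothetical:

```python
import pandas as pd

KEYS = ["anon_id", "pat_enc_csn_id_coded", "order_proc_id_coded"]

def culture_level(micro: pd.DataFrame, demographics: pd.DataFrame) -> pd.DataFrame:
    """One row per culture with age and gender attached."""
    # The microbiology file can have several rows per culture; keep one
    cultures = micro[KEYS + ["culture_description"]].drop_duplicates(subset=KEYS)
    demo = demographics[KEYS + ["age", "gender"]].drop_duplicates()
    # many_to_one: raises MergeError if a culture has more than one demographics row
    return cultures.merge(demo, on=KEYS, how="left", validate="many_to_one")

# With the real files this would be, e.g.:
# micro = pd.read_csv("microbiology_cohort_deid_tj.csv")
# demographics = pd.read_csv("demographics_deid_tj.csv")
micro_toy = pd.DataFrame({
    "anon_id": [1, 1, 2],
    "pat_enc_csn_id_coded": [100, 100, 200],
    "order_proc_id_coded": [10, 10, 20],
    "culture_description": ["URINE", "URINE", "BLOOD"],
})
demo_toy = pd.DataFrame({
    "anon_id": [1, 2], "pat_enc_csn_id_coded": [100, 200],
    "order_proc_id_coded": [10, 20], "age": ["65-74", "25-34"], "gender": ["Female", "Male"],
})
cultures = culture_level(micro_toy, demo_toy)
print(cultures)
```

The same pattern extends to the other culture-level files (for example ward_type_deid_tj.csv) by merging them one at a time on KEYS.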
Intended Uses
ARMD-MGB is intended to support research on:
- Antimicrobial resistance (AMR) epidemiology, including longitudinal and organism-specific resistance trends.
- Antibiotic stewardship and prescribing practices, by linking prior antimicrobial exposure to subsequent resistance outcomes.
- Causal inference and predictive modeling, through integration of microbiological, clinical, and environmental covariates.
- Cross-site and temporal benchmarking, when used in conjunction with ARMD-Stanford or other harmonized datasets.
Because ARMD-MGB adheres to a harmonized schema shared with ARMD-Stanford, analyses can be conducted across both datasets to assess reproducibility and generalizability across health systems.
Limitations
- Geographic and institutional context: Data originate from the Mass General Brigham (MGB) healthcare system in New England; therefore, resistance patterns may differ from other regions.
- Temporal heterogeneity: Changes in clinical practice and diagnostic platforms over the 10-year period may influence detection rates.
Release Notes
Version 1.0.0: Initial public release of the dataset.
Ethics
This study was approved by the Institutional Review Board (IRB) of Mass General Brigham, protocol 2017P000682.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Murray CJ, Ikuta KS, Sharara F, Swetschinski L, Aguilar GR, et al. Global burden of bacterial antimicrobial resistance in 2019: a systematic analysis. Lancet. 2022;399(10325):629–55.
- CDC. Antibiotic Resistance Threats in the United States, 2019 [Internet]. 2019 Nov p. 1–148. Available from: https://www.cdc.gov/antimicrobial-resistance/media/pdfs/2019-ar-threats-report-508.pdf
- Haredasht FN, Amrollahi F, Maddali MV, Marshall N, Ma SP, Cooper LN, et al. Antibiotic Resistance Microbiology Dataset (ARMD): A Resource for Antimicrobial Resistance from EHRs. Sci Data. 2025;12(1):1299.
- Schechner V, Temkin E, Harbarth S, Carmeli Y, Schwaber MJ. Epidemiological Interpretation of Studies Examining the Effect of Antibiotic Usage on Resistance. Clinical Microbiology Reviews. 2013 Apr 3;26(2):289–307.
- Darby EM, Trampari E, Siasat P, Gaya MS, Alav I, Webber MA, et al. Molecular mechanisms of antibiotic resistance revisited. Nat Rev Microbiol. 2023;21(5):280–95.
- Gontjes KJ, Gibson KE, Lansing BJ, Mantey J, Jones KM, Cassone M, et al. Association of Exposure to High-risk Antibiotics in Acute Care Hospitals With Multidrug-Resistant Organism Burden in Nursing Homes. JAMA Netw Open. 2022;5(2):e2144959.
- Zhu YG, Zhao Y, Li B, Huang CL, Zhang SY, Yu S, et al. Continental-scale pollution of estuaries with antibiotic resistance genes. Nat Microbiol. 2017 Jan 30;2(4):16270.
- Anahtar MN, Yang JH, Kanjilal S. Applications of Machine Learning to the Problem of Antimicrobial Resistance: an Emerging Model for Translational Research. J Clin Microbiol. 2021;59(7):e01260-20.
- Stracy M, Snitser O, Yelin I, Amer Y, Parizade M, Katz R, et al. Minimizing treatment-induced emergence of antibiotic resistance in bacterial infections. Science. 2022;375(6583):889–94.
- Yelin I, Snitser O, Novich G, Katz R, Tal O, Parizade M, et al. Personal clinical history predicts antibiotic resistance of urinary tract infections. Nat Med. 2019 Jul 5;25(7):1143–52.
- Corbin CK, Sung L, Chattopadhyay A, Noshad M, Chang A, Deresinski S, et al. Personalized antibiograms for machine learning driven antibiotic selection. Commun Med. 2022;2(1):38.
- Kanjilal S, Oberst M, Boominathan S, Zhou H, Hooper DC, Sontag D. A decision algorithm to promote outpatient antimicrobial stewardship for uncomplicated urinary tract infection. Sci Transl Med. 2020;12(568):eaay5067.
- Jones N, Shih MC, Healey E, Zhai CW, Advani S, Smith-McLallen A, et al. Use of Machine Learning to Assess the Management of Uncomplicated Urinary Tract Infection. JAMA Netw Open. 2025;8(1):e2456950.
- Burkov A. The Hundred-Page Machine Learning Book. 1st ed. Andriy Burkov; 2019.
- National Database of Antibiotic Resistant Organisms (NDARO) - Pathogen Detection - NCBI [Internet]. [cited 2025 Nov 3]. Available from: https://www.ncbi.nlm.nih.gov/pathogens/antimicrobial-resistance/
- Alcock BP, Raphenya AR, Lau TTY, Tsang KK, Bouchard M, Edalatmand A, et al. CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database. Nucleic Acids Res. 2020;48(D1):D517–25.
- ResistanceMap [Internet]. [cited 2025 Nov 3]. Available from: https://resistancemap.onehealthtrust.org/
- CLSI. M-100 document. 2023.
- American Society for Microbiology. Writing your Paper: Abbreviations and Conventions. Available from: https://journals.asm.org/writing-your-paper#abbreviations
- Kind AJH, Buckingham W. Making Neighborhood Disadvantage Metrics Accessible: The Neighborhood Atlas. N Engl J Med. 2018;378:2456–8. doi:10.1056/NEJMp1802313.
- University of Wisconsin School of Medicine and Public Health. 2022 Area Deprivation Index v4.0.1. Available from: https://www.neighborhoodatlas.medicine.wisc.edu/ [downloaded 2025 May 15].
- Moore BJ, White S, Washington R, Coenen N, Elixhauser A. Identifying Increased Risk of Readmission and In-hospital Mortality Using Hospital Administrative Data. Med Care. 2017;55(7):698–705.
Access
Access Policy:
Only credentialed users who sign the DUA can access the files.
License (for files):
PhysioNet Credentialed Health Data License 1.5.0
Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0
Required training:
CITI Data or Specimens Only Research
Discovery
DOI (version 1.0.0):
https://doi.org/10.13026/2r5k-b955
DOI (latest version):
https://doi.org/10.13026/53vq-f537
Topics:
medical informatics
antimicrobial resistance
electronic health records
Project Website:
https://github.com/AlexWei21/Stanford_Data_Harmonization
Files
To access the files, you must:
- be a credentialed user
- complete required training:
  - CITI Data or Specimens Only Research
- sign the data use agreement for the project