Database Schema

Data Dictionary

This document includes the details of every table and field in the
database. The details of possible data types are described in
Data Types.

 

Relationships between tables in the SHARED schema.

 

Patient

A study participant

Req’d Name Data Type Description
patient_id uuid Unique identifier
sex enum(male, female, other) The participant’s sex at birth
ethnicity enum(am-nat, asian, black, hisp, pa-isl, white) The participant’s ethnicity: American Native, Asian, Black/African American, Hispanic/Latino, Pacific Islander, or White
year_of_birth integer Participant’s year of birth (e.g: 1985)

Data Source

A collaborating site (provides participant data)

Req’d Name Data Type Description
source_id uuid Unique identifier
collaborator_name string Name of the collaborator
country VARCHAR(2) ISO 3166-1 alpha-2 code
submission year Integer submission year
data merger Integer

 

Cases

A collection of related data from a study participant

Req’d Name Data Type Description
case_id uuid Unique identifier
patient_id foreign key (Patient) The person who contributed the data in this case.
source_id foreign key (DataSource) The study the participant entered the database with
country string The country where the person participated in the study (as an ISO 3166-1 alpha-2 code)
study_participant_id string The unique identifier that the source study gave to the person contributing this data
data_modified bool Has the data been modified?
upload_date date date when the data is uploaded

LossToFollowUp

Records data about participants leaving the study

Req’d Name Data Type Description
ltfu_id uuid unique identifier
case_id foreign key (Case) A collection of related data from a study participant
ltfu_year integer The year the participant was lost to follow-up (e.g: 2012)
died bool Is the participant deceased?
cod enum(liv, aid, odo, can, cir, res, dia, gen, tra, cer, dig, oth) Cause of death (if applicable; blank otherwise)

BehaviorData

Behavioral information about a participant

Req’d Name Data Type Description
behaviordata_id uuid Unique identifier
case_id foreign key (Case) A collection of related data from a study participant
sex_ori enum (heterosexual, homosexual, bisexual, other) Participant’s sexual orientation
idu bool Injection drug use? (ever)
ndu bool Non-injection drug use? (ever)
idu_recent bool Injection drug use in the past 6 months?
ndu_recent bool Non-injection drug use in the past 6 months?
prison bool Has the participant been in prison (ever)?

 

ClinicalData

Participants’ test results and relevant medical history

Req’d Name Data Type Description
clinicaldata_id uuid Unique identifier
case_id foreign key(Case) A collection of related data from a study participant
clinical_kind string (bl, eot,fw4 .etc) Is this clinical data from baseline or follow-up?
hiv bool Is the participant co-infected with HIV?
hbv bool Is the participant co-infected with HBV?
vl float Viral Load (in IU/mL)
ost bool Has the participant undergone opioid substitution therapy in the last six months?
cirr bool Does the participant have cirrhosis?
fibrosis enum(F0, F0-F1, F1, F2, F3, F4) METAVIR fibrosis score
inflammation enum(A0, A1, A2, A3) METAVIR inflammation score
metavir_by enum(fibroscan, biopsy, clinical, other) Method used to determine metavir score:
stiff float Liver stiffness (in kPa).
alt float Alanine aminotransferase level (in U/L).
ast float Aspartate aminotransferase level (in U/L).
crt float Creatinine level (in mg/dL).
egfr float Estimated glomerular filtration rate (in mL/min).
ctp float Child-Turcotte-Pugh score.
meld float MELD score.
ishak float Ishak score.
bil float Bilirubin test result, in mg/dL.
hemo float Hemoglobin test result, in g/dL.
alb float Albumin test result, in g/dL.
inr float International Normalized Ratio test result.
phos float Phosphate test result, in mg/dL.
urea float Urea test result, in ng/dL.
plate float Platelet count, in cells/mm3.
CD4 float CD4 count, in cells/mm3.
crp float C-Reactive Protein test result, in mg/L.
il28b enum(TT,CT,CC) The participant’s IL28B-rs12979860 genotype: TT, CT, CC
asc bool Did the participant have ascites before or during treatment?
var_bleed bool Did the participant have variceal bleeding before or during treatment?
hep_car bool Did the participant have hepatocellular carcinoma before or during treatment?
transpl bool Has the participant had a liver transplant?

 

TreatmentHistory

Information about a participant’s treatment

Req’d Name Data Type Description
treatmenthistory_id uuid Unique identifier
case_id foreign key (Case) A collection of related data from a study participant
clinical_kind string (bl, eot,fw4 .etc) Is this clinical data from baseline or follow-up?
treatment_status enum(TN, TE)

treatment status associated with submitted sequence

TN: treatment naïve(first time treatment);

TE: treatment experienced

duration_act integer The treatment’s actual duration (in days), if different from the scheduled duration
current_regimen_id foreign key (Regimen) The drug regimen taken by the participant
current_treatment_response enum(SVR, NR, EOT, BT, RL, RI, VF, DC) Viral response: sustained viral response, non-responsive, detectable viral load at end-of-treatment, viral-breakthrough during treatment, viral relapse, reinfection, virological failure, or discontinued
prev_regimen_id foreign key (Regimen) If the participant has been treated before, what is the last regimen they were on?
prev_regimen_duration integer Previous treatment’s actual duration (in days)
pprev_regimen_id foreign key (Regimen) If the participant has been treated before, what regimen were they on before-last?
pprev_regimen_duration integer Previous previous treatment’s actual duration (in days)

 

Regimen

A collection of kinds and ammounts of drugs used in treatment

Req’d Name Data Type Description
regimen_id string DAA components name, e.g. SOF+LDV
name string The trade name of this treatment, if applicable, e.g. Harvoni
duration_planned integer planned duration of the regimen (in days)

 

RegimenDrugInclusion

The drugs in a regimen, and their doses and durations

Req’d Name Data Type Description
DAA_id enum(ASV, BOC, DCV, DAS, DAS_XR, EBR, GLP, GZR, LDV, NPV, OMB, PAR, PEG, PIB, RBV, RIT, SIM, SOF, TVR, VAN, VEL, VOX, other, none) The three letter code for a DAA component.
regimen_id foreign key(Regimen) DAA components name, e.g. SOF+LDV
dose float Dosage of the medication prescribed (in mg)
frequency enum(QD, BID, TID, QID, QWK) How often is this medication taken
duration integer planned duration of the DAA

 

ClinicalIsolate

Isolate information

Req’d Name Data Type Description
clinical_isolate_id uuid unique identifier
case_id foreign key (Case) A collection of related data from a study participant
seq_kind string (bl, eot, fw4 ,vf.etc) Whether this isolate is from a baseline sample, an end-of-treatment-sample, or a follow up sample 4, 12, or 24 weeks after end-of-treatment.
clinical_gt string genotype as determined by clinical array or by submitting collaborator
clinical_subtype string subtype as determined by clinical array or by submitting collaborator

Sequence

Sequences and data needed for rapid alignment

Req’d Name Data Type Description
seq_id uuid Unique identifier
clinical_isolate_id foreign key (ClinicalIsolate) Clinical_isolate the sequence was obtained from
seq_genotype enum(1, 2, 3, 4, 5, 6, mixed, recombinant, indeterminate) sequence derived genotype
seq_subgenotype string sequence derived sub-genotype
seq_method enum(sanger, ngs) The sequencing method used on this isolate
cutoff float The cutoff-fraction used to generate a consensus sequence; ‘5%’ is stored as ‘0.05’
raw_nt_seq string The raw nucleotide in the assembled sequence
amino_acid_seq string amino acid sequence derived from raw_nt_seq

 

ReferenceSequence

Reference nucleotide sequences used for alignment

Req’d Name Data Type Description
reference_id string The reference sequence’s genotype
reference_strain string strain of reference sequence
genebank string

Genbank accession number

e.g. NC_004102

length_bp integer length
pos_NS3-4a_start integer start position
pos_NS3-4a_end integer end position
pos_NS5a_start integer start position
pos_NS5a_end integer end position
pos_NS5b_start integer start position
pos_NS5b_end integer end position
genome-nt-seq string Raw nucleotide sequence of whole genome
ns3-nt-seq string ns3 nucleotide sequence
ns5a-nt-seq string ns5a nucleotide sequence
ns5b-nt-seq string ns5b nucleotide sequence
ns3-aa-seq string ns3 amino acid sequence
ns5a-aa-seq string ns5a amino acid sequence
ns5baa-seq string ns5b amino acid sequence

 

Alignment

A gene found in a sequence

Req’d Name Data Type Description
alignment_id uuid Unique identifier
seq_id foreign key (Sequence) The sequence used for alignment
reference_id foreign key (ReferenceSequence) The reference sequence this alignment was performed with respect to
nt_start integer The first position of the alignment in the raw nucleotide sequence
nt_end integer The last position of the alignment in the raw nucleotide sequence
target_gene string The name of the gene this alignment is in

 

Substitution

A substitution, insertion, or deletion in an RNA sequence

Req’d Name Data Type Description
alignment_id foreign key (Alignment) The alignment this substitution is found in
position integer Nucleotide position (with respect to the reference sequence)
kind enum (simple, insertion, deletion) Kind of subsitution (‘simple’ for a single nucleotide polymorphism, insertion or deletion for gaps)
sub_aa string For a simple substitution, the new amino acid (blank for insertions and deletions)
insertion string For an insertion, the inserted Amino Acid sequence (blank for SNPs and deletions)
deletion_length integer For a deletion, the number of deleted amino acids (blank for SNPs and insertions)

 

Susceptibility

Susceptibility test results

Req’d Name Data Type Description
susceptibility_id uuid Unique id
seq_id foreign key (Sequence) TODO! The sequence being tested
susc_method enum(luciferase, qpcr, bdna, beta-gal) Method used to measure susceptibility
medication enum(ASV, BOC, DCV, DAS, EBR, GLP, GZR, LDV, OMB, PAR, PEG, PIB, RBV, RIT, SIM, SOF, TVR, VAN, VEL, VOX) The three letter id of the medication being tested
result float Concentration of medication required for inhibition (in nM)
result_bound enum (<, =, >)
ic enum (50, 90, 95) percent inhibition
fold float Fold-change compared to wild type
fold_bound enum (<, =, >) Represents uncertainty in the fold-change measurement

 

Literature

A reference to a publication

Req’d Name Data Type Description
ref_lit_id uuid Unique id
author string
title string
journal string
year string Year of publication
url string URL to the reference online
publication_dt date Null for unpublished results
pubmed_id string

 

SusceptibilityReference

Marks the reference supporting a susceptibility result

Req’d Name Data Type Description
susceptibility_id foreign key (Susceptibility) The marked susceptibility
ref_lit_id foreign key (Literature) The associated literature reference