Skip to content

Configuration

Overview

Simulation parameters are defined in YAML files under config/. config/_default.yaml contains the global defaults, while each config/{folder}.yaml file defines scenarios for one output folder. Scenario files use bare scenario names; the folder name is inferred from the filename.

Each scenario inherits the defaults and overrides only the values that differ. The preferred authoring style is the hierarchical schema shown below. The loader still accepts older flat keys such as A1 or censor_age for compatibility, but new configs should use the sectioned form.

Defaults

Top-level globals

Parameter Type Default Description
seed int 42 Base random seed; replicate seeds are derived from this value
replicates int 3 Number of independent replicates per scenario
folder str base Output folder under results/
N int 100000 Population size per generation
G_ped int 6 Recorded pedigree generations
G_pheno int 3 Last G_pheno generations to phenotype
G_sim int 8 Total simulated generations; G_sim - G_ped is burn-in
standardize str global Liability standardization mode: none, global, or per_generation
plot_format str png Plot extension, usually png or pdf
drop_from str / null null Reuse another scenario's pedigree and gene-drop outputs
use_gene_drop bool false Use tstrait-derived A1 instead of parametric A1 downstream

See ACE Model § Standardisation for how standardize interacts with threshold and hazard-bearing models.

Pedigree

pedigree:
  mating_lambda: 0.5
  p_mztwin: 0.02
  assort1: 0
  assort2: 0
  assort_matrix: null
  trait1:
    A: 0.5
    C: 0.0
    E: 0.5
  trait2:
    A: 0.4
    C: 0.2
    E: 0.4
  rA: 0.0
  rC: 0.0
  rE: 0.0
Parameter Description
mating_lambda Zero-truncated Poisson mating-count parameter; default gives about 23% multi-partner individuals
p_mztwin Probability of monozygotic twin birth
assort1, assort2 Mate correlation on trait 1 and trait 2 liability
assort_matrix Optional full 2x2 female/male mate-correlation matrix
trait{1,2}.A Additive genetic variance component
trait{1,2}.C Shared/common environment variance component
trait{1,2}.E Unique environment variance component
rA, rC, rE Cross-trait correlations for A, C, and E

Phenotype

Each trait is configured independently under phenotype.trait1 and phenotype.trait2:

phenotype:
  trait1:
    model: frailty
    params:
      distribution: weibull
      scale: 2160
      rho: 0.8
    beta: 1.0
    beta_sex: 0.0
  trait2:
    model: frailty
    params:
      distribution: weibull
      scale: 333
      rho: 1.2
    beta: 1.5
    beta_sex: 0.0

model must be one of frailty, cure_frailty, adult, or first_passage. params is model-specific; threshold-based families (adult and cure_frailty) require params.prevalence. See Phenotype Models for the full model catalogue, required parameters, supported prevalence forms, and standardize_hazard rules.

Censoring

censoring:
  max_age: 80
  gen_censoring:
    0: [80, 80]
    1: [80, 80]
    2: [80, 80]
    3: [40, 80]
    4: [0, 80]
    5: [0, 45]
  death_scale: 164
  death_rho: 2.73
Parameter Description
max_age Maximum follow-up age
gen_censoring Per-generation [left, right] observation windows
death_scale, death_rho Weibull competing-risk mortality parameters

Sampling and analysis

sampling:
  N_sample: 0
  case_ascertainment_ratio: 1
  pedigree_dropout_rate: 0

analysis:
  max_degree: 2
  estimate_inbreeding: false
Parameter Description
sampling.N_sample Subsample size; 0 keeps all phenotyped individuals
sampling.case_ascertainment_ratio Case sampling weight relative to controls
sampling.pedigree_dropout_rate Fraction of individuals removed from the pedigree before downstream stages
analysis.max_degree Maximum relationship degree to extract
analysis.estimate_inbreeding Compute exact inbreeding coefficients and exact pairwise kinship

tstrait and gene drop

The gene-drop branch replaces the parametric trait-1 additive component with a tstrait-derived genetic value computed from founder haplotypes dropped through the simACE pedigree.

tstrait:
  num_causal: 1000
  frac_causal: null
  maf_threshold: 0.01
  alpha: -0.5
  effect_mean: 0.0
  effect_var: 1.0
  trait_id: 0
  share_architecture: false
Parameter Description
use_gene_drop Top-level switch that makes downstream stages read pedigree.full.tstrait.parquet
drop_from Top-level scenario name to reuse an existing drop/graft
tstrait.num_causal Absolute number of causal sites; mutually exclusive with frac_causal
tstrait.frac_causal Fraction of MAF-eligible sites to use as causal; mutually exclusive with num_causal
tstrait.maf_threshold Minimum minor-allele frequency filter; 0 disables filtering
tstrait.alpha Effect-size frequency-dependence exponent
tstrait.effect_mean, tstrait.effect_var Raw effect-size distribution parameters before MAF scaling
tstrait.trait_id Single-trait selector; trait 2 remains parametric
tstrait.share_architecture Share causal sites and effects across replicates

The gene-drop branch derives heritability from the standard A/C/E components: \(h^2 = A_1 / (A_1 + C_1 + E_1)\). There is no separate tstrait.h2 parameter.

tskit_preprocess is a standalone top-level block for canonicalizing source tree sequences:

Parameter Default Description
tskit_preprocess.source_dir /data/Documents/humanity_sim/simhumanity_trees_RO Source directory for per-chromosome SimHumanity .trees files
tskit_preprocess.output_dir /data/Documents/humanity_sim/preprocessed_p2 Output directory for canonicalized chromosomes, concatenated trees, and site catalog
tskit_preprocess.pop p2 Founder population to filter
tskit_preprocess.chroms 1..22 Autosomes to include

Defining scenarios

Per-folder files contain only scenario dictionaries. For example, config/base.yaml:

baseline10K:
  seed: 1042
  N: 10000

baseline100K_sample5K:
  seed: 2042
  N: 100000
  sampling:
    N_sample: 5000

Nested sections are merged over the defaults, so a scenario can override only one field inside a section:

high_heritability:
  folder: heritability
  seed: 4042
  pedigree:
    trait1:
      A: 0.8
      C: 0.0
      E: 0.2
    trait2:
      A: 0.8
      C: 0.0
      E: 0.2

Run a scenario by targeting its resolved folder and scenario name:

snakemake --cores 4 results/base/baseline10K/scenario.done