Project Structure

Repository layout

simACE/
├── Snakefile                            # Root entry point (no -s flag needed)
├── config/
│   ├── _default.yaml                    # Default simulation parameters
│   └── {folder}.yaml                    # Per-folder scenario definitions
│
├── simace/                              # Simulation package (pip install -e .)
│   ├── __init__.py                       # Package init
│   ├── config.py                         # Config loading and parameter coercion
│   ├── core/                             # Shared infrastructure
│   │   ├── cli_base.py                   # Shared CLI boilerplate (add_logging_args, init_logging)
│   │   ├── compute_hazard_terms.py       # Baseline hazard functions (Weibull, Gompertz, etc.)
│   │   ├── numerics.py                   # safe_corrcoef, safe_linregress, scipy.special wrappers
│   │   ├── parquet.py                    # save_parquet and parquet I/O helpers
│   │   ├── parquet_to_tsv.py             # `simace-parquet-to-tsv` CLI entry point
│   │   ├── pedigree_graph.py             # Sparse-matrix pedigree relationship extraction
│   │   ├── relationships.py              # Relationship registry (REL_REGISTRY, PAIR_KINSHIP, PAIR_TYPES)
│   │   ├── schema.py                     # Pipeline data-schema contracts (phenotype → censor → sample handoff)
│   │   └── yaml_io.py                    # load_yaml, dump_yaml helpers
│   ├── simulation/
│   │   ├── simulate.py                   # Pedigree simulation (mating, reproduce, run_simulation)
│   │   └── mate_correlation.py           # Assortative-mating helpers
│   ├── phenotyping/
│   │   ├── phenotype.py                  # PhenotypeModel ABC + frailty / cure-frailty / ADuLT / first-passage models
│   │   ├── threshold.py                  # Liability-threshold binary phenotype
│   │   └── hazards.py                    # Baseline-hazard registry (Weibull, exponential, Gompertz, ...)
│   ├── censoring/
│   │   └── censor.py                     # Age-window and competing-risk death censoring
│   ├── sampling/
│   │   ├── dropout.py                    # Pedigree dropout (random individual removal)
│   │   └── sample.py                     # Subsampling with case ascertainment bias
│   ├── analysis/
│   │   ├── stats/                        # Per-concern stats package (split from old stats.py)
│   │   │   ├── runner.py                 # Orchestrator (computes phenotype_stats.yaml)
│   │   │   ├── correlations.py           # Pairwise correlations and parent-offspring regressions
│   │   │   ├── tetrachoric.py            # Tetrachoric correlations + Falconer h²
│   │   │   ├── pedigree.py               # Pair counts and family-structure stats
│   │   │   ├── incidence.py              # Cumulative incidence curves
│   │   │   ├── censoring.py              # Censoring confusion / cascade
│   │   │   └── sampling.py               # Sample-summary statistics
│   │   ├── validate.py                   # Structural + statistical validation
│   │   └── gather.py                     # Gather validation results into validation_summary.tsv
│   └── plotting/
│       ├── plot_utils.py                 # Shared plotting helpers (finalize_plot, violin, heatmap)
│       ├── plot_style.py                 # Color palette and shared style tokens
│       ├── plot_phenotype.py             # Phenotype plot orchestrator + CLI
│       ├── plot_distributions.py         # Mortality, age-at-onset, cumulative incidence
│       ├── plot_liability.py             # Joint liability, violin, affection plots
│       ├── plot_correlations.py          # Tetrachoric + parent-offspring correlations
│       ├── plot_heritability.py          # Heritability plots (by generation, sex, etc.)
│       ├── plot_pedigree_counts.py       # Pedigree relationship pair counts diagram
│       ├── plot_validation.py            # Validation summary plots
│       ├── compare_scenarios.py          # Cross-scenario comparison plots
│       ├── plot_atlas.py                 # Multi-page PDF atlas with figure captions
│       ├── atlas_manifest.py             # Atlas registry + dispatch
│       ├── plot_pipeline.py              # Pipeline DAG diagram
│       └── plot_table1.py                # Epidemiological Table 1
│
├── fitACE/                              # Sister repo for model fitting (gitignored; see Repo map)
│
├── workflow/
│   ├── common.py                         # Shared helpers (get_param, get_folder, etc.)
│   └── rules/simace/                     # Modular Snakemake rule files
│       ├── targets.smk                   # Target rules: all, scenario, per-stage sentinels
│       ├── simulate.smk, dropout.smk     # Pedigree simulation and dropout
│       ├── phenotype.smk, sample.smk     # Phenotyping and sampling
│       ├── validate.smk, stats.smk       # Validation and statistics
│       ├── examples.smk                  # Example-page targets (minimal-ace, with-c, ...)
│       ├── tskit_preprocess.smk          # tskit founder preprocessing for gene-drop
│       ├── tstrait_phenotype.smk         # tstrait-based phenotype models
│       ├── genotype_drop.smk             # Gene-drop pipeline (tskit-based recombination)
│       └── utils.smk                     # Shared Snakemake utilities
├── scripts/                             # Standalone helper scripts (regen_rulegraph.sh, run_epimight.py, bench_*.py, etc.)
├── tests/                               # Mirrors simace/ sub-package structure
├── external/                            # Reference implementations (gitignored)
├── results/{folder}/{scenario}/         # Per-scenario simulation outputs
├── logs/{folder}/{scenario}/            # Log files
└── benchmarks/{folder}/{scenario}/      # Runtime and memory benchmarks
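
The layout implies a simple convention. A scenario is identified by a `{folder}` and a `{scenario}` name. Its parameters come from `config/_default.yaml` overlaid by `config/{folder}.yaml`, and its outputs, logs, and benchmarks land in parallel `{folder}/{scenario}/` directories. The sketch below illustrates that convention under stated assumptions. It assumes that `{folder}.yaml` maps scenario names to parameter overrides that are deep-merged over the defaults. The helpers `merge_params` and `scenario_dirs` are hypothetical; the pipeline's real logic lives in `simace/config.py` and `workflow/common.py` and may differ. YAML parsing is omitted, and plain dicts stand in for the loaded files.

```python
from copy import deepcopy
from pathlib import Path


def merge_params(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay scenario overrides on the default parameters.

    Hypothetical helper: assumes nested mappings are merged key by key and
    any non-mapping override value replaces the default outright.
    """
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_params(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def scenario_dirs(root: Path, folder: str, scenario: str) -> dict[str, Path]:
    """Per-scenario results, logs, and benchmarks directories (hypothetical helper)."""
    return {
        kind: root / kind / folder / scenario
        for kind in ("results", "logs", "benchmarks")
    }


# Stand-ins for parsed config/_default.yaml and config/{folder}.yaml.
# The folder name and parameter keys are illustrative, not simACE's real ones.
defaults = {"seed": 1, "variance": {"A": 0.5, "C": 0.0, "E": 0.5}}
folder_cfg = {"with-c": {"variance": {"C": 0.2, "E": 0.3}}}

params = merge_params(defaults, folder_cfg["with-c"])
dirs = scenario_dirs(Path("."), "examples", "with-c")
print(params)
print(dirs["results"])
```

Deep-merging (rather than a shallow `dict.update`) lets a scenario override a single nested value without restating its siblings; whether simACE merges this way is an assumption.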

Repo map

simACE is the umbrella working directory. Model fitting lives in nested checkouts of sister repos, which simACE gitignores (they are not git submodules):

| Repo | Visibility | Local path | Role |
|---|---|---|---|
| `simACE` | public | `.` (this repo) | Simulation pipeline: simulate → phenotype → censor → sample → validate → stats → plot. |
| `fitACE` | private | `./fitACE/` | Model fitting (EPIMIGHT, PA-FGRS, sparseREML, iter_reml, Stan, PCGC); consumes simACE outputs. |
| `ace_iter_reml` | private | `./fitACE/fitace/ace_iter_reml/` | C++ PCG-AI-REML binary. |
| `tetraher_simace` | private | `./external/tetraher_simace/` | Fork of LDAK 6.2 (grouping + warm-start + OMP opt-in). |
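
The `simACE` row lists the pipeline as a chain of stages. As a rough illustration of what that ordering implies for reruns, the sketch below encodes the chain and derives which stages run before a given stage and which go stale after it. It reads the arrows as a strict linear order, which is a simplification: the real dependency graph is defined by the Snakemake rules in `workflow/rules/simace/`, may branch, and includes steps such as dropout that the chain omits. `STAGES`, `upstream`, and `downstream` are hypothetical names.

```python
# Stage order as listed for simACE in the Repo map; a simplified, linear
# stand-in for the real Snakemake DAG.
STAGES = ["simulate", "phenotype", "censor", "sample", "validate", "stats", "plot"]


def upstream(stage: str) -> list[str]:
    """Stages that precede `stage` in the chain (hypothetical helper)."""
    return STAGES[: STAGES.index(stage)]


def downstream(stage: str) -> list[str]:
    """Stages whose outputs go stale when `stage` is rerun (hypothetical helper)."""
    return STAGES[STAGES.index(stage) + 1 :]


print(upstream("sample"))      # ['simulate', 'phenotype', 'censor']
print(downstream("validate"))  # ['stats', 'plot']
```

In practice Snakemake computes this from file timestamps and rule inputs, so a helper like this is only useful for reasoning about the layout, not for driving reruns.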

Each nested repo has its own `origin` remote pointing at the matching GitHub repository. Build artifacts (`build-fp*/`, `ldak6.2.simace`, Stan binaries) are gitignored, so rebuild them from source after cloning.
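
Because the sister repos are plain nested checkouts rather than submodules, a fresh clone of simACE contains none of them, and git will not report that they are missing. A pre-flight check can catch this. The sketch below lists the nested repos that are absent. It assumes only the local paths from the Repo map table and that each checkout has a `.git` entry at its root. The function `missing_checkouts` is hypothetical and not part of simACE.

```python
from pathlib import Path

# Local paths of the nested sister repos, taken from the Repo map table.
NESTED_REPOS = {
    "fitACE": Path("fitACE"),
    "ace_iter_reml": Path("fitACE/fitace/ace_iter_reml"),
    "tetraher_simace": Path("external/tetraher_simace"),
}


def missing_checkouts(root: Path) -> list[str]:
    """Return names of nested repos with no git checkout under `root`.

    A checkout counts as present if `<path>/.git` exists (a directory for a
    normal clone, or a file for a git worktree).
    """
    return [
        name
        for name, rel in NESTED_REPOS.items()
        if not (root / rel / ".git").exists()
    ]


if __name__ == "__main__":
    missing = missing_checkouts(Path("."))
    if missing:
        print("Missing nested checkouts:", ", ".join(missing))
    else:
        print("All nested checkouts present.")
```

Run from the simACE root. Note that `ace_iter_reml` sits inside `fitACE`, so it will always be reported missing when `fitACE` itself is absent.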