# simace.core

## schema

### simace.core.schema

Schema contracts for the phenotype → censor → sample handoff.
Each pipeline stage produces a DataFrame whose shape the next stage relies on:

- `PEDIGREE` — output of `simulate` / `dropout`
- `PHENOTYPE` — output of `run_phenotype` (PEDIGREE + raw event times)
- `CENSORED` — output of `run_censor` / `run_sample` (PHENOTYPE + censoring cols)
Dtypes are checked at the coarse `numpy.dtype.kind` level (`i` integer, `f` float, `b` bool). This tolerates the int32/int8/float32 narrowing applied by the parquet writer at save time without losing the contract.
#### assert_schema

Verify `df` carries every column in `schema` with a compatible dtype kind.

| PARAMETER | DESCRIPTION |
|---|---|
| `df` | DataFrame to check. |
| `schema` | Mapping of required column name → allowed dtype kind(s). |
| `where` | Stage label included in the error message. |

| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If columns are missing or have an unexpected dtype kind. Extra columns are allowed — stages are free to pass through additional fields. |

Source code in `simace/core/schema.py`
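The kind-level check can be sketched as follows. This is an illustrative re-implementation, not simace's source; the name `check_schema` and the exact tolerance rules are assumptions:

```python
import pandas as pd

# Minimal sketch of a kind-level schema check: each required column must
# exist, and its numpy dtype.kind must be one of the allowed kind
# characters ("i" integer, "f" float, "b" bool).
def check_schema(df: pd.DataFrame, schema: dict, where: str) -> None:
    missing = [c for c in schema if c not in df.columns]
    if missing:
        raise ValueError(f"{where}: missing columns {missing}")
    for col, kinds in schema.items():
        kind = df[col].dtype.kind
        if kind not in kinds:
            raise ValueError(
                f"{where}: column {col!r} has kind {kind!r}, "
                f"expected one of {kinds!r}"
            )

df = pd.DataFrame({"id": [1, 2], "age": [40.0, 41.5], "dead": [False, True]})
check_schema(df, {"id": "i", "age": "f", "dead": "b"}, where="phenotype")
# int8 narrowing still passes, because "i" matches any integer width
check_schema(df.astype({"id": "int8"}), {"id": "i"}, where="save")
```

Checking the kind character rather than the full dtype is what lets the contract survive the parquet writer's narrowing.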
## parquet

### simace.core.parquet

Parquet writer with pedigree-aware dtype narrowing.

#### save_parquet
Save DataFrame as parquet with optimized dtypes and zstd compression.
Calls _optimize_dtypes before writing to minimize file size.
| PARAMETER | DESCRIPTION |
|---|---|
| `df` | DataFrame to save. |
| `path` | Output file path. |
| `**kwargs` | Extra keyword arguments passed through to the underlying parquet writer. |

Source code in `simace/core/parquet.py`
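`_optimize_dtypes` itself is not shown here, but the kind of narrowing it performs can be sketched with pandas downcasting. The function name `narrow_dtypes` and the exact width choices are illustrative assumptions:

```python
import pandas as pd

# Sketch of save-time dtype narrowing: downcast integers to the smallest
# width that fits and floats to float32, shrinking the parquet file while
# keeping the dtype *kinds* that the schema contract checks.
def narrow_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    for col in out.columns:
        kind = out[col].dtype.kind
        if kind == "i":
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif kind == "f":
            out[col] = pd.to_numeric(out[col], downcast="float")
    return out

df = pd.DataFrame({"gen": [0, 1, 2], "liability": [0.1, -0.4, 1.7]})
small = narrow_dtypes(df)
assert small["gen"].dtype.kind == "i"      # still integer kind
assert small["liability"].dtype.kind == "f"  # still float kind
```

The narrowed frame would then be written with something like `small.to_parquet(path, compression="zstd")` (which requires a parquet engine such as pyarrow).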
## yaml_io

### simace.core.yaml_io

YAML serialization helpers: numpy → Python conversion, fast loader, and file I/O wrappers.

#### to_native
Recursively convert numpy types to native Python types for YAML serialization.
| PARAMETER | DESCRIPTION |
|---|---|
| `obj` | Value or nested structure (dict, list, ndarray, numpy scalar). |

| RETURNS | DESCRIPTION |
|---|---|
| `Any` | Equivalent structure with all numpy types replaced by Python builtins. |

Source code in `simace/core/yaml_io.py`
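The conversion the docstring describes can be sketched like this (illustrative only; `to_native_sketch` mirrors the documented behaviour, not simace's source):

```python
import numpy as np

# Recursively replace numpy values with Python builtins so a YAML dumper
# never sees numpy-specific types.
def to_native_sketch(obj):
    if isinstance(obj, np.generic):        # numpy scalar -> Python scalar
        return obj.item()
    if isinstance(obj, np.ndarray):        # ndarray -> nested list
        return obj.tolist()
    if isinstance(obj, dict):
        return {k: to_native_sketch(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_native_sketch(v) for v in obj]
    return obj                             # already a Python builtin

cfg = {"h2": np.float64(0.6), "n": np.int32(100), "grid": np.arange(3)}
native = to_native_sketch(cfg)
assert native == {"h2": 0.6, "n": 100, "grid": [0, 1, 2]}
```

Without this step, most YAML dumpers raise a representer error on `np.int32` and friends.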
#### yaml_loader

#### load_yaml

#### dump_yaml

Dump `obj` to `path` as YAML, normalizing numpy types via `to_native`.

Source code in `simace/core/yaml_io.py`
## numerics

### simace.core.numerics

Numerical helpers: safe and numba-accelerated correlation/regression.

#### safe_corrcoef
Compute Pearson correlation, returning nan if either array has zero variance.
| PARAMETER | DESCRIPTION |
|---|---|
| `x` | First array of observations. |
| `y` | Second array of observations, same length as `x`. |

| RETURNS | DESCRIPTION |
|---|---|
| `float` | Pearson correlation coefficient, or `nan` if variance is near-zero. |

Source code in `simace/core/numerics.py`
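A variance-guarded correlation can be sketched as below; this is illustrative, and the real `safe_corrcoef` may use a different tolerance:

```python
import numpy as np

# Pearson correlation that returns nan instead of dividing by ~zero
# when either input is (near-)constant.
def safe_corr_sketch(x, y, eps=1e-12):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.std() < eps or y.std() < eps:
        return float("nan")                # correlation undefined
    return float(np.corrcoef(x, y)[0, 1])

r = safe_corr_sketch([1, 2, 3, 4], [2, 4, 6, 8])
assert abs(r - 1.0) < 1e-9                 # perfectly linear
assert np.isnan(safe_corr_sketch([5, 5, 5], [1, 2, 3]))  # constant x -> nan
```

A bare `np.corrcoef` on constant input emits a runtime warning and returns `nan` anyway; the explicit guard makes the contract deliberate and warning-free.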
#### safe_linregress
Run linear regression, returning None if x has zero variance.
| PARAMETER | DESCRIPTION |
|---|---|
| `x` | Independent variable array. |
| `y` | Dependent variable array, same length as `x`. |

| RETURNS | DESCRIPTION |
|---|---|
| `Any` | Regression result, or `None` if `x` has zero variance. |

Source code in `simace/core/numerics.py`
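The zero-variance guard can be sketched with a minimal least-squares fit. This is illustrative; the real `safe_linregress` presumably wraps a fuller regression routine:

```python
import numpy as np

# Simple linear fit that bails out with None when the slope is
# undefined because x is (near-)constant.
def safe_linregress_sketch(x, y, eps=1e-12):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.std() < eps:
        return None                        # slope undefined for constant x
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)

assert safe_linregress_sketch([3, 3, 3], [1, 2, 3]) is None
slope, intercept = safe_linregress_sketch([0, 1, 2, 3], [1, 3, 5, 7])
assert abs(slope - 2.0) < 1e-6 and abs(intercept - 1.0) < 1e-6
```

Callers can then use the `None` return to skip degenerate cells (e.g. a relationship pair with a constant covariate) rather than propagating a division-by-zero.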
#### fast_linregress
Fast linear regression via numba-accelerated core.
| PARAMETER | DESCRIPTION |
|---|---|
| `x` | Independent variable array. |
| `y` | Dependent variable array, same length as `x`. |

| RETURNS | DESCRIPTION |
|---|---|
| `tuple[float, float, float, float, float]` | Tuple of `(slope, intercept, r, stderr, pvalue)`. |

Source code in `simace/core/numerics.py`
#### fast_pearsonr
Compute Pearson r with two-sided p-value via numba-accelerated core.
| PARAMETER | DESCRIPTION |
|---|---|
| `x` | First array of observations. |
| `y` | Second array of observations, same length as `x`. |

| RETURNS | DESCRIPTION |
|---|---|
| `tuple[float, float]` | Tuple of `(correlation, p-value)`. |

Source code in `simace/core/numerics.py`
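The r-with-p-value shape can be sketched as below. Note a deliberate simplification: the p-value here uses a large-sample normal approximation to the t tail via `math.erfc`, whereas the real numba-accelerated core presumably evaluates the exact t distribution:

```python
import math
import numpy as np

# Pearson r plus an approximate two-sided p-value based on the
# t statistic t = r * sqrt((n-2) / (1 - r^2)).
def pearsonr_sketch(x, y):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    r = float(np.corrcoef(x, y)[0, 1])
    t = r * math.sqrt((n - 2) / max(1e-15, 1.0 - r * r))
    # two-sided normal tail: erfc(|t|/sqrt(2)) == 2 * (1 - Phi(|t|))
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return r, p

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)
r, p = pearsonr_sketch(x, y)
assert 0.0 < r < 1.0 and 0.0 <= p <= 1.0
```

For the sample sizes typical of simulated pedigrees the normal and t tails are close, but a scipy cross-check is sensible before relying on the approximation.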
## relationships

### simace.core.relationships
Relationship-pair and sex vocabulary used across stats and plotting.
## compute_hazard_terms

### simace.core.compute_hazard_terms

Baseline hazard computation for parametric survival models.

#### compute_hazard_terms
Compute log-baseline-hazard and cumulative baseline hazard.
Returns `(const, H_base)` where:

- `const = log h0(t)` — event term: `delta * (const + betaL)`
- `H_base = H0(t)` — survival term: `H_base * exp(betaL)`

Supported models and their required params:

- `"weibull"`: `{"scale": s, "rho": rho}`
- `"exponential"`: `{"rate": lam}` or `{"scale": s}`
- `"gompertz"`: `{"rate": b, "gamma": g}`
- `"lognormal"`: `{"mu": mu, "sigma": sigma}`
- `"loglogistic"`: `{"scale": alpha, "shape": k}`
- `"gamma"`: `{"shape": k, "scale": theta}`
- `"first_passage"`: `{"drift": mu, "shape": lam}`
| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | Unknown model name or missing required parameter. |

Source code in `simace/core/compute_hazard_terms.py`
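For the Weibull case the two terms can be sketched directly; `weibull_terms` is an illustrative re-derivation following the `{"scale": s, "rho": rho}` parameterization above, not simace's source:

```python
import numpy as np

# Weibull baseline: h0(t) = (rho/s) * (t/s)^(rho-1), H0(t) = (t/s)^rho.
def weibull_terms(t, scale, rho):
    t = np.asarray(t, dtype=float)
    const = np.log(rho / scale) + (rho - 1.0) * np.log(t / scale)  # log h0(t)
    H_base = (t / scale) ** rho                                    # H0(t)
    return const, H_base

# The standard proportional-hazards log-likelihood combines the two
# documented terms as: delta * (const + betaL) - H_base * exp(betaL)
t = np.array([1.0, 2.0, 5.0])
const, H = weibull_terms(t, scale=2.0, rho=1.5)
delta = np.array([1, 0, 1])
betaL = np.array([0.2, -0.1, 0.0])
loglik = delta * (const + betaL) - H * np.exp(betaL)
assert np.all(np.isfinite(loglik))
```

Sanity check: at `t == scale` the cumulative hazard is exactly 1 and `const` reduces to `log(rho/scale)`, which makes the parameterization easy to verify against a reference table.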
## cli_base

### simace.core.cli_base

Shared CLI boilerplate for simace entry points.

#### add_logging_args

Add standard `-v/--verbose` and `-q/--quiet` arguments.
| PARAMETER | DESCRIPTION |
|---|---|
| `parser` | Argument parser to add logging flags to. |

Source code in `simace/core/cli_base.py`
#### init_logging

Derive the log level from parsed args and call `setup_logging()`.
| PARAMETER | DESCRIPTION |
|---|---|
| `args` | Parsed namespace containing the verbosity flags. |

Source code in `simace/core/cli_base.py`
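The `-v`/`-q` pattern can be sketched with argparse as below. The helper names and the exact step size per flag are assumptions; only the flag names come from the docs above:

```python
import argparse
import logging

# Sketch of the shared logging flags: repeatable -v and -q counters.
def add_logging_args_sketch(parser):
    parser.add_argument("-v", "--verbose", action="count", default=0)
    parser.add_argument("-q", "--quiet", action="count", default=0)

# One plausible level rule: start at INFO, each -v lowers the threshold
# by one level (10), each -q raises it by one level.
def level_from_args(args):
    return logging.INFO - 10 * args.verbose + 10 * args.quiet

parser = argparse.ArgumentParser()
add_logging_args_sketch(parser)
assert level_from_args(parser.parse_args(["-v"])) == logging.DEBUG
assert level_from_args(parser.parse_args(["-q"])) == logging.WARNING
```

`action="count"` makes the flags stackable (`-vv`, `-qq`), which is the usual reason to pair a verbose counter with a quiet counter instead of a single boolean.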
## parquet_to_tsv

### simace.core.parquet_to_tsv

Convert parquet files to TSV (optionally gzipped) for use in R.

#### convert
Read a parquet file and write it as a TSV.
| PARAMETER | DESCRIPTION |
|---|---|
| `parquet_path` | Path to the input parquet file. |
| `output_path` | Path for the output file. If `None`, derived from `parquet_path` by replacing the extension. |
| `float_precision` | Number of decimal places for float columns. |
| `gzip` | Whether to gzip-compress the output. |

Source code in `simace/core/parquet_to_tsv.py`
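The TSV-writing half of the conversion can be sketched as below; this is illustrative and skips the parquet read (which needs an engine such as pyarrow), starting instead from an in-memory DataFrame:

```python
import gzip as gzip_mod
import pandas as pd

# Render a DataFrame as TSV text with fixed float precision, optionally
# gzip-compressed, mirroring the convert() parameters documented above.
def write_tsv(df, float_precision=6, compress=False):
    text = df.to_csv(sep="\t", index=False,
                     float_format=f"%.{float_precision}f")
    if compress:
        return gzip_mod.compress(text.encode())
    return text

df = pd.DataFrame({"id": [1, 2], "beta": [0.123456789, 1.5]})
tsv = write_tsv(df, float_precision=3)
assert tsv.splitlines()[0] == "id\tbeta"
assert tsv.splitlines()[1] == "1\t0.123"
```

On the R side such a file reads back with `read.delim()` or `readr::read_tsv()`, both of which handle the `.gz` variant transparently.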
#### cli
Command-line interface: parquet-to-tsv.