Statistical Analysis Domain

Overview

The statistical analysis domain provides hypothesis testing, ANOVA, non-parametric tests, and experimental design tools. Use it when you need to determine whether observed differences between groups are statistically significant, quantify effect sizes, or design experiments with adequate statistical power.

When to use this domain:

Comparing means or distributions between two or more groups
Testing associations between categorical variables
Checking whether data meets normality assumptions before other analyses
Quantifying the practical magnitude of a difference (effect size)
Estimating required sample sizes for planned studies

Source: src/localdata_mcp/domains/statistical_analysis/

Available Analyses

Method	Class / Function	Description
One-sample t-test	`HypothesisTestingTransformer`	Test whether a sample mean differs from a known value
Independent t-test	`HypothesisTestingTransformer`	Compare means from two independent groups
Paired t-test	`HypothesisTestingTransformer`	Compare means from two related measurements
Chi-square test	`HypothesisTestingTransformer`	Test independence between two categorical variables
Normality tests	`HypothesisTestingTransformer`	Shapiro-Wilk and Kolmogorov-Smirnov normality checks
Pearson / Spearman correlation	`HypothesisTestingTransformer`	Test linear and rank correlations between numeric variables
One-way ANOVA	`ANOVAAnalysisTransformer`	Compare means across three or more groups
Two-way ANOVA	`ANOVAAnalysisTransformer`	Test main effects and interactions of two factors
Tukey HSD post-hoc	`ANOVAAnalysisTransformer`	Pairwise comparisons after significant ANOVA
Bonferroni post-hoc	`ANOVAAnalysisTransformer`	Conservative pairwise corrections
Mann-Whitney U	`NonParametricTestTransformer`	Non-parametric two-group comparison
Wilcoxon signed-rank	`NonParametricTestTransformer`	Non-parametric paired comparison
Kruskal-Wallis H	`NonParametricTestTransformer`	Non-parametric multi-group comparison
Friedman test	`NonParametricTestTransformer`	Non-parametric repeated-measures test
Cohen’s d	`ExperimentalDesignTransformer`	Standardized mean difference effect size
Eta-squared / Omega-squared	`ANOVAAnalysisTransformer`	ANOVA effect size measures
Cramer’s V	`HypothesisTestingTransformer`	Effect size for chi-square associations
Confidence intervals	`ExperimentalDesignTransformer`	Interval estimates for means and correlations
Power analysis	`ExperimentalDesignTransformer`	Required sample size for a given power level

MCP Tool Reference

The domain exposes three MCP tools via src/localdata_mcp/datascience_tools.py.

`tool_hypothesis_test`

Run hypothesis tests on data retrieved from a SQL query.

Parameters:

Parameter	Type	Default	Description
`engine`	`Engine`	required	SQLAlchemy engine from an active connection
`query`	`str`	required	SQL query returning the data to analyse
`test_type`	`str`	`"auto"`	Test to run: `"auto"`, `"ttest_1samp"`, `"ttest_ind"`, `"ttest_rel"`, `"chi2"`, `"normality"`, `"correlation"`
`column`	`str`	`None`	Target numeric column for focused testing
`group_column`	`str`	`None`	Column defining groups for two-sample tests
`alpha`	`float`	`0.05`	Significance level
`alternative`	`str`	`"two-sided"`	Direction: `"two-sided"`, `"less"`, `"greater"`
`max_rows`	`int`	`None`	Row cap; when `None`, the default cap of 500,000 rows applies

Returns: dict with keys:

test_results — list of test result objects, each with test_name, statistic, p_value, degrees_of_freedom, effect_size, interpretation
assumptions_checked — dict of assumption check results
effect_sizes — dict of calculated effect sizes
alpha_level — significance level used
correction_applied — multiple comparison correction method if any
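To show how the returned dict can be consumed, here is a minimal sketch that lists the tests whose p-value falls below the significance level. The helper `significant_tests` is hypothetical, and `sample_result` is a hand-built stand-in with made-up values that mirrors the keys listed above. It is not output from a real run.

```python
def significant_tests(result):
    """Return (test_name, p_value) pairs with p_value below the alpha used."""
    alpha = result["alpha_level"]
    return [
        (t["test_name"], t["p_value"])
        for t in result["test_results"]
        if t["p_value"] is not None and t["p_value"] < alpha
    ]


# Hand-built stand-in for a tool_hypothesis_test return value (illustrative only).
sample_result = {
    "test_results": [
        {"test_name": "Independent t-test", "statistic": -3.1, "p_value": 0.004,
         "degrees_of_freedom": 58, "effect_size": 0.62, "interpretation": ""},
        {"test_name": "Shapiro-Wilk", "statistic": 0.98, "p_value": 0.41,
         "degrees_of_freedom": None, "effect_size": None, "interpretation": ""},
    ],
    "assumptions_checked": {},
    "effect_sizes": {},
    "alpha_level": 0.05,
    "correction_applied": None,
}

print(significant_tests(sample_result))
```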

`tool_anova_analysis`

Perform one-way or two-way ANOVA with post-hoc tests.

Parameters:

Parameter	Type	Default	Description
`engine`	`Engine`	required	SQLAlchemy engine
`query`	`str`	required	SQL query returning the data
`dependent_var`	`str`	required	Numeric outcome column
`group_var`	`str`	required	Categorical grouping column
`alpha`	`float`	`0.05`	Significance level
`max_rows`	`int`	`None`	Row cap

The underlying `ANOVAAnalysisTransformer` also accepts these parameters:

Parameter	Type	Default	Description
`anova_type`	`str`	`"one_way"`	`"one_way"`, `"two_way"`, `"auto"`
`post_hoc`	`str`	`"tukey"`	`"tukey"`, `"bonferroni"`, `"scheffe"`, `None`
`effect_size`	`str`	`"eta_squared"`	`"eta_squared"`, `"partial_eta_squared"`, `"omega_squared"`
`check_assumptions`	`bool`	`True`	Run Shapiro-Wilk and Levene’s tests before ANOVA

Returns: dict with keys:

anova_results — F-statistic, p-value, degrees of freedom, group means and sizes, interpretation
post_hoc_results — pairwise comparisons with adjusted p-values and confidence intervals
effect_sizes — eta-squared and omega-squared per factor
assumptions_checked — normality and homoscedasticity check results

`tool_effect_sizes`

Calculate standardized effect sizes (Cohen’s d, Cramer’s V, correlation r).

Parameters:

Parameter	Type	Default	Description
`engine`	`Engine`	required	SQLAlchemy engine
`query`	`str`	required	SQL query
`column`	`str`	required	Numeric column to analyse
`group_column`	`str`	required	Categorical grouping column
`max_rows`	`int`	`None`	Row cap

Returns: dict with keys:

effect_sizes — Cohen’s d (two-group) or Cramer’s V (categorical), with effect_description (negligible, small, medium, large)
confidence_intervals — interval estimates for means and correlations
power_analysis — power curve across sample sizes for the observed effect
sample_sizes — required n per group for small/medium/large effects

Method Details

T-tests

One-sample t-test (ttest_1samp): Tests whether the mean of a single sample differs from a hypothesised population mean (default 0). Requires at least 3 observations.

Independent t-test (ttest_ind): Compares means from two separate groups. Requires a numeric column and a binary categorical grouping column. The equal_var parameter (default True) selects Student’s t-test, which assumes equal group variances; setting it to False applies Welch’s correction for unequal variances.

Paired t-test (ttest_rel): Compares two numeric columns measured on the same subjects. Cohen’s d is computed from the paired differences.
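The difference between Student’s t and Welch’s correction can be sketched with the standard library alone. This is a minimal illustration of the textbook formulas, not the domain’s implementation; the function name `t_statistic` is hypothetical, and p-values are left out because they need a t-distribution CDF.

```python
import math
from statistics import mean, variance


def t_statistic(a, b, equal_var=True):
    """Return (t, degrees_of_freedom) for two independent samples."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)  # sample variances (n - 1 denominator)
    diff = mean(a) - mean(b)
    if equal_var:
        # Student's t: pool the two variances, df = n1 + n2 - 2
        df = n1 + n2 - 2
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        # Welch: keep variances separate, Welch-Satterthwaite df
        s1, s2 = v1 / n1, v2 / n2
        se = math.sqrt(s1 + s2)
        df = (s1 + s2) ** 2 / (s1 ** 2 / (n1 - 1) + s2 ** 2 / (n2 - 1))
    return diff / se, df


a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10, 12]
print(t_statistic(a, b, equal_var=True))
print(t_statistic(a, b, equal_var=False))
```

When the two groups have clearly different variances, as in this toy data, Welch’s version reports fewer degrees of freedom than the pooled test.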

Effect size interpretation for Cohen’s d:

Range	Label
< 0.2	negligible
0.2 – 0.5	small
0.5 – 0.8	medium
≥ 0.8	large
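As a rough illustration of how Cohen’s d and the labels in the table above fit together, here is a standard-library sketch. The function names are hypothetical and not part of this domain’s API. The pooled-standard-deviation formula is one common definition, and treating the lower bound of each range as inclusive is an assumption.

```python
import math
from statistics import mean, stdev, variance


def cohens_d(a, b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def cohens_d_paired(before, after):
    """Cohen's d for paired data, computed from the paired differences."""
    diffs = [y - x for x, y in zip(before, after)]
    return mean(diffs) / stdev(diffs)


def label_cohens_d(d):
    """Map |d| to the labels in the table (lower bounds assumed inclusive)."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


d = cohens_d([2, 4, 6], [1, 3, 5])
print(d, label_cohens_d(d))
```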

Chi-square Test

Tests whether two categorical variables are independent. A contingency table is constructed automatically. Cramer’s V is reported as the effect size.

Effect size interpretation for Cramer’s V (2×2 table):

Range	Label
< 0.1	negligible
0.1 – 0.3	small
0.3 – 0.5	medium
≥ 0.5	large
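The steps described for the chi-square test (build a contingency table, compute the statistic, derive Cramer’s V) can be sketched as follows. This is a plain-Python illustration under stated assumptions: the helper names are hypothetical, and no continuity correction is applied, so results for 2×2 tables may differ from libraries that apply Yates’ correction by default.

```python
import math
from collections import Counter


def contingency_table(xs, ys):
    """Cross-tabulate two equally long sequences of category labels."""
    rows, cols = sorted(set(xs)), sorted(set(ys))
    counts = Counter(zip(xs, ys))
    return [[counts[(r, c)] for c in cols] for r in rows]


def chi_square(table):
    """Return (chi-square statistic, total count), without continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(table):
    """Cramer's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


xs = ["a"] * 30 + ["b"] * 30
ys = ["x"] * 10 + ["y"] * 20 + ["x"] * 20 + ["y"] * 10
table = contingency_table(xs, ys)
print(table, cramers_v(table))
```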

Normality Tests

Two tests run in parallel:

Shapiro-Wilk — preferred for n ≤ 5,000; sensitive to small departures from normality in large samples
Kolmogorov-Smirnov — used for all sample sizes; slightly less powerful than Shapiro-Wilk for small samples

If p > alpha, the test does not reject normality and the data is treated as approximately normal.

ANOVA

One-way ANOVA: Tests whether at least one group mean differs from the others. Uses scipy.stats.f_oneway. Assumptions are checked automatically (normality via Shapiro-Wilk per group; homoscedasticity via Levene’s test).

Post-hoc tests run only when ANOVA is significant and there are more than two groups:

Tukey HSD — controls familywise error rate; appropriate when group sizes are roughly equal
Bonferroni — more conservative; better when only a few specific comparisons are planned
Scheffé — most conservative; appropriate for all possible contrasts
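Of the three corrections, Bonferroni is the simplest to show in code: each raw pairwise p-value is multiplied by the number of comparisons and capped at 1. The sketch below is illustrative only; `bonferroni_adjust` is a hypothetical helper, and the raw p-values are made up.

```python
from itertools import combinations


def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


groups = ["A", "B", "C"]
pairs = list(combinations(groups, 2))  # all pairwise comparisons
raw_p = [0.010, 0.040, 0.500]          # illustrative raw p-values, one per pair

for (g1, g2), p_adj in zip(pairs, bonferroni_adjust(raw_p)):
    print(f"{g1} vs {g2}: adjusted p = {p_adj:.3f}")
```

With three comparisons, a raw p-value of 0.040 no longer clears alpha = 0.05 after adjustment, which is the conservatism noted in the list above.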

Two-way ANOVA: Uses statsmodels OLS with interaction term. Reports eta-squared and partial eta-squared per factor.

Effect size interpretation for eta-squared:

Range	Label
< 0.01	negligible
0.01 – 0.06	small
0.06 – 0.14	medium
≥ 0.14	large
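For reference, eta-squared in a one-way design is the between-group sum of squares divided by the total sum of squares. A minimal standard-library sketch follows; the function names are hypothetical, and treating the lower bound of each range as inclusive is an assumption.

```python
from statistics import mean


def eta_squared(groups):
    """Eta-squared = SS_between / SS_total for a one-way layout."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    return ss_between / ss_total


def label_eta_squared(eta2):
    """Map eta-squared to the labels in the table (lower bounds assumed inclusive)."""
    if eta2 < 0.01:
        return "negligible"
    if eta2 < 0.06:
        return "small"
    if eta2 < 0.14:
        return "medium"
    return "large"


groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
eta2 = eta_squared(groups)
print(eta2, label_eta_squared(eta2))
```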

Non-Parametric Tests

Use these when normality assumptions are violated or data is ordinal.

Mann-Whitney U: Non-parametric alternative to the independent t-test. Effect size is rank-biserial correlation r.
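One common definition of the rank-biserial correlation is r = 2U / (n1 · n2) − 1, where U counts the pairs in which a value from the first group exceeds a value from the second (ties count one half). The sketch below uses that definition; the sign convention and the helper names are assumptions and may differ from what `NonParametricTestTransformer` reports.

```python
def mann_whitney_u(a, b):
    """U for group a: number of (x, y) pairs with x > y, ties counted as 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def rank_biserial(a, b):
    """Rank-biserial r in [-1, 1]; positive when a tends to exceed b."""
    return 2 * mann_whitney_u(a, b) / (len(a) * len(b)) - 1


print(rank_biserial([3, 4, 5, 6], [1, 2, 3, 4]))
```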

Wilcoxon signed-rank: Non-parametric alternative to the paired t-test. Requires at least 6 paired observations.

Kruskal-Wallis H: Non-parametric alternative to one-way ANOVA. Effect size is an eta-squared analogue.

Friedman test: Non-parametric repeated-measures test across three or more conditions. Effect size is Kendall’s W.
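Kendall’s W measures how consistently the conditions are ranked across subjects, from 0 (no agreement) to 1 (identical rankings). A minimal sketch using W = 12S / (m²(k³ − k)), with m subjects, k conditions, and S the sum of squared deviations of the per-condition rank sums from their mean. The helper names are hypothetical, and no tie correction is applied, which is a simplification.

```python
def average_ranks(values):
    """Rank values from 1..k, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def kendalls_w(rows):
    """rows: one list per subject, one value per condition (no tie correction)."""
    m, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    mean_rank_sum = m * (k + 1) / 2
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (k ** 3 - k))


print(kendalls_w([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```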

Confidence Intervals

Computed using t-distribution critical values for means and Fisher’s z-transformation for correlations. Default confidence level is 95%.
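The Fisher z-transformation step can be sketched as follows: transform r with atanh, build a normal-theory interval with standard error 1/√(n − 3), and transform back with tanh. This is a standard-library illustration; the function name `correlation_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, confidence=0.95):
    """Confidence interval for a Pearson correlation via Fisher's z."""
    z = math.atanh(r)                   # Fisher z-transform of r
    se = 1 / math.sqrt(n - 3)           # standard error on the z scale
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


low, high = correlation_ci(r=0.5, n=50)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

Because the interval is built on the z scale and mapped back, it is asymmetric around r and always stays within (−1, 1).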

Power Analysis

Power curves are calculated for sample sizes from 10 to 500. Required sample size is solved analytically for the desired power (default 0.80) at alpha = 0.05. Supports t-test, ANOVA, and correlation test types.
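For intuition about the sample-size calculation, the two-sided, two-group case has a well-known normal approximation: n per group ≈ 2((z₁₋α/₂ + z_power) / d)². The sketch below uses that approximation, which is an assumption and not necessarily the method this domain uses; `required_n_per_group` is a hypothetical name. The approximation tends to come out slightly below t-distribution-based results.

```python
import math
from statistics import NormalDist


def required_n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(label, required_n_per_group(d))
```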

Composition

After running statistical analysis, consider chaining:

Next step	Purpose
`regression_modeling`	Model the relationship quantified by a significant correlation or group difference
`pattern_recognition` (clustering)	Explore whether statistically different groups correspond to natural data clusters
`business_intelligence` (A/B test)	Frame a group comparison as a controlled experiment with business metrics
`sampling_estimation` (bootstrap)	Obtain distribution-free confidence intervals when normality is violated

The test_results list and effect_sizes dict returned by this domain can feed directly into regression feature selection and experimental design planning.

Examples

Basic hypothesis test on sales data

```python
result = tool_hypothesis_test(
    engine=engine,
    query="SELECT revenue, region FROM sales WHERE year = 2024",
    test_type="ttest_ind",
    column="revenue",
    group_column="region",
    alpha=0.05,
)

# Inspect the first test result
first = result["test_results"][0]
print(first["test_name"])        # "Independent t-test (revenue by region)"
print(first["p_value"])          # e.g. 0.0023
print(first["effect_size"])      # Cohen's d
print(first["interpretation"])   # "Significant difference between groups (medium effect)"
```

ANOVA across multiple product categories

```python
result = tool_anova_analysis(
    engine=engine,
    query="SELECT satisfaction_score, product_category FROM survey",
    dependent_var="satisfaction_score",
    group_var="product_category",
    alpha=0.05,
)

# Check overall significance
for key, anova in result["anova_results"].items():
    print(f"{key}: F={anova['f_statistic']:.3f}, p={anova['p_value']:.4f}")

# Inspect post-hoc comparisons
for key, post_hoc in result["post_hoc_results"].items():
    for comp in post_hoc["comparisons"]:
        if comp["significant"]:
            print(f"{comp['group1']} vs {comp['group2']}: p={comp['p_value']:.4f}")
```

Effect size calculation before running a study

```python
# Step 1: estimate effect size from pilot data
effect_result = tool_effect_sizes(
    engine=engine,
    query="SELECT conversion, variant FROM pilot_experiment",
    column="conversion",
    group_column="variant",
)

# Step 2: use power analysis to determine required sample size
power = effect_result["power_analysis"]["ttest"]
print(f"Required n per group: {power['required_sample_size']}")
print(power["interpretation"])
```

Multi-step workflow: normality check then appropriate test

```python
import pandas as pd

from localdata_mcp.domains.statistical_analysis import NonParametricTestTransformer

# 1. Check normality first
normality = tool_hypothesis_test(
    engine=engine,
    query="SELECT response_time FROM api_logs",
    test_type="normality",
)

is_normal = all(
    r["p_value"] > 0.05
    for r in normality["test_results"]
    if "Shapiro" in r["test_name"]
)

# 2. Choose parametric or non-parametric test accordingly
query = "SELECT response_time, server_zone FROM api_logs"
if is_normal:
    result = tool_hypothesis_test(
        engine=engine, query=query,
        test_type="ttest_ind",
        column="response_time",
        group_column="server_zone",
    )
else:
    # Use NonParametricTestTransformer directly
    df = pd.read_sql(query, engine)
    transformer = NonParametricTestTransformer(test_type="mann_whitney", alpha=0.05)
    transformer.fit(df)
    result = transformer.transform(df).iloc[0].to_dict()
```
