Copilot AI commented Aug 31, 2025

When a grouping factor has many levels with only one observation per level, calling report() on a Kruskal-Wallis test failed with a data frame construction error. The cause is that the confidence interval for the effect size cannot be computed when all bootstrapped values are identical.

Problem

The original issue can be reproduced as follows:

  1. Create a factor with many levels (e.g., as.factor(1:n))
  2. Give each level only one observation
  3. Run kruskal.test(), which works fine and is fast
  4. Call report() on the result, which fails with: "arguments imply differing number of rows: 1, 0"

This happened because effectsize::rank_epsilon_squared() computes its confidence interval by bootstrapping. With one observation per group, every bootstrapped effect size equals 1, so the CI calculation fails.
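
The degenerate behaviour can be checked with base R alone. The sketch below is not the effectsize implementation: it assumes the rank epsilon squared is H / (n - 1) and that the bootstrap resamples whole (value, group) rows. The helper name eps2 is made up for this illustration.

```r
# Sketch under stated assumptions: epsilon squared (rank) taken as H / (n - 1),
# bootstrap assumed to resample whole rows. `eps2` is a hypothetical helper.
set.seed(123)
n <- 10
df <- data.frame(a = as.factor(1:n), b = rnorm(n))

eps2 <- function(d) {
  h <- unname(kruskal.test(d$b, d$a)$statistic)  # Kruskal-Wallis H
  h / (nrow(d) - 1)
}

# Original data: one observation per group, no ties, so H = n - 1.
observed <- eps2(df)

# Resampled rows: duplicated rows share group and value, so the within-group
# rank variance stays zero and the tie-corrected H is again n - 1.
boot_eps2 <- replicate(200, eps2(df[sample(n, replace = TRUE), ]))

# Number of distinct bootstrap values after rounding away floating-point noise.
length(unique(round(boot_eps2, 8)))
```

With a single distinct bootstrap value there is no spread from which an interval could be formed, which matches the "All values of t are equal to 1" message shown in the output further down.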

Solution

Added error handling in .report_effectsize_kruskal() to:

  • Catch CI calculation failures due to degenerate cases
  • Fall back to ci = NULL when bootstrap CI fails
  • Report the effect size without confidence intervals in these edge cases
  • Maintain backward compatibility for normal cases
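
The fallback pattern listed above can be sketched in base R as follows. This is not the code of .report_effectsize_kruskal(): toy_effectsize() and with_ci_fallback() are hypothetical stand-ins, written only to show a tryCatch() that retries with ci = NULL once the CI step has failed.

```r
# Hypothetical stand-in for an effect-size function with an optional CI.
# With constant estimates it yields zero-length bounds, which makes
# data.frame() fail with "arguments imply differing number of rows: 1, 0".
toy_effectsize <- function(estimates, ci = 0.95) {
  est <- mean(estimates)
  if (is.null(ci)) {
    return(data.frame(estimate = est))
  }
  degenerate <- length(unique(estimates)) < 2
  alpha <- (1 - ci) / 2
  low <- if (degenerate) numeric(0) else quantile(estimates, alpha, names = FALSE)
  high <- if (degenerate) numeric(0) else quantile(estimates, 1 - alpha, names = FALSE)
  data.frame(estimate = est, CI_low = low, CI_high = high)
}

# Try with a CI first; on any error, report the estimate without a CI.
with_ci_fallback <- function(f, ...) {
  tryCatch(
    f(..., ci = 0.95),
    error = function(e) f(..., ci = NULL)
  )
}

degenerate_result <- with_ci_fallback(toy_effectsize, rep(1, 50))
normal_result <- with_ci_fallback(toy_effectsize, c(0.2, 0.4, 0.6))

names(degenerate_result)  # only "estimate": the CI columns are dropped
names(normal_result)      # "estimate", "CI_low", "CI_high"
```

Retrying with the CI disabled keeps the normal path untouched: the fallback branch only runs once the first call has already raised an error.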

Before and After

Before (fails):

library("report")
n <- 10
set.seed(123)
df <- data.frame(a = as.factor(1:n), b = rnorm(n))
test <- kruskal.test(df$b, df$a)
report(test)  # Error: arguments imply differing number of rows: 1, 0

After (works):

library("report")
n <- 10
set.seed(123)
df <- data.frame(a = as.factor(1:n), b = rnorm(n))
test <- kruskal.test(df$b, df$a)
result <- report(test, data = df)
print(result)
#> [1] "All values of t are equal to  1 \n Cannot calculate confidence intervals"
#> Effect sizes were labelled following Field's (2013) recommendations.
#> 
#> The Kruskal-Wallis rank sum test testing the difference in ranks between df$b
#> and df$a suggests that the effect is statistically not significant, and large
#> (Kruskal-Wallis chi2 = 9.00, p = 0.437; Epsilon squared (rank) = 1.00)

# Performance test with larger n
n <- 100
set.seed(456)  
df_large <- data.frame(a = as.factor(1:n), b = rnorm(n))
test_large <- kruskal.test(df_large$b, df_large$a)

# This now completes quickly instead of taking a long time
system.time({
    result_large <- report(test_large, data = df_large, verbose = FALSE)
})
#> [1] "All values of t are equal to  1 \n Cannot calculate confidence intervals"
#>    user  system elapsed 
#>   0.161   0.002   0.164

Key Improvements

  • Performance: Degenerate cases now complete in well under a second (0.164 s elapsed for n = 100 above) instead of failing
  • Reliability: No more crashes with data frame construction errors
  • Graceful degradation: Effect size still reported even when CI can't be calculated
  • Backward compatibility: Normal cases continue to work with CI as before

Note: For optimal results, users should either pass the data explicitly, as in report(test, data = df), or use the formula interface, kruskal.test(y ~ group, data = df), as suggested by existing warning messages in the report package.
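
For reference, a minimal base-R comparison of the two kruskal.test() call styles mentioned in the note, using the same toy data as the examples above. It only illustrates that both interfaces give the same test statistic while recording different data names; how report() uses those names is not shown here.

```r
set.seed(123)
n <- 10
df <- data.frame(a = as.factor(1:n), b = rnorm(n))

# Default interface: vectors are passed directly.
test_default <- kruskal.test(df$b, df$a)

# Formula interface: columns are looked up in `data`.
test_formula <- kruskal.test(b ~ a, data = df)

# Same statistic and p-value either way; only the stored data name differs.
same_statistic <- isTRUE(all.equal(
  unname(test_default$statistic),
  unname(test_formula$statistic)
))
test_default$data.name
test_formula$data.name
```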

Fixes #454.


Co-authored-by: rempsyc <13123390+rempsyc@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fail to compute report for Kruskal test in some cases" to "Fix Kruskal-Wallis test reporting failure with degenerate cases (one observation per group)" Aug 31, 2025
Copilot AI requested a review from rempsyc August 31, 2025 05:40
Copilot finished work on behalf of rempsyc August 31, 2025 05:40