countNa computes the number of missing values in a data frame. It counts the number of missing values in each column, the number of rows in which at least one column is missing, and the expected number of rows with at least one missing value (computed under the assumption that missingness is independent across columns). The number of rows left is also given. Optionally, the combinations of columns reaching the highest joint missingness are also reported (if combColCount > 1 and d has at least 2 columns).

countNa(d, sort = TRUE, decreasing = FALSE, combColCount = 3)

Arguments

d

a data frame

sort

sort columns of 'd' by the number of missing values?

decreasing

if sorting by the number of missing values, should the order be decreasing or increasing?

combColCount

maximum number of columns to combine when finding a combination of columns reaching the highest number of missings
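To make the role of combColCount concrete, the following is a minimal base R sketch of a brute-force search over column combinations. It is not the implementation of countNa; the helper joint_missing and the variable names are hypothetical, and joint missingness is assumed to mean the number of rows with a missing value in at least one of the combined columns.

```r
# Sample data (the same shape as the data used in the Examples section).
d <- data.frame(x1 = 1, x2 = 2, x3 = 1:4,
                y = c(1, NA, 2, NA), z = c(NaN, NaN, 3, 4))

combColCount <- 2  # illustrative value, not the function's default

# Hypothetical helper: rows with a missing value in at least one of
# the given columns. is.na() is TRUE for both NA and NaN.
joint_missing <- function(columns) {
  sum(rowSums(is.na(d[, columns, drop = FALSE])) > 0)
}

# Enumerate every combination of 1..combColCount columns.
sizes <- seq_len(min(combColCount, ncol(d)))
combos <- unlist(
  lapply(sizes, function(k) combn(names(d), k, simplify = FALSE)),
  recursive = FALSE
)

result <- data.frame(
  cols = vapply(combos, paste, character(1), collapse = "+"),
  missing = vapply(combos, joint_missing, integer(1)),
  stringsAsFactors = FALSE
)

# Combinations with the highest joint missingness come first.
result <- result[order(-result$missing), ]
head(result, 4)
```

The number of combinations grows quickly with combColCount, which is one plausible reason for keeping the limit small.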

Value

A data frame (or a list of two data frames) describing the missingness. The first data frame consists of rows describing the missingness in individual columns, plus the missingness in the combination of all columns (in a row called 'any'), plus the average missingness (in a row labeled 'expected' in the example output). The second data frame (if requested) describes the combinations of at most combColCount columns of d reaching the highest joint missingness.
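The per-column rows and the 'any' row can be approximated with a few lines of base R. This is only a sketch of how such counts might be derived, not the code of countNa; the name summary_sketch is hypothetical, and the 'expected' row is deliberately left out because its exact formula is not stated here.

```r
d <- data.frame(x1 = 1, x2 = 2, x3 = 1:4,
                y = c(1, NA, 2, NA), z = c(NaN, NaN, 3, 4))

n <- nrow(d)

# Missing values per column; is.na() is TRUE for both NA and NaN.
na_per_column <- colSums(is.na(d))

# Rows with a missing value in at least one column (the 'any' row).
na_any <- sum(rowSums(is.na(d)) > 0)

summary_sketch <- data.frame(
  names = c(names(d), "any"),
  missing = c(unname(na_per_column), na_any),
  stringsAsFactors = FALSE
)
summary_sketch$missingPercent <- 100 * summary_sketch$missing / n
summary_sketch$left <- n - summary_sketch$missing
summary_sketch$leftPercent <- 100 * summary_sketch$left / n

summary_sketch
```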

Examples

d <- data.frame(x1 = 1, x2 = 2, x3 = 1:4, y = c(1, NA, 2, NA), z = c(NaN, NaN, 3, 4))
d
#>   x1 x2 x3  y   z
#> 1  1  2  1  1 NaN
#> 2  1  2  2 NA NaN
#> 3  1  2  3  2   3
#> 4  1  2  4 NA   4
countNa(d)
#> $columns
#>      names   missing missingPercent     left leftPercent
#> 1       x1 0.0000000        0.00000 4.000000   100.00000
#> 2       x2 0.0000000        0.00000 4.000000   100.00000
#> 3       x3 0.0000000        0.00000 4.000000   100.00000
#> 4        y 2.0000000       50.00000 2.000000    50.00000
#> 5        z 2.0000000       50.00000 2.000000    50.00000
#> 6      any 3.0000000       75.00000 1.000000    25.00000
#> 7 expected 0.9685669       24.21417 3.031433    75.78583
#>
#> $columnCombinations
#>   missing   cols
#> 1       3    z+y
#> 2       3 x1+z+y
#> 3       2   x1+y
#> 4       2   x1+z
#>