R - Count missing values after first recorded measurement


I have environmental data with missing values. Measurement of these variables started in different years.

With the script "sapply(df, function(x) sum(is.na(x)))" I get the number of missing values in each column. However, I wish to count missing values only from the time point when at least one measurement is available. For example, the count for o3 should be 3, counted from the time measurement of o3 started. In addition, I want to extract the first date on which a measurement is available (for example, temp on 01-03-1990 and o3 on 09-03-1990). In short, what I want is:

1.  Extract the first date of available measurement for each column.
2.  Count the number of missing values after at least one measurement is available.

Sample data follows:

> dput(df)
structure(list(date = structure(c(7364, 7365, 7366, 7367, 7368,
7369, 7370, 7371, 7372, 7373, 7374, 7375, 7376, 7377, 7378, 7379,
7380, 7381, 7382, 7383, 7384), class = "Date"), no2 = c(51.7008334795634,
33.8999998569489, 29.7854166030884, 29.0558333396912, 28.5108333031336,
31.9637500842412, 36.1283330917358, 24.6608331998189, 33.2682609558105,
NA, NA, NA, 53.1133330663045, 54.1575004259745, 43.7712502479553,
31.0166666905085, 31.9995832443237, 33.3491666316986, NA, NA,
35.5604347353396), temp = c(1.12583327293396, 0.230416655540466,
-0.415833324193954, 3.50333333015442, 4.88708353042603, 3.54916667938232,
2.15291666984558, 6.84916687011719, 3.79416656494141, 1.50416672229767,
0.736666679382324, 3.33291673660278, -0.466250002384186, 1.47374999523163,
6.84124994277954, 9.93249988555908, NA, NA, NA, 6.88000011444092,
6.19999980926514), humidity = c(NA, 75.1428604125977, 64.375,
NA, 82.125, 61.375, 71.5, 68.25, NA, 74, 82.375, 82.5, 60.875,
80, 82.625, 88.75, 78.5, 73.125, 68.5, 49.2811088562012, 79.8091659545898
), o3 = c(NA, NA, NA, NA, NA, NA, NA, NA, 63.0712509155273, 69.6487503051758,
60.903751373291, NA, 72.942497253418, NA, NA, 66.2587509155273,
78.3262481689453, 101.066246032715, 112.137496948242, 77.0224990844727,
68.5950012207031)), .Names = c("date", "no2", "temp", "humidity",
"o3"), row.names = c("60", "61", "62", "63", "64", "65", "66",
"67", "68", "69", "70", "71", "72", "73", "74", "75", "76", "77",
"78", "79", "80"), class = "data.frame")

To get the first non-missing value in each column, and the date it falls on:

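# row index of the first non-NA value in each column (the date column is included)
first <- sapply(df, function(x) which(!is.na(x))[1])
# date at each of those row indices
dateoffirst <- df$date[first]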
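
As a small illustration of this step, the sketch below runs the same idea on a made-up data frame (toy, not the sample data above) and keeps the column names attached to the resulting dates. Dropping the date column and using setNames are choices of this sketch, not part of the answer itself.

```r
# Toy data (hypothetical), same shape as the question's df
toy <- data.frame(
  date = as.Date("1990-03-01") + 0:4,
  a    = c(1, NA, 3, NA, 5),
  b    = c(NA, NA, 7, NA, 9)
)

# Measurement columns only; the date column itself is left out
vars <- setdiff(names(toy), "date")

# Row index of the first non-missing value in each measurement column
first_idx <- sapply(toy[vars], function(x) which(!is.na(x))[1])

# Look up the matching dates and keep the column names on the result
first_date <- setNames(toy$date[first_idx], vars)
first_date
```

Here a starts on the first row and b on the third, so the named result makes it easy to see which date belongs to which variable.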

The number of NAs after the first run of NAs is the total number of NAs minus the length of that initial run:

numberofmissing <- sapply(df, function(x) sum(is.na(x))) - (first-1) 
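
An alternative formulation, offered only as a sketch, counts the NAs directly within the part of each column that starts at its first observed value. The helper name count_na_after_start and the toy data frame are hypothetical, and returning NA for a column that was never measured is an assumption about what is wanted in that case.

```r
# Hypothetical helper: count NAs from the first observed value onward
count_na_after_start <- function(x) {
  idx <- which(!is.na(x))
  if (length(idx) == 0) return(NA_integer_)  # column was never measured
  sum(is.na(x[idx[1]:length(x)]))
}

# Toy data (hypothetical); 'empty' has no measurements at all
toy <- data.frame(
  date  = as.Date("1990-03-01") + 0:5,
  a     = c(1, NA, 3, NA, 5, 6),
  b     = c(NA, NA, 7, NA, 9, NA),
  empty = NA_real_
)

# Apply to the measurement columns only
vars <- setdiff(names(toy), "date")
res <- sapply(toy[vars], count_na_after_start)
res
```

In this toy example both a and b have two missing values after their first measurement, and empty comes back as NA.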
