In the code below, we highlight the basic estimation procedures used to compute population estimates of forest attributes from Forest Inventory and Analysis (FIA) data. We will demonstrate these procedures with the fiaRI
dataset (included in the rFIA
package) so that you can follow along. Download the R script for these examples here.
Our goal here is to estimate total tree biomass, total treecarbon (aboveground) and total forested area in the state of Rhode Island for the year 2018. From these totals, we will compute ratios of average tree biomass/ forested acre and average tree carbon/ forested acre. We will do this with and without sampling errors (without being much simpler), and show you how we handle grouped estimates in both cases. All estimates will be computed for live trees.
rFIA
will vary slightly from that presented below as we designed rFIA
to be as flexible and computationally efficient as possible. Despite the differences in syntax and structure, the estimation procedures presented below are identical to those in rFIA
. You can find and download the full source code for rFIA
from our GitHub Repo.
Data Preparation
First, let’s load some packages and the fiaRI
dataset:
# Load some packages
library(rFIA)
library(dplyr)
# Load the fiaRI dataset (included in rFIA)
data(fiaRI)
db <- fiaRI
To compute population estimates of current area and current biomass/carbon from the FIADB, we need to identify and subset the necessary portions. To do this, we will use what FIA calls an EVALID, hence the name ‘EVALIDator’.
ids <- db$POP_EVAL %>%
select('CN', 'END_INVYR', 'EVALID') %>%
inner_join(select(db$POP_EVAL_TYP, c('EVAL_CN', 'EVAL_TYP')), by = c('CN' = 'EVAL_CN')) %>%
## Now we filter out everything except current area and
## current volume ids
filter(EVAL_TYP %in% c('EXPCURR', 'EXPVOL'))
## Now that we have those EVALIDs, let's use clipFIA to subset
db <- clipFIA(db, evalid = ids$EVALID)
## Warning in clipFIA(db, evalid = ids$EVALID): Returning subset by EVALID only.
## For most recent subset, specify `evalid = NULL` and `mostRecent = TRUE`.
Since we need some information stored in each of these tables to compute estimates, we will join them into one big dataframe (let’s call that data
) that we can operate on.
## Select only the columns we need from each table, to keep things slim
PLOT <- select(db$PLOT, CN, MACRO_BREAKPOINT_DIA)
COND <- select(db$COND, PLT_CN, CONDID, CONDPROP_UNADJ, PROP_BASIS, COND_STATUS_CD, OWNGRPCD)
TREE <- select(db$TREE, PLT_CN, CONDID, SUBP, TREE, STATUSCD, DRYBIO_AG, CARBON_AG, TPA_UNADJ, DIA, SPCD)
POP_ESTN_UNIT <- select(db$POP_ESTN_UNIT, CN, EVAL_CN, AREA_USED, P1PNTCNT_EU)
POP_EVAL <- select(db$POP_EVAL, EVALID, EVAL_GRP_CN, ESTN_METHOD, CN, END_INVYR, REPORT_YEAR_NM)
POP_EVAL_TYP <- select(db$POP_EVAL_TYP, EVAL_TYP, EVAL_CN)
POP_PLOT_STRATUM_ASSGN <- select(db$POP_PLOT_STRATUM_ASSGN, STRATUM_CN, PLT_CN)
POP_STRATUM <- select(db$POP_STRATUM, ESTN_UNIT_CN, EXPNS, P2POINTCNT,
ADJ_FACTOR_MICR, ADJ_FACTOR_SUBP, ADJ_FACTOR_MACR, CN, P1POINTCNT)
## Join the tables
data <- PLOT %>%
## Add a PLT_CN column for easy joining
mutate(PLT_CN = CN) %>%
## Join COND & TREE
left_join(COND, by = 'PLT_CN') %>%
left_join(TREE, by = c('PLT_CN', 'CONDID')) %>%
## Population tables
left_join(POP_PLOT_STRATUM_ASSGN, by = 'PLT_CN') %>%
left_join(POP_STRATUM, by = c('STRATUM_CN' = 'CN')) %>%
left_join(POP_ESTN_UNIT, by = c('ESTN_UNIT_CN' = 'CN')) %>%
left_join(POP_EVAL, by = c('EVAL_CN' = 'CN')) %>%
left_join(POP_EVAL_TYP, by = 'EVAL_CN')
Now let’s make a column that will adjust for non-response in our sample (See Bechtold and Patterson (2005), 3.4.3 ‘Nonsampled Plots and Plot Replacement’). Since we know there are no macroplots in RI, we don’t really need to worry about that here, but we will show you anyways.
## Make some adjustment factors
data <- data %>%
mutate(
## AREA
aAdj = case_when(
## When NA, stay NA
is.na(PROP_BASIS) ~ NA_real_,
## If the proportion was measured for a macroplot,
## use the macroplot value
PROP_BASIS == 'MACR' ~ as.numeric(ADJ_FACTOR_MACR),
## Otherwise, use the subpplot value
PROP_BASIS == 'SUBP' ~ ADJ_FACTOR_SUBP),
## TREE
tAdj = case_when(
## When DIA is na, adjustment is NA
is.na(DIA) ~ ADJ_FACTOR_SUBP,
## When DIA is less than 5", use microplot value
DIA < 5 ~ ADJ_FACTOR_MICR,
## When DIA is greater than 5", use subplot value
DIA >= 5 ~ ADJ_FACTOR_SUBP
))
Next, we need to construct what Bechtold and Patterson (2005) called a ‘domain indicator function’. (see Eq. 4.1, pg. 47 of the publication). This is essentially just a vector which indicates whether a tree (or plot, condition, etc.) is within our domain of interest (live trees on forest land).
To construct the domain indicator, we just need a vector which is the same length as our joined table (data
), and takes a value of 1 if the stem (or condition) is in the domain and 0 otherwise. We build seperate domain indicators for estimating tree totals and area totals, because we can specify different domains of interest for both. For example, if we used our tree domain (live trees on forest land) to estimate area, then we would not actually be estimating the full forested area in RI. Instead we would estimate the forested area ONLY where live trees are currently present.
## Build a domain indicator for land type and live trees
## Land type (all forested area)
data$aDI <- if_else(data$COND_STATUS_CD == 1, 1, 0)
## Live trees only (on forested area)
data$tDI <- if_else(data$STATUSCD == 1, 1, 0) * data$aDI
## Now, let's just rename the END_INVYR column to 'YEAR'
## for a pretty output like rFIA
data <- data %>%
mutate(YEAR = END_INVYR) %>%
## remove any NAs
filter(!is.na(YEAR))
Without Sampling Errors
Now we are ready to start computing estimates. If we don’t care aboute sampling errors, we can use the EXPNS
column in the POP_STRATUM
table to make our lives easier. EXPNS
is an expansion factor which descibes the area, in acres, that a stratum represents divided by the number of sampled plots in that stratum (see Bechtold and Patterson (2005), section 4.2 for more information on FIA stratification procedures). When summed across summed across all plots in the population of interest, EXPNS
allows us to easily obtain estimates of population totals, without worrying about fancy stratifaction procedures and variance estimators.
No grouping variables
First we compute totals for biomass, carbon, and forested area:
## Estimate Tree totals
tre_bio <- data %>%
filter(EVAL_TYP == 'EXPVOL') %>%
## Make sure we only have unique observations of plots, trees, etc.
distinct(ESTN_UNIT_CN, STRATUM_CN, PLT_CN, CONDID, SUBP, TREE, .keep_all = TRUE) %>%
## Plot-level estimates first (multiplying by EXPNS here)
group_by(YEAR, ESTN_UNIT_CN, ESTN_METHOD, STRATUM_CN, PLT_CN) %>%
summarize(bioPlot = sum(DRYBIO_AG * TPA_UNADJ * tAdj * tDI * EXPNS / 2000, na.rm = TRUE),
carbPlot = sum(CARBON_AG * TPA_UNADJ * tAdj * tDI * EXPNS / 2000, na.rm = TRUE)) %>%
## Now we simply sum the values of each plot (expanded w/ EXPNS)
## to obtain population totals
group_by(YEAR) %>%
summarize(BIO_AG_TOTAL = sum(bioPlot, na.rm = TRUE),
CARB_AG_TOTAL = sum(carbPlot, na.rm = TRUE))
## Estimate Area totals
area_bio <- data %>%
filter(EVAL_TYP == 'EXPCURR') %>%
## Make sure we only have unique observations of plots, trees, etc.
distinct(ESTN_UNIT_CN, STRATUM_CN, PLT_CN, CONDID, .keep_all = TRUE) %>%
## Plot-level estimates first (multiplying by EXPNS here)
group_by(YEAR, ESTN_UNIT_CN, ESTN_METHOD, STRATUM_CN, PLT_CN) %>%
summarize(forArea = sum(CONDPROP_UNADJ * aAdj * aDI * EXPNS, na.rm = TRUE)) %>%
## Now we simply sum the values of each plot (expanded w/ EXPNS)
## to obtain population totals
group_by(YEAR) %>%
summarize(AREA_TOTAL = sum(forArea, na.rm = TRUE))
Then we can join these tables up, and produce ratio estimates:
bio <- left_join(tre_bio, area_bio) %>%
mutate(BIO_AG_ACRE = BIO_AG_TOTAL / AREA_TOTAL,
CARB_AG_ACRE = CARB_AG_TOTAL / AREA_TOTAL) %>%
## Reordering the columns
select(YEAR, BIO_AG_ACRE, CARB_AG_ACRE, BIO_AG_TOTAL, CARB_AG_TOTAL, AREA_TOTAL)
## Joining, by = "YEAR"
Comparing with rFIA
, we get a match!
biomass(clipFIA(fiaRI), totals = TRUE)
## # A tibble: 1 x 37
## YEAR NETVOL_ACRE SAWVOL_ACRE BIO_AG_ACRE BIO_BG_ACRE BIO_ACRE CARB_AG_ACRE
## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2018 2491. 1419. 70.4 14.0 84.4 35.2
## # ... with 30 more variables: CARB_BG_ACRE <dbl>, CARB_ACRE <dbl>,
## # NETVOL_TOTAL <dbl>, SAWVOL_TOTAL <dbl>, BIO_AG_TOTAL <dbl>,
## # BIO_BG_TOTAL <dbl>, BIO_TOTAL <dbl>, CARB_AG_TOTAL <dbl>,
## # CARB_BG_TOTAL <dbl>, CARB_TOTAL <dbl>, AREA_TOTAL <dbl>,
## # NETVOL_ACRE_SE <dbl>, SAWVOL_ACRE_SE <dbl>, BIO_AG_ACRE_SE <dbl>,
## # BIO_BG_ACRE_SE <dbl>, BIO_ACRE_SE <dbl>, CARB_AG_ACRE_SE <dbl>,
## # CARB_BG_ACRE_SE <dbl>, CARB_ACRE_SE <dbl>, NETVOL_TOTAL_SE <dbl>,
## # SAWVOL_TOTAL_SE <dbl>, BIO_AG_TOTAL_SE <dbl>, BIO_BG_TOTAL_SE <dbl>,
## # BIO_TOTAL_SE <dbl>, CARB_AG_TOTAL_SE <dbl>, CARB_BG_TOTAL_SE <dbl>,
## # CARB_TOTAL_SE <dbl>, AREA_TOTAL_SE <dbl>, nPlots_TREE <dbl>,
## # nPlots_AREA <dbl>
bio
## # A tibble: 5 x 6
## YEAR BIO_AG_ACRE CARB_AG_ACRE BIO_AG_TOTAL CARB_AG_TOTAL AREA_TOTAL
## <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2014 68.0 34.0 25033359. 12516679. 367884.
## 2 2015 69.1 34.6 25559575. 12779788. 369844.
## 3 2016 70.6 35.3 25852439. 12926219. 366364.
## 4 2017 70.8 35.4 26063235. 13031617. 368373.
## 5 2018 70.4 35.2 25823833. 12911916. 366959.
Adding grouping variables
To add grouping variables to the above procedures, we can simply add the names of the variables we wish to group by to the group_by
call:
## Grouping by Ownership group (OWNGRPCD)
## Estimate Tree totals
tre_bioGrp <- data %>%
filter(EVAL_TYP == 'EXPVOL') %>%
## Make sure we only have unique observations of plots, trees, etc.
distinct(ESTN_UNIT_CN, STRATUM_CN, PLT_CN, CONDID, SUBP, TREE, .keep_all = TRUE) %>%
## Plot-level estimates first (multiplying by EXPNS here)
group_by(YEAR, OWNGRPCD, ESTN_UNIT_CN, ESTN_METHOD, STRATUM_CN, PLT_CN) %>%
summarize(bioPlot = sum(DRYBIO_AG * TPA_UNADJ * tAdj * tDI * EXPNS / 2000, na.rm = TRUE),
carbPlot = sum(CARBON_AG * TPA_UNADJ * tAdj * tDI * EXPNS / 2000, na.rm = TRUE)) %>%
## Now we simply sum the values of each plot (expanded w/ EXPNS)
## to obtain population totals
group_by(YEAR, OWNGRPCD) %>%
summarize(BIO_AG_TOTAL = sum(bioPlot, na.rm = TRUE),
CARB_AG_TOTAL = sum(carbPlot, na.rm = TRUE))
## Estimate Area totals
area_bioGrp <- data %>%
filter(EVAL_TYP == 'EXPCURR') %>%
## Make sure we only have unique observations of plots, trees, etc.
distinct(ESTN_UNIT_CN, STRATUM_CN, PLT_CN, CONDID, .keep_all = TRUE) %>%
## Plot-level estimates first (multiplying by EXPNS here)
group_by(YEAR, OWNGRPCD, ESTN_UNIT_CN, ESTN_METHOD, STRATUM_CN, PLT_CN) %>%
summarize(forArea = sum(CONDPROP_UNADJ * aAdj * aDI * EXPNS, na.rm = TRUE)) %>%
## Now we simply sum the values of each plot (expanded w/ EXPNS)
## to obtain population totals
group_by(YEAR, OWNGRPCD) %>%
summarize(AREA_TOTAL = sum(forArea, na.rm = TRUE))
## Now we can simply join these two up, and produce ratio estimates
bioGrp <- left_join(tre_bioGrp, area_bioGrp) %>%
mutate(BIO_AG_ACRE = BIO_AG_TOTAL / AREA_TOTAL,
CARB_AG_ACRE = CARB_AG_TOTAL / AREA_TOTAL) %>%
## Reordering the columns
select(YEAR, OWNGRPCD, BIO_AG_ACRE, CARB_AG_ACRE, BIO_AG_TOTAL, CARB_AG_TOTAL, AREA_TOTAL)
## Joining, by = c("YEAR", "OWNGRPCD")
## Now let's compare with rFIA.... looks like a match!
biomass(clipFIA(fiaRI), totals = TRUE, grpBy = OWNGRPCD)
## # A tibble: 2 x 38
## YEAR OWNGRPCD NETVOL_ACRE SAWVOL_ACRE BIO_AG_ACRE BIO_BG_ACRE BIO_ACRE
## <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2018 30 2615. 1614. 71.8 14.4 86.2
## 2 2018 40 2436. 1332. 69.7 13.8 83.6
## # ... with 31 more variables: CARB_AG_ACRE <dbl>, CARB_BG_ACRE <dbl>,
## # CARB_ACRE <dbl>, NETVOL_TOTAL <dbl>, SAWVOL_TOTAL <dbl>,
## # BIO_AG_TOTAL <dbl>, BIO_BG_TOTAL <dbl>, BIO_TOTAL <dbl>,
## # CARB_AG_TOTAL <dbl>, CARB_BG_TOTAL <dbl>, CARB_TOTAL <dbl>,
## # AREA_TOTAL <dbl>, NETVOL_ACRE_SE <dbl>, SAWVOL_ACRE_SE <dbl>,
## # BIO_AG_ACRE_SE <dbl>, BIO_BG_ACRE_SE <dbl>, BIO_ACRE_SE <dbl>,
## # CARB_AG_ACRE_SE <dbl>, CARB_BG_ACRE_SE <dbl>, CARB_ACRE_SE <dbl>,
## # NETVOL_TOTAL_SE <dbl>, SAWVOL_TOTAL_SE <dbl>, BIO_AG_TOTAL_SE <dbl>,
## # BIO_BG_TOTAL_SE <dbl>, BIO_TOTAL_SE <dbl>, CARB_AG_TOTAL_SE <dbl>,
## # CARB_BG_TOTAL_SE <dbl>, CARB_TOTAL_SE <dbl>, AREA_TOTAL_SE <dbl>,
## # nPlots_TREE <dbl>, nPlots_AREA <dbl>
bioGrp
## # A tibble: 15 x 7
## # Groups: YEAR [5]
## YEAR OWNGRPCD BIO_AG_ACRE CARB_AG_ACRE BIO_AG_TOTAL CARB_AG_TOTAL AREA_TOTAL
## <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2014 30 69.9 34.9 7513432. 3756716. 107532.
## 2 2014 40 67.3 33.6 17519927. 8759963. 260353.
## 3 2014 NA NaN NaN 0 0 0
## 4 2015 30 70.7 35.3 7441888. 3720944. 105299.
## 5 2015 40 68.5 34.2 18117688. 9058844. 264546.
## 6 2015 NA NaN NaN 0 0 0
## 7 2016 30 71.8 35.9 7666303. 3833151. 106763.
## 8 2016 40 70.1 35.0 18186136. 9093068. 259601.
## 9 2016 NA NaN NaN 0 0 0
## 10 2017 30 70.5 35.3 8015385. 4007692. 113648.
## 11 2017 40 70.9 35.4 18047850. 9023925. 254725.
## 12 2017 NA NaN NaN 0 0 0
## 13 2018 30 71.8 35.9 8090815. 4045408. 112702.
## 14 2018 40 69.7 34.9 17733018. 8866509. 254256.
## 15 2018 NA NaN NaN 0 0 0
select
calls in
Data Preperation, otherwise the variable will not be found in data
.
With Sampling Errors
Computing estimates with associated sampling errors is a bit more complex than what we saw above, as we can no longer rely on EXPNS
to do the heavy lifting for us. In short, we will add a few additional steps when computing tree totals and area totals, summarizing at the strata and estimation unit level along the way. When adding grouping variables, we will need to modify our code further, treating each group as a unique population and summarizing these populations individually. We will follow the procedures outlined by Bechtold and Patterson (2005) (see section 4) for our computations.
Before we get started, Let’s check out what type of estimation methods were used to determine values in the POP tables, as this will influence which variance estimators we use.
unique(db$POP_EVAL$ESTN_METHOD)
## [1] "Post-Stratification"
Looks like just post-stratification
, so we will use the stratified random sampling estimation approach (known weights)
No grouping variables
First we estimate both tree and area attributes at the plot level, and then join these estimates before proceeding to the strata level so that we can compute the covariance between area and tree attributes
## Estimate Tree attributes -- plot level
tre_pop <- data %>%
filter(EVAL_TYP == 'EXPVOL') %>%
## Make sure we only have unique observations of plots, trees, etc.
distinct(ESTN_UNIT_CN, STRATUM_CN, PLT_CN, CONDID, SUBP, TREE, .keep_all = TRUE) %>%
## Plot-level estimates first (note we are NOT using EXPNS here)
group_by(YEAR, ESTN_UNIT_CN, STRATUM_CN, PLT_CN) %>%
summarize(bioPlot = sum(DRYBIO_AG * TPA_UNADJ * tAdj * tDI / 2000, na.rm = TRUE),
carbPlot = sum(CARBON_AG * TPA_UNADJ * tAdj * tDI / 2000, na.rm = TRUE))
## Estimate Area attributes -- plot level
area_pop <- data %>%
filter(EVAL_TYP == 'EXPCURR') %>%
## Make sure we only have unique observations of plots, trees, etc.
distinct(ESTN_UNIT_CN, STRATUM_CN, PLT_CN, CONDID, .keep_all = TRUE) %>%
## Plot-level estimates first (multiplying by EXPNS here)
## Extra grouping variables are only here so they are carried through
group_by(YEAR, P1POINTCNT, P1PNTCNT_EU, P2POINTCNT, AREA_USED, ESTN_UNIT_CN, STRATUM_CN, PLT_CN) %>%
summarize(forArea = sum(CONDPROP_UNADJ * aAdj * aDI, na.rm = TRUE))
## Joining the two tables
bio_pop_plot <- left_join(tre_pop, area_pop)
## Joining, by = c("YEAR", "ESTN_UNIT_CN", "STRATUM_CN", "PLT_CN")
Now that we have both area and tree attributes in the same table, we can follow through with the remaining estimation procedures at the strata and estimation unit levels:
## Strata level
bio_pop_strat <- bio_pop_plot %>%
group_by(YEAR, ESTN_UNIT_CN, STRATUM_CN) %>%
summarize(aStrat = mean(forArea, na.rm = TRUE), # Area mean
bioStrat = mean(bioPlot, na.rm = TRUE), # Biomass mean
carbStrat = mean(carbPlot, na.rm = TRUE), # Carbon mean
## We don't want a vector of these values, since they are repeated
P2POINTCNT = first(P2POINTCNT),
AREA_USED = first(AREA_USED),
## Strata weight, relative to estimation unit
w = first(P1POINTCNT) / first(P1PNTCNT_EU),
## Strata level variances
aVar = (sum(forArea^2) - sum(P2POINTCNT * aStrat^2)) / (P2POINTCNT * (P2POINTCNT-1)),
bioVar = (sum(bioPlot^2) - sum(P2POINTCNT * bioStrat^2)) / (P2POINTCNT * (P2POINTCNT-1)),
carbVar = (sum(carbPlot^2) - sum(P2POINTCNT * carbStrat^2)) / (P2POINTCNT * (P2POINTCNT-1)),
## Strata level co-varainces
bioCV = (sum(forArea*bioPlot) - sum(P2POINTCNT * aStrat *bioStrat)) / (P2POINTCNT * (P2POINTCNT-1)),
carbCV = (sum(forArea*carbPlot) - sum(P2POINTCNT * aStrat *carbStrat)) / (P2POINTCNT * (P2POINTCNT-1)))
## Moving on to the estimation unit
bio_pop_eu <- bio_pop_strat %>%
group_by(YEAR, ESTN_UNIT_CN) %>%
summarize(aEst = sum(aStrat * w, na.rm = TRUE) * first(AREA_USED), # Mean Area
bioEst = sum(bioStrat * w, na.rm = TRUE) * first(AREA_USED), # Mean biomass
carbEst = sum(carbStrat * w, na.rm = TRUE) * first(AREA_USED), # Mean carbon
## Estimation unit variances
aVar = (first(AREA_USED)^2 / sum(P2POINTCNT)) *
(sum(P2POINTCNT*w*aVar) + sum((1-w)*(P2POINTCNT/sum(P2POINTCNT))*aVar)),
bioVar = (first(AREA_USED)^2 / sum(P2POINTCNT)) *
(sum(P2POINTCNT*w*bioVar) + sum((1-w)*(P2POINTCNT/sum(P2POINTCNT))*bioVar)),
carbVar = (first(AREA_USED)^2 / sum(P2POINTCNT)) *
(sum(P2POINTCNT*w*carbVar) + sum((1-w)*(P2POINTCNT/sum(P2POINTCNT))*carbVar)),
## Estimation unit covariances
bioCV = (first(AREA_USED)^2 / sum(P2POINTCNT)) *
(sum(P2POINTCNT*w*bioCV) + sum((1-w)*(P2POINTCNT/sum(P2POINTCNT))*bioCV)),
carbCV = (first(AREA_USED)^2 / sum(P2POINTCNT)) *
(sum(P2POINTCNT*w*carbCV) + sum((1-w)*(P2POINTCNT/sum(P2POINTCNT))*carbCV)))
Finally, we can sum attributes across estimation units to obtain totals for our region:
## sum across Estimation Units for totals
bio_pop <- bio_pop_eu %>%
group_by(YEAR) %>%
summarize(AREA_TOTAL = sum(aEst, na.rm = TRUE),
BIO_AG_TOTAL = sum(bioEst, na.rm = TRUE),
CARB_AG_TOTAL = sum(carbEst, na.rm = TRUE),
## Ratios
BIO_AG_ACRE = BIO_AG_TOTAL / AREA_TOTAL,
CARB_AG_ACRE = CARB_AG_TOTAL / AREA_TOTAL,
## Total samping errors
AREA_TOTAL_SE = sqrt(sum(aVar, na.rm = TRUE)) / AREA_TOTAL * 100,
BIO_AG_TOTAL_SE = sqrt(sum(bioVar, na.rm = TRUE)) / BIO_AG_TOTAL * 100,
CARB_AG_TOTAL_SE = sqrt(sum(carbVar, na.rm = TRUE)) / CARB_AG_TOTAL * 100,
## Ratio variances
bioAcreVar = (1 / AREA_TOTAL^2) * (sum(bioVar) + (BIO_AG_ACRE^2)*sum(aVar) - 2*BIO_AG_ACRE*sum(bioCV)),
carbAcreVar = (1 / AREA_TOTAL^2) * (sum(carbVar) + (CARB_AG_ACRE^2)*sum(aVar) - 2*CARB_AG_ACRE*sum(carbCV)),
BIO_AG_ACRE_SE = sqrt(sum(bioAcreVar, na.rm = TRUE)) / BIO_AG_ACRE * 100,
CARB_AG_ACRE_SE = sqrt(sum(carbAcreVar, na.rm = TRUE)) / CARB_AG_ACRE * 100) %>%
## Re ordering, dropping variance
select(YEAR, BIO_AG_ACRE, CARB_AG_ACRE, BIO_AG_TOTAL, CARB_AG_TOTAL, AREA_TOTAL, BIO_AG_ACRE_SE,
CARB_AG_ACRE_SE, BIO_AG_TOTAL_SE, CARB_AG_TOTAL_SE, AREA_TOTAL_SE)
Comparing with rFIA
, we get a match!
biomass(clipFIA(fiaRI), totals = TRUE)
## # A tibble: 1 x 37
## YEAR NETVOL_ACRE SAWVOL_ACRE BIO_AG_ACRE BIO_BG_ACRE BIO_ACRE CARB_AG_ACRE
## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2018 2491. 1419. 70.4 14.0 84.4 35.2
## # ... with 30 more variables: CARB_BG_ACRE <dbl>, CARB_ACRE <dbl>,
## # NETVOL_TOTAL <dbl>, SAWVOL_TOTAL <dbl>, BIO_AG_TOTAL <dbl>,
## # BIO_BG_TOTAL <dbl>, BIO_TOTAL <dbl>, CARB_AG_TOTAL <dbl>,
## # CARB_BG_TOTAL <dbl>, CARB_TOTAL <dbl>, AREA_TOTAL <dbl>,
## # NETVOL_ACRE_SE <dbl>, SAWVOL_ACRE_SE <dbl>, BIO_AG_ACRE_SE <dbl>,
## # BIO_BG_ACRE_SE <dbl>, BIO_ACRE_SE <dbl>, CARB_AG_ACRE_SE <dbl>,
## # CARB_BG_ACRE_SE <dbl>, CARB_ACRE_SE <dbl>, NETVOL_TOTAL_SE <dbl>,
## # SAWVOL_TOTAL_SE <dbl>, BIO_AG_TOTAL_SE <dbl>, BIO_BG_TOTAL_SE <dbl>,
## # BIO_TOTAL_SE <dbl>, CARB_AG_TOTAL_SE <dbl>, CARB_BG_TOTAL_SE <dbl>,
## # CARB_TOTAL_SE <dbl>, AREA_TOTAL_SE <dbl>, nPlots_TREE <dbl>,
## # nPlots_AREA <dbl>
bio_pop
## # A tibble: 5 x 11
## YEAR BIO_AG_ACRE CARB_AG_ACRE BIO_AG_TOTAL CARB_AG_TOTAL AREA_TOTAL
## <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2014 68.0 34.0 25033359. 12516679. 367884.
## 2 2015 69.1 34.6 25559575. 12779788. 369844.
## 3 2016 70.6 35.3 25852439. 12926219. 366364.
## 4 2017 70.8 35.4 26063235. 13031617. 368373.
## 5 2018 70.4 35.2 25823833. 12911916. 366959.
## # ... with 5 more variables: BIO_AG_ACRE_SE <dbl>, CARB_AG_ACRE_SE <dbl>,
## # BIO_AG_TOTAL_SE <dbl>, CARB_AG_TOTAL_SE <dbl>, AREA_TOTAL_SE <dbl>
Adding grouping variables
Unlike estimation without sampling errors, we can NOT just add grouping variables to the above procedures in our group_by
call. Rather, we will need to account for absence points here (or zero-length outputs) or our estimates will be artificially inflated if groups are not mutually exclusive at the plot level. Example: The presence of red oak on a plot does not negate the potential presence of white ash on the same plot. Therefor, there should be a zero listed for each species not found on the plot. We accomplish this by treating each group as a seperate population, computing each individually, and then rejoining the groups at the end of the operation.
First let’s make a dataframe where each row defines a group that we want to estimate:
## All groups we want to estimate
grps <- data %>%
group_by(YEAR, SPCD) %>%
summarize()
Each row in grps
now defines an individual population that we would like to produce estimates for, thus our end product should have the same number of rows. Thus, we can loop over each of the rows in grps
and use the estimation procedures above to estimate attributes for each group.
Before we get started with the loop we need to find a way to define the population of interest for each interation (row). To do that, we will modify our domain indicators from above to reflect whether or not an observation falls within the population defined by grps[i,]
. Saving the original domain indicators as variables so they are not overwritten in the loop:
tDI <- data$tDI
aDI <- data$aDI
Now we can start our loop:
## Let's store each output in a list
bio_pop <- list()
## Looping
for (i in 1:nrow(grps)){
## Tree domain indicator
data$tDI <- tDI * if_else(data$SPCD == grps$SPCD[i], 1, 0) * if_else(data$YEAR == grps$YEAR[i], 1, 0)
## No need to modify the area domain indicator for SPCD, because
## we want to estimate all forested area within the year
data$aDI <- aDI * if_else(data$YEAR == grps$YEAR[i], 1, 0)
## SAME AS ABOVE, JUST IN A LOOP NOW --> SIMILAR TO USING GROUP_BY
## Estimate Area attributes -- plot level
tre_pop <- data %>%
filter(EVAL_TYP == 'EXPVOL') %>%
## Make sure we only have unique observations of plots, trees, etc.
distinct(ESTN_UNIT_CN, STRATUM_CN, PLT_CN, CONDID, SUBP, TREE, .keep_all = TRUE) %>%
## Plot-level estimates first (note we are NOT using EXPNS here)
group_by(ESTN_UNIT_CN, STRATUM_CN, PLT_CN) %>%
summarize(bioPlot = sum(DRYBIO_AG * TPA_UNADJ * tAdj * tDI / 2000, na.rm = TRUE),
carbPlot = sum(CARBON_AG * TPA_UNADJ * tAdj * tDI / 2000, na.rm = TRUE))
## Estimate Area attributes -- plot level
area_pop <- data %>%
filter(EVAL_TYP == 'EXPCURR') %>%
## Make sure we only have unique observations of plots, trees, etc.
distinct(ESTN_UNIT_CN, STRATUM_CN, PLT_CN, CONDID, .keep_all = TRUE) %>%
## Plot-level estimates first (multiplying by EXPNS here)
## Extra grouping variables are only here so they are carried through
group_by(P1POINTCNT, P1PNTCNT_EU, P2POINTCNT, AREA_USED, ESTN_UNIT_CN, STRATUM_CN, PLT_CN) %>%
summarize(forArea = sum(CONDPROP_UNADJ * aAdj * aDI, na.rm = TRUE))
## Joining the two tables
bio_pop_plot <- left_join(tre_pop, area_pop)
## Now we can follow through with the remaining estimation procedures
## Strata level
bio_pop_strat <- bio_pop_plot %>%
group_by(ESTN_UNIT_CN, STRATUM_CN) %>%
summarize(aStrat = mean(forArea, na.rm = TRUE), # Area mean
bioStrat = mean(bioPlot, na.rm = TRUE), # Biomass mean
carbStrat = mean(carbPlot, na.rm = TRUE), # Carbon mean
## We don't want a vector of these values, since they are repeated
P2POINTCNT = first(P2POINTCNT),
AREA_USED = first(AREA_USED),
## Strata weight, relative to estimation unit
w = first(P1POINTCNT) / first(P1PNTCNT_EU),
## Strata level variances
aVar = (sum(forArea^2) - sum(P2POINTCNT * aStrat^2)) / (P2POINTCNT * (P2POINTCNT-1)),
bioVar = (sum(bioPlot^2) - sum(P2POINTCNT * bioStrat^2)) / (P2POINTCNT * (P2POINTCNT-1)),
carbVar = (sum(carbPlot^2) - sum(P2POINTCNT * carbStrat^2)) / (P2POINTCNT * (P2POINTCNT-1)),
## Strata level co-varainces
bioCV = (sum(forArea*bioPlot) - sum(P2POINTCNT * aStrat *bioStrat)) / (P2POINTCNT * (P2POINTCNT-1)),
carbCV = (sum(forArea*carbPlot) - sum(P2POINTCNT * aStrat *carbStrat)) / (P2POINTCNT * (P2POINTCNT-1)))
## Moving on to the estimation unit
bio_pop_eu <- bio_pop_strat %>%
group_by(ESTN_UNIT_CN) %>%
summarize(aEst = sum(aStrat * w, na.rm = TRUE) * first(AREA_USED), # Mean Area
bioEst = sum(bioStrat * w, na.rm = TRUE) * first(AREA_USED), # Mean biomass
carbEst = sum(carbStrat * w, na.rm = TRUE) * first(AREA_USED), # Mean carbon
## Estimation unit variances
aVar = (first(AREA_USED)^2 / sum(P2POINTCNT)) *
(sum(P2POINTCNT*w*aVar) + sum((1-w)*(P2POINTCNT/sum(P2POINTCNT))*aVar)),
bioVar = (first(AREA_USED)^2 / sum(P2POINTCNT)) *
(sum(P2POINTCNT*w*bioVar) + sum((1-w)*(P2POINTCNT/sum(P2POINTCNT))*bioVar)),
carbVar = (first(AREA_USED)^2 / sum(P2POINTCNT)) *
(sum(P2POINTCNT*w*carbVar) + sum((1-w)*(P2POINTCNT/sum(P2POINTCNT))*carbVar)),
## Estimation unit covariances
bioCV = (first(AREA_USED)^2 / sum(P2POINTCNT)) *
(sum(P2POINTCNT*w*bioCV) + sum((1-w)*(P2POINTCNT/sum(P2POINTCNT))*bioCV)),
carbCV = (first(AREA_USED)^2 / sum(P2POINTCNT)) *
(sum(P2POINTCNT*w*carbCV) + sum((1-w)*(P2POINTCNT/sum(P2POINTCNT))*carbCV)))
## Finally, sum across Estimation Units for totals
bio_pop_total <- bio_pop_eu %>%
summarize(AREA_TOTAL = sum(aEst, na.rm = TRUE),
BIO_AG_TOTAL = sum(bioEst, na.rm = TRUE),
CARB_AG_TOTAL = sum(carbEst, na.rm = TRUE),
## Ratios
BIO_AG_ACRE = BIO_AG_TOTAL / AREA_TOTAL,
CARB_AG_ACRE = CARB_AG_TOTAL / AREA_TOTAL,
## Total samping errors
AREA_TOTAL_SE = sqrt(sum(aVar, na.rm = TRUE)) / AREA_TOTAL * 100,
BIO_AG_TOTAL_SE = sqrt(sum(bioVar, na.rm = TRUE)) / BIO_AG_TOTAL * 100,
CARB_AG_TOTAL_SE = sqrt(sum(carbVar, na.rm = TRUE)) / CARB_AG_TOTAL * 100,
## Ratio variances
bioAcreVar = (1 / AREA_TOTAL^2) * (sum(bioVar) + (BIO_AG_ACRE^2)*sum(aVar) - 2*BIO_AG_ACRE*sum(bioCV)),
carbAcreVar = (1 / AREA_TOTAL^2) * (sum(carbVar) + (CARB_AG_ACRE^2)*sum(aVar) - 2*CARB_AG_ACRE*sum(carbCV)),
BIO_AG_ACRE_SE = sqrt(sum(bioAcreVar, na.rm = TRUE)) / BIO_AG_ACRE * 100,
CARB_AG_ACRE_SE = sqrt(sum(carbAcreVar, na.rm = TRUE)) / CARB_AG_ACRE * 100) %>%
## Re ordering, dropping variance
select(BIO_AG_ACRE, CARB_AG_ACRE, BIO_AG_TOTAL, CARB_AG_TOTAL, AREA_TOTAL, BIO_AG_ACRE_SE,
CARB_AG_ACRE_SE, BIO_AG_TOTAL_SE, CARB_AG_TOTAL_SE, AREA_TOTAL_SE)
## Saving the output in our list...
bio_pop[[i]] <- bio_pop_total
}
Great, we have our estimates! Except they are locked up in a list. Converting back to a data.frame
and rejoining with grps
:
bio_pop <- bind_rows(bio_pop)
bio_pop_sp <- bind_cols(grps, bio_pop)
Comparing with rFIA, we have a match!
head(biomass(clipFIA(fiaRI), totals = TRUE, bySpecies = TRUE))
## # A tibble: 6 x 40
## YEAR SPCD COMMON_NAME SCIENTIFIC_NAME NETVOL_ACRE SAWVOL_ACRE BIO_AG_ACRE
## <int> <int> <chr> <chr> <dbl> <dbl> <dbl>
## 1 2018 12 balsam fir Abies balsamea 0.422 0 0.00697
## 2 2018 43 Atlantic w~ Chamaecyparis ~ 2.62 0.742 0.0403
## 3 2018 68 eastern re~ Juniperus virg~ 1.34 0 0.0308
## 4 2018 126 pitch pine Pinus rigida 55.7 43.8 1.18
## 5 2018 129 eastern wh~ Pinus strobus 481. 406. 8.52
## 6 2018 130 Scotch pine Pinus sylvestr~ 0.156 0 0.00314
## # ... with 33 more variables: BIO_BG_ACRE <dbl>, BIO_ACRE <dbl>,
## # CARB_AG_ACRE <dbl>, CARB_BG_ACRE <dbl>, CARB_ACRE <dbl>,
## # NETVOL_TOTAL <dbl>, SAWVOL_TOTAL <dbl>, BIO_AG_TOTAL <dbl>,
## # BIO_BG_TOTAL <dbl>, BIO_TOTAL <dbl>, CARB_AG_TOTAL <dbl>,
## # CARB_BG_TOTAL <dbl>, CARB_TOTAL <dbl>, AREA_TOTAL <dbl>,
## # NETVOL_ACRE_SE <dbl>, SAWVOL_ACRE_SE <dbl>, BIO_AG_ACRE_SE <dbl>,
## # BIO_BG_ACRE_SE <dbl>, BIO_ACRE_SE <dbl>, CARB_AG_ACRE_SE <dbl>,
## # CARB_BG_ACRE_SE <dbl>, CARB_ACRE_SE <dbl>, NETVOL_TOTAL_SE <dbl>,
## # SAWVOL_TOTAL_SE <dbl>, BIO_AG_TOTAL_SE <dbl>, BIO_BG_TOTAL_SE <dbl>,
## # BIO_TOTAL_SE <dbl>, CARB_AG_TOTAL_SE <dbl>, CARB_BG_TOTAL_SE <dbl>,
## # CARB_TOTAL_SE <dbl>, AREA_TOTAL_SE <dbl>, nPlots_TREE <dbl>,
## # nPlots_AREA <dbl>
bio_pop_sp
## # A tibble: 253 x 12
## # Groups: YEAR [5]
## YEAR SPCD BIO_AG_ACRE CARB_AG_ACRE BIO_AG_TOTAL CARB_AG_TOTAL AREA_TOTAL
## <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2014 12 0.00708 0.00354 2604. 1302. 367884.
## 2 2014 43 0.0300 0.0150 11020. 5510. 367884.
## 3 2014 68 0.0293 0.0146 10763. 5382. 367884.
## 4 2014 96 0 0 0 0 367884.
## 5 2014 126 1.22 0.608 447070. 223535. 367884.
## 6 2014 129 7.99 4.00 2939774. 1469887. 367884.
## 7 2014 130 0.00325 0.00163 1197. 599. 367884.
## 8 2014 261 0.763 0.382 280807. 140404. 367884.
## 9 2014 313 0.0141 0.00703 5173. 2586. 367884.
## 10 2014 316 15.0 7.50 5516182. 2758091. 367884.
## # ... with 243 more rows, and 5 more variables: BIO_AG_ACRE_SE <dbl>,
## # CARB_AG_ACRE_SE <dbl>, BIO_AG_TOTAL_SE <dbl>, CARB_AG_TOTAL_SE <dbl>,
## # AREA_TOTAL_SE <dbl>