Estimating population attributes


In the code below, we highlight the basic estimation procedures used to compute population estimates of forest attributes from Forest Inventory and Analysis (FIA) data. We will demonstrate these procedures with the fiaRI dataset (included in the rFIA package) so that you can follow along. Download the R script for these examples here.

Our goal here is to estimate total tree biomass, total treecarbon (aboveground) and total forested area in the state of Rhode Island for the year 2018. From these totals, we will compute ratios of average tree biomass/ forested acre and average tree carbon/ forested acre. We will do this with and without sampling errors (without being much simpler), and show you how we handle grouped estimates in both cases. All estimates will be computed for live trees.

The source code for rFIA will vary slightly from that presented below as we designed rFIA to be as flexible and computationally efficient as possible. Despite the differences in syntax and structure, the estimation procedures presented below are identical to those in rFIA. You can find and download the full source code for rFIA from our GitHub Repo.



Data Preparation

First, let’s load some packages and the fiaRI dataset:

# Load some packages
library(rFIA)
library(dplyr)

# Load the fiaRI dataset (included in rFIA)
data(fiaRI)
db <- fiaRI

To compute population estimates of current area and current biomass/carbon from the FIADB, we need to identify and subset the necessary portions. To do this, we will use what FIA calls an EVALID, hence the name ‘EVALIDator’.

ids <- db$POP_EVAL %>%
  select('CN', 'END_INVYR', 'EVALID') %>%
  inner_join(select(db$POP_EVAL_TYP, c('EVAL_CN', 'EVAL_TYP')), by = c('CN' = 'EVAL_CN')) %>%
  ## Now we filter out everything except current area and 
  ## current volume ids
  filter(EVAL_TYP %in% c('EXPCURR', 'EXPVOL'))

## Now that we have those EVALIDs, let's use clipFIA to subset
db <- clipFIA(db, evalid = ids$EVALID)
## Warning in clipFIA(db, evalid = ids$EVALID): Returning subset by EVALID only.
## For most recent subset, specify `evalid = NULL` and `mostRecent = TRUE`.

Since we need some information stored in each of these tables to compute estimates, we will join them into one big dataframe (let’s call that data) that we can operate on.

## Select only the columns we need from each table, to keep things slim
PLOT <- select(db$PLOT, CN, MACRO_BREAKPOINT_DIA)
COND <- select(db$COND, PLT_CN, CONDID, CONDPROP_UNADJ, PROP_BASIS, COND_STATUS_CD, OWNGRPCD)
TREE <- select(db$TREE, PLT_CN, CONDID, SUBP, TREE, STATUSCD, DRYBIO_AG, CARBON_AG, TPA_UNADJ, DIA, SPCD)
POP_ESTN_UNIT <- select(db$POP_ESTN_UNIT, CN, EVAL_CN, AREA_USED, P1PNTCNT_EU)
POP_EVAL <- select(db$POP_EVAL, EVALID, EVAL_GRP_CN, ESTN_METHOD, CN, END_INVYR, REPORT_YEAR_NM)
POP_EVAL_TYP <- select(db$POP_EVAL_TYP, EVAL_TYP, EVAL_CN)
POP_PLOT_STRATUM_ASSGN <- select(db$POP_PLOT_STRATUM_ASSGN, STRATUM_CN, PLT_CN)
POP_STRATUM <- select(db$POP_STRATUM, ESTN_UNIT_CN, EXPNS, P2POINTCNT, 
                      ADJ_FACTOR_MICR, ADJ_FACTOR_SUBP, ADJ_FACTOR_MACR, CN, P1POINTCNT)

## Join the tables
data <- PLOT %>%
  ## Add a PLT_CN column for easy joining
  mutate(PLT_CN = CN) %>%
  ## Join COND & TREE
  left_join(COND, by = 'PLT_CN') %>%
  left_join(TREE, by = c('PLT_CN', 'CONDID')) %>%
  ## Population tables
  left_join(POP_PLOT_STRATUM_ASSGN, by = 'PLT_CN') %>%
  left_join(POP_STRATUM, by = c('STRATUM_CN' = 'CN')) %>%
  left_join(POP_ESTN_UNIT, by = c('ESTN_UNIT_CN' = 'CN')) %>%
  left_join(POP_EVAL, by = c('EVAL_CN' = 'CN')) %>%
  left_join(POP_EVAL_TYP, by = 'EVAL_CN')

Now let’s make a column that will adjust for non-response in our sample (See Bechtold and Patterson (2005), 3.4.3 ‘Nonsampled Plots and Plot Replacement’). Since we know there are no macroplots in RI, we don’t really need to worry about that here, but we will show you anyways.

## Make some adjustment factors
data <- data %>%
  mutate(
    ## AREA
    aAdj = case_when(
           ## When NA, stay NA
           is.na(PROP_BASIS) ~ NA_real_,
           ## If the proportion was measured for a macroplot,
           ## use the macroplot value
           PROP_BASIS == 'MACR' ~ as.numeric(ADJ_FACTOR_MACR),
           ## Otherwise, use the subpplot value
           PROP_BASIS == 'SUBP' ~ ADJ_FACTOR_SUBP),
    ## TREE
    tAdj = case_when(
           ## When DIA is na, adjustment is NA
           is.na(DIA) ~ ADJ_FACTOR_SUBP,
           ## When DIA is less than 5", use microplot value
           DIA < 5 ~ ADJ_FACTOR_MICR,
           ## When DIA is greater than 5", use subplot value
           DIA >= 5 ~ ADJ_FACTOR_SUBP
         ))

Next, we need to construct what Bechtold and Patterson (2005) called a ‘domain indicator function’. (see Eq. 4.1, pg. 47 of the publication). This is essentially just a vector which indicates whether a tree (or plot, condition, etc.) is within our domain of interest (live trees on forest land).

To construct the domain indicator, we just need a vector which is the same length as our joined table (data), and takes a value of 1 if the stem (or condition) is in the domain and 0 otherwise. We build seperate domain indicators for estimating tree totals and area totals, because we can specify different domains of interest for both. For example, if we used our tree domain (live trees on forest land) to estimate area, then we would not actually be estimating the full forested area in RI. Instead we would estimate the forested area ONLY where live trees are currently present.

## Build a domain indicator for land type and live trees
## Land type (all forested area)
data$aDI <- if_else(data$COND_STATUS_CD == 1, 1, 0)
## Live trees only (on forested area)
data$tDI <- if_else(data$STATUSCD == 1, 1, 0) * data$aDI

## Now, let's just rename the END_INVYR column to 'YEAR'
## for a pretty output like rFIA
data <- data %>%
  mutate(YEAR = END_INVYR) %>%
  ## remove any NAs
  filter(!is.na(YEAR))



Without Sampling Errors

Now we are ready to start computing estimates. If we don’t care aboute sampling errors, we can use the EXPNS column in the POP_STRATUM table to make our lives easier. EXPNS is an expansion factor which descibes the area, in acres, that a stratum represents divided by the number of sampled plots in that stratum (see Bechtold and Patterson (2005), section 4.2 for more information on FIA stratification procedures). When summed across summed across all plots in the population of interest, EXPNS allows us to easily obtain estimates of population totals, without worrying about fancy stratifaction procedures and variance estimators.

No grouping variables

First we compute totals for biomass, carbon, and forested area:

## Estimate Tree totals
tre_bio <- data %>%
  filter(EVAL_TYP == 'EXPVOL') %>%
  ## Make sure we only have unique observations of plots, trees, etc.
  distinct(ESTN_UNIT_CN, STRATUM_CN, PLT_CN, CONDID, SUBP, TREE, .keep_all = TRUE) %>%
  ## Plot-level estimates first (multiplying by EXPNS here)
  group_by(YEAR, ESTN_UNIT_CN, ESTN_METHOD, STRATUM_CN, PLT_CN) %>%
  summarize(bioPlot = sum(DRYBIO_AG * TPA_UNADJ * tAdj * tDI * EXPNS  / 2000, na.rm = TRUE),
            carbPlot = sum(CARBON_AG * TPA_UNADJ * tAdj * tDI * EXPNS  / 2000, na.rm = TRUE)) %>%
  ## Now we simply sum the values of each plot (expanded w/ EXPNS)
  ## to obtain population totals
  group_by(YEAR) %>%
  summarize(BIO_AG_TOTAL = sum(bioPlot, na.rm = TRUE),
            CARB_AG_TOTAL = sum(carbPlot, na.rm = TRUE))

## Estimate Area totals
area_bio <- data %>%
  filter(EVAL_TYP == 'EXPCURR') %>%
  ## Make sure we only have unique observations of plots, trees, etc.
  distinct(ESTN_UNIT_CN, STRATUM_CN, PLT_CN, CONDID, .keep_all = TRUE) %>%
  ## Plot-level estimates first (multiplying by EXPNS here)
  group_by(YEAR, ESTN_UNIT_CN, ESTN_METHOD, STRATUM_CN, PLT_CN) %>%
  summarize(forArea = sum(CONDPROP_UNADJ * aAdj * aDI * EXPNS, na.rm = TRUE)) %>%
  ## Now we simply sum the values of each plot (expanded w/ EXPNS)
  ## to obtain population totals
  group_by(YEAR) %>%
  summarize(AREA_TOTAL = sum(forArea, na.rm = TRUE))

Then we can join these tables up, and produce ratio estimates:

bio <- left_join(tre_bio, area_bio) %>%
  mutate(BIO_AG_ACRE = BIO_AG_TOTAL / AREA_TOTAL,
         CARB_AG_ACRE = CARB_AG_TOTAL / AREA_TOTAL) %>%
  ## Reordering the columns
  select(YEAR, BIO_AG_ACRE, CARB_AG_ACRE, BIO_AG_TOTAL, CARB_AG_TOTAL, AREA_TOTAL)
## Joining, by = "YEAR"

Comparing with rFIA, we get a match!

biomass(clipFIA(fiaRI), totals = TRUE)
## # A tibble: 1 x 37
##    YEAR NETVOL_ACRE SAWVOL_ACRE BIO_AG_ACRE BIO_BG_ACRE BIO_ACRE CARB_AG_ACRE
##   <int>       <dbl>       <dbl>       <dbl>       <dbl>    <dbl>        <dbl>
## 1  2018       2491.       1419.        70.4        14.0     84.4         35.2
## # ... with 30 more variables: CARB_BG_ACRE <dbl>, CARB_ACRE <dbl>,
## #   NETVOL_TOTAL <dbl>, SAWVOL_TOTAL <dbl>, BIO_AG_TOTAL <dbl>,
## #   BIO_BG_TOTAL <dbl>, BIO_TOTAL <dbl>, CARB_AG_TOTAL <dbl>,
## #   CARB_BG_TOTAL <dbl>, CARB_TOTAL <dbl>, AREA_TOTAL <dbl>,
## #   NETVOL_ACRE_SE <dbl>, SAWVOL_ACRE_SE <dbl>, BIO_AG_ACRE_SE <dbl>,
## #   BIO_BG_ACRE_SE <dbl>, BIO_ACRE_SE <dbl>, CARB_AG_ACRE_SE <dbl>,
## #   CARB_BG_ACRE_SE <dbl>, CARB_ACRE_SE <dbl>, NETVOL_TOTAL_SE <dbl>,
## #   SAWVOL_TOTAL_SE <dbl>, BIO_AG_TOTAL_SE <dbl>, BIO_BG_TOTAL_SE <dbl>,
## #   BIO_TOTAL_SE <dbl>, CARB_AG_TOTAL_SE <dbl>, CARB_BG_TOTAL_SE <dbl>,
## #   CARB_TOTAL_SE <dbl>, AREA_TOTAL_SE <dbl>, nPlots_TREE <dbl>,
## #   nPlots_AREA <dbl>
bio
## # A tibble: 5 x 6
##    YEAR BIO_AG_ACRE CARB_AG_ACRE BIO_AG_TOTAL CARB_AG_TOTAL AREA_TOTAL
##   <int>       <dbl>        <dbl>        <dbl>         <dbl>      <dbl>
## 1  2014        68.0         34.0    25033359.     12516679.    367884.
## 2  2015        69.1         34.6    25559575.     12779788.    369844.
## 3  2016        70.6         35.3    25852439.     12926219.    366364.
## 4  2017        70.8         35.4    26063235.     13031617.    368373.
## 5  2018        70.4         35.2    25823833.     12911916.    366959.


Adding grouping variables

To add grouping variables to the above procedures, we can simply add the names of the variables we wish to group by to the group_by call:

## Grouping by Ownership group (OWNGRPCD)
## Estimate Tree totals
tre_bioGrp <- data %>%
  filter(EVAL_TYP == 'EXPVOL') %>%
  ## Make sure we only have unique observations of plots, trees, etc.
  distinct(ESTN_UNIT_CN, STRATUM_CN, PLT_CN, CONDID, SUBP, TREE, .keep_all = TRUE) %>%
  ## Plot-level estimates first (multiplying by EXPNS here)
  group_by(YEAR, OWNGRPCD, ESTN_UNIT_CN, ESTN_METHOD, STRATUM_CN, PLT_CN) %>%
  summarize(bioPlot = sum(DRYBIO_AG * TPA_UNADJ * tAdj * tDI * EXPNS  / 2000, na.rm = TRUE),
            carbPlot = sum(CARBON_AG * TPA_UNADJ * tAdj * tDI * EXPNS  / 2000, na.rm = TRUE)) %>%
  ## Now we simply sum the values of each plot (expanded w/ EXPNS)
  ## to obtain population totals
  group_by(YEAR, OWNGRPCD) %>%
  summarize(BIO_AG_TOTAL = sum(bioPlot, na.rm = TRUE),
            CARB_AG_TOTAL = sum(carbPlot, na.rm = TRUE))

## Estimate Area totals
area_bioGrp <- data %>%
  filter(EVAL_TYP == 'EXPCURR') %>%
  ## Make sure we only have unique observations of plots, trees, etc.
  distinct(ESTN_UNIT_CN, STRATUM_CN, PLT_CN, CONDID, .keep_all = TRUE) %>%
  ## Plot-level estimates first (multiplying by EXPNS here)
  group_by(YEAR, OWNGRPCD, ESTN_UNIT_CN, ESTN_METHOD, STRATUM_CN, PLT_CN) %>%
  summarize(forArea = sum(CONDPROP_UNADJ * aAdj * aDI * EXPNS, na.rm = TRUE)) %>%
  ## Now we simply sum the values of each plot (expanded w/ EXPNS)
  ## to obtain population totals
  group_by(YEAR, OWNGRPCD) %>%
  summarize(AREA_TOTAL = sum(forArea, na.rm = TRUE))

## Now we can simply join these two up, and produce ratio estimates
bioGrp <- left_join(tre_bioGrp, area_bioGrp) %>%
  mutate(BIO_AG_ACRE = BIO_AG_TOTAL / AREA_TOTAL,
         CARB_AG_ACRE = CARB_AG_TOTAL / AREA_TOTAL) %>%
  ## Reordering the columns
  select(YEAR, OWNGRPCD, BIO_AG_ACRE, CARB_AG_ACRE, BIO_AG_TOTAL, CARB_AG_TOTAL, AREA_TOTAL)
## Joining, by = c("YEAR", "OWNGRPCD")
## Now let's compare with rFIA.... looks like a match!
biomass(clipFIA(fiaRI), totals = TRUE, grpBy = OWNGRPCD)
## # A tibble: 2 x 38
##    YEAR OWNGRPCD NETVOL_ACRE SAWVOL_ACRE BIO_AG_ACRE BIO_BG_ACRE BIO_ACRE
##   <int>    <int>       <dbl>       <dbl>       <dbl>       <dbl>    <dbl>
## 1  2018       30       2615.       1614.        71.8        14.4     86.2
## 2  2018       40       2436.       1332.        69.7        13.8     83.6
## # ... with 31 more variables: CARB_AG_ACRE <dbl>, CARB_BG_ACRE <dbl>,
## #   CARB_ACRE <dbl>, NETVOL_TOTAL <dbl>, SAWVOL_TOTAL <dbl>,
## #   BIO_AG_TOTAL <dbl>, BIO_BG_TOTAL <dbl>, BIO_TOTAL <dbl>,
## #   CARB_AG_TOTAL <dbl>, CARB_BG_TOTAL <dbl>, CARB_TOTAL <dbl>,
## #   AREA_TOTAL <dbl>, NETVOL_ACRE_SE <dbl>, SAWVOL_ACRE_SE <dbl>,
## #   BIO_AG_ACRE_SE <dbl>, BIO_BG_ACRE_SE <dbl>, BIO_ACRE_SE <dbl>,
## #   CARB_AG_ACRE_SE <dbl>, CARB_BG_ACRE_SE <dbl>, CARB_ACRE_SE <dbl>,
## #   NETVOL_TOTAL_SE <dbl>, SAWVOL_TOTAL_SE <dbl>, BIO_AG_TOTAL_SE <dbl>,
## #   BIO_BG_TOTAL_SE <dbl>, BIO_TOTAL_SE <dbl>, CARB_AG_TOTAL_SE <dbl>,
## #   CARB_BG_TOTAL_SE <dbl>, CARB_TOTAL_SE <dbl>, AREA_TOTAL_SE <dbl>,
## #   nPlots_TREE <dbl>, nPlots_AREA <dbl>
bioGrp
## # A tibble: 15 x 7
## # Groups:   YEAR [5]
##     YEAR OWNGRPCD BIO_AG_ACRE CARB_AG_ACRE BIO_AG_TOTAL CARB_AG_TOTAL AREA_TOTAL
##    <int>    <int>       <dbl>        <dbl>        <dbl>         <dbl>      <dbl>
##  1  2014       30        69.9         34.9     7513432.      3756716.    107532.
##  2  2014       40        67.3         33.6    17519927.      8759963.    260353.
##  3  2014       NA       NaN          NaN             0             0          0 
##  4  2015       30        70.7         35.3     7441888.      3720944.    105299.
##  5  2015       40        68.5         34.2    18117688.      9058844.    264546.
##  6  2015       NA       NaN          NaN             0             0          0 
##  7  2016       30        71.8         35.9     7666303.      3833151.    106763.
##  8  2016       40        70.1         35.0    18186136.      9093068.    259601.
##  9  2016       NA       NaN          NaN             0             0          0 
## 10  2017       30        70.5         35.3     8015385.      4007692.    113648.
## 11  2017       40        70.9         35.4    18047850.      9023925.    254725.
## 12  2017       NA       NaN          NaN             0             0          0 
## 13  2018       30        71.8         35.9     8090815.      4045408.    112702.
## 14  2018       40        69.7         34.9    17733018.      8866509.    254256.
## 15  2018       NA       NaN          NaN             0             0          0
If adapting this code for your own use, make sure that your grouping variables are included in the select calls in Data Preperation, otherwise the variable will not be found in data.



With Sampling Errors

Computing estimates with associated sampling errors is a bit more complex than what we saw above, as we can no longer rely on EXPNS to do the heavy lifting for us. In short, we will add a few additional steps when computing tree totals and area totals, summarizing at the strata and estimation unit level along the way. When adding grouping variables, we will need to modify our code further, treating each group as a unique population and summarizing these populations individually. We will follow the procedures outlined by Bechtold and Patterson (2005) (see section 4) for our computations.

Before we get started, Let’s check out what type of estimation methods were used to determine values in the POP tables, as this will influence which variance estimators we use.

unique(db$POP_EVAL$ESTN_METHOD)
## [1] "Post-Stratification"

Looks like just post-stratification, so we will use the stratified random sampling estimation approach (known weights)

No grouping variables

First we estimate both tree and area attributes at the plot level, and then join these estimates before proceeding to the strata level so that we can compute the covariance between area and tree attributes

## Estimate Tree attributes -- plot level
tre_pop <- data %>%
  filter(EVAL_TYP == 'EXPVOL') %>%
  ## Make sure we only have unique observations of plots, trees, etc.
  distinct(ESTN_UNIT_CN, STRATUM_CN, PLT_CN, CONDID, SUBP, TREE, .keep_all = TRUE) %>%
  ## Plot-level estimates first (note we are NOT using EXPNS here)
  group_by(YEAR, ESTN_UNIT_CN, STRATUM_CN, PLT_CN) %>%
  summarize(bioPlot = sum(DRYBIO_AG * TPA_UNADJ * tAdj * tDI  / 2000, na.rm = TRUE),
            carbPlot = sum(CARBON_AG * TPA_UNADJ * tAdj * tDI  / 2000, na.rm = TRUE)) 

## Estimate Area attributes -- plot level
area_pop <- data %>%
  filter(EVAL_TYP == 'EXPCURR') %>%
  ## Make sure we only have unique observations of plots, trees, etc.
  distinct(ESTN_UNIT_CN, STRATUM_CN, PLT_CN, CONDID, .keep_all = TRUE) %>%
  ## Plot-level estimates first (multiplying by EXPNS here)
  ## Extra grouping variables are only here so they are carried through
  group_by(YEAR, P1POINTCNT, P1PNTCNT_EU, P2POINTCNT, AREA_USED, ESTN_UNIT_CN, STRATUM_CN, PLT_CN) %>%
  summarize(forArea = sum(CONDPROP_UNADJ * aAdj * aDI, na.rm = TRUE))
  
## Joining the two tables
bio_pop_plot <- left_join(tre_pop, area_pop)
## Joining, by = c("YEAR", "ESTN_UNIT_CN", "STRATUM_CN", "PLT_CN")

Now that we have both area and tree attributes in the same table, we can follow through with the remaining estimation procedures at the strata and estimation unit levels:

## Strata level
bio_pop_strat <- bio_pop_plot %>%
  group_by(YEAR, ESTN_UNIT_CN, STRATUM_CN) %>%
  summarize(aStrat = mean(forArea, na.rm = TRUE), # Area mean
            bioStrat = mean(bioPlot, na.rm = TRUE), # Biomass mean
            carbStrat = mean(carbPlot, na.rm = TRUE), # Carbon mean
            ## We don't want a vector of these values, since they are repeated
            P2POINTCNT = first(P2POINTCNT),
            AREA_USED = first(AREA_USED),
            ## Strata weight, relative to estimation unit
            w = first(P1POINTCNT) / first(P1PNTCNT_EU),
            ## Strata level variances
            aVar = (sum(forArea^2) - sum(P2POINTCNT * aStrat^2)) / (P2POINTCNT * (P2POINTCNT-1)),
            bioVar = (sum(bioPlot^2) - sum(P2POINTCNT * bioStrat^2)) / (P2POINTCNT * (P2POINTCNT-1)),
            carbVar = (sum(carbPlot^2) - sum(P2POINTCNT * carbStrat^2)) / (P2POINTCNT * (P2POINTCNT-1)),
            ## Strata level co-varainces
            bioCV = (sum(forArea*bioPlot) - sum(P2POINTCNT * aStrat *bioStrat)) / (P2POINTCNT * (P2POINTCNT-1)),
            carbCV = (sum(forArea*carbPlot) - sum(P2POINTCNT * aStrat *carbStrat)) / (P2POINTCNT * (P2POINTCNT-1)))

## Moving on to the estimation unit
bio_pop_eu <- bio_pop_strat %>%
  group_by(YEAR, ESTN_UNIT_CN) %>%
  summarize(aEst = sum(aStrat * w, na.rm = TRUE) * first(AREA_USED), # Mean Area
            bioEst = sum(bioStrat * w, na.rm = TRUE) * first(AREA_USED), # Mean biomass
            carbEst = sum(carbStrat * w, na.rm = TRUE) * first(AREA_USED), # Mean carbon
            ## Estimation unit variances
            aVar = (first(AREA_USED)^2 / sum(P2POINTCNT)) * 
              (sum(P2POINTCNT*w*aVar) + sum((1-w)*(P2POINTCNT/sum(P2POINTCNT))*aVar)),
            bioVar = (first(AREA_USED)^2 / sum(P2POINTCNT)) * 
              (sum(P2POINTCNT*w*bioVar) + sum((1-w)*(P2POINTCNT/sum(P2POINTCNT))*bioVar)),
            carbVar = (first(AREA_USED)^2 / sum(P2POINTCNT)) * 
              (sum(P2POINTCNT*w*carbVar) + sum((1-w)*(P2POINTCNT/sum(P2POINTCNT))*carbVar)),
            ## Estimation unit covariances
            bioCV = (first(AREA_USED)^2 / sum(P2POINTCNT)) * 
              (sum(P2POINTCNT*w*bioCV) + sum((1-w)*(P2POINTCNT/sum(P2POINTCNT))*bioCV)),
            carbCV = (first(AREA_USED)^2 / sum(P2POINTCNT)) * 
              (sum(P2POINTCNT*w*carbCV) + sum((1-w)*(P2POINTCNT/sum(P2POINTCNT))*carbCV)))

Finally, we can sum attributes across estimation units to obtain totals for our region:

## sum across Estimation Units for totals
bio_pop <- bio_pop_eu %>%
  group_by(YEAR) %>%
  summarize(AREA_TOTAL = sum(aEst, na.rm = TRUE),
            BIO_AG_TOTAL = sum(bioEst, na.rm = TRUE),
            CARB_AG_TOTAL = sum(carbEst, na.rm = TRUE),
            ## Ratios
            BIO_AG_ACRE = BIO_AG_TOTAL / AREA_TOTAL,
            CARB_AG_ACRE = CARB_AG_TOTAL / AREA_TOTAL,
            ## Total samping errors
            AREA_TOTAL_SE = sqrt(sum(aVar, na.rm = TRUE)) / AREA_TOTAL * 100,
            BIO_AG_TOTAL_SE = sqrt(sum(bioVar, na.rm = TRUE)) / BIO_AG_TOTAL * 100, 
            CARB_AG_TOTAL_SE = sqrt(sum(carbVar, na.rm = TRUE)) / CARB_AG_TOTAL * 100,
            ## Ratio variances
            bioAcreVar = (1 / AREA_TOTAL^2) * (sum(bioVar) + (BIO_AG_ACRE^2)*sum(aVar) - 2*BIO_AG_ACRE*sum(bioCV)),
            carbAcreVar = (1 / AREA_TOTAL^2) * (sum(carbVar) + (CARB_AG_ACRE^2)*sum(aVar) - 2*CARB_AG_ACRE*sum(carbCV)),
            BIO_AG_ACRE_SE = sqrt(sum(bioAcreVar, na.rm = TRUE)) / BIO_AG_ACRE * 100,
            CARB_AG_ACRE_SE = sqrt(sum(carbAcreVar, na.rm = TRUE)) / CARB_AG_ACRE * 100) %>%
  ## Re ordering, dropping variance
  select(YEAR, BIO_AG_ACRE, CARB_AG_ACRE, BIO_AG_TOTAL, CARB_AG_TOTAL, AREA_TOTAL, BIO_AG_ACRE_SE, 
         CARB_AG_ACRE_SE, BIO_AG_TOTAL_SE, CARB_AG_TOTAL_SE, AREA_TOTAL_SE)

Comparing with rFIA, we get a match!

biomass(clipFIA(fiaRI), totals = TRUE)
## # A tibble: 1 x 37
##    YEAR NETVOL_ACRE SAWVOL_ACRE BIO_AG_ACRE BIO_BG_ACRE BIO_ACRE CARB_AG_ACRE
##   <int>       <dbl>       <dbl>       <dbl>       <dbl>    <dbl>        <dbl>
## 1  2018       2491.       1419.        70.4        14.0     84.4         35.2
## # ... with 30 more variables: CARB_BG_ACRE <dbl>, CARB_ACRE <dbl>,
## #   NETVOL_TOTAL <dbl>, SAWVOL_TOTAL <dbl>, BIO_AG_TOTAL <dbl>,
## #   BIO_BG_TOTAL <dbl>, BIO_TOTAL <dbl>, CARB_AG_TOTAL <dbl>,
## #   CARB_BG_TOTAL <dbl>, CARB_TOTAL <dbl>, AREA_TOTAL <dbl>,
## #   NETVOL_ACRE_SE <dbl>, SAWVOL_ACRE_SE <dbl>, BIO_AG_ACRE_SE <dbl>,
## #   BIO_BG_ACRE_SE <dbl>, BIO_ACRE_SE <dbl>, CARB_AG_ACRE_SE <dbl>,
## #   CARB_BG_ACRE_SE <dbl>, CARB_ACRE_SE <dbl>, NETVOL_TOTAL_SE <dbl>,
## #   SAWVOL_TOTAL_SE <dbl>, BIO_AG_TOTAL_SE <dbl>, BIO_BG_TOTAL_SE <dbl>,
## #   BIO_TOTAL_SE <dbl>, CARB_AG_TOTAL_SE <dbl>, CARB_BG_TOTAL_SE <dbl>,
## #   CARB_TOTAL_SE <dbl>, AREA_TOTAL_SE <dbl>, nPlots_TREE <dbl>,
## #   nPlots_AREA <dbl>
bio_pop
## # A tibble: 5 x 11
##    YEAR BIO_AG_ACRE CARB_AG_ACRE BIO_AG_TOTAL CARB_AG_TOTAL AREA_TOTAL
##   <int>       <dbl>        <dbl>        <dbl>         <dbl>      <dbl>
## 1  2014        68.0         34.0    25033359.     12516679.    367884.
## 2  2015        69.1         34.6    25559575.     12779788.    369844.
## 3  2016        70.6         35.3    25852439.     12926219.    366364.
## 4  2017        70.8         35.4    26063235.     13031617.    368373.
## 5  2018        70.4         35.2    25823833.     12911916.    366959.
## # ... with 5 more variables: BIO_AG_ACRE_SE <dbl>, CARB_AG_ACRE_SE <dbl>,
## #   BIO_AG_TOTAL_SE <dbl>, CARB_AG_TOTAL_SE <dbl>, AREA_TOTAL_SE <dbl>


Adding grouping variables

Unlike estimation without sampling errors, we can NOT just add grouping variables to the above procedures in our group_by call. Rather, we will need to account for absence points here (or zero-length outputs) or our estimates will be artificially inflated if groups are not mutually exclusive at the plot level. Example: The presence of red oak on a plot does not negate the potential presence of white ash on the same plot. Therefor, there should be a zero listed for each species not found on the plot. We accomplish this by treating each group as a seperate population, computing each individually, and then rejoining the groups at the end of the operation.

First let’s make a dataframe where each row defines a group that we want to estimate:

## All groups we want to estimate 
grps <- data %>%
  group_by(YEAR, SPCD) %>%
  summarize()

Each row in grps now defines an individual population that we would like to produce estimates for, thus our end product should have the same number of rows. Thus, we can loop over each of the rows in grps and use the estimation procedures above to estimate attributes for each group.

Before we get started with the loop we need to find a way to define the population of interest for each interation (row). To do that, we will modify our domain indicators from above to reflect whether or not an observation falls within the population defined by grps[i,]. Saving the original domain indicators as variables so they are not overwritten in the loop:

tDI <- data$tDI
aDI <- data$aDI

Now we can start our loop:

## Let's store each output in a list
bio_pop <- list()
## Looping
for (i in 1:nrow(grps)){
  
  ## Tree domain indicator
  data$tDI <- tDI * if_else(data$SPCD == grps$SPCD[i], 1, 0) * if_else(data$YEAR == grps$YEAR[i], 1, 0)
  ## No need to modify the area domain indicator for SPCD, because
  ## we want to estimate all forested area within the year
  data$aDI <- aDI * if_else(data$YEAR == grps$YEAR[i], 1, 0)
  
  ## SAME AS ABOVE, JUST IN A LOOP NOW --> SIMILAR TO USING GROUP_BY
  ## Estimate Area attributes -- plot level
  tre_pop <- data %>%
    filter(EVAL_TYP == 'EXPVOL') %>%
    ## Make sure we only have unique observations of plots, trees, etc.
    distinct(ESTN_UNIT_CN, STRATUM_CN, PLT_CN, CONDID, SUBP, TREE, .keep_all = TRUE) %>%
    ## Plot-level estimates first (note we are NOT using EXPNS here)
    group_by(ESTN_UNIT_CN, STRATUM_CN, PLT_CN) %>%
    summarize(bioPlot = sum(DRYBIO_AG * TPA_UNADJ * tAdj * tDI  / 2000, na.rm = TRUE),
              carbPlot = sum(CARBON_AG * TPA_UNADJ * tAdj * tDI  / 2000, na.rm = TRUE)) 
  
  ## Estimate Area attributes -- plot level
  area_pop <- data %>%
    filter(EVAL_TYP == 'EXPCURR') %>%
    ## Make sure we only have unique observations of plots, trees, etc.
    distinct(ESTN_UNIT_CN, STRATUM_CN, PLT_CN, CONDID, .keep_all = TRUE) %>%
    ## Plot-level estimates first (multiplying by EXPNS here)
    ## Extra grouping variables are only here so they are carried through
    group_by(P1POINTCNT, P1PNTCNT_EU, P2POINTCNT, AREA_USED, ESTN_UNIT_CN, STRATUM_CN, PLT_CN) %>%
    summarize(forArea = sum(CONDPROP_UNADJ * aAdj * aDI, na.rm = TRUE))
  
  ## Joining the two tables
  bio_pop_plot <- left_join(tre_pop, area_pop)
  
  ## Now we can follow through with the remaining estimation procedures
  ## Strata level
  bio_pop_strat <- bio_pop_plot %>%
    group_by(ESTN_UNIT_CN, STRATUM_CN) %>%
    summarize(aStrat = mean(forArea, na.rm = TRUE), # Area mean
              bioStrat = mean(bioPlot, na.rm = TRUE), # Biomass mean
              carbStrat = mean(carbPlot, na.rm = TRUE), # Carbon mean
              ## We don't want a vector of these values, since they are repeated
              P2POINTCNT = first(P2POINTCNT),
              AREA_USED = first(AREA_USED),
              ## Strata weight, relative to estimation unit
              w = first(P1POINTCNT) / first(P1PNTCNT_EU),
              ## Strata level variances
              aVar = (sum(forArea^2) - sum(P2POINTCNT * aStrat^2)) / (P2POINTCNT * (P2POINTCNT-1)),
              bioVar = (sum(bioPlot^2) - sum(P2POINTCNT * bioStrat^2)) / (P2POINTCNT * (P2POINTCNT-1)),
              carbVar = (sum(carbPlot^2) - sum(P2POINTCNT * carbStrat^2)) / (P2POINTCNT * (P2POINTCNT-1)),
              ## Strata level co-varainces
              bioCV = (sum(forArea*bioPlot) - sum(P2POINTCNT * aStrat *bioStrat)) / (P2POINTCNT * (P2POINTCNT-1)),
              carbCV = (sum(forArea*carbPlot) - sum(P2POINTCNT * aStrat *carbStrat)) / (P2POINTCNT * (P2POINTCNT-1)))
  
  ## Moving on to the estimation unit
  bio_pop_eu <- bio_pop_strat %>%
    group_by(ESTN_UNIT_CN) %>%
    summarize(aEst = sum(aStrat * w, na.rm = TRUE) * first(AREA_USED), # Mean Area
              bioEst = sum(bioStrat * w, na.rm = TRUE) * first(AREA_USED), # Mean biomass
              carbEst = sum(carbStrat * w, na.rm = TRUE) * first(AREA_USED), # Mean carbon
              ## Estimation unit variances
              aVar = (first(AREA_USED)^2 / sum(P2POINTCNT)) * 
                (sum(P2POINTCNT*w*aVar) + sum((1-w)*(P2POINTCNT/sum(P2POINTCNT))*aVar)),
              bioVar = (first(AREA_USED)^2 / sum(P2POINTCNT)) * 
                (sum(P2POINTCNT*w*bioVar) + sum((1-w)*(P2POINTCNT/sum(P2POINTCNT))*bioVar)),
              carbVar = (first(AREA_USED)^2 / sum(P2POINTCNT)) * 
                (sum(P2POINTCNT*w*carbVar) + sum((1-w)*(P2POINTCNT/sum(P2POINTCNT))*carbVar)),
              ## Estimation unit covariances
              bioCV = (first(AREA_USED)^2 / sum(P2POINTCNT)) * 
                (sum(P2POINTCNT*w*bioCV) + sum((1-w)*(P2POINTCNT/sum(P2POINTCNT))*bioCV)),
              carbCV = (first(AREA_USED)^2 / sum(P2POINTCNT)) * 
                (sum(P2POINTCNT*w*carbCV) + sum((1-w)*(P2POINTCNT/sum(P2POINTCNT))*carbCV)))
  
  ## Finally, sum across Estimation Units for totals
  bio_pop_total <- bio_pop_eu %>%
    summarize(AREA_TOTAL = sum(aEst, na.rm = TRUE),
              BIO_AG_TOTAL = sum(bioEst, na.rm = TRUE),
              CARB_AG_TOTAL = sum(carbEst, na.rm = TRUE),
              ## Ratios
              BIO_AG_ACRE = BIO_AG_TOTAL / AREA_TOTAL,
              CARB_AG_ACRE = CARB_AG_TOTAL / AREA_TOTAL,
              ## Total samping errors
              AREA_TOTAL_SE = sqrt(sum(aVar, na.rm = TRUE)) / AREA_TOTAL * 100,
              BIO_AG_TOTAL_SE = sqrt(sum(bioVar, na.rm = TRUE)) / BIO_AG_TOTAL * 100, 
              CARB_AG_TOTAL_SE = sqrt(sum(carbVar, na.rm = TRUE)) / CARB_AG_TOTAL * 100,
              ## Ratio variances
              bioAcreVar = (1 / AREA_TOTAL^2) * (sum(bioVar) + (BIO_AG_ACRE^2)*sum(aVar) - 2*BIO_AG_ACRE*sum(bioCV)),
              carbAcreVar = (1 / AREA_TOTAL^2) * (sum(carbVar) + (CARB_AG_ACRE^2)*sum(aVar) - 2*CARB_AG_ACRE*sum(carbCV)),
              BIO_AG_ACRE_SE = sqrt(sum(bioAcreVar, na.rm = TRUE)) / BIO_AG_ACRE * 100,
              CARB_AG_ACRE_SE = sqrt(sum(carbAcreVar, na.rm = TRUE)) / CARB_AG_ACRE * 100) %>%
    ## Re ordering, dropping variance
    select(BIO_AG_ACRE, CARB_AG_ACRE, BIO_AG_TOTAL, CARB_AG_TOTAL, AREA_TOTAL, BIO_AG_ACRE_SE, 
           CARB_AG_ACRE_SE, BIO_AG_TOTAL_SE, CARB_AG_TOTAL_SE, AREA_TOTAL_SE)
  
  ## Saving the output in our list...
  bio_pop[[i]] <- bio_pop_total
}

Great, we have our estimates! Except they are locked up in a list. Converting back to a data.frame and rejoining with grps:

bio_pop <- bind_rows(bio_pop)
bio_pop_sp <- bind_cols(grps, bio_pop)

Comparing with rFIA, we have a match!

head(biomass(clipFIA(fiaRI), totals = TRUE, bySpecies = TRUE))
## # A tibble: 6 x 40
##    YEAR  SPCD COMMON_NAME SCIENTIFIC_NAME NETVOL_ACRE SAWVOL_ACRE BIO_AG_ACRE
##   <int> <int> <chr>       <chr>                 <dbl>       <dbl>       <dbl>
## 1  2018    12 balsam fir  Abies balsamea        0.422       0         0.00697
## 2  2018    43 Atlantic w~ Chamaecyparis ~       2.62        0.742     0.0403 
## 3  2018    68 eastern re~ Juniperus virg~       1.34        0         0.0308 
## 4  2018   126 pitch pine  Pinus rigida         55.7        43.8       1.18   
## 5  2018   129 eastern wh~ Pinus strobus       481.        406.        8.52   
## 6  2018   130 Scotch pine Pinus sylvestr~       0.156       0         0.00314
## # ... with 33 more variables: BIO_BG_ACRE <dbl>, BIO_ACRE <dbl>,
## #   CARB_AG_ACRE <dbl>, CARB_BG_ACRE <dbl>, CARB_ACRE <dbl>,
## #   NETVOL_TOTAL <dbl>, SAWVOL_TOTAL <dbl>, BIO_AG_TOTAL <dbl>,
## #   BIO_BG_TOTAL <dbl>, BIO_TOTAL <dbl>, CARB_AG_TOTAL <dbl>,
## #   CARB_BG_TOTAL <dbl>, CARB_TOTAL <dbl>, AREA_TOTAL <dbl>,
## #   NETVOL_ACRE_SE <dbl>, SAWVOL_ACRE_SE <dbl>, BIO_AG_ACRE_SE <dbl>,
## #   BIO_BG_ACRE_SE <dbl>, BIO_ACRE_SE <dbl>, CARB_AG_ACRE_SE <dbl>,
## #   CARB_BG_ACRE_SE <dbl>, CARB_ACRE_SE <dbl>, NETVOL_TOTAL_SE <dbl>,
## #   SAWVOL_TOTAL_SE <dbl>, BIO_AG_TOTAL_SE <dbl>, BIO_BG_TOTAL_SE <dbl>,
## #   BIO_TOTAL_SE <dbl>, CARB_AG_TOTAL_SE <dbl>, CARB_BG_TOTAL_SE <dbl>,
## #   CARB_TOTAL_SE <dbl>, AREA_TOTAL_SE <dbl>, nPlots_TREE <dbl>,
## #   nPlots_AREA <dbl>
bio_pop_sp
## # A tibble: 253 x 12
## # Groups:   YEAR [5]
##     YEAR  SPCD BIO_AG_ACRE CARB_AG_ACRE BIO_AG_TOTAL CARB_AG_TOTAL AREA_TOTAL
##    <int> <int>       <dbl>        <dbl>        <dbl>         <dbl>      <dbl>
##  1  2014    12     0.00708      0.00354        2604.         1302.    367884.
##  2  2014    43     0.0300       0.0150        11020.         5510.    367884.
##  3  2014    68     0.0293       0.0146        10763.         5382.    367884.
##  4  2014    96     0            0                 0             0     367884.
##  5  2014   126     1.22         0.608        447070.       223535.    367884.
##  6  2014   129     7.99         4.00        2939774.      1469887.    367884.
##  7  2014   130     0.00325      0.00163        1197.          599.    367884.
##  8  2014   261     0.763        0.382        280807.       140404.    367884.
##  9  2014   313     0.0141       0.00703        5173.         2586.    367884.
## 10  2014   316    15.0          7.50        5516182.      2758091.    367884.
## # ... with 243 more rows, and 5 more variables: BIO_AG_ACRE_SE <dbl>,
## #   CARB_AG_ACRE_SE <dbl>, BIO_AG_TOTAL_SE <dbl>, CARB_AG_TOTAL_SE <dbl>,
## #   AREA_TOTAL_SE <dbl>
Previous