# Using velox and dplyr to efficiently calculate proportional landcover (‘pland’) in R

## Introduction

Environmental covariates are fundamental to spatially-explicit modeling in ecology. There are hundreds of metrics that have been used to characterize and quantify landscapes in ways that are biologically relevant, from average rainfall to net primary production. One such environmental covariate that is particularly popular in distribution modeling is proportional landcover. Calculated using a buffer around each point, this metric describes the composition of the surrounding landscape in terms of the proportion of each represented landcover class. The historical context for this metric comes from the well-known software, FRAGSTATS (McGarigal and Marks 1995), which includes this calculation – termed “pland.”
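The calculation itself is simple to illustrate before we get into the spatial machinery: given the landcover class of every pixel falling inside a buffer, pland for a class is just that class's pixel count divided by the total pixel count. A minimal sketch in base R (the class codes here are invented for illustration):

```r
# hypothetical landcover class codes for 10 pixels inside one buffer
buffer_pixels = c(41, 41, 41, 21, 21, 11, 41, 21, 41, 41)

# pland: proportion of pixels belonging to each class
pland = table(buffer_pixels) / length(buffer_pixels)
print(pland)
# class 11 -> 0.1, class 21 -> 0.3, class 41 -> 0.6
```

The proportions for any one buffer always sum to 1, which also gives us a sanity check to use later.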

With R's rise in popularity, FRAGSTATS has largely been superseded by the R package `landscapemetrics`, which offers a broad set of tools and a straightforward workflow for landscape analysis. The tutorial I'm offering here, though, is tailored to calculating proportional landcover very efficiently, with a practical example using the National Land Cover Database (NLCD). Released by the USGS, NLCD is very commonly used for landscape analysis within the United States. One of its strengths is its fine spatial resolution of 30 m; this resolution provides an abundance of information to work with, but it is computationally intensive to process. To deal with this, we will use the R package `velox`, which was designed to achieve much faster raster extraction than other solutions within R. Much like the `dplyr` package, `velox` achieves this efficiency by essentially serving as a wrapper that carries out its operations in C++.

## Methods

### Generating sample coordinates

To start, we’ll load the following packages for use throughout the walkthrough.

```r
library(dplyr)
library(rnaturalearth)
library(sf)
```

Jumping into things, we need a polygon for Massachusetts, which we can get via the `rnaturalearth` package. Conveniently, this comes pre-packaged as an sf object, which means we can transform its CRS without first converting the object class. Since we’re dealing with proportional landcover, it’s very important that we use an equal-area projection so that the buffers around each extraction location are comparable. Here we use the Albers Equal Area projection, specified to match the CRS of the NLCD data. We then sample 10 locations from within this polygon to represent the points we intend to use for raster extraction, setting a seed first for reproducibility.

```r
state = ne_states(iso_a2 = "US", returnclass = "sf") %>%    # pull admin. bounds. for US
  filter(iso_3166_2 == "US-MA") %>%                         # select Massachusetts
  st_transform(crs = "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs") # nlcd crs

set.seed(1)                                                 # for reproducibility
pts = st_sample(state, size = 10, type = "regular")         # sample 10 points in polygon

# Plot!
plot(st_transform(pts, 4326), col = "red", pch = 20)
```

Fig. 1: Points plotted in red are locations for raster extraction

### Fetching NLCD data

Next we need to get the NLCD data. Luckily, this doesn’t involve any manual downloads or organizing files, etc.; instead, we can use `get_nlcd()` from the `FedData` package to do all of the legwork for us, passing it our state polygon as the template for the download area.

```r
library(FedData)

nlcd = get_nlcd(          # fn for pulling NLCD data
  template = state,       # download area (the Massachusetts polygon from above)
  label = "4pland",       # this is just a character string label
  year = 2011,            # specify the year (2016 should be available soon)
  force.redo = FALSE
)

# Plot!
plot(nlcd)
plot(as(pts, 'Spatial'), add = TRUE, pch = 20)  # overlay the extraction points
```

Fig. 2: Points plotted in black are locations for raster extraction

### Extract using raster::extract()

Most of us first learn how to do these calculations with the `raster` package, so we’ll go through that first for comparison with the same process in `velox`. In `raster`, this is a very simple operation: we pass the raster and the points to the `extract()` function, along with a buffer radius in meters (1000 m = 1 km). We also specify `df = TRUE` so that the output is a dataframe instead of a list. This is convenient and straightforward, and works well for this example, but it is computationally intensive and can become nearly intractable for very large rasters.

```r
library(raster)

ex.mat.r = extract(nlcd, as(pts, 'Spatial'), buffer = 1000, df = TRUE) # raster extract
```

### Extract using velox

This is where the `velox` method really shines: more complicated analyses with more points and larger areas (read: more pixels). First, we convert the raster to a velox object. Then we generate the point-specific buffers and convert them to a SpatialPolygonsDataFrame, making sure each buffer has an ID. At this point, we can extract the pixels within the buffers from the raster, which yields a dataframe just like the output of `raster::extract()`.

```r
library(velox)
library(sp)
library(rgeos)  # for gBuffer()

nlcd.vx = velox(stack(nlcd))                                      # raster for velox
sp.buff = gBuffer(as(pts, 'Spatial'), width = 1000, byid = TRUE)  # spatial buffers, radius in meters
buff.df = SpatialPolygonsDataFrame(
  sp.buff,
  data.frame(id = 1:length(sp.buff)),                             # set ids
  FALSE)                                                          # df of buffers
ex.mat.vx = nlcd.vx$extract(buff.df, df = TRUE)                   # extract buffers from velox raster
rm(nlcd.vx)  # removing the velox raster can free up memory
```
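If you want to see the difference for yourself, `system.time()` makes the comparison easy. Here is a self-contained sketch on a synthetic raster (the object names and class codes are made up for illustration; timings will vary with raster size and hardware):

```r
library(raster)
library(velox)
library(sp)
library(rgeos)

# synthetic categorical raster standing in for nlcd (arbitrary class codes)
r = raster(nrows = 1000, ncols = 1000, xmn = 0, xmx = 30000, ymn = 0, ymx = 30000)
values(r) = sample(c(11, 21, 41), ncell(r), replace = TRUE)

# a few extraction points with 1 km buffers
pts.sp = SpatialPoints(cbind(runif(10, 5000, 25000), runif(10, 5000, 25000)))
buffs  = gBuffer(pts.sp, width = 1000, byid = TRUE)

# raster::extract() timing
system.time(ex.r <- extract(r, pts.sp, buffer = 1000, df = TRUE))

# velox timing (conversion plus extraction)
r.vx = velox(stack(r))
system.time(ex.vx <- r.vx$extract(buffs, df = TRUE))
```

Note that building the velox object has its own one-time cost, so the payoff grows with the number of points you extract.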

### Calculate pland

No matter which method you choose, `dplyr` (with a little help from `tidyr`) is a great choice for carrying out the pland calculations, which you can see below. Compared to what would otherwise be a mess of nested `for` loops and `if` statements, this workflow is far more streamlined, easier to follow, and much faster.

```r
library(tidyr)  # for complete(), nesting(), and spread()

prop.lc = ex.mat.vx %>%
  setNames(c("ID", "lc_type")) %>%        # rename for ease
  group_by(ID, lc_type) %>%               # group by point (ID) and lc class
  summarise(n = n()) %>%                  # count the occurrences of each class
  mutate(pland = n / sum(n)) %>%          # calculate proportion
  ungroup() %>%                           # convert back to original form
  dplyr::select(ID, lc_type, pland) %>%   # keep only these vars
  complete(ID, nesting(lc_type),
           fill = list(pland = 0)) %>%    # fill in implicit landcover 0s
  spread(lc_type, pland)                  # convert to wide format
```

### Assign landcover class names

Finally, for convenience, we can map the landcover class numbers back to their original names. This part is a bit in-the-weeds, but it makes any following analyses far easier. In the code below, we pull a hash table from the `nlcd` object and then use `merge.data.frame()` to match values from the hash table to the dataframe containing the pland values. `merge.data.frame()` is an extremely useful function, similar to `VLOOKUP` in Excel, and is designed for exactly this purpose of mapping values from one dataframe to another.

```r
nlcd_cover_df = as.data.frame(nlcd@data@attributes[]) %>%         # reference the name attributes
  subset(NLCD.2011.Land.Cover.Class != "") %>%                    # only those that are named
  dplyr::select(ID, NLCD.2011.Land.Cover.Class)                   # keep only the ID and the lc class name
lc.col.nms = data.frame(ID = as.numeric(colnames(prop.lc[-1])))   # select landcover classes
matcher = merge.data.frame(x = lc.col.nms,                        # analogous to VLOOKUP in Excel
                           y = nlcd_cover_df,
                           by.x = "ID",
                           all.x = TRUE)
colnames(prop.lc) = c("ID", as.character(matcher[, 2]))           # assign new names
```

## Results

This looks great. As a quick check, the values in each row should sum to 1 across the landcover columns, and indeed they do!

```r
print(prop.lc)
```

```
# A tibble: 9 x 16
  ID    `Open Water` `Developed, Ope… `Developed, Low… `Developed, Med… `Developed, Hig… `Barren Land`
  <fct>        <dbl>            <dbl>            <dbl>            <dbl>            <dbl>         <dbl>
1 1          0                 0.288           0.314            0.300           0.0437         0.00175
2 2          0.00729           0.132           0.103            0.0545          0.00933        0
3 3          0.0140            0.130           0.0998           0.0692          0.0131         0.00146
4 4          0.00320           0.0300          0.0116           0.00146         0              0
5 5          0.00291           0.0921          0.0752           0.0807          0.0245         0
6 6          0.257             0.0364          0.105            0.0489          0.000582       0.143
7 7          0.00320           0.0501          0.00815          0.00844         0              0
8 8          0.00292           0.128           0.151            0.0814          0.00787        0
9 9          0.991             0               0                0               0              0.00437
# … with 9 more variables: `Deciduous Forest` <dbl>, `Evergreen Forest` <dbl>, `Mixed Forest` <dbl>,
#   `Shrub/Scrub` <dbl>, Herbaceuous <dbl>, `Hay/Pasture` <dbl>, `Cultivated Crops` <dbl>, `Woody
#   Wetlands` <dbl>, `Emergent Herbaceuous Wetlands` <dbl>
```
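That row-wise check can be automated rather than eyeballed. Here is a sketch using a mock table in the same shape as `prop.lc` (the column names and values are invented; substitute your real `prop.lc`):

```r
# mock pland table: one ID column, then one proportion column per landcover class
prop.lc.demo = data.frame(
  ID = 1:2,
  `Open Water` = c(0.25, 0.0),
  Forest       = c(0.50, 0.9),
  Developed    = c(0.25, 0.1),
  check.names = FALSE
)

row.sums = rowSums(prop.lc.demo[, -1])    # drop the ID column, sum proportions per point
stopifnot(all(abs(row.sums - 1) < 1e-6))  # every row should total 1
```

The tolerance guards against floating-point rounding; an exact `== 1` comparison can fail even when the proportions are correct.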

I hope this tutorial has provided a convenient and straightforward demonstration of how `velox` can fit into your workflow!

1. McGarigal, K. and B.J. Marks. 1995. FRAGSTATS: Spatial Pattern Analysis Program for Categorical and Continuous Maps. Computer software program produced by the authors at the University of Massachusetts, Amherst. Available at: http://www.umass.edu/landeco/research/fragstats/fragstats.html