Skip to content
This repository has been archived by the owner on Apr 11, 2020. It is now read-only.

dis-organization/sfcc

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

sfcc

The goal of sfcc is to provide fast and flexible creation of sf simple features geometries.

Originally for sfc vectors of POINT, directly motivated by the set-based idioms of silicate, and the need to bypass the thorough checking done by the sf constructors.

The design for the code is this:

  • c++ function for each of POINT, MULTIPOINT, POLYGON, etc. returns a list of sfg
  • input is matrix of coordinates (or maybe a list) and its identifier XY (XYZ, XYZM, etc)
  • groups can be specified, object, subobject, path (equivalent to gibble)
  • the set-based rationale is more efficient for many cases than running the st_polygon and friends functions

For example

  • points_cpp compiles to an internal R function
  • wrap that with a points_rcpp R function to ensure sanity of inputs
  • wrap that with a mk_sfc_POINT R function to organize the inputs and apply the sfc scaffolding

For MULTIPOLYGON

  • multipolygons_cpp compiles to an internal R function
  • wrap that with multipolygons_rcpp R function
  • wrap that with mk_sfc_MULTIPOLYGON

What might also work is to have a single POLYGON constructor, that optionally applies the sfg level classing. Then MULTIPOLYGON wrapper can call that for each of its sub objects, with "no class please" and then apply its own class at the right level. But, these things are pretty simple so a bit of duplication is also ok.

It's not finalize what the best way to specify grouping is, but it's driven from silicate. (There's hopefully a flexible set of different ways to do this, including aes(x, y, group = ..., subobject = ...) and other basics like earcut.hpp, silicate objects, other types).

For now we don't consider GEOMETRY (sfc vectors with mixed topology).

Installation

I wouldn't use this as-is, yet. :)

Example

This example was motivated by this issue, trying to speed up sf creation for POINTs.

The improvement is fairly modest, so there might be a better Rcpp way to do this?

N <- 1e5
x <- data.frame(a = sample(letters, N, replace = TRUE),
                 lng = runif(N, -120, -100),
                 lat = runif(N, 30, 48))
## we avoid st_sf so we can compare apples (just the sfc creation)
sf_mk_points <- function(x, coords) {
        classdim = sf:::getClassDim(rep(0, length(coords)), length(coords), dim, "POINT")
        structure( lapply(split(as.vector(t(as.matrix(x[, coords]))),
                rep(seq_len(nrow(x)), each = length(coords))),
                ## it does help to create this as a template and update in this loop
                ## but not as much as list-creation in C++
                function(vec) structure(vec, class = classdim)),
            n_empty = 0L, precision = 0, crs = NA_crs_,
            bbox = structure(
                c(xmin = min(x[[coords[1]]], na.rm = TRUE),
                ymin = min(x[[coords[2]]], na.rm = TRUE),
                xmax = max(x[[coords[1]]], na.rm = TRUE),
                ymax = max(x[[coords[2]]], na.rm = TRUE)), class = "bbox"),
            class =  c("sfc_POINT", "sfc" ))
}

m <- cbind(x$lng, x$lat)
library(rbenchmark)

library(sf)
#> Linking to GEOS 3.6.2, GDAL 2.2.4, proj.4 4.9.3
benchmark(
  sfcc_pt =  sfcc::mk_sfc_POINT(m, crs = 4326),
  sf_pt = sf_mk_points(x, coords = c("lng", "lat"))
  ,replications = 10
)
#>      test replications elapsed relative user.self sys.self user.child
#> 2   sf_pt           10  16.844   17.787    16.815    0.016          0
#> 1 sfcc_pt           10   0.947    1.000     0.935    0.012          0
#>   sys.child
#> 2         0
#> 1         0

A polygon example (motivated by spex::polygonize)

qm <- quadmesh::quadmesh(raster::raster(volcano), z = NULL)
## a dummy structure to copy
template <- structure(list(cbind(1:5, 0)),
                      class = c("XY", "POLYGON", "sfg"))

nquads <- ncol(qm$ib)

idx <- rep(seq(0, nquads), each = 5L) + c(1L, 2L, 3L, 4L, 1L)
quadgroups <- rep(seq_len(nquads), each = 5)
idx <- rbind(qm$ib, qm$ib[1,])
xylist <- split(c(t(qm$vb[1:2, idx])), rep(quadgroups, 2L))


## slower code

system.time({
 l <- lapply(seq_along(xylist), function(ii) {
   template[[1L]][] <- xylist[[ii]]
   template
 })
})
#>    user  system elapsed 
#>   0.051   0.000   0.051


## Rcpp code is a bit faster

library(sfcc)
## we were working with a flat vector
xylistm <- lapply(xylist, function(v) matrix(v, ncol = 2))
system.time(p <-  polygons_rcpp(xylistm))
#>    user  system elapsed 
#>   0.012   0.000   0.012
identical(l, p)
#> [1] TRUE

About

sf constructors - FORGET THIS see sfheaders

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published