May 2024

Reached 10K lines of code.

RcppEigen version upgraded to 0.3.4

RcppEigen has been updated to Eigen v3.4.0 (version 0.3.4 of the RcppEigen package released on CRAN on 2024-02-28). There are no more restrictions for using Eigen v3.3.9, therefore the whole C++ codebase will upgrade the Eigen version to the 3.4.0. Code will not compile anymore with Eigen v3.3.9 as it already makes use of slicing, indexing and Eigen support for STL iterators available only in the 3.4 version.

The fdaPDE-R package DESCRIPTION and the various CMakeLists.txt files have been updated. femR will need to update its DESCRIPTION file when aligned to the mainstream fdaPDE-core library.

As a consequence of this upgrade, there are no limitations for using the C++20 standard, as announced in February 2024.

Warnings are errors

From this update on, to achieve high-quality code, tests are compiled with the options -Wall -Wpedantic -Wextra -Werror. This setting enables the majority of the interesting warnings, and considers warnings as errors (-Werror flag). That said, warnings cannot be ignored anymore. This options should help also for a less painfull interface with the CRAN rules.

core

  • geometry: the geometry module is under a major rewriting, and is the first step toward a more general core library system. While work is still in progress, here a list of the already official features:

    Nomenclature

    Inside the geometry module, we will refer with cell what in finite element analysis is usually named element, that is, depending on the dimensionality of the domain, a cell will be a segment (1D, 1.5D), a triangle (2D, 2.5D) or a tetrahedron (3D).

    We will name node a vertex of a cell. For triangular meshes, the name edge refers to a boundary segment of a triangle. For tetrahedral meshes, the name face denotes a boundary triangle of a tetrahedron, while the name edge refer to a boundary segment of such triangle, i.e. is a 3D segment (to be cristal clear, a tetrahedron has 4 faces and 6 edges).

    • template Element<int LocalDim, int EmbedDim> has been removed. Element was a generic way to indicate a single segment/triangle/tetrahedron inside a mesh. Its presence was a problem, as the resulting interface was a bit controversial (for instance, a triangle had to define both a concept of face and of edge, which in this case are equivalent). Moreover, it was impossible to work on a simple cell without having an entire mesh structure, which was a bit annoying.

    The following structure has been introduced:

    • Simplex: Simplex<Order, EmbedDim> represents a generic simplex embedded in an EmbedDim-dimensional space. The template parameter Order sets the order of the simplex, e.g. depdending on its value, Simplex represents a segment (Order = 1), a triangle (Order = 2) or a tetrahedron (Order = 3). Higher orders are available, but from a geometrical viewpoint there is no interest in having Order > 3. The following geometric operations are available on a simplex:

      Simplex API

      measure()

      measure of the simplex.

      bounding_box()

      returns the smallest rectangle containing the simplex.

      barycenter()

      returns the midpoint (center of mass) of the simplex.

      barycentric_coords(const NodeType& p)

      given a point \(p\), rewrites it in the barycentric coordinate system of the simplex.

      circumcenter()

      returns the center of the (hyper)-sphere passing through the vertices of the simplex.

      circumradius()

      returns the radius of the (hyper)-sphere passing through the vertices of the simplex.

      contains(const NodeType& p)

      given a point \(p\) determines if the point is inside, on a boundary cell, on a vertex or outside of the simplex.

      supporting_plane()

      the HyperPlane<Order, EmbedDim> passing through the simplex.

      normal()

      the normal direction to the simplex (meaningfull only if Order != EmbedDim, otherwise returns zero).

      nearest(const NodeType& p)

      given a free point \(p\), finds the best approximation of \(p\) in the simplex (e.g., the point in the simplex which is nearest to \(p\)).

      Simplex<Order, EmbedDim> also provides a boundary_iterator type which let iterate over the boundary cells of the simplex as Simplex<Order - 1, EmbedDim> simplices (which means that you have all the API above for each boundary cell).

      Note

      The Simplex<Order, EmbedDim> data type is mainly for internal usage. It is in turn inherited by other high-level concepts. Check the Triangle, Tetrahedron, etc. types below. The point of Simplex is to be a free geometric object, i.e. without any connectivity information related to the presence of an underlying triangulation. Neverthless, having the ability to work on a free geometric object turns out to be usefull in many applications.

      Simplex API
      // create a 2D triangle (Simplex of order 2, embedded in a 2D space)
      SMatrix<2, 3> vertices;
      vertices <<
         0.0, 0.0,
         0.5, 0.0,
         0.0, 0.8;
      Simplex<2, 2> triangle(vertices);
      
      triangle.measure();
      triangle.circumcenter();
      // ... see the table above ...
      
      // compute the perimeter of the triangle using STL algorithms :)
      double p = std::accumulate(
         triangle.boundary_begin(), triangle.boundary_end(), 0.0,
         [](double perimeter, auto& f) {
            return perimeter + f.measure();
         });
      
      // explicit for loop over boundary edges
      for(typename Simplex<2, 2>::boundary_iterator it = boundary_begin(); it != boundary_end(); ++it) {
         it->normal();   // normal direction to edge
         it->measure();  // length of edge
         // ... all the standard Simplex API ...
      }
      
    • Elementary geometric entities: introduced templates Triangle, Tetrahedron and Segment.

      Triangle, Tetrahedron, Segment signature
      template <typename Triangulation> class Triangle    : public Simplex<2, Triangulation::embed_dim>;
      template <typename Triangulation> class Tetrahedron : public Simplex<3, 3>;
      template <typename Triangulation> class Segment     : public Simplex<1, Triangulation::embed_dim>;
      

      While Simplex<Order, EmbedDim> is a free geometric object, each of the above types have no meaning if not bounded to a triangulation, e.g. a Triangle exists only as part of a Triangulation<2, EmbedDim> object.

      In addition to the whole Simplex<Order, EmbedDim> API, the following methods related to connectivity information are available:

      Triangle specific API

      id()

      the triangle id in the physical triangulation.

      neighbors()

      returns the ids of neighboring triangles.

      node_ids()

      returns the ids of the triangle vertices.

      on_boundary()

      true if the triangle has at least one edge on the triangulation boundary.

      edge(int i)

      returns the i-th edge as an EdgeType instance. An EdgeType inherits from Simplex<1, Triangulation::embed_dim>, i.e. it represents a geometrical segment, and provides the following:
      • id(): the ID of the edge in the physical triangulation.

      • node_ids(): the ids of the nodes composing the edge.

      • adjacent_cells(): the ids of the cells (i.e., triangles) which share this edge.

      • on_boundary(): true if the edge is on boundary.

      edges_begin()/edges_end()

      first and last iterator to triangle edges.

      Tetrahedron exposes a simlar interface, with the addition of iterators and access methods to each face of the tetrahedron. Check the Triangulation code snippet below for a detailed exposition of the API.

    • Triangulation: template Mesh<int LocalDim, int EmbedDim> has been removed and replaced by the new Triangulation<int LocalDim, int EmbedDim> type.

      Triangulation signature
      template <int LocalDim, int EmbedDim> class Triangulation;
      

      Tip

      The main point of the Triangulation data type is the exposition of iterators to logically iterate over different parts of the mesh. Moreover, the provided iterators are compliant to the std::bidirectional_iterator concept, which let use any STL algorithm over Triangulation iterator ranges. For instance, it is straightforward to compute the measure of the border:

      // just one line, for a complex operation :)
      std::accumulate(mesh.boundary_edges_begin(), mesh.boundary_edges_end(), 0.0,
         [](double p, auto& cell) { return p + cell.measure(); }
      );
      

      Note

      This let us easily parallelize different operations, once we link fdaPDE to the oneTBB library shipped by Intel, thanks to the STL support for parallel algorithms. It is immediate, for instance, to compute the measure of the mesh by a parallel std::reduce.

      Check the code snippet below for a detailed description of the exposed API:

      Triangulation API
      // a planar triangulation (the API below is also available for triangulated surfaces)
      Triangulation<2, 2> mesh(nodes, cells, boundary);
      // Triangulation<2, 2> will compute for you the edges, the neighboring structure and
      // other connectivity informations
      
      mesh.cell(i);   // request i-th cell as a Triangle<2, 2>
      mesh.node(i);   // request i-th node
      
      // a lot of other informations, such as matrix of edges, neighbors, etc. number of edges, cells,
      // boundary edges and boundary nodes, triangulation bounding box, etc.
      // check the source code for more details!
      
      // iterators
      for(typename Triangulation<2, 2>::cell_iterator it = mesh.cells_begin();
          it != mesh.cells_end(); ++it) {
          // all the interface of Triangle<2, 2> is accessible from the iterator it
          it->measure();      // measure of triangle
          it->circumcenter(); // circumcenter of triangle
          // ...
      }
      
      std::for_each(mesh.cells_begin(), mesh.cells_end(), [](auto& cell) {
          // whatever complex operation on your cell (even the assembly of a differential weak form)
      });
      // and the above can be paralellized thanks to the STL :)
      
      // cycle over the border
      for(typename Triangulation<2, 2>::boundary_edge_iterator it = mesh.boundary_edges_begin();
          it != mesh.boundary_edges_end(); ++it) {
          // all the interface of Triangle<2, 2>::EdgeType available from the iterator
      
          it->measure();        // length of edge
          it->barycenter();     // mid-point of the edge
          it->adjacent_cells(); // the id of the triangle sharing this edge
          // ...
      }
      
      // you can also iterate on
      // - the whole set of edges with: edges_begin() / edges_end()
      // - the boundary nodes with    : boundary_nodes_begin() / boundary_nodes_end()
      
      DVector<int> ids = mesh.locate(locs);               // O(log(n_locs)) point location
      std::unordered_set<int> patch = mesh.node_patch(n); // ids of all triangles having n as vertex
      
      // 3D triangulation (tetrahedralizations)
      Triangulation<3, 3> mesh(nodes, cells, boundary);
      
      // all the interface of a Triangulation<2, 2> is available, with the additional capability
      // of indexing and iterating over the faces of each tetrahedron
      
      for(typename Triangulation<3, 3>::boundary_face_iterator it = mesh.boundary_faces_begin();
          it != mesh.boundary_faces_end(); ++it) {
          // all the interface of Tetrahedron<3, 3>::FaceType (i.e., a 3D triangle with connectivity infos)
          // available from the iterator it
      
          it->normal();   // normal direction to the face
          it->measure();  // area of the triangle
          it->edge_ids(); // global ids in the 3D triangulation of the edges making this triangle
      
          // you can in turn cicle on each 3D edge of the current boundary face it
          for(auto jt = it->edges_begin(); jt != it->edges_end(); ++jt) { ... }
             // jt is a Simplex<1, 3> exposing also
             jt->on_boundary(); // well, this is true, we are iterating over the border :)
             jt->id();          // id of the edge in the 3D triangulation
             jt->node_ids();    // ids of nodes composing the edge
          }
      }
      
      // for a 3D mesh, you can get also its surface mesh as a Triangulation<2, 3> instance
      Triangulation<2, 3> surface = mesh.surface();
      
      double p1 = std::accumulate(mesh.boundary_faces_begin(), mesh.boundary_faces_end(), 0.0,
         [](double s, auto& f) { return s + f.measure(); }
      );
      double p2 = std::accumulate(surface.cells_begin(), surface.cells_end(), 0.0,
         [](double s, auto& f) { return s + f.measure(); }
      );
      // and not surprisingly, p1 == p2 :)
      
    • Projection: template class Projection<TriangulationType> implements an exact and a non-exact method for point projection over a Triangulation<LocalDim, EmbedDim>. Given a free point \(p\), the algorithm searches for the best approximation of \(p\) in the triangulation. In this sense, it works both for manifold and non-manifold domains (in the last case, the algorithm returns the point on the 2D/3D triangulation border which is nearest to \(p\)).

      Computational complexity: let \(N\) be the number of cells and \(n\) the number of points to project.

      • Exact version is \(O(nN)\).

      • Assuming \(n \gg N\) (number of points to project much larger than number of cells), approximate version is \(O(n \log(N))\) (it was \(O(nN)\) in fdaPDE-1.1-17).

      Info

      The approximate algorithm computes a KDTree for fast locating the nearest mesh node to \(p\) (with computational cost \(O(N \log(N))\). This cost is negligible if \(n \gg N\), and performed just once ). Once the nearest point \(q\) is found, a search restricted in the patch of \(q\) is performed, i.e., in the set of cells having \(q\) as vertex (computing the patch costs \(O(\log(N))\)), therefore avoiding a brute force search over the entire mesh (which would cost \(O(N)\)). This is an approximate approach since for highly non-convex region the computed point might be not the nearest to \(p\).

      Projection API
      Triangulation<2, 3> surface(nodes, cells, boundary);
      
      DMatrix<double> points; // free points in 3D space
      // perform projection (use C++ automatic template deduction + tag dispatching)
      DMatrix<double> proj_points = Projection(surface)(points, fdapde::Exact);
      DMatrix<double> proj_points = Projection(surface)(points, fdapde::NotExact);
      
      // NotExact projection requires a O(N log(N)) initialization, this is done just once
      // at first call. You can create a Projection instance and cache
      Projection<Triangulation<2, 3>> project(surface);
      project(points, fdapde::NotExact); // silent initialization here
      project(points, fdapde::NotExact); // just perform fast approximate projection
      
      project(points); // defaults to approximate algorithm
      
  • Minor changes:

    • Optimizers support for objective stopping criterion callback: if the objective functor provided to the optimizer exposes a method with the following signature:

      template <typename OptimizerType> bool opt_stopping_criterion(OptimizerType& opt);
      

      any optimizer in the optimization module will execute it to evaluate if convergence has been reached. Users of the optimization module can hence define objective functions with a custom stopping criterion (see, e.g., density estimation in fdaPDE-cpp).

    • Binary matrix: binary matrices have proved to be extremely usefull for handling bitmasks and are getting more and more used in the codebase. The following additional methods are now exposed:

      int n = 50, m = 50;
      BinaryMatrix<Dynamic> bitmask(n, m);
      
      // writable block expressions
      bitmask.block(0, 0, 10, 10) = BinaryMatrix<Dynamic>::Ones(10, 10);
      // reshape operation
      bitmask.reshape(25, 100); // reshape bitmask to a 25 x 100 matrix (no-cost operation)
      bitmask.vector_view();    // linearize the bitmask into a column vector (no-cost operation)
      
      // returns all the indexes set to true/false in the bitmask: O(nm) operation
      bitmask.which(true);  // which true?
      bitmask.which(false); // which false?
      which(bitmask);       // implicitly returns true indexes (R style)
      

cpp

  • Density estimation: official support for density estimation models. Below the API exposed by the DensityEstimationBase core class for the density module:

    DensityEstimationBase API

    n_obs()

    number of active data locations. A data location is active if is it not masked.

    n_locs()

    number of all data locations (e.g., the overall number of observations). It coincides with n_obs() if no observation is masked.

    Psi()

    the matrix \(\Psi\) (evaluation of spatial basis functions at data location) for space-only problems, the matrix \(\Upsilon\) for space-time problems (as defined in Begu, B., Panzeri, S. (2022), Space-Time Density Estimation with Partial Differential Equation Regularization. PACS project. Pag 9.)

    Upsilon()

    if some observation is masked, returns the matrix provided by Psi() where rows corresponding to masked observations are set to zero, otherwise is equivalent to calling Psi(). Upper models should mainly interact with this method, instead of directly calling Psi()

    PsiQuad()

    the matrix of evaluations of the reference basis system at the quadrature nodes. Already tensorized for space-time problems.

    w()

    weights of the quadrature rule used for the approximation of \(\int_{\Omega} e^g\). Already tensorized for space-time problems.

    int_exp(const DVector<double>& g)

    evaluation of \(\int_{\Omega} e^g\)

    int_exp()

    evaluation of \(\int_{\Omega} e^{\hat g}\), where \(\hat g\) is the current estimation of the log density field.

    grad_int_exp(const DVector<double>& g)

    evaluation of the gradient of \(\int_{\Omega} e^g\)

    grad_int_exp()

    evaluation of the gradient of \(\int_{\Omega} e^{\hat g}\), where \(\hat g\) is the current estimation of the log density field.

    g()

    expansion coefficient vector of the log density field.

    f()

    expansion coefficient vector of the density field, e.g. \(f = e^g\).

    masked_obs()

    BinaryVector<Dynamic> of masked observations.

    As always, in addition, a model inheriting from DensityEstimationBase has access to the specific API induced by the choosen regularization. Check the source code for details.

    The masking mechanism

    This is something already shown in February 2024. At the statistical level, “masking” means to remove some observations (the masked ones) from the fitting. This corresponds to set to zero all the rows of the matrix \(\Psi\) (or \(\Upsilon\)) corresponding to masked observations. This mechanism is used, for instance, by the KCV class to perform CV selection of the smoothing parameters.

    By doing so, all models inheriting from DensityEstimationBase have immediate support for smoothing parameter selection by K-fold Cross Validation.

    Class DEPDE implements the density estimation model shown in Ferraccioli, F., Arnone, E., Finos, L., Ramsay, J.O., Sangalli, L.M. (2021), Nonparametric density estimation over complicated domains, Journal of the Royal Statistical Society (space-only) and Begu, B., Panzeri, S., Arnone, E., Carey, M., and Sangalli, L.M. (2024), A nonparametric penalized likelihood approach to density estimation of space-time point patterns, Spatial Statistics (space-time).

    template <typename RegularizationType_>
    class DEPDE : public DensityEstimationBase<DEPDE<RegularizationType_>, RegularizationType_> { ... };
    

    Because the resolution strategy for a density estimation model is a penalized log-likelihood minimization, DEPDE exposes an interface compatible with the optimization module, i.e. it acts exactly as an optimizer objective function (see the optimization module API for details).

    DEPDE API

    operator(const DVector<double>& g)

    evaluates the penalized log-likelihood functional at \(g\), i.e. computes \(L(g) = - 1^\top \Upsilon g + \int_{\Omega} e^g + g^\top P_{\lambda} g\)

    derive()

    returns a std::function<DVector(const DVector<double>&)> encoding the gradient of the penalized log-likelihood functional.

    bool opt_stopping_criterion(OptimizerType&)

    stops the optimization algorithm if the relative difference between the log-likelihood or the penalized log-likelihood is below a user defined tolerance (defaults to \(10^{-5}\)).

    set_tolerance(double)

    sets the tolerance for the custom stopping criterion (the one triggered by opt_stopping_criterion()).

    void set_g_init(const DMatrix<double>&)

    sets the initial log-density expansion coefficient vector.

    Note

    DEPDE does not compute any initialization density (e.g., by heat-process). Instead, it requests the initialization point, which must be externally computed.

    void set_optimizer(OptimizerType&&)

    sets the optimization algorithm for the minimization of the penalized log-likelihood functional \(L(g)\). The optimizer is internally type-erased.

    void init()

    initializes the model stack.

    void solve()

    triggers the optimizer for the minimization of the penalized log-likelihood.

    Check the code snippet below for the provided API:

    Example

    DEPDE API
    // assume mesh and laplacian penalty already defined...
    
    // space-only model
    DEPDE<SpaceOnly> model(penalty);
    model.set_lambda_D(0.1);
    
    // data in point-pattern processes coincide with locations
    BlockFrame<double, int> df;
    df.insert(SPACE_LOCS, ...);
    
    model.set_tolerance(1e-5); // set tolerance on custom stopping criterion
    model.set_data(df);
    
    // set optimization algorithm (here you have access to the whole optimization API)
    int max_iter = 500;
    double opt_tolerance = 1e-5; // set optimizer tolerance (looks for the norm of the objective gradient)
    double step = 1e-2;
    model.set_optimizer(BFGS<fdapde::Dynamic> {max_iter, opt_tolerance, step}); // optimizer must be Dynamic
    
    // gradient descent with adaptive step
    model.set_optimizer(GradientDescent<fdapde::Dynamic, BacktrackingLineSearch> {max_iter, opt_tolerance, step});
    // in general, you can set any optimization algorithm in the optimization module
    
    // initialize and solve
    model.set_g_init(...); // optimization algorithm init point
    model.init();
    model.solve();
    
    model.g(); // estimated log-density field
    
    // you can also approach the fitting as a pure optimization problem (emphasis on the optimizer)
    
    BFGS<fdapde::Dynamic, WolfeLineSearch> optimizer(max_iter, opt_tolerance, step);
    optimizer.optimize(model, g_init);
    optimizer.optimum(); // estimated log-density field
    
    // and by just changing the RegularizetionType template, you get space-time :)
    DEPDE<SpaceTimeSeparable> model(penalty_space, penalty_time);
    // all the API above stays valid
    
    DEPDE KFold-CV API
    // assume mesh and laplacian penalty already defined...
    
    DEPDE<SpaceOnly> model(penalty);
    model.set_data(df);
    model.set_optimizer(BFGS<fdapde::Dynamic, WolfeLineSearch> {max_iter, opt_tolerance, step});
    
    int n_folds = 10;
    int seed = fdapde::random_seed;
    KCV kcv(n_folds, seed);
    
    DMatrix<double> lambda_grid; // the grid of smoothing parameters to explore
    DMatrix<double> g_init_grid; // for each value of lambda, the initial density field (computed in some way)
    model.set_g_init(g_init_grid);
    
    // calibrate the model
    kcv.fit(model, lambda_grid); // uses DEPDE::CVScore scoring function, see below for details
    
    // at the end you get
    kcv.avg_scores();
    kcv.std_scores();
    kcv.optimum(); // optimal smoothing parameter
    

    Info

    DEPDE internally defines its cross-validation scoring index as a functor of type DEPDE::CVScore, exposing a call operator compatible with the KCV requirement. DEPDE::CVScore implements Equation (1.18) of Begu, B., Panzeri, S. (2022), Space-Time Density Estimation with Partial Differential Equation Regularization. PACS project. Pag 17.

    Tip

    When a model of type ModelType exposes a public type ModelType::CVScore, calling KCV::fit(model, lambda_grid) fallbacks to the use of ModelType::CVScore as cross validation index (rises a static assert otherwise). Specifically, ModelType::CVScore must expose a constructor with the following signature:

    CVScore(ModelType& model);
    

    and expose a call operator

    double operator()(
       int fold, const DVector<double>& lambda,
       const BinaryVector<fdapde::Dynamic>& train_mask, const BinaryVector<fdapde::Dynamic>& test_mask);
    

R (base)

Be prepared, almost ready (at least on paper).