May 2024¶
Reached 10K lines of code.
RcppEigen version upgraded to 0.3.4
RcppEigen
has been updated to Eigen v3.4.0 (version 0.3.4 of the RcppEigen
package released on CRAN on 2024-02-28). There are no more restrictions for using Eigen v3.3.9, therefore the whole C++ codebase will upgrade the Eigen version to the 3.4.0. Code will not compile anymore with Eigen v3.3.9 as it already makes use of slicing, indexing and Eigen support for STL iterators available only in the 3.4 version.
The fdaPDE-R package DESCRIPTION and the various CMakeLists.txt files have been updated. femR will need to update its DESCRIPTION file when aligned to the mainstream fdaPDE-core library.
As a consequence of this upgrade, there are no limitations for using the C++20 standard, as announced in February 2024.
Warnings are errors
From this update on, to achieve high-quality code, tests are compiled with the options -Wall -Wpedantic -Wextra -Werror
. This setting enables the majority of the interesting warnings, and considers warnings as errors (-Werror
flag). That said, warnings cannot be ignored anymore. This options should help also for a less painfull interface with the CRAN rules.
core
geometry: the geometry module is under a major rewriting, and is the first step toward a more general core library system. While work is still in progress, here a list of the already official features:
Nomenclature
Inside the geometry module, we will refer with cell what in finite element analysis is usually named element, that is, depending on the dimensionality of the domain, a cell will be a segment (1D, 1.5D), a triangle (2D, 2.5D) or a tetrahedron (3D).
We will name node a vertex of a cell. For triangular meshes, the name edge refers to a boundary segment of a triangle. For tetrahedral meshes, the name face denotes a boundary triangle of a tetrahedron, while the name edge refer to a boundary segment of such triangle, i.e. is a 3D segment (to be cristal clear, a tetrahedron has 4 faces and 6 edges).
template
Element<int LocalDim, int EmbedDim>
has been removed.Element
was a generic way to indicate a single segment/triangle/tetrahedron inside a mesh. Its presence was a problem, as the resulting interface was a bit controversial (for instance, a triangle had to define both a concept of face and of edge, which in this case are equivalent). Moreover, it was impossible to work on a simple cell without having an entire mesh structure, which was a bit annoying.
The following structure has been introduced:
Simplex:
Simplex<Order, EmbedDim>
represents a generic simplex embedded in anEmbedDim
-dimensional space. The template parameterOrder
sets the order of the simplex, e.g. depdending on its value,Simplex
represents a segment (Order
= 1), a triangle (Order
= 2) or a tetrahedron (Order
= 3). Higher orders are available, but from a geometrical viewpoint there is no interest in havingOrder > 3
. The following geometric operations are available on a simplex:¶ measure()
measure of the simplex.
bounding_box()
returns the smallest rectangle containing the simplex.
barycenter()
returns the midpoint (center of mass) of the simplex.
barycentric_coords(const NodeType& p)
given a point \(p\), rewrites it in the barycentric coordinate system of the simplex.
circumcenter()
returns the center of the (hyper)-sphere passing through the vertices of the simplex.
circumradius()
returns the radius of the (hyper)-sphere passing through the vertices of the simplex.
contains(const NodeType& p)
given a point \(p\) determines if the point is inside, on a boundary cell, on a vertex or outside of the simplex.
supporting_plane()
the
HyperPlane<Order, EmbedDim>
passing through the simplex.normal()
the normal direction to the simplex (meaningfull only if
Order != EmbedDim
, otherwise returns zero).nearest(const NodeType& p)
given a free point \(p\), finds the best approximation of \(p\) in the simplex (e.g., the point in the simplex which is nearest to \(p\)).
Simplex<Order, EmbedDim>
also provides aboundary_iterator
type which let iterate over the boundary cells of the simplex asSimplex<Order - 1, EmbedDim>
simplices (which means that you have all the API above for each boundary cell).Note
The
Simplex<Order, EmbedDim>
data type is mainly for internal usage. It is in turn inherited by other high-level concepts. Check theTriangle
,Tetrahedron
, etc. types below. The point ofSimplex
is to be a free geometric object, i.e. without any connectivity information related to the presence of an underlying triangulation. Neverthless, having the ability to work on a free geometric object turns out to be usefull in many applications.// create a 2D triangle (Simplex of order 2, embedded in a 2D space) SMatrix<2, 3> vertices; vertices << 0.0, 0.0, 0.5, 0.0, 0.0, 0.8; Simplex<2, 2> triangle(vertices); triangle.measure(); triangle.circumcenter(); // ... see the table above ... // compute the perimeter of the triangle using STL algorithms :) double p = std::accumulate( triangle.boundary_begin(), triangle.boundary_end(), 0.0, [](double perimeter, auto& f) { return perimeter + f.measure(); }); // explicit for loop over boundary edges for(typename Simplex<2, 2>::boundary_iterator it = boundary_begin(); it != boundary_end(); ++it) { it->normal(); // normal direction to edge it->measure(); // length of edge // ... all the standard Simplex API ... }
Elementary geometric entities: introduced templates
Triangle
,Tetrahedron
andSegment
.template <typename Triangulation> class Triangle : public Simplex<2, Triangulation::embed_dim>; template <typename Triangulation> class Tetrahedron : public Simplex<3, 3>; template <typename Triangulation> class Segment : public Simplex<1, Triangulation::embed_dim>;
While
Simplex<Order, EmbedDim>
is a free geometric object, each of the above types have no meaning if not bounded to a triangulation, e.g. aTriangle
exists only as part of aTriangulation<2, EmbedDim>
object.In addition to the whole
Simplex<Order, EmbedDim>
API, the following methods related to connectivity information are available:¶ id()
the triangle id in the physical triangulation.
neighbors()
returns the ids of neighboring triangles.
node_ids()
returns the ids of the triangle vertices.
on_boundary()
true if the triangle has at least one edge on the triangulation boundary.
edge(int i)
- returns the i-th edge as an
EdgeType
instance. AnEdgeType
inherits fromSimplex<1, Triangulation::embed_dim>
, i.e. it represents a geometrical segment, and provides the following: id()
: the ID of the edge in the physical triangulation.node_ids()
: the ids of the nodes composing the edge.adjacent_cells()
: the ids of the cells (i.e., triangles) which share this edge.on_boundary()
: true if the edge is on boundary.
edges_begin()/edges_end()
first and last iterator to triangle edges.
Tetrahedron
exposes a simlar interface, with the addition of iterators and access methods to each face of the tetrahedron. Check theTriangulation
code snippet below for a detailed exposition of the API.- returns the i-th edge as an
Triangulation: template
Mesh<int LocalDim, int EmbedDim>
has been removed and replaced by the newTriangulation<int LocalDim, int EmbedDim>
type.template <int LocalDim, int EmbedDim> class Triangulation;
Tip
The main point of the
Triangulation
data type is the exposition of iterators to logically iterate over different parts of the mesh. Moreover, the provided iterators are compliant to thestd::bidirectional_iterator
concept, which let use any STL algorithm overTriangulation
iterator ranges. For instance, it is straightforward to compute the measure of the border:// just one line, for a complex operation :) std::accumulate(mesh.boundary_edges_begin(), mesh.boundary_edges_end(), 0.0, [](double p, auto& cell) { return p + cell.measure(); } );
Note
This let us easily parallelize different operations, once we link fdaPDE to the oneTBB library shipped by Intel, thanks to the STL support for parallel algorithms. It is immediate, for instance, to compute the measure of the mesh by a parallel
std::reduce
.Check the code snippet below for a detailed description of the exposed API:
// a planar triangulation (the API below is also available for triangulated surfaces) Triangulation<2, 2> mesh(nodes, cells, boundary); // Triangulation<2, 2> will compute for you the edges, the neighboring structure and // other connectivity informations mesh.cell(i); // request i-th cell as a Triangle<2, 2> mesh.node(i); // request i-th node // a lot of other informations, such as matrix of edges, neighbors, etc. number of edges, cells, // boundary edges and boundary nodes, triangulation bounding box, etc. // check the source code for more details! // iterators for(typename Triangulation<2, 2>::cell_iterator it = mesh.cells_begin(); it != mesh.cells_end(); ++it) { // all the interface of Triangle<2, 2> is accessible from the iterator it it->measure(); // measure of triangle it->circumcenter(); // circumcenter of triangle // ... } std::for_each(mesh.cells_begin(), mesh.cells_end(), [](auto& cell) { // whatever complex operation on your cell (even the assembly of a differential weak form) }); // and the above can be paralellized thanks to the STL :) // cycle over the border for(typename Triangulation<2, 2>::boundary_edge_iterator it = mesh.boundary_edges_begin(); it != mesh.boundary_edges_end(); ++it) { // all the interface of Triangle<2, 2>::EdgeType available from the iterator it->measure(); // length of edge it->barycenter(); // mid-point of the edge it->adjacent_cells(); // the id of the triangle sharing this edge // ... } // you can also iterate on // - the whole set of edges with: edges_begin() / edges_end() // - the boundary nodes with : boundary_nodes_begin() / boundary_nodes_end() DVector<int> ids = mesh.locate(locs); // O(log(n_locs)) point location std::unordered_set<int> patch = mesh.node_patch(n); // ids of all triangles having n as vertex // 3D triangulation (tetrahedralizations) Triangulation<3, 3> mesh(nodes, cells, boundary); // all the interface of a Triangulation<2, 2> is available, with the additional capability // of indexing and iterating over the faces of each tetrahedron for(typename Triangulation<3, 3>::boundary_face_iterator it = mesh.boundary_faces_begin(); it != mesh.boundary_faces_end(); ++it) { // all the interface of Tetrahedron<3, 3>::FaceType (i.e., a 3D triangle with connectivity infos) // available from the iterator it it->normal(); // normal direction to the face it->measure(); // area of the triangle it->edge_ids(); // global ids in the 3D triangulation of the edges making this triangle // you can in turn cicle on each 3D edge of the current boundary face it for(auto jt = it->edges_begin(); jt != it->edges_end(); ++jt) { ... } // jt is a Simplex<1, 3> exposing also jt->on_boundary(); // well, this is true, we are iterating over the border :) jt->id(); // id of the edge in the 3D triangulation jt->node_ids(); // ids of nodes composing the edge } } // for a 3D mesh, you can get also its surface mesh as a Triangulation<2, 3> instance Triangulation<2, 3> surface = mesh.surface(); double p1 = std::accumulate(mesh.boundary_faces_begin(), mesh.boundary_faces_end(), 0.0, [](double s, auto& f) { return s + f.measure(); } ); double p2 = std::accumulate(surface.cells_begin(), surface.cells_end(), 0.0, [](double s, auto& f) { return s + f.measure(); } ); // and not surprisingly, p1 == p2 :)
Projection: template class
Projection<TriangulationType>
implements an exact and a non-exact method for point projection over aTriangulation<LocalDim, EmbedDim>
. Given a free point \(p\), the algorithm searches for the best approximation of \(p\) in the triangulation. In this sense, it works both for manifold and non-manifold domains (in the last case, the algorithm returns the point on the 2D/3D triangulation border which is nearest to \(p\)).Computational complexity: let \(N\) be the number of cells and \(n\) the number of points to project.
Exact version is \(O(nN)\).
Assuming \(n \gg N\) (number of points to project much larger than number of cells), approximate version is \(O(n \log(N))\) (it was \(O(nN)\) in
fdaPDE-1.1-17
).
Info
The approximate algorithm computes a
KDTree
for fast locating the nearest mesh node to \(p\) (with computational cost \(O(N \log(N))\). This cost is negligible if \(n \gg N\), and performed just once ). Once the nearest point \(q\) is found, a search restricted in the patch of \(q\) is performed, i.e., in the set of cells having \(q\) as vertex (computing the patch costs \(O(\log(N))\)), therefore avoiding a brute force search over the entire mesh (which would cost \(O(N)\)). This is an approximate approach since for highly non-convex region the computed point might be not the nearest to \(p\).Triangulation<2, 3> surface(nodes, cells, boundary); DMatrix<double> points; // free points in 3D space // perform projection (use C++ automatic template deduction + tag dispatching) DMatrix<double> proj_points = Projection(surface)(points, fdapde::Exact); DMatrix<double> proj_points = Projection(surface)(points, fdapde::NotExact); // NotExact projection requires a O(N log(N)) initialization, this is done just once // at first call. You can create a Projection instance and cache Projection<Triangulation<2, 3>> project(surface); project(points, fdapde::NotExact); // silent initialization here project(points, fdapde::NotExact); // just perform fast approximate projection project(points); // defaults to approximate algorithm
Minor changes:
Optimizers support for objective stopping criterion callback: if the objective functor provided to the optimizer exposes a method with the following signature:
template <typename OptimizerType> bool opt_stopping_criterion(OptimizerType& opt);
any optimizer in the optimization module will execute it to evaluate if convergence has been reached. Users of the optimization module can hence define objective functions with a custom stopping criterion (see, e.g., density estimation in fdaPDE-cpp).
Binary matrix: binary matrices have proved to be extremely usefull for handling bitmasks and are getting more and more used in the codebase. The following additional methods are now exposed:
int n = 50, m = 50; BinaryMatrix<Dynamic> bitmask(n, m); // writable block expressions bitmask.block(0, 0, 10, 10) = BinaryMatrix<Dynamic>::Ones(10, 10); // reshape operation bitmask.reshape(25, 100); // reshape bitmask to a 25 x 100 matrix (no-cost operation) bitmask.vector_view(); // linearize the bitmask into a column vector (no-cost operation) // returns all the indexes set to true/false in the bitmask: O(nm) operation bitmask.which(true); // which true? bitmask.which(false); // which false? which(bitmask); // implicitly returns true indexes (R style)
cpp
Density estimation: official support for density estimation models. Below the API exposed by the
DensityEstimationBase
core class for the density module:¶ n_obs()
number of active data locations. A data location is active if is it not masked.
n_locs()
number of all data locations (e.g., the overall number of observations). It coincides with
n_obs()
if no observation is masked.Psi()
the matrix \(\Psi\) (evaluation of spatial basis functions at data location) for space-only problems, the matrix \(\Upsilon\) for space-time problems (as defined in Begu, B., Panzeri, S. (2022), Space-Time Density Estimation with Partial Differential Equation Regularization. PACS project. Pag 9.)
Upsilon()
if some observation is masked, returns the matrix provided by
Psi()
where rows corresponding to masked observations are set to zero, otherwise is equivalent to callingPsi()
. Upper models should mainly interact with this method, instead of directly callingPsi()
PsiQuad()
the matrix of evaluations of the reference basis system at the quadrature nodes. Already tensorized for space-time problems.
w()
weights of the quadrature rule used for the approximation of \(\int_{\Omega} e^g\). Already tensorized for space-time problems.
int_exp(const DVector<double>& g)
evaluation of \(\int_{\Omega} e^g\)
int_exp()
evaluation of \(\int_{\Omega} e^{\hat g}\), where \(\hat g\) is the current estimation of the log density field.
grad_int_exp(const DVector<double>& g)
evaluation of the gradient of \(\int_{\Omega} e^g\)
grad_int_exp()
evaluation of the gradient of \(\int_{\Omega} e^{\hat g}\), where \(\hat g\) is the current estimation of the log density field.
g()
expansion coefficient vector of the log density field.
f()
expansion coefficient vector of the density field, e.g. \(f = e^g\).
masked_obs()
BinaryVector<Dynamic>
of masked observations.As always, in addition, a model inheriting from
DensityEstimationBase
has access to the specific API induced by the choosen regularization. Check the source code for details.The masking mechanism
This is something already shown in February 2024. At the statistical level, “masking” means to remove some observations (the masked ones) from the fitting. This corresponds to set to zero all the rows of the matrix \(\Psi\) (or \(\Upsilon\)) corresponding to masked observations. This mechanism is used, for instance, by the
KCV
class to perform CV selection of the smoothing parameters.By doing so, all models inheriting from
DensityEstimationBase
have immediate support for smoothing parameter selection by K-fold Cross Validation.Class
DEPDE
implements the density estimation model shown in Ferraccioli, F., Arnone, E., Finos, L., Ramsay, J.O., Sangalli, L.M. (2021), Nonparametric density estimation over complicated domains, Journal of the Royal Statistical Society (space-only) and Begu, B., Panzeri, S., Arnone, E., Carey, M., and Sangalli, L.M. (2024), A nonparametric penalized likelihood approach to density estimation of space-time point patterns, Spatial Statistics (space-time).template <typename RegularizationType_> class DEPDE : public DensityEstimationBase<DEPDE<RegularizationType_>, RegularizationType_> { ... };
Because the resolution strategy for a density estimation model is a penalized log-likelihood minimization,
DEPDE
exposes an interface compatible with the optimization module, i.e. it acts exactly as an optimizer objective function (see the optimization module API for details).¶ operator(const DVector<double>& g)
evaluates the penalized log-likelihood functional at \(g\), i.e. computes \(L(g) = - 1^\top \Upsilon g + \int_{\Omega} e^g + g^\top P_{\lambda} g\)
derive()
returns a
std::function<DVector(const DVector<double>&)>
encoding the gradient of the penalized log-likelihood functional.bool opt_stopping_criterion(OptimizerType&)
stops the optimization algorithm if the relative difference between the log-likelihood or the penalized log-likelihood is below a user defined tolerance (defaults to \(10^{-5}\)).
set_tolerance(double)
sets the tolerance for the custom stopping criterion (the one triggered by
opt_stopping_criterion()
).void set_g_init(const DMatrix<double>&)
sets the initial log-density expansion coefficient vector.
Note
DEPDE
does not compute any initialization density (e.g., by heat-process). Instead, it requests the initialization point, which must be externally computed.void set_optimizer(OptimizerType&&)
sets the optimization algorithm for the minimization of the penalized log-likelihood functional \(L(g)\). The optimizer is internally type-erased.
void init()
initializes the model stack.
void solve()
triggers the optimizer for the minimization of the penalized log-likelihood.
Check the code snippet below for the provided API:
Example
// assume mesh and laplacian penalty already defined... // space-only model DEPDE<SpaceOnly> model(penalty); model.set_lambda_D(0.1); // data in point-pattern processes coincide with locations BlockFrame<double, int> df; df.insert(SPACE_LOCS, ...); model.set_tolerance(1e-5); // set tolerance on custom stopping criterion model.set_data(df); // set optimization algorithm (here you have access to the whole optimization API) int max_iter = 500; double opt_tolerance = 1e-5; // set optimizer tolerance (looks for the norm of the objective gradient) double step = 1e-2; model.set_optimizer(BFGS<fdapde::Dynamic> {max_iter, opt_tolerance, step}); // optimizer must be Dynamic // gradient descent with adaptive step model.set_optimizer(GradientDescent<fdapde::Dynamic, BacktrackingLineSearch> {max_iter, opt_tolerance, step}); // in general, you can set any optimization algorithm in the optimization module // initialize and solve model.set_g_init(...); // optimization algorithm init point model.init(); model.solve(); model.g(); // estimated log-density field // you can also approach the fitting as a pure optimization problem (emphasis on the optimizer) BFGS<fdapde::Dynamic, WolfeLineSearch> optimizer(max_iter, opt_tolerance, step); optimizer.optimize(model, g_init); optimizer.optimum(); // estimated log-density field // and by just changing the RegularizetionType template, you get space-time :) DEPDE<SpaceTimeSeparable> model(penalty_space, penalty_time); // all the API above stays valid
// assume mesh and laplacian penalty already defined... DEPDE<SpaceOnly> model(penalty); model.set_data(df); model.set_optimizer(BFGS<fdapde::Dynamic, WolfeLineSearch> {max_iter, opt_tolerance, step}); int n_folds = 10; int seed = fdapde::random_seed; KCV kcv(n_folds, seed); DMatrix<double> lambda_grid; // the grid of smoothing parameters to explore DMatrix<double> g_init_grid; // for each value of lambda, the initial density field (computed in some way) model.set_g_init(g_init_grid); // calibrate the model kcv.fit(model, lambda_grid); // uses DEPDE::CVScore scoring function, see below for details // at the end you get kcv.avg_scores(); kcv.std_scores(); kcv.optimum(); // optimal smoothing parameter
Info
DEPDE
internally defines its cross-validation scoring index as a functor of typeDEPDE::CVScore
, exposing a call operator compatible with theKCV
requirement.DEPDE::CVScore
implements Equation (1.18) of Begu, B., Panzeri, S. (2022), Space-Time Density Estimation with Partial Differential Equation Regularization. PACS project. Pag 17.Tip
When a model of type
ModelType
exposes a public typeModelType::CVScore
, callingKCV::fit(model, lambda_grid)
fallbacks to the use ofModelType::CVScore
as cross validation index (rises a static assert otherwise). Specifically,ModelType::CVScore
must expose a constructor with the following signature:CVScore(ModelType& model);
and expose a call operator
double operator()( int fold, const DVector<double>& lambda, const BinaryVector<fdapde::Dynamic>& train_mask, const BinaryVector<fdapde::Dynamic>& test_mask);
R (base)
Be prepared, almost ready (at least on paper).