An optical tracking device (NDI Polaris Vega), whose measurement error is only 0.2 mm, was used to determine the true spatial coordinates (NDI coordinates) of the deformable template for comparison with our deformation inference. For convenience, five marker points (A, D, E, F, G) on the template were selected, as shown in Fig. 13, and their coordinates were recorded in both NDI coordinates and the deformation inference (camera coordinates) for two different deformation states, as detailed in Table 2. The goal is to quantify the deformation error between the actual deformation in NDI coordinates and the reconstructed result in camera coordinates. It is not necessary to align the two coordinate systems. We use the 3D error metric
$$\varepsilon = \frac{1}{|R|} \sum _{(P,Q)\in R} \Vert P-Q\Vert , \tag{18}$$
where R is a set of point pairs, with P in camera coordinates and Q its corresponding ground-truth 3D point in NDI coordinates. For the five markers specified, the errors for the two deformation states were \(\varepsilon _1 = 5.96\) mm and \(\varepsilon _2 = 4.21\) mm, respectively. The segments AG, FG, and DE were selected and rendered using Blender, as illustrated in Fig.
14.

The computational efficiency of this method is affected by the numerous enhancement strategies it implements. Fifty iterations were run on a \(24\times 32\) meshed template, approximately the size of A4 paper, giving an average processing time of 4 s per deformation inference. The recently introduced method required approximately 3 s, or 75 percent of the total processing time. To better understand the performance bottleneck of our method, we divided it into four main components: image segmentation, contour fitting, physics simulation, and bilateral mesh denoising. Physics simulation comprises the constrained physics simulator, as outlined in section “Constrained physics simulator”, and position-based dynamics, as shown in Fig. 6. Template mesh sizes ranging from \(8\times 16\) to \(32\times 40\) were tested in seven incremental steps. For each dataset, 50 deformation inferences were run and the average execution time was calculated, as illustrated in Fig. 15. The overall processing time rises as the mesh size increases. Contour fitting and physics simulation, the key components of this method, grow nearly linearly and account for only about 20 percent of the overall processing time. In contrast, image segmentation and bilateral mesh denoising, key components of the complete solution, account for over 70 percent of the processing time, which identifies them as the main performance bottleneck of our approach.

The deformation error for various mesh sizes was also evaluated, as illustrated in Fig.
16. At a mesh size of \(24\times 32\), both the mean and median deformation errors of the grid points reach their minimum values. Increasing the mesh size further requires more computational resources, while the accuracy of the deformation inference remains relatively unchanged. The mesh size should therefore be chosen according to the actual physical dimensions and texture characteristics of the deformable template, rather than on the assumption that larger sizes are inherently superior.
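
As an illustration of how the deformation errors reported in this section can be computed, the following minimal Python sketch evaluates the metric of Eq. (18) together with the median of the per-point errors. It is not the code used in the experiments: the function names and the coordinate values are hypothetical and chosen only to make the arithmetic easy to check, and the sketch simply applies Eq. (18) to the point pairs exactly as they are supplied.

```python
import math
from statistics import median


def pair_distances(pairs):
    """Return the Euclidean distance ||P - Q|| for every (P, Q) pair in R."""
    return [math.dist(p, q) for p, q in pairs]


def mean_error(pairs):
    """Eq. (18): average of ||P - Q|| over all point pairs in R."""
    dists = pair_distances(pairs)
    if not dists:
        raise ValueError("R must contain at least one point pair")
    return sum(dists) / len(dists)


# Hypothetical point pairs in mm (NOT the values of Table 2).
# Each entry is (P, Q): reconstructed point, ground-truth point.
R = [
    ((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)),     # distance 5
    ((10.0, 0.0, 0.0), (10.0, 0.0, 0.0)),   # distance 0
    ((0.0, 10.0, 0.0), (0.0, 10.0, 12.0)),  # distance 12
]

eps = mean_error(R)                  # mean of the per-pair distances
med = median(pair_distances(R))      # median of the per-pair distances
print(f"mean error = {eps:.2f} mm, median error = {med:.2f} mm")
```

Reporting the median alongside the mean, as done for the grid points in Fig. 16, makes it easier to see whether a few poorly reconstructed points dominate the average.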