1 Introduction
Abbreviation | Definition |
---|---|
BT | Behavior Tree |
RBT | Reconfigurable Behavior Tree |
DS | Dynamical System |
ESDS | Energy-based Stabilizer of Dynamical Systems |
WM | Working Memory |
LTM | Long-Term Memory |
2 Related work
2.1 Motion planning with dynamical systems
2.2 Task scheduling and execution monitoring
3 Methods
3.1 Energy-based stabilizer of dynamical systems
\(h_{1}(x,\underline{x},\overline{x}) = \begin{cases} 1 & x \geq \overline{x} \\ 0 & x \leq \underline{x} \\ 0.5\left(1+\sin \left(\pi \left(\frac{x-\underline{x}}{\overline{x}-\underline{x}} - 0.5\right)\right)\right) & \text{otherwise} \end{cases}\)
\(h_{2}(x,\underline{x},\overline{x}) = 1 - h_{1}(x,\underline{x},\overline{x})\)
\(\alpha (s) = \min \left(0.99,\, h_{1}(s,0,0.1\kappa (\Vert \boldsymbol{x}\Vert )\overline{s})\cdot h_{2}(s,0.9\kappa (\Vert \boldsymbol{x}\Vert )\overline{s},\kappa (\Vert \boldsymbol{x}\Vert )\overline{s})\right)\)
\(\beta (z,s) = 1 - h_{1}(z,-0.01,0)\cdot h_{2}(s,0,0.1\kappa (\Vert \boldsymbol{x}\Vert )\overline{s}) - h_{1}(s,0.9\kappa (\Vert \boldsymbol{x}\Vert )\overline{s},\kappa (\Vert \boldsymbol{x}\Vert )\overline{s})\cdot h_{2}(z,0,0.01)\)
\(\gamma (z,s) = 1 - h_{1}(z,0,0.01)\cdot h_{2}(s,0,0.1\kappa (\Vert \boldsymbol{x}\Vert )\overline{s})\)
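The smooth switching functions above translate almost directly into code. The following is a minimal sketch, not the authors' implementation: the argument `s_lim` is an assumed shorthand for the product \(\kappa (\Vert \boldsymbol{x}\Vert )\overline{s}\), whose two factors are not defined in this excerpt and are therefore passed in as one precomputed number.

```python
import math


def h1(x, lo, hi):
    """Smooth step: 0 for x <= lo, 1 for x >= hi, sinusoidal blend in between."""
    if x >= hi:
        return 1.0
    if x <= lo:
        return 0.0
    return 0.5 * (1.0 + math.sin(math.pi * ((x - lo) / (hi - lo) - 0.5)))


def h2(x, lo, hi):
    """Complementary smooth step: 1 for x <= lo, 0 for x >= hi."""
    return 1.0 - h1(x, lo, hi)


def alpha(s, s_lim):
    """alpha(s); s_lim is an assumed stand-in for kappa(||x||) * s_bar."""
    return min(0.99, h1(s, 0.0, 0.1 * s_lim) * h2(s, 0.9 * s_lim, s_lim))


def beta(z, s, s_lim):
    """beta(z, s), term by term as in the equation above."""
    return (1.0
            - h1(z, -0.01, 0.0) * h2(s, 0.0, 0.1 * s_lim)
            - h1(s, 0.9 * s_lim, s_lim) * h2(z, 0.0, 0.01))


def gamma(z, s, s_lim):
    """gamma(z, s), as in the equation above."""
    return 1.0 - h1(z, 0.0, 0.01) * h2(s, 0.0, 0.1 * s_lim)
```

With `s_lim = 1.0`, `alpha` vanishes at both ends of the interval \([0, 1]\) and saturates at 0.99 in the middle, which is the behavior the \(\min\) in the definition enforces.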
Approach | SEA [mm²] | Training time [s] |
---|---|---|
ESDS [13] | 431.5 / [26.0-1307] | 0.08 / [0.03-0.17] |
CLF-DM [18] | 460.7 / [16.6-1269] | 2.3 / [0.09-21.5] |
τ-SEDS [17] | 537.0 / [26.4-1139] | 25.3 / [7.6-55.4] |
C-GMR [19] | 496.7 / [20.3-1840] | 0.1 / [0.03-0.28] |
3.2 Task monitoring with reconfigurable behavior trees
Type | Symbol | Success | Failure |
---|---|---|---|
Fallback/Selector | ? | One child succeeds | All children fail |
Sequence | → | All children succeed | One child fails |
Parallel | ⇉ | >M children succeed | >N − M children fail |
Decorator | ◊ | Custom | Custom |
Action | | Upon completion | Impossible to complete |
Condition | | True | False |
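The success and failure rules in the table can be illustrated with a few lines of plain Python. This is a minimal sketch covering only the Sequence and Fallback rows. The `Status` enum and the function names are illustrative assumptions and are not taken from any BT library.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RUNNING = "running"


def sequence(child_statuses):
    """Sequence: succeeds only if all children succeed, fails as soon as one fails."""
    for status in child_statuses:
        if status is not Status.SUCCESS:
            return status  # a FAILURE or RUNNING child stops the traversal
    return Status.SUCCESS


def fallback(child_statuses):
    """Fallback/Selector: succeeds as soon as one child succeeds, fails if all fail."""
    for status in child_statuses:
        if status is not Status.FAILURE:
            return status  # a SUCCESS or RUNNING child stops the traversal
    return Status.FAILURE
```

For example, `fallback([Status.FAILURE, Status.SUCCESS])` returns `Status.SUCCESS`, whereas `sequence` over the same list returns `Status.FAILURE`.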
goal reached condition turns True. It exploits a blackboard to share variables across the nodes and to store the logical pre- and postconditions and the priority list. The blackboard is a thread-safe mechanism that greatly simplifies the communication between nodes. The core of the RBT consists of the green and blue nodes, which are executed in parallel, preserving the asynchronous nature of sensor readings and decision making. The Emphasizer (green node) transforms sensory input into subtree priorities and never terminates (it is always in the Running state). This, together with the fact that the Parallel node that is the parent of the Emphasizer terminates only if both of its children do, lets the RBT run until the goal is reached. The blue nodes, which are dynamically allocated at each tick, load from the LTM the branch with the highest priority and prepare a small BT ready for execution. The dynamic allocation of the blue nodes is required to prevent deadlocks, letting the RBT reach the goal.
True. Multiple postconditions are attached to a Sequence node (line 11) and are therefore checked sequentially. After these steps, the Sequence node is connected to \(\mathcal{T}_{\mathit{fal}}\) (line 12), which now contains all the postconditions and can be connected to the BT (line 18). Preconditions are also cast into Condition nodes (line 16), while Action nodes represent robot motions encoded as stable dynamical systems (see Sect. 3.1). An Action can be executed only if all its preconditions are True. This is achieved by connecting the Action and its Conditions to a Sequence node that is then attached to the BT (lines 17–18).
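The structure described above (a Fallback whose first child checks the postconditions and whose second child is a Sequence of preconditions followed by the Action) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not Algorithm 1 itself: nodes are modeled as plain functions over a blackboard dictionary, and the names `build_subtree`, `box placed`, `box grasped`, and `place box` are hypothetical.

```python
def condition(name):
    """Condition node: succeeds if the blackboard entry `name` is True."""
    return lambda bb: "success" if bb.get(name, False) else "failure"


def action(name):
    """Action node stub: records that it was ticked and reports 'running'."""
    def tick(bb):
        bb.setdefault("executed", []).append(name)
        return "running"
    return tick


def sequence(*children):
    """Tick children in order; stop at the first one that does not succeed."""
    def tick(bb):
        for child in children:
            status = child(bb)
            if status != "success":
                return status
        return "success"
    return tick


def fallback(*children):
    """Tick children in order; stop at the first one that does not fail."""
    def tick(bb):
        for child in children:
            status = child(bb)
            if status != "failure":
                return status
        return "failure"
    return tick


def build_subtree(postconditions, preconditions, action_name):
    """Fallback( Sequence(postconditions), Sequence(preconditions..., action) )."""
    post = sequence(*[condition(p) for p in postconditions])
    act = sequence(*[condition(p) for p in preconditions], action(action_name))
    return fallback(post, act)
```

Ticking `build_subtree(["box placed"], ["box grasped"], "place box")` on a blackboard where `box placed` is already True returns `"success"` without touching the action, while a blackboard with `box grasped` True and `box placed` False starts the action.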
True preconditions and at least one False postcondition. The Emphasizer periodically looks for active subtrees and determines whether there are execution conflicts, i.e., multiple branches that are concurrently active. This ambiguity in the decision process is resolved using a priority-based mechanism. We introduce a priority for each active branch, a real value normalized between 0 and 1, and use it to determine which subtree has to be loaded and executed. Following [4], we define the priority \(e\) as
4 Evaluation
r_box, b_box, and g_box) by picking them from the table (Fig. 3(a)) and placing them in the “storage” area indicated by a white patch (Fig. 3(b)). The Panda robot and the operational space are simulated with CoppeliaSim [31] and the robot model identified in [32]. Each box can be sorted by executing the BT shown in Fig. 4, where the generic box reads as r_box, b_box, or g_box. The presented scenario is simple, but it is sufficient to show the modularity and reusability of the proposed solution. Indeed, the nodes in Fig. 4 can be abstracted into the higher-level action node execute subtree in Fig. 2 (modularity). Moreover, the subtree in Fig. 4 can be exploited to pick and place similar objects (reusability), like the 3 colored boxes in Fig. 3. The switching between the 3 sorting subtasks is regulated by an RBT like the one in Fig. 2. At runtime, the sorting BT of the closest (highest-priority) box is loaded and connected to the BT using Algorithm 1, replacing the block execute subtree. To compute the subtree priority (4), we choose \(\omega _{\min }\) as the length of the box side (\(\omega _{\min }=0.05\text{ m}\)) and we estimate the maximum distance that still allows grasping a box to be \(\theta _{\max }=1\text{ m}\). The RBT successfully terminates if the 3 boxes are sorted in the storage area. This is obtained by defining the RBT goal as r_box placed ∧ b_box placed ∧ g_box placed.
RBTs are implemented in Python using the basic BT nodes provided by py_tree. In our implementation, the RBT has 19 nodes, obtained by merging the trees in Fig. 2 and Fig. 4. Tree traversals (ticks) are periodically performed every \(38\text{ ms}\). For comparison, a standard BT requires 151 nodes to schedule the same task [4]. The larger tree results in an increase in tick time of \(\approx 38\) %.
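The selection of the next sorting subtask and the goal check can be sketched in a few lines. This is only an illustration under stated assumptions: the `priority` function below is a hypothetical linear stand-in that equals 1 within \(\omega _{\min }\) and 0 beyond \(\theta _{\max }\); it is not the priority of Eq. (4), which is not reproduced here. The helper names and box coordinates are likewise invented for the example.

```python
import math

OMEGA_MIN = 0.05  # m, box side length (value from the text)
THETA_MAX = 1.0   # m, maximum distance that still allows grasping (value from the text)
BOXES = ("r_box", "b_box", "g_box")


def priority(distance):
    """Hypothetical stand-in for Eq. (4): linear in distance, clamped to [0, 1]."""
    if distance <= OMEGA_MIN:
        return 1.0
    if distance >= THETA_MAX:
        return 0.0
    return (THETA_MAX - distance) / (THETA_MAX - OMEGA_MIN)


def select_box(robot_pos, box_positions, placed):
    """Return the unplaced, visible box with the highest priority, or None."""
    candidates = {
        name: priority(math.dist(robot_pos, pos))
        for name, pos in box_positions.items()
        if not placed.get(name, False)
    }
    if not candidates:
        return None
    best = max(candidates, key=candidates.get)
    return best if candidates[best] > 0.0 else None


def goal_reached(placed):
    """RBT goal: r_box placed AND b_box placed AND g_box placed."""
    return all(placed.get(name, False) for name in BOXES)
```

A box that is missing from `box_positions` (for instance, because it was removed from the scene) is simply never selected, while `goal_reached` stays False until it is reported as placed.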
The pick box and place box action nodes in Fig. 4 are mapped to stable dynamical systems using the ESDS approach presented in Sect. 3.1. The DS representing each motion is learned from a demonstrated trajectory using Gaussian process regression [33]. The training data and the retrieved trajectories are shown in Fig. 5. At runtime, the system generates a smooth trajectory connecting the current end-effector position with a given target position. The target for picking actions is the box position, while for placement actions it is the desired position in the storage area. Trajectories generated by ESDS (implemented in Matlab®) are depicted in Fig. 5.
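To illustrate how a stable DS connects the current end-effector position to a target that may change during the motion, the sketch below integrates a simple linear system \(\dot{\boldsymbol{x}} = k(\boldsymbol{x}^{*} - \boldsymbol{x})\) with the explicit Euler method. This linear DS is a deliberately simplified stand-in and is neither ESDS nor the learned Gaussian process model; the gain, time step, and function names are assumptions made for the example.

```python
def ds_step(x, target, gain=2.0, dt=0.01):
    """One Euler step of the stand-in DS  x_dot = gain * (target - x)."""
    return [xi + dt * gain * (ti - xi) for xi, ti in zip(x, target)]


def rollout(x0, target_at, steps, gain=2.0, dt=0.01):
    """Integrate from x0; target_at(k) gives the target at step k,
    so the goal may be switched while the motion is in progress."""
    trajectory = [list(x0)]
    for k in range(steps):
        trajectory.append(ds_step(trajectory[-1], target_at(k), gain, dt))
    return trajectory
```

Because each step starts from the current state, switching the target mid-motion changes only the velocity direction, so the position trajectory remains continuous.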
b_box in the storage area, the RBT updates the priority of r_box and g_box. Since r_box is the closest to the robot, the subtree sort r_box (Fig. 4) is loaded and executed. The sort g_box subtask is executed at the end and the RBT successfully terminates.
r_box. However, during the execution of the pick r_box action, we remove r_box from the scene. The system detects this event and promptly reacts by loading the sort g_box subtask. ESDS replans the pick trajectory on the fly without discontinuities (Fig. 5(c)). Once the green box is sorted, the RBT does not terminate, since the goal r_box placed ∧ b_box placed ∧ g_box placed is False, and it keeps monitoring the scene to detect possible changes. At this point, one can place the red box in the storage area or back on the table. If r_box is placed in the storage area, then r_box placed becomes True and the task successfully terminates. If r_box is placed back on the table, the Instantiator loads the sort r_box subtree and the sorting task successfully terminates.