# Introduction to RoboKudo

RoboKudo is built on the py_trees library, which integrates directly with ROS and offers visualization methods to inspect the current state of the behavior tree. This allows easy introspection of the process execution at runtime, as well as analysis of how the tree changed at a given point in time. Easy integration of new computer vision methods is one of our main goals, since there is typically no single computer vision algorithm that handles all perception tasks well. For 2D image processing we directly support OpenCV, while for 3D data processing most algorithms work on the data structures of the Open3D library. It is also possible to integrate computer vision approaches from external processes, such as Docker containers or ROS nodes. To support this, Annotators in RoboKudo can communicate with ROS nodes via ROS actions.


In the annotators folder, you can find the core annotators for typical computer vision problems. All of them work with a common data structure called the CAS, which holds the sensor data and any annotations that the annotators have generated on that data. The collection reader, for example, reads percepts from a given sensor and stores them in the CAS, while the ImagePreprocessor reads sensor data from the CAS, generates a PointCloud from it, and stores that back in the CAS.
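The read-from-CAS / write-to-CAS pattern described above can be sketched in a few lines. Everything here, the `SimpleCAS` class, the view constants, and the annotator classes, is a simplified stand-in for illustration, not RoboKudo's actual API:

```python
# Minimal sketch of the annotator/CAS interplay described above.
# All class and key names are illustrative assumptions, not RoboKudo's real API.

class SimpleCAS:
    """A CAS-like container: sensor data plus annotations, keyed by view."""
    def __init__(self):
        self._views = {}

    def set(self, view, value):
        self._views[view] = value

    def get(self, view):
        return self._views[view]

COLOR_IMAGE = "color_image"
POINT_CLOUD = "point_cloud"

class CollectionReaderSketch:
    """Reads a percept from a (mocked) sensor and stores it in the CAS."""
    def update(self, cas, sensor_frame):
        cas.set(COLOR_IMAGE, sensor_frame)

class ImagePreprocessorSketch:
    """Reads sensor data from the CAS and stores a derived point cloud."""
    def update(self, cas):
        image = cas.get(COLOR_IMAGE)
        # Toy "point cloud": one (x, y) point per pixel of the image.
        cloud = [(x, y) for y, row in enumerate(image) for x, _ in enumerate(row)]
        cas.set(POINT_CLOUD, cloud)

cas = SimpleCAS()
CollectionReaderSketch().update(cas, [[0, 1], [2, 3]])
ImagePreprocessorSketch().update(cas)
print(len(cas.get(POINT_CLOUD)))  # 4 points for a 2x2 "image"
```

The important property is that annotators never talk to each other directly; they only read and write views on the shared CAS.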


Each pipeline can contain multiple annotators that analyze their required sensory information and feed their results back as annotations on particular regions of the data. To enable annotators to exchange data flexibly, we build on the Unstructured Information Management (UIM) concept popularized by the IBM Watson architecture. Each pipeline in RoboKudo is associated with a single CAS object, which holds the information relevant to that pipeline, such as sensory data and annotations. In short, the CAS is essentially a dictionary whose keys, called Views, come from a predefined list of keywords for certain annotation or sensor data types. For example, to get the color image associated with the running pipeline, you can execute: `cas.get(COLOR_IMAGE)`

The CAS class also offers a method to filter annotations by their type, for example when you are looking for a specific kind of annotation in your CAS. This allows you, for instance, to retrieve all Classification annotations and analyze them while ignoring all other annotations.
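Such a type filter can be sketched as follows; the function name `filter_annotations_by_type` and the annotation classes are assumptions for illustration, not necessarily the names in RoboKudo's CAS class:

```python
# Sketch: retrieving all annotations of one type from a CAS-like store.
# Class and function names are illustrative, not RoboKudo's actual API.

class Annotation:
    """Common superclass for all annotation types."""

class Classification(Annotation):
    def __init__(self, label):
        self.label = label

class ColorHistogram(Annotation):
    def __init__(self, bins):
        self.bins = bins

def filter_annotations_by_type(annotations, annotation_type):
    """Return only the annotations that are instances of the given type."""
    return [a for a in annotations if isinstance(a, annotation_type)]

annotations = [Classification("cup"), ColorHistogram([0.2, 0.8]), Classification("plate")]
labels = [c.label for c in filter_annotations_by_type(annotations, Classification)]
print(labels)  # ['cup', 'plate']
```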


As previously mentioned, pipelines are a specialized version of the behavior tree sequence. The main difference is that pipelines are assigned a new CAS whenever they are created or start a new iteration. They also contain data structures for visualization and provide helper methods to access the other annotators running in the pipeline, or to get a reference to the pipeline's CAS, even if your behavior or annotator is not a direct child of the pipeline or is nested below other behaviors.
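A minimal sketch of these two properties, a fresh CAS per iteration and CAS lookup from nested behaviors, might look like this; all class and function names are illustrative stand-ins, not py_trees or RoboKudo classes:

```python
# Sketch of a pipeline-like sequence node: it creates a fresh CAS on
# every iteration, and nested children find the pipeline's CAS by
# walking up the tree. Names are illustrative, not RoboKudo's API.

class Behaviour:
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add_child(self, child):
        child.parent = self
        self.children.append(child)

class Pipeline(Behaviour):
    def __init__(self, name):
        super().__init__(name)
        self.cas = None

    def start_iteration(self):
        self.cas = {}  # fresh CAS for each iteration, as described above

def get_pipeline_cas(behaviour):
    """Walk up the tree until a Pipeline ancestor is found."""
    node = behaviour
    while node is not None:
        if isinstance(node, Pipeline):
            return node.cas
        node = node.parent
    raise LookupError("behaviour is not nested below a Pipeline")

pipeline = Pipeline("perceive")
group = Behaviour("group")
annotator = Behaviour("annotator")
pipeline.add_child(group)
group.add_child(annotator)

pipeline.start_iteration()
print(get_pipeline_cas(annotator) is pipeline.cas)  # True
```

Note that `annotator` is not a direct child of the pipeline, yet it can still resolve the pipeline's CAS.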

(Figure: BT-CAS interaction)


To reason about similar types of annotations, you need to wrap your annotations in classes. The Annotation superclass can be extended to define your own annotation types. This step is important for later reasoning about which annotators can output a certain annotation, and for representing when an annotator yields essentially the same type of information and could provide additional or alternative results. One example of an annotation is a color histogram for a particular object in the image. This color histogram annotation could contain not only the histogram but also the three most dominant colors and their ratios. Another type of annotation could be a shape, where you can also constrain the possible values, for example with an enum in the class definition.
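Custom annotation types along the lines of the examples above might be sketched like this; the field names and enum values are illustrative assumptions, only the idea of extending an `Annotation` superclass comes from the text:

```python
# Sketch of wrapping annotations in classes, as described above.
# Field names and enum values are illustrative assumptions.
from dataclasses import dataclass, field
from enum import Enum

class Annotation:
    """Superclass for all annotation types."""

@dataclass
class ColorHistogramAnnotation(Annotation):
    histogram: list
    dominant_colors: list = field(default_factory=list)  # e.g. top three colors
    color_ratios: list = field(default_factory=list)     # their relative shares

class ShapeType(Enum):
    """Constrains the possible shape values, as described above."""
    BOX = "box"
    CYLINDER = "cylinder"
    SPHERE = "sphere"

@dataclass
class ShapeAnnotation(Annotation):
    shape: ShapeType

hist = ColorHistogramAnnotation(histogram=[0.1, 0.6, 0.3],
                                dominant_colors=["red", "blue", "green"],
                                color_ratios=[0.5, 0.3, 0.2])
print(isinstance(hist, Annotation))  # True
```

Because both types share the `Annotation` superclass, type-based filtering and reasoning over them works uniformly.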


GUI handling is currently also done in the behavior tree. A single behavior is responsible for updating the GUI, which consists of a 2D and a 3D image viewer. Currently, everything that should be visualized has to be put into a data structure called the annotator output. How to access this data structure and set data on it can be seen, for example, in the ImagePreprocessor annotator. When the pipeline has executed successfully, the GUI behavior takes all the generated annotator outputs and visualizes them in the currently active window. There is also basic support for key callbacks, so a key pressed in the visualizers is forwarded to the individual annotator. This allows you, for example, to enable a special analysis mode or change the type of visualization for the next iteration.
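The collect-then-visualize pattern can be sketched as follows; `AnnotatorOutput` and `GuiBehaviorSketch` are simplified stand-ins for illustration, not RoboKudo's actual classes:

```python
# Sketch of the annotator-output pattern described above: each annotator
# fills an output object; after a successful pipeline run, a GUI behavior
# collects the outputs and "visualizes" them. Names are illustrative.

class AnnotatorOutput:
    def __init__(self, annotator_name):
        self.annotator_name = annotator_name
        self.image_2d = None  # what the 2D viewer should show, if anything

class GuiBehaviorSketch:
    def __init__(self):
        self.outputs = []

    def collect(self, output):
        self.outputs.append(output)

    def update(self):
        # In RoboKudo this would draw into the 2D/3D viewers; here we
        # just report which annotators produced something to display.
        return [o.annotator_name for o in self.outputs if o.image_2d is not None]

gui = GuiBehaviorSketch()
out = AnnotatorOutput("ImagePreprocessor")
out.image_2d = "segmented-image"
gui.collect(out)
print(gui.update())  # ['ImagePreprocessor']
```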


To get started with RoboKudo development, we recommend the following steps:

  1. Get RoboKudo running on your system with a simple pipeline.

  2. Read about the general concept of behavior trees. There is an excellent, comprehensive book available on arXiv. There are also many tutorials on YouTube on how to use behavior trees in general, as well as practical applications in different problem domains such as robotics and games. Petter Ögren has a lot of good videos on his YouTube channel.

  3. Get familiar with the py_trees library after reading how behavior trees work. py_trees offers lots of documentation and also basic explanations of how behavior trees work. Going through the examples in the library will help you understand behavior trees even better and practice the concepts.

  4. Set up your IDE to start development. Typically we recommend using PyCharm to do that. Check out the Installation chapter for details.

From here on, it depends a bit on how you want to contribute to the project. If you are working on the flow control of RoboKudo, and therefore close to the behavior trees, you should already be ready to start contributing. If you want to introduce new computer vision methods, we recommend looking into tutorials and documentation on OpenCV and Open3D. Additionally, looking into their underlying concepts and libraries, such as numpy, can be a huge benefit. If you want to develop fast computer vision methods in Python, you should at least be familiar with how to quickly iterate over data structures, either with numpy operations or with Numba.
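To illustrate the last point, here is a small sketch comparing an explicit Python loop with the equivalent vectorized numpy expression (Numba's `@njit` would be a third option, not shown here); the normalization task itself is just an example:

```python
import numpy as np

# Example: normalize a depth image to the range [0, 1]. The vectorized
# version avoids an explicit Python loop over every pixel.

depth = np.array([[0.5, 1.0], [1.5, 2.0]], dtype=np.float64)
dmin, dmax = depth.min(), depth.max()

# Slow: iterating element by element in pure Python.
normalized_loop = np.empty_like(depth)
for i in range(depth.shape[0]):
    for j in range(depth.shape[1]):
        normalized_loop[i, j] = (depth[i, j] - dmin) / (dmax - dmin)

# Fast: one vectorized numpy expression over the whole array.
normalized_vec = (depth - dmin) / (dmax - dmin)

print(np.allclose(normalized_loop, normalized_vec))  # True
```

On real camera images with hundreds of thousands of pixels, the difference between these two styles is typically orders of magnitude.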