Sunday, October 17, 2021

    HTML5 Mastery: Tree Traversal

    Must Read

    We started this site to inspire young minds to motivate and encourage them towards Programming Language. In this site you will get programming tutorials, tech, programming facts, programming fun and programming blogs.

    One of the most important concepts in the DOM is tree traversal. Since computer science has been established as its own field of study, decades of research have been spent on data structures and algorithms. One of the most used structures is a tree. Trees are everywhere. A very rudimentary, yet useful and often used version is a binary tree. A tournament can be represented as a binary tree. The DOM tree is not binary, however. Instead it is a K-ary tree. Each node may have zero to N sub-nodes, called childNodes.

    A DOM tree hosts a wide range of possible types of nodes. There may be Text,Element, Comment and other special ones, such as ProcessingInstruction orDocumentType, among many. Most of them won’t have any childNodes by definition. They are end-points and carry only a single piece of information. For instance, a Comment node only carries the specified comment string. A Text node is only available to store a content string.

    The Element node hosts other nodes. We can recursively descend from elements to elements to go through all available nodes in the system.

    An example that also relates to the previous article regarding the <template>element is filling out a DOM subtree structure. A subtree is a part of the tree, which starts at a specified element. We call the specified element the subtree root. If we take the <html> element of a tree as the subtree root, the subtree would be nearly identical with the real tree, which starts at the document, i.e., one level below thedocumentElement.

    Filling out the subtree structure requires us to iterate on all children of the subtree root. On each node we need to check for the exact type of node and then go on in the appropriate manner. For instance, each element needs to be considered as a subtree root again. Text nodes, on the other hand, have to be evaluated more carefully. Maybe we also want to check the comment nodes for special directives. Furthermore, the attributes of the elements should be regarded as well.

    For the scenario we use a method called applyModel to fill templated strings with values from a model. The method looks as follows and could, of course, be further optimized. Nevertheless, for our purposes it is certainly sufficient.

    Let’s see an implementation for the described scenario, which uses the applyModelmethod on various occasions. This takes an instance of a template element and an object called model to return a new DocumentFragment. The new fragment uses the data from the model to change all values from {X} to the result of evaluating the expression X using the provided object.

    The previous code uses a function findAllNodes, which takes a node and stores all its children in an array. The function is then called recursively on each child. In the end all results are merged to a single array of the whole subtree, i.e. we transform the tree structure to a 1-dimensional array.

    The following snippet shows a sample implementation for the described algorithm.

    The function to change each node in the array is shown below. The function performs some manipulations depending on the type of node. We only care about attributes and text nodes.

    Even though the code is easy to understand, it is not very pretty. We have quite a few performance problems, especially since we have a lot of unfortunately necessary DOM operations. This can be done more efficiently using one of the DOM tree helpers. Note that the findAllNodes method returns an array with all nodes, not just all Element instances in the subtree. If we’re interested in the latter, we can simply use a querySelectorAll('*') call, which performs the iteration for us.

    A solution that comes up immediately is to use a NodeIterator. A NodeIteratoriterates over nodes. It fits nearly perfectly to our criteria. We can create a newNodeIterator by using the createNodeIterator method of the document object. There are three crucial parameters:

    1. The root node of the subtree to iterate.
    2. Filters, which nodes to select / iterate over.
    3. An object with an acceptNode for custom filtering.

    While the first argument is just a plain DOM node, the other two use special constants. For instance, if all nodes should be shown, we have to pass on -1 as the filter. Alternatively we can use NodeFilter.SHOW_ALL. We could combine multiple filters in several ways. As an example, the combination of showing all comments and all elements can be expressed with NodeFilter.SHOW_COMMENT | NodeFilter.SHOW_ELEMENT.

    The third argument is an object that may look as primitive as the following code snippet. Even though the object wrapping the function seems to be redundant, it has been specified that way. Some browsers, e.g. Mozilla Firefox, give us the possibility of reducing the object to a single function.

    Here we accept any node that is passed in. Additionally we have the option to reject a node (and its children) with the FILTER_REJECT option. If we just want to skip the node, but are still interested in its children, if any, we can use the FILTER_SKIPconstant.

    Implementing our former example using the NodeIterator is fairly straightforward. We create a new iterator by using the constructor method in the document. Then we use the nextNode method to iterate over all nodes.

    Let’s have a look at the transformed example.

    The DOM lookup is completely hidden from us. This is a great benefit. We only request the nodes we desire, and the rest is done in the most efficient manner in the browser’s engine. However, on the other side we still have to provide code for iterating the attributes.

    Even though attributes are covered by the SHOW_ATTRIBUTE constant, they are not related to the element nodes as children. Instead they live in a NamedNodeMapcollection, which will not be included in the lookup by the NodeIterator. We can only iterate over attributes if we start the iteration at an attribute, limiting ourselves to only attributes.

    The previous example could also invoke the change in the provided filter. This is, however, not a good practice, as we might want to use the iterator for other purposes as well. Therefore the iterator should really just present a read-only solution, which does not mutate the tree.

    Mutating the tree is also not very well supported by the NodeIterator. The iterator can be thought of like a cursor in a document, being placed between two (the last and the next) node. Therefore a NodeIterator does not point to an any node.

    We want to iterate over nodes in a subtree. Another option that may come to our mind is to use a TreeWalker. Here we walk the tree as the name already suggests. We specify a root node and elements to consider in our route and then we process. The interesting part is that a TreeWalker has a lot in common with a NodeIterator. Not only do we already see a lot of shared properties, it also uses the sameNodeFilter to set up constraints.

    In most scenarios the TreeWalker is actually a better choice than theNodeIterator. The API of the NodeIterator is bloated for what it delivers. TheTreeWalker contains even more methods and settings, but at least uses them.

    The main difference between the TreeWalker and the NodeIterator is that the former presents a tree-oriented view of the nodes in a subtree, rather than the iterator’s list-oriented view. While the NodeIterator allows us to move forward or back, a TreeWalker gives us also the option of moving to the parent of a node, to one of its children, or to a sibling.

    In contrast to the NodeIterator, the ‘TreeWalker points directly to a specific node in the tree. If the node being pointed to is moved around, the TreeWalker will therefore follow. More importantly, if the node being pointed to is removed from the tree, then we will effectively end up outside the document tree. If we follow the advice for theNodeIterator and do not mutate the tree during traversal, we will end up with the same path.

    The TreeWalker also seems to be nearly identical to the NodeIterator for our purpose. There are reasons why the latter couldn’t gain much attention. Nevertheless the TreeWalker is not very well known either. Maybe its area of usage is just too limited, giving us no ability to traverse attributes again—especially with the third option we have for DOM tree iteration.

    Let’s consider a simple example for a selection that results in the following image.


    Here we start in the text node of the first list item and finish at the end of the text node of a strong element. The following image illustrates the covered range from the source code’s perspective.

    DOM tree
    DOM tree

    Displaying the DOM tree for the provided Range instance is also interesting. We see that such a range is able to cover a whole variety of nodes independent of their ancestor or sibling.

    nodes independent
    nodes independent

    Extracting the selected nodes gives us a DocumentFragment, which starts at a new list item and ends after the strong element.


    The extraction is actually a DOM mutating action, i.e. it will modify the existing tree. Now we are left with the following two items, exactly as we expected.


    Since text could include elements and everything contained in them, the range also needs to cover these cases. At first sight the Range is strangely engineered, as it always needs to care about at least two cases: text and non-text (mostly elements). However, as we’ve seen there is a good reason for distinguishing these two cases, otherwise we couldn’t select text fragments.

    The Range object lacks the iteration capabilities we experienced earlier. Instead we have improved serialization and export abilities. Changing our former example is therefore cumbersome at first. Nevertheless, by introducing a few new methods we are able to combine the flexibility of a Range with enhanced iteration.

    These two methods let us use the Range just like the iterators before. Right now we can only go in one direction; however, we could easily implement methods to skip the children, go directly to the parent, or perform any other move.

    Even though the Range is garbage collected like any other JavaScript object, it is still good practice to release it using the detach function. One of the reasons is that all Range instances are actually kept in the document, where they are updated in case of DOM mutations.

    Defining our own iterator methods on the Range is beneficial. The offsets are automatically updated and we have auxiliary possibilities, such as cloning the current selection as a DocumentFragment, extracting or deleting the selected nodes. Also we can design the API for our own needs.

    Iterating through the DOM tree is an interesting topic for anyone thinking about DOM manipulation and efficient node retrieval. For most use cases there is already a corresponding API. Do we want a plain iteration? Do we want to select a range? Are we keen on walking the tree? There are advantages and disadvantages for each method. If we know what we want, we can make the right choice.

    Unfortunately, tree structures are not as simple as 1-dimensional arrays. They can be mapped to 1-dimensional arrays, but the mapping follows the same algorithm as iterating over its structure. If we use one of the provided structures, we have access to methods that already follow these algorithms. Therefore we get a convenient way to do some iteration over nodes in a K-ary tree. We also save some performance by performing fewer DOM operations.

    See more:

    HTML5 Cheat Sheet

    HTML5 tools to make your life easier – Part 1

    Latest Articles

    More Recipes Like This