May 12th, 2006 Written by liorean

JavaScript and Selectors

For those of you who didn’t know, Anne van Kesteren is the editor of an editor’s draft of an API for Selectors. I’ve not been following the discussions on the mailing list, but I have some comments on the draft as it looks right now. Notably, I think they’re about to make several design errors that I would prefer to see corrected before people start implementing the thing. First, a little general discussion about what this thing is useful for…

The most obvious benefit (to my eyes at least) of having these selectors available in DOM scripting is performance: the built-in selector matching, traversal and filtering will be considerably faster than manual traversal and filtering in script. It will also reduce code size and thereby download time, development cost and time, and maintenance cost. Most of these benefits can already be realised with XPath, but I feel there are good reasons for having both interfaces. First, a great many client-side developers have pretty good knowledge of selectors through years of using CSS but no knowledge of XPath. Second, I believe XPath is more complex and in all probability slower.

Use cases

So let’s examine possible use cases. What use do we have for selectors based DOM manipulation? That is, what use do we have for them in script that CSS or current DOM doesn’t do, or that they would do considerably better?

Getting rid of whitespace nodes

One of the most common problems I see among inexperienced DOM coders is that they don’t realise that whitespace exists as text nodes in the DOM in pretty much all implementations not made by Microsoft. And even if Microsoft changed their implementation to do what everybody else does, the problem of having to filter those nodes out manually would still exist. Doing things such as counting the number of child elements of a parent is not nearly as straightforward as one would want it to be, and that filtering has a negative effect on performance. A selector, by its nature matching only elements, would make every kind of traversal where you are only interested in elements faster and would reduce the necessary code size.
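
To make the manual filtering concrete, here is a minimal sketch. It uses plain objects standing in for DOM nodes so that it runs outside a browser; makeElement, makeText and countChildElements are hypothetical helper names, and only nodeType and childNodes mirror real DOM properties.

```javascript
// Node type constants as defined by DOM Core.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Hypothetical stand-ins for DOM nodes so the sketch runs outside a browser.
function makeElement(tagName, childNodes = []) {
  return { nodeType: ELEMENT_NODE, tagName, childNodes };
}
function makeText(data) {
  return { nodeType: TEXT_NODE, data, childNodes: [] };
}

// Manual filtering: walk every child and skip anything that is not an element.
function countChildElements(parent) {
  let count = 0;
  for (const child of parent.childNodes) {
    if (child.nodeType === ELEMENT_NODE) count++;
  }
  return count;
}

// An indented two-item list as a whitespace-preserving parser sees it.
const list = makeElement('ul', [
  makeText('\n  '),
  makeElement('li', [makeText('a')]),
  makeText('\n  '),
  makeElement('li', [makeText('b')]),
  makeText('\n'),
]);

const totalChildren = list.childNodes.length;      // whitespace text nodes included
const elementChildren = countChildElements(list);  // elements only
```

The list has five child nodes but only two child elements, which is exactly the discrepancy that trips people up.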

Accessing elements based on non-hierarchical properties

Do you want to sort a table based on the values in the fifth column? Or maybe you want to loop through all elements that have a certain class to make a menu from them? Or give all form controls of a specific type and class an event listener validating user input? Accessing elements by selector could allow you to do this with considerably less code and with better performance.

Reading and filtering foreign DOM documents

For example, some AJAX applications may use an XML document as a form of client side database, and use selectors as their query language for pulling data to display in the AJAX application’s document.

Adding event listeners

Adding event listeners to the document, without manipulating it, based on other factors than the element’s tag name or id attribute. This is something that I imagine will become more and more common as AJAX and advanced website UI comes into greater use. Adding event handlers to dynamically loaded content from a single script instead of including it in the content allows for reducing download times and bandwidth use.

Manipulating the DOM

Selectors allow for manipulation of the DOM based on factors such as element class or other attributes, on pseudo classes, on ancestor relations etc. Say you have a document with structural footnotes. You could use the DOM to insert these as floats at the place they are referred to, for instance. Efficient table sorting, which I already mentioned, is a case of this too.

Separation of interspersed parts

Say you have a large versioned document such as a manuscript under revision, and you have a lot of people writing notes or commentary on that document. It may be practical to use AJAX to split it into several documents based on version, author, commentator etc and pull data from these documents into the main document as needed. However, what if you want to access the notes and comments from a lot of authors or commentators on a single section of that document? Selectors could be used to pull just the part necessary for that section from each source and insert into the main document, as if it were a database query language (a use mentioned above, I believe). This is probably something a database would do better, but don’t underestimate the desire in developers for doing database-like work with XML…

Understanding how selectors are used currently

Disclaimer: I am not a browser implementor, this is only my understanding of how they do things and how costly those things are.

I’ve seen people make this assumption more than once: that testing whether an element matches a selector costs the same as listing the elements that match the selector. That’s just not true. It’s easy to answer “Does this node match these provisions?”. It’s somewhat harder to answer “Can you give me a collection of nodes that at this very moment match these provisions?”. And it’s much harder to answer “Can you give me a ‘live’ collection of nodes that, at any one time, match these provisions?”. In other words: it’s cheap to test an element for whether it matches a selector. It’s expensive to keep track of all elements that match the selector at any single time. The first is a single action or a few actions; the latter is many orders of complexity higher. So browsers don’t go around keeping lists of elements that match each and every selector used in the style sheets associated with the document; they test elements for matching selectors when needed.
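
The difference in cost between the three questions can be illustrated with a small sketch. It assumes a hypothetical tree of plain objects and a predicate standing in for a selector, and it counts predicate calls. The "live" part is deliberately naive (a full re-scan after every change); real browsers would not necessarily work this way, so read it as an illustration of relative cost only.

```javascript
// Hypothetical tree of plain objects; only className and children are used.
function node(className, children = []) {
  return { className, children };
}

let tests = 0; // counts how many times the predicate runs
const isNote = (n) => { tests++; return n.className === 'note'; };

// "Does this node match?": a single predicate call.
function test(n, predicate) {
  return predicate(n);
}

// "Which nodes match right now?": one predicate call per node in the subtree.
function snapshot(root, predicate, out = []) {
  if (predicate(root)) out.push(root);
  for (const child of root.children) snapshot(child, predicate, out);
  return out;
}

const target = node('note');
const tree = node('doc', [node('para'), target, node('para', [node('note')])]);

test(target, isNote);
const testsForSingleCheck = tests;

tests = 0;
const found = snapshot(tree, isNote);
const testsForSnapshot = tests;

// A naive "live" collection redoes the whole snapshot after every change.
tests = 0;
for (let mutation = 0; mutation < 3; mutation++) {
  tree.children.push(node('note'));
  snapshot(tree, isNote);
}
const testsForLive = tests;
```

With this five-node tree the single check costs one test, the snapshot five, and keeping the naive live collection current across three insertions costs twenty-one.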

So, this is something people should be aware of: you won’t benefit from the current selectors implementation of browsers if you want live collections.

Design issues with the Selectors API

Selectors should be first class

My first and foremost issue with the Selectors API as it looks right now is that selectors are second class citizens. Regexes are first class citizens in JavaScript, and I strongly feel selectors should be, too. Why? Well, selectors are, I believe, just another regular expression language. Regexes are made for matching textual patterns in strings, and selectors are made for matching node properties in node trees. So, essentially they are the same concept, just operating on different things.

That’s not much of a motivation however. “Just because dit is this way, dat should be too!”. Is there a reason with clear benefits why I’d like selectors to be first class citizens? Well, let’s see if I can convince you of it :)

Code reuse and performance

Okay, so selectors for live NodeLists wouldn’t be performant at all. Selectors are basically bundles of boolean tests to perform on an element, none of which may be false. If selectors are first class, they can be compiled into native filters once instead of each time they are used as would be the case if they were interpreted from strings and nsresolvers each time they were used.
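
A minimal sketch of the compile-once idea follows. compileSelector is a hypothetical function that understands only a tiny subset ("tag", ".class" or "tag.class"), and the elements are plain objects with an assumed classes array rather than real DOM nodes. A string-based API could of course cache parsed selectors internally; the sketch only shows what a first-class object makes explicit.

```javascript
let parseCount = 0; // how many times selector text has been parsed

// Hypothetical compiler for a tiny subset: "tag", ".class" or "tag.class".
// It returns a predicate, i.e. a bundle of boolean tests that must all pass.
function compileSelector(text) {
  parseCount++;
  const [tag, ...classes] = text.split('.');
  return (el) =>
    (tag === '' || el.tagName === tag) &&
    classes.every((c) => el.classes.includes(c));
}

const elements = [
  { tagName: 'li', classes: ['done'] },
  { tagName: 'li', classes: [] },
  { tagName: 'p', classes: ['done'] },
];

// String-based use: the text is parsed again on every call.
function matchesText(el, text) {
  return compileSelector(text)(el);
}
const viaStrings = elements.filter((el) => matchesText(el, 'li.done'));
const parsesWithStrings = parseCount;

// First-class use: parse once, then reuse the compiled predicate.
parseCount = 0;
const liDone = compileSelector('li.done');
const viaObject = elements.filter((el) => liDone(el));
const parsesWithObject = parseCount;
```

Both approaches select the same single element, but the string version parses the selector three times and the compiled version once.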

Matching something against a selector

Selectors should be able to match in different settings.

  1. I want to be able to take a node, and ask the selector if this node matches. selector.test(node)

  2. I want to be able to take a node, and ask the selector for the first node of all nodes in its descendant tree that matches. selector.search(parentNode)

  3. I want to be able to take a node, and ask the selector for a collection of all nodes in its descendant tree that match. selector.filter(parentNode)

  4. I want to be able to take a NodeList and ask the selector for the first node that matches. selector.searchList(nodeList) (the ECMAScript bindings could even be selector.search(nodeList), but that might not work with some other languages)

  5. I want to be able to take a NodeList and ask the selector for a collection of all nodes that match. selector.filterList(nodeList) (or selector.filter(nodeList), see previous point.)
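
The five operations above could be sketched roughly like this. The Selector class here is hypothetical and only understands a single ".className" selector; nodes are plain objects with assumed classes and children arrays, and descendants is a helper introduced purely for the illustration.

```javascript
// Depth-first, document-order list of descendants (excluding the node itself).
function descendants(node, out = []) {
  for (const child of node.children) {
    out.push(child);
    descendants(child, out);
  }
  return out;
}

// Hypothetical first-class selector; this sketch only understands ".className".
class Selector {
  constructor(text) {
    if (!/^\.[\w-]+$/.test(text)) throw new Error('unsupported selector: ' + text);
    this.className = text.slice(1);
  }
  // 1. Does this node match?
  test(node) {
    return node.classes.includes(this.className);
  }
  // 2. First matching descendant in document order, or null.
  search(parentNode) {
    return this.searchList(descendants(parentNode));
  }
  // 3. All matching descendants, in document order.
  filter(parentNode) {
    return this.filterList(descendants(parentNode));
  }
  // 4. First matching node in a list, or null.
  searchList(nodeList) {
    return Array.from(nodeList).find((n) => this.test(n)) || null;
  }
  // 5. All matching nodes in a list.
  filterList(nodeList) {
    return Array.from(nodeList).filter((n) => this.test(n));
  }
}

const el = (classes, children = []) => ({ classes, children });
const a = el(['foo']);
const b = el(['bar'], [el(['foo'])]);
const root = el([], [a, b]);
const foo = new Selector('.foo');

const aMatches = foo.test(a);
const firstInTree = foo.search(root);
const allInTree = foo.filter(root);
const firstInList = foo.searchList([b, a]);
const allInList = foo.filterList([b, a]);
```

Building the tree operations on top of the list operations is just a convenience of the sketch; a native implementation would more likely stop traversing as soon as search finds its first match.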

Performing DOM manipulations based on selector matching

Some DOM manipulations would be easier if you could use selectors to do them. One example is node.removeDecendantMatches(selector), which would reasonably return a collection or DocumentFragment of the removed nodes. I can also imagine that people would want to batch import nodes from other documents based on selector matching: document.importMatches(parentNode,selector), which should return a DocumentFragment belonging to the document, or maybe the more general document.importNodeList(selector.filter(nodeList)) (more on why this latter syntax isn’t quite a good idea later on).
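
As a rough sketch of what such a removal could do, here is a free function, removeMatchingDescendants (a hypothetical name), working on plain objects with an assumed children array and taking a predicate in place of a real selector. Returning a plain Array, and not reporting matches nested inside an already removed subtree separately, are choices made for the sketch, not something a draft specifies.

```javascript
// Removes every descendant of `parent` for which `matches` returns true and
// returns the removed nodes in document order. A removed node takes its own
// subtree with it, so matches nested inside it are not reported separately.
function removeMatchingDescendants(parent, matches, removed = []) {
  parent.children = parent.children.filter((child) => {
    if (matches(child)) {
      removed.push(child);
      return false; // drop it from the parent
    }
    removeMatchingDescendants(child, matches, removed);
    return true; // keep it, after cleaning its subtree
  });
  return removed;
}

const c = { id: 'c', cls: 'ad', children: [] };
const bNode = { id: 'b', cls: '', children: [c] };
const aNode = { id: 'a', cls: 'ad', children: [] };
const tree = { id: 'root', cls: '', children: [aNode, bNode] };

const removedNodes = removeMatchingDescendants(tree, (n) => n.cls === 'ad');
const removedIds = removedNodes.map((n) => n.id);
```

Afterwards the tree contains only root and b, and removedIds holds 'a' followed by 'c'.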

Event listeners and selectors

Event listeners set on selectors would be great, no? Well, there’s a problem there, as explained above about the way selectors are used in browsers. You’d want to set the events on a selector, and you’d expect it to work on any and all matching nodes, and be live in the way of adding or removing event listeners on elements when they suddenly start or stop matching the selector because of DOM changes. (Or user actions, think :hover.) I believe a carefully designed interface for registering and unregistering events on selectors should be possible to build. It wouldn’t have to be live, it could work by listening on just the necessary DOM manipulation events on just the necessary nodes. However, that’s probably way out of scope for the Selectors API TR.

So, what am I talking about? Well, you could write a more generic event listener that chooses a path depending on whether selectors match the target element or not. It would require much more processing, since you probably have to make sure you’ve added it to all elements that could possibly match, but it’s something that you could do with just the method examples from above and current DOM events.
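
Such a generic listener might look something like the sketch below. dispatchBySelector is a hypothetical helper; each test function stands in for a selector.test(node) call, the events are plain objects rather than real DOM events, and letting the first matching route win is an assumption of the sketch.

```javascript
// Builds a single listener from [test, handler] pairs. `test` stands in for
// selector.test(node); the first pair whose test accepts event.target wins.
function dispatchBySelector(routes) {
  return function listener(event) {
    for (const [test, handler] of routes) {
      if (test(event.target)) return handler(event);
    }
    return undefined; // no selector matched, so the event is ignored
  };
}

const log = [];
const listener = dispatchBySelector([
  [(el) => el.classes.includes('menu-item'), (e) => log.push('menu:' + e.target.id)],
  [(el) => el.tagName === 'input', (e) => log.push('input:' + e.target.id)],
]);

// Hypothetical events; in a browser these would arrive via addEventListener.
listener({ type: 'click', target: { id: 'home', tagName: 'a', classes: ['menu-item'] } });
listener({ type: 'click', target: { id: 'q', tagName: 'input', classes: [] } });
listener({ type: 'click', target: { id: 'x', tagName: 'p', classes: [] } });
```

Only the first two events leave a trace in log; the third matches no route and is dropped.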

Selectors should not be bound to a document

First class selectors should allow using the same selector on nodes from different documents, without having to bother with the document of origin. So they should not be constructed from document.createSelector(sel,nsresolver) or similar; they should be independent of the document. I would prefer new Selectors(string,nsresolver) or even plainly Selectors(string), where the ECMAScript bindings allow the nsresolver to be absent if the namespace selector syntax is not used in the selectors string.

Is StaticNodeList really such a good idea?

Languages other than ECMAScript might be in need of some kind of StaticNodeList or similar. But really, isn’t ECMAScript the primary consumer of this interface? As I see it, the built-in Array is the better choice. Or at least, please have StaticNodeList inherit directly from Array in the ECMAScript bindings. I see no reason why StaticNodeList should disallow manipulation: unlike NodeList it isn’t supposed to be live, and it has no correlation with the DOM once it has been collected. So why not allow full array manipulation? Array has a slew of useful manipulation tools, especially with the extensions introduced in Mozilla’s JavaScript 1.6.
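
A small sketch of what that buys you, with plain objects standing in for the nodes a selector query might return. The matched array and its contents are made up for the illustration; filter, map and some are among the Array methods added in JavaScript 1.6.

```javascript
// Stand-in for the result of a selector query, already a plain Array.
const matched = [
  { tagName: 'input', type: 'text', value: '' },
  { tagName: 'input', type: 'text', value: 'liorean' },
  { tagName: 'input', type: 'checkbox', value: 'on' },
];

// Narrow the result further without touching the document again.
const textInputs = matched.filter((n) => n.type === 'text');

// Count the text inputs the user left empty.
const emptyCount = textInputs.filter((n) => n.value === '').length;

// Pull out just the values.
const values = matched.map((n) => n.value);

// Ask a yes/no question about the whole collection.
const anyCheckbox = matched.some((n) => n.type === 'checkbox');
```

None of this needs a special list interface, and because the collection is not live, reordering or trimming it has no effect on the document.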

Posted in Javascript


11 Responses to “JavaScript and Selectors”

  1. Anne van Kesteren Says:
    May 13th, 2006 at 6:39 am

    I will comment on the list, but what you might want to correct here is that it’s not a Technical Report (yet), just a proposal from me. Given that the group operates in public (more or less) I can tell you that we do intend to publish this as a first public working draft soonish. That does not mean however it is ready for implementors. For that it has to be a Candidate Recommendation in theory which comes after Last Call which comes after one or more working drafts.

  2. Erik Arvidsson Says:
    May 15th, 2006 at 1:08 pm

    Very good points indeed.

    There is a huge need for testing if a node matches a selector. I think releasing a spec without that would be a huge oversight.

    What about relative expressions? In CSS Selectors everything is relative to the document. For example how would we use this to get all the children elements of an element?

    I think we could generalize the event attachment case even more. We could have an object that gets called whenever a node is inserted, modified or removed which causes a match. Basically, during the style resolution, we do callbacks to a javascript object.

    var sel = new Selector('.foo');
    sel.addEventListener('match', f, true); // or non EventTarget interface
    sel.addEventListener('unmatch', g, true);
    sel.watchMatch(document);

    where the relatedTarget would be the node that started to match. This would require that the item keeps track of matching nodes, which like you said, I doubt that browsers do. However, letting the browser do this for us might save some time and make this less buggy/more efficient… and this case is simpler because the selector text is known. It would still be useful if we only had ‘match’ but then all apps would have to duplicate all this code over and over and over.

  3. Patrick Corcoran Says:
    May 15th, 2006 at 1:45 pm

    I agree that the current state of affairs with regard to DOM manipulation is wanting.

    But I don’t really see the need to solve this problem by adding new layers of syntax and language extensions. That to me is overkill. If every problem were solved thus, JavaScript would ultimately end up as a worse syntactical mess than perl. (With all due respect to perl… :)

    The API needed to solve this problem is in my opinion a wrapper of utility classes. Create a DOMSelector object, write an interpreted library for it that works across all browsers today, and then let the browser manufacturers internalize and optimize that in native code, all voluntarily and in good time.

    I guess I just don’t see the need to constantly formalize language extensions just to simplify the code of common design patterns.

  4. Mike McNally Says:
    May 15th, 2006 at 5:08 pm

    I don’t think there’s much of a performance issue with selectors used for event listening, at least for events driven by mouse and keyboard happenings. The only time the set of listening nodes is interesting is the point at which an event actually happens. Currently the browser can determine whether a node it’s visiting has a handler by simply checking a pointer. With selector-based registration, the browser would additionally need to match the nodes against the set of registered selectors. That doesn’t seem so bad, as for any given page the number of such selectors shouldn’t be that great, and the node vs. selector predicate would have to be fast anyway. It’s not as if the browser has to check all the nodes on a page, because it knows already exactly what nodes to check for the event.

    As to the idea of selectors being implemented as native Javascript language elements, I don’t see what good that’d do as there are no other language elements against which a selector makes sense. In general Javascript does execute in the context of a DOM, but those elements are themselves not part of the language spec per se. Thus it seems to me that defining the semantics for selectors requires references to things (like the way a node’s parent is found) that are not part of the language per se, which seems pretty weird for a language element.

  5. Dustin Diaz Says:
    June 15th, 2006 at 12:21 pm

    I don’t have much to say except for the fact that having selector support would just be way cool for JavaScript. Having seen the growing number of cases of people wanting “getElementsBySelector” and $$(selector) and…well the list goes on; it would be down right logical to just implement a feature like that within the language.

  6. Jeff Schiller Says:
    June 16th, 2006 at 1:43 pm

    Out of curiosity, assuming XML nodes, is there anything that can be done using CSS Selectors that cannot be done with an equivalent XPath expression?

    Also, I’m not sure why you believe XPath would be slower, maybe I don’t know enough about browser internals though…

  7. hax Says:
    June 20th, 2006 at 2:02 am

    XPath doesn’t have pseudo-classes or pseudo-elements, and the syntax of CSS selectors is refined, so it is more straightforward to write and understand. Of course there are some things XPath can do that CSS selectors cannot, e.g. query a parent node based on the pattern of its descendants.

  8. liorean Says:
    June 27th, 2006 at 5:05 pm

    Jeff: XPath’s potential algorithmic complexity is higher than that of selectors. Selector matching can always be done unidirectionally, which is not true for XPath. So, selectors can be optimised harder than XPath. XPath has a worst case algorithmic complexity of O(n^x) for any value of x. I’m not sure what the worst case algorithmic complexity of selectors might be, but given what I know about NodeList traversal, I’d say O(n^2) sounds likely.

    If you want to find elements matching a group of selectors in a given document: You can always go from the root, testing the first sequence of selectors in each selector towards the document root element and queue up testing that sequence of simple selectors against all elements in its subtree. In case of a match, queue up testing the next sequence of simple selectors on the subtree of the matching element, excluding elements dependent on what combinator was used. This way, the traversal doesn’t have to visit any node more than once. If the last sequence of simple selectors in one of the selectors matched, then the matching element is added to the list that is to be returned.

    If you want to find out whether a given element matches a certain group of selectors, you do it the opposite way. You test the element against the last sequence of simple selectors, and then you traverse backwards, towards the root, testing elements dependent on what combinator was used, until you’ve either run out of elements to test or you have found a match to the first sequence of simple selectors. If you have found a match, you abort the traversal and return true. If you have run out of elements to test, you return false.

    When I said that you only have to visit any given node once, I didn’t take certain pseudo class, pseudo element or negation simple selectors into account. Either the testing for these can be done traversing backwards on each node, or it can be moved from the simple selectors testing into the traversal subtree exclusion by the code generator.

  9. Lee, ChihCheng Says:
    September 2nd, 2006 at 1:03 pm

    Bug report:

    If use UTF-8 charset, the tooltips will die.

    for example, search the: BubbleTooltips.html file, and change:

    charset=iso-8859-1 -> utf-8

    then , the ToolTips will not work.

  10. Ann Birret Says:
    June 15th, 2007 at 9:09 am

    Thanks for this really interesting post. It appears really helpful for me. I would like to ask you if I could translate it and include it in our page, also with link to your page. Alternatively I would like to put link to your page on my section with interesting articles. If it would be possible to put this link on my page please email me. One more time thanks for really great article. Greetings

  11. Meredith Borstadt Says:
    February 3rd, 2010 at 12:50 am

    Excellent blog. You have brought in a new devotee. Please maintain the great writings and I look forward to more of your newsworthy updates.