October 14, 2004

XML for ECMAScript

  • Reported by liorean

For a long time, I've been frustrated that there has been no good way of semantically marking up JavaScript in XML, whether for code samples or for walking somebody through a script in a tutorial or article. So, in June this year, I started a project to create a DTD and XML namespace for marking up JavaScript as XML. I've worked strictly from ECMA-262, 3rd edition, and have now got it to the point where the entire token tree is representable. However, many aspects of it still need consideration, and there are design and extensibility concerns that must be taken into account.

Frankly, I don't think I, as a sole developer, will be able to finish this or to make the best decisions for this project. So I'm opening up for discussion the idea of making this a community project. Anyone interested in helping with development, organisation and/or management of this XML application, please chime in here. Also, if you know of a similar project, please tell me, as I want to avoid duplicating work wherever I can.

Currently, all development has been done on the DTD file, and nothing else. The DTD file as it looks right now is accessible from http://dtd.liorean.net/tes/3.0/.

Update: Eric Lippert asked me for a code example of this markup. The current version is large and cumbersome, but I'll give you an example of both the current structure and the direction I want to take it in. First, the JavaScript:

function fn(a,b){
   var c=3*a-b;
   return "3*a-b: "+c;
}

Currently, this would be marked up like this:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE Program PUBLIC "-//liorean.net//DTD TES 3.0//EN" "http://dtd.liorean.net/tes/3.0/">
<Program xmlns="http://ns.liorean.net/tes/3">
   <FunctionDeclaration>
       <Identifier>fn</Identifier>
       <FormalParameters>
           <Identifier>a</Identifier>
           <Identifier>b</Identifier>
       </FormalParameters>
       <VariableStatement>
           <Identifier>c</Identifier>
           <Assign/>
           <AdditiveExpression>
               <MultiplicativeExpression>
                   <DecimalLiteral>3</DecimalLiteral>
                   <Multiplication/>
                   <Identifier>a</Identifier>
               </MultiplicativeExpression>
               <Subtraction/>
               <Identifier>b</Identifier>
           </AdditiveExpression>
       </VariableStatement>
       <ReturnStatement>
           <AdditiveExpression>
               <StringLiteral>"3*a-b: "</StringLiteral>
               <Addition/>
               <Identifier>c</Identifier>
           </AdditiveExpression>
       </ReturnStatement>
   </FunctionDeclaration>
</Program>

And the direction I would like to take it:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE program PUBLIC "-//liorean.net//DTD TES 3.0//EN" "http://dtd.liorean.net/tes/3.0/">
<program xmlns="http://ns.liorean.net/tes/3">
   <fn>
       <id>fn</id>
       <params>
            <id>a</id>
            <id>b</id>
       </params>
       <var>
            <assign>
                <id>c</id>
                <subtract>
                    <multiply>
                        <decimal>3</decimal>
                        <id>a</id>
                    </multiply>
                    <id>b</id>
                </subtract>
           </assign>
       </var>
       <return>
           <add>
               <string>"3*a-b: "</string>
               <id>c</id>
           </add>
       </return>
   </fn>
</program>

The latter doesn't conform as strictly to ECMA-262, of course, and a few things in it are questionable (whether to merge function declarations and function expressions into one construct or keep them separate, for instance).
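Part of why merging function declarations and function expressions is questionable is that ECMA-262 gives them different run-time behaviour: a declaration is instantiated when its enclosing scope is entered, a function expression only yields a function when it is evaluated, and a named function expression's name is in scope only inside its own body. A minimal illustration (all names here are made up):

```javascript
// Contrast a FunctionDeclaration with a FunctionExpression.
// The names declared, expressed and named are illustrative only.
function compareFunctionForms() {
  var seen = {};

  // A declaration is instantiated on entry to the enclosing scope,
  // so it already exists before the line where it is written.
  seen.declarationEarly = typeof declared; // "function"

  // The var is hoisted too, but only as undefined: the function
  // expression is not evaluated until the assignment runs.
  seen.expressionEarly = typeof expressed; // "undefined"

  function declared() { return "declared"; }
  var expressed = function named() {
    // A named function expression's name is visible inside its body only.
    return typeof named;
  };

  seen.expressionLate = typeof expressed; // "function"
  seen.nameInside = expressed();          // "function"
  seen.nameOutside = typeof named;        // "undefined"
  return seen;
}

console.log(compareFunctionForms());
```

A merged element would therefore need some attribute or context to record which of the two behaviours applies.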

Comments

1. October 19, 2004 09:08 AM

Curcan Ovidiu Posted…

I think XML is waaaay too verbose for that. The format will be bloated. Why not just use CDATA sections to include the code samples in your XML documents?

2. October 19, 2004 11:43 AM

liorean Posted…

Well, it's verbose, I grant you that, but that's by design. This language is meant to be a semantic representation of JavaScript source code. In a CDATA block, the parser doesn't know a string from an identifier from a number from a function; it's all just text. This language is meant to know the difference: it knows a function is a function, a string is a string, and an identifier is an identifier. Extending it a bit beyond how it looks today, you could add an id to each variable and function declaration, so that identifiers pointing to them could act as links, for instance. You can use CSS to hide or display function bodies as needed and to apply whatever indentation you want; you can choose to wrap statement bodies in curly-brace blocks always or only when needed; and you can insert semicolons, line endings, or whatever you wish as statement terminators. You could use XSLT on the server to generate, from the same TES document, either a fully commented script or a compact one for use on the web. You could use XSLT on the client to generate code listings for IE/Win, where Microsoft's excuse for CSS2 support isn't up to the job.
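The server-side generation idea could equally be done in script. Below is a minimal sketch in JavaScript rather than XSLT, assuming the proposed element names from the post (program, fn, params, var, assign, add, subtract, multiply, id, decimal, string) and assuming the XML has already been parsed into plain { name, children, text } objects; in a browser you would walk DOM nodes instead. The helpers el, leaf, expr, statement and toSource and the two style objects are hypothetical, and only the elements used in the post's example are handled. The same tree yields either a readable listing or a compact one.

```javascript
// Hypothetical plain-object form of the proposed markup.
function el(name, ...children) {
  return { name, children, text: "" };
}
function leaf(name, text) {
  return { name, children: [], text };
}

// Binary operator elements; a higher prec binds tighter.
const BINARY = {
  add: { op: "+", prec: 1 },
  subtract: { op: "-", prec: 1 },
  multiply: { op: "*", prec: 2 },
};

function precOf(node) {
  return BINARY[node.name] ? BINARY[node.name].prec : Infinity;
}

function expr(node, style) {
  const bin = BINARY[node.name];
  if (!bin) return node.text; // <id>, <decimal>, <string>: text is emitted as-is
  const [left, right] = node.children;
  let l = expr(left, style);
  let r = expr(right, style);
  if (precOf(left) < bin.prec) l = "(" + l + ")";
  // On the right, equal precedence also needs parentheses: a-(b-c), "x"+(1+2).
  if (precOf(right) <= bin.prec) r = "(" + r + ")";
  return l + style.space + bin.op + style.space + r;
}

function statement(node, style, depth) {
  const pad = style.indent.repeat(depth);
  switch (node.name) {
    case "var": {
      // <var><assign><id/> value </assign></var>
      const [target, value] = node.children[0].children;
      return pad + "var " + target.text + style.space + "=" + style.space + expr(value, style) + ";";
    }
    case "return":
      return pad + "return " + expr(node.children[0], style) + ";";
    case "fn": {
      const [id, params, ...body] = node.children;
      const names = params.children.map((p) => p.text).join("," + style.space);
      const lines = body.map((child) => statement(child, style, depth + 1));
      return pad + "function " + id.text + "(" + names + ")" + style.space + "{" +
        style.newline + lines.join(style.newline) + style.newline + pad + "}";
    }
    default:
      throw new Error("element not handled in this sketch: <" + node.name + ">");
  }
}

function toSource(program, style) {
  return program.children.map((child) => statement(child, style, 0)).join(style.newline);
}

const PRETTY = { indent: "   ", space: " ", newline: "\n" };
const COMPACT = { indent: "", space: "", newline: "" };

// The example function from the post, in the proposed vocabulary.
const program = el("program",
  el("fn",
    leaf("id", "fn"),
    el("params", leaf("id", "a"), leaf("id", "b")),
    el("var",
      el("assign",
        leaf("id", "c"),
        el("subtract",
          el("multiply", leaf("decimal", "3"), leaf("id", "a")),
          leaf("id", "b")))),
    el("return",
      el("add", leaf("string", '"3*a-b: "'), leaf("id", "c")))));

console.log(toSource(program, PRETTY));
console.log(toSource(program, COMPACT));
```

Parentheses are derived from operator precedence rather than stored in the markup; the nested form of the second listing already encodes grouping, so nothing is lost.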

3. October 20, 2004 03:27 AM

Curcan Ovidiu Posted…

Thanks for the explanation. To be honest, at first I kinda missed the whole purpose of the language. :) Now it's all clear, and it sounds like a great idea indeed.

4. November 3, 2004 10:24 PM

Jay Adams Posted…

As I've told Eric, don't forget about us VBScripters. There's still life in the language yet. In any event, XML for scripting does seem a bit bloated. I understand the end result that you seek, but it seems futile. Why not create a metadata standard for all scripting languages, to make porting code back and forth between languages and platforms easy via XML? Just my 1.5 cents :)

5. November 5, 2004 06:27 PM

Svend Tofte Posted…

For all languages? I was wondering recently why there was no "coloring" program that worked for "all" source code, and went on to think about how you'd make one. The problem, of course, is that you basically need to write a YACC-style program that takes the token definitions of the target language as input and returns a program that colorizes source code in that language. That's basically the same problem as here, except that here you want to make the semantics explicit. (A coloring system carries some semantics too, since you'd expect colors to be consistent, with variables, say, green in every language, but not strictly so.)

This could be pretty cool in code-exploration tools built on, say, Mozilla, since it already has such tight DOM/JS integration; Venkman could get spruced up a little. Essentially, you would keep an AST of sorts in memory, a little like how Visual Studio seems "dynamic".
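To make the tokenization-versus-semantics distinction concrete, here is a minimal sketch of the token level a colorizer works at, for a small subset of the ECMA-262 lexical grammar (no comments, regular expression literals, single-quoted strings or hex numbers). The names tokenize and colorize, the keyword list and the CSS class names are illustrative assumptions. Note what it cannot tell you: in `var c=3*a-b; return c;` both `c` tokens come out simply as "identifier", with no link between the declaration and its use.

```javascript
const KEYWORDS = new Set(["function", "var", "return", "if", "else", "new", "this"]);

// Sticky regex: each exec() must match exactly at lastIndex.
// Groups: 1 whitespace, 2 double-quoted string, 3 number, 4 word, 5 punctuator.
const TOKEN = /(\s+)|("(?:[^"\\\n]|\\.)*")|(\d+(?:\.\d+)?)|([A-Za-z_$][\w$]*)|(===|!==|==|!=|<=|>=|&&|\|\||\+\+|--|\S)/y;

function tokenize(src) {
  const tokens = [];
  TOKEN.lastIndex = 0;
  let m;
  // Every character matches some alternative, so this stops only at the end.
  while ((m = TOKEN.exec(src)) !== null) {
    const text = m[0];
    let kind;
    if (m[1] !== undefined) kind = "space";
    else if (m[2] !== undefined) kind = "string";
    else if (m[3] !== undefined) kind = "number";
    else if (m[4] !== undefined) kind = KEYWORDS.has(text) ? "keyword" : "identifier";
    else kind = "punctuator";
    tokens.push({ kind, text });
  }
  return tokens;
}

function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Whitespace passes through untouched; every other token gets a class.
function colorize(src) {
  return tokenize(src)
    .map((t) => (t.kind === "space"
      ? t.text
      : '<span class="' + t.kind + '">' + escapeHtml(t.text) + "</span>"))
    .join("");
}

console.log(colorize('return "3*a-b: "+c;'));
```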

6. November 5, 2004 11:21 PM

liorean Posted…

VBScript and ECMAScript really aren't that close, syntactically, grammatically or semantically. And if you want to add C# or possibly even C/C++ to the mix, you'd find yourself in a really messy situation. Add on top of that Perl, Python, Ruby, Scheme, ML, Haskell, APL, J, Prolog and hundreds of other languages, and you'll soon find the common subset so small that it's worthless. The alternative would be to support the combined feature set of all of them, and then exclude whatever the language you are representing lacks. Neither choice is particularly attractive to me.

Returning to VBScript: admittedly I know very little about it, but as I understand it, it's an entirely different ballgame. A few examples: VBScript and ECMAScript escape characters inside strings differently. ECMAScript identifiers are always case sensitive, while VBScript's are not. The languages use different keywords for common constructs. Worse, common operators sometimes behave differently.

IIRC, VBScript doesn't have first-class functions (aka lambda expressions or function objects) as distinct from function calls. The scoping system in ECMAScript is more complex than VBScript's. Another interesting feature of ECMAScript is the new keyword and the distinction between a function call and a constructor call; in fact, the entire object model differs from VBScript's. There are operator differences too: VBScript has separate numeric addition (+) and string concatenation (&) operators. Its + can act as concatenation, but with one numeric and one string operand it adds rather than concatenates, which makes it incompatible with ECMAScript's combined add/concatenate operator, which concatenates in the same situation. Then there are outright feature differences: constants, which do not exist in ECMAScript; the function/procedure distinction; and differences in error handling.

In short, VBScript would be better served by a similar language made specifically for it than by my trying to make this language handle both. I'm already having trouble deciding how to interpret and practically represent some concepts of ECMAScript, not to mention Netscape JavaScript and Microsoft JScript, and on top of that E4X and the future ECMAScript 4 spec (if work on it resumes). VBScript would definitely be beyond me; I can't devote that much time to this.
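The ECMAScript half of the operator and invocation differences described above can be seen in a few lines. This is a small illustration only; it says nothing about VBScript's behaviour, and Point is a made-up example. Plain var is used throughout, since ECMAScript 3 has no constants.

```javascript
// ECMAScript's + concatenates as soon as either operand is a string.
var mixedPlus = "3" + 4;   // "34"
var numberPlus = 3 + 4;    // 7
// The other arithmetic operators always convert their operands to numbers.
var mixedMinus = "3" - 4;  // -1

// One function, two kinds of invocation: a plain call and a constructor call.
function Point(x) {
  if (!(this instanceof Point)) {
    // Plain call: `this` is the global object (or undefined in strict code).
    return "called as a function";
  }
  this.x = x; // constructor call: `this` is the newly created object
}

var asCall = Point(1);              // "called as a function"
var asConstructor = new Point(2);   // a Point whose x is 2

console.log(mixedPlus, numberPlus, mixedMinus, asCall, asConstructor.x);
// prints: 34 7 -1 called as a function 2
```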

7. November 10, 2004 08:20 AM

Svend Tofte Posted…

It all ties back to the source code colourizer that works for any language: it'll only work if you generalize it SO hard that the markup becomes virtually meaningless. All programming languages draw on the same alphabet, after all (let's just assume ASCII), and they all follow certain rules (the Chomsky hierarchy, etc.). They tend to be defined via grammars. So naturally, a generalization would not be impossible. The markup language would just need a huge number of elements to draw on: every possible element, drawn from across all possible languages. And then, of course, there is the even larger task of actually constructing the tree from a given string.

It's theoretically possible, and for a source-code colourizer (which is a little less semantic) it may be feasible. But otherwise, I would guess not. At any rate, my understanding of programming language theory is certainly not strong enough to make any final statements. It would be wasted energy to make it cover just JScript and VBScript. If it has to be generalized, it should be generalized properly, so that it can cover any language constructed in a given way.

Consider posting to Lambda the Ultimate (http://lambda-the-ultimate.org/) for better comments.

8. November 21, 2004 05:58 AM

Erik Arvidsson Posted…

To me this looks just like a serialization of the JS parse tree, which is exactly what I've been looking for for quite some time. Something like this is useful for tools that I want or already have: a Javadoc-style documentation tool (does it include comments?), whitespace removal, and obfuscation.
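The obfuscation use case fits a tree like this well, because renaming becomes a walk over `<id>` elements instead of text matching. A minimal sketch, assuming the proposed element names from the post held as plain { name, children, text } objects (el, leaf, collectLocals and renameLocals are hypothetical helpers): it renames a function's parameters and var-declared locals to _0, _1, and so on. It deliberately refuses nested functions, and it ignores things a real obfuscator must handle, such as eval, with, and collisions with existing names like _0.

```javascript
function el(name, ...children) {
  return { name, children, text: "" };
}
function leaf(name, text) {
  return { name, children: [], text };
}

// Parameters plus every <var><assign><id> target in the body. ES3 var is
// function-scoped, so declarations inside nested blocks count too.
function collectLocals(fnNode) {
  const [, params, ...body] = fnNode.children;
  const names = params.children.map((p) => p.text);
  const visit = (node) => {
    if (node.name === "fn") throw new Error("nested functions are not handled in this sketch");
    if (node.name === "var") names.push(node.children[0].children[0].text);
    node.children.forEach(visit);
  };
  body.forEach(visit);
  return names;
}

// Rename in place and return the mapping. The function's own name (its
// first <id>) is kept, because code outside the function refers to it.
function renameLocals(fnNode) {
  const mapping = new Map();
  for (const name of collectLocals(fnNode)) {
    if (!mapping.has(name)) mapping.set(name, "_" + mapping.size);
  }
  const rename = (node) => {
    if (node.name === "id" && mapping.has(node.text)) node.text = mapping.get(node.text);
    node.children.forEach(rename);
  };
  fnNode.children.slice(1).forEach(rename);
  return mapping;
}

const fn = el("fn",
  leaf("id", "fn"),
  el("params", leaf("id", "a"), leaf("id", "b")),
  el("var",
    el("assign",
      leaf("id", "c"),
      el("subtract",
        el("multiply", leaf("decimal", "3"), leaf("id", "a")),
        leaf("id", "b")))),
  el("return", el("add", leaf("string", '"3*a-b: "'), leaf("id", "c"))));

const mapping = renameLocals(fn);
console.log(mapping);
```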

9. November 21, 2004 06:37 AM

Armand Posted…

Mm, interesting, but why so bloated? We did something similar a while back; here is some of my code:

<function name="getSum1" params="param1, param2">
	<return expr="param1 + param2"/>
</function>

<do-while test="k
	<s:alert params="'testalert'"/>
	<s:object.doSomething params="'hbk', 5, 5"/>
</do-while>

And mixed with XHTML:

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:jsxml2="http://www.schwingsoft.com/ns/2004/jsxml2" xmlns:short="http://www.schwingsoft.com/ns/2004/jsxml2-shorthands">
	<head>
		<title>Sample XML2JS embedded in XHTML</title>
		<script type="text/javascript">
			<jsxml2:function name="doAlert" params="val">
				<short:window.alert>
					<jsxml2:param expr="val"/>
				</short:window.alert>
			</jsxml2:function>
		</script>
	</head>
	<body>
		<p>Press the button below to test the XML2JS 'doAlert' function:</p>
		<div>
			<button onclick="doAlert('Embedded XML2JS in XHTML works!')">test</button>
		</div>
	</body>
</html>

10. February 14, 2005 08:57 PM

Renaud Waldura Posted…

I have done something like this. It doesn't conform to your DTD, but I've got a working implementation. I basically adapt the AST returned by the Rhino parser to a DOM. It works: I can transform JavaScript with XSLT.

Please contact me at renaud+js@waldura.com.

11. February 16, 2005 02:46 PM

Renaud Waldura Posted…

Example:

function _padNumber(theNumber) {
	if (theNumber < 10)
		return "0" + theNumber;
	return theNumber;
}

Becomes:

<SCRIPT>
  <FUNCTION/>
  <FUNCTION>_padNumber<BLOCK>
      <BLOCK>
        <IFNE>
          <LT>
            <NAME>theNumber</NAME>
            <NUMBER>10.0</NUMBER>
          </LT>
        </IFNE>
        <RETURN>
          <ADD>
            <STRING>0</STRING>
            <NAME>theNumber</NAME>
          </ADD>
        </RETURN>
        <TARGET/>
      </BLOCK>
      <RETURN>
        <NAME>theNumber</NAME>
      </RETURN>
    </BLOCK>
  </FUNCTION>
</SCRIPT>

It's based on a real JavaScript parser, so there are some redundant elements. But nothing a good transformation can't solve, right?

Anyway, let me know if anyone's interested in developing this further.
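"Nothing a good transformation can't solve" can be made concrete for the tree in the example above. Below is a minimal sketch, in JavaScript rather than XSLT, of a pass that drops the empty `<FUNCTION/>`, turns the IFNE ... TARGET run into an explicit IF with its own block, and flattens a BLOCK nested directly inside another BLOCK. It works on a hypothetical plain-object form of the XML (node, restructure and show are made-up names). It reads IFNE as "skip to the next TARGET when the condition is false", which is how the example lays out its `if`; that reading is inferred from this one example, not taken from Rhino's documentation. Else branches and nested conditionals are not attempted.

```javascript
function node(name, text, ...children) {
  return { name, text, children };
}

function restructure(n) {
  const kids = n.children.map(restructure);
  const out = [];
  for (let i = 0; i < kids.length; i++) {
    const kid = kids[i];
    if (kid.name === "FUNCTION" && kid.text === "" && kid.children.length === 0) {
      continue; // the empty <FUNCTION/> placeholder carries nothing
    }
    if (kid.name === "IFNE") {
      const end = kids.findIndex((k, j) => j > i && k.name === "TARGET");
      if (end === -1) throw new Error("IFNE without a following TARGET");
      out.push(node("IF", "", kid.children[0], node("BLOCK", "", ...kids.slice(i + 1, end))));
      i = end; // skip the guarded statements and the TARGET itself
      continue;
    }
    if (kid.name === "BLOCK" && n.name === "BLOCK") {
      out.push(...kid.children); // ES3 has no block scope, so flattening is safe
      continue;
    }
    out.push(kid);
  }
  return node(n.name, n.text, ...out);
}

// Compact text form for inspecting the result: NAME:text(children...).
function show(n) {
  const label = n.text ? n.name + ":" + n.text : n.name;
  return n.children.length ? label + "(" + n.children.map(show).join(",") + ")" : label;
}

// The Rhino-derived tree for _padNumber, as posted above.
const script = node("SCRIPT", "",
  node("FUNCTION", ""),
  node("FUNCTION", "_padNumber",
    node("BLOCK", "",
      node("BLOCK", "",
        node("IFNE", "", node("LT", "", node("NAME", "theNumber"), node("NUMBER", "10.0"))),
        node("RETURN", "", node("ADD", "", node("STRING", "0"), node("NAME", "theNumber"))),
        node("TARGET", "")),
      node("RETURN", "", node("NAME", "theNumber")))));

const cleaned = restructure(script);
console.log(show(cleaned));
```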

12. February 18, 2005 11:33 AM

liorean Posted…

Renaud, I'm interested in seeing that. I'll see if I can find a better venue for this discussion than comments on WG, though. Maybe a forum or a mailing list.

13. February 18, 2005 11:51 AM

Armand Posted…

I will also release my solution soon; I just have to make a schema and some instruction documents.