TypeScript Compiler : Documentation Output

Recently I have been developing my TypeScript UI project, which is hosted on CodePlex. CodePlex comes with a reasonable Documentation Wiki tab and so I have been trying to build documentation for all my classes, interfaces etc. both inline (i.e. in the code) and on the project site. However, manually converting JSDoc to Wiki Docs is slow, laborious and very hard to keep up-to-date. To add to this, my sister has agreed to translate much of the online documentation into German. This presents me with the issue of how to automate documentation generation so I can get on with coding, and how to guide her on what does and doesn’t need translating.

My solution to this problem is to modify the existing TypeScript compiler to add a “–documentation” option which will output documentation files for all the classes, interfaces and enums in a standard format (“.ts.wiki” files). I can then write a short C# program to parse these files and show my sister what needs translating.

This seems like it ought to be relatively simple, but it turns out to be horribly messy and tricky. This is mainly due to two big issues with the TS Compiler:

  1. There is a total lack of comments/documentation on how the compiler works or what any of the names mean
  2. It’s not designed (so far as I can tell) with a proper post-processor

I have been tackling these issues and have been making reasonable progress, so in this article I will begin to explain where I’ve got to and where it is heading.

Where I began

I began by looking at the TypeScript compiler and considering which of its existing outputs would be closest to what I wanted. The compiler in its original form has two main outputs:

  1. The JS files
  2. Declaration files

The JS files are the compiled code and not exactly useful as documentation, especially since they generally don’t contain the comments and aren’t TypeScript code. I deduced then that whatever code produced the JS probably wasn’t going to help me in producing TS-based documentation with JSDoc descriptions included (which are, of course, comments). Declaration files, however, contain TypeScript output, with or without comments, in a standard format and not including any of the actual code. So essentially, documentation but laid out in a different way.

To proceed I knew I would need to add a new option to the compiler and a new file format. By looking at the file names and a few bits of the code I worked out that “emitters” are the things which use “walkers” to go down the symbol tree and emit the relevant output to a file. So I copied and pasted a version of the declarationEmitter.ts file and refactored it till it was a “documentationEmitter”. Finally, by trawling the code I was able to duplicate the lines that wire up the declarationEmitter and change them to documentationEmitter code, thus adding –documentation as a compiler option and .ts.wiki as an output format.

Hacking the declarationEmitter

The next stage (and my current stage) is hacking the declarationEmitter code till it becomes a documentationEmitter. A challenge with this is that I’m not outputting actual code, nor am I trying to output it to a single file, nor in the order that the code is in the script file (for instance I want to order function names alphabetically and separate functions, properties etc. into groups). This presents the post-processing issue. TS is designed to output as it parses, which works fine for a compiler like this, but not for documentation. Documentation needs to be written to file out-of-order (with respect to the code) and in the full light of all relevant code around it. I have therefore come up with a workaround to the lack of an obvious (if any) post-processor.

The declarationEmitter class contains a “close” method which is supposed to close the current emitter’s file output stream. I am going to use the documentationEmitter’s “close” method as my post-processor kick to emit the documentation (and emit to multiple files). The rest of the emitter code will build a documentation-block tree as an intermediate step between symbol tree and documentation output. This means changing all the “emit” and callback methods so that instead of immediately writing to the output file, they are context-aware and emit to the current documentation block (or create a new block where appropriate).

A documentation block will consist of the text for that block, what type of block it is (e.g. class block, module block, function description block, etc.), the block signature (e.g. public, private, public static, private static), a reference to the block’s parent documentation block and an array of the child documentation blocks. This will allow me to construct a tree of documentation where the text is ready and just needs piecing together in a different order (e.g. class title, then which module (namespace) it belongs to).
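
To make the idea concrete, here is a minimal sketch of such a documentation-block tree. It is not taken from the compiler or from the actual documentationEmitter: the names DocBlock, BlockKind, addBlock and render, the fixed group order and the output format are all assumptions made for illustration. It only shows how blocks collected in source order could be regrouped and sorted in a post-processing step such as “close”.

```typescript
// Hypothetical names throughout: none of these come from the TypeScript compiler.
type BlockKind = "module" | "class" | "property" | "function";

interface DocBlock {
    kind: BlockKind;
    signature: string;       // e.g. "public", "private static"
    text: string;            // documentation text for this block
    parent: DocBlock | null; // null for the root block
    children: DocBlock[];
}

// Create a block and attach it to its parent, if it has one.
function addBlock(parent: DocBlock | null, kind: BlockKind, signature: string, text: string): DocBlock {
    const block: DocBlock = { kind, signature, text, parent, children: [] };
    if (parent !== null) {
        parent.children.push(block);
    }
    return block;
}

// Post-processing step: render a block, then its children grouped by kind
// and sorted alphabetically by text, regardless of the order they were added in.
function render(block: DocBlock, depth: number = 0): string[] {
    let indent = "";
    for (let i = 0; i < depth; i++) {
        indent += "  ";
    }
    const lines: string[] = [indent + block.signature + " " + block.kind + ": " + block.text];
    const groupOrder: BlockKind[] = ["module", "class", "property", "function"];
    for (const kind of groupOrder) {
        const group = block.children
            .filter(child => child.kind === kind)
            .sort((a, b) => (a.text < b.text ? -1 : a.text > b.text ? 1 : 0));
        for (const child of group) {
            for (const line of render(child, depth + 1)) {
                lines.push(line);
            }
        }
    }
    return lines;
}
```

In this sketch the emit callbacks would only ever call addBlock, and render would be called once at the end, when the whole tree is known.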

This re-coding should be simple, and conceptually it is, but in reality this is a laborious and tricky process. Some of the names of the emitter methods are obscure like “emitTypeNamesMember” (which so far as I can tell, emits the type information for a function, property, variable or something else e.g. number or { x: number; y:number }). It is not exactly clear for someone who doesn’t know what it does or what the exact contents of the symbols are. So at each stage I am left with the following steps:

  1. Hack it so it emits to the current documentation block but otherwise output remains the same
  2. Hack as much else of the code till it all compiles and produces some form of vague documentation-like output
  3. Deduce what incorrect documentation (either in content, format or both) came from where
  4. Work out what the hell it should have been, if it should have been there at all, and if appropriate, where in the documentation it is supposed to go.
  5. Go back and re-write the code to make it look right
  6. Repeat the above

Not the nicest way to develop since it gives me no real solid idea of how far I have gotten, how much work is left and leaves a lot of guessing (not least I have to mangle 1259 lines of code before it even compiles!)

Advice for others

If you want to do this sort of thing, good luck. It is difficult to get your head around and definitely time-consuming (unless you happen to know your way around a compiler so well that nothing is new to you!) Here’s some information that may help you:

  • Don’t get hung up in TypeScript Services or the Harness – they don’t really help you if you are trying to extend the compiler functionality.
    • Services and Harness are (so far as I can understand) wrappers for:
      • Supporting Node.js, Windows Script Host and web browser environments
      • Diagnostic services/information
  • TypeScript.ts contains the overriding control logic but doesn’t do any parsing etc. in itself – add methods here and link them up for things like calls to new emitters
  • AST – Abstract Syntax Tree – This is the breakdown of the TypeScript into symbols going down from Script (i.e. file level) through modules, classes, functions all the way to variables and their type specifiers along with comments.
    • You do not necessarily have to handle every possible type of symbol – just the ones you are interested in – you can add a general catch-all (that does nothing) for the rest
  • ASTWalker – A “walker” literally walks you down the symbol tree, symbol by symbol and you can request certain information about symbols as you go (e.g. directly related comments, symbol type, symbol name)
    • The walker has two callbacks, pre and post, pre happens just before it “walks over” the symbol, post happens just after it walks over the symbol
    • Pre and Post must return booleans:
      • returning false for pre (I think) makes the walker skip processing the symbol and its children (and you don’t get a post call)
    • Pre and Post pass you a symbol representation object which gives you everything you need about that symbol (though naming is obscure and beware not all properties are always there e.g. ASISymbol can be null)
    • Use GetASTWalkerFactory().walk(pre, post) to start walking down a tree – you can often use the same method for the pre/post callbacks with an extra parameter – see DeclarationEmitter.emitDocumentation for an example
  • Emitters – These use a walker to walk the symbol tree and handle the symbols they are interested in (e.g. the declarations emitter only handles public, exported or declared symbols such as exported modules and classes, but not private variables or code within functions)
    • Emitters output to a particular document (file) specified when they are created (but you can create other new files within the emitter)
    • Emitters can be told to output to a single file but you may wish to ignore this
    • Emitters generally contain callbacks for each symbol type that process the symbol and then pass the essential information to emitter methods
    • Emitter methods actually write to the doc file
  • IOHost – This is global to the compiler to standardise IO to files (to make it work across Node.js, WSH, web-browser)
    • Because this is global you can use it from anywhere so you can use it to create files (there is only one instance per compiler (program) instance)
  • Process – Again this is global and has some very helpful methods for giving debug trace
    • process.stdout and process.stderr are accessible from anywhere – use .write (with “\r\n” for new lines) to emit debug info e.g.
      • “declarationEmitter.ts : Line 59 : Constructor called\r\n”
      • This is a good standard output that lets you trace back to the TS source line easily (don’t forget the \r\n or everything ends up on one line!!)
  • If you are looking for a particular bit of symbol information, think about what it is compiled to, find where it gets compiled in the main compiler code, and copy and paste what is there! It is the fastest way to work out how to get certain information. Also, look at what type of PullDecl is used – it affects what information is visible/accessible.
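
The pre/post walker pattern described in the list above can be illustrated with a toy example. This is a sketch only: ToyNode, WalkCallback, walkTree and traceLine are made-up names and are not the compiler’s AST or ASTWalker types. It assumes that returning false from pre skips the symbol’s children and its post call, which is the behaviour guessed at above, and it ignores the return value of post because its meaning is not known here.

```typescript
// A toy tree and walker, NOT the compiler's AST or ASTWalker types.
interface ToyNode {
    kind: string; // e.g. "script", "module", "class", "function"
    name: string;
    children: ToyNode[];
}

type WalkCallback = (node: ToyNode, depth: number) => boolean;

// Calls pre before descending into a node and post after its children.
// If pre returns false, the node's children and its post call are skipped.
// The return value of post is ignored in this sketch.
function walkTree(node: ToyNode, pre: WalkCallback, post: WalkCallback, depth: number = 0): void {
    if (!pre(node, depth)) {
        return;
    }
    for (const child of node.children) {
        walkTree(child, pre, post, depth + 1);
    }
    post(node, depth);
}

// Build a trace line in the "file : Line n : message" style suggested above,
// ready to be passed to something like process.stdout.write.
function traceLine(file: string, line: number, message: string): string {
    return file + " : Line " + line + " : " + message + "\r\n";
}
```

A documentation-style emitter would typically return false from pre for symbol kinds it does not care about, such as the bodies of functions.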

I hope this article helps someone with their attempts at hacking the TS compiler and I will hopefully be submitting my code to the TS CodePlex project at some stage in the future (if not, I’ll at least post the code online for others to use so check back here for updates or follow me on Twitter!).

TypeScript UI : Data Binding

I recently announced that I am developing Data Binding for TypeScript UI (to be included in version 1.0.2). This article will give some more detail about why this is a good idea and how I plan to implement Data Binding into TypeScript. Data Binding, for those who don’t know and for clarification, is a method of linking data in a source (typically a web server) to a UI control that the user sees. Data Binding involves a number of steps which can be thought of in this structure:

  1. Access
  2. Adaptation
  3. Binding
  4. Update

1. Data Access

Data Access is the process of requesting data from the data source or pushing data to the data source. In a web-based context, it has typically been a full page request or more recently AJAX requests. “HTML5” (to use the umbrella term) introduces Web Sockets – two-way streams from server to client – and these are proving particularly good for creating responsive apps. TypeScript UI will support two basic types of data accessor, two types of data and three data formats:

  1. AJAX : String data : XML or JSON
  2. Web Sockets : String/binary data : XML or JSON or Raw data

The data from data accessors gets passed to data adapters, which transform the code-readable data into human-readable data, or data which the UI control can understand.

2. Data Adaptation

Data Adaptation is the process of taking raw data from a data accessor and transforming it into human-readable data or data which a UI control can understand. Data Adapters also handle the reverse process. TypeScript UI Data Adapters will have two main functions: I2O and O2I – standing for Input to Output and Output to Input; the latter being the inverse process of the former. The main intention is for them to convert data from a data accessor to data that a UI control can understand. However, they could be used to adapt any form of data or variable arbitrarily; they will have no rigid link to the rest of the Data namespace i.e. Data Adapters will be usable as a standalone unit.
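
As a rough illustration of the I2O/O2I pair, here is a standalone adapter sketch. Only the two function names come from the description above; the DataAdapter interface, its generic parameters and the progressAdapter example (a JSON string adapted to a percentage label) are assumptions, and the real TypeScript UI signatures may well differ.

```typescript
// Hypothetical interface: only the names I2O and O2I come from the article.
interface DataAdapter<TInput, TOutput> {
    I2O(input: TInput): TOutput;  // Input to Output
    O2I(output: TOutput): TInput; // Output to Input (the inverse)
}

interface Progress {
    progress: number; // fraction between 0 and 1
}

// Adapts a JSON string such as '{"progress":0.5}' to a label such as "50%",
// and converts the label back to JSON again.
const progressAdapter: DataAdapter<string, string> = {
    I2O(input: string): string {
        const parsed = JSON.parse(input) as Progress;
        return (parsed.progress * 100) + "%";
    },
    O2I(output: string): string {
        // parseFloat stops at the "%" sign, so "50%" yields 50.
        const data: Progress = { progress: parseFloat(output) / 100 };
        return JSON.stringify(data);
    }
};
```

Because the adapter has no reference to an accessor or a control, it can be used on its own, which matches the standalone intent described above.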

3. Data Binding

Data Binding is the magic of this whole chain. A Data Binding has two main components: the UI Control property and the Data Adapter/Accessor pair. The Data Binding links the inputs/outputs of the Data Adapter to the UI control property. When the server sends updated data, the binding will get the callback from the data accessor, process the data through the data adapter and subsequently update the UI control. Likewise, if the user changes the value of the control, the binding will send the new data to the server, via the Data Adapter/Data Accessor. That’s the simple understanding of what it is supposed to do; the implementation is rather more complex. TypeScript UI will actually implement a more flexible, powerful structure than just plain data binding. The reasoning behind the structure comes from the inclusion of Data Updaters, so I will move on to those and come back to the Data Binding implementation later.
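
The simple two-way flow described above could be sketched like this. All of the names (Accessor, Adapter, Binding, onData, send, controlChanged) are hypothetical and chosen for this example only; the actual TypeScript UI design is more elaborate, as noted above.

```typescript
// All names here are hypothetical; TypeScript UI's real API may differ.
interface Accessor<TRaw> {
    onData: ((raw: TRaw) => void) | null; // callback set by the binding
    send(raw: TRaw): void;                // push data to the data source
}

interface Adapter<TRaw, TValue> {
    I2O(raw: TRaw): TValue;
    O2I(value: TValue): TRaw;
}

class Binding<TRaw, TValue> {
    private accessor: Accessor<TRaw>;
    private adapter: Adapter<TRaw, TValue>;

    constructor(
        accessor: Accessor<TRaw>,
        adapter: Adapter<TRaw, TValue>,
        setControlValue: (value: TValue) => void
    ) {
        this.accessor = accessor;
        this.adapter = adapter;
        // Source -> control: adapt incoming data, then update the control property.
        accessor.onData = raw => setControlValue(adapter.I2O(raw));
    }

    // Control -> source: call this when the user changes the control's value.
    controlChanged(value: TValue): void {
        this.accessor.send(this.adapter.O2I(value));
    }
}
```

An AJAX or Web Socket accessor would call onData when a response or message arrives; in a test, a fake accessor can call it directly.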

4. Data Updaters

Data Update is the process of updating the UI control data from the data source or vice-versa. However, Data Update is handled mostly by the data binding, so what is a Data Updater? A Data Updater is code which organises periodic update of data (in whatever way the programmer decides). The primary aim is to allow, for example, periodic updates such as fetching new Tweets or messages from a data source. TypeScript UI will implement a simple Data Updater structure which contains an interval setting, the update method to call and three events:

  • OnUpdateInvoke – Occurs when the update method is invoked.
  • OnUpdateBegin – Occurs when the update of data actually begins.
  • OnUpdateEnd – Occurs when the update of data ends.
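
A minimal sketch of such an updater is given below. The three event names come from the list above; everything else (the DataUpdater class shape, the Listener and Scheduler types, and the start, stop and invoke methods) is assumed for illustration. The timer is injected as a scheduler function so the sketch does not depend on a particular environment, and the update method is treated as synchronous, so OnUpdateInvoke and OnUpdateBegin fire back to back here.

```typescript
// Hypothetical sketch; only the three event names come from the article.
type Listener = () => void;
// A scheduler starts a repeating timer and returns a function that cancels it.
type Scheduler = (callback: () => void, intervalMs: number) => () => void;

class DataUpdater {
    onUpdateInvoke: Listener[] = [];
    onUpdateBegin: Listener[] = [];
    onUpdateEnd: Listener[] = [];
    intervalMs: number;
    private updateMethod: () => void;
    private cancel: (() => void) | null = null;

    constructor(intervalMs: number, updateMethod: () => void) {
        this.intervalMs = intervalMs;
        this.updateMethod = updateMethod;
    }

    // Run one update cycle and raise the three events in order.
    invoke(): void {
        this.onUpdateInvoke.forEach(listener => listener());
        this.onUpdateBegin.forEach(listener => listener());
        try {
            this.updateMethod();
        } finally {
            // Raised even if the update method throws.
            this.onUpdateEnd.forEach(listener => listener());
        }
    }

    // Start periodic updates using the supplied scheduler (no-op if already started).
    start(schedule: Scheduler): void {
        if (this.cancel === null) {
            this.cancel = schedule(() => this.invoke(), this.intervalMs);
        }
    }

    // Stop periodic updates (no-op if not started).
    stop(): void {
        if (this.cancel !== null) {
            this.cancel();
            this.cancel = null;
        }
    }
}
```

In a browser, the scheduler could simply wrap setInterval and return a function that calls clearInterval with the resulting id.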

TypeScript UI Structure / Implementation

The structure is currently as described in the sections above, so to finish off the explanation from section 3: TypeScript UI will implement a structure which includes Binding Groups and Binding Collections (a collection is a list of groups). Each UI Control will have an instance of a Binding Collection to which all Binding Groups will be added. A Binding Group will contain the list of one or more Data Bindings in the group and an associated Data Updater that will handle updating all the bindings in that group when requested (or scheduled). The Data Updater may be omitted if no updater is wanted. The reason for this slightly more bloated structure is that it allows multiple data bindings to be updated at the same time, by the same updater, without needing to store multiple references to an instance of an updater or data binding. This simplifies managing where bindings are created, held and destroyed and also allows more powerful update systems to be developed. The current working version of the full structure is shown below (image from the documentation on CodePlex):

[Image: TypeScript UI Documentation : Data Binding Diagram]
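
In code, the group/collection structure might look roughly like the sketch below. The class names follow the terms used above, but the members (bindings, updater, updateAll, add, remove, count) and the Updatable and GroupUpdater stand-in interfaces are assumptions for this example, not the actual TypeScript UI API.

```typescript
// Hypothetical shapes for the structure described above.
interface Updatable {
    update(): void; // stands in for a Data Binding
}

interface GroupUpdater {
    invoke(): void; // stands in for a Data Updater
}

class BindingGroup {
    bindings: Updatable[] = [];
    // The updater is optional: a group may have none.
    updater: GroupUpdater | null;

    constructor(updater: GroupUpdater | null = null) {
        this.updater = updater;
    }

    // Update every binding in the group in one pass.
    updateAll(): void {
        for (const binding of this.bindings) {
            binding.update();
        }
    }
}

// One collection per UI control; it owns all of that control's groups.
class BindingCollection {
    private groups: BindingGroup[] = [];

    add(group: BindingGroup): void {
        this.groups.push(group);
    }

    remove(group: BindingGroup): void {
        const index = this.groups.indexOf(group);
        if (index >= 0) {
            this.groups.splice(index, 1);
        }
    }

    count(): number {
        return this.groups.length;
    }
}
```

The point of the grouping is that a single updater can be given group.updateAll as its update method, so several bindings refresh together while only the group holds references to them.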


If you have any comments specifically about TypeScript UI Data Binding, please either create a discussion on CodePlex or fill in the comment form below. All (mature) comments are welcome!