YAJSML

Ugh, I’m pretty tired of the endless parade of “Oh hai, iz wrtn a JS loaderz” projects. Given the number of existing implementations and the general solved-ness of the problem, the time devoted to it is disappointing. But here I am, doing just the same.

## Principles

This work is related to that done in [RequireJS](http://requirejs.org/) and [CommonJS](http://wiki.commonjs.org/wiki/Modules), but hardly [bound](http://tagneto.blogspot.com/) by them. Instead, the results are a product of the following principles:

* Improving the loading characteristics of a JavaScript project should be approached as an incremental optimization problem.
* Simplicity is best. Only implement the necessary features.
* Caching should be exploited. Expensive one-time operations are acceptable provided their responses are reusable.
* [Modules](http://wiki.commonjs.org/wiki/Modules/1.1.1) are well defined, widely used, and well founded. The five different asynchronous loading specifications, not so much.

This leads to what I believe is a much [simpler version that works](http://c2.com/xp/DoTheSimplestThingThatCouldPossiblyWork.html) compared with existing implementations. The other nice thing is that the packaging tool takes an original approach to solving the dependency issue.

## Observations

### Optimization
Naïve implementations are good. Those implementations may be slow, but they are also cheap and set the stage for proper [optimization](http://c2.com/cgi/wiki?RulesOfOptimization). Chances are that many possible optimizations are rendered unnecessary by the right tools.

### Dependencies
Most current implementations use one of [five](http://wiki.commonjs.org/wiki/Modules/Transport) ways to wrap a module’s code with a description of the dependencies that the code requires. A library fetches those dependencies asynchronously and evaluates the module’s code once all are loaded. The thinking is that, once a module is received, all of its dependencies must be loaded first (asynchronously so they are non-blocking, natch).

IMHO, that obscures the more obvious and important observation, that having dependencies that aren’t loaded by the time the current module is loaded, asynchronous or not, is *never* good. If this is to be treated as an optimization problem, then the issue is one of packaging. If the packaging works well, the question of synchronous/asynchronous loading is moot.

### Packaging
Existing packagers all perform some sort of parsing on source files, usually a regular expression, maybe a full preprocessor language. Both approaches have the downsides of requiring boilerplate code or being unreliable. There is also the additional complication of describing lazy dependencies so that they do not get confused with loading dependencies.

The good news is that, given the availability of non-browser interpreters, there is a third way: the code itself can be evaluated offline and dependencies *extracted during run-time*. This would extract exactly those dependencies needed at load time and require no boilerplate. And because the same kernel is used in both environments, the client’s and the packager’s results stay consistent.

### Versioning
The current practice for deploying JavaScript is to set a query parameter like `bust=v_n+1` on the URL of the script’s location to, in effect, invalidate the cache. This happens to work in the monolithic file case; however, lazy loading makes versioning a problem that cannot be ignored. While new clients will use `v_n+1`, clients using `v_n` code must continue to receive `v_n` code. For this reason, versioning should be reflected in the base URI.


    http://assets1.example.com/js/src/n/
    http://assets1.example.com/js/src/n+1/
    http://assets1.example.com/js/src/n+2/
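
As a rough illustration, a page could pin the version it was served with and build every later module URL from it. The `versionedURL` helper and the version strings below are assumptions made for this sketch, not part of the tool.

```javascript
// Build a module URL under a versioned base URI (hypothetical helper).
function versionedURL(baseURI, version, modulePath) {
  // Strip slashes at the joins so the pieces concatenate cleanly.
  var base = baseURI.replace(/\/+$/, '');
  var path = modulePath.replace(/^\/+/, '');
  return base + '/' + encodeURIComponent(version) + '/' + path;
}

var srcBase = 'http://assets1.example.com/js/src/';

// A client that loaded version 41 keeps fetching version 41 modules...
console.log(versionedURL(srcBase, '41', '/models/user.js'));
// http://assets1.example.com/js/src/41/models/user.js

// ...while a freshly loaded page is pointed at version 42.
console.log(versionedURL(srcBase, '42', '/models/user.js'));
// http://assets1.example.com/js/src/42/models/user.js
```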

### Caching
It’s a common observation that different areas of a project change at different rates. This is certainly the case in web applications, where library code will change much more slowly than application code. It follows then that updates to application code should have no effect on still-cacheable library code.

Convention already specifies this using a leading slash for `'/application/code'` and none for `'library/code'`. This is simple to exploit by allowing different URIs for the two classes of code.


    http://assets1.example.com/js/lib/0.1.2/
    http://assets1.example.com/js/src/0.3.0/
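
A small sketch of how that split might look in code. The `config` object and `resolveURL` function are hypothetical; the only thing taken from the convention is that a leading slash marks application code.

```javascript
// Hypothetical configuration: each class of code gets its own versioned base.
var config = {
  rootURI: 'http://assets1.example.com/js/src/0.3.0/',
  libraryURI: 'http://assets1.example.com/js/lib/0.1.2/'
};

// Map a module path to a URL: a leading slash marks application code,
// anything else is treated as library code.
function resolveURL(path) {
  if (path.charAt(0) === '/') {
    return config.rootURI + path.slice(1);
  }
  return config.libraryURI + path;
}

console.log(resolveURL('/models/user.js'));
// http://assets1.example.com/js/src/0.3.0/models/user.js
console.log(resolveURL('util/index.js'));
// http://assets1.example.com/js/lib/0.1.2/util/index.js
```

Bumping the application to a new version only changes `rootURI`, so every library URL, and whatever the browser has cached for it, stays valid.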

## Implementation
The tool that implements this loader provides two things: a kernel and a module compiler. For the moment it is on an [experimental branch](https://github.com/cweider/modulizer/tree/experimental) of the Modulizer project, though I’m beginning to like “Yajsml” more and more.

The kernel is a terse bit of JS that provides the module loading and fetching functionality. It has no references to the global environment and, by default, exports to the `require` symbol. It’s the part that enables a simple page like this:

    <script type="text/javascript" src="/kernel.js"></script>
    <script type="text/javascript">
      require.setRootURI('/js/src/');
      require.setLibraryURI('/js/lib/');
      require.setGlobalKeyPath('require');
      app = new (require('/app').Application)({
        "userId": 1234
      , "baseURI": "http://example.com/"
      });
    </script>

The module compiler takes a number of paths and compiles them into a `require.define()` call. Using the command line tool:

    ../modulize --output code.js      \
                --root-path ./src     \
                --library-path ./lib  \
                --import-dependencies -- ./src/app.js

Produces the following package:

    require.define({
      "/app": null
    , "/app.js": function (require, exports, module) {
        var models = require('/models');
        var util = require('util');
      }
    , "/models": null
    , "/models.js": null
    , "/models/group": null
    , "/models/group.js": function (require, exports, module) {
        exports.Group = function () {
          /*...*/
        };
      }
    , "/models/index": null
    , "/models/index.js": function (require, exports, module) {
        exports.User = require('./user').User;
        exports.Group = require('./group').Group;
      }
    , "/models/user": null
    , "/models/user.js": function (require, exports, module) {
        exports.User = function () {
          /*...*/
        };
      }
    , "util": null
    , "util.js": null
    , "util/index": null
    , "util/index.js": function (require, exports, module) {
        exports.escapeHTML = function () {};
        exports.escapeHTMLAttribute = function () {};
        exports.importantURL = 'http://example.com/';
      }
    });

## Future
This is the first iteration in a longer project with several more big ideas to adopt, but, IMHO, aside from one or two missing features, this is a pretty comprehensive solution for the problem of distributing code from the client’s perspective. The remaining improvements revolve around improving the optimization of packaging and using the cache more effectively.

Regarding effective use of the cache, having module requests get redirected to designated/canonical packages has lots of potential to increase cache hits when loading order varies – such as across pages. As far as finding an optimal packaging goes, it’s the kind of problem that sounds like the perfect job for some sort of nondeterministic heuristic-ish algorithm.

And finally, while the two buckets, `libraryURI` and `rootURI`, are probably sufficient for most projects, the thought of allowing for multiple library paths is appealing. Searching would of course be made more expensive for some modules, but I suspect that ordering the search paths by increasing frequency of updates may allow caching to compensate for this.

## Updates
It’s since become clear that parts of this discussion are reasonably independent of each other, so they’ve been broken into their own projects:

* [require-kernel](https://github.com/cweider/require-kernel): A minimalist implementation of `require` that supports asynchronous retrieval.
* [yajsml](https://github.com/cweider/yajsml): An asset server that performs packaging and clever things like redirecting to a canonical resource.
* [modulizer](https://github.com/cweider/modulizer): A tool that finds dependencies at runtime.

Finding JavaScript’s Global Object

With JavaScript code being written in ever more diverse environments these days, some assumptions are bound to get broken. One such assumption is that the object bound to the symbol `window` in the current scope is the global object. Every approach I’ve seen searches through a list of probable symbols and returns the first one defined, instead of using the language itself.

    var global = (typeof window != 'undefined' ? window : global)

Below is a snippet that will return the global object independent of scope and interpreter, provided it runs as non-strict code.

    var global = (function () {return this})();

Note: except in the rarest of cases, direct address of the global object is illegitimate regardless of approach; using this more robust snippet is no excuse.
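
One caveat worth a sketch: in strict mode, a plain function call leaves `this` as `undefined`, so the one-liner comes back empty there. The hypothetical `getGlobalObject` helper below is one possible workaround, not a recommendation to reach for the global object more often. It prefers the standard `globalThis` (ES2020) and otherwise falls back to the `Function` constructor, whose functions are non-strict unless their body opts in. That fallback can be blocked by a Content Security Policy that forbids `eval`-like code.

```javascript
// In strict code the original idiom yields undefined instead of the global.
var viaThis = (function () { 'use strict'; return this; })();
console.log(viaThis === undefined);  // true

// Hypothetical helper: use `globalThis` where it exists, otherwise build a
// non-strict function at run time and ask it for `this`.
function getGlobalObject() {
  if (typeof globalThis !== 'undefined') return globalThis;
  return Function('return this')();
}

var root = getGlobalObject();
console.log(root.Array === Array);  // true
```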