Notes to myself

An effort to extend the time between the recently learned and soon forgotten

January, 2015

DIY sunburst

Sunburst visualizations can be visually appealing. The idea of the sunburst chart is one of nested pie charts, with adjacent rings implying connections within each arc, as suggested by the coloring of the example below (one of Bostock's many examples). In addition to the rich network of associations suggested by a visual inspection of the graphic, note that clicking on any individual subsection zooms in to show only that subsection and its children, ignoring everything else in the plot. (Click on the center to restore the original display.)

This is all great, but the challenge for anyone wanting to utilize a sunburst is that this complex, hierarchical visual representation requires a complex, hierarchical JSON-based data structure. Therefore even if you have a good reason for such an illustration and good underlying data, you have to build up a structure in real, parse-able JSON before you can consider using some borrowed D3 code to create the picture in your webpage. When I was drawing a similar picture for someone recently I asked her to put her data into a structure that is fundamentally linear, but which is then converted internally into the necessary hierarchical structure. In case anyone else could use such a capability I'll include a link to the operational web-based server below.

Note that there is still no getting away from generating legal JSON -- all I'm claiming is that this new structure might be a little easier to generate. This new approach requires two files: one containing the name of every distinct subsection, and the other using a series of period-delimited strings that describe exactly how those subsections are related to one another. The first file I'll call 'categories', and here's an example of how it might look:

    [
        {"index":"1","name":"protein"},
        {"index":"2","name":"chaperone"},
        {"index":"3","name":"cytoskeletal protein"},
        {"index":"4","name":"enzyme modulator"},
        {"index":"5","name":"kinase"}
    ]
    

This file should have one line for every single subsection in the graphic. Each subsection should have an index number (they don't have to be sequential) and a name (which will show up in the final graphic).

The second file is called 'elements' and it consists of references to the index numbers in the categories file. Here's an example of how it might look:

    [
        {"hierarchy":"1"},
        {"hierarchy":"1.2"},
        {"hierarchy":"1.3"},
        {"hierarchy":"1.5"},
        {"hierarchy":"1.5.4"},
        {"hierarchy":"1.5.4"}
    ]
    

The idea is that numbers on the left side of the period-delimited string refer to inner rings in the sunburst, while numbers later in the string refer to rings progressively further toward the outside. In the example above the sunburst would have one root element at the very center (index 1) and three elements in the surrounding ring (indices 2, 3, and 5), one of which (index 5) has a child in the outermost ring (index 4). Repeated numerical sequences are okay, and they simply indicate that there is another instance of a particular category (and that the specified arc should therefore be proportionally bigger).
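The conversion from the two flat files into a nested structure can be sketched roughly as follows. This is only an illustration of the idea, not the code the server actually runs: the function name buildHierarchy and the output fields (name, size, children) are my own assumptions, chosen to resemble the nested shape that D3's hierarchical examples usually consume.

```javascript
// Hypothetical helper (not the server's code): turn the flat 'categories'
// and 'elements' arrays into a nested tree. Field names are assumptions.
function buildHierarchy(categories, elements) {
    // Look-up table from index string to display name.
    var names = Object.create(null);
    categories.forEach(function (c) {
        names[c.index] = c.name;
    });

    // Synthetic container, so more than one top-level index is tolerated.
    var top = { children: [] };

    elements.forEach(function (e) {
        var node = top;
        // Walk the path from the innermost ring outward, creating nodes as needed.
        e.hierarchy.split('.').forEach(function (index) {
            var child = null;
            node.children.forEach(function (c) {
                if (c.index === index) { child = c; }
            });
            if (!child) {
                child = { index: index, name: names[index], size: 0, children: [] };
                node.children.push(child);
            }
            node = child;
        });
        // Each line counts as one instance of the node at the end of its path.
        node.size += 1;
    });

    return top.children;
}

var categories = [
    {"index": "1", "name": "protein"},
    {"index": "2", "name": "chaperone"},
    {"index": "3", "name": "cytoskeletal protein"},
    {"index": "4", "name": "enzyme modulator"},
    {"index": "5", "name": "kinase"}
];
var elements = [
    {"hierarchy": "1"},
    {"hierarchy": "1.2"},
    {"hierarchy": "1.3"},
    {"hierarchy": "1.5"},
    {"hierarchy": "1.5.4"},
    {"hierarchy": "1.5.4"}
];

var tree = buildHierarchy(categories, elements);
// The repeated "1.5.4" line leaves the "enzyme modulator" node with size 2.
console.log(JSON.stringify(tree, null, 2));
```

Because nodes are located by their full path rather than by index alone, the same index appearing under two different parents produces two separate nodes.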

Here is the graphic the program created when I fed the contents of the above two files in through the user interface:

[Figure: the resulting sunburst, with arcs labeled kinase, enzyme modulator, cytoskeletal protein, and chaperone]

Three cautions:

  • Every number referenced in the elements file must correspond to an index in the categories file, or else the results will be unpredictable.
  • The resulting data structure is strictly hierarchical, which means every child can have only one parent (parents can of course have multiple children). If you go ahead and connect a child index to multiple parent indices then the software will assume that the children are distinct copies of one another. You could end up with more child arcs that way than you expected.
  • You should probably provide a common index (in the example above it's index 1) from which all other arcs descend, since that approach tells the program explicitly how the whole picture should fit together.
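
The first and third cautions are easy to check before uploading anything. Here is a minimal sketch of such a check; the helper names findUnknownIndices and rootIndices are hypothetical and are not part of the tool itself.

```javascript
// Hypothetical pre-flight checks; neither function belongs to the server.

// Return every index used in 'elements' that is missing from 'categories'.
function findUnknownIndices(categories, elements) {
    var known = Object.create(null);
    categories.forEach(function (c) { known[c.index] = true; });

    var unknown = [];
    elements.forEach(function (e) {
        e.hierarchy.split('.').forEach(function (index) {
            if (!known[index] && unknown.indexOf(index) === -1) {
                unknown.push(index);
            }
        });
    });
    return unknown;
}

// Return the distinct leading indices; ideally there is exactly one.
function rootIndices(elements) {
    var roots = [];
    elements.forEach(function (e) {
        var first = e.hierarchy.split('.')[0];
        if (roots.indexOf(first) === -1) { roots.push(first); }
    });
    return roots;
}

var sampleCategories = [
    {"index": "1", "name": "protein"},
    {"index": "2", "name": "chaperone"}
];
var sampleElements = [
    {"hierarchy": "1"},
    {"hierarchy": "1.2"},
    {"hierarchy": "1.7"}   // 7 is deliberately absent from sampleCategories
];

var missing = findUnknownIndices(sampleCategories, sampleElements);
var roots = rootIndices(sampleElements);
console.log('unknown indices:', missing);
console.log('root indices:', roots);
```

An empty list of unknown indices and a single root index suggest the two files are at least consistent with each other.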

Here is the link: http://graphicscow.com/probe/uploadPrep. Give it a try if you like, and if you come up with any especially interesting sunburst graphics then I would love to receive a copy of your data set (use the 'contact' link) and see the picture for myself. Thanks!

How clean is your JavaScript?

JavaScript as a language has undergone a vast surge in popularity over the past few years. As recently as five years ago the language was despised by many (including me) as unreliable and filled with browser-dependent peculiarities. Furthermore, I would have claimed at that time that writing any sort of large project in JavaScript was a futile effort, since the lack of object orientation led inevitably to poorly structured and fundamentally unmaintainable code. A few years, however, have changed everything. ECMAScript 5 brought reliable standardization across browsers, JavaScript has become the principal tool for giving websites interactivity, and people (again including me) are learning to use the language in a way that makes large-scale software development quite feasible.

The language is still loosely typed, of course, and makes no claims about object orientation. The trick, therefore, is to adopt language patterns suited to JavaScript's functional nature. I thought I'd sketch out three of my favorites. I'll provide references when possible, but some of these patterns come from the JavaScript zeitgeist and aren't (to my knowledge) attributable to any one author.

  1. The module pattern

    The focus of this pattern is the explicit identification of accessible interface functions. I first saw this pattern identified by name in JavaScript Patterns, by Stoyan Stefanov.

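              var myFunctionName = (function (argument1) {

                  // private: visible only inside the enclosing function
                  var privateFunction = function () {
                      return 1;
                  },

                  // public: exposed through the returned object below
                  publicFunction = function () {
                      return privateFunction();
                  };

                  // only the functions listed here are reachable from outside
                  return {
                      publicFunction: publicFunction
                  };
              }());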
         

    This is one of my favorite patterns for holding together a collection of related methods. You will see it used repeatedly in the D3 code I make available on this site.

  2. The encapsulate immediate execution pattern

    Encapsulate absolutely everything, minimizing namespace pollution. I may have seen this in Secrets of the JavaScript Ninja (by John Resig), but I'm not sure.

              (function (argument1) {

                  var hi = 47;
                  console.log('hi='+hi);

              }());
         

    Instead of wrapping the whole function in parentheses, there is another way to force immediate execution: prepend an exclamation mark.

              !function (argument1) {
                  var hi = 47;
                  console.log('hi='+hi);
              }();
              

    It works just as well and takes one fewer character, but to me it seems a little more obscure.

  3. A trick for namespace management

    The following is a one-liner I use to hold all code (including immediately executed blocks, as in the previous pattern) inside an encapsulating variable:

           var baget = baget || {};  // encapsulating variable
    
           baget.someFineNewVariable = 47;  // adding something to your encapsulating variable
          

    The idea here is to store everything you write inside one (or else a very small number of) global variables. The one line of code defines a variable, 'baget' in my example. If it already exists then it is assigned to itself, no harm done. If it doesn't exist then a new object is created upon which you can hang everything else. The advantage of this approach is that multiple references to the same high-level variable in different JavaScript files will not clobber one another.
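
The three patterns combine naturally. Here is a minimal sketch of how they might fit together; 'baget' is the namespace from the example above, while 'counter' and the functions inside it are hypothetical names used purely for illustration.

```javascript
// Pattern 3: one shared global, safe to repeat in every file.
var baget = baget || {};

// Patterns 1 and 2: an immediately executed function whose return value
// is the module's public interface. 'counter' is a made-up example module.
baget.counter = (function (start) {

    var count = start;              // private state

    var step = function () {        // private helper, not exported
        return 1;
    },

    increment = function () {       // public
        count += step();
        return count;
    },

    current = function () {         // public
        return count;
    };

    // Only the names listed here are reachable from outside the module.
    return {
        increment: increment,
        current: current
    };
}(10));

baget.counter.increment();
console.log(baget.counter.current());   // prints 11
console.log(typeof baget.counter.step); // prints "undefined", since step stays private
```

Nothing leaks into the global scope except 'baget' itself, and the private state can only be changed through the functions the module chooses to expose.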

Disease rising

While science steadily continues to improve our ability to identify new therapeutic drugs, the dangers of infectious disease are actually increasing over time. While a variety of forces are at work, there are two crucial, largely unavoidable factors:

  • microorganisms reproduce very quickly and commonly mutate readily, allowing them to evolve compensatory mechanisms to address the ways we would fight them, and
  • increasing movement of people and organisms around the globe leads to outbreaks of invasive species, especially microscopic species that can travel within host species.

These problems are intertwined of course, with global travel bringing previously geographically separated species into contact, leading to evolutionary interactions favoring smaller, rapidly reproducing organisms. One or both of these factors are presumably behind the troubling spread of Chikungunya into the Americas.

The geographical spread of Chikungunya virus was previously limited to sub-Saharan Africa, India, parts of Southeast Asia and Indonesia. While the disease is not typically fatal it is still pretty nasty: rash, headache, nausea and inflammation of the eyes may all be present, often along with fever and severe joint pain. Sufferers usually exhibit symptoms lasting from several days to several weeks, but occasionally the disease morphs into a chronic condition that can be debilitating. The disease is known to be spread by mosquitoes, and was first identified in Tanzania in 1952.

The geographical range of the disease is currently expanding rapidly, as explained in this update from the Centers for Disease Control. Puerto Rico has seen roughly 25,000 suspected cases, and cases have been reported in 42 other countries in North, Central, and South America. The problem is not restricted to tropical areas, with Canada reporting 300 confirmed cases in 2014. Clearly this story is still actively unfolding. Keep on the lookout, especially if you live in or travel through warm climates.