Davide P. Cervone · 4d3cfc3e
--- a/Understanding-mathjax-performance.md
+++ b/Understanding-mathjax-performance.md
 > This is working draft.

-This posting gives an overview over the different aspects that affect MathJax performance.
+This posting gives an overview of the different aspects that affect MathJax performance.

 ## "Real" size

-A full download of the MathJax code is ~22MB, but most of it is due to the legacy picture fonts (~9.5MB), , the unpacked folder (containing the code before it was compressed -- ~4.1MB), and the configuration folder (~2.8MB -- most pages need only one configuration file but more later).
+A full download of the MathJax code is ~22MB, but most of it is due to the legacy image fonts (~9.5MB), the unpacked folder (containing the code before it was compressed -- ~4.1MB), and the configuration folder (~2.8MB -- most pages need only one configuration file, but more on that later).

-In other words, what's "really" MathJax, is `MathJax.js` as well as the `extension`, `localization` and `jax` folders, and the webfonts -- summing up to ~5MB. 
+In other words, what's "really" MathJax is `MathJax.js` as well as the `extensions`, `localization` and `jax` folders, and the webfonts -- summing up to ~5MB. 

-*However*, MathJax will never actually need all of these 5MB. E.g., we offer webfonts in 4 format, which exist for specific (older) browsers who can't use the current webfonts standard -- woff). 
+*However*, MathJax will never actually need all of these 5MB. E.g., we offer webfonts in 4 formats, which exist for specific (older) browsers that can't use the current webfonts standard -- woff. 

-So as a first approximation: "all of MathJax", i.e., all input and output options and their extensions that a user would ever have download ~3.5MB. 
+So as a first approximation: "all of MathJax", i.e., all input and output options and their extensions that a user would ever have to download is ~3.5MB. 

-But in real life 1 input + 1 output is used, which is ~1.5MB (and sending compressed files should bring it down to ~650KB). 
+But in real life most pages only use 1 input + 1 output jax, which is ~1.5MB (and sending compressed files should bring it down to ~650KB). 

 As a comparison: the average web page is ~1.5MB in June 2013 according to the [http-archive](http://httparchive.org/interesting.php?a=All&l=Jul%201%202013).

@@ -22,11 +22,11 @@ The effective load a visitor experiences is lower still since most pages don't u

 MathJax is highly modular even within a single input or output option. MathJax will only load those components which are actually needed for the mathematical content found on a page. 

-For example, if MathJax is configured to render TeX input to HTML output, it won't load the components needed for certain LaTeX packages unless there's content in the page using them. Similarly, it will only load those webfonts files containing the characters actually needed. 
+For example, if MathJax is configured to render TeX input to HTML output, it won't load the components needed for certain LaTeX packages unless there's content in the page using them. Similarly, it will only load those webfont files containing the characters actually needed. 

-The same principle applies to multiple input options: if e.g. the configuration allows both MathML and TeX input, but the page only contains MathML, then no TeX components will be loaded.
+The same principle applies to multiple input options: if e.g. the configuration allows both MathML and TeX input, but the page only contains MathML, then the main TeX processing code will not be loaded (only a small configuration file that would allow TeX to be loaded had it appeared on the page).

-We do not have specific data, but we estimate that the effective size is 500kb -- 1MB (uncompressed).
+We do not have specific data, but we estimate that the effective size is 500KB (or 1MB uncompressed).

 We need to balance the benefit of modularizing with the number of network connections. 

@@ -36,15 +36,15 @@ But this balance must be revisited regularly and more options could help.

 ## Caching

-In addition to the size of the MathJax components, caching improves performance after the first load. 
+In addition to the size of the MathJax components, browser caching improves performance after the first load. 

 Once any MathJax components have downloaded, they will remain in the browser cache for a specific time (usually 1 week) so a visitor will usually only download them on the visit to the very first page using MathJax and skip this particular performance drain in later visits. 

-While browser caching is separated per domain (for security reasons), the MathJax CDN let's page authors benefit from each other: if a user visits one site using the CDN, than any other site using the CDN will benefit from the MathJax components already cached while during the visit to the first site.
+While browsers separate their caching per domain (for security reasons), the MathJax CDN lets page authors benefit from each other: if a user visits one site using the CDN, then any other site using the CDN will benefit from the MathJax components already cached during the visit to the first site.

-While not adding to performance on an initial visit, caching improves speed on any future visit. 
+While not helping performance on an initial visit to a site, caching improves speed on any future visit. 

-Alternative and additional caching methods could expand and optimize this performance benefit. 
+Alternative and additional caching methods (implemented as part of MathJax) could expand and optimize this performance benefit. 

 ## Optimizing download of components via configuration files

@@ -54,7 +54,7 @@ On the one end of the spectrum, we provide combined configuration files which co

 As the name suggests, combined configuration files combine various components into one large file. This allows page authors to specify the components they want to load up front as one big file rather than many parallel files later, speeding up processing. For example, the `TeX-AMS_HTML` configuration file loads the TeX-input with its AMS-math extensions as well as configuring the HTML-output.

-On the other end of the spectrum, a page author who wants everything to load asynchronously can use extremely light configurations which leave it to MathJax to queue the download of its components. This is often good for community sites that have pages with  math.
+On the other end of the spectrum, a page author who wants everything to load asynchronously can use extremely light configurations which leave it to MathJax to queue the download of its components. This is often good for community sites that have pages with math, but also pages without it.

 Many sites do not configure MathJax efficiently. We could provide tools to analyze configurations and create more options.

@@ -72,29 +72,31 @@ An Input-Jax will process the input into MathJax's internal format (which is ess

 This process is already very fast. While it could theoretically benefit from parallelization (e.g. via webworker), the benefits will only be noticeable in pages with a very large amount of mathematical content or extremely large equations (e.g. we've seen a 80,000 line MathML equation a while ago). Other bottlenecks are much more critical.

-Since the input processors are modular, network latency can create delays as components are loaded as they are needed. This is the core problem of balancing modularity vs network activity and needs to be revisited as network speed and processing power develop.
+Since the input processors are modular, network latency can create delays as components are loaded as they are needed. This is the core problem of balancing modularity vs network activity and needs to be revisited as network speed and processing power develop.  We also need to develop more quantitative tools to make it easier to analyze the trade-offs.
+
+Because network connections for different users vary (e.g., mobile users have much slower connections, in general), there is no "one size fits all" solution to this problem. The settings that work best for a user with a desktop computer on a high-speed network may not be the best ones for a tablet user on a wi-fi network.

 ### Output-processing.

-The third part of MathJax processing is the generation of its output which currently comes in two ways: HTML-CSS or SVG.
+The third part of MathJax processing is the generation of its output which currently comes in one of two forms: HTML-CSS or SVG.

 The output generation is the second performance bottleneck of MathJax.

-The key problem with the MathJax output lies in the problem that math layout is a bottom-up process while HTML-CSS is a top-down process. CSS layout algorithm determines the width of a parent element and then descends to its children to determine their widths and later on determines the heights. This limits the quality of output one can gain with current HTML methods.
+The key problem with the MathJax output lies in the fact that math layout is a bottom-up process while HTML-CSS display is a top-down process. The CSS layout algorithm determines the width of a parent element and then descends to its children to determine their widths and later on determines the heights. This limits the speed of output one can gain with current HTML methods.

 MathJax essentially implements the Knuth-Plass algorithm, which goes bottom-up, determining the widths and heights of the children before determining the width and height of a parent.

-This is the core problem: top-down vs bottom up.
+This is the core problem: top-down vs bottom-up.

-However, SVG is often ~25% faster than HTML which is due to an additional problem with HTML layout. While the SVG output can reliably calculate relative sizes within an equation internally, HTML/CSS runs into browser deficiencies that force it to layout the content -- a performance drain as browsers are not designed to layout content repeatedly.
+However, SVG is often ~25% faster than HTML, which is due to an additional problem with HTML layout. While the SVG output can reliably calculate relative sizes within an equation internally, HTML/CSS must ask the browser to determine these for it, requiring the browser to do a complete top-down layout before it can report the width and height of a subexpression.  This is a performance drain, as browsers are not designed to lay out content repeatedly.

 First, browsers do not reliably allow the calculation of width -- simply put, the sum of the width of characters is not the width of the string as it's laid out by the browser. To get around this, MathJax has to measure the substrings/subequations by laying them out and asking the browser to measure them. This problem naturally occurs recursively and shows dramatically in complex equations.

 Next, browsers do not provide javascript access to all font metrics (let alone modern features like OpenMath tables). That's why MathJax need to provide the metrics separately, which is the reason why MathJax only supports a handful of fonts. 

-While width can be measured correctly as mentioned above, height cannot be measured correctly since browsers provide only the font height/depth (the maximal height/depth of *any* character in the font). MathJax has to compensate for these incorrect measurements.
+While widths can be measured correctly as mentioned above, heights cannot be measured accurately since browsers provide only the font height/depth (the maximal height/depth of *any* character in the font). Since this is the same for every character in the font, MathJax has to compensate for these incorrect measurements itself.

-Preliminary tests have shown that deactivating these measurements will speed up the HTML output to the level of the SVG output. However, this will currently come at a loss of rendering quality (although the preliminary tests have shown that modern browsers do a much better job). We can work with browser vendors to improve things on their end, e.g. the Chrome team seems interested in this; the necessary browser improvements could increase typesetting quality in browsers in general.
+Preliminary tests have shown that deactivating these measurements will speed up the HTML output to the level of the SVG output. However, this will currently come at a loss of rendering quality (although the preliminary tests have shown that modern browsers do a much better job than those in place when MathJax was initially conceived). We can work with browser vendors to improve things on their end, e.g. the Chrome team seems interested in this; the necessary browser improvements could increase typesetting quality in browsers in general.


 ## Ways forward
@@ -118,7 +120,8 @@ We should investigate how to optimize this. Some ideas are
 * lazy pre-loading 
  * Creating an option for MathJax components to download in the background after a page has finished. This would improve performance on subsequent pages and dynamically created content.
    * in particular webfonts could be loaded separately
-
+* perform component loading in parallel
+  * Currently, if an expression loads an extension, processing waits until that extension is loaded; instead, MathJax could continue to process other equations while the needed component is being delivered.

 #### Optimizing the current output algorithm

@@ -140,24 +143,22 @@ Both the latency and performance issues are especially a perception problem. Eve
 By tweaking the way content appears on the page, we could reduce the impression.

 * multi-pass layout
-  * We can add a first "quick&dirty" rendering and then re-render until full TeX-quality is achieved.
+  * We can add a first "quick & dirty" rendering and then re-render until full TeX-quality is achieved.
  * We can 
 * rendering small equations before large ones
-  * Due to the recursive nature of our output, complex equations take much longer. In combination with equation-chunking (the number of equations MathJax will reveal on a page at once), this can lead to negative perceived performance. For example, a page rarely starts with a highly complex equation but usually has a number of small inline equations before a complicated one shows up. However, the chunking prevents those small ones to show up until the large ones are typeset. An size-oriented chunking could reduce this problem.
+  * Due to the recursive nature of our output, complex equations take much longer. In combination with equation-chunking (the number of equations MathJax will reveal on a page at once), this can lead to negative perceived performance. For example, a page rarely starts with a highly complex equation but usually has a number of small inline equations before a complicated one shows up. However, the chunking prevents those small ones from showing up until the large ones are typeset. A size-oriented chunking could reduce this problem.
 * local storage
  * Local storage could save rendered output and MathJax wouldn't have to re-typeset while a user browses back and forth.

-
 #### Improving browser infrastructure 

 We can try to work with browser vendors to improve the browser behavior.

-* Enabling better webfont APIs (to reduce our hacks to detect webfonts arrival)
+* Enabling better webfont APIs (e.g., to reduce our hacks to detect the arrival of webfonts)
 * remove the width-measuring problems
-* allow javascript to access font metrics, openmath tables to become font agnostic
+* allow javascript to access font metrics, OpenType MATH tables, etc., to become font agnostic
 * improve a new layout algorithm that is HTML-focused

-
 The advantage would be that MathJax could help move browser vendors to enable better typesetting tools in general. This would be a big step forward in general. 


@@ -180,10 +181,10 @@ A simple test indicates that the output rendering speed varies greatly across pl
 For example, on a 2011 macbook pro, rendering (no downloads, everything cached) of https://en.wikipedia.org/wiki/Matrix_multiplication

 * Chrome: html: ~2500ms, svg:~1850ms
-* Safari: html: 1450ms, svg:~1000ms, mathml: ~300ms
+* Safari: html: ~1450ms, svg:~1000ms, mathml: ~300ms
 * Firefox: html:  ~3300ms, svg:~2400ms, mathml:~880ms

-Disabling the measurements that HTML needs but SVG doesn't, brings HTML output up to SVG speed. (But comes at the cost of rendering quality.)
+Disabling the measurements that HTML needs but SVG doesn't brings HTML output up to SVG speed (but comes at the cost of rendering quality).

 ### Notes