Frédéric Wang · 71bffb1b
--- a/Fuzz-testing.md
+++ b/Fuzz-testing.md
+# Fuzz Testing
+## Overview
+From Wikipedia: "Fuzz testing or fuzzing is a software testing technique, often
+automated or semi-automated, that involves providing invalid, unexpected, or
+random data to the inputs of a computer program. The program is then monitored
+for exceptions such as crashes, or failing built-in code assertions or for
+finding potential memory leaks. Fuzzing is commonly used to test for security
+problems in software or computer systems."
+Current status: Our test suite contains various unit tests for
+MathJax's public API, LaTeX to MathML conversions, the configuration options,
+the JavaScript MathML rendering engine, etc. The idea is to have a minimal
+test case to verify a specific feature (e.g. one configuration option or one
+LaTeX command). Non-regression tests are also created from reduced test cases
+for issues entered in our tracker. This lets us automate the
+verification of the fix on all platforms and ensure that the issues won't
+happen again in the future. Using unit tests makes it easy to understand a
+test failure, to choose the appropriate format and tools for testing a feature
+(e.g. JavaScript tests for the MathJax API, reftests for MathML rendering), and
+to avoid failures unrelated to the feature that a test is intended to verify.
+Rationale for fuzz testing: while the current approach is good for test
+debugging and maintenance, it also has some shortcomings: only simple pages
+are tested and we rely exclusively on user feedback to discover more involved
+bugs with complex markup, sophisticated configuration, etc.
+Bugs triggered by even slightly more complex markup may not be detected by our
+framework. A concrete example is
+[issue 294](https://github.com/mathjax/MathJax/issues/294): a unit test to
+reproduce this bug needed to make MathJax compute the space between a
+&lt;mmultiscripts&gt; element and another element, but our test suite only
+tested at best a single &lt;mmultiscripts&gt; on a page.
+Of course, it's not possible to test all possible inputs, but using large
+random pages can still be very helpful to expose this kind of problem.
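
To make the idea of large random pages more concrete, here is a minimal Python sketch that recursively groups small MathML fragments into bigger ones. Everything in it is an assumption made for illustration: the `TOKENS` list, the function names `random_fragment` and `random_page`, and the set of container elements are hypothetical and not part of the existing test suite. The `<script>` tags that load MathJax and the test suite header are left out because their paths depend on the setup.

```python
import random

# Hypothetical starting fragments; a real fuzzer would draw these from
# grammar tokens and from existing unit tests.
TOKENS = ["<mi>x</mi>", "<mn>2</mn>", "<mo>+</mo>", "<mtext>a</mtext>"]


def random_fragment(rng, depth):
    """Recursively build a MathML fragment by grouping smaller fragments."""
    if depth <= 0 or rng.random() < 0.3:
        return rng.choice(TOKENS)
    kind = rng.choice(["mrow", "mfrac", "mmultiscripts"])
    if kind == "mrow":
        count = rng.randint(2, 4)
    elif kind == "mfrac":
        count = 2  # numerator and denominator
    else:
        # One base followed by one or two (subscript, superscript) pairs.
        count = 1 + 2 * rng.randint(1, 2)
    children = "".join(random_fragment(rng, depth - 1) for _ in range(count))
    return "<%s>%s</%s>" % (kind, children, kind)


def random_page(seed, formulas=20, depth=4):
    """Return an HTML page containing `formulas` random MathML formulas."""
    # A seeded generator means a failing page can be regenerated from its seed.
    rng = random.Random(seed)
    body = "\n".join(
        "<p><math>%s</math></p>" % random_fragment(rng, depth)
        for _ in range(formulas)
    )
    return "<!DOCTYPE html>\n<html><body>\n%s\n</body></html>\n" % body
```

Because several container elements can end up next to each other or nested, a page produced this way can place an &lt;mmultiscripts&gt; beside other elements, which is the kind of situation the single-element unit tests did not cover.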
+## Basic ideas
+* Randomly generate large test pages to check for:
+  * browser/plugin crashes
+  * MathJax crashes (JavaScript errors)
+  * [Math Processing Error]
+  * hangs
+* A test page will contain:
+  * &lt;script&gt; tags to load MathJax and the testsuite header.
+  * Some configuration options
+  * Several LaTeX/MathML/AsciiMath fragments in various locations
+  * JavaScript code to add/remove/move/modify nodes and attributes
+  * Some MathJax API calls, especially those asking to reprocess/rerender the
+    page or some parts of it, change the output mode, etc.
+  * Possibly other Web languages not parsed by MathJax, such as
+    HTML/SVG/CSS.
+  * Possibly some UI actions when this is implemented via Selenium 2.
+* Use our current testing infrastructure:
+  * Create a reftest manifest for the generated pages and mark them as "load"
+    tests.
+  * Run the tests in as many browsers as possible.
+  * The configuration may be randomly set in the page itself.
+* Two interesting cases to consider:
+  * Pages following some kind of grammar rules (valid tests): check that
+    MathJax works correctly in standard situations.
+  * Pages slightly violating the rules (almost valid tests): check that MathJax
+    handles edge cases gracefully.
+* How pages are constructed:
+  * Use small fragments as starting points
+  * Recursively create big pieces of code by grouping together smaller
+    fragments. You can try to follow some grammar rules.
+  * Use JavaScript to add mutation rules for the DOM, either before MathJax
+    starts (use delayStartupUntil) or after it has started
+    (use e.g. MathJax.Hub.Typeset)
+  * Add random configuration options. Some of them may be mandatory to make the
+    page valid (e.g. extensions for a LaTeX command used in the page).
+  * Add random MathJax API calls, simulated UI interactions, etc.
+* How starting points are obtained:
+  * Use grammar tokens (MathML tokens, LaTeX variables, etc.)
+  * Use DOM/AsciiMath/JavaScript fragments from known unit tests
+    (for example our own test suite or Mozilla's reftests/crashtests)
+* Additional processing:
+  * Record the fuzz actions to reproduce the bug. I plan to encode the UI
+    actions in the page itself, so saving the page alone should be enough.
+  * Add "ignore" rules so that known bugs are not found again, and remove them
+    once the bug is fixed.
+  * Reduce fuzz test cases via a divide and conquer algorithm and save them in
+    our crashtests/ unit tests.
+  * Maintain and improve the list of starting fragments and generation rules.
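
The divide and conquer reduction mentioned in the list above could look roughly like the following Python sketch. It is a simplified illustration in the spirit of chunk-removal reducers, not the exact Lithium algorithm, and the names `reduce_testcase` and `still_fails` are hypothetical. In a real run, `still_fails` would write the candidate lines to a page, load it in a browser, and report whether the crash, hang, or error still occurs.

```python
def reduce_testcase(lines, still_fails):
    """Shrink `lines` while `still_fails(lines)` keeps returning True.

    Tries to delete chunks of decreasing size (half the file, then a
    quarter, and so on down to single lines).
    """
    if not still_fails(lines):
        raise ValueError("the initial test case must fail")
    chunk = max(len(lines) // 2, 1)
    while chunk >= 1:
        start = 0
        while start < len(lines):
            candidate = lines[:start] + lines[start + chunk:]
            if still_fails(candidate):
                # The chunk was not needed: drop it and retry at the
                # same position, which now holds the following lines.
                lines = candidate
            else:
                # The chunk is needed to reproduce the bug: keep it.
                start += chunk
        chunk //= 2
    return lines
```

With a stand-in predicate such as `lambda ls: "line 3" in ls and "line 11" in ls`, the function keeps only the lines needed to trigger the failure. The result is minimal with respect to single-line removals only, so a second pass or a smarter strategy may still shrink some test cases further.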
+## Issues to consider
+* Fuzz testing requires generating large test cases many times, but MathJax is
+  slow to render large pages. Currently, the [Torture Tests](https://github.com/fred-wang/MathJax-test/tree/master/testsuite/MathMLToDisplay/TortureTests/Size)
+  from the MathML test suite are skipped. We will have to use powerful machines,
+  increase the Selenium timeout and perhaps run the fuzzer for a long time or regularly.
+* Because fuzz testing is often used for security purposes, it seems that the
+  source code repositories of fuzzing tools are often not public, to prevent
+  people from finding security flaws. What will be our policy? Detection of
+  "[Math Processing Error]" is not too serious but crashes in browsers or
+  MathPlayer should probably be kept confidential.
+## References
+* [Fuzz testing](https://en.wikipedia.org/wiki/Fuzz_testing) (Wikipedia)
+* [Fuzzing or how to help computers cope with the unexpected](http://cdn.ttgtmedia.com/searchSecurityUK/downloads/RHUL_Fuzzing_final.pdf)
+* [Jesse Ruderman's posts about Fuzzing](http://www.squarefree.com/categories/fuzzing/)
+* [Fuzzing At Mozilla](http://www.squarefree.com/fuzzing2010/fuzzing2010.xhtml)
+* [Analysis of Lithium's algorithm](http://www.squarefree.com/lithium/algorithm.html)
+* [Bugzilla's Metabugs for fuzz-testing tools](https://bugzilla.mozilla.org/show_bug.cgi?id=316898)
+* [cross_fuzz](http://lcamtuf.coredump.cx/cross_fuzz/)
+* Private communication with Abhishek Arya (Google) during Chrome 24's MathML
+  testing.