Synopsis

flexfringe is a passive automata learning model with an emphasis on flexibility in merge heuristic and model type. Currently it can learn DFA, PDFA, Mealy Machines, Regression Automata, and ad-hoc models. You can access the respository with all code here.

Building flexfringe

To build flexfringe,  you need to have the libpopt-dev library (libpopt-dev packagein Ubuntu) for argument parsing. Some heuristic functions bring their own dependencies: We provide an implementation of a likelihood-based merge function for probabilistic DFAs. It needs the GNU scientific library (development) package (e.g. the libgsl-dev package in Ubuntu). You can remove the likelihood-based .cpp files from the evaluation directory to build flexfringe without it.

If you want to use the reduction to SAT and automatically invoke the SAT solver, you need to provide the path to the solver binary. flexfringe has been tested with lingeling (which you can get from http://fmv.jku.at/lingeling/ and run its build.sh), but should also work with others such as glucose and cryptominisat.

You can build and compile the flexfringe project by running

$ make clean all

in the main directory to build the executable named flexfringe.

We typically build and use flexfringe on MacOS and Ubuntu. If you encounter any build errors, please email us your problem, and include some details of the system and compiler you’re using.

A Brief Note on the Branches in the Repository

The branch oldstable contains a legacy version of DFASAT we used in several papers and is not recommended to use. The branch development contains the most recent version, but not all versions compile to a fully functioning and running binary (though they should compile). Lastly, the branch multivariate contains an extension by Sicco Verwer that includes transition splitting for inputs.

The master branch contains a version of either development or multivariate where all documented and published functionality should work correctly and reliably.

Using flexfringe

We provide a Python interface and a command line interface. The Python interface usually lags behind the command line version, as it has to be built separately. You can find an IPython notebook and some information about it here.

The following sections focus on the command line version of flexfringe based on the C++ code.

Running flexfringe

Run ./flexfringe –help to get help.

To avoid typing long parameter lists, the start.sh script takes parameters stored in .ini files and builds the correct command line call from it to provide shortcuts for commonly used settings. The parameters for the script are the name of the .ini file and the data file as parameters.

Usage example:

$ ./start.sh mealy-batch.ini data/simple.traces

See the sample .ini files for details of parameters and flags used. The script only creates the command line but defers all checks to flexfringe itself. You can also run the flexfringe binary directly:

$ ./flexfringe --heuristic-name overlap_driven --data-name overlap_data state_count 50 \
--symbol_count 25 --sinkson 1 staminadata/1_training.txt.dat

It is strongly recommended to use the long-form of the parameters over the short form.

The input format for flexfringe is based on the grammar inference community format used in competitions (Abbadingo format):

#sample #alphabetsize
label length s1 s2 ... sn

The first line contains the number of samples and the size of the alphabet. Each subsequent line contains a label, the length, and separated by spaces, a sequence of elements forming the word.

The elements (of the alphabet) can be printable sequences of arbitrary length, with some exceptions for characters encoding different input types. Each sequence element can additionally have an arbitrary payload (e.g. real numbers or elements from an output alphabet) separated by a dash /. For a Mealy machine, the input could look like

1 7 al1/la pfr2/rp lr/none none/lrC pfr2/rp lr/none none/lrC

Another input extension uses the colon : to implement transition guards, although this functionality is not fully tested yet.

Using the right parameters

flexfringe can learn different types of regular automata: deterministic and probabilistic finite state automata, Mealy machines, regression automata, and other ad-hoc defined machines. The two main parameters are heuristic-name and data-name. They control which merge heuristic is used and with it what type of output is generated and come in pairs. A list of corresponding pairs is:

  • evidence_driven and edsm_data (DFA)
  • overlap_driven and overlap_data (DFA)
  • alergia and alergia_data (PDFA)
  • conflict_driven and conflict_data (DFA)
  • mealy and mealy_data (Mealy Machines)
  • mse_error and mse_data (Regression Automata)

Many parameters control which part of the tree is considered for merging. Important parameters are the symbol and state thresholds –-symbol_count and –state_count to ignore parts of the tree with low occurrence frequencies. Similarly, –sinkson activates sinks to aggregate irrelevant states and transitions.

A number of parameters control the way the state-merger works, in partuclar the merge order and search space: –-extend, –finalred, –largestblue, –blueblue, and –-lowerbound. We are currently preparing an extensive study on the influence of these parameters.

Understanding the Output

flexfringe will output some information to STDOUT on the screen and some .dot and .json files to disk. The screen output contains the action type and the evidence score for the action:

start greedy merging
x47 m21 m21 m13 m12 m10 m9 x12 m19
x23 m19 m9 m8 m5 m3 x20 m13 x14
m7 m6 m5 no more possible merges

where x indicates extend operation and m indicates merge operations. The interpretation of the score value depends on the merge heuristic.

Using graphviz’s dot or the show.sh script, the .dot files can be visualized. We are currently updating and unifying the formats across the different heuristics. There is currently no visualization for the json output.

You can visualize the .dot files using the show.sh script or running dot explicitly

$ dot -Tpdf file.dot -o outfile.pdf or $ ./show.sh final.dot

You can find some description and working examples in the ICSME tool paper for flexfringe.

Switching between different modes

flexfringe supports three different modes of operation. The first mode, batch, is also the default mode. It takes and processes a file by reading it, constructing the initial prefix tree, and then running the state-merging process. The second mode, stream, takes a file and reads it in small batches and then runs the state-merging process, interleaving reading and merging. The last mode, interactive, offers a choice of merge pair at each iteration of the red-blue fringe and offers some more commands such as restart/reset, undo, fast-forward, and others. More information about the interactive mode is also available in the ICSME tool paper.

The mode can be chosen with the command line flag –mode batch|inter|stream. The best way to use interactive mode is the script start-interactive.sh which takes the same parameters as start.sh but requires the inotify-tools package and evince installed.

Demos

Here is a short demonstration of flexfringe in the command line:

Is it fast?

flexfringe is built both for extensibility as well as speed of inference. While the core, its red-blue fringe state-merging algorithm is fast, calling complicated merge heuristics can be expensive and slow down the inference considerably. The following graph shows a call profile running on a stamina competition problem containing software traces, using the overlap-driven heuristic. The heuristic itself is fairly inexpensive but needs to check for overlapping transitions of all merge candidates. As you can see, most time and calls are spent in the test_merge method.

Profiling graph of a run of flexfringe on a dataset from stamina.
Profiling graph of a run of the overlap heuristic on data from the Stamina competition. Click to enlarge and see in original size.

Extending flexfringe

To learn your own versions of extended finite state machines using flexfringe, you need to create your own evaluation heuristic consisting of an evaluation_function and evaluation_data class. You can derive objects from the templates provided in the evaluation/ directory. The doxygen documentation provides some further information on the necessary methods and data structures. You can checkout the latest version of flexfringe over at bitbucket. We’re happy to hear about pull requests for new features.

A more detailed guide is under preparation.