
A passive automata learning tutorial with dfasat

I am very happy to announce that we finally have a nice introduction to our dfasat tool: a short Python notebook tutorial (HTML preview), originally developed for a 2-hour hands-on session at the 3TU BSR winter school.
The notebook walks you through basic usage and parameter setting. It also contains a small task to familiarize you with the effect of different parameter settings. At the moment, dfasat offers about 30 different options; some can be combined, while other combinations have never been tried. The easiest way to follow the introduction is to download the virtual appliance for VirtualBox (3 GB download, password for user winter/sudo: ‘iscoming’). It contains the practical data sets and the Python notebook (ipynb/html). You can also download the files separately, and either clone the dfasat repository or install the dfasat Python package. I personally recommend the virtual appliance: it was tested by 20 students during the session at the winter school. Please contact me for assistance; my email address is included in the notebook.
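If you want a feel for the workflow before opening the notebook, it boils down to something like the sketch below. The flag names here are placeholders of my own, not dfasat's documented options; the notebook lists the real parameters.

```python
import subprocess

# Rough sketch of the dfasat workflow: learn an automaton from a file
# of traces and inspect the result. The flags below are illustrative
# placeholders, not dfasat's documented options.
subprocess.run([
    "./dfasat",
    "traces.dat",            # training words, one per line
    "--heuristic", "1",      # placeholder: which merge heuristic to use
    "--output", "model.dot"  # placeholder: learned automaton as dot
], check=True)
```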


The Performance of PDFA-Learning in the SPiCE Competition

The Sequence PredIction ChallengE (SPiCE), co-located with the International Conference on Grammatical Inference (ICGI) 2016, was won by Chihiro Shibata, who combined LSTM neural networks with strictly piecewise grammars (SP-k, proposed by Heinz et al.), the latter capturing long-term dependencies in the input words. This combination beat the competitors using “pure” LSTM- and CNN-based neural networks. Notably, none of the networks used were very deep (two hidden layers), and deeper networks decreased performance.
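As a quick illustration of the SP-k idea (my own sketch, not Shibata's code): a strictly piecewise grammar accepts or rejects a word based on the length-k subsequences it contains, so a forbidden long-distance pattern is caught no matter how much material separates its symbols.

```python
from itertools import combinations

def subsequences(word, k):
    """All subsequences of word of length exactly k, as tuples."""
    return set(combinations(word, k))

def sp_allows(word, forbidden, k):
    """SP-k membership: accept iff the word contains none of the
    forbidden length-k subsequences."""
    return subsequences(word, k).isdisjoint(forbidden)

# Toy example: forbid an 'a' eventually followed by a 'b', a
# long-range dependency that a low-order n-gram model would miss.
forbidden = {("a", "b")}
print(sp_allows("acccb", forbidden, 2))  # False: 'a' ... 'b' occurs
print(sp_allows("cccba", forbidden, 2))  # True: 'b' before 'a' is fine
```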

The task of the competition was to predict a ranked list of the most likely continuations (a_1, …, a_5) for a given prefix (y_0, …, y_i), learning from a training set of complete words.
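Concretely, any model that assigns a probability P(a | prefix) to the next symbol can produce such a ranking by scoring every symbol in the alphabet and keeping the top five. A minimal sketch (my own illustration; `next_symbol_probs` is an assumed interface, not any competitor's code):

```python
def rank_continuations(next_symbol_probs, prefix, alphabet, top_k=5):
    """Return the top_k most likely next symbols after `prefix`.

    next_symbol_probs(prefix, a) -> P(a | prefix) is assumed to come
    from whatever sequence model was trained (PDFA, LSTM, ...).
    """
    scored = [(a, next_symbol_probs(prefix, a)) for a in alphabet]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [a for a, _ in scored[:top_k]]

# Toy usage with a uniform dummy model:
uniform = lambda prefix, a: 0.25
print(rank_continuations(uniform, (0, 1), alphabet=[0, 1, 2, 3]))
```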

One of my students (competing as team PING) placed 7th, using the dfasat tool. The main goal was to test a Python interface for dfasat (early release here). But what can we take away from placing 7th? Is PDFA learning not competitive for sequence prediction? The answer is a solid jein (German for “yes and no”): by using dfasat, we assumed that all problem sets were generated by a probabilistic deterministic finite-state automaton (PDFA). In practice, most problem sets were generated by HMMs or contained linguistic data, and neither type is necessarily learned well by our PDFA models. The results reflect this, as outlined in the following table. For the HMM problems, we obtain acceptable scores. That is expected: our PDFA models are not quite as expressive as the HMMs used to generate the data, but the gap is not too large. On the linguistic data, we really struggle to obtain reasonable scores (e.g. problem 10).
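The expressiveness gap is easy to see in code: in a PDFA the state after reading a prefix is unique, so the probability of a word follows a single path, whereas an HMM sums over all hidden-state paths (the forward algorithm). A minimal sketch with toy hand-built models (my own illustration, unrelated to the competition data):

```python
def pdfa_prob(word, start, trans):
    """PDFA: deterministic transitions, so a single path gives P(word).
    trans[state][symbol] = (next_state, probability)."""
    state, p = start, 1.0
    for sym in word:
        state, q = trans[state][sym]
        p *= q
    return p

def hmm_prob(word, init, trans, emit):
    """HMM forward algorithm: P(word) sums over all hidden-state paths.
    init[s], trans[s][t], emit[s][symbol] are probabilities."""
    if not word:
        return 1.0
    alpha = {s: init[s] * emit[s].get(word[0], 0.0) for s in init}
    for sym in word[1:]:
        alpha = {
            t: sum(alpha[s] * trans[s][t] for s in alpha)
               * emit[t].get(sym, 0.0)
            for t in init
        }
    return sum(alpha.values())
```

Because the PDFA commits to a single state after every symbol, a distribution that genuinely requires tracking a belief over several hidden states can only be approximated.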

[Table of the performance metrics.]

But problem 9 is a very interesting case: it contains software traces. For this problem type, our PDFA models obtained the second-best score and beat most of the RNN and CNN approaches. I expect that LSTM/RNN approaches can obtain equally good or better scores, but they require a lot more data to learn a model of equal predictive quality. I am planning to analyze the character-level networks used by the competitors (e.g. with the methods used here) to better understand what aspects they managed to learn.

I will add a more detailed description of the problem sets later on.


Success at the RERS challenge 2016 for automata learning

This year, a team from Radboud University and TU Delft used automata learning to compete in the RERS challenge 2016. The challenge provides (generated) source code, and the task is to (dis)prove certain LTL formulas and to analyze which error states are reachable. Information on the challenge can be found here: http://www.rers-challenge.org/2016/. Commonly, learning is not used in this competition; only white-box methods are.

This year, however, automata learning was applied with great success. For the problems where LTL formulas had to be (dis)proven, the team achieved a perfect score; other teams did not manage to obtain as many results here. For the reachability problems, they performed well but did not top the rankings. The team applied state-of-the-art learning algorithms without tweaking or altering them for the challenge.

It is interesting that a black-box technique can achieve such good scores compared to white-box methods. Less information is used, and black-box techniques cannot give 100% guarantees, yet more results are obtained. It seems one can trade confidence for scaling to bigger problems.
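To make that trade-off concrete: once a model has been learned from the black box, properties are checked on the model rather than on the code, so any verdict is only as trustworthy as the learned hypothesis. A minimal sketch of checking error-state reachability on a learned automaton (my own illustration; the team used established learning tools, not this code):

```python
from collections import deque

def reachable_errors(initial, transitions, error_states):
    """BFS over a learned automaton's state graph: which error states
    are reachable from the initial state?
    transitions[state] -> iterable of successor states."""
    seen, queue, found = {initial}, deque([initial]), set()
    while queue:
        state = queue.popleft()
        if state in error_states:
            found.add(state)
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found

# A path found on the model yields a concrete trace that can be
# replayed on the real system to confirm it; failure to find a path
# gives no guarantee, since the learned model may be incomplete.
# Hence: more results, weaker confidence.
```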

More information will follow.


Useful links added

We made a page with some useful resources on automata learning. Of course, any help is welcome: if you know of tools, benchmarks, use cases, or anything related, please contact us.