The Performance of PDFA-Learning in the SPiCE Competition

Categories Announcement, spotlight
Table of the performance metrics.

The Sequence PredIction ChallengE (SPICE), organized and co-located with the International Conference on Grammatical Inference (ICGI) 2016,  was won by Chihiro Shibata, who combined LSTM neural networks and strictly piecewise grammars (SP-k, proposed by Heinz et al), the latter capturing long-term dependencies in the input words. The combination beat the competitors using “pure” LSTM- and CNN-based neural networks. Overall, all networks used were not very deep (2 hidden layers), and deeper networks decreased performance.

The task of the competition was to predict a (ranked) list of most likely continuations (a_1, …, a_5) for a given prefix (y_0, .., y_i), based on learning from a training set of complete words.

One of my students (competing as team PING) placed 7th, using the dfasat tool. The main goal was to test a python interface for dfasat (early release here). But what can we  take away from placing 7th? Is PDFA-learning not competitive for sequence prediction? The answer is a solid jein: By using dfasat, we assumed that all problem sets were generated by a deterministic and probabilistic finite state automaton (PDFA). In practice, most problem sets were generated by HMMs or contained linguistic data. Both data types cannot necessarily be learned very well by our PDFA models. The results reflect this, as outlined in the following table. For the HMM problems, we obtain OK-scores. That is expected because our PDFA models are not quite as expressive as the HMMs used to generate the data, but the gap is not too large. On the linguistics data, we really struggle to obtain reasonable scores (e.g. problem 10).

Table of the performance metrics.
Table of the performance metrics.

But problem 9 is a very interesting case: it contains software traces. For this problem type, PDFA models obtained the second best score and beat most of the RNN and CNN approaches. I expect that LSTM/RNN approaches can obtain equally good or better scores, but require a lot more data to learn a model of equal predictive quality. I am planning to analyze the character-level networks (e.g. with methods used here) used by the competitors to understand better what aspects they managed to learn.

I will add a more detailed description of the problem sets later on.

I am a PhD student in the SEDAN group at the Interdisciplinary Centre for Security, Reliability and Trust in Luxembourg.

Leave a Reply