This page describes the competition protocol in detail. It also provides information about the target machines, the training and test samples, and the evaluation of submitted test results.

### Competition set up

The competition server has a total of 100 *problems* – each problem is a sample of positive and negative strings generated by randomly walking a target machine. These problems vary in difficulty according to:

- the __size of the alphabet__ in the target machine
- the __sparsity of the sample__ (the extent to which it covers the behaviour of the target machine).

The 100 problems are divided into *cells* of 5 problems each, where each cell corresponds to a particular combination of sparsity and alphabet size. The table below shows how the cells are organised. Easier problems (with a smaller alphabet and a larger sample) are towards the upper-left of the table, and the harder problems (larger alphabet and smaller sample) are towards the bottom-right.

Problem numbers per cell (columns give the sparsity of the training sample, rows the alphabet size):

| Alphabet size | 100% | 50% | 25% | 12.5% |
|---|---|---|---|---|
| 2 | 1 – 5 | 6 – 10 | 11 – 15 | 16 – 20 |
| 5 | 21 – 25 | 26 – 30 | 31 – 35 | 36 – 40 |
| 10 | 41 – 45 | 46 – 50 | 51 – 55 | 56 – 60 |
| 20 | 61 – 65 | 66 – 70 | 71 – 75 | 76 – 80 |
| 50 | 81 – 85 | 86 – 90 | 91 – 95 | 96 – 100 |

Points per cell (columns give the sparsity of the training sample, rows the alphabet size):

| Alphabet size | 100% | 50% | 25% | 12.5% |
|---|---|---|---|---|
| 2 | 1 | 1 | 3 | 3 |
| 5 | 1 | 2 | 4 | 4 |
| 10 | 1 | 3 | 4 | 4 |
| 20 | 1 | 3 | 4 | 4 |
| 50 | 2 | 3 | 4 | 4 |
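The two tables above can be collapsed into a small lookup. A sketch in Python (the helper name `cell_of` is hypothetical, not part of the competition tooling):

```python
ALPHABET_SIZES = [2, 5, 10, 20, 50]
SPARSITIES = ["100%", "50%", "25%", "12.5%"]
POINTS = [  # points per cell, same row/column order as the tables above
    [1, 1, 3, 3],
    [1, 2, 4, 4],
    [1, 3, 4, 4],
    [1, 3, 4, 4],
    [2, 3, 4, 4],
]

def cell_of(problem):
    """Map a problem number (1-100) to its (alphabet size, sparsity, points)."""
    assert 1 <= problem <= 100
    cell = (problem - 1) // 5          # 20 cells of 5 consecutive problems
    row, col = cell // 4, cell % 4     # 4 sparsity columns per alphabet row
    return ALPHABET_SIZES[row], SPARSITIES[col], POINTS[row][col]
```

For example, problem 37 falls in the 36 – 40 range, so it belongs to the alphabet-size-5, 12.5%-sparsity cell worth 4 points.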

Participation in the competition consists of attempting to *solve* cells. A cell is considered solved if all 5 of the problems it contains are solved with a BCR score of at least 0.99. Each cell is assigned a number of points (from 1 to 4) according to its difficulty, based on the average performance of the Blue-Fringe algorithm, computed with our scoring procedure. __The winner will be the first technique to solve the hardest cell solved during the competition__. The organizers will add further problems if all cells appear to be solved more than one month before the final deadline.

### Machines

The target machines are generated using a variant of the Forest-Fire algorithm and present characteristics that are similar to state machines that model the behaviour of software systems:

- __Number of states__: State machines in the competition have approximately 50 states. Although somewhat larger than the conventional state machines identified in the Software Engineering literature, this size was chosen to ensure that techniques submitted to the competition can scale to infer models of reasonably complex software systems. A roughly __equal proportion of accepting and rejecting states__ has been chosen, a feature shared with Abbadingo.

- __Larger alphabets__: State machine transitions are labelled by an alphabet symbol. Such a symbol may typically correspond to the input that triggers a transition or to a system function that governs the transition. For simplicity in this competition, alphabet symbols are represented by integer literals. Alphabet sizes in the competition range from 2 to 50 symbols (while previous competitions such as Abbadingo focussed on 2-letter alphabets).

- __Degree distribution__: State machines of software systems tend to contain a __small proportion of states with a high out-degree__ while __most states have in- and out-degrees of one or two transitions__. Some states are accepting sink states, that is, they have an out-degree of 0.

- __Deterministic and minimal__: The machines considered in this competition are both __deterministic and minimal__. A machine is deterministic if, for every state, there is at most one outgoing transition on any alphabet symbol. A machine is minimal if no smaller deterministic machine defines the same regular language.
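A machine with these properties can be represented compactly. A minimal Python sketch of a deterministic machine and its membership test (illustrative only; the class name and the dictionary representation are assumptions, not the competition's actual format):

```python
class DFA:
    """A deterministic finite automaton: at most one outgoing
    transition per (state, symbol) pair, as required above."""

    def __init__(self, transitions, accepting, start=0):
        self.transitions = transitions  # {state: {symbol: next_state}}
        self.accepting = accepting      # set of accepting states
        self.start = start

    def accepts(self, string):
        """Return True if the machine accepts the sequence of symbols."""
        state = self.start
        for symbol in string:
            if symbol not in self.transitions.get(state, {}):
                return False            # missing transition: reject
            state = self.transitions[state][symbol]
        return state in self.accepting
```

For instance, a two-state machine over the alphabet {0, 1} accepting strings with an even number of 1s is `DFA({0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}, accepting={0})`.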


### Training and test samples

For each problem, disjoint training and test samples have been generated using a dedicated generation procedure that aims to simulate the way examples of system behaviour are usually obtained in the software engineering community (for instance, a collection of program traces at the implementation level, or the generation of scenarios at the design level). Samples have the following characteristics:

- __Generated by the target__: Positive strings have been generated by __randomly walking the target machines__. Negative strings are generated by __randomly perturbing positive strings__. Three kinds of edit operation are considered: substituting, inserting, or deleting a symbol.

- __Training samples__: Training sets are sampled with __different levels of sparsity__ (100%, 50%, 25%, 12.5%) and usually __contain duplicates__, as a consequence of the random-walk generation from the target machine.

- __Test samples__: Each test sample contains __1,500 strings__ with __no duplicates__ (to avoid favoring repeated strings in the scoring metric). __Training and test sets do not intersect__.

- __Empirically adjusted__: The different parameters of the generation procedure (the sample sizes, in particular) have all been empirically adjusted to ensure good induction results using Blue-Fringe on the simplest problems (alphabet of size 2 with a training-sample sparsity of 100%) without breaking the cell.
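The negative-string generation described above, one random edit applied to a positive string, can be sketched as follows. This is illustrative only: the function name is hypothetical, and note that a perturbed string is not guaranteed to be negative unless it is checked against the target machine, which this sketch omits.

```python
import random

def perturb(string, alphabet, rng=random):
    """Apply one random edit operation (substitute, insert, or delete
    a symbol) to a positive string. Assumes len(alphabet) >= 2, which
    holds for all competition problems."""
    s = list(string)
    ops = ["substitute", "insert", "delete"] if s else ["insert"]
    op = rng.choice(ops)
    if op == "substitute":
        i = rng.randrange(len(s))
        s[i] = rng.choice([a for a in alphabet if a != s[i]])
    elif op == "insert":
        i = rng.randrange(len(s) + 1)
        s.insert(i, rng.choice(alphabet))
    else:  # delete
        del s[rng.randrange(len(s))]
    return s
```

Each call changes the string's length by at most one symbol and only ever introduces symbols drawn from the machine's alphabet.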


### Scoring

Once the test set for each problem has been downloaded, it is up to the competitor to establish whether each test string is accepted or rejected by their candidate machine for that problem. The competitor should produce a binary sequence of labels where, for each test string, a 1 is added to the sequence if the string is considered to be accepted, and a 0 otherwise:

`0100100101101010001011010011100101...`

(up to the size of the test sample)
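Producing this sequence from a candidate machine amounts to one membership test per test string. A minimal sketch, where the `accepts` predicate stands in for whatever membership test the competitor's candidate machine provides (both names are assumptions):

```python
def label_sequence(accepts, test_strings):
    """Build the submission sequence: '1' for each test string the
    candidate machine accepts, '0' for each it rejects."""
    return "".join("1" if accepts(s) else "0" for s in test_strings)
```

For example, with a toy predicate accepting even-length strings, `label_sequence(lambda s: len(s) % 2 == 0, [[1], [1, 2], [1, 2, 3]])` yields `"010"`.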

The score is computed according to a Balanced Classification Rate (*BCR*) measure. The submitted binary sequence of labels is compared to the actual labels, and four sets are produced:

*TP*: True positives

*TN*: True negatives

*FP*: False positives

*FN*: False negatives

These figures are used to compute two numbers, *C⁺* and *C⁻*:

*C⁺* = *TP* / (*TP* + *FN*)

*C⁻* = *TN* / (*TN* + *FP*)

The BCR value is the harmonic mean of *C⁺* and *C⁻* and is defined as follows:

*BCR* = (2 × *C⁺* × *C⁻*) / (*C⁺* + *C⁻*)

A *problem* is deemed to be solved if the *BCR* value is greater than or equal to 0.99. A *cell* is deemed to be solved if all of its 5 problems are solved. Cells in the competition and participant result grids turn green when solved.
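The scoring computation can be reproduced locally before submitting. A sketch of the BCR calculation from two label sequences (it assumes the actual labels contain both positive and negative strings, to avoid division by zero; this holds for the competition's test samples, which mix positive and negative strings):

```python
def bcr(predicted, actual):
    """Balanced Classification Rate: the harmonic mean of the
    true-positive rate (C+) and the true-negative rate (C-).
    Both arguments are sequences of 0/1 labels of equal length."""
    tp = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))
    tn = sum(p == 0 and a == 0 for p, a in zip(predicted, actual))
    fn = sum(p == 0 and a == 1 for p, a in zip(predicted, actual))
    fp = sum(p == 1 and a == 0 for p, a in zip(predicted, actual))
    c_plus = tp / (tp + fn)    # C+ = TP / (TP + FN)
    c_minus = tn / (tn + fp)   # C- = TN / (TN + FP)
    return 2 * c_plus * c_minus / (c_plus + c_minus)
```

A perfect submission scores 1.0; a problem counts as solved once this value reaches 0.99.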