Competition setup
The competition server has a total of 100 problems; each problem is a sample of positive and negative strings generated by randomly walking a target machine. The problems vary in difficulty according to:
- the size of the alphabet in the target machine
- the sparsity of the sample (the extent to which it covers the behaviour of the target machine).
The 100 problems are divided into cells of 5 problems each, where each cell corresponds to a particular combination of sparsity and alphabet size. The table below shows how the cells are organised. Easier problems (with a smaller alphabet and a larger sample) are towards the upper-left of the table, and the harder problems (larger alphabet and smaller sample) are towards the bottom-right.
| Alphabet size | Sparsity 100% | Sparsity 50% | Sparsity 25% | Sparsity 12.5% |
|:---:|:---:|:---:|:---:|:---:|
| 2 | 1 – 5 | 6 – 10 | 11 – 15 | 16 – 20 |
| 5 | 21 – 25 | 26 – 30 | 31 – 35 | 36 – 40 |
| 10 | 41 – 45 | 46 – 50 | 51 – 55 | 56 – 60 |
| 20 | 61 – 65 | 66 – 70 | 71 – 75 | 76 – 80 |
| 50 | 81 – 85 | 86 – 90 | 91 – 95 | 96 – 100 |
Participation in the competition consists of attempting to solve cells. A cell is considered solved when all 5 of the problems it contains are solved with a BCR score of at least 0.99. Each cell is worth a number of points (from 1 to 4) according to its difficulty, based on the average performance of the Blue-Fringe algorithm as computed with our scoring procedure. The winner will be the first technique to solve the hardest cell among those solved during the competition. The organizers will add additional problems if all cells appear to be solved more than one month before the final deadline.
The target machines are generated using a variant of the Forest-Fire algorithm and present characteristics that are similar to state machines that model the behaviour of software systems:
- Number of states
- State machines in the competition have approximately 50 states. This is somewhat larger than the conventional state machines identified in the Software Engineering literature, but it ensures that any technique submitted to this competition could scale to infer models for reasonably complex software systems. A roughly equal proportion of accepting and rejecting states has been chosen, a feature shared with Abbadingo.
- Larger alphabets
- State machine transitions are labelled by some alphabet symbol. Such a symbol may typically correspond to the input that triggers a transition or to a system function that governs the transition. For simplicity in this competition, alphabet symbols are represented by integer literals. Alphabet sizes in the competition range from 2 to 50 symbols (while previous competitions such as Abbadingo focussed on 2-letter alphabets).
- Degree distribution
- State machines of software systems tend to contain a small proportion of states with a high out-degree while most states have in- and out-degrees of one or two transitions. Some states are sink accepting states, that is, they have an out-degree of 0.
- Deterministic and minimal
- The machines considered in this competition are both deterministic and minimal. A machine is deterministic if, for all states, there is at most one outgoing transition on any alphabet symbol. A machine is minimal if no smaller deterministic machine defines the same regular language.
Learn more about target machine generation…
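Since the target machines are deterministic, candidate machines can be checked for this property mechanically. The sketch below is a minimal illustration (the transition-list representation is hypothetical, not the competition's file format):

```python
# A minimal sketch of checking determinism, assuming a machine is given
# as a list of (source_state, symbol, target_state) transitions.
def is_deterministic(transitions):
    """True if no state has two outgoing transitions on the same symbol."""
    seen = set()
    for source, symbol, _target in transitions:
        if (source, symbol) in seen:
            return False
        seen.add((source, symbol))
    return True

# State 0 branches on two different symbols: deterministic.
t1 = [(0, 0, 1), (0, 1, 2), (1, 0, 2)]
# State 0 has two transitions on symbol 0: nondeterministic.
t2 = [(0, 0, 1), (0, 0, 2)]
```

Checking minimality is more involved (e.g. via Hopcroft's partition-refinement algorithm) and is omitted here.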
Training and test samples
For each problem, disjoint training and test samples have been generated using a dedicated generation procedure whose aim is to simulate the way examples of system behaviour are usually obtained in the software engineering community (a collection of program traces at an implementation level or the generation of scenarios at a design level, for instance). Samples present the following characteristics:
- Generated by the target
- Positive strings have been generated by randomly walking the target machines. Negative strings are generated by randomly perturbing positive strings. Three kinds of edit operation are considered: substituting, inserting, or deleting a symbol.
- Training samples
- Training sets are sampled with different levels of sparsity (100%, 50%, 25%, 12.5%) and usually contain duplicates, as a consequence of the random walk generation from the target machine.
- Test samples
- Each test sample contains 1,500 strings with no duplicates (to avoid favoring repeated strings in the scoring metric). Training and test sets do not intersect.
- Empirically adjusted
- The different parameters of the generation procedure (the sample sizes, in particular) have all been empirically adjusted to ensure good induction results using Blue-Fringe on the simplest problems (alphabet of size 2 with a training sample sparsity of 100%) without breaking the cell.
Learn more about sample generation…
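The perturbation step used to derive negative strings can be sketched as follows. This is an illustrative sketch only: the actual generation procedure may apply additional constraints (for instance, checking that the perturbed string is indeed rejected by the target machine), and it assumes an alphabet of at least two symbols.

```python
import random

def perturb(string, alphabet, rng=random):
    """Apply one random edit operation (substitute, insert, or delete)
    to a string given as a list of integer alphabet symbols.
    Assumes the alphabet has at least two symbols."""
    s = list(string)
    # An empty string only admits insertion.
    op = rng.choice(["substitute", "insert", "delete"] if s else ["insert"])
    if op == "substitute":
        i = rng.randrange(len(s))
        s[i] = rng.choice([a for a in alphabet if a != s[i]])
    elif op == "insert":
        i = rng.randrange(len(s) + 1)
        s.insert(i, rng.choice(alphabet))
    else:  # delete
        del s[rng.randrange(len(s))]
    return s
```

Each edit operation guarantees the result differs from the input: substitution changes a symbol, while insertion and deletion change the length.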
Once the test set for each problem has been downloaded, it is up to the competitor to establish whether each test string is accepted or rejected by their candidate machine for that problem. The competitor should produce a binary sequence of labels where, for each test string, a 1 is added to the sequence if the string is considered to be accepted, and a 0 otherwise:
0100100101101010001011010011100101... (up to the size of the test sample)
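Producing this label sequence amounts to running each test string through the candidate machine. A minimal sketch, assuming the candidate is a DFA stored as a dict (the representation and helper name are hypothetical):

```python
def label_sequence(dfa, initial, accepting, test_strings):
    """Label each test string 1 (accepted) or 0 (rejected) by running it
    through a candidate DFA given as a dict (state, symbol) -> next state.
    A string that hits a missing transition is rejected."""
    labels = []
    for string in test_strings:
        state = initial
        for symbol in string:
            state = dfa.get((state, symbol))
            if state is None:  # undefined transition: reject
                break
        labels.append("1" if state in accepting else "0")
    return "".join(labels)

# Example: a two-state candidate accepting strings with an even number of 1s.
candidate = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
labels = label_sequence(candidate, 0, {0}, [[1, 1], [1], [0, 0], [1, 0, 1]])
# labels == "1011"
```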
The score is computed according to a Balanced Classification Rate (BCR) measure. The submitted binary sequence of labels is compared to the actual labels, and four counts are produced:
TP : True positives
TN : True negatives
FP : False positives
FN : False negatives
These figures are used to compute two numbers, Cplus and Cminus:
Cplus = TP / (TP + FN)
Cminus = TN / (TN + FP)
The BCR value is the harmonic mean of Cplus and Cminus and is defined as follows:
BCR = 2 × Cplus × Cminus / (Cplus + Cminus)
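The scoring procedure can be sketched as a short function over two 0/1 label strings (a simple illustration of the BCR definition, not the competition's actual scoring code):

```python
def bcr(predicted, actual):
    """Balanced Classification Rate: the harmonic mean of Cplus and Cminus,
    computed from two equal-length 0/1 label strings.  Assumes the actual
    labels contain both positives and negatives, as the test samples do."""
    tp = sum(p == "1" and a == "1" for p, a in zip(predicted, actual))
    tn = sum(p == "0" and a == "0" for p, a in zip(predicted, actual))
    fp = sum(p == "1" and a == "0" for p, a in zip(predicted, actual))
    fn = sum(p == "0" and a == "1" for p, a in zip(predicted, actual))
    c_plus = tp / (tp + fn)    # sensitivity
    c_minus = tn / (tn + fp)   # specificity
    if c_plus + c_minus == 0:
        return 0.0
    return 2 * c_plus * c_minus / (c_plus + c_minus)
```

As a harmonic mean, BCR rewards classifiers that perform well on positives and negatives alike; a labeller that answers all 1s (or all 0s) scores 0 regardless of class balance.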
A problem is deemed solved if the BCR value is greater than or equal to 0.99. A cell is deemed solved if all of its 5 problems are solved. Cells in the competition and participant result grids turn green when solved.