Training & Test Sets

Training and test strings for each of the 100 problems can be downloaded from the two tables below. A cell is considered broken if your induction algorithm solves all 5 problems it contains (see the protocol page for details, and the file formats section below).

As an alternative to these tables, you can download the .tar.gz archive that contains all of the training and test sets.

Training sets

                Sparsity of the training sample
Alphabet size   100%                      50%                       25%                       12.5%
2               1 - 2 - 3 - 4 - 5         6 - 7 - 8 - 9 - 10        11 - 12 - 13 - 14 - 15    16 - 17 - 18 - 19 - 20
5               21 - 22 - 23 - 24 - 25    26 - 27 - 28 - 29 - 30    31 - 32 - 33 - 34 - 35    36 - 37 - 38 - 39 - 40
10              41 - 42 - 43 - 44 - 45    46 - 47 - 48 - 49 - 50    51 - 52 - 53 - 54 - 55    56 - 57 - 58 - 59 - 60
20              61 - 62 - 63 - 64 - 65    66 - 67 - 68 - 69 - 70    71 - 72 - 73 - 74 - 75    76 - 77 - 78 - 79 - 80
50              81 - 82 - 83 - 84 - 85    86 - 87 - 88 - 89 - 90    91 - 92 - 93 - 94 - 95    96 - 97 - 98 - 99 - 100

Test sets

                Sparsity of the training sample
Alphabet size   100%                      50%                       25%                       12.5%
2               1 - 2 - 3 - 4 - 5         6 - 7 - 8 - 9 - 10        11 - 12 - 13 - 14 - 15    16 - 17 - 18 - 19 - 20
5               21 - 22 - 23 - 24 - 25    26 - 27 - 28 - 29 - 30    31 - 32 - 33 - 34 - 35    36 - 37 - 38 - 39 - 40
10              41 - 42 - 43 - 44 - 45    46 - 47 - 48 - 49 - 50    51 - 52 - 53 - 54 - 55    56 - 57 - 58 - 59 - 60
20              61 - 62 - 63 - 64 - 65    66 - 67 - 68 - 69 - 70    71 - 72 - 73 - 74 - 75    76 - 77 - 78 - 79 - 80
50              81 - 82 - 83 - 84 - 85    86 - 87 - 88 - 89 - 90    91 - 92 - 93 - 94 - 95    96 - 97 - 98 - 99 - 100
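
The grids above number the problems row by row, with 5 problems per cell, so a problem number can be mapped back to its alphabet size and sparsity with a little arithmetic. The following Python sketch illustrates this. The names cell_of, problems_in_cell and is_cell_broken are hypothetical helpers introduced here, not part of the competition tooling, and the set of solved problems is assumed to come from whatever the protocol page defines as solving a problem.

```python
# Row and column labels, in the order they appear in the grids.
ALPHABET_SIZES = [2, 5, 10, 20, 50]
SPARSITIES = ["100%", "50%", "25%", "12.5%"]
PROBLEMS_PER_CELL = 5
PROBLEMS_PER_ROW = PROBLEMS_PER_CELL * len(SPARSITIES)  # 20


def cell_of(problem):
    """Return (alphabet size, sparsity) for a problem number in 1..100."""
    if not 1 <= problem <= PROBLEMS_PER_ROW * len(ALPHABET_SIZES):
        raise ValueError(f"problem number out of range: {problem}")
    row, rest = divmod(problem - 1, PROBLEMS_PER_ROW)
    col = rest // PROBLEMS_PER_CELL
    return ALPHABET_SIZES[row], SPARSITIES[col]


def problems_in_cell(alphabet_size, sparsity):
    """Return the 5 problem numbers listed in the given cell."""
    row = ALPHABET_SIZES.index(alphabet_size)
    col = SPARSITIES.index(sparsity)
    first = row * PROBLEMS_PER_ROW + col * PROBLEMS_PER_CELL + 1
    return list(range(first, first + PROBLEMS_PER_CELL))


def is_cell_broken(alphabet_size, sparsity, solved):
    """A cell counts as broken only if every problem in it is in `solved`."""
    return all(p in solved for p in problems_in_cell(alphabet_size, sparsity))
```

For example, cell_of(47) looks up the cell for alphabet size 10 and sparsity 50%, which matches the grid entry 46 - 47 - 48 - 49 - 50.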

File formats

The files in the grids above contain input strings, one per line. Training sets contain positive strings (starting with +) and negative strings (starting with -) in any order. Test sets contain test strings only (starting with ?). Symbols are always integer literals and are separated by one space.

For training sets:

+             # the empty positive string (lambda)
-             # the empty negative string
+ 1 23 5 49   # a positive string with 4 symbols
- 2 1         # a negative string with 2 symbols

For test sets:

?             # the empty string, which may also have to be classified
? 21 5 6      # a test string to be classified as well (3 symbols)
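
The format described above is simple enough to read with a few lines of code. Here is a minimal Python sketch, not an official reader. The names parse_line and parse_lines are hypothetical. Two behaviours are assumptions made here rather than stated in the format: anything after a # is dropped, as in the annotated examples, and completely blank lines are skipped.

```python
# Map the leading marker to a label: True = positive, False = negative,
# None = unlabelled test string.
_LABELS = {"+": True, "-": False, "?": None}


def parse_line(line):
    """Parse one line such as '+ 1 23 5 49' into (label, list of symbols)."""
    # Assumption: a trailing '# ...' is an annotation, not part of the string.
    content = line.split("#", 1)[0]
    tokens = content.split()
    if not tokens or tokens[0] not in _LABELS:
        raise ValueError(f"line does not start with +, - or ?: {line!r}")
    # A marker on its own denotes the empty string, giving an empty list.
    return _LABELS[tokens[0]], [int(tok) for tok in tokens[1:]]


def parse_lines(lines):
    """Parse an iterable of lines, skipping lines that are entirely blank."""
    return [parse_line(line) for line in lines if line.strip()]
```

Because parse_lines accepts any iterable of lines, it can be given an open file object directly, for instance one opened on a downloaded training or test file.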