PDN parsing issues

This section gives an overview of some issues with existing PDN definitions, in particular those that matter when writing a PDN parser.

Game Separator (1)

The * symbol is used both as a game terminator/separator and as a move strength indicator that denotes a forced move. This introduces a nasty ambiguity. For example, the string 1-6* 32-28 can be interpreted as one game containing two moves, or as two games separated by a *. Since * is commonly used in draughts publications to denote forced moves, the preferred solution would be to disallow * as a game separator and to use a different symbol, such as #. However, this would completely destroy backward compatibility. A less intrusive solution is to disallow * as a move strength indicator. Note that there is an alternative available in the form of the $7 numeric annotation glyph. Yet another solution is to require that there be no space between a move and its move strength. A move and its move strength can then be defined as one token.
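The last option can be illustrated with a small tokenizer. This is only a sketch: the token names move and separator, the simplified move pattern, and the tokenize function are assumptions made for the example and are not taken from any PDN definition.

```python
import re

# Hypothetical token definitions. A strength suffix counts as part of the
# move only when it is attached to the move without intervening space.
TOKEN = re.compile(r"""
    (?P<move>\d{1,2}(?:[-x]\d{1,2})+[!?*]*)
  | (?P<separator>\*)
""", re.VERBOSE)

def tokenize(text):
    """Return (kind, text) pairs; characters that match no token are skipped."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(text)]

print(tokenize("1-6* 32-28"))   # one game, two moves
print(tokenize("1-6 * 32-28"))  # two games separated by *
```

Under this rule the attached * in 1-6* belongs to the move, while a * standing on its own is read as a game separator.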

Game Separator (2)

It is common practice to terminate games with their result. In PGN this is no problem, since chess results differ substantially from chess moves. But in draughts, some results, such as 1-0 and 1-1, are very similar to normal draughts moves. This complicates parsing. For example, if the result 1-1 is defined as a token, then parsers may easily get confused by a move like 1-18. Several parsers insist on parsing this as 1-1 followed by an 8. This problem is likely to occur when a move is split up into separate tokens.
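The confusion is easy to reproduce. The sketch below assumes a hypothetical token set (result, number, dash) in which a move is split into separate tokens. The NAIVE pattern tries the result first, and the GUARDED pattern adds a lookahead so that a result may not be followed by another digit. Neither pattern comes from an existing PDN parser, and the list of results is only illustrative.

```python
import re

# Result token tried first, without any boundary check.
NAIVE = re.compile(r"(?P<result>1-1|1-0|0-1)|(?P<number>\d+)|(?P<dash>-)")

# Same tokens, but a result must not be directly followed by a digit.
GUARDED = re.compile(r"(?P<result>(?:1-1|1-0|0-1)(?!\d))|(?P<number>\d+)|(?P<dash>-)")

def tokenize(pattern, text):
    """Return (kind, text) pairs for every token found in text."""
    return [(m.lastgroup, m.group()) for m in pattern.finditer(text)]

print(tokenize(NAIVE, "1-18"))    # the move is split into a result and a number
print(tokenize(GUARDED, "1-18"))  # number, dash, number
```

The guard is a workaround rather than a cure: the overlap between results and moves remains in the grammar.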

Since the result of a game can already be specified in PDN using the Result tag, there is no need to use a game result as a game separator. Having two different ways to specify the result of a game can even be considered bad style. It therefore seems logical to forbid using the result of a game as a game terminator (or separator).

Capture Separator

The squares of a capture are separated using the symbol x, for example in the move 32x23. If one defines a capture as a production

CaptureMove = Square "x" Square

then there can easily be conflicts with identifier tokens. Tokenizers are often greedy, which means that they can insist on parsing x23 as an identifier token, instead of a capture separator x followed by a square 23. Some parsers offer solutions to this type of problem, but not all of them. Note that this problem can be avoided by defining a move as a single token.
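The conflict can be shown with a small longest-match ("maximal munch") tokenizer. The three token kinds and the function tokenize_longest_match are hypothetical and are chosen only to mirror the production above.

```python
import re

# Hypothetical token set for a grammar that splits a capture into
# Square "x" Square and also has identifier tokens.
TOKEN_SPECS = [
    ("identifier", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("square", re.compile(r"\d+")),
    ("capture_separator", re.compile(r"x")),
]

def tokenize_longest_match(text):
    """At each position, pick the token kind with the longest match."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best = None
        for name, pattern in TOKEN_SPECS:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens

print(tokenize_longest_match("32x23"))
```

Because x23 is a longer match than x, the tokenizer yields a square followed by an identifier, and the CaptureMove production can never apply.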

Move token

It is an important question whether a move should be defined as a single token (by means of a regular expression) or as a production consisting of multiple elements. A production has the benefit that the structure of a move is represented more clearly. But, as explained above, a more powerful parser is then needed. If a move is defined as a token, then a simple LL(1) parser is enough to parse PDN.
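Defining a move as a single token does not mean that its structure is lost, since it can be recovered from the token text afterwards. The following is a minimal sketch, assuming a simplified move pattern (plain digits for squares, - for an ordinary move, one or more x for a capture) and a hypothetical helper parse_move.

```python
import re

# Assumed, simplified move token: either "from-to" or a capture chain
# such as "32x23" or "26x17x8".
MOVE = re.compile(r"\d+(?:-\d+|(?:x\d+)+)")

def parse_move(token):
    """Split a move token into (is_capture, list of squares)."""
    if not MOVE.fullmatch(token):
        raise ValueError(f"not a move token: {token!r}")
    is_capture = "x" in token
    squares = [int(s) for s in re.split(r"[-x]", token)]
    return is_capture, squares

print(parse_move("32x23"))
print(parse_move("1-18"))
```

The tokenizer only has to recognize one regular expression, and the separation into squares happens after tokenizing, where it cannot conflict with other tokens.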

Move strength

In draughts publications a move strength can be wrapped in parentheses, as in 31-27(?). Parentheses are also used to define variations in an analysis, for example 1.32-28 18-23 2.38-32 ( 2.37-32? 23-29! ) 12-18. This introduces an ambiguity, but in most parsers this can be resolved by defining a move strength as a single token.
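As a sketch of that resolution, the tokenizer below tries a parenthesized move strength as one token before it considers a bare parenthesis. The token names and patterns are assumptions made for this example. Only ! and ? are accepted as strength symbols here, so that the * ambiguity discussed earlier does not interfere.

```python
import re

# Hypothetical token set. The strength alternative comes first, so "(?)"
# is consumed whole before "(" can be read as the start of a variation.
TOKEN = re.compile(r"""
    (?P<strength>\([!?]+\)|[!?]+)
  | (?P<move_number>\d+\.)
  | (?P<move>\d+(?:-\d+|(?:x\d+)+))
  | (?P<variation_open>\()
  | (?P<variation_close>\))
""", re.VERBOSE)

def tokenize(text):
    """Return (kind, text) pairs; whitespace between tokens is skipped."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(text)]

print(tokenize("31-27(?)"))
print(tokenize("2.38-32 ( 2.37-32? 23-29! ) 12-18"))
```

In the first string (?) becomes a single strength token, while in the second the parentheses are reported as the opening and closing of a variation.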