### Project Coin: Literal Grammar Hackery

#### By darcy on Jul 16, 2009

**Correction:** External to this blog, it has been pointed out to me that the original grammar disallowed two-digit numbers, which is unintended. The fix is to make the *DigitsAndUnderscores* component in *Digit DigitsAndUnderscores Digit* optional, as done in the corrected grammar below.

Circling back to look at some unresolved technical details of the underscores in numbers proposal, I wrote up a combined grammar to allow binary literals as well as underscores as separators *between* digits. That is, underscores cannot appear as the first or last character in a sequence of digits.
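To make the placement rule concrete, here is a small sketch of literals the proposal is meant to accept. It assumes a compiler that understands the proposed syntax (Java 7 and later do); the class and constant names are made up for illustration.

```java
// Illustrative only: requires a compiler that accepts underscores in
// numeric literals and binary literals (Java 7 or later).
class LiteralExamples {
    static final int MILLION = 1_000_000;           // underscores between decimal digits
    static final int MASK = 0b1010_1010;            // binary literal with a separator (170)
    static final int HEX = 0xFF_FF;                 // hex digits can be separated too (65535)
    static final long CARD = 1234_5678_9012_3456L;  // any grouping is fine

    // Rejected by the "between digits" rule: an underscore as the first or
    // last character of a digit sequence, such as 1000_ or 0b_1010.
    // (Something like _1000 is simply an identifier, not a literal.)

    public static void main(String[] args) {
        System.out.println(MILLION);
        System.out.println(MASK);
        System.out.println(HEX);
        System.out.println(CARD);
    }
}
```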

The basic grammar change is to convert the definition of *Digits* (in any base) from the simple left recursive list of digits found in JLSv3, like

Digits:

- Digit
- Digits Digit

to a list where underscores can appear between digits but the list must start and end with a digit:

*Digits:*

- *Digit*
- *Digit DigitsAndUnderscores*_{opt} *Digit*

*DigitsAndUnderscores:*

- *DigitOrUnderscore*
- *DigitsAndUnderscores DigitOrUnderscore*

This grammar is unambiguous, but as written it requires a lookahead of more than 1 because the recursion is in the middle of the *Digits* production. I have not attempted any of the usual grammar refactorings to restore a lookahead of 1, since in practice purging the underscores will be implemented by a small amount of additional logic in the scanner rather than in the actual parsing machinery.
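As a rough idea of what that extra scanner logic might look like, the sketch below validates a run of digits and strips the separators in one pass. It is not taken from javac; `UnderscoreScanner` and `purgeUnderscores` are hypothetical names, and the method assumes the scanner has already isolated the digit sequence (without any `0x` or `0b` prefix or type suffix).

```java
class UnderscoreScanner {
    /**
     * Hypothetical helper: returns the digits of text with underscores removed.
     * Throws if the sequence is empty, starts or ends with an underscore,
     * or contains a character that is not an ASCII digit in the given radix.
     */
    static String purgeUnderscores(String text, int radix) {
        if (text.isEmpty()) {
            throw new NumberFormatException("empty digit sequence");
        }
        if (text.charAt(0) == '_' || text.charAt(text.length() - 1) == '_') {
            throw new NumberFormatException("underscore must be between digits: " + text);
        }
        StringBuilder digits = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                continue; // separator between digits: drop it
            }
            // Character.digit also accepts non-ASCII digits, so exclude those first.
            if (c > 0x7F || Character.digit(c, radix) < 0) {
                throw new NumberFormatException("not a base-" + radix + " digit: " + c);
            }
            digits.append(c);
        }
        return digits.toString();
    }
}
```

Because the first and last characters are checked up front, every underscore that survives to the loop is guaranteed to sit between two digits or other underscores, which is the constraint the *Digits* production expresses.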

The existing rules for distinguishing decimal and octal literals cause minor grammar complications to accommodate underscores immediately after the first digit. Octal numbers must start with a leading zero digit and nonzero decimal numbers must start with a nonzero digit, requirements reflected in rules like *NonZeroDigit Digits*_{opt}. To allow underscores after the first digit, a new rule requiring at least one underscore is added, such as *NonZeroDigit Underscores Digits*.

The structure of binary literals is straightforward and entirely analogous to hexadecimal ones. Changing the digit-level productions automatically allows underscores in floating-point literals without the need to explicitly update the rules for those literals.
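The decimal and octal rules can be cross-checked with a pair of regular expressions. This is only a sketch: the patterns are my own transcription of the *DecimalNumeral* and *OctalNumeral* productions, the class and method names are hypothetical, and type suffixes are deliberately left out.

```java
import java.util.regex.Pattern;

class NumeralKinds {
    // Digits: Digit | Digit DigitsAndUnderscores_opt Digit
    private static final String DIGITS = "[0-9](?:[0-9_]*[0-9])?";
    // OctalDigits: same shape, restricted to 0-7
    private static final String OCTAL_DIGITS = "[0-7](?:[0-7_]*[0-7])?";

    // DecimalNumeral: 0 | NonZeroDigit Digits_opt | NonZeroDigit Underscores Digits
    // The "_*" covers both the Digits_opt and the Underscores Digits alternatives.
    static final Pattern DECIMAL = Pattern.compile("0|[1-9](?:_*" + DIGITS + ")?");

    // OctalNumeral: 0 OctalDigits | 0 Underscores OctalDigits
    static final Pattern OCTAL = Pattern.compile("0_*" + OCTAL_DIGITS);

    static boolean isDecimal(String s) {
        return DECIMAL.matcher(s).matches();
    }

    static boolean isOctal(String s) {
        return OCTAL.matcher(s).matches();
    }
}
```

With these patterns, `1_000` and `1__0` are decimal, `0_12` is octal, and `1_`, `0_`, and `08` match neither.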

Productions in blue below are additional or changed productions to existing non-terminals; the other non-terminals below are newly introduced to support the enhanced literal syntax.

IntegerLiteral:

- DecimalIntegerLiteral
- HexIntegerLiteral
- OctalIntegerLiteral
- BinaryIntegerLiteral

BinaryIntegerLiteral:

- BinaryNumeral IntegerTypeSuffix_{opt}

BinaryNumeral:

- `0` `b` BinaryDigits
- `0` `B` BinaryDigits

DecimalNumeral:

- `0`
- NonZeroDigit Digits_{opt}
- NonZeroDigit Underscores Digits

Underscores:

- `_`
- Underscores `_`

Digits:

- Digit
- Digit DigitsAndUnderscores_{opt} Digit

DigitsAndUnderscores:

- DigitOrUnderscore
- DigitsAndUnderscores DigitOrUnderscore

DigitOrUnderscore:

- Digit
- `_`

HexDigits:

- HexDigit
- HexDigit HexDigitsAndUnderscores_{opt} HexDigit

HexDigitsAndUnderscores:

- HexDigitOrUnderscore
- HexDigitsAndUnderscores HexDigitOrUnderscore

HexDigitOrUnderscore:

- HexDigit
- `_`

OctalNumeral:

- `0` OctalDigits
- `0` Underscores OctalDigits

OctalDigits:

- OctalDigit
- OctalDigit OctalDigitsAndUnderscores_{opt} OctalDigit

OctalDigitsAndUnderscores:

- OctalDigitOrUnderscore
- OctalDigitsAndUnderscores OctalDigitOrUnderscore

OctalDigitOrUnderscore:

- OctalDigit
- `_`

BinaryDigits:

- BinaryDigit
- BinaryDigit BinaryDigitsAndUnderscores_{opt} BinaryDigit

BinaryDigitsAndUnderscores:

- BinaryDigitOrUnderscore
- BinaryDigitsAndUnderscores BinaryDigitOrUnderscore

BinaryDigitOrUnderscore:

- BinaryDigit
- `_`

BinaryDigit: one of

- `0` `1`
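For the binary productions, a compact way to try out the rules is a regular expression plus a radix-2 parse. The sketch below is an assumption-laden illustration rather than compiler code: `BinaryLiteralSketch` and `valueOf` are hypothetical names, the pattern is my transcription of *BinaryIntegerLiteral*, and it ignores the `int` versus `long` range checks a real compiler performs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BinaryLiteralSketch {
    // BinaryIntegerLiteral: BinaryNumeral IntegerTypeSuffix_opt
    // BinaryNumeral:        0 b BinaryDigits | 0 B BinaryDigits
    // BinaryDigits:         BinaryDigit | BinaryDigit BinaryDigitsAndUnderscores_opt BinaryDigit
    private static final Pattern BINARY =
        Pattern.compile("0[bB]([01](?:[01_]*[01])?)[lL]?");

    /** Returns the numeric value of a binary literal, or throws if it is malformed. */
    static long valueOf(String literal) {
        Matcher m = BINARY.matcher(literal);
        if (!m.matches()) {
            throw new NumberFormatException("not a binary literal: " + literal);
        }
        // The grammar has already guaranteed the underscores are well placed,
        // so they can simply be dropped before conversion.
        String digits = m.group(1).replace("_", "");
        // Simplification: Long.parseLong rejects values that need the sign bit,
        // so a 64-digit literal with its top bit set is not handled here.
        return Long.parseLong(digits, 2);
    }
}
```

For example, `valueOf("0b1010_1010")` yields 170, while `0b_1`, `0b1_`, and `0b12` are rejected.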

Can underscores appear next to the decimal point in a floating-point literal? I recall there was an example which did that in the proposal.

Posted by Celtic Minstrel on July 17, 2009 at 01:37 PM PDT

@Celtic,

No, underscores next to the decimal point or next to the float type suffix (trailing "f" or "d") are not allowed by the grammar as written.

Posted by Joe Darcy on July 18, 2009 at 10:50 AM PDT