Project Coin: Literal Grammar Hackery

Correction: External to this blog, it was been pointed out to me that the original grammar disallowed two-digit numbers, which is unintended. The fix is to make the DigitsAndUnderscores component in Digit DigitsAndUnderscores Digit optional, as done in the corrected grammar below

Circling back to look at some unresolved technical details of the underscores in numbers proposal, I wrote up a combined grammar to allow binary literals as well as underscores as separators between digits. That is, underscores cannot appear as the first or last character in a sequence of digits.

The basic grammar change is to convert the definition of Digits (in any base) from the simple left recursive list of digits found in JLSv3, like

Digits:
Digit
Digits Digit

to a list where underscores can appear between numbers but the list must start and end with a digit:

Digits:
Digit
Digit DigitsAndUnderscoresopt Digit
DigitsAndUnderscores:
DigitOrUnderscore
DigitsAndUnderscores DigitOrUnderscore

This grammar is unambiguous, but as written it requires a look ahead of more than 1 because the recursion is in the middle of the Digits production. I have not attempted any of the usual grammar refactorings to restore a look ahead of 1 since in practice purging the underscores will be implemented by a small amount of additional logic in the scanner as opposed to the actual parsing machinery.

The existing rules for distinguishing decimal and octal literals cause minor grammar complications to accommodate underscores immediately after the first digit. Octal numbers must start with a leading zero digit and nonzero decimal numbers must start with a nonzero digit, requirements reflected in rules like NonZeroDigit Digitsopt. To allow underscores after the first digit, a new rule requiring at least one underscore is added, such as NonZeroDigit Underscores Digits. The structure of binary literals is straightforward and entirely analogous to hexadecimal ones. Changing the digit-level productions automatically allows underscores in floating-point literals without the need to explicitly update the rules for those literals.

Productions in blue below are additional or changed productions to existing non-terminals; the other non-terminals below are newly introduced to support the enhanced literal syntax.

IntegerLiteral:
DecimalIntegerLiteral
HexIntegerLiteral
OctalIntegerLiteral
BinaryIntegerLiteral
BinaryIntegerLiteral:
BinaryNumeral IntegerTypeSuffixopt
BinaryNumeral:
0 b BinaryDigits
0 B BinaryDigits
DecimalNumeral:
0
NonZeroDigit Digitsopt
NonZeroDigit Underscores Digits
Underscores:
_
Underscores _
Digits:
Digit
Digit DigitsAndUnderscoresopt Digit
DigitsAndUnderscores:
DigitOrUnderscore
DigitsAndUnderscores DigitOrUnderscore
DigitOrUnderscore:
Digit
_
HexDigits:
HexDigit
HexDigit HexDigitsAndUnderscoresopt HexDigit
HexDigitsAndUnderscores:
HexDigitOrUnderscore
HexDigitsAndUnderscores HexDigitOrUnderscore
HexDigitOrUnderscore:
HexDigit
_
OctalNumeral:
0 OctalDigits
0 Underscores OctalDigits
OctalDigits:
OctalDigit
OctalDigit OctalDigitsAndUnderscoresopt OctalDigit
OctalDigitsAndUnderscores:
OctalDigitOrUnderscore
OctalDigitsAndUnderscores OctalDigitOrUnderscore
OctalDigitOrUnderscore:
OctalDigit
_
BinaryDigits:
BinaryDigit
BinaryDigit BinaryDigitsAndUnderscoresopt BinaryDigit
BinaryDigitsAndUnderscores:
BinaryDigitOrUnderscore
BinaryDigitsAndUnderscores BinaryDigitOrUnderscore
BinaryDigitOrUnderscore:
BinaryDigit
_
BinaryDigit: one of
0 1
Comments:

Can underscores appear next to the decimal point in a floating-point literal? I recall there was an example which did that in the proposal.

Posted by Celtic Minstrel on July 17, 2009 at 01:37 PM PDT #

@Celtic,

No, underscores next to the decimal point or next to the float type suffix (trailing "f" or "d") are not allowed by the grammar as written.

Posted by Joe Darcy on July 18, 2009 at 10:50 AM PDT #

Post a Comment:
Comments are closed for this entry.
About

darcy

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today
News

No bookmarks in folder

Blogroll