This document is the primary reference for the Rust programming language grammar. It provides only one kind of material: chapters that formally define the language grammar.
This document does not serve as an introduction to the language. Background familiarity with the language is assumed. A separate guide is available to help acquire such background.
This document also does not serve as a reference to the standard library included in the language distribution. Those libraries are documented separately by extracting documentation attributes from their source code. Many of the features that one might expect to be language features are library features in Rust, so what you're looking for may be there, not here.
Rust's grammar is defined over Unicode codepoints, each conventionally denoted U+XXXX, for 4 or more hexadecimal digits X. Most of Rust's grammar is confined to the ASCII range of Unicode, and is described in this document by a dialect of Extended Backus-Naur Form (EBNF), specifically a dialect of EBNF supported by common automated LL(k) parsing tools such as llgen, rather than the dialect given in ISO 14977. The dialect can be defined self-referentially as follows:
grammar : rule + ;
rule : nonterminal ':' productionrule ';' ;
productionrule : production [ '|' production ] * ;
production : term * ;
term : element repeats ;
element : LITERAL | IDENTIFIER | '[' productionrule ']' ;
repeats : [ '*' | '+' ] NUMBER ? | NUMBER ? | '?' ;
Where:
- LITERAL is a single printable ASCII character, or an escaped hexadecimal ASCII code of the form \xQQ, in single quotes, denoting the corresponding Unicode codepoint U+00QQ.
- IDENTIFIER is a nonempty string of ASCII letters and underscores.
- The repeat forms apply to the adjacent element, and are as follows:
  - ? means zero or one repetition
  - * means zero or more repetitions
  - + means one or more repetitions
This EBNF dialect should hopefully be familiar to many readers.
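The repeat forms can be read as repetition bounds on the adjacent element. The sketch below is illustrative only: `Bounds`, `bounds`, and `match_repeated` are hypothetical names, and the treatment of NUMBER (a maximum when it trails a repeat symbol, an exact count on its own) is an assumption inferred from how rules such as `'x' hex_digit 2` and `hex_digit+ 6` are used later in this document.

```rust
/// Repetition bounds for one `term` (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Bounds {
    pub min: usize,
    pub max: Option<usize>, // None means unbounded
}

/// Translate a `repeats` suffix into bounds.
/// Assumption: NUMBER after `*` or `+` is a maximum; NUMBER alone is exact.
pub fn bounds(symbol: Option<char>, number: Option<usize>) -> Option<Bounds> {
    match (symbol, number) {
        (None, None) => Some(Bounds { min: 1, max: Some(1) }),
        (Some('?'), None) => Some(Bounds { min: 0, max: Some(1) }),
        (Some('*'), n) => Some(Bounds { min: 0, max: n }),
        (Some('+'), n) => Some(Bounds { min: 1, max: n }),
        (None, Some(n)) => Some(Bounds { min: n, max: Some(n) }),
        _ => None, // not a form produced by the `repeats` rule
    }
}

/// Greedily match `is_element` at the start of `input` within `b`.
/// Returns the number of bytes consumed, or None if too few elements matched.
pub fn match_repeated<F: Fn(char) -> bool>(input: &str, is_element: F, b: Bounds) -> Option<usize> {
    let mut count = 0;
    let mut consumed = 0;
    for c in input.chars() {
        if b.max.map_or(false, |m| count >= m) || !is_element(c) {
            break;
        }
        count += 1;
        consumed += c.len_utf8();
    }
    if count >= b.min { Some(consumed) } else { None }
}
```

Under these assumptions, `bounds(None, Some(2))` models the `hex_digit 2` term of a `\x` escape, and `bounds(Some('+'), Some(6))` models the `hex_digit+ 6` term of a Unicode escape.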
A few productions in Rust's grammar permit Unicode codepoints outside the ASCII range. We define these productions in terms of character properties specified in the Unicode standard, rather than in terms of ASCII-range codepoints. The section Special Unicode Productions lists these productions.
Some rules in the grammar (notably unary operators, binary operators, and keywords) are given in a simplified form: as a table of unquoted, printable, whitespace-separated strings. These cases form a subset of the rules regarding the token rule, and are assumed to be the result of a lexical-analysis phase feeding the parser, driven by a DFA, operating over the disjunction of all such string table entries.
When such a string enclosed in double-quotes (") occurs inside the grammar, it is an implicit reference to a single member of such a string table production. See tokens for more information.
Rust input is interpreted as a sequence of Unicode codepoints encoded in UTF-8. Most Rust grammar rules are defined in terms of printable ASCII-range codepoints, but a small number are defined in terms of Unicode properties or explicit codepoint lists. [1]
The following productions in the Rust grammar are defined in terms of Unicode properties: ident, non_null, non_eol, non_single_quote and non_double_quote.
The ident production is any nonempty Unicode string of the following form:
- The first character is in one of the following ranges: U+0041 to U+005A ("A" to "Z"), U+0061 to U+007A ("a" to "z"), or U+005F ("_").
- The remaining characters are in the range U+0030 to U+0039 ("0" to "9"), or any of the prior valid initial characters.
as long as the identifier does not occur in the set of keywords.
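The ident rule can be checked mechanically. The following is a minimal sketch, not the compiler's implementation: `is_ident` and `SAMPLE_KEYWORDS` are hypothetical names, and the keyword list holds only a handful of entries from the keyword table for brevity.

```rust
/// A few entries from the keyword table; a real lexer would list all of them.
const SAMPLE_KEYWORDS: &[&str] = &["_", "as", "break", "fn", "let", "match", "self", "while"];

/// U+0041..U+005A, U+0061..U+007A, or U+005F.
fn is_ident_start(c: char) -> bool {
    matches!(c, 'A'..='Z' | 'a'..='z' | '_')
}

/// U+0030..U+0039, or any valid initial character.
fn is_ident_continue(c: char) -> bool {
    is_ident_start(c) || matches!(c, '0'..='9')
}

/// True if `s` matches the ident production as described in this section.
pub fn is_ident(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(first) if is_ident_start(first) => {
            chars.all(is_ident_continue) && !SAMPLE_KEYWORDS.contains(&s)
        }
        _ => false, // empty string, or invalid first character
    }
}
```

With this sketch, `is_ident("foo_1")` holds, while `is_ident("1foo")`, `is_ident("")`, and `is_ident("fn")` do not.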
Some productions are defined by exclusion of particular Unicode characters:
- non_null is any single Unicode character aside from U+0000 (null)
- non_eol is any single Unicode character aside from U+000A ('\n')
- non_single_quote is any single Unicode character aside from U+0027 (')
- non_double_quote is any single Unicode character aside from U+0022 (")
comment : block_comment | line_comment ;
block_comment : "/*" block_comment_body * "*/" ;
block_comment_body : [block_comment | character] * ;
line_comment : "//" non_eol * ;
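Because block_comment_body may itself contain a block_comment, block comments nest, and a scanner has to track depth rather than stop at the first `*/`. The sketch below illustrates that idea; `comment_len` is a hypothetical helper, not part of any real lexer.

```rust
/// If `input` starts with a comment, return its length in bytes.
/// Returns None when there is no comment or a block comment is unterminated.
pub fn comment_len(input: &str) -> Option<usize> {
    let bytes = input.as_bytes();
    if bytes.starts_with(b"//") {
        // line_comment: runs up to, but not including, the next U+000A.
        return Some(input.find('\n').unwrap_or(input.len()));
    }
    if !bytes.starts_with(b"/*") {
        return None;
    }
    // block_comment: track nesting depth. Scanning bytes is safe here
    // because '/' and '*' never occur inside a multi-byte UTF-8 sequence.
    let mut depth = 0usize;
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i..].starts_with(b"/*") {
            depth += 1;
            i += 2;
        } else if bytes[i..].starts_with(b"*/") {
            depth -= 1;
            i += 2;
            if depth == 0 {
                return Some(i);
            }
        } else {
            i += 1;
        }
    }
    None // ran out of input before the outermost comment closed
}
```

For example, on the input `/* a /* b */ c */ rest` the whole nested comment is consumed, not just the part up to the first `*/`.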
FIXME: add doc grammar?
whitespace_char : '\x20' | '\x09' | '\x0a' | '\x0d' ;
whitespace : [ whitespace_char | comment ] + ;
simple_token : keyword | unop | binop ;
token : simple_token | ident | literal | symbol | whitespace token ;
_ | abstract | alignof | as | become |
box | break | const | continue | crate |
do | else | enum | extern | false |
final | fn | for | if | impl |
in | let | loop | macro | match |
mod | move | mut | offsetof | override |
priv | proc | pub | pure | ref |
return | Self | self | sizeof | static |
struct | super | trait | true | type |
typeof | unsafe | unsized | use | virtual |
where | while | yield |
Each of these keywords has special meaning in its grammar, and all of them are excluded from the ident rule.
Not all of these keywords are used by the language. Some of them were used before Rust 1.0, and were left reserved once their implementations were removed. Some of them were reserved before 1.0 to make space for possible future features.
lit_suffix : ident;
literal : [ string_lit | char_lit | byte_string_lit | byte_lit | num_lit | bool_lit ] lit_suffix ?;
The optional lit_suffix production is only used for certain numeric literals, but is reserved for future extension. That is, the above gives the lexical grammar, but a Rust parser will reject everything but the 12 special cases mentioned in Number literals in the reference.
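As an illustration of lit_suffix on numeric literals, the sketch below uses a few suffixes that current Rust compilers accept. It is not a complete list of the accepted cases, and `suffixed_literals` is a hypothetical function name.

```rust
/// Numeric literals carrying an explicit type suffix.
pub fn suffixed_literals() -> (u8, i64, usize, f32, f64) {
    (
        255u8,      // integer literal with suffix u8
        -1i64,      // unary minus applied to the literal 1i64
        1_000usize, // underscores are allowed inside the digits
        2.5f32,     // floating-point literal with suffix f32
        1e3f64,     // exponent form with suffix f64
    )
}
```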
char_lit : '\x27' char_body '\x27' ;
string_lit : '"' string_body * '"' | 'r' raw_string ;
char_body : non_single_quote
| '\x5c' [ '\x27' | common_escape | unicode_escape ] ;
string_body : non_double_quote
| '\x5c' [ '\x22' | common_escape | unicode_escape ] ;
raw_string : '"' raw_string_body '"' | '#' raw_string '#' ;
common_escape : '\x5c'
| 'n' | 'r' | 't' | '0'
| 'x' hex_digit 2 ;
unicode_escape : 'u' '{' hex_digit+ 6 '}';
hex_digit : 'a' | 'b' | 'c' | 'd' | 'e' | 'f'
| 'A' | 'B' | 'C' | 'D' | 'E' | 'F'
| dec_digit ;
oct_digit : '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' ;
dec_digit : '0' | nonzero_dec ;
nonzero_dec: '1' | '2' | '3' | '4'
| '5' | '6' | '7' | '8' | '9' ;
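A few concrete literals may help when reading the char_lit, string_lit, and raw_string rules. The values below are illustrative and the function names are hypothetical.

```rust
/// Character literals: plain, escaped quote, common escapes, Unicode escape.
pub fn char_examples() -> [char; 5] {
    ['a', '\'', '\n', '\x41', '\u{1F600}']
}

/// String literals: escapes are processed in ordinary strings but not in raw ones.
pub fn string_examples() -> [&'static str; 4] {
    [
        "tab\there",              // common_escape 't'
        "quote: \"",              // escaped double quote
        r"raw \n stays",          // raw string: backslash is kept literally
        r#"raw with "quotes""#,   // one '#' on each side allows inner quotes
    ]
}
```

Note that in the raw string `r"raw \n stays"` the two characters `\` and `n` are kept as written and are not turned into a newline.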
byte_lit : "b\x27" byte_body '\x27' ;
byte_string_lit : "b\x22" byte_string_body * '\x22' | "br" raw_byte_string ;
byte_body : ascii_non_single_quote
| '\x5c' [ '\x27' | common_escape ] ;
byte_string_body : ascii_non_double_quote
| '\x5c' [ '\x22' | common_escape ] ;
raw_byte_string : '"' raw_byte_string_body '"' | '#' raw_byte_string '#' ;
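The byte forms mirror the character and string forms but produce `u8` values. The following sketch shows a few literals matching these rules; the function names are hypothetical.

```rust
/// Byte literals evaluate to single u8 values.
pub fn byte_lit_examples() -> [u8; 4] {
    [b'a', b'\'', b'\n', b'\x7f']
}

/// Byte string literals evaluate to byte arrays; copied into Vec<u8> here
/// so that strings of different lengths can share one return type.
pub fn byte_string_examples() -> Vec<Vec<u8>> {
    vec![
        b"abc".to_vec(),
        b"line\n".to_vec(),             // escape is processed
        br#"raw "bytes" \n"#.to_vec(),  // raw: backslash kept literally
    ]
}
```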
num_lit : nonzero_dec [ dec_digit | '_' ] * float_suffix ?
| '0' [ [ dec_digit | '_' ] * float_suffix ?
| 'b' [ '1' | '0' | '_' ] +
| 'o' [ oct_digit | '_' ] +
| 'x' [ hex_digit | '_' ] + ] ;
float_suffix : [ exponent | '.' dec_lit exponent ? ] ? ;
exponent : ['E' | 'e'] ['-' | '+' ] ? dec_lit ;
dec_lit : [ dec_digit | '_' ] + ;
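To make the num_lit alternatives concrete, the sketch below writes the same kinds of values in decimal, binary, octal, hexadecimal, and floating-point form. The function names are hypothetical.

```rust
/// Integer literals in each radix allowed by num_lit.
pub fn integer_literals() -> [u32; 5] {
    [
        1_000,  // decimal with '_' separator
        0b1010, // binary
        0o17,   // octal
        0xff,   // hexadecimal
        0,      // a lone zero
    ]
}

/// Floating-point literals using the float_suffix alternatives.
pub fn float_literals() -> [f64; 3] {
    [
        1.5,     // '.' dec_lit
        2e3,     // exponent only
        1.25e-2, // '.' dec_lit followed by a signed exponent
    ]
}
```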
bool_lit : [ "true" | "false" ] ;
The two values of the boolean type are written true and false.
symbol : "::" | "->"
| '#' | '[' | ']' | '(' | ')' | '{' | '}'
| ',' | ';' ;
Symbols are a general class of printable tokens that play structural roles in a variety of grammar productions. They are cataloged here for completeness as the set of remaining miscellaneous printable tokens that do not otherwise appear as unary operators, binary operators, or keywords.
expr_path : [ "::" ] ident [ "::" expr_path_tail ] + ;
expr_path_tail : '<' type_expr [ ',' type_expr ] + '>'
| expr_path ;
type_path : ident [ type_path_tail ] + ;
type_path_tail : '<' type_expr [ ',' type_expr ] + '>'
| "::" type_path ;
expr_macro_rules : "macro_rules" '!' ident '(' macro_rule * ')' ';'
| "macro_rules" '!' ident '{' macro_rule * '}' ;
macro_rule : '(' matcher * ')' "=>" '(' transcriber * ')' ';' ;
matcher : '(' matcher * ')' | '[' matcher * ']'
| '{' matcher * '}' | '$' ident ':' ident
| '$' '(' matcher * ')' sep_token? [ '*' | '+' ]
| non_special_token ;
transcriber : '(' transcriber * ')' | '[' transcriber * ']'
| '{' transcriber * '}' | '$' ident
| '$' '(' transcriber * ')' sep_token? [ '*' | '+' ]
| non_special_token ;
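As a small illustration of the matcher and transcriber rules, the macro below uses a `$name:kind` fragment and a `$( ... )*` repetition in both positions. It is a sketch written to follow the shape of macro_rule above; `sum`, `sum_demo`, and `sum_empty` are hypothetical names.

```rust
macro_rules! sum {
    // Empty matcher: no arguments expands to zero.
    () => (0);
    // '$' ident ':' ident, then a repeated group introduced by '$' '('.
    ($head:expr $(, $tail:expr)*) => ($head $(+ $tail)*);
}

/// Expands to 1 + 2 + 3.
pub fn sum_demo() -> i32 {
    sum!(1, 2, 3)
}

/// Uses the empty rule.
pub fn sum_empty() -> i32 {
    sum!()
}
```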
FIXME: grammar? What production covers #![crate_id = "foo"] ?
FIXME: grammar?
item : vis ? mod_item | fn_item | type_item | struct_item | enum_item
| const_item | static_item | trait_item | impl_item | extern_block_item ;
FIXME: grammar?
mod_item : "mod" ident [ ';' | '{' mod '}' ] ;
mod : [ view_item | item ] * ;
view_item : extern_crate_decl | use_decl ';' ;
extern_crate_decl : "extern" "crate" crate_name ;
crate_name : ident | [ ident "as" ident ] ;
use_decl : vis ? "use" [ path "as" ident
| path_glob ] ;
path_glob : ident [ "::" [ path_glob
| '*' ] ] ?
| '{' path_item [ ',' path_item ] * '}' ;
path_item : ident | "self" ;
FIXME: grammar?
const_item : "const" ident ':' type '=' expr ';' ;
static_item : "static" ident ':' type '=' expr ';' ;
FIXME: grammar?
extern_block_item : "extern" '{' extern_block '}' ;
extern_block : [ foreign_fn ] * ;
vis : "pub" ;
See Use declarations.
attribute : '#' '!' ? '[' meta_item ']' ;
meta_item : ident [ '=' literal
| '(' meta_seq ')' ] ? ;
meta_seq : meta_item [ ',' meta_seq ] ? ;
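The three shapes of meta_item (a bare ident, ident '=' literal, and ident '(' meta_seq ')') can be seen on ordinary items. The sketch below uses well-known built-in attributes; `Point` and `origin` are hypothetical names chosen for the example.

```rust
#[derive(Debug, Clone, PartialEq)] // ident '(' meta_seq ')'
#[doc = "A point in 2D space."]    // ident '=' literal
pub struct Point {
    pub x: i32,
    pub y: i32,
}

#[inline] // bare ident
pub fn origin() -> Point {
    Point { x: 0, y: 0 }
}
```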
stmt : decl_stmt | expr_stmt | ';' ;
decl_stmt : item | let_decl ;
See Items.
let_decl : "let" pat [':' type ] ? [ init ] ? ';' ;
init : [ '=' ] expr ;
expr_stmt : expr ';' ;
expr : literal | path | tuple_expr | unit_expr | struct_expr
| block_expr | method_call_expr | field_expr | array_expr
| idx_expr | range_expr | unop_expr | binop_expr
| paren_expr | call_expr | lambda_expr | while_expr
| loop_expr | break_expr | continue_expr | for_expr
| if_expr | match_expr | if_let_expr | while_let_expr
| return_expr ;
FIXME: grammar?
FIXME: Do we want to capture this in the grammar as different productions?
See Literals.
See Paths.
tuple_expr : '(' [ expr [ ',' expr ] * | expr ',' ] ? ')' ;
unit_expr : "()" ;
struct_expr_field_init : ident | ident ':' expr ;
struct_expr : expr_path '{' struct_expr_field_init
[ ',' struct_expr_field_init ] *
[ ".." expr ] '}' |
expr_path '(' expr
[ ',' expr ] * ')' |
expr_path ;
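The three struct_expr alternatives correspond to structs with named fields, tuple structs, and unit structs. The sketch below shows each, including the shorthand field form and the trailing `..` base expression; all type and function names are hypothetical.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub width: u32,
    pub height: u32,
    pub title: &'static str,
}

#[derive(Debug, PartialEq)]
pub struct Meters(pub f64);

#[derive(Debug, PartialEq)]
pub struct Marker;

pub fn struct_exprs() -> (Config, Meters, Marker) {
    // ident ':' expr for every field.
    let base = Config { width: 640, height: 480, title: "base" };
    // Shorthand `width` (a bare ident), remaining fields taken from `base`.
    let width = 1280;
    let wide = Config { width, ..base };
    // Tuple-struct form and bare-path (unit struct) form.
    (wide, Meters(1.5), Marker)
}
```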
block_expr : '{' [ stmt | item ] *
[ expr ] '}' ;
method_call_expr : expr '.' ident paren_expr_list ;
field_expr : expr '.' ident ;
array_expr : '[' "mut" ? array_elems? ']' ;
array_elems : [expr [',' expr]*] | [expr ';' expr] ;
idx_expr : expr '[' expr ']' ;
range_expr : expr ".." expr |
expr ".." |
".." expr |
".." ;
unop_expr : unop expr ;
unop : '-' | '*' | '!' ;
binop_expr : expr binop expr | type_cast_expr
| assignment_expr | compound_assignment_expr ;
binop : arith_op | bitwise_op | lazy_bool_op | comp_op ;
arith_op : '+' | '-' | '*' | '/' | '%' ;
bitwise_op : '&' | '|' | '^' | "<<" | ">>" ;
lazy_bool_op : "&&" | "||" ;
comp_op : "==" | "!=" | '<' | '>' | "<=" | ">=" ;
type_cast_expr : value "as" type ;
assignment_expr : expr '=' expr ;
compound_assignment_expr : expr [ arith_op | bitwise_op ] '=' expr ;
paren_expr : '(' expr ')' ;
expr_list : [ expr [ ',' expr ]* ] ? ;
paren_expr_list : '(' expr_list ')' ;
call_expr : expr paren_expr_list ;
ident_list : [ ident [ ',' ident ]* ] ? ;
lambda_expr : '|' ident_list '|' expr ;
while_expr : [ lifetime ':' ] ? "while" no_struct_literal_expr '{' block '}' ;
loop_expr : [ lifetime ':' ] ? "loop" '{' block '}';
break_expr : "break" [ lifetime ] ?;
continue_expr : "continue" [ lifetime ] ?;
for_expr : [ lifetime ':' ] ? "for" pat "in" no_struct_literal_expr '{' block '}' ;
if_expr : "if" no_struct_literal_expr '{' block '}'
else_tail ? ;
else_tail : "else" [ if_expr | if_let_expr
| '{' block '}' ] ;
match_expr : "match" no_struct_literal_expr '{' match_arm * '}' ;
match_arm : attribute * match_pat "=>" [ expr "," | '{' block '}' ] ;
match_pat : pat [ '|' pat ] * [ "if" expr ] ? ;
if_let_expr : "if" "let" pat '=' expr '{' block '}'
else_tail ? ;
while_let_expr : [ lifetime ':' ] ? "while" "let" pat '=' expr '{' block '}' ;
return_expr : "return" expr ? ;
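Several of the control-flow rules above combine naturally: a labeled for_expr with a break_expr naming the label, a while_let_expr, and a match_expr whose arms use `|` alternatives and an `if` guard. The sketch below is illustrative only and both function names are hypothetical.

```rust
/// Labeled `for` loop: `break 'outer` leaves both loops at once.
pub fn first_pair_with_sum(values: &[i32], target: i32) -> Option<(i32, i32)> {
    let mut found = None;
    'outer: for &a in values {
        for &b in values {
            if a + b == target {
                found = Some((a, b));
                break 'outer;
            }
        }
    }
    found
}

/// `while let` pops until the stack is empty; `match` classifies each value.
pub fn drain_and_classify(mut stack: Vec<i32>) -> Vec<&'static str> {
    let mut out = Vec::new();
    while let Some(n) = stack.pop() {
        out.push(match n {
            0 => "zero",
            x if x < 0 => "negative", // match_pat with an `if` guard
            1 | 2 => "small",         // match_pat with '|' alternatives
            _ => "large",
        });
    }
    out
}
```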
FIXME: is this entire chapter relevant here? Or should it all have been covered by some production already?
FIXME: grammar?
closure_type := [ 'unsafe' ] [ '<' lifetime-list '>' ] '|' arg-list '|'
[ ':' bound-list ] [ '->' type ]
lifetime-list := lifetime | lifetime ',' lifetime-list
arg-list := ident ':' type | ident ':' type ',' arg-list
An empty type
never_type : "!" ;
FIXME: grammar?
bound-list := bound | bound '+' bound-list '+' ?
bound := ty_bound | lt_bound
lt_bound := lifetime
ty_bound := ty_bound_noparen | (ty_bound_noparen)
ty_bound_noparen := [?] [ for<lt_param_defs> ] simple_path
FIXME: grammar?
FIXME: this is probably not relevant to the grammar...
FIXME: is this entire chapter relevant here? Or should it all have been covered by some production already?
[1] Substitute definitions for the special Unicode productions are provided to the grammar verifier, restricted to ASCII range, when verifying the grammar in this document.