Top-down parsing is a parsing technique where the parsing starts from the top of the parse tree and moves down towards the leaves. It begins with the start symbol of the grammar and applies production rules to perform a recursive descent through the input. This process continues until the input is completely parsed or an error is encountered.
The top-down parsing process can be implemented using various methods, and one common approach is recursive descent parsing. In recursive descent parsing, each non-terminal in the grammar is associated with a parsing function. These parsing functions are recursively called to parse the input based on the production rules.
Let’s consider a simple example of a top-down parser for the CFG.
def parse_program(tokens):
parse_declaration_list(tokens)
parse_statement_list(tokens)
def parse_declaration_list(tokens):
while tokens and tokens[0] in ['int', 'float', 'char']:
parse_declaration(tokens)
def parse_declaration(tokens):
parse_type(tokens)
parse_identifier(tokens)
# Consume semicolon
tokens.pop(0)
def parse_type(tokens):
if tokens[0] in ['int', 'float', 'char']:
# Consume the type token
tokens.pop(0)
def parse_identifier(tokens):
if tokens[0].isalpha():
# Consume the identifier token
tokens.pop(0)
def parse_statement_list(tokens):
while tokens:
parse_statement(tokens)
def parse_statement(tokens):
if tokens[0].isalpha():
parse_assignment_statement(tokens)
elif tokens[0] == '(' or tokens[0].isdigit():
parse_expression(tokens)
# Consume semicolon
tokens.pop(0)
def parse_assignment_statement(tokens):
parse_identifier(tokens)
# Consume equals sign
tokens.pop(0)
parse_expression(tokens)
# Consume semicolon
tokens.pop(0)
def parse_expression(tokens):
parse_term(tokens)
if tokens and tokens[0] in ['+', '-']:
# Consume the operator
tokens.pop(0)
parse_expression(tokens)
def parse_term(tokens):
parse_factor(tokens)
if tokens and tokens[0] == '*':
# Consume the multiplication symbol
tokens.pop(0)
parse_term(tokens)
def parse_factor(tokens):
if tokens[0].isalpha():
parse_identifier(tokens)
elif tokens[0].isdigit():
# Consume the digit token
tokens.pop(0)
elif tokens[0] == '(':
# Consume the opening parenthesis
tokens.pop(0)
parse_expression(tokens)
# Consume the closing parenthesis
tokens.pop(0)
# Example usage:
tokens = ['int', 'main', ';', 'float', 'result', ';', 'main', '=', '10', ';', 'result', '=', '(', 'main', '+', '5', ')', '*', '2', ';']
parse_program(tokens)
This example illustrates a simple top-down parser in Python for the provided CFG. The functions such as parse_program
, parse_declaration_list
, etc., represent the recursive descent parsing functions for the corresponding non-terminals in the grammar. The input tokens are consumed as the parsing progresses.