Lecture 13

Derivations can be expressed naturally and succinctly as trees (called parse trees):

13-01

For every leftmost (or rightmost) derivation, there is a unique parse tree.

E.g. two leftmost derivations for \(a + b * c\):

S => S op S => S op S op S => a op S op S => a + S op S => a + b op S => a + b * S => a + b * c
or
S => S op S => a op S => a + S => a + S op S => a + b op S => a + b * S => a + b * c

These produce different parse trees:

13-02

A grammar for which some word has more than one distinct leftmost derivation (equivalently, more than one parse tree) is called ambiguous.

\(S\rightarrow S\enspace op\enspace S|a|b|c\), \(op\rightarrow +|-|*|/\) is an ambiguous grammar.
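
To see the ambiguity concretely, the sketch below counts the parse trees of a word under this grammar by trying every operator as the root of an \(S\enspace op\enspace S\) split. The function name `count_parse_trees` and the single-character tokenization are assumptions made for this illustration, not part of the lecture.

```python
from functools import lru_cache

OPERANDS = {"a", "b", "c"}
OPERATORS = {"+", "-", "*", "/"}

def count_parse_trees(tokens):
    """Count parse trees of `tokens` under S -> S op S | a | b | c."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways S can derive tokens[lo:hi].
        if hi - lo == 1:
            return 1 if tokens[lo] in OPERANDS else 0
        total = 0
        # Try each operator position as the root "S op S" split.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in OPERATORS:
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))

print(count_parse_trees("a+b"))    # 1
print(count_parse_trees("a+b*c"))  # 2
```

A count above 1 for any word is enough to show that the grammar is ambiguous.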

If we only care about answering whether \(w\in L(G)\), ambiguity does not matter. But as compiler writers, we want to know why \(w \in L(G)\), i.e. the derivation matters.

The shape of the parse tree describes the meaning of the string, so a word with multiple parse trees can have multiple meanings.
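
As a small illustration of how shape determines meaning, the sketch below evaluates the two tree shapes for \(a + b * c\). Trees are written as nested tuples `(operator, left, right)`; this representation, the names `evaluate`, `PLUS_AT_BOTTOM`, `TIMES_AT_BOTTOM`, and the sample values for a, b, c are all assumptions chosen for the example.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def evaluate(tree, env):
    """Evaluate a tree bottom-up; leaves are variable names looked up in env."""
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, env), evaluate(right, env))

# The two tree shapes for a + b * c.
PLUS_AT_BOTTOM = ("*", ("+", "a", "b"), "c")   # (a + b) * c
TIMES_AT_BOTTOM = ("+", "a", ("*", "b", "c"))  # a + (b * c)

ENV = {"a": 1, "b": 2, "c": 3}
print(evaluate(PLUS_AT_BOTTOM, ENV))   # 9
print(evaluate(TIMES_AT_BOTTOM, ENV))  # 7
```

Same word, two trees, two different values.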

13-03

So \(a+b*c\) could mean \((a+b)*c\) or \(a+(b*c)\). What do we do?

  1. Use heuristics (precedence) to guide the derivation process.
  2. Make the grammar unambiguous.

\(E\rightarrow E\enspace op\enspace T|T\)
\(T\rightarrow a|b|c\)
\(op\rightarrow +|-|*|/\)

a + b * c: E => E op T => E op T op T => T op T op T => a op T op T => ... => a + b * c
13-04

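This grammar enforces strict left-to-right evaluation: all operators have the same precedence and group to the left, so \(a+b*c\) means \((a+b)*c\).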

What if we want to give \(*,/\) precedence over \(+, -\)?

\(E\rightarrow E\enspace PM\enspace T|T\)
\(T\rightarrow T\enspace TD\enspace F|F\)
\(F\rightarrow a|b|c|(E)\)
\(PM\rightarrow +|-\)
\(TD\rightarrow *|/\)

a + b * c: E => E PM T => T PM T => F PM T => a PM T => a + T => a + T TD F => a + F TD F => a + b TD F => a + b * F => a + b * c
13-05
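
One way to check the grouping this grammar produces is a small hand-written parser. The sketch below is an assumption-laden illustration rather than a method from the lecture: it uses recursive descent and replaces the left-recursive rules \(E\rightarrow E\enspace PM\enspace T\) and \(T\rightarrow T\enspace TD\enspace F\) with loops that fold to the left. It returns condensed trees `(operator, left, right)` instead of full parse trees. The names `parse`, `parse_e`, `parse_t`, `parse_f` are hypothetical.

```python
def parse(text):
    """Parse an expression over a, b, c, + - * / and parentheses."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_e():
        # E -> E PM T | T, read as one T followed by any number of "PM T".
        node = parse_t()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, parse_t())
        return node

    def parse_t():
        # T -> T TD F | F, read as one F followed by any number of "TD F".
        node = parse_f()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, parse_f())
        return node

    def parse_f():
        # F -> a | b | c | (E)
        tok = peek()
        if tok in ("a", "b", "c"):
            return advance()
        if tok == "(":
            advance()
            node = parse_e()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = parse_e()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse("a + b * c"))  # ('+', 'a', ('*', 'b', 'c'))
print(parse("a - b - c"))  # ('-', ('-', 'a', 'b'), 'c')
```

The first result shows \(*\) binding tighter than \(+\); the second shows operators of equal precedence grouping to the left.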

Q: If \(L\) is context-free, is there always an unambiguous grammar \(G\) such that \(L(G)=L\)?
A: No! There are inherently ambiguous languages that only have ambiguous grammars.

Q: Can we construct a tool that tells us if a grammar is ambiguous?
A: No! Undecidable.

Equivalence of grammars \(G_1\) and \(G_2\) (i.e. whether \(L(G_1)=L(G_2)\)) is also undecidable.

Recognizers

What class of computer programs is needed to recognize a Context-Free Language?

But we need more than just a yes/no answer: we need the derivation (parse tree) or an error message. The problem of finding the derivation is called parsing.

Given: Grammar \(G\), start symbol \(S\), word \(w\)
Find: \(S\Rightarrow ... \Rightarrow w\) (or report that there is no derivation)

How can this be done? 2 choices:

  1. Forwards (top-down): start at \(S\), expand until you reach \(w\).
  2. Backwards (bottom-up): start at \(w\), figure out how to get back to \(S\).
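
The forwards (top-down) choice can be sketched as a naive breadth-first search over leftmost derivations. This is only an illustration of the idea, not an efficient parsing algorithm. It assumes the grammar has no empty productions, so a sentential form longer than \(w\) can be discarded. The dictionary encoding of the grammar and the names `leftmost_derivation` and `terminal_prefix_len` are hypothetical.

```python
from collections import deque

# Assumed encoding: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "S": [["S", "op", "S"], ["a"], ["b"], ["c"]],
    "op": [["+"], ["-"], ["*"], ["/"]],
}

def terminal_prefix_len(form, grammar):
    """Index of the leftmost nonterminal, or len(form) if there is none."""
    for i, sym in enumerate(form):
        if sym in grammar:
            return i
    return len(form)

def leftmost_derivation(grammar, start, word):
    """Return a list of sentential forms from start to word, or None."""
    target = tuple(word)
    start_form = (start,)
    parent = {start_form: None}  # doubles as the visited set
    queue = deque([start_form])
    while queue:
        form = queue.popleft()
        if form == target:
            steps = []
            while form is not None:
                steps.append(form)
                form = parent[form]
            return steps[::-1]
        idx = terminal_prefix_len(form, grammar)
        if idx == len(form):
            continue  # all terminals, but not the word we want
        for rhs in grammar[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            if len(new) > len(target) or new in parent:
                continue  # too long (no empty productions) or already seen
            k = terminal_prefix_len(new, grammar)
            if new[:k] != target[:k]:
                continue  # terminal prefix already disagrees with the word
            parent[new] = form
            queue.append(new)
    return None

for form in leftmost_derivation(GRAMMAR, "S", "a+b*c"):
    print(" ".join(form))
```

When there is no derivation, as for the word `a+`, the search runs out of candidates and returns `None`.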