CS445/545 Compiler Construction (Spring 2023)

Prerequisites:
CS241 Computer Organization (assembly and C programming)
CS341 Programming Languages (high-level language constructs)
CS344 Data Structures and Algorithms (linked lists, binary trees, hash tables, etc., pattern matching)
CS345 Automata Theory and Formal Languages (finite automata, pushdown automata, Turing machines)

Instructor: Christino Tamon
Class: TR 3:00-4:15 (SC342)
Office hours: TR 9:15-11:00, 2-3 (SC373)


Syllabus

A study of compiler design. Overview of the compilation process. Formal definition of syntax, lexical scanning, parsing including LL and LR grammars, run-time structures, intermediate code generation, and storage allocation. The goal is to develop a compiler for a high-level programming language using standard compiler tools.

Grading:


Text:
Strongly recommended (not required): A. Aho, R. Sethi, J. Ullman, "Compilers: Principles, Techniques and Tools", first edition, Addison-Wesley, 1986.

Dragon original

The following texts are also highly recommended:


Objectives and Outcomes

Objective: to understand the fundamentals of compiler theory and its application in constructing a real working compiler.

Specific outcomes:


Requirements and Policies
Although attendance is not mandatory, students are responsible for all course materials covered in lectures and any exams given during class periods. Students that need to make up missing course work must provide the required Clarkson official exempt form. All students must submit their own work; the exchange of ideas are encouraged but ultimately the submitted work must be the student's own. Please refer to the Clarkson University Regulations for more guidelines on academic integrity and related matters.

Tentative Outline

All chapter references are for the Dragon book (1986 edition).


Schedule (tentative)

Tuesday Thursday
Jan 10

Jan 12
Chapter 1: compilation (what-why-how).
Jan 17
Chapter 2: calculator
Lexical analysis (scanner): regular languages, regular expressions (recap of CS345).
Jan 19
Chapter 2: calculator
Syntax-directed translation.
Jan 24
No (power) class.
Jan 26
Chapter 2: calculator (parsing)
Context-free grammars (recap of CS345).
Jan 31
Chapter 2: calculator (recursive descent parsing)
Context-free grammar: ambiguity, left-recursion, left-factoring.
Feb 2
Chapter 2: calculator (semantics)
Delayed semantics: intermediate representation (syntax trees).
Feb 7
Enter the Dragon: grammar.
Chapter 9: "tree-walking" gencode (Section 9.10).
Feb 9
Compiler tools: lex and yacc.
To-do: checklist.
Feb 14
Code Generation.
Reading: Section 9.10.
Feb 16
RegExp2DFA: no NFA, no cry.
Reading: Chapter 3 (section 3.9).
Feb 21
RegExp2DFA, DFA minimization.
Reading: Chapter 3 (section 3.9).
Feb 23
Short break.
Feb 28
String matching
Knuth-Morris-Pratt. Failure function.
Mar 2
Pattern matching on tries
Aho-Corasick: pdf.
Reading: Chapter 3, Exercises 3.31-3.32.
Mar 7
More on Aho-Corasick.
Exercises: sample.
Mar 9
Test 1 (sol).
Mar 14
Spring break.
Mar 16
Spring break.
Mar 21
Symbol table.
Reading: Chapter 7. Hash tables (hashpjw).
Mar 23
Semantic analysis. Type checking.
Reading: Chapters 5,6.
Testing: /afs/cu/class/cs445/public/s23/Semantic-Testing.
Mar 28
Dragon coding session.
Mar 30
Apr 4
Dragon coding session.
Apr 6
Top-down LL(1) parsing.
Reading: Chapter 4.1-4.4.
Apr 11
Bottom-up SLR(1) parsing.
Reading: Chapter 4.5-4.7.
Exercises: sample.
Apr 13
Runtime environments.
Reading: Chapter 7,9.10.
Apr 18
Intel assembly.
Reading: Chapter 7,9.10.
Apr 20
Test 2 (sol).
Apr 25
Code generation.
Apr 27
Code generation.


Project: DRAGON

CHECK LIST

(1.0) Lexical Analysis

  1. Line numbering
  2. Two styles of comments
  3. (optional) Scientific notation

(1.5) Syntax Analysis: grammar adjustments

  1. Unlimited nesting of subprograms
  2. Array access on both sides of assignment
  3. Allow for statements.
  4. (optional) Another loop construct
  5. (optional) Multidimensional arrays
  6. (optional) Records and pointers

(2.0) Symbol Table

  1. Memory-leak handler
  2. (optional) Statistical analysis of hashpjw

(2.5) Syntax Tree (Intermediate Code Generation)

  1. Visual print
  2. Memory-leak handler

(3.0) Semantic Analysis & Type Checking

  1. Check list
  2. Error reporting
  3. (optional) Error recovery

(4.0) Code Generation

  1. Input/Output statements
  2. Simple expressions (arithmetic and relational): gencode
  3. Statements (assignment, conditional, loop)
  4. Nonlocal names: base frame pointer (static scope parent)
  5. Recursive routines (example: GCD program)
  6. Complex expressions (register spilling)
  7. (optional) Arrays (L-value, R-value, parameters, nonlocal)
  8. (optional) Floating-point support

Extra Trails (under construction)


Miscellany
Cryptic trails Sidetracks: