|
1 |
| -### PHP-PYTHON-MIPS Compiler Design |
| 1 | +# PHP to MIPS compiler written in Python |
| 2 | + |
| 3 | +This project is for educational purposes only. It aims to provide a high-level overview of compilers and their implementation.
| 4 | + |
| 5 | +## Introduction |
| 6 | + |
| 7 | +At a high level, a compiler is a program that takes source code written in one language as input and produces equivalent code in another language as output. For example, given a piece of `C` code, we may want to convert it into `Javascript` code.
| 8 | + |
| 9 | +```c |
| 10 | +#include<stdio.h> |
| 11 | + |
| 12 | +int main() {
| 13 | + printf("Hello World"); |
| 14 | + return 0; |
| 15 | +} |
| 16 | +``` |
| 17 | +When the above `C` code is converted to `Javascript`, we want the following output:
| 18 | + |
| 19 | +```js |
| 20 | +process.stdout.write("Hello World"); |
| 21 | +``` |
| 22 | + |
| 23 | +One can write a program (a compiler) to automate this conversion. `C` and `Javascript` are used here merely for demonstration and can be swapped for any languages of choice. We could just as well use `Hindi` to `English` or `C` to `Machine Code` as the input -> output languages for our compiler.
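To make the idea concrete, here is a minimal sketch of such a conversion in Python. It is not this project's compiler: the `translate_c_to_js` function and the `PRINTF_CALL` pattern are hypothetical names invented for illustration, and the sketch only understands `printf` calls whose single argument is a string literal. Everything else in the input is ignored.

```python
import re

# Hypothetical pattern: matches a C statement of the form printf("...");
# and captures the string literal, including its quotes.
PRINTF_CALL = re.compile(r'printf\(\s*("(?:[^"\\]|\\.)*")\s*\)\s*;')


def translate_c_to_js(c_source: str) -> str:
    """Translate every printf("...") call in c_source into JavaScript."""
    js_lines = []
    for match in PRINTF_CALL.finditer(c_source):
        string_literal = match.group(1)
        js_lines.append(f"process.stdout.write({string_literal});")
    return "\n".join(js_lines)


c_program = '''
#include <stdio.h>

int main() {
    printf("Hello World");
    return 0;
}
'''

print(translate_c_to_js(c_program))
# prints: process.stdout.write("Hello World");
```

A real compiler cannot rely on pattern matching like this, because it has to understand the whole input language. That is why we first look at how languages are structured.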
| 24 | + |
| 25 | +### Languages |
| 26 | + |
| 27 | +As discussed above, compilers work on languages. Each language has its own `Way of Saying or Doing` things. For example, saying `Hello World` takes a different form (syntax) in each language:
| 28 | + |
| 29 | +``` |
| 30 | +C => printf("Hello World"); |
| 31 | +PHP => echo "Hello World"; |
| 32 | +Java => System.out.println("Hello World");
| 33 | +English => Hello World |
| 34 | +Hindi => नमस्ते दुनिया |
| 35 | +``` |
| 36 | +But they all express (output) the same thing.
| 37 | + |
| 38 | +We cannot compile a language if we don't know the language itself! Natural languages (e.g. Hindi, English, Korean) are very hard to learn fully because people speak the same language in different ways, which makes it even harder to write a compiler for them. Computer languages, on the other hand, are precisely defined, and writing them differently causes errors.
| 39 | + |
| 40 | +You may also have noticed that languages have rules for constructing a statement. `Doing am I what ?` doesn't make any sense because it doesn't follow those rules. That set of rules is the `Grammar`. Along with grammar mistakes, misspelled words also cause problems. In computer programs, these words are called `Tokens`.
| 41 | + |
| 42 | +#### Grammar |
| 43 | + |
| 44 | +A grammar defines the construction rules for a language. Without it, the language cannot exist without ambiguity. Writing `c int;` causes an error in a `C` program because the grammar says to write `int c;`.
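As a rough illustration, the rule "a declaration is a type, then a name, then `;`" can be checked in a few lines of Python. This is only a sketch under stated assumptions: `is_valid_declaration` and `TYPE_KEYWORDS` are hypothetical names, only four type keywords are listed, and real `C` declarations allow far more forms than this one rule.

```python
import re

# Assumption: a tiny subset of C type keywords, enough for the example.
TYPE_KEYWORDS = {"int", "char", "float", "double"}
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_declaration(statement: str) -> bool:
    """Check the single rule: declaration := type identifier ';'"""
    # Split into words and individual non-space characters such as ';'.
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*|\S", statement)
    if len(tokens) != 3:
        return False
    type_name, name, terminator = tokens
    return (
        type_name in TYPE_KEYWORDS
        and IDENTIFIER.fullmatch(name) is not None
        and name not in TYPE_KEYWORDS
        and terminator == ";"
    )


print(is_valid_declaration("int c;"))  # True
print(is_valid_declaration("c int;"))  # False
```

Both statements contain exactly the same words. Only the order differs, and the order is what the grammar constrains.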
| 45 | + |
| 46 | + |
| 47 | +#### Token |
| 48 | + |
| 49 | +Tokens are the basic building blocks of any language.
| 50 | + |
| 51 | +```c |
| 52 | +int n = 10; |
| 53 | +printf("%d", n); |
| 54 | +``` |
| 55 | +
|
| 56 | +Here `int`, `n`, `=`, `10`, `;`, `printf`, `(`, `"%d"`, `,`, `n`, `)`, `;` are all tokens. Note that the string literal `"%d"` is a single token.
| 57 | +
|
| 58 | +Some tokens are special. Here `int` is a language **keyword** and has a special meaning in `C`, whereas `n` is not. `n` is just a name we chose: we can replace `n` with `x` everywhere and the meaning of the program won't change.
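The snippet above can be split into tokens with a small regular-expression scanner. The following is a minimal sketch, not this project's tokenizer: `tokenize`, `TOKEN_SPEC` and the short `KEYWORDS` set are assumptions made for illustration, and the scanner treats a string literal such as `"%d"` as one token, as `C` lexers do.

```python
import re

# Assumption: a small subset of C keywords, enough for the example.
KEYWORDS = {"int", "return", "if", "else", "while"}

# Each entry is (token kind, regular expression); earlier entries win.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("STRING", r'"(?:[^"\\]|\\.)*"'),
    ("NAME", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("SYMBOL", r"[=;(),]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(source):
    """Return a list of (kind, text) pairs for the given source string."""
    tokens = []
    position = 0
    while position < len(source):
        match = TOKEN_RE.match(source, position)
        if match is None:
            raise SyntaxError(
                f"Unexpected character {source[position]!r} at offset {position}"
            )
        kind, text = match.lastgroup, match.group()
        position = match.end()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "NAME":
            # Keywords are reserved names; every other name is an identifier.
            kind = "KEYWORD" if text in KEYWORDS else "IDENTIFIER"
        tokens.append((kind, text))
    return tokens


for token in tokenize('int n = 10;\nprintf("%d", n);'):
    print(token)
```

Running it on the same program with `n` renamed to `x` yields the same sequence of token kinds. Only the text of the identifier tokens changes, which is why the rename does not change the meaning of the program.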
| 59 | +
|
| 60 | +To learn a language, we need to learn its grammar and tokens. |
| 61 | +
|
| 62 | +
|
| 63 | +Now that we have a good understanding of languages and their building blocks, we are ready to write our compiler.
| 64 | +
|
| 65 | +## Tokenization |
| 66 | +
|
| 67 | +> TODO: |
| 68 | +
|
| 69 | +
|
| 70 | +
|