marklar v0.3 Language Documentation
Programming Language Design Paradigms -- COP6557
Mark Price (prim0001)
Jason Rupard (rupj0001)
4-19-2004
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Table of Contents

1. Introduction to Marklar
    1.0 What does Marklar actually refer to?
    1.1 Data Types
        1.1.1 Marklar Types
            1.1.1.1 Ordered Marklar (om)
            1.1.1.2 Ordered Conformist Marklar (ocm)
            1.1.1.3 Unordered Marklar (um)
            1.1.1.4 Unordered Conformist Marklar (ucm)
        1.1.2 Marklar Methods
        1.1.3 Marklar Naming Requirements
    1.2 strings and numbers
        1.2.1 strings
        1.2.2 numbers
        1.2.3 Naming Requirements
    1.3 Operators
        1.3.1 Arithmetic
        1.3.2 Comparison
    1.4 Supplied Methods
        1.4.1 print()
    1.5 Methods
    1.6 Warnings
        1.6.1 Reference Loops
        1.6.2 Marklar Assignment Chains
        1.6.3 Marklar Use assignments
            1.6.3.1 Fixes for Marklar Use Assignment
                1.6.3.1.1 typeof()
                1.6.3.1.2 the "any" type
    1.7 Additional Language Information
2. Building and Running
    2.1 Directory Layout
    2.2 Building the Marklar Compiler
    2.3 Compiling Marklar Sources
    2.4 Running a Marklar Program
        2.4.1 One Step
        2.4.2 Two Steps
        2.4.3 Making Parrot Bytecode
    2.5 Marklar Examples
3. Marklar Scanner
    3.1 Regular Expressions for Lex Tokens
    3.2 Keywords
4. Marklar Parser
    4.1 Marklar Grammar
5. Symbol Tables and Semantics
6. Marklar Code Generation
7. Runtime Environment
    7.1 Target Platform
        7.1.1 What is Parrot
    7.2 How We Use Parrot
    7.3 Marklar Support Libraries
8. A Short Explanation of the Cultural Context


1. Introduction to Marklar

Marklar is built around three basic types: Marklars, strings and numbers.
A Marklar is a set with methods, and has four basic subtypes: Ordered
Marklars, Ordered Conformist Marklars, Unordered Marklars and Unordered
Conformist Marklars. Ordered Marklars are backed by arrays and Unordered
Marklars by hash tables. Extended Marklars override or add methods to the
base Marklar types.
Furthermore, Marklar has purely procedural methods, including the main()
method, which is the default entry point.

1.1 Data Types

1.1.1 Marklar Types

A Marklar can be thought of as a set with methods. There are four base
marklar types.

1.1.1.1 Ordered Marklar (om)

A heterogeneous array with methods.
Example om:
Index with number only: var.

1.1.1.2 Ordered Conformist Marklar (ocm)

A homogeneous array with methods.
Example ocm: <| "eth0", "192.168.35.1", "255.255.255.0" |>
Index with number only: var.<| number |>

1.1.1.3 Unordered Marklar (um)

A heterogeneous hash table with methods.
Example um: {? 1, -1, 0, "my_um", my_um ?}
Index may be of any data type: var.{? anything ?}
"anything" gets converted into a string representation of itself.

1.1.1.4 Unordered Conformist Marklar (ucm)

A homogeneous hash table with methods.
Example ucm: {| "sat", "np", "010101110", "1010", foo |}
Index may be of any data type: var.{| anything |}
"anything" gets converted into a string representation of itself.

1.1.2 Marklar Methods

The following methods are provided with the base marklar types and are
inherited by extended marklars.

    #var is a marklar of some sort
    var.size()      # returns the size of the marklar's set
    var.clear()     # clears the marklar's set
    var.toString()  # returns the string representation of the
                    # marklar set
    var.keys()      # ONLY FOR UNORDERED TYPE MARKLARS (um, ucm,
                    # extensions). Returns an ocm of the keys
                    # contained in var's marklar set
    self            # The special variable that refers to the marklar
                    # itself. Similar to "this" in C++ or Java

Example of self:

    self.methN(...)
    self.size()
    self.{? key ?}

A user can define their own marklars with their own set of methods,
potentially overriding the parent methods.

Example Marklar:

    #User defined marklar types must start with an uppercase letter
    #My_ocm extends the base type Ordered Conformist Marklar (ocm)
    ocm My_ocm {
        void meth1(...){...};
        ..
        ..
        ..
        void methN(...){...};
    };

Marklar Use Examples:

    ocm fd = <| 1,2,3 |>;
    fd.<|3|> = 7;
    om asdf = ;
    asdf. = 8;
    um u = {? "hello", "there."
    ?};
    u.{? "how are you" ?} = "good";
    ucm.{? u ?} = asdf;

1.1.3 Marklar Naming Requirements

Marklar type ids begin with a capital letter and marklar methods begin
with lowercase letters. Further characters may be alphanumeric or
underscores. Marklar variable names follow the same rules as those for
strings and numbers.

1.2 strings and numbers

1.2.1 strings

These are traditional strings, delimited by double quotes. Strings may not
contain double quotes, but may contain single quotes as well as traditional
whitespace characters like \n and \t.

1.2.2 numbers

The "number" data type encompasses integers and floats.

1.2.3 Naming Requirements

All variable names start with a lowercase letter followed by alphanumerics
and underscores.

Example strings and numbers:

    string aString = "asdf";
    string bString = ocm.<|0|>;
    number pie = 3.14;
    number inthesky = 90;

1.3 Operators

1.3.1 Arithmetic

+,-,*,/ operate on numbers as expected. + acts as the concat operator for
strings, and application of + to a number and a string forces the number
into string context. Thus "abc" + 1 yields "abc1".

1.3.2 Comparison

<,>,<=,>= and == are provided. Numbers compare as expected. Strings compare
by dictionary sort. When a number is compared to a string, the number is
coerced into a string.

1.4 Supplied Methods

1.4.1 print()

number and string variables can be chained together as such:

    number cnt = 9;
    string animal = "cats";
    ocm cat = ;
    print("my " + animal + " has " + cnt + " lives: " + cat.toString());

1.5 Methods

Marklar method names and regular method names must start with a lowercase
letter. Variables may be declared at any point within a method, provided
that the declaration is its own statement.

1.6 Warnings

1.6.1 Reference Loops

Reference loops within marklar sets will cause problems. See, for example,
src-reference-loop.mc.

1.6.2 Marklar Assignment Chains

Given:

    number a;
    number b;
    ocm c;

statements like a=b=1 will work, but statements like a=c.<|0|>=1 will not:
a will never receive its assignment.
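As a cross-check on the coercion rules stated in 1.3 (and relied on by the warnings that follow), the behavior of + and < can be modeled outside of Marklar. The following is a minimal Python sketch, not code from the Marklar compiler or its libraries. The function names add and less_than are hypothetical, Python's str() is assumed to stand in for Marklar's number-to-string conversion, and Python's string ordering is assumed to stand in for "dictionary sort". The 1/0 results for comparisons follow the semantic rules in section 5.

```python
def add(a, b):
    """Model of Marklar '+' (hypothetical helper, not part of Marklar).

    Two numbers are added; if either operand is a string, the other
    operand is forced into string context and the two are concatenated.
    """
    if isinstance(a, str) or isinstance(b, str):
        # Assumption: str() approximates Marklar's string form of a number.
        return str(a) + str(b)
    return a + b


def less_than(a, b):
    """Model of Marklar '<' (hypothetical helper, not part of Marklar).

    Returns 1 for true and 0 for false. A number compared with a string
    is first coerced into a string, so the comparison becomes a string
    comparison.
    """
    if isinstance(a, str) or isinstance(b, str):
        a, b = str(a), str(b)
    return 1 if a < b else 0
```

Under these assumptions, add("abc", 1) gives "abc1" as in 1.3.1, while less_than(10, 9) gives 0 but less_than("10", 9) gives 1, because the second comparison is made on the strings "10" and "9".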
1.6.3 Marklar Use assignments

let:

    om a = ;
    om b = ;
    number c;
    string d;

then:

    c = b.

will not fail, but later attempts to use c as a number will fail.
And likewise for:

    d = b.

The above two examples will internally turn c and d into marklars due to
our copious semantic checks.

1.6.3.1 Fixes for Marklar Use Assignment

Currently, it is the coder's responsibility to keep track of the data types
of the contents of a marklar set. There are two ways the language could be
improved in this respect.

1.6.3.1.1 First, a typeof(a) method could be added to the marklar data
types, which would return "string", "number" or a's Marklar Type ID if a
is a marklar.

1.6.3.1.2 Second, an abstract "any" data type could be introduced which
would have relaxed semantic rules. It would be left up to the coder to use
the variable properly. Then in the example shown in 1.6.3, if:

    any g = b.;

then:

    print(g. * 5 - 1.2 + "marklar!\n");

would print "23.8marklar!\n"

1.7 Additional Language Information

Scope: Static
Typing: Strong
    Exception: coercion from number to string in comparisons and addition
    Exception: all types coerce to string in unordered marklar key context
Argument Passing:
    Call-by-Value: strings and numbers
    Call-by-Reference: all Marklar types
Global Variables: NO
User Marklar Definitions:
    A user defined marklar must be defined before a variable of that type
    is declared.
Function Definitions:
    A function can be put anywhere in the global scope; it does not have to
    be defined before it is used. Functions may not be defined within
    functions.
Printing:
    The print function is defined to be:
        print( simple_expression );
    Anything that can be done with a simple expression can be printed.
Recursion: YES
Tail Recursion Optimization: YES

-------------------------------------------------------------------------------
2.
Building and Running

2.1 Directory Layout

    marklar-0.3-cpp/      #main directory
    |
    --> marklar-src/      #Project's cpp source directory
    |
    --> marklar-lib/      #Project's parrot based libraries
    |
    --> parrot            #The Parrot executable, our runtime environment
    |
    --> mc                #The Marklar Compiler executable
    |
    --> *.mc              #Marklar source files
    |
    --> *.imc             #Parrot Intermediate/Assembly code produced by mc
    |
    --> marklar           #A script to compile and run a Marklar source
    |                     #It is important that parrot and mc are in the
    |                     #same directory
    |
    --> Makefile          #Makefile to build the Marklar Compiler

2.2 Building the Marklar Compiler

    prompt$ make          #builds the Marklar Compiler
                          #produces ./mc
    prompt$ make clean    #cleans the project

2.3 Compiling Marklar Sources

The Marklar Compiler takes only one source file on the command line.

    #Compiles example.mc and produces example.imc
    prompt$ ./mc example.mc
        - or -
    #mc will work on stdin
    prompt$ ./mc < example.mc

2.4 Running a Marklar Program

There are three ways a Marklar program can be run. The first is to run the
.imc that was produced by the Marklar compiler. The second is to use the
marklar script to compile and run a .mc. The third is to have Parrot
produce bytecode from a .imc and then run the bytecode with Parrot, which
decreases the execution time of the program. Examples for each execution
method are listed below.

2.4.1 One Step: Running a .mc

    prompt$ ./marklar example.mc  # where example.mc is the Marklar source code

2.4.2 Two Steps: Running a .imc

    prompt$ ./mc example.mc
    prompt$ ./parrot example.imc  # where example.imc was produced by
                                  # compiling example.mc

2.4.3 Making Parrot Bytecode for Faster Startup Time

    prompt$ ./mc example.mc
    prompt$ ./parrot --to-pbc -o example.pbc example.imc
    prompt$ ./parrot --pbc example.pbc

2.5 Marklar Examples

Files:
- src1.mc is a simple example of the basic language. This file contains
  zero marklars.
  Shows: logic, iteration, selection, function calling, string
  concatenation, printing, Call-by-Value
- src2.mc is a simple example of a basic user defined marklar.
  Shows: User Defined Marklar, method calling, using the self variable,
  Marklar assignment and retrieval, Marklar-to-Marklar assignment
- recurse.mc shows off tail recursion.
- mattmc.mc is code written by student Matt Tyson after a few minutes of
  description.
- src-tst.mc is the primary testing file for finding problems and running
  diagnostics.
- src-reference-loop.mc demonstrates the problems with marklars referencing
  themselves and two marklars referencing each other.
- src-assgn.mc demonstrates the problem with a = b. = n;

-------------------------------------------------------------------------------
3. Marklar Scanner

The lexical scanner for the Marklar language was constructed with GNU Flex.

3.1 Regular Expressions for Lex Tokens

    DIGIT    [0-9]
    INT      "-"?{DIGIT}+
    FLOAT    "-"?{DIGIT}+"."{DIGIT}*
    NUMBER   {INT}|{FLOAT}
    STRING   "\""[^\"]*"\""
    LITERAL  {STRING}|{NUMBER}
    #ID is used for variable, function, and method names
    ID       [a-z][a-zA-Z0-9_]*
    #MTYPEID is used for a user defined marklar type (starts with uppercase)
    MTYPEID  [A-Z][a-zA-Z0-9_]*
    COMMENT  "#"

3.2 Keywords

    if else while return marklar ordered unordered conformist
    string number void print size clear keys om um ocm ucm

-------------------------------------------------------------------------------
4. Marklar Parser

The Marklar Parser was constructed with GNU Bison, an LALR(1) parser
generator. The grammar contains 0 shift/reduce and 0 reduce/reduce
conflicts. We consider parsing to be the first pass. Some semantics are
checked during this pass.
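Before the grammar, it may help to see how the token classes from 3.1 and the keywords from 3.2 sort concrete lexemes. The following is a minimal Python sketch, not the Flex scanner itself. The patterns are hand translations of the Flex definitions into Python's re syntax, and the names KEYWORDS, TOKEN_SPEC and classify are hypothetical. It classifies one already-isolated lexeme at a time and does not model comments, operators or the marklar delimiters.

```python
import re

# Keyword list copied from section 3.2.
KEYWORDS = {
    "if", "else", "while", "return", "marklar", "ordered", "unordered",
    "conformist", "string", "number", "void", "print", "size", "clear",
    "keys", "om", "um", "ocm", "ucm",
}

# Hand translations of the Flex definitions in section 3.1.
TOKEN_SPEC = [
    ("FLOAT",   r"-?[0-9]+\.[0-9]*"),
    ("INT",     r"-?[0-9]+"),
    ("STRING",  r'"[^"]*"'),
    ("ID",      r"[a-z][a-zA-Z0-9_]*"),
    ("MTYPEID", r"[A-Z][a-zA-Z0-9_]*"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def classify(lexeme):
    """Return the token class of one complete lexeme, or None if none matches."""
    match = TOKEN_RE.fullmatch(lexeme)
    if match is None:
        return None
    kind = match.lastgroup
    # Keywords look like IDs, so they are separated out after matching.
    if kind == "ID" and lexeme in KEYWORDS:
        return "KEYWORD"
    return kind
```

For example, classify("My_ocm") returns "MTYPEID" and classify("aString") returns "ID", which is the distinction the naming rules in 1.1.3 and 1.2.3 depend on. A name beginning with an underscore, such as "_x", matches no class and yields None.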

4.1 Marklar Grammar

    e = empty production

    program-> declaration_list

    declaration_list-> declaration_list declaration
        | declaration

    declaration-> fun_declaration
        | marklar_definition

    type_specifier-> NUMBER
        | STRING
        | MTYPEID
        | marklar_type

    marklar_type-> ORDERED MARKLAR
        | CONFORMIST MARKLAR
        | ORDERED CONFORMIST MARKLAR
        | UNORDERED MARKLAR
        | OM
        | UM
        | OCM
        | UCM

    fun_declaration-> type_specifier ID ( parameters ) compound_statement
        | VOID ID ( parameters ) compound_statement

    marklar_definition-> marklar_type MTYPEID { marklar_meth_op_list } ;

    marklar_meth_op_list-> marklar_meth_op_list marklar_meth_op
        | marklar_meth_op

    marklar_meth_op-> fun_declaration ;

    parameters-> parameters_list
        | e

    parameters_list-> parameters_list , parameter
        | parameter

    parameter-> type_specifier ID

    compound_statement-> { statement_list }

    statement_list-> statement_list statement
        | e

    statement-> expression_statement
        | compound_statement
        | selection_statement
        | iteration_statement
        | return_statement
        | print_statement

    print_statement-> PRINT ( simple_expression ) ;

    expression_statement-> expression ;
        | ;

    selection_statement-> IF ( simple_expression ) statement
        | IF ( simple_expression ) statement ELSE statement

    iteration_statement-> WHILE ( simple_expression ) statement

    return_statement-> RETURN ;
        | RETURN simple_expression ;

    expression-> var _EQUAL expression
        | marklar_use _EQUAL expression
        | type_specifier var
        | type_specifier var _EQUAL expression
        | simple_expression

    var-> ID

    simple_expression-> logical_OR_expression

    logical_OR_expression-> logical_AND_expression
        | logical_OR_expression || logical_AND_expression

    logical_AND_expression-> equality_expression
        | logical_AND_expression && equality_expression

    equality_expression-> relational_expression
        | equality_expression eqop relational_expression

    relational_expression-> relational_expression relop additive_expression
        | additive_expression

    eqop-> == | !=

    relop-> < | > | <= | >=

    additive_expression-> additive_expression addop term
        | term

    addop-> +
        | -

    term-> term mulop factor
        | factor

    mulop-> * | /

    factor-> var
        | literal
        | ( expression )
        | call
        | marklar_use
        | marklar_set
        | marklar_provided_methods

    marklar_use-> ID.{? simple_expression ?}
        | ID.
        | ID.<| simple_expression |>
        | ID.{| simple_expression |}

    marklar_provided_methods-> ID.SIZE ( )
        | ID.CLEAR ( )
        | ID.KEYS ( )
        | ID.TOSTRING ( )

    marklar_set-> {? args ?}
        |
        | <| args |>
        | {| args |}

    call-> ID ( args )
        | ID.ID( args )

    args-> arg_list
        | e

    arg_list-> arg_list , expression
        | expression

-------------------------------------------------------------------------------
5. Symbol Tables and Semantics

Three symbol tables were constructed: one each for variables, functions,
and user defined Marklars. The second pass of the Marklar compiler finishes
the semantic checks that could not be completed in the first pass. The
semantic rules are as follows:

    #string concat
    string = string + number     #turn number into its string representation
    string = string + string
    undefined = string / | * | - number

    #math ops
    number = number + | - | * | / number

    #1 for true, 0 for false
    number = string < | > | <= | >= | != | == string
    number = number < | > | <= | >= | != | == number
    number = number < | > | <= | >= | != | == string  #turn number into its string

    #no operators for marklars defined
    undefined = any_marklar op any_marklar   # compare elements of marklars if
                                             # desired

-------------------------------------------------------------------------------
6. Marklar Code Generation

The code generated by the Marklar Compiler is a mix of Parrot Intermediate
and Assembly code. If you would like to look through this code, it can be
found in the .imc file that is produced by the Marklar Compiler. More
information about Parrot code can be found at www.parrotcode.org.

-------------------------------------------------------------------------------
7. Runtime Environment

7.1 Target Platform

The target platform is the Parrot virtual machine.

7.1.1 What is Parrot?
(www.parrotcode.org)

Parrot is a virtual machine designed to be targeted by various languages,
with features similar to .NET in that libraries written for one language
will become accessible to others. The difference is that Parrot is an
efficient target for dynamically typed languages, whereas .NET and the JVM
are built for languages whose types are almost entirely determined at
compile time. It is thought that Perl 6 will run on .NET and that C or C#,
for example, will run on Parrot, but in each case the non-native language
will be orders of magnitude slower, barring interesting hacks like special
DLLs for Perl on Windows.

Parrot is the intended target for Perl 6, which is in the design phase.
The current versions of Parrot already ship with a partial Perl 6 compiler
and implementations of several other languages: BASIC, befunge, bf, cola,
forth, m4, python, ruby, scheme and tcl are the better known ones. Ports of
C, Java and other statically typed languages are expected as the machine
matures.

Parrot is designed as a good target platform for a wide variety of
paradigms, including oo, procedural with method calls, and functional, and
for various method stack manipulation strategies, with the most cozy
support being for static scope, dynamic typing and an easy mix of the
procedural and oo paradigms. The JVM and Microsoft's CLR, by contrast, are
optimized for statically typed languages.

Compilers may target intermediate code (.imc, aka pir for parrot
intermediate representation), parrot assembly (.pasm), or parrot byte code,
although it is highly recommended that .imc be targeted rather than .pasm,
as the developers are concentrating there. The .imc and .pasm can then be
compiled down to bytecode.

7.2 How We Use Parrot

Parrot is, as described in 7.1, a virtual machine which may be targeted by
intermediate code that is slightly more abstract than assembly. Code
written by the programmer is compiled and emitted as parrot intermediate
code (.imc).
There are several libraries of infrastructure code, which are located in
marklar-0.3-cpp/marklar-lib.

7.3 Marklar Support Libraries

marklar-0.3-cpp/marklar-lib contains several support libraries of interest.
The most prominent are OCM.imc, OM.imc and UM.imc, which together define
the ordered conformist, ordered, unordered and unordered conformist
marklars. Next is operators.imc, which defines how +,-,*,/,<,>,<=,>= and ==
operate on strings and numbers. There is also classname.imc, which defines
some support methods for the marklar data types.

8. A Short Explanation of the Cultural Context and Relevance of the word
   Marklar

From a conversation between Stan Marsh, a South Park, Colorado resident,
and Marklar, you might gain a little bit of understanding.

    Marklar 1: Here on Marklar, we use the word Marklar to describe any
               person, place or thing.
    Stan: But doesn't that get confusing?
    Marklar 1: Oh, no, it's quite simple. Hey, Marklar!
    Marklar 57: Yes?

You will be further enriched by this conversation between Marklar and some
Lions:

    (Open to space. A space ship heads for Earth. Inside ship.)
    Marklar #1: Marklar! This is Marklar! Approaching Marklar!
    Marklar #2: (On intercom) Proceed with Marklar and make first contact!
    Marklar #1: Marklar!
    (The ship lands in the desert in Ethiopia. Marklar #1 gets off the ship
    and addresses some lions.)
    Marklar #1: Greetings, Marklars! I am Marklar!
    Lions: Snif
    Marklar #1: I come in Marklar!
    Lions: Growl! (Approach Marklar #1)
    Marklar #1: Oh, Marklar!
    Lions: Growl! (Attack Marklar #1)
    Marklar #1: Aaaaaaaaah!
    (He is finally ripped to death.)

Definitely, the Lions were interested in the insides of Marklar. We claim
that you also should be interested in Marklars.