Implementation of a Scripting Language

Article 169451 of rec.games.programmer:
From: amitp@Xenon.Stanford.EDU (Amit Patel)
Newsgroups: rec.games.programmer
Subject: Re: Scripting thru DLLs and multitasking
Date: 23 Jan 1998 20:10:09 GMT
Organization: Stanford University, CA 94305, USA

 Carlos DaSilva <cdasilva@earthlink.net> wrote:
| How would I go about making my game engine "multitask" when using regular C
| functions in DLLs for scripting?
| 
| Basically, when my engine executes a script for an object, I don't want the
| whole game world to halt while that script is running. Is there something
| basic I'm missing? I'm trying to think through the whole process and I don't
| see how to circumvent this problem.

(I don't suggest using threads)


If your scripting language "compiles" down to some kind of bytecode,
AND if your bytecode interpreter is self-contained (i.e., the stack is
maintained explicitly, rather than in the C stack), then it's fairly
simple -- you can just interpret N bytecode instructions, and then
return from your interpreter.  The state (stack, registers, heap,
etc.) are all sitting there waiting for you to call the interpreter
again.  You can even have a "multithreaded" scripting language by
maintaining several stacks, and then you can run N instructions on one
stack and N instructions on another stack.  (Multithreading scripting
might be useful if you have lots of objects in the game and each one
wants to run a script at the same time.)


I should write an article about scripting languages someday.  Hm.


	- Amit
Article 173648 of rec.games.programmer:
From: "James Slaughter" <James.Slaughter@susano.demon.co.uk>
Newsgroups: rec.games.programmer
Subject: Re: Scripting
Date: Mon, 16 Feb 1998 19:48:46 -0000

>In article <01bd3a3e$dcc31e00$572342d1@ghema.jaguarsystems.com>,
"Matt"
><guestm@usa.netNOSPAM> wrote:
>
>> Hey all,
>>
>> I am in the process of planning this game and have decided to add a
>> scripting language to help out in level creation.
 [...]
>> wall to create a trap, and so on.  Anyway, I know how to tokenise
>> the script file, and have some ideas on how to go about the actual
>> compiling and executing of the script, but I was hoping someone out
>> there has done this and could give me some tips on how to run the
>> script at game quality speed.

There are libraries, but I prefer to write my own so that I know
what's going on. In my opinion, the best way to increase execution
time is to spend more during compilation. I try to convert all my
scripts into a series of instructions, similar to an assembly Op-Code.
Each Instruction is an integer, and each takes two operands, one or
both of which may be null if it is unused. The redundancy is there to
make it easy to move memory about for loading and script processing.
Then when it comes to executing the code, I index an array of pointers
to class member functions with the OpCode (which are pre-defined in
ascending numeric order, so it's OK), and dereference it passing to
the function each parameter, plus a virtual instruction pointer, and a
memory area.

Basically, my execution routine, once the script has been loaded looks
like this in pseudo code:

unsigned int CS_IP = StartPoint();   // Starting offset is never zero;

while(CS_IP)
{
   InstructionTable[Script[CS_IP].OpCode](Script[CS_IP],&CS_IP,Memory);
}

...and an instruction sets CS_IP to zero to quit, else CS_IP is
automatically incremented to the next instruction, or set to something
else for a JMP instruction, etc. So far, I have not thought of anything
faster than this for execution, though I would welcome any suggestions.

When compiling, I take the following steps:

1) Parse into tokens
Divide the source file up into individual symbols, replace #define
style macro replacements, etc. Be careful to keep literals as such;
"SMEG" should remain as one token, not '"','SMEG','"'. Also, make sure
that two-symbol tokens are parsed as such, the equality operator
should not suddenly become to consecutive assignment operators. I
chose to add a 'fumble' in that it when it found something along the
lines of LValue op equals RValue (eg, A+=B), it substituted it for
LValue equals LValue op RValue (or A=A+B). Again, be sure to parse
'B--B' as 'B-(-B)' (pointless, I know), rather than 'B-- B' which is
illegal. Many commercial C++ compilers have a hard time with
statements such as '++A+++++A++;', which we might equate to ((++A)++)
+ ((++A)++), IE=2*(A+2), so I don't see why yours or mine should
bother if they're only a psuedo implementation of a custom script
language. Of course, strip out comments and whitespace. Caution! 'int
A/*COMMENT*/B' is illegal, because it is equivelent to 'int A B', but
if you just strip out comments, it becomes 'int AB' which is legal, if
AB is not already defined. Now's the time to decide if your language
is case sensitive. No? Convert the case now, then!

2) Mark Tokens as their type.
My entire language is based arround lists of a structure that defines
the name of the instruction as it appears in the script, the op-code
it should be replaced with, it's precedence (so * comes before +), the
types the operator deals with, and what format the operator is: unary,
binary, or keyword. It also stores the function pointer ready to
export to the instruction table for execution. I just index the Token,
if it is not in the list, then it must be an identifer, else I look up
its type. There can be a conflict with unary and binary
operators: -A-B, for example. I use the fact that a unary operator is
only ever next to one identifier.

3) Handle references
Now's the time to build a list of prototypes, and handle nasty include
directives. Do the includes first. Try to make sure that every
identifier is declared somewhere.

4) Type Checking
Does your language allow float=string*integer? Even if it does, you'll
want to make sure that the right operators being used. Do all type
checking now, and do it strictly to make running the script easy and
safe.

5) Recursive Descent.
Having assigned types, to split each expression up, I used a recursive
descent technique, not always terribly efficiant, but it works and is
quite elegant with a minimal of coding. In this, you start with an
expression 'A=B+C*D' for example. You take the first delimeter ('='),
and see if it is a lower priority than the one after ('+'), and if it
is, you call the procedure again, but with the bit after the first
operand. It is, so we start a new function with ('B+C*D'), and repeat
the process. We find that '*' is more important than '+', and so call
ourself again, with 'C*D' as the equation. There is nothing after
this, so we write it down in our 'to-do' list. We would also write it
down if whatever came after was equally or less important.

1st Operation: C*D;

The function returns, and we are now in the one started with
'B+[C*D]'. We would now look to see if whatever comes after the C*D is
still more important than the '+'. There isn't anything, so it's time
to do the Add. Our to-do list shows:

1st Operation: C*D;
2nd Operation: B+C;

Now we are back in the first call. There is nothing left at all, so we
write the instruction down:

1st Operation: C*D;
2nd Operation: B+C;
3rd Operation: A=B;

That's it! We would of course have to preserve C & B, since there
contents get overwritten during our approach. You could either
automatically 'push' any variable that is modified at the start, and
'pop' it back, which is a simple task, or add a result variable to
your instruction format, so that each opcode takes three parameters.
It's up to you. Be careful when popping values back, particularly when
doing stuff like if(A+B==C+D), where you can't say Push A, A+B, Pop A
is A equal to [...], you have to wait until the whole statements over,
which is OK here, but not for if(A+B==A+C), where it would check that
A+B is equal to A+A+C, which is impossible.

5) Variables
You cannot go on refering to variables by name. Well, actually, I
suppose you could, but it's hardly fast, even with the might of the
Standard C++ Library template classes ;), is it? No. Variables become
addresses, but they cannot be absolute addresses. Sometimes a function
will be called, and other times it won't. This would totally screw up
the creation of variables at run-time, so unless you want to define
all variables used in your script at start up and use giga bytes of
memory in the process, you need relative numbers. The easiest
soloution is to save an index at the start of each call, and use an
offset from that index.

6) Resolve names
Bugger about with name mangaling, etc. I'll go into this only if you
wan't me to. Very little to do unless you're into native Object
Orientation in your script language. (Yeah, I'm getting fed up typing
this message that few people are going to bother to read, and fewer
will find it interesting).

7) Write the Op Codes.
You also need to make function calls work some how. This is basically
a jump, and then a jump back, or, since this is a runtime thing, you
might like to spawn another execution routine that handles the sub
function, and let the recursion dump you back to where you were. Up to
you, really.

8) Execute, see above.

Comments? Criticism? Sorry about the last few steps... I expect you
can figure it out. I really would like to discuss ways to improve on
all this in a constructive manner... I'm not perfect, neither are you,
but together we come pretty close ;->.

One thing that I never decided on was the best techniques for I/O. I
currently follow a C/C++ style 'don't have any _keywords_' technique,
although it does support disk I/O since it's virtually the same on all
platforms. For anything else I have a system with which the script
communicates with the host interpreter to ask for services, which I
plan to include linking to external modules. What's the standard way,
I'm self taught...

Paul F. Snively wrote in message ...
>Scripting languages are supposed to be small and simple, so I'm
>inclined to recommend a good, fast, embeddable Scheme interpreter,
>which to me suggests
[...]
Not arguing, just asking, but how small is small, how simple is
simple? Sounds like I've made a complete arse of mine, but it is still
quite fast, and is very flexible.

I hope someone gets something out of this.
Regards,
James.
Article 173980 of rec.games.programmer:
From: chewy@mcione.com (Paul F. Snively)
Newsgroups: rec.games.programmer
Subject: Re: Scripting
Organization: Red Door Technologies, Inc.
Date: Thu, 19 Feb 1998 01:36:48 GMT

In article <6cfl25$6lg$1@nntp.Stanford.EDU>, amitp@Xenon.Stanford.EDU (Amit
Patel) wrote:

> I was trying to decide on a scripting for my game, and I considered
> Scheme but decided against it for the same reason I decided against
> Python and C++:
> 
>     A general purpose language isn't going to be as well suited 
>     for my game as a special purpose language.
> 
> Yes, I know I can write macros in Scheme, but it's still going to be a
> functional/imperative style language, and it's still going to have a
> lot of decisions made for me, like static scope, parenthesized syntax,
> basic data types, lists using cons-cells, and so on.

Well, there are a lot of "ifs" and "buts" to respond to this with, e.g.
Scheme is lexically scoped, but you can often find implementations with
"dynamic-bind," "dynamic-let," and/or "dynamic-wind." The parenthesized
syntax can be done away with by writing a parser of some kind using, e.g.
Zebu. Of course it has basic data types and lists using cons cells. So
what? It had better have basic data types!

> As an example, suppose I was writing an RPG [I'm really not], and I
> wanted to attach scripts to various events that might occur in the
> game.  I might want to write:
> 
> OBJECT Fred IS-A Paladin:
>     ON attacked-by X:
>         if X.evil: attack(X)
>     ON time-is 12pm:
>         walk to: tavern
>     ON time-is 9pm:
>         walk to: home
> 
> I can write a `bytecode' representation of this much more efficiently
> than if I had used Scheme (or some other general purpose language),
> because I know exactly what kinds of constructs are common in my code
> and what to optimize for.  I would expect that my compiled
> representation would be two to five times smaller than that of a
> general purpose language being used for this specific task.  I would
> guess (but I am not certain) that the virtual machine could be
> somewhat smaller for a special purpose language as well.  (The last
> time I wrote a scripting language, which admittedly was many years
> ago, the VM was under 10k, and it's hard to find that in SCM or
> MIT-Scheme or any other variants I know of...)

Sure. This is always the trade off: specificity vs. generality. Given that
I'm targeting PC's and Macs rather than, say, Nintendo 64's and Sony
Playstations, I'm not overly concerned about the size of the code. I'm
concerned with how easily I can integrate it with C++ and what I can
express in it.

> I also get to write things in a natural style, like "12pm", rather
> than something like "(make-time 12 'pm)".

Again, macros or a preprocessor easily handle issues like this.

> Consider how much easier it
> is to use strings, lists, vectors, and functions in Scheme or Python
> than in C.  Yes, you _can_ code it in C, but it's much easier in
> Scheme because it's built-in.  Similarly, if I want to write programs
> about objects, times, events, and locations, then it's easier to do
> this in a language that understands these abstractions than to do it
> in a language that merely allows you to build these abstractions.

One reason I like MzScheme is that it has an object system. A major reason
that I like Lisp dialects in general is that it's easy to use them as
metalanguages: a language for writing a language that's specific to your
application. MzScheme has the added benefit that I can build the
abstractions, find that they are indeed too cumbersome in terms of either
memory or performance, and rewrite them in C++, providing them to the rest
of my MzScheme code as if they were native functionality of MzScheme.

> This is especially true when the overall structure of your program
> does not fit the conventional procedural (or functional, or
> object-oriented) style of programming.

I think this is your strongest point so far. It's been my experience that,
in game programming, you almost always end up wanting some sort of
delegation semantics, and none of the popular application or scripting
language naturally support such a semantics.

> Anyway, I agree that Scheme may serve as a good scripting language,
> but I disagree that ``there's really no excuse for reinventing the
> wheel''.  As the language becomes more specialized, there is more to
> gain by writing your own language.  You may be looking for a bicycle
> wheel, and a motorcycle wheel, although more suitable than an airplane
> wheel, just won't do.

While I agree in principle, I think there are larger questions that have to
be answered before making a choice, such as: are you adding a scripting
language for yourself or for potential third parties? If for yourself, then
knock yourself out and invent a new language, but don't be surprised if you
someday down the road find yourself wishing that your tight, specialized
little scripting language did something it doesn't, and can't easily
without turning its syntax, if not its semantics, into a hodgepodge. On the
other hand, if you really want third parties to add cool new objects to
your game or even build whole new games using nothing but the core
functionality of your engine, I still have to recommend sticking with an
existing embeddable language. In fact, I would even if only I'd be
scripting, again because I have a personal preference for expressive power,
and hence generality, vs. application specificity, which might indeed have
led me to a smaller and possibly faster (but I doubt it) implementation.

>         - Amit

Paul
Article 173952 of rec.games.programmer:
From: hoffoss@volition-inc.com (Jason Hoffoss (Tclord))
Newsgroups: rec.games.programmer
Subject: Re: Scripting
Date: Wed, 18 Feb 1998 23:06:01 GMT

On Mon, 16 Feb 1998 19:48:46 -0000, "James Slaughter"
<James.Slaughter@susano.demon.co.uk> wrote:

>>In article <01bd3a3e$dcc31e00$572342d1@ghema.jaguarsystems.com>,
>"Matt"
>><guestm@usa.netNOSPAM> wrote:
>>
>>> Hey all,
>>>
>>> I am in the process of planning this game and have decided to add a
>>> scripting language to help out in level creation.
> [...]
>>> wall to create a trap, and so on.  Anyway, I know how to tokenise
>>> the script file, and have some ideas on how to go about the actual
>>> compiling and executing of the script, but I was hoping someone out
>>> there has done this and could give me some tips on how to run the
>>> script at game quality speed.
>
>There are libraries, but I prefer to write my own so that I know
>what's going on.

This is exactly why I wanted to do the same thing. :)  Plus, I think
it's actually probably easier to do it yourself that figure out how
these other libraries and such work.  I've never been able to do it.
They try and make it very generic and flexible for all cases, and it
just complicates it too much, etc.  Writing your own, you can tailor
it down to just what you need.  I think it works out a lot better.
Plus, you understand how it all works, and learned something new along
the way, which is always handy, I don't care what anyone says.

>In my opinion, the best way to increase execution
>time is to spend more during compilation. I try to convert all my
>scripts into a series of instructions, similar to an assembly Op-Code.

That's what I did.  I have to think there's a good reason CPU's have
the sort of instruction sets that they do, and work the way they do.
So the best way to do an imbedded language is to model it after a CPU
I think, for doing general purpose things.  Specific things it makes
more sense to have some hook to it, like the way interrupts worked
under DOS.  If you really needed to, you could write the interpreter
in assembly for speed, and it probably wouldn't be very hard to write
it because your instructions would be so similar to the CPU
instructions.

Btw, the compiler I wrote for my language is blazing fast, and I
didn't even try writing it for speed.  I put limitations on sizes so I
can load the source file completely into memory.  Most real compilers
don't make an assumption like this and stream it off disk, which slows
it down incredibly I guess.

>Each Instruction is an integer, and each takes two operands, one or
>both of which may be null if it is unused. The redundancy is there to
>make it easy to move memory about for loading and script processing.

Your instructions don't model a CPU as much then I guess.  With CPUs,
generally you do operations with a register and an operand, which is
how my language works as well.  I only have one register, however, the
accumulator, which cuts down the number of instructions quite a bit.
You also have operators for branching, pushing values on the stack,
etc.  I did combine most binary operations (i.e. math operations) with
the top value on the stack (i.e. acc = stack val * acc), since you
generate situations like this a lot when compiling a language.

Btw, is your language more like a real language, or more of an
assembler?  It sounds more like an assember.

>Then when it comes to executing the code, I index an array of pointers
>to class member functions with the OpCode (which are pre-defined in
>ascending numeric order, so it's OK), and dereference it passing to
>the function each parameter, plus a virtual instruction pointer, and a
>memory area.

I have an int for the PC (program counter), which I used as the index
into a block of memory that is the byte-code.  I basically use a big
switch statement to handle each op-code.  If an op-code uses data, the
data directly follows the op-code (just like machine language).  For
any decent compiler, this switch statement should turn into a jump
table in the executable.

>Basically, my execution routine, once the script has been loaded looks
>like this in pseudo code:
>
>unsigned int CS_IP = StartPoint();   // Starting offset is never zero;
>
>while(CS_IP)
>{
>   InstructionTable[Script[CS_IP].OpCode](Script[CS_IP],&CS_IP,Memory);
>}
>
>...and an instruction sets CS_IP to zero to quit, else CS_IP is
>automatically incremented to the next instruction, or set to something
>else for a JMP instruction, etc. So far, I have not thought of anything
>faster than this for execution, though I would welcome any suggestions.

Looks like it will accomplish the same thing.  Don't know what all
that info is you are passing in.  I'd probably just make the CS_IP a
global and not bother passing it into the function.  Of course, even
calling a function has it's overhead, which I guess is why I went with
a switch statement inside an infinate loop (with an RTS instruction
returning from the loop and thus exiting the loop).

>When compiling, I take the following steps:
>
>1) Parse into tokens
>Divide the source file up into individual symbols, replace #define
>style macro replacements, etc. Be careful to keep literals as such;
>"SMEG" should remain as one token, not '"','SMEG','"'. Also, make sure
>that two-symbol tokens are parsed as such, the equality operator
>should not suddenly become to consecutive assignment operators. 

I don't really bother with tokenizing.  I just compare the current
text with what is allowed at whatever point I'm at.  For example, if I
just found an '=', then I know there much be a variable name or a
number next.  If not, throw out an error.  My compiling method is
pretty object oriented in how it works, which isn't really the
tranditional way compiler are writen.

>I
>chose to add a 'fumble' in that it when it found something along the
>lines of LValue op equals RValue (eg, A+=B), it substituted it for
>LValue equals LValue op RValue (or A=A+B). 

I don't remember what I did here.  I think it turns out something like
this in the end, though:

Load acc, A
Add acc, B
Store A, acc

So I guess it gets treated like A=A+B really. :)

>Again, be sure to parse
>'B--B' as 'B-(-B)' (pointless, I know), rather than 'B-- B' which is
>illegal. 

In my system, if there is a space between the minus signs, it works
fine.  If there isn't, it will see it as either a pre or post
increment operator (has higher precedence).  So the results turn out
the same as would happen in C, and I don't run into problems like this
in C really, so I don't expect I will writing scripts. :)

>Many commercial C++ compilers have a hard time with
>statements such as '++A+++++A++;', which we might equate to ((++A)++)
>+ ((++A)++), IE=2*(A+2), so I don't see why yours or mine should
>bother if they're only a psuedo implementation of a custom script
>language. 

Right.  The way I set things up, though, it kind of came for free.  It
would first see ++A, and then see (++A)++, but at this point you have
a post-increment on a non-LVALUE, so it would throw an error (who the
hell writes something like this anyway? :)

>Of course, strip out comments and whitespace. Caution! 'int
>A/*COMMENT*/B' is illegal, because it is equivelent to 'int A B', but
>if you just strip out comments, it becomes 'int AB' which is legal, if
>AB is not already defined. Now's the time to decide if your language
>is case sensitive. No? Convert the case now, then!

Because I work with C so much, I wanted to make mine work as much like
C as possible, so it is case sensitive as well.  I think a scripting
language is a lot better the more it is like a real language a lot of
people know.  Programmers will be able to learn and use it much
easier, and anyone who doesn't know has a wide range of books to help
learn it from.  You don't have to try teaching it to them.

The comment problem was easy to deal with for me, as I know where I
can expect white space, and I call a function at these points to skip
the whitespace.  It also skips comments here as well.  A nice property
of comments is that they are only valid where whitespace is valid.

>2) Mark Tokens as their type.
>My entire language is based arround lists of a structure that defines
>the name of the instruction as it appears in the script, the op-code
>it should be replaced with, it's precedence (so * comes before +), the
>types the operator deals with, and what format the operator is: unary,
>binary, or keyword. It also stores the function pointer ready to
>export to the instruction table for execution. I just index the Token,
>if it is not in the list, then it must be an identifer, else I look up
>its type. There can be a conflict with unary and binary
>operators: -A-B, for example. I use the fact that a unary operator is
>only ever next to one identifier.

I check for identifiers as the highest possible precedence, basically.
They always start with an alpha-numeric, and end with anything not an
alpha-numeric, numeric or underscore character.  I also have a list of
all reserved keywords (just like with C), and make sure when I find a
variable name it isn't one of these.  Unary operators don't cause
problems, as long as you follow your precedences correctly.

>3) Handle references
>Now's the time to build a list of prototypes, and handle nasty include
>directives. Do the includes first. Try to make sure that every
>identifier is declared somewhere.

Function calls is one place my language is very different from C (or
C++).  I don't allow overloading, and all parameters to a function are
basically defaults.  If you don't provide them, they will have
standard values (0 if you don't explicitly specify one).  So, for a
function like:

func(int a, int b=2, int c, string d)

You could call it with wither func(), func(4), func(5, 5), etc.  They
are all allowed.  You just can't pass in more arguments than are
allowed for it, like trying to do func(1, 2, 3, "blah", 9).  Also,
types have to match.  I only have integers and strings as types.
Strings are act much like constant strings in C.  You don't modify
them.  Another nice feature I added to my language is not having to
use prototypes.  You can make a call to a function ahead of actually
defining the function, and the compiler will handle it all properly
for you.  Prototypes were always one handle I never really cared for
much. :)

Another interesting feature I have is with the whole pass by address
vs pass by value thing, which relates to your reference handling.
Instead of the function deciding what should be passed how, I decided
to make the caller decide this.  So, for a function:

func(int a, int b)

I can call it as func(a, b) to pass by value, or func(&a, &b) to pass
by reference.  Completely reverse how any other language I've ever
seen does things, but I think doing it this way could have some
interesting advantages.  I haven't really put it into practice much to
see if it works well yet, though.  It makes sense to me that it would
be more useful for the caller to have control over this, though.  The
function just does things generically for everyone who calls it.  It's
each caller that has specific things it wants to do with it's data.

>4) Type Checking
>Does your language allow float=string*integer? Even if it does, you'll
>want to make sure that the right operators being used. Do all type
>checking now, and do it strictly to make running the script easy and
>safe.

Can't do this in my language.  What would this mean?  You could just
call a function to convert the string to an integer, if the string
just has digits.  I don't think converting like this is really all
that common, so a library function seemed like a logical thing to do.
Generally, strings in my language are just useful for passing to
library function, such as asking for input, getting the name of an
object, comparing 2 strings, etc.  This is a good example of how I've
limited my language down to just what was needed to fit the job I had
in mind for it.  I don't have to worry about trying too hard to make
it very general and work with everything anyone can think of.

>5) Recursive Descent.
>Having assigned types, to split each expression up, I used a recursive
>descent technique, not always terribly efficiant, but it works and is
>quite elegant with a minimal of coding. In this, you start with an
>expression 'A=B+C*D' for example. You take the first delimeter ('='),
>and see if it is a lower priority than the one after ('+'), and if it
>is, you call the procedure again, but with the bit after the first
>operand. It is, so we start a new function with ('B+C*D'), and repeat
>the process. We find that '*' is more important than '+', and so call
>ourself again, with 'C*D' as the equation. There is nothing after
>this, so we write it down in our 'to-do' list. We would also write it
>down if whatever came after was equally or less important.
>
>1st Operation: C*D;
>
>The function returns, and we are now in the one started with
>'B+[C*D]'. We would now look to see if whatever comes after the C*D is
>still more important than the '+'. There isn't anything, so it's time
>to do the Add. Our to-do list shows:
>
>1st Operation: C*D;
>2nd Operation: B+C;
>
>Now we are back in the first call. There is nothing left at all, so we
>write the instruction down:
>
>1st Operation: C*D;
>2nd Operation: B+C;
>3rd Operation: A=B;

I do a more object oriented method.  Basically, since = has the lowest
priority, I check for this first:

int check_equ()
{
   check_plus();
   skip_white();
   match("=");
   check_plus();
   out(OP_EQU);  // stack-1 = acc
}

This looks for X=X type of things.  It doesn't really worry about what
X might be, though.  Next, in check_plus(), I break X down..

int check_plus()
{
   check_mult();
   skip_white();
   match("+");
   check_mult();
   out(OP_ADD);  // acc += stack - 1
}

So this looks for Y+Y within an X.  So combined, we look for something
like (Y+Y)=(Y+Y).  Since in check_equ() we first call check_plus(),
the whole Y+Y that is before the = gets processed before control
returns back to check_equ() and it can look for the =.  So things are
all checked for in their correct precedence as a result.  Works pretty
slick.  Check_mult() would look like this..

int check_mult()
{
   check_var();
   skip_white();
   match("*");
   check_var();
   out(OP_MUL); // acc *= stack - 1
}

Check_var() looks for a variable name or number/string, which has the
highest priority, so it doesn't make any calls down to anything else.
Whatever it finds it puts on the stack.  So it's like recursion, but
it's always calling new functions.  What's nice about using new
functions for each level is that you can just check for what you can
expect there, instead of having to check everything.  Keeps each
function simpler.

>That's it! We would of course have to preserve C & B, since there
>contents get overwritten during our approach. You could either
>automatically 'push' any variable that is modified at the start, and
>'pop' it back, which is a simple task, or add a result variable to
>your instruction format, so that each opcode takes three parameters.
>It's up to you. Be careful when popping values back, particularly when
>doing stuff like if(A+B==C+D), where you can't say Push A, A+B, Pop A
>is A equal to [...], you have to wait until the whole statements over,
>which is OK here, but not for if(A+B==A+C), where it would check that
>A+B is equal to A+A+C, which is impossible.

Right, == has lowest priority, so (A+B) gets put on stack, (C+D) gets
put in acc when you actually execute the == operation.

>5) Variables
>You cannot go on refering to variables by name. Well, actually, I
>suppose you could, but it's hardly fast, even with the might of the
>Standard C++ Library template classes ;), is it? No. Variables become
>addresses, but they cannot be absolute addresses. Sometimes a function
>will be called, and other times it won't. This would totally screw up
>the creation of variables at run-time, so unless you want to define
>all variables used in your script at start up and use giga bytes of
>memory in the process, you need relative numbers. The easiest
>soloution is to save an index at the start of each call, and use an
>offset from that index.

Actually, I create 2 symbol tables, one for globals, one for local
vars.  The max number for each is 256, so I can turn all variable
references into a byte that represents what variable I am dealing
with.  I think a max of 256 is enough.  I don't expect people to make
huge scripts, and limiting so I can use 1 byte for variable references
reduces the size a lot.  Compiled scripts are pretty compact.  In the
code to actually run the script, local and global variables each is an
integer array, and I use the variable number as the index to get the
value.

>6) Resolve names
>Bugger about with name mangaling, etc. I'll go into this only if you
>wan't me to. Very little to do unless you're into native Object
>Orientation in your script language. (Yeah, I'm getting fed up typing
>this message that few people are going to bother to read, and fewer
>will find it interesting).

I'm one of the exceptions then I guess. :)  Since I don't allow
overloading, I don't have to worry about this one.

>7) Write the Op Codes.
>You also need to make function calls work some how. This is basically
>a jump, and then a jump back, or, since this is a runtime thing, you
>might like to spawn another execution routine that handles the sub
>function, and let the recursion dump you back to where you were. Up to
>you, really.

Again, I model it after CPU instructions, so I have a CALL instruction
and an RTS instruction.  I also have branch instructions, which you
need for things like loops and gotos.  You don't use them for function
calls.  You can't just jump back, btw, because you can call functions
from many places.  How do you know which to jump back to?  So you need
a call stack, etc.

>8) Execute, see above.
>
>Comments? Criticism? Sorry about the last few steps... I expect you
>can figure it out. I really would like to discuss ways to improve on
>all this in a constructive manner... I'm not perfect, neither are you,
>but together we come pretty close ;->.

Well there's a glimpse from my experiences with messing with all this.
I've successfully written a compiler and interpreter for my language,
and they work.  I haven't actually used the interpreter in a real game
yet, though. :)  So you can call this a little more than theory, but
less than proven.

>One thing that I never decided on was the best techniques for I/O. I
>currently follow a C/C++ style 'don't have any _keywords_' technique,
>although it does support disk I/O since it's virtually the same on all
>platforms. For anything else I have a system with which the script
>communicates with the host interpreter to ask for services, which I
>plan to include linking to external modules. What's the standard way,
>I'm self taught...

The way I would handle it is with library functions, just like C.  My
language doesn't really have library functions, though.  They are
actually 'system functions', which are hooks to do special things like
this.  For example, if you want to display a string to the user, you
can add in a 'print()' system function.  All that is needed for me is
to add it to my 'system function' header file (which tells the
compiler what all the system functions are), and then add support for
it doing something useful in the interpreter.  So it's pretty flexible
and easy to extend as I come up with new things I need to support.
The language itself never changes.  You can extend the abilities of C
with new libraries without changing the fundimentals of C.  You'll
notice I base a lot of my ideas on other things out there (C, CPU,
etc).  I figure they did what they did for good reasons after
researching it extensively, so why re-invent the wheel?  Take a
shortcut instead and do it how they do it.  And of course, their
approaches are proven and stable.

I'm still fixing up my compiler and interpreter right now, but
eventually I plan to release this for others to use for their games.
Anyway, if you have any question about anything I mentioned, feel free
to ask.

>Paul F. Snively wrote in message ...
>>Scripting languages are supposed to be small and simple, so I'm
>>inclined to recommend a good, fast, embeddable Scheme interpreter,
>>which to me suggests
>[...]
>Not arguing, just asking, but how small is small, how simple is
>simple? Sounds like I've made a complete arse of mine, but it is still
>quite fast, and is very flexible.

What I sort of invision for my language is to have many small scripts
instead of just one big one (that's the idea I'm playing with anyway).
Going to the first post, he wanted a script to handle shooting out a
fireball as part of a trap.  So you could have a script just for that
one trap.  Another trap elsewhere would be a seperate script, etc.  So
having a lot of little scripts in one reason I wanted to keep the
compiled byte-code compact.  It can make it easier to maintain as
well.  Of course, you might have a lot more source files to deal with.
Tradeoffs for everything, as always.

   -Jason
Article 174481 of rec.games.programmer:
From: "Sam Inala" <sami@ersatz.microsoft.com>
Newsgroups: rec.games.programmer
Subject: Re: Scripting
Date: 22 Feb 1998 21:11:10 GMT
Organization: Microsoft


Amit Patel <amitp@Xenon.Stanford.EDU> wrote in article
<6cfl25$6lg$1@nntp.Stanford.EDU>...
> I was trying to decide on a scripting for my game, and I considered
> Scheme but decided against it for the same reason I decided against
> Python and C++:
> 
>     A general purpose language isn't going to be as well suited 
>     for my game as a special purpose language. [..]
>
> As an example, suppose I was writing an RPG [I'm really not], and I
> wanted to attach scripts to various events that might occur in the
> game.  I might want to write:
> 
> OBJECT Fred IS-A Paladin:
>     ON attacked-by X:
>         if X.evil: attack(X)
>     ON time-is 12pm:
>         walk to: tavern
>     ON time-is 9pm:
>         walk to: home

Gee, this looks just like Visual Basic events! 

It seems like two approaches have been suggested:
1. Reinventing the wheel (write parser, compiler, and virtual machine)
2. Reusing a wheel (embedding Scheme, Tcl, Lua)

How about a third approach?
3. Interchangeable wheels (expose the game's object model)

Exposing the object model means that your game implements
some standard COM interfaces. Then you can use any scripting
language you want: Visual Basic, Java, C++, Perl, etc. You
can easily add custom events to be fired to any object and
custom properties and methods. Scripting support is lightweight
and mostly trivial to implement using ATL. Plus, you can leverage
the already existing knowledge of one million VB coders.

More information:
	_Inside COM_ by Rogerson
	_Effective COM_ by Box
	_Inside OLE, 2nd Edition_
	MSDN

--
Sam Inala < s a m i @ m i c r o s o f t . c o m >
Article 174497 of rec.games.programmer:
From: hoffoss@volition-inc.com (Jason Hoffoss (Tclord))
Newsgroups: rec.games.programmer
Subject: Re: Scripting
Date: Mon, 23 Feb 1998 01:34:29 GMT

On 22 Feb 1998 21:11:10 GMT, "Sam Inala" <sami@ersatz.microsoft.com>
wrote:

>It seems like two approaches have been suggested:
>1. Reinventing the wheel (write parser, compiler, and virtual machine)
>2. Reusing a wheel (embedding Scheme, Tcl, Lua)
>
>How about a third approach?
>3. Interchangeable wheels (expose the game's object model)
>
>Exposing the object model means that your game implements
>some standard COM interfaces. Then you can use any scripting
>language you want: Visual Basic, Java, C++, Perl, etc. You
>can easily add custom events to be fired to any object and
>custom properties and methods. Scripting support is lightweight
>and mostly trivial to implement using ATL. Plus, you can leverage
>the already existing knowledge of one million VB coders.

One thing I see about this is that it might take longer to learn this
stuff than to use one of the first 2 methods.  Also, I'm not sure if
COM object just give you access to functions or to data as well, but
if you allow access to the data, then you will lose the advantage of
validating the data.  While C and a lot of power, it can also be
pretty fragile.  If you don't really know what you are doing, and you
do something bad, like try to access an illegal index in an array, or
have a bad pointer, or whatever, you can crash the program, and it's
not always an easy matter to find the problem.  With a scripting
language, you can design it to be a lot more crash-proof.  Another
thing I don't like about doing this is that your end users now have to
be programmers to write the scripts, and have to own a compiler.  If
you write your own scripting language, they just need a text editor,
and only have to learn a simplified language, rather than a full-blown
language.  And lastly, for me anyway, it's fun to write a compiler,
and I think it's always good to know as much as you can.  Writing a
compiler can teach you a number of things.

   -Jason
Article 174507 of rec.games.programmer:
From: "Sam Inala" <sami@ersatz_microsoft.com>
Newsgroups: rec.games.programmer
Subject: Re: Scripting
Date: 23 Feb 1998 03:24:08 GMT
Organization: Microsoft


Jason Hoffoss (Tclord) <hoffoss@volition-inc.com> wrote in article
<34f3d068.1039135067@192.48.96.24>...
> On 22 Feb 1998 21:11:10 GMT, "Sam Inala" <sami@ersatz.microsoft.com>
> >It seems like two approaches have been suggested:
> >1. Reinventing the wheel (write parser, compiler, and virtual machine)
> >2. Reusing a wheel (embedding Scheme, Tcl, Lua)
> >
> >How about a third approach?
> >3. Interchangeable wheels (expose the game's object model) [using COM]
> 
> One thing I see about this is that it might take longer to learn this
> stuff than to use one of the first 2 methods.

Yes, it will. Maybe if you ignored everything but what you need for
scripting, it wouldn't take much longer, but I doubt it. Afterwards,
exposing an object model would be faster, I'm sure.

A Sun engineer once said that he classified people by the
time frames they operated in: milliseconds, microseconds,
nanoseconds, etc. Similarly, you could ask: how long will
people be writing scripts for my program?

Days:	Scripting is not appropriate. Use C/C++/DLLs.
Weeks:	Macro processing, recursive descent compiler with VM
Months:	Embeddable language / Object model.

> Also, I'm not sure if COM object just give you access to functions or
> to data as well, but if you allow access to the data, then you will
> lose the advantage of validating the data.

COM never allows direct access to data. Data may only be
accessed through methods. However, in languages like VB,
it will *look like* you are accessing a property directly. E.g.

' VB code
Debug.Print( dxmPlayer1.FileName )

actually calls a C++ method,
CDirectShowPlayer::get_FileName( BSTR* pbstrFilename ).

If you know the Eiffel language, this is like its concept 
of referential transparency. Whether data is retrieved by directly
accessing a member variable or by executing some code is
abstracted away. Access is always validated, making only
scripting errors possible.

> [..] And lastly, for me anyway, it's fun to write a compiler,
> and I think it's always good to know as much as you can.  Writing a
> compiler can teach you a number of things.

I agree. People who do this might like Jon Bentley's _More
Programming Pearls_, which discusses some little language
design. Also good is the neglected _Writing Interactive Compilers
and Interpreters_ and _Write your own programming
language in C++_.

If you wanted to be cool you could extend an embeddable
language to understand scriptable objects, like they did for Perl.
I'd like to see a COM savvy Forth or Lisp. 

--
Sam Inala < s a m i @ m i c r o s o f t . c o m >

Email me , or tweet @redblobgames, or comment: