Subject:      Re: Speeding up python
From:         Jim Roskind <jar@netscape.com>
Date:         1996/01/20
Message-Id:   <199601200115.RAA06623@iapp15.mcom.com>
Sender:       daemon@cwi.nl (The Daemons of the Dark)
Organization: CWI, Amsterdam
Newsgroups:   comp.lang.python

>    From: Guido van Rossum <guido@CNRI.Reston.VA.US>
>    Date: Wed, 17 Jan 1996 09:26:17 -0500
>    ...
>    Someone who would like to speed up loop overhead (or the general
>    overhead in the Python VM's main loop) could probably improve Python's
>    performance considerably, and not just for a particular data type, but
>    for all data types.

Back on my "measure it" soapbox, I have a few suggestions for those
interested in acquiring fame and fortune (well, at least the former
;-) ) by helping to identify performance bottlenecks.  Guido's
suggestion is right on, but there are some relatively easy projects
that just might yield some good fruit.

It would be very nice to do the basic profiling of the interpreter
loop, in order to figure out where all the time is being spent.  You
can get some sense of this by profiling the C code with a C language
profiler (the results may vary from platform to platform!!!),
but the following would probably also be very interesting (though
progressively harder to code).  Be sure that you run all such tests on
*real* applications (not for-loops that add numbers).  If you can't
think of a "real" application to run it on, try to gather statistics
while running the Python debugger or Python Profiler (they do a *lot*
of interesting work, and would certainly enjoy being sped up):

a) What is the dynamic (actually executed) instruction mix?  All you
would have to do is have a global array that was indexed by the opcode
that is extracted from the byte-code, and bump the corresponding
entry.  At process termination, it would be nice to print this list
(maybe even sorted) along with the names of the corresponding opcodes.
This would at least indicate what opcodes are getting hit a lot.  I
did this via static analysis in my disassembler, and the information
was (seemingly) quite useful (it led to the coalescing of several
opcodes).
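
Something along these lines would do the trick (just a sketch; the
hook names count_opcode() and dump_opcode_counts() are invented, and
in practice you'd probably splice the counting right into the main
loop's switch):

    #include <stdio.h>

    #define N_OPCODES 256              /* opcodes fit in one byte */

    static unsigned long opcode_counts[N_OPCODES];

    /* Bump the histogram; call once per dispatched instruction,
       just before the big switch on the opcode. */
    void count_opcode(int opcode)
    {
        opcode_counts[opcode]++;
    }

    /* Dump the (non-zero) counts at process termination; mapping
       the numbers to opcode names is left as an exercise. */
    void dump_opcode_counts(void)
    {
        int i;
        for (i = 0; i < N_OPCODES; i++)
            if (opcode_counts[i] != 0)
                printf("opcode %3d: %lu\n", i, opcode_counts[i]);
    }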

...if that was too easy...

b) How much time is taken in each of the opcodes?  To do this you just
have to accumulate intervals into a counter that is kept for each
opcode, rather than just "bumping" it as in suggestion "a" above.  It
would be good not to count the time that was spent in your own code
(the code that calls ctime() or equivalent).
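
A sketch of the bookkeeping (again, the names are invented, and I've
used clock() here as the "or equivalent"; a finer-grained timer would
be better if your platform has one):

    #include <stdio.h>
    #include <time.h>

    #define N_OPCODES 256

    static clock_t opcode_time[N_OPCODES];

    /* In the interpreter loop, something like:

           clock_t t0 = clock();
           ... dispatch and execute the instruction ...
           opcode_time[opcode] += clock() - t0;
    */

    /* Dump the accumulated per-opcode times at process termination. */
    void dump_opcode_times(void)
    {
        int i;
        for (i = 0; i < N_OPCODES; i++)
            if (opcode_time[i] != 0)
                printf("opcode %3d: %.3f seconds\n",
                       i, (double)opcode_time[i] / CLOCKS_PER_SEC);
    }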

c) Do some "digraph" analysis... and see what the probability of
each opcode is, given the previous opcode.  To do this, you'd need an
array that is the square of the number of opcodes, and you'd always
have to remember what the "last" opcode was.  I'd expect that you'll
find that some opcodes are *always* followed by other opcodes.  You'd
also get to see how often a "BRANCH" instruction ended up branching to
yet another BRANCH instruction (yes, this does happen a fair
amount... but I don't have the actual statistics... so it is hard to
say how valuable doing such branch/flow optimization would be).
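
And a sketch for the pair counting (record_pair() would be called
once per dispatched instruction; the names are invented as before):

    #include <stdio.h>

    #define N_OPCODES 256

    static unsigned long pair_counts[N_OPCODES][N_OPCODES];
    static int last_opcode = -1;       /* the "last" opcode; -1 == none yet */

    /* Call once per dispatched instruction. */
    void record_pair(int opcode)
    {
        if (last_opcode >= 0)
            pair_counts[last_opcode][opcode]++;
        last_opcode = opcode;
    }

    /* Dump the non-zero (previous, current) pairs at termination;
       dividing a row by its total gives the conditional probabilities. */
    void dump_pair_counts(void)
    {
        int prev, cur;
        for (prev = 0; prev < N_OPCODES; prev++)
            for (cur = 0; cur < N_OPCODES; cur++)
                if (pair_counts[prev][cur] != 0)
                    printf("%3d -> %3d: %lu\n",
                           prev, cur, pair_counts[prev][cur]);
    }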

Well... those are some easy and (I think) fun suggestions.  I think
they'll provide a lot more direction for how to speed up Python than
guessing hither and yonder about (to reflect on the start of this
thread) how fast integer arithmetic is, or might be, under Python.

Jim

p.s., If you're wondering why I'm suggesting and not doing, it's 'cause
I'm too busy with some other stuff just now, and Python is not
(currently) critical to my job (toooo bad... it is such a cool
language!). 

-- 
Jim Roskind                                              voice: 415.528.2546
jar@netscape.com                                         fax:   415.528.4159
----------------------------------------------------------------------------
PGP 2.6.2 Key fingerprint = 0E 2A B2 35 01 9B 5C 58  2D 52 05 9A 3D 9B 84 DB
