21st Century Smalltalk

May 17, 2007

Speed Comparison of Smalltalk, IronPython, C# and ActionScript

Filed under: General — pfisk @ 9:36 pm

There was a question on a recent post about the relative speed of my Smalltalk implementation. To answer the question (and satisfy my own curiosity), I ran a simple benchmark on three versions of Smalltalk, IronPython, ActionScript and C#.

The test was run five times in each language and the lowest time was recorded. All tests were performed on the same machine (3 GHz Pentium, 1 GB memory, Windows Vista).
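
For readers who want to repeat the "run five times, keep the lowest" procedure, here is a minimal sketch in Python. The names `workload` and `best_of` are made up for illustration, and the loop is only modeled on the listings further down; it is not the code that produced the numbers in this post.

```python
import time

def workload(iterations=1_000_000):
    """Floating-point loop modeled on the benchmark in this post."""
    x = y = z = 0.0
    for index in range(iterations):
        x = 3.45 + index
        y = 7.89 + index
        z = (x * x) + (y * y) + index
    return x, y, z  # returned so the computed values are actually used

def best_of(func, runs=5):
    """Call func `runs` times and return the lowest elapsed time in ms."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        timings.append((time.perf_counter() - start) * 1000.0)
    return min(timings)

if __name__ == "__main__":
    print("best of 5: %.1f ms" % best_of(workload))
```

Taking the minimum rather than the mean is one common choice for micro-benchmarks, since it tends to filter out runs disturbed by other activity on the machine.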

The results were:

Language        Milliseconds
C#                         9
ActionScript              15
VisualWorks              112
IronPython               430
Squeak                  1233
Smalltalk/DLR           4000
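
To put the table in relative terms, the short Python sketch below divides each timing by the C# figure. The helper name `slowdown_vs` is hypothetical; the numbers are copied from the table above.

```python
# Timings in milliseconds, copied from the results table in this post.
results_ms = {
    "C#": 9,
    "ActionScript": 15,
    "VisualWorks": 112,
    "IronPython": 430,
    "Squeak": 1233,
    "Smalltalk/DLR": 4000,
}

def slowdown_vs(baseline, timings):
    """Return each timing divided by the baseline language's timing."""
    base = timings[baseline]
    return {name: ms / base for name, ms in timings.items()}

for name, ratio in slowdown_vs("C#", results_ms).items():
    print("%-14s %7.1fx" % (name, ratio))
```

On these figures VisualWorks comes out roughly 12 times slower than C#, and Squeak 137 times slower.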

Some notes about the testing:

  • C# – the test initially showed 2 milliseconds, probably because the compiler recognized that the computed values weren’t being used and “short-circuited” the test. After adding a statement at the end of the test to print out the values, the test took 9 milliseconds.
  • ActionScript – I ran two versions of the ActionScript test. The first used typed variables and took 15 milliseconds to run. The second test used untyped variables and took 371 milliseconds to run.
  • IronPython – the test was run in a browser using the DLRConsole demo.
  • Smalltalk/DLR – this is my two-week-old Smalltalk compiler. Once I learn more about DLR optimization, I expect the speed to be similar to IronPython.
  • C# and ActionScript use typed variables, whereas the other languages use untyped variables. Not surprisingly, the more information that a compiler has, the better it can optimize the code for speed.
  • It is very easy to call C# code from DLR-based languages – one strategy for increasing speed is to recode the slow portions of a program in C# and then call them from the dynamic code.

Subjectively, I have always found ActionScript 3.0 to be amazingly fast – the results, for this benchmark at least, bear that out. Also, IronPython is doing very well for an alpha release.

Anyway, it was an interesting experiment.

Below is the code used in the tests.

Smalltalk/DLR

"-- Local variables --"
| index x y z |
"-- Import DateTime --"
Vm import: 'System.DateTime'.
"-- Write header --"
Transcript clear.
Transcript show: 'Starting 1,000,000 iterations'; cr; cr.
Transcript show: DateTime Now; cr.
"-- Iterations --"
index := 0.
[index < 1000000] whileTrue: [
  index := index + 1.
  x := 3.45 + index.
  y := 7.89 + index.
  z := (x*x)+(y*y) + index].
"-- Results --"
Transcript show: DateTime Now; cr; cr.
Transcript show: index; cr.
Transcript show: x; cr.
Transcript show: y; cr.
Transcript show: z; cr.

Squeak / VisualWorks

| index x y z |

Time millisecondsToRun:[
index := 0.
[index < 1000000] whileTrue: [
  index := index + 1.
  x := 3.45 + index.
  y := 7.89 + index.
  z := (x*x)+(y*y) + index]]

ActionScript 1 (using Number type variables)

private function test1():Number {
  var index:int;
  var x:Number;
  var y:Number;
  var z:Number;
  var t1:Number;
  var t2:Number;
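  // getTime() returns total milliseconds since the epoch;
  // getMilliseconds() only returns the 0-999 component of the current second.
  t1 = (new Date()).getTime();
  for(index=0;index<1000000;index++) {
    x=2.34+index;
    y=4.56+index;
    z=(x*x)+(y*y)+index;
  }
  t2 = (new Date()).getTime();
  return t2-t1;
}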

ActionScript 2 (using "type undefined" variables)

private function test2():Number{
  var index:int;
  var x:*;
  var y:*;
  var z:*;
  var t1:Number;
  var t2:Number;
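  // getTime() returns total milliseconds since the epoch;
  // getMilliseconds() only returns the 0-999 component of the current second.
  t1 = (new Date()).getTime();
  for(index=0;index<1000000;index++) {
    x=2.34+index;
    y=4.56+index;
    z=(x*x)+(y*y)+index;
  }
  t2 = (new Date()).getTime();
  return t2-t1;
}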

IronPython

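from System import DateTime

def test():
  t1 = DateTime.Now
  for index in range(1000000):
    # loop body must be indented under the for statement
    x = 2.34+index
    y = 3.45+index
    z = (x*x)+(y*y)+index
  t2 = DateTime.Now
  # TotalMilliseconds is the whole interval; Milliseconds is only the 0-999 part
  print (t2-t1).TotalMilliseconds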
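
One detail of this listing affects the measurement: in the Python 2 dialect that IronPython implemented, `range(1000000)` builds a complete list before the loop starts, while `xrange` produces values on demand. The sketch below shows the difference using Python 3, where `range` is already lazy, so `list(range(N))` stands in for the old eager behaviour. It illustrates the idea only and says nothing about IronPython's actual timings.

```python
import sys

N = 1_000_000

lazy = range(N)          # Python 3 range: behaves like Python 2's xrange
eager = list(range(N))   # materialized list: behaves like Python 2's range

# The lazy object stores only start/stop/step; the list stores N references.
print("lazy :", sys.getsizeof(lazy), "bytes")
print("eager:", sys.getsizeof(eager), "bytes (container only)")

# Both forms drive a for-loop over exactly the same values.
assert sum(lazy) == sum(eager)
```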

C#

private double Test() {
  double x = 0, y = 0, z = 0;
  DateTime t1 = DateTime.Now;
  for (int index = 0; index < 1000000; index++) {
    x = 2.34 + index;
    y = 3.45 + index;
    z = (x * x) + (y * y) + index;
  }
  DateTime t2 = DateTime.Now;
  Console.WriteLine(x + y + z);
  // TotalMilliseconds is the whole interval; Milliseconds is only the 0-999 component.
  return (t2 - t1).TotalMilliseconds;
}
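
Timing a loop by subtracting two timestamps has one pitfall worth knowing: most date libraries expose both the sub-second component of an interval and the total length of the interval, and only the second is a valid elapsed time once a run lasts a second or more. In .NET the pair is `TimeSpan.Milliseconds` and `TimeSpan.TotalMilliseconds`. The Python sketch below shows the same distinction with `timedelta`, using made-up timestamps.

```python
from datetime import datetime, timedelta

# Hypothetical start and end times, exactly 4000 ms apart.
t1 = datetime(2007, 5, 17, 21, 36, 0, 900000)
t2 = t1 + timedelta(milliseconds=4000)

elapsed = t2 - t1

# Component only: the sub-second part of the interval, in milliseconds.
component_ms = elapsed.microseconds // 1000
# Whole interval expressed in milliseconds.
total_ms = elapsed.total_seconds() * 1000.0

print(component_ms)  # 0
print(total_ms)      # 4000.0
```

For runs well under a second the two values agree, which is why the mistake is easy to miss on fast benchmarks.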

17 Comments »

  1. FYI, I recently did a comparison of C# and ActionScript (using int vars) calculating Fibonacci(35). My results were:
    C# .234s
    ActionScript in IE7 Flash 9: 2.8s
    ActionScript in Apollo: 2.4s

    Factor of 10. Perhaps the difference is in function dispatch overhead, which your test doesn’t exercise.

    Comment by Jonathan Edwards — May 17, 2007 @ 10:38 pm

  2. This is typical of benchmarks I have seen in the Lisp and Smalltalk community for over 25 years. Benchmarks like these can serve a purpose. Maybe this one should state its purpose.

    If I wanted the above code to run as fast as possible, I’d look at C or maybe Gambit Scheme, which allows fixnum-specific code and compiles to C.

    For most applications that I personally would choose any of the above languages for, this benchmark serves little if any purpose.

    Comment by Patrick Logan — May 17, 2007 @ 11:05 pm

  3. You have a typo in your IronPython code. “t1-t1”?!

    Comment by Seo Sanghyeon — May 18, 2007 @ 12:50 am

  4. Seo,

    There is no text copy (ctrl-c) function in the IronPython console, so I copied the code by hand. Looks like I made a mistake.

    Thanks for pointing it out.

    – Peter

    Comment by pfisk — May 18, 2007 @ 1:10 am

  5. Patrick,

    I did the benchmark because someone asked me for a specific comparison – that is why I included Squeak and VisualWorks.

    The area that I am interested in is “Rich Internet Applications” – applications that can be deployed through Internet browsers. Such applications can be written in C#, ActionScript, IronPython, Smalltalk/DLR and so these were the languages that I tested.

    Compiling to “C” defeats the purpose because the code can’t be run inside the browser environment.

    So the benchmark was only for languages that can be used for “frictionless” application deployment.

    Comment by pfisk — May 18, 2007 @ 1:27 am

  6. Small nitpick: for more accurate comparisons, use “xrange(1000000)” rather than “range(1000000)” in your Python code.

    ‘range’ will actually create a list of 1,000,000 integers and initialise it.

    Comment by Mike Thompson — May 18, 2007 @ 3:19 am

  7. I noticed you didn’t include a test for JScript in the DLRConsole. I just recently ran a similar test to compare the speed difference between IE’s native JScript and the DLR version and converted it to match your example for some further comparison. Hopefully you won’t mind me tacking my results on here.

    Using your test I found that omitting the var declarations caused the script to run almost 100x slower in the DLRConsole (471 vs 42633) but only 2-3x slower in IE (1055 vs 2879). The var-inclusive version is comparable to the performance delta you stated between IronPython and type undefined ActionScript (I got 471 for JScript/DLR and 540 for IronPython). FWIW, I tried using a timer closure as well as just comparing the dates inline with marginal difference (471 for inline, 504 w/ the timer).

    Comment by Ben Hopkins — May 18, 2007 @ 3:27 am

  8. Mike,

    Thanks for the information about xrange.

    I’m slowly working my way through the O’Reilly Python book :)

    – Peter

    Comment by pfisk — May 18, 2007 @ 1:22 pm

  9. Ben,

    Your JScript test is very interesting.

    It is amazing how much difference there is between including and excluding the “var” statements.

    Thanks

    Comment by pfisk — May 18, 2007 @ 1:32 pm

  10. As an old school Smalltalk implementor I have to back up Patrick Logan on his observation. You have to be extremely careful about what you extrapolate from a single benchmark like this one. In particular, I don’t believe that this benchmark will tell you anything about the relative performance of these languages when used for “Rich Internet Applications”.

    When writing benchmarks it’s essential that the programmer understand both the nature of the core computation of the benchmark and the relevance of that core computation to real world problems.

    In this case, the core computation is simply an iterative floating point calculation involving local variables (no array references). Statically typed languages have an inherent advantage for this type of computation. However, it’s not clear that such a computation is at all predictive of anything that is important to most Rich Internet Applications.

    High performance floating point operations are directly supported at the hardware level of all modern processors. If a compiler or JIT knows that it is adding or multiplying two floating point numbers, all it has to do is generate the appropriate floating point machine instruction. Statically typed languages have the necessary knowledge, prior to program execution, to directly generate such instructions. Dynamically typed languages typically don’t have that ahead-of-time knowledge, so they have to surround the floating point instruction with additional instructions to dynamically determine if floating point values are involved. They also have to annotate the result of each floating point operation so that subsequent operations will know that it is a floating point value (“box” the value). It is actually quite good if a dynamic language implementation can accomplish all this extra work using just 9 additional instructions (roughly speaking, an order of magnitude slowdown). However, if this style of computation was all you were doing you would never choose to use a dynamic language…you should use a highly optimizing C or FORTRAN compiler.

    Here is what numbers in the original post tell me:

    C# (9) This is the baseline number for a modern, aggressively JIT’ing virtual machine based execution environment for statically typed languages. It would be interesting to have the equivalent number for a JVM (to compare a different implementation of comparable technology) and for an optimizing C/C++ or FORTRAN compiler (to compare against a non-VM based implementation). However, beware that a really good optimizing compiler might completely eliminate all but the last computation of z.

    ActionScript (15) The ActionScript JIT isn’t quite as good as the CLR’s for floating point computations. However, this isn’t a big difference. It probably represents a one or two instruction difference in the code generated for floating point operations.

    VisualWorks (112) this is a very mature commercial dynamic language implementation and its floating point performance is about an order of magnitude slower than for the static language VM’s. Actually this is pretty darn good. A dynamic language runtime such as Strongtalk that uses very aggressive dynamic code specialization techniques might be able to approach the performance of C# or ActionScript but it would take a lot of work with an uncertain payback.

    ActionScript (371) As a dynamic language implementation, ActionScript still has a ways to go to reach the level of VisualWorks. However, it’s in the ballpark.

    IronPython (430) Compared to VisualWorks, which is a purpose built dynamic language VM, we are probably seeing the overhead of the additional abstraction layers that the DLR has to create above the CLR. But overall, this is pretty good, particularly in comparison to dynamic ActionScript, which also runs on a purpose built VM.

    Squeak (1233) Squeak has always had a much simpler, more straightforward VM implementation than the various commercial Smalltalk implementations such as VisualWorks. However, even at two orders of magnitude slower than C# (for floating point) it’s still fast enough to support many highly innovative applications. This again is cause to question the relevance of this benchmark.

    Smalltalk/DLR(4000). Hey, it’s a start…I don’t think there’s any inherent reason you can’t get to the point of matching IronPython.

    Comment by Allen Wirfs-Brock — May 18, 2007 @ 8:14 pm

  11. Allen,

    Thanks for the thorough analysis.

    I don’t read too much into benchmarks in general. That being said, the results are generally in line with my recent experiences – particularly ActionScript which surprised me with its speed.

    The reason for including Squeak was that someone asked me a direct question about the relative speed of Squeak (and VW) compared to IronPython. I like Squeak a great deal and I have just spent the past several hours getting familiar with MIT’s “Scratch” which is based on Squeak – it’s a really excellent environment for children.

    As for the relevance of the benchmark, I am trying to get a general sense of the tradeoffs for languages that can run in a browser environment. The speed difference between C# and IronPython was much larger than I expected.

    For example, in the DLRConsole example (in the Silverlight SDK), all the console window functions were coded in IronPython. That may explain why they aren’t as responsive as I feel they should be. At the moment, I am recoding some Silverlight controls in C# to see how much difference it makes.

    Benchmarks provide data points to help in making better design decisions. In this particular case, they are lending support to coding all animation support (which uses a lot of arithmetic of the sort tested here) in C#.

    I don’t infer any broader conclusions from the results.

    Comment by pfisk — May 18, 2007 @ 9:42 pm

  12. I’d still caution you not to infer too much from a single benchmark such as this one. I really don’t think arithmetic is where you are likely to find a big performance impact. I’d want to see results over a wider variety of benchmarks particularly ones that measured things like member access, method invocation, closure creation, object instantiations, etc.

    Ultimately, the best benchmark is your actual application, so recoding your controls in C# is an interesting test. If there is actually a noticeable or measurable difference then micro benchmarks of this sort might help to identify the problem areas.

    Finally, here http://www.cincomsmalltalk.com/userblogs/buck/blogView?showComments=true&printTitle=Smalltalk_performance_vs._C&entry=3354595110 is an interesting thread on benchmarking VisualWorks Smalltalk against C#. You might want to pick up these tests and try recoding them in the context of Silverlight and IronPython.

    Comment by Allen Wirfs-Brock — May 18, 2007 @ 10:36 pm

  13. Allen,

    Thanks again for the links and suggestions.

    – Peter

    Comment by pfisk — May 18, 2007 @ 11:02 pm

  14. [...] Speed Comparison of Smalltalk, IronPython, C# and ActionScript There was a question from a recent post about the relative speed of my Smalltalk implementation. To answer the […] [...]

    Pingback by Top Posts « WordPress.com — May 18, 2007 @ 11:58 pm

  15. I think that this should run slightly faster:

    |x y z |

    Time millisecondsToRun:[
    0 to: 999999 do: [:index |
    x := 3.45 + index.
    y := 7.89 + index.
    z := (x*x)+(y*y) + index]
    ]

    Comment by Fede — May 29, 2007 @ 11:46 pm

  16. Fede,

    I wanted to keep the same code between Smalltalk/DLR, VW, and Squeak.

    Smalltalk/DLR doesn’t support the “to:do:” message at this time.

    – Peter

    Comment by pfisk — May 30, 2007 @ 1:24 pm

  17. Hi peter,

    I think that Fede had a valid point. Most languages have their own ways of coding an operation. So running identical code in 4 languages may give a bias, whereas code optimised to perform the operation in a way that suits the available features of each language may give much better ‘real world’ benchmarks.

    Regards

    Pete

    Comment by applebypd — January 8, 2008 @ 9:43 am
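
Comment 10 describes the extra work a dynamic language does around each floating point operation: check the operand types, unbox the values, do the arithmetic, then box the result. A rough Python sketch of that idea follows. The `Boxed` class and both function names are hypothetical, and real runtimes do this in generated machine code rather than with objects like these.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Boxed:
    """Hypothetical tagged value: a type tag plus a raw payload."""
    tag: str
    payload: float

def static_add(a: float, b: float) -> float:
    # Statically typed path: operand types are known ahead of time,
    # so the operation is a single add.
    return a + b

def dynamic_add(a: Boxed, b: Boxed) -> Boxed:
    # Dynamic path: check the tags, unbox, add, then box the result.
    if a.tag != "float" or b.tag != "float":
        raise TypeError("a generic slow path would be taken here")
    raw = a.payload + b.payload
    return Boxed("float", raw)

x = Boxed("float", 3.45)
y = Boxed("float", 7.89)
# Both paths compute the same sum; only the surrounding work differs.
print(dynamic_add(x, y).payload == static_add(3.45, 7.89))
```

The tag check and the re-boxing are the "additional instructions" the comment refers to.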

