I chose to install the headless image of Ubuntu 11.04. Because my laptop has an SD slot that shows up as /dev/mmcblk0, the installation process was as smooth as my Teflon pan.
- insert the SD card into the laptop
- sudo umount /dev/mmcblk0
- sudo sh -c 'zcat ubuntu-11.04-preinstalled-headless-armel+omap4.img.gz > /dev/mmcblk0'
- sync
- insert the SD card into the PandaBoard, then plug in power, LAN and the USB-serial cable
- On the laptop terminal: TERM=vt100 minicom
- turn on the PandaBoard; in minicom, after U-Boot, the standard Ubuntu installation process runs and finally gives us a shell prompt
The Go installation is the same as in my last post, only much faster. Now for the benchmark: how does the 1GHz dual-core ARM A9 compare to my 2GHz dual-core Intel x86 and the 1GHz single-core ARM A8?
Not surprisingly, the number of cores does not matter here, since none of the benchmarks runs in parallel. For string processing, the 1GHz A9 is slightly faster than the A8, but still more than twice as slow as the 2GHz x86 core. The A9's VFP has been greatly improved: with gcc it is 5x faster than the A8.
But surprisingly, and to my astonishment, for floating-point crunching on the A9, gc is 11x slower than optimized gcc. This is very unfortunate, because what interests me about Go on ARM is OpenGL ES, which is all about matrix operations on floating-point numbers.
[update] The 11x slowdown is caused by the unoptimized pkg/math/sqrt.go. Since ARM VFP has a VSQRT instruction, it should not be hard to speed it up.
[update 2] I did it. Now it is 7x faster:
nbody -n 50000000
gcc -O2 -lm nbody.c 71.40u 0.00s 71.43r
gc nbody 120.93u 0.00s 120.94r
gc_B nbody 119.78u 0.00s 119.80r
[/update2]
go@localhost:~/go/test/bench$ cat /proc/cpuinfo
Processor : ARMv7 Processor rev 2 (v7l)
processor : 0
BogoMIPS : 2009.29
processor : 1
BogoMIPS : 1963.08
Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3
Hardware : OMAP4 Panda board
go@localhost:~/go/test/bench$ gomake timing
./timing.sh
reverse-complement < output-of-fasta-25000000
gcc -O2 reverse-complement.c 7.88u 1.55s 9.54r
gc reverse-complement 18.36u 1.91s 20.29r
gc_B reverse-complement 17.75u 2.08s 19.85r
nbody -n 50000000
gcc -O2 -lm nbody.c 71.40u 0.00s 71.41r
gc nbody 862.53u 0.02s 862.78r
gc_B nbody 865.00u 0.05s 865.28r
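The ratios quoted in the text can be rechecked from the user-time ("u") column of the timings. The small program below is just a sketch of that arithmetic; the timing struct and the ratio helper are names I made up for illustration, and the figures are copied from the output in this post.

```go
package main

import "fmt"

// timing holds user-CPU seconds (the "u" column) for one benchmark,
// as reported by timing.sh for the gcc and gc builds.
type timing struct {
	name    string
	gcc, gc float64
}

// ratio (hypothetical helper) returns how many times larger a is than b.
func ratio(a, b float64) float64 {
	return a / b
}

func main() {
	runs := []timing{
		{"reverse-complement", 7.88, 18.36},
		{"nbody (before sqrt fix)", 71.40, 862.53},
		{"nbody (after sqrt fix)", 71.40, 120.93},
	}
	for _, r := range runs {
		fmt.Printf("%-26s gc/gcc = %.2f\n", r.name, ratio(r.gc, r.gcc))
	}
	// Speedup of gc nbody from the sqrt change in update 2.
	fmt.Printf("nbody speedup from the sqrt fix: %.2fx\n", ratio(862.53, 120.93))
}
```

With these figures, gc nbody takes roughly 12 times the gcc user time before the sqrt fix (about 11x on top of gcc's time) and under 2 times after it, which matches the roughly 7x speedup.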