May 30, 2011

Easy Going with ARM

How easy would it be to Go with ARM? It depends: anywhere from super difficult to duper easy. This is the latter case. Just got an IMX53 Starter-Kit from Mouser for US$149 with free FedEx shipping to Singapore. Opened the box, plugged the cables in, inserted the SD card, and pressed the power switch. 

As advertised, I expected an instant boot-up. But nothing happened. Dead on arrival? Scratched my head. Where's the manual? Searched the box again. Not found. Looked at the board and thought, should I insert the SD or the microSD? No harm in trying. 

Slid the microSD out of the SD sleeve, put it in that slot, powered up. Bingo. And as I idly lay back in my armchair and glanced at the box again... There they were. The manual and CD were right there, pasted on the back of the box cover!

From then on, it could not be easier: just follow the normal procedure to have a Go on the pre-installed Ubuntu Lucid on the IMX53 Starter-Kit.
  1. sudo apt-get update
  2. sudo apt-get install mercurial bison ed gawk gcc libc6-dev make
  3. hg clone -u release go
  4. cd go/src; ./make.bash
It just takes much longer (an hour?), because it runs on only a 1GHz ARM Cortex-A8 with 1GB DDR3 RAM and, worst of all, the SD card is way too slow. The great news is that this pixie has a SATA port. Next time I will try to find a spare hard disk and see how speedy it can go.

How slow is it? Here are the go/test/bench results on my PC and on the ARM board. I picked just two, one for string processing and one for floating-point crunching, and the cpuinfo output is shortened to show only the relevant fields.

#! cat /proc/cpuinfo
model name : Intel(R) Core(TM)2 Duo CPU     T7300  @ 2.00GHz
stepping : 11
cpu MHz : 800.000
cache size : 4096 KB
bogomips : 3990.32

#! make timing
reverse-complement < output-of-fasta-25000000
gcc -O2 reverse-complement.c 1.43u 0.23s 1.67r
gc reverse-complement 2.72u 0.26s 3.00r
gc_B reverse-complement 2.64u 0.31s 2.95r

nbody -n 50000000
gcc -O2 -lm nbody.c 27.78u 0.00s 27.92r
gc nbody 54.01u 0.00s 54.06r
gc_B nbody 52.18u 0.00s 52.27r

lucid@lucid-desktop:~$ cat /proc/cpuinfo
Processor : ARMv7 Processor rev 5 (v7l)
BogoMIPS : 999.42
Features : swp half thumb fastmult vfp edsp neon vfpv3 
CPU implementer : 0x41
CPU architecture: 7
Hardware : Freescale MX53 LOCO Board

lucid@lucid-desktop:~/go/test/bench$ gomake timing
reverse-complement < output-of-fasta-25000000
gcc -O2 reverse-complement.c 8.00u 1.14s 10.27r
gc reverse-complement 22.97u 1.26s 27.62r
gc_B reverse-complement 22.09u 1.52s 30.77r

nbody -n 50000000
gcc -O2 -lm nbody.c 316.08u 0.40s 389.46r
gc nbody  645.32u 686.06s 1843.30r
gc_B nbody 653.49u 640.93s 1373.40r

Not surprising: the Cortex-A8 VFP is known to be very slow because it is not pipelined. But I did not know it was this tortoise-like. Will the Cortex-A9 be much better? Once I get my OMAP4430 PandaBoard working, I will report here.

Just to confirm I was using VFP and not soft-float, I built each version and compared them:

$    5l -F -o nbody.arm5 nbody.5
$    5l -o nbody.arm6 nbody.5
$    time ./nbody.arm6 -n 50000
real 0m1.316s
user 0m1.310s
sys 0m0.000s

$    time ./nbody.arm5 -n 50000
real 0m30.788s
user 0m29.830s
sys 0m0.000s

By the way, the proper way to compile the glib-based benchmark is via pkg-config, as below. But I gave up trying to fix run().

run 'gcc -O2 `pkg-config --cflags glib-2.0` k-nucleotide.c `pkg-config --libs glib-2.0`' a.out <x


(The mouse is for illustration only; it does not come with the board.)