03-23-2018 09:17
Hi @JonFitbit, I've got an important question for the dev team.
It seems that the Ionic CPU is a Cortex-M4F core with a 32-bit (single-precision) FPU; is that correct? However, console.log( Number.MAX_VALUE ) prints 1.7976931348623157e+308, which is the maximum of a 64-bit double.
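For comparison, the largest finite single-precision value is about 3.4e+38, so a Number.MAX_VALUE of 1.797...e+308 can only come from 64-bit doubles. A quick sketch, assuming the engine supports Float32Array (storing a value into one rounds it to single precision):

```javascript
// Largest finite IEEE 754 single-precision value: (2 - 2^-23) * 2^127
const FLOAT32_MAX = (2 - Math.pow(2, -23)) * Math.pow(2, 127);

// Storing into a Float32Array rounds to single precision, so FLOAT32_MAX survives unchanged...
const f32 = new Float32Array(1);
f32[0] = FLOAT32_MAX;
console.log(f32[0] === FLOAT32_MAX); // true

// ...while Number.MAX_VALUE is far outside the single-precision range and overflows.
f32[0] = Number.MAX_VALUE;
console.log(f32[0]); // Infinity
console.log(Number.MAX_VALUE > FLOAT32_MAX); // true
```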
03-23-2018 16:49
http://jerryscript.net/internals/#number
---
There are two possible representation of numbers according to standard IEEE 754: The default is 8-byte (double), but the engine supports the 4-byte (single precision) representation by setting CONFIG_ECMA_NUMBER_TYPE as well.
---
I'm afraid the evidence indicates that in Fitbit's case JerryScript uses the default 64-bit floats, which should be pretty terrible for performance, because the data sheet says the CPU's FPU supports only single-precision floats: https://toshiba.semicon-storage.com/info/docget.jsp?did=53672&prodName=TZ1201XBG
Can it be fixed or addressed somehow?
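Another way to check the number type at runtime is to measure machine epsilon: 2^-52 for 64-bit doubles, 2^-23 for 32-bit floats. A minimal sketch; the float32 branch assumes a single-precision build rounds every intermediate result to 32 bits, which is an assumption, not something verified on a device:

```javascript
// Machine epsilon: the smallest power of two eps for which 1 + eps is still distinguishable from 1.
function machineEpsilon() {
  let eps = 1;
  while (1 + eps / 2 !== 1) {
    eps /= 2;
  }
  return eps;
}

const eps = machineEpsilon();
if (eps === Math.pow(2, -52)) {
  console.log("64-bit doubles"); // IEEE 754 binary64
} else if (eps === Math.pow(2, -23)) {
  console.log("32-bit floats"); // IEEE 754 binary32 (assumed behaviour of a float32 build)
} else {
  console.log("unexpected epsilon: " + eps);
}
```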
03-26-2018 17:10 - edited 03-26-2018 17:28
Just to give you an idea of how slow that is: all numbers in JerryScript are treated as floats. By default they are 64-bit floats, which are emulated in software; the Cortex-M4F's 32-bit FPU is not being used. The code below takes about 300 ms to run. If you unroll the loop into 100 iterations with 10 lines of x += 1 inside, the time drops to approx. 150 ms.
That gives roughly 6,666 additions per second on a 120 MHz CPU. Switching the math to 32-bit floats has its own implications, though: JS would no longer be ECMAScript compliant. I honestly don't know which is worse, and I don't believe Fitbit will ever change it.
Anyway, that's a limitation any serious developer must be aware of.
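// Naive benchmark: 1000 additions, timed with Date.now() (millisecond resolution)
let t = Date.now(), x = 1;
for( let i = 0; i < 1000; i++ ){
  x += 1;
}
console.log( x, Date.now() - t ); // prints 1001 and the elapsed time in ms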
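The unrolled variant mentioned above (100 iterations with 10 additions each, still 1000 additions in total) might look like this. It is wrapped in a hypothetical unrolledBenchmark() helper only so it can be re-run; timings will differ between engines and devices:

```javascript
// Unrolled benchmark: 100 iterations x 10 additions = 1000 additions,
// so the loop control (compare, increment, jump) runs 10x less often.
function unrolledBenchmark() {
  const t = Date.now();
  let x = 1;
  for (let i = 0; i < 100; i++) {
    x += 1; x += 1; x += 1; x += 1; x += 1;
    x += 1; x += 1; x += 1; x += 1; x += 1;
  }
  const ms = Date.now() - t;
  console.log(x, ms); // x is 1001; the elapsed time depends on the device
  return { x: x, ms: ms };
}

unrolledBenchmark();
```

In an interpreter every loop iteration presumably costs bytecode dispatch of its own, which would explain why unrolling roughly halves the time even though the number of additions is unchanged.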
03-27-2018 08:35 - edited 03-27-2018 08:41
I did some more tests, and now I doubt that switching to 32-bit floats would be of any help. Here I compare JerryScript's integer and 64-bit float arithmetic performance:
let t = Date.now();
// Integer additions: 1024 iterations x 20 additions
let x = 0;
for( let i = 0; i < 1024; i++ ){
  x += 1; x += 1; x += 1; x += 1; x += 1; x += 1; x += 1; x += 1; x += 1; x += 1;
  x += 1; x += 1; x += 1; x += 1; x += 1; x += 1; x += 1; x += 1; x += 1; x += 1;
}
console.log( x, Date.now() - t );
// Float additions: the same loop with a non-integer operand
t = Date.now();
x = 0;
for( let i = 0; i < 1024; i++ ){
  x += 1.11111111; x += 1.11111111; x += 1.11111111; x += 1.11111111; x += 1.11111111; x += 1.11111111; x += 1.11111111; x += 1.11111111; x += 1.11111111; x += 1.11111111;
  x += 1.11111111; x += 1.11111111; x += 1.11111111; x += 1.11111111; x += 1.11111111; x += 1.11111111; x += 1.11111111; x += 1.11111111; x += 1.11111111; x += 1.11111111;
}
console.log( x, Date.now() - t );
That gives ~6.5K integer additions per second and about 5K float additions per second. JerryScript distinguishes integers from floats internally, but that doesn't seem to help much: arithmetic doesn't appear to be the bottleneck. I believe that settles the question.
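The ops/sec figures are simply the number of additions divided by the elapsed time. A small helper (hypothetical name opsPerSecond) makes the arithmetic explicit:

```javascript
// Throughput in operations per second, given an operation count and elapsed milliseconds.
function opsPerSecond(ops, elapsedMs) {
  return ops / (elapsedMs / 1000);
}

// Each timed loop in the integer/float test runs 1024 iterations x 20 additions.
const additionsPerTest = 1024 * 20; // 20480

// Plug in the elapsed time printed by console.log( x, Date.now() - t ), e.g.:
// opsPerSecond(additionsPerTest, measuredMs)

// The unrolled loop from the earlier post: 1000 additions in about 150 ms.
console.log(Math.round(opsPerSecond(1000, 150))); // 6667
```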
03-28-2018 14:14
Thanks for the feedback. We’re using 64-bit floats for standards compliance with JavaScript.
I'd be interested to know what kind of applications you are trying to develop for the platform.
03-28-2018 17:27 - edited 03-28-2018 21:42
> I'd be interested to know what kind of applications you are trying to develop for the platform.
“Velocity”: it calculates averages of the GPS speed and performs some geometry calculations with GPS heading angles. I was hoping to add more advanced math to calculate speed projections. That one is already developed and submitted for review. I wouldn’t like to move any calculations to the companion in this case, as that would defeat the whole purpose of having the GPS chip on your wrist.
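Averaging GPS heading angles is one place where naive math goes wrong: the plain mean of 350° and 10° is 180°, the opposite direction. One common fix is the circular mean (average the unit vectors, then take atan2). This is a sketch of that technique, not necessarily what Velocity does; meanHeading is a hypothetical name:

```javascript
// Circular mean of headings in degrees; returns a value in [0, 360).
// Meaningless when the headings cancel out (e.g. 0 and 180).
function meanHeading(headingsDeg) {
  let sx = 0;
  let sy = 0;
  for (let i = 0; i < headingsDeg.length; i++) {
    const r = headingsDeg[i] * Math.PI / 180;
    sx += Math.cos(r); // sum of unit vectors
    sy += Math.sin(r);
  }
  const deg = Math.atan2(sy, sx) * 180 / Math.PI; // in (-180, 180]
  return (deg + 360) % 360;
}

console.log(meanHeading([80, 100])); // about 90
```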
The new watch face I’m working on calculates the sun position, twilight phases, and time to sunset. It’s amazing; I will show it later. Before I started optimizing, though, it took 1 second to recalculate and redraw the watch face screen. There’s a substantial volume of math in my apps, yep.
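As an illustration of the kind of astronomy math involved, the solar declination can be approximated with the textbook formula δ ≈ −23.44° · cos(360°/365 · (N + 10)), where N is the day of the year counted from 0 on January 1. This low-accuracy approximation is only an example, not the watch face's actual algorithm; solarDeclination is a hypothetical name:

```javascript
// Approximate solar declination in degrees for a 0-based day of year (0 = January 1).
// Low-accuracy approximation: ignores orbital eccentricity and time of day.
function solarDeclination(dayOfYear) {
  const angleDeg = (360 / 365) * (dayOfYear + 10);
  return -23.44 * Math.cos(angleDeg * Math.PI / 180);
}

console.log(solarDeclination(172).toFixed(2)); // near the June solstice, close to +23.44
```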
Performance is surprisingly low, but the situation is not really that bad because Ionic has a decent GPU. And as you mentioned, the companion can be used to compute the really heavy stuff.
Math performance is not as important an issue as the 64 KB heap limit. That’s the real problem for apps.
03-28-2018 17:36 - edited 03-29-2018 13:04
By the way, do you know what could be helpful? A native library with a few functions that process Float32Array using the built-in 32-bit FPU: aggregates (min, max, avg, sum) and classical vector math operations (y = ax + y where x and y are Float32Arrays, x = ax, the scalar product (x, y), and per-element Float32Array multiplication). And maybe pushAndShift, to implement a fixed-length FIFO queue over a preallocated Float32Array. These should be really easy to implement, and they might solve the problem entirely. On an embedded device, CPU load is really a question of battery life.
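To make the per-element multiply concrete, here is a pure-JS stand-in with a hypothetical name, vmul. A native version would presumably run the same loop on the FPU without interpreter overhead:

```javascript
// out[i] = x[i] * y[i] over the common length of the three arrays.
function vmul(x, y, out) {
  const n = Math.min(x.length, y.length, out.length);
  for (let i = 0; i < n; i++) {
    out[i] = x[i] * y[i];
  }
  return out;
}

const a = new Float32Array([1, 2, 3]);
const b = new Float32Array([4, 5, 6]);
console.log(vmul(a, b, new Float32Array(3)).join(", ")); // 4, 10, 18
```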
Anyway, I can’t see any other way the FPU could be put to use with the JerryScript VM, and it’s quite sad that such hardware is just sitting there unused.
03-29-2018 10:18
That's great feedback, thanks for putting such effort into this.
I will escalate this internally and see what I can get added to the roadmap!
03-29-2018 15:20 - edited 03-30-2018 11:52
@JonFitbit
To be specific, I propose providing a JS API for the single-precision subset of the BLAS Level 1 API in the ARM Performance Libraries, working on Float32Array, plus a few extensions.
BLAS Level 1 functions.
AXPY( a, x, y ) => y := a*x + y
COPY( x, y ) => y := x
DOT( x, y ) => scalar product of x and y
NRM2( x ) => sqrt( DOT( x, x ) ), the Euclidean norm
SCAL( a, x ) => x := a*x
SWAP( x, y ) => exchange the elements of two Float32Arrays
IAMIN( x ) => index of the element with the minimum absolute value
IAMAX( x ) => index of the element with the maximum absolute value
AXPBY( a, x, b, y ) => y := a*x + b*y
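Pure-JS reference semantics for the functions listed above, as a sketch: the lowercase names are hypothetical (not an existing Fitbit or ARM API), indices are 0-based rather than BLAS's Fortran-style 1-based, and the stride arguments of real BLAS are omitted. In JS the arithmetic still runs as 64-bit doubles and is only rounded to single precision when stored into a Float32Array:

```javascript
// y := a*x + y
function axpy(a, x, y) {
  for (let i = 0; i < y.length; i++) y[i] += a * x[i];
  return y;
}

// y := x (y must be at least as long as x)
function copy(x, y) {
  y.set(x);
  return y;
}

// Scalar product of x and y
function dot(x, y) {
  let s = 0;
  for (let i = 0; i < x.length; i++) s += x[i] * y[i];
  return s;
}

// Euclidean norm; naive, whereas reference BLAS rescales to avoid overflow
function nrm2(x) {
  return Math.sqrt(dot(x, x));
}

// x := a*x
function scal(a, x) {
  for (let i = 0; i < x.length; i++) x[i] *= a;
  return x;
}

// Exchange the elements of x and y
function swap(x, y) {
  for (let i = 0; i < x.length; i++) {
    const t = x[i];
    x[i] = y[i];
    y[i] = t;
  }
}

// Index of the element with the largest absolute value (first on ties, -1 if empty)
function iamax(x) {
  let best = -1;
  let bestAbs = 0;
  for (let i = 0; i < x.length; i++) {
    const v = Math.abs(x[i]);
    if (best === -1 || v > bestAbs) { bestAbs = v; best = i; }
  }
  return best;
}

// Index of the element with the smallest absolute value (first on ties, -1 if empty)
function iamin(x) {
  let best = -1;
  let bestAbs = 0;
  for (let i = 0; i < x.length; i++) {
    const v = Math.abs(x[i]);
    if (best === -1 || v < bestAbs) { bestAbs = v; best = i; }
  }
  return best;
}

// y := a*x + b*y
function axpby(a, x, b, y) {
  for (let i = 0; i < y.length; i++) y[i] = a * x[i] + b * y[i];
  return y;
}
```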
Non-BLAS functions (these could be methods of Float32Array, and they are actually simpler and more useful):
// simple aggregate functions optimized for Float32Array
fl32arr.min(), fl32arr.max(), fl32arr.avg(), fl32arr.sum()
// Helper deque functions optimized for Float32Array (or an optimized copyWithin())
fl32arr.append( a ) => fl32arr.copyWithin( 0, 1 ); fl32arr[ fl32arr.length - 1 ] = a; return fl32arr;
fl32arr.prepend( a ) => fl32arr.copyWithin( 1, 0, fl32arr.length - 1 ); fl32arr[ 0 ] = a; return fl32arr;
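In plain JS, the aggregate and deque helpers above could be approximated as standalone functions. This is a sketch with hypothetical names; it relies on TypedArray.prototype.copyWithin (ES2015), which handles the overlapping source and target ranges correctly:

```javascript
function sum(x) {
  let s = 0;
  for (let i = 0; i < x.length; i++) s += x[i];
  return s;
}

// Returns Infinity for an empty array
function min(x) {
  let m = Infinity;
  for (let i = 0; i < x.length; i++) if (x[i] < m) m = x[i];
  return m;
}

// Returns -Infinity for an empty array
function max(x) {
  let m = -Infinity;
  for (let i = 0; i < x.length; i++) if (x[i] > m) m = x[i];
  return m;
}

// NaN for an empty array (0 / 0)
function avg(x) {
  return sum(x) / x.length;
}

// Drop x[0], shift the rest left by one and store a in the last slot.
function append(x, a) {
  x.copyWithin(0, 1);
  x[x.length - 1] = a;
  return x;
}

// Drop the last element, shift the rest right by one and store a in x[0].
function prepend(x, a) {
  x.copyWithin(1, 0, x.length - 1);
  x[0] = a;
  return x;
}

const window3 = new Float32Array(3);
console.log(avg(append(window3, 6)).toFixed(1)); // 2.0 (the window is now 0, 0, 6)
```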
I believe that would cover all the DSP needs which might ever arise.
How would I use these functions in my existing apps? The simplest example is GPS data filtering.
1. Create Float32Array( 10 ) and Float32Array( 3 ) to store the last 10 and 3 GPS speed samples.
2. Use the speeds.append( speed ) operation to append new values.
3. Use speeds.avg() to calculate the average speed.
As a result, the custom Queue class would no longer be needed, and the whole thing would be reduced to a single line:
const speeds = new Float32Array( 10 );
...
const avgSpeed = speeds.append( speed ).avg();
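Until something like append() exists natively, a fixed-length window can avoid shifting altogether: a ring buffer over a preallocated Float32Array with a running sum makes each new sample O(1) instead of O(n). A sketch with a hypothetical MovingAverage class:

```javascript
// Fixed-length moving average over a preallocated Float32Array (ring buffer).
class MovingAverage {
  constructor(size) {
    this.buf = new Float32Array(size);
    this.next = 0;  // slot the next sample overwrites (the oldest one once full)
    this.count = 0; // number of valid samples, at most size
    this.total = 0; // running sum of the valid samples, kept as a JS number
  }

  push(value) {
    if (this.count === this.buf.length) {
      this.total -= this.buf[this.next]; // evict the oldest sample
    } else {
      this.count++;
    }
    this.buf[this.next] = value;
    this.total += this.buf[this.next]; // add the float32-rounded value actually stored
    this.next = (this.next + 1) % this.buf.length;
    return this; // allows speeds.push(speed).avg()
  }

  avg() {
    return this.count > 0 ? this.total / this.count : NaN;
  }
}

const speeds = new MovingAverage(10);
// e.g. in the GPS callback: const avgSpeed = speeds.push(speed).avg();
```

The running total is a regular JS number; with non-integer samples tiny rounding errors can accumulate over very long runs, so recomputing the sum from the buffer occasionally is a reasonable safeguard.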
03-29-2018 15:34 - edited 03-29-2018 15:35
These functions might be made members of the Float32Array prototype, but since performance is a major concern, I guess it would be better if the functions were not polymorphic. Whichever option gives the minimum call overhead should be chosen.
// Option 1
const fl32arr = new Float32Array( 10 );
...
fl32arr.axpy( a, y )
// Option 2
Float32Array.axpy( a, x, y );
const { axpy } = Float32Array;
axpy( a, x, y );
// Option 2a
Math.axpy( a, x, y );
// Option 3
import { axpy } from 'blas-level1'
axpy( a, x, y );
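Call overhead between the options can be compared empirically. The sketch below uses a hypothetical pure-JS axpy and times a namespace lookup (as in Options 2 and 2a) against a destructured local binding; Option 1 would require patching Float32Array.prototype and is left out. Absolute numbers on the watch will differ from a desktop engine:

```javascript
// Hypothetical pure-JS axpy (y := a*x + y), used only to compare call styles.
function axpy(a, x, y) {
  for (let i = 0; i < y.length; i++) y[i] += a * x[i];
  return y;
}

// Options 2/2a style: the function hangs off a namespace object (stand-in for Float32Array or Math).
const Blas = { axpy: axpy };
// Option 2, destructured: a local binding, no property lookup per call.
const { axpy: localAxpy } = Blas;

function timeCalls(label, n, call) {
  const t = Date.now();
  for (let i = 0; i < n; i++) call();
  const ms = Date.now() - t;
  console.log(label, ms + " ms for " + n + " calls");
  return ms;
}

const x = new Float32Array(10).fill(1);
const y = new Float32Array(10);
timeCalls("namespace lookup", 1000, function () { Blas.axpy(0.5, x, y); });
timeCalls("local binding", 1000, function () { localAxpy(0.5, x, y); });
// After 2000 calls, every y[i] is 2000 * 0.5 = 1000.
```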
03-29-2018 16:22
Awesome, thanks. Will discuss this with the relevant folks!