Skip to content

[Discussion] Floating point precision #104

@MichielDeMey

Description

@MichielDeMey

Hi

We're currently using data-forge for a fintech project and we're loving it!
Current usage is to aggregate a list of financial transactions by month and create a sum, basically giving us a time-series representation of our data.

This works great, however JavaScript is notorious for (incorrectly) handling floating points and losing precision in the process.
Unfortunately, when working with financial data you'd need that floating point precision sweetness.

Relevant snippet:

const result = data.groupBy((row) => moment(row.due_date).startOf(aggregation).format("YYYY-MM"))
  .select((group) => ({
    type: group.first().type,
    due_date: group.first().due_date,
    amount: group.deflate((row) => row.amount).sum()
  }))

Which will return something like

__index__   type     due_date    amount             
----------  -------  ----------  -------------------
2020-01-01  payable  2020-01-01  -300046.7899999998 
2020-02-01  payable  2020-02-01  -399881.57999999955
2020-03-01  payable  2020-03-01  -4724.579999999969 
2020-04-01  payable  2020-04-01  29564.520000000066 
2021-01-01  payable  2021-01-01  1750               

As you can see, the floating point precision is not very favourable.
Looking at the code, the sum function does exactly what you'd expect as it just sums up two number using plain JavaScript.

https://github.com/data-forge/data-forge-ts/blob/master/src/lib/series.ts#L4011-L4019

Floating point precision has been an issue for a long time and have been solved by several libraries already.
I noticed this library already using numeral, which can handle floating point precision.

With that in mind, we wrote a custom aggregation function to use numeral to fix our floating point precision.

const result = data.groupBy((row) => moment(row.due_date).startOf(aggregation).format("YYYY-MM"))
  .select((group) => ({
    type: group.first().type,
    due_date: group.first().due_date,
    amount: group
      .deflate((row) => row.amount)
      .aggregate((prev: number, value: number) =>
        numeral(prev).add(value).value() // Use Numeral to add two numbers, returning the resulting value
      )
  }))

And correctly returns

__index__   type     due_date    amount    
----------  -------  ----------  ----------
2020-01-01  payable  2020-01-01  -300046.79
2020-02-01  payable  2020-02-01  -399881.58
2020-03-01  payable  2020-03-01  -4724.58  
2020-04-01  payable  2020-04-01  29564.52  
2021-01-01  payable  2021-01-01  1750      

I created a separate CodeSandbox with the actual code so you can play with it:
https://codesandbox.io/s/eloquent-elbakyan-16v5i?file=/src/App.tsx


With this in mind, I'd like to open up the discussion of handling floating point precision in data-forge.

The way I see it, we can either

a) Provide support for floating point precision using numeral
This would improve the developer experience of this library as floating point precision "just works".
(For the manipulation functions supported by numeral (add, subtract, multiply, divide))

e.g.

amount: group.deflate((row) => row.amount).sum(true) // pass in "true" to allow floating point precision

or a separate util function

amount: group.deflate((row) => row.amount).sumPrecise()

b) Clearly document that data-forge does not handle floating point precision out-of-the-box and provide an example on how to keep floating point precision using numeral or another library (numbro, bigdecimal.js etc.).

Curious to hear your thoughts about this!

Metadata

Metadata

Assignees

No one assigned

    Labels

    version-2Saving this for Data-Forge version 2.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions