-
-
Notifications
You must be signed in to change notification settings - Fork 76
Description
Hi
We're currently using data-forge for a fintech project and we're loving it!
Current usage is to aggregate a list of financial transactions by month and create a sum, basically giving us a time-series representation of our data.
This works great, however JavaScript is notorious for (incorrectly) handling floating points and losing precision in the process.
Unfortunately, when working with financial data you'd need that floating point precision sweetness.
Relevant snippet:
const result = data.groupBy((row) => moment(row.due_date).startOf(aggregation).format("YYYY-MM"))
.select((group) => ({
type: group.first().type,
due_date: group.first().due_date,
amount: group.deflate((row) => row.amount).sum()
}))
Which will return something like
__index__ type due_date amount
---------- ------- ---------- -------------------
2020-01-01 payable 2020-01-01 -300046.7899999998
2020-02-01 payable 2020-02-01 -399881.57999999955
2020-03-01 payable 2020-03-01 -4724.579999999969
2020-04-01 payable 2020-04-01 29564.520000000066
2021-01-01 payable 2021-01-01 1750
As you can see, the floating point precision is not very favourable.
Looking at the code, the sum function does exactly what you'd expect as it just sums up two number using plain JavaScript.
https://github.com/data-forge/data-forge-ts/blob/master/src/lib/series.ts#L4011-L4019
Floating point precision has been an issue for a long time and have been solved by several libraries already.
I noticed this library already using numeral, which can handle floating point precision.
With that in mind, we wrote a custom aggregation function to use numeral to fix our floating point precision.
const result = data.groupBy((row) => moment(row.due_date).startOf(aggregation).format("YYYY-MM"))
.select((group) => ({
type: group.first().type,
due_date: group.first().due_date,
amount: group
.deflate((row) => row.amount)
.aggregate((prev: number, value: number) =>
numeral(prev).add(value).value() // Use Numeral to add two numbers, returning the resulting value
)
}))
And correctly returns
__index__ type due_date amount
---------- ------- ---------- ----------
2020-01-01 payable 2020-01-01 -300046.79
2020-02-01 payable 2020-02-01 -399881.58
2020-03-01 payable 2020-03-01 -4724.58
2020-04-01 payable 2020-04-01 29564.52
2021-01-01 payable 2021-01-01 1750
I created a separate CodeSandbox with the actual code so you can play with it:
https://codesandbox.io/s/eloquent-elbakyan-16v5i?file=/src/App.tsx
With this in mind, I'd like to open up the discussion of handling floating point precision in data-forge.
The way I see it, we can either
a) Provide support for floating point precision using numeral
This would improve the developer experience of this library as floating point precision "just works".
(For the manipulation functions supported by numeral (add
, subtract
, multiply
, divide
))
e.g.
amount: group.deflate((row) => row.amount).sum(true) // pass in "true" to allow floating point precision
or a separate util function
amount: group.deflate((row) => row.amount).sumPrecise()
b) Clearly document that data-forge does not handle floating point precision out-of-the-box and provide an example on how to keep floating point precision using numeral or another library (numbro, bigdecimal.js etc.).
Curious to hear your thoughts about this!