Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(WIP) Generic DataFrame Math #6664

Open
wants to merge 42 commits into
base: feature/4.0
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
42 commits
Select commit Hold shift + click to select a range
bdcb2c5
Change PrimitiveDataFrameColumnComputations to use Generic Math
JakeRadMSFT May 9, 2023
0e00904
Clean up and reduce helpers
JakeRadMSFT May 9, 2023
d8872b2
Clean up variables and consistency
JakeRadMSFT May 9, 2023
a29a5aa
Simplify initialize code
JakeRadMSFT May 9, 2023
c4d2999
Update GroupBy Product test
JakeRadMSFT May 9, 2023
28f63f0
Remove the alias types for now and get all tests pass.
JakeRadMSFT May 10, 2023
0e89dd0
Get rid of DateTimeDateFrameColumn alias
JakeRadMSFT May 10, 2023
b5de012
Get rid of converters
JakeRadMSFT May 10, 2023
53ef787
Convert ColumnArithmetic to generics
JakeRadMSFT May 10, 2023
4f56d89
Convert float/decimal/double versions of ColumnArithmetic to GenericMath
JakeRadMSFT May 10, 2023
c42c1e2
Remove all TT files.
JakeRadMSFT May 10, 2023
9836b30
Merge remote-tracking branch 'origin/main' into u/jakerad/generic-math
JakeRadMSFT May 11, 2023
e7f9daf
Fix using ToList on Row Collection
asmirnov82 May 15, 2023
f4d3a6e
Fix DataFrame bounds checking on indexing elements
asmirnov82 May 15, 2023
f91c6bf
Merge pull request #2 from JakeRadMSFT/u/jakerad/generic-math
asmirnov82 May 20, 2023
c1fc16e
Fix build of Microsoft.Data.Analysis.Tests
asmirnov82 May 20, 2023
f551fd1
Merge remote-tracking branch 'origin/main' into jakerad_generic_math
asmirnov82 May 20, 2023
c080c46
Fix merge issues
asmirnov82 May 20, 2023
c4388dc
Fix the behavior or column SetName method
asmirnov82 May 20, 2023
9218375
Merge branch 'fix_utility_parity_foreach' into jakerad_generic_math
asmirnov82 May 21, 2023
8cd3d39
Merge commit 'f4d3a6ecac44fbc83e121c9476fe240bf9329316' into jakerad_…
asmirnov82 May 21, 2023
92e2d72
Reset RowCount to zero, when DataFrame is empty
asmirnov82 May 21, 2023
e88b642
Remove redundant column names collection from DataFrameColumnCollection
asmirnov82 May 21, 2023
91d0f1f
Add missing implementation for datetime relevant arrow type
asmirnov82 May 21, 2023
4ebddeb
Fix DataFrame Merge issue
asmirnov82 May 21, 2023
da2cb99
Clean switch by type in binary operations
asmirnov82 May 25, 2023
1de06be
Simplify getting mutable buffers
asmirnov82 May 25, 2023
6be198d
Don't convert buffer to mutable if it not required
asmirnov82 May 25, 2023
2ba8944
Merge remote-tracking branch 'origin/main' into u/jakerad/generic-math
JakeRadMSFT Jun 23, 2023
a20d6e0
Merge remote-tracking branch 'origin/main' into jakerad_generic_math
JakeRadMSFT Jun 23, 2023
b78c0a7
Merge branch 'u/jakerad/generic-math' into jakerad_generic_math
JakeRadMSFT Jun 23, 2023
d24a594
Merge pull request #2 from asmirnov82/jakerad_generic_math
JakeRadMSFT Jun 23, 2023
373fc04
Merge branch 'feature/4.0' into u/jakerad/generic-math
JakeRadMSFT Jun 27, 2023
d3a0aae
Provide ability to filter by null value
asmirnov82 Jul 6, 2023
ab7e698
Add comments
asmirnov82 Jul 6, 2023
b0daf74
Fix merge issues (broken build)
asmirnov82 Jul 6, 2023
663db1f
Step 1
asmirnov82 Jul 6, 2023
33b6432
Cherry pick Step 2 commit from 6733
asmirnov82 Jul 6, 2023
11b5d6f
Fixed code review findings
asmirnov82 Jul 6, 2023
cb39ed8
Cherry pick PR 6724 (fix dataframe arithmetics for columns having sev…
asmirnov82 Jul 6, 2023
2856d3a
Fix tests
JakeRadMSFT Jul 6, 2023
1d57c45
Merge pull request #3 from asmirnov82/jakerad_generic_math
JakeRadMSFT Jul 6, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion eng/Versions.props
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@
<SystemTextJsonVersion>6.0.1</SystemTextJsonVersion>
<SystemThreadingChannelsVersion>4.7.1</SystemThreadingChannelsVersion>
<!-- Other product dependencies -->
<ApacheArrowVersion>2.0.0</ApacheArrowVersion>
<ApacheArrowVersion>11.0.0</ApacheArrowVersion>
<GoogleProtobufVersion>3.19.6</GoogleProtobufVersion>
<LightGBMVersion>2.3.1</LightGBMVersion>
<MicrosoftCodeAnalysisAnalyzersVersion>3.3.0</MicrosoftCodeAnalysisAnalyzersVersion>
Expand Down
7 changes: 7 additions & 0 deletions src/Microsoft.Data.Analysis/AssemblyAttributes.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using System.Runtime.Versioning;

[assembly: RequiresPreviewFeatures]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because you're targeting .NET 6?

The API surface of the generic math interfaces changed between 6 and 7 (such as moving all interfaces into namespace System.Numerics) and it would likely be better to keep this targeting 7/8

Copy link
Contributor Author

@JakeRadMSFT JakeRadMSFT May 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - figured targeting the LTS made sense. @ericstj Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we're making this change, we'd want to target .NET 8 (LTS) and plan to ship it around or after November

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach we have recently been taking with NuGet packages here and in dotnet/runtime is to retain support for in-support frameworks. This is better than the previous approach which was never drop support for anything due to compatibility. .NET 6 remains in support until November of 2024 so our current plan would be to keep supporting it until then.

ML.NET's servicing story so far has been "update to latest" - we only maintain one active release branch. I could imagine a change where we start dropping in-support frameworks from our nuget package, but in conjunction with such a change we'd need to maintain more servicing branches, and as a result servicing changes would become more expensive since we'd need to port them between many codebases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be a special consideration for .NET 6 vs .NET 8 and won't be "normal" as we don't often get features as big as generic math.

So we'd need to maintain a servicing branch for 6, but then wouldn't need to the same for 8 vs 10 without some other feature that is as meaningfully impactful

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case what's the benefit from not supporting 6.0? Is it just removing that attribute or is there more to it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were concrete breaking changes to the API surface of the generic math feature between .NET 6 and .NET 7

We'd be maintaining two copies of the generic math code, one in preview and one "feature complete" as well as dealing with any subtle bugs or missing support in the broader product from it being an early preview.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR also removes support for netstandard2.0 and full .Net frameworks (like 4.8, which is as far as I undertans also LTS). The company I worked before, used DataFrame on .Net 4.6.2 and they didn't have plan to migrate to .net core in the nearest future.
Could you please make at least one more release of Microsoft.Data.Analysis nuget targeting netstandart2.0 for such clients before moving to ,Net 8 (6) only?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asmirnov82 we're still figuring out the timelines for this change but it will be after the next release.

Thanks for working with us!

33 changes: 0 additions & 33 deletions src/Microsoft.Data.Analysis/BooleanDataFrameColumn.cs

This file was deleted.

33 changes: 0 additions & 33 deletions src/Microsoft.Data.Analysis/ByteDataFrameColumn.cs

This file was deleted.

33 changes: 0 additions & 33 deletions src/Microsoft.Data.Analysis/CharDataFrameColumn.cs

This file was deleted.

Loading