Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Vec2 #9830

Open
10 tasks
Dunqing opened this issue Mar 17, 2025 · 6 comments
Open
10 tasks

Improve Vec2 #9830

Dunqing opened this issue Mar 17, 2025 · 6 comments
Assignees
Labels
A-ast Area - AST

Comments

@Dunqing
Copy link
Member

Dunqing commented Mar 17, 2025

The following tasks are for Vec, I intend to work on them in order.

@Dunqing Dunqing added the A-ast Area - AST label Mar 17, 2025
@Dunqing Dunqing self-assigned this Mar 17, 2025
@overlookmotel
Copy link
Contributor

overlookmotel commented Mar 17, 2025

Personally, I would say the best next step would be to update all the methods so they follow std. After that, we can start changing Vec to our desired shape and simplifying it, removing ZSTs support, optimizing etc. But if we start from std's implementation, we know we're starting from a good place.

The only thing I'm aware of which will be really tricky porting from std is the Extend trait. std::vec::Vec has specialized implementations of extend for slice iterators etc, where the number of elements the iterator yields is guaranteed - so it can cut out bounds checks etc. But this requires nightly features (specialization and TrustedLen trait), so unfortunately it's not possible for us.

To get some of those optimizations, we'd need separate methods e.g. extend_from_slice (which std has) and extend_from_array (which it doesn't).


The only thing I think that's missing from your list is removing the #[expect(...)] attrs and fixing the warnings that are currently suppressed by them. But, in my opinion, probably best to do std changes first - hopefully that will resolve most of the things which would cause lint warnings anyway.

@overlookmotel
Copy link
Contributor

It would also be good to remove Bump from the implementation - so we completely divorce Vec from bumpalo.

We could wrap Bump in another structure called e.g. Arena and Vec::new_in etc could take an &'a Arena instead of &'b Bump. resize could become a method on Arena. That would make it easier later on to turn Arena into our own custom allocator, and remove bumpalo entirely.

By the way, the reason why I think we should have separate Allocator and Arena types is that I have in mind further down the line that we may want Allocator to contain multiple arenas. e.g.:

struct Allocator {
    /// Stores the AST (same as now)
    ast: Arena,
    /// Stores `Semantic`
    semantic: Arena,
    /// "Scratch" space for temporary data in transformer etc, to avoid heap allocations
    temp: Arena,
}

But we can make Allocator deref to Arena, so you can also call Vec::new_in with an &Allocator. So there'd be no API changes for all our code that uses Vec.

How about we split the work - I'll work on the Arena changes, and you work on Vec?

@overlookmotel
Copy link
Contributor

overlookmotel commented Mar 17, 2025

A couple more vague thoughts:

Vec and RawVec

The len and capacity fields are going to have to be in the same struct to be able to have them as u32s and reduce the size of Vec. That's inescapable, but is unfortunate in a way, because the Vec / RawVec division is useful for 2 reasons:

  1. "Division of labour" with RawVec handling the low-level tasks of allocating etc, and Vec adding higher-level abstractions on top of it.
  2. RawVec is not generic over T, so it is better for compilation times.

I wonder if we can preserve this multi-level approach somehow? (not sure how)

Index Vec?

If we're changing the type of len and capacity, do we want to make that generic rather than a hard-coded u32?

We'd then also have an arena-allocated IndexVec<K, T> type, using the same code.

Or maybe that's unnecessary complication. All our IndexVec keys are NonMaxU32 anyway, so we don't actually need that flexibility.

@Dunqing
Copy link
Member Author

Dunqing commented Mar 18, 2025

A couple more vague thoughts:

Vec and RawVec

The len and capacity fields are going to have to be in the same struct to be able to have them as u32s and reduce the size of Vec. That's inescapable, but is unfortunate in a way, because the Vec / RawVec division is useful for 2 reasons:

  1. "Division of labour" with RawVec handling the low-level tasks of allocating etc, and Vec adding higher-level abstractions on top of it.
  2. RawVec is not generic over T, so it is better for compilation times.

I wonder if we can preserve this multi-level approach somehow? (not sure how)

I was thinking of changing the RawVec to a kind of a trait, then we can still impl the trait for other higher-level abstractions, but I am not sure, I will try while working on the above tasks

Index Vec?

If we're changing the type of len and capacity, do we want to make that generic rather than a hard-coded u32?

We'd then also have an arena-allocated IndexVec<K, T> type, using the same code.

Or maybe that's unnecessary complication. All our IndexVec keys are NonMaxU32 anyway, so we don't actually need that flexibility.

@Dunqing
Copy link
Member Author

Dunqing commented Mar 18, 2025

It would also be good to remove Bump from the implementation - so we completely divorce Vec from bumpalo.

We could wrap Bump in another structure called e.g. Arena and Vec::new_in etc could take an &'a Arena instead of &'b Bump. resize could become a method on Arena. That would make it easier later on to turn Arena into our own custom allocator, and remove bumpalo entirely.

By the way, the reason why I think we should have separate Allocator and Arena types is that I have in mind further down the line that we may want Allocator to contain multiple arenas. e.g.:

struct Allocator {
/// Stores the AST (same as now)
ast: Arena,
/// Stores Semantic
semantic: Arena,
/// "Scratch" space for temporary data in transformer etc, to avoid heap allocations
temp: Arena,
}
But we can make Allocator deref to Arena, so you can also call Vec::new_in with an &Allocator. So there'd be no API changes for all our code that uses Vec.

How about we split the work - I'll work on the Arena changes, and you work on Vec?

That's a good idea, I am okay to work on Vec changes. I guess this should be started when all tasks are done, right?

@Dunqing
Copy link
Member Author

Dunqing commented Mar 18, 2025

Personally, I would say the best next step would be to update all the methods so they follow std. After that, we can start changing Vec to our desired shape and simplifying it, removing ZSTs support, optimizing etc. But if we start from std's implementation, we know we're starting from a good place.

The only thing I'm aware of which will be really tricky porting from std is the Extend trait. std::vec::Vec has specialized implementations of extend for slice iterators etc, where the number of elements the iterator yields is guaranteed - so it can cut out bounds checks etc. But this requires nightly features (specialization and TrustedLen trait), so unfortunately it's not possible for us.

To get some of those optimizations, we'd need separate methods e.g. extend_from_slice (which std has) and extend_from_array (which it doesn't).

The only thing I think that's missing from your list is removing the #[expect(...)] attrs and fixing the warnings that are currently suppressed by them. But, in my opinion, probably best to do std changes first - hopefully that will resolve most of the things which would cause lint warnings anyway.

Thanks for your suggestions, I've updated the checklist and adjusted the order.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ast Area - AST
Projects
None yet
Development

No branches or pull requests

2 participants