-
Notifications
You must be signed in to change notification settings - Fork 316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add array_permutations()
#1013
add array_permutations()
#1013
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi there, thanks for you effort.
I fear Permutations
has some quirks that I do not want to entrench in our code. I think the main problem boils down to the fact that it uses many different states instead of always working on indices
. (I see that it may be nice to sometimes avoid constructing it, but it appears to complicate the surrounding code.)
We should first try to clean up Permutations
to the usual pattern where we "operate on indices
and map indices
onto elements", so that we can - afterwards - go with the existing PoolIndex
(instead of a new PermIter
).
Unless this clean-up is infeasible, I want to avoid new abstractions. Thus, I'd like to postpone this PR until we have evidence that Permutations
is as simple as possible.
pub type ArrayPermutations<I, const K: usize> = PermutationsGeneric<I, [usize; K]>; | ||
/// Iterator for const generic permutations returned by | ||
/// [`.array_permutations()`](crate::Itertools::array_permutations) | ||
pub type Permutations<I> = PermutationsGeneric<I, Vec<<I as Iterator>::Item>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the second generic parameter Vec<<I as Iterator>::Item>
seems odd when PermutationsGeneric
has [usize; K]
.
[Array]Combinations
have Vec<usize>
and [usize; K]
, respectively. Unless there's a good reason, we should adopt this pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was an error, array_permutations()
was only working on Iterator<Item = usize>
before. Fixed now.
vals: LazyBuffer<I>, | ||
state: PermutationState, | ||
_item: PhantomData<Item>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the decision Vec<usize>
vs [usize; K]
should be propagated onto PermutationState
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the current algorithm, PermutationState::Loaded.cycles
could have that type, but that field is not directly being used to index in the buffer, so it's a "coincidence".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, using a PoolIndex
for cycles itself is wrong, but we already exploit the fact that the length is k
(https://docs.rs/itertools/latest/src/itertools/permutations.rs.html#112). So it should possibly be PoolIdx<usize>::Item
.
.chain(once(*min_n)) | ||
.map(|i| vals[i].clone()) | ||
.collect(); | ||
let item = Item::extract_start(vals, *k, *min_n); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a separate variant Buffered
prevents us from using PoolIndex
. Maybe always constructing indices
might allow PoolIndex
here, simplifying implementation and improving maintenance.
match state { | ||
PermutationState::Start { k: 0 } => { | ||
*state = PermutationState::End; | ||
Some(Vec::new()) | ||
Some(Item::extract_start(vals, 0, 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see calling extract_start
with 0 here has the merit to save one function, but it requires us to account for the special case len==0
in extract_start
, right? Would unconditionally working on indices
simplify and get rid of special cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that whatever code goes here needs to generically produce a [T; K]
even though we know it will only ever be called if K == 0
. We could make this a separate function that does whatever arbitrary thing for K > 0
(say buf.get_array([0; K])
) since that code will never get called. The first thing I tried that doesn't work is the following:
impl PoolIndex for [usize; K]{
...
fn empty_item() -> Option<Self::Item> {
if K == 0 {
Some([])
} else {
None
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, this attempt won't work. However, we could introduce PoolIndex::from_fn
to do whatever we want and return either a Vec
or an array.
} | ||
|
||
fn extract_from_prefix<I: Iterator<Item = T>>(buf: &LazyBuffer<I>, indices: &[usize]) -> Self { | ||
buf.get_array_from_fn(|i| indices[i]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this essentially buf.get_array(indices)
? (Would support my theory that PermItem
should actually be part of PoolIndex
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that when this function is called, there does not exist an instance of [usize; K]
to call buf.get_array()
on. We could do something like buf.get_array(indices.try_into().unwrap())
instead if that seems better.
|
||
/// A type that can be picked out from a pool or buffer of items from an inner iterator | ||
/// and in a generic way given their indices. | ||
pub trait PermItem<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this new trait really necessary? Couldn't we re-use/generalize/extend PoolIndex
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is what I started with but it seems that the operations required for ArrayPermutations
are different from the ones for {Array}Combinations
. Even in the existing Permutations
implementation you can see that there are as many instances of directly indexing the buffer as calling buf.get_at()
. We could create an intermediate slice/array of indices and call versions of these methods moved to the PoolIndex
trait but that seems like an unnecessary level of indirection.
if len == 0 { | ||
Vec::new() | ||
} else { | ||
(0..len - 1) | ||
.chain(once(last)) | ||
.map(|i| buf[i].clone()) | ||
.collect() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Special cases that might go away if we always worked on indices
.
My original intention was to reuse I can think about modifying the algorithm to only keep the |
Thanks for your reply. I experimented a bit (https://github.com/phimuemue/rust-itertools/tree/array_permutations_via_poolindex) and I think we should extend (and probably rename) |
Thanks! Does it make sense to flip trait PoolIndex<T> {
type Item;
fn extract_item(&self, pool) -> Self::Item
} have something like: trait PoolOutput<T, Idx> {
fn extract(pool, idx: &Self::Idx) -> Self;
} Then you can have impl PoolOutput<T, [usize]> for Vec<T> { ... }
impl PoolOutput<T, [usize; K]> for [T; K] { ... } and even struct IndexFn<F>;
impl<F: FnMut(usize) -> usize> PoolOutput<T, IndexFn<F>> for [T; K] {...}
impl<F: FnMut(usize) -> usize> PoolOutput<T, (IndexFn<F>, usize)> for Vec<T> {...} instead of a A big disadvantage would be that |
I can't reliably predict how simple the outcome would be, but my gut feeling is that |
I've tried to make some simplifications at https://github.com/ronnodas/itertools/tree/array-permutations-attempt-2. I do agree that |
Maybe Note if you'd like to convert this into a PR (please excuse this note you did not ask for, but I want to clarify things before huge PRs): a916d37 is hard to review, because it lumps different things into one commit (moving |
@phimuemue Thanks! Broke up into smaller commits and filed as #1014. |
Completes a variant of #1001.