Add 'indirect_sort' #117

Open: wants to merge 7 commits into base: develop
2 changes: 2 additions & 0 deletions doc/algorithm.qbk
@@ -233,6 +233,8 @@ Convert a sequence of hexadecimal characters into a sequence of integers or char
Convert a sequence of integral types into a lower case hexadecimal sequence of characters
[endsect:hex_lower]

[include indirect_sort.qbk]

[include is_palindrome.qbk]

[include is_partitioned_until.qbk]
71 changes: 71 additions & 0 deletions doc/indirect_sort.qbk
@@ -0,0 +1,71 @@
[/ File indirect_sort.qbk]

[section:indirect_sort indirect_sort ]

[/license
Copyright (c) 2023 Marshall Clow

Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
]

There are times that you want a sorted version of a sequence, but for some reason or another, you don't really want to sort them. Maybe the elements in the sequence are non-copyable (or non-movable), or the sequence is const, or they're just really expensive to move around. An example of this might be a sequence of records from a database.

Nevertheless, you might want to sort them. That's where indirect sorting comes in. In a "normal" sort, the elements of the sequence to be sorted are shuffled in place. In indirect sorting, the elements are unchanged, but the sort algorithm returns to you a "permutation" of the elements that, when applied, will leave the elements in the sequence in a sorted order.

Review comment:

How about a slightly shorter wording that, in particular, avoids mentioning the need to sort twice:

Suggested change
There are times that you want a sorted version of a sequence, but for some reason or another, you don't really want to sort them. Maybe the elements in the sequence are non-copyable (or non-movable), or the sequence is const, or they're just really expensive to move around. An example of this might be a sequence of records from a database.
Nevertheless, you might want to sort them. That's where indirect sorting comes in. In a "normal" sort, the elements of the sequence to be sorted are shuffled in place. In indirect sorting, the elements are unchanged, but the sort algorithm returns to you a "permutation" of the elements that, when applied, will leave the elements in the sequence in a sorted order.
There are times that you want a sorted version of a sequence, but for some reason you don't want to modify it. Maybe the elements in the sequence can't be moved/copied, e.g. the sequence is const, or they're just really expensive to move around. An example of this might be a sequence of records from a database.
That's where indirect sorting comes in. In a "normal" sort, the elements of the sequence to be sorted are shuffled in place. In indirect sorting, the elements are unchanged, but the sort algorithm returns a "permutation" of the elements that, when applied, will put the elements in the sequence in a sorted order.

Are the double-spaces after each sentence intended?


Say you have a sequence `[first, last)` of 1000 items that are expensive to swap:

Review comment:

Suggested change
Say you have a sequence `[first, last)` of 1000 items that are expensive to swap:
Assume a sequence `[first, last)` of 1000 items that are expensive to swap:

```
std::sort(first, last); // ['O(N lg N)] comparisons and ['O(N lg N)] swaps (of the element type).
```

On the other hand, using indirect sorting:
```
auto permutation = boost::algorithm::indirect_sort(first, last); // ['O(N lg N)] comparisons and ['O(N lg N)] swaps (of size_t).
boost::algorithm::apply_permutation(first, last, permutation.begin(), permutation.end()); // ['O(N)] swaps (of the element type)
```

If the element type is sufficiently expensive to swap, then roughly 10,000 swaps of `size_t` (N lg N for N = 1000) plus at most 1000 swaps of the element type could be cheaper than roughly 10,000 swaps of the element type.
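The linear bound on element swaps comes from walking the cycles of the permutation. The following is a minimal sketch using only the standard library. `apply_in_place` is a hypothetical helper written for illustration; it is not the implementation of `boost::algorithm::apply_permutation`, and it assumes that `perm[i]` is the index of the element that belongs at position `i`.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper for illustration; NOT boost::algorithm::apply_permutation.
// Assumption: perm[i] is the index of the element that belongs at position i.
// perm is taken by value because the loop overwrites it as scratch space.
template <typename T>
std::size_t apply_in_place(std::vector<T>& items, std::vector<std::size_t> perm) {
    assert(items.size() == perm.size());
    std::size_t swaps = 0;
    for (std::size_t i = 0; i < perm.size(); ++i) {
        std::size_t current = i;
        // Walk the cycle that starts at i, fixing one position per swap.
        while (perm[current] != i) {
            const std::size_t next = perm[current];
            std::swap(items[current], items[next]);
            ++swaps;
            perm[current] = current;  // mark this position as finished
            current = next;
        }
        perm[current] = current;      // the last position of the cycle is already correct
    }
    return swaps;                     // a cycle of length k costs k - 1 swaps
}
```

Because a cycle of length k needs only k - 1 swaps, the total number of element swaps in this sketch never exceeds N - 1, regardless of how many comparisons the sort itself performed.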

Or maybe you don't need the elements to actually be sorted - you just want to traverse them in a sorted order:
```
auto permutation = boost::algorithm::indirect_sort(first, last);
for (size_t idx: permutation)
    std::cout << first[idx] << std::endl;
```
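The traversal idea can be tried without Boost. The sketch below assumes C++11 and uses only the standard library; `sorted_indices` and `join_sorted` are hypothetical names chosen for this example and merely stand in for `indirect_sort`. The input is taken by const reference, so the elements are never moved or copied by the sort.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical stand-in for indirect_sort, standard library only.
template <typename RAIterator>
std::vector<std::size_t> sorted_indices(RAIterator first, RAIterator last) {
    std::vector<std::size_t> perm(static_cast<std::size_t>(std::distance(first, last)));
    std::iota(perm.begin(), perm.end(), std::size_t(0));  // 0, 1, ..., N-1
    // Compare the elements the indices refer to; only the indices are swapped.
    std::sort(perm.begin(), perm.end(),
              [first](std::size_t a, std::size_t b) { return first[a] < first[b]; });
    return perm;
}

// Visits a const sequence in sorted order and joins the elements with spaces.
inline std::string join_sorted(const std::vector<std::string>& words) {
    std::string out;
    for (std::size_t idx : sorted_indices(words.begin(), words.end())) {
        if (!out.empty()) out += ' ';
        out += words[idx];
    }
    return out;
}
```

The only extra storage is one `size_t` per element, which is the trade-off that makes a sorted traversal of a const sequence possible.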


More to come here ....

[heading interface]

The function `indirect_sort` returns a `vector<size_t>` containing the permutation necessary to put the input sequence into sorted order. One version uses `std::less` to do the comparisons; the other lets the caller pass a predicate to do the comparisons.

```
template <typename RAIterator>
std::vector<size_t> indirect_sort (RAIterator first, RAIterator last);

template <typename RAIterator, typename BinaryPredicate>
std::vector<size_t> indirect_sort (RAIterator first, RAIterator last, BinaryPredicate pred);
```
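To make the relationship between the two overloads concrete, here is a standard-library-only sketch of the same shape. `index_sort` is a hypothetical name used so as not to suggest that this is the Boost implementation; the two-argument overload simply forwards to the predicate overload with `std::less` of the value type.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical sketch mirroring the predicate overload of the interface.
template <typename RAIterator, typename BinaryPredicate>
std::vector<std::size_t> index_sort(RAIterator first, RAIterator last, BinaryPredicate pred) {
    std::vector<std::size_t> perm(static_cast<std::size_t>(std::distance(first, last)));
    std::iota(perm.begin(), perm.end(), std::size_t(0));
    // The predicate sees elements, never indices.
    std::sort(perm.begin(), perm.end(),
              [first, pred](std::size_t a, std::size_t b) { return pred(first[a], first[b]); });
    return perm;
}

// Hypothetical sketch of the default overload: forwards with std::less.
template <typename RAIterator>
std::vector<std::size_t> index_sort(RAIterator first, RAIterator last) {
    typedef typename std::iterator_traits<RAIterator>::value_type value_type;
    return index_sort(first, last, std::less<value_type>());
}
```

Passing `std::greater<int>()` as the predicate yields the indices in descending order of the elements, while the two-argument form yields ascending order.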

[heading Examples]

[heading Iterator Requirements]

`indirect_sort` requires random-access iterators.

[heading Complexity]

Both variants of `indirect_sort` run in ['O(N lg N)] time; they are not more (or less) efficient than `std::sort`. There is an extra layer of indirection on each comparison, but all of the swaps are done on values of type `size_t`.
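Since only `size_t` values are swapped, the element type does not need to be copyable or movable at all. The sketch below illustrates this with the standard library only; `Pinned` and `order_of` are hypothetical names invented for the example, and C++11 is assumed.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <type_traits>
#include <vector>

// Hypothetical record type: neither copyable nor movable, so std::sort
// could not reorder a sequence of these directly.
struct Pinned {
    Pinned(int k) : key(k) {}
    Pinned(const Pinned&) = delete;
    Pinned& operator=(const Pinned&) = delete;
    int key;
};

static_assert(!std::is_move_constructible<Pinned>::value, "Pinned cannot be moved");
static_assert(!std::is_copy_assignable<Pinned>::value, "Pinned cannot be assigned");

// Sorts indices into a fixed array of Pinned; the array itself is never modified.
template <std::size_t N>
std::vector<std::size_t> order_of(const Pinned (&items)[N]) {
    std::vector<std::size_t> perm(N);
    std::iota(perm.begin(), perm.end(), std::size_t(0));
    std::sort(perm.begin(), perm.end(),
              [&items](std::size_t a, std::size_t b) { return items[a].key < items[b].key; });
    return perm;
}
```

The `static_assert` lines document that the element type really cannot be moved, yet the index sort still compiles because every swap operates on `size_t`.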

[heading Exception Safety]

[heading Notes]

[endsect]

[/ File indirect_sort.qbk
Copyright 2023 Marshall Clow
Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt).
]
83 changes: 83 additions & 0 deletions include/boost/algorithm/indirect_sort.hpp
@@ -0,0 +1,83 @@
/*
Copyright (c) Marshall Clow 2023.

Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

*/

/// \file indirect_sort.hpp
/// \brief indirect sorting algorithms
/// \author Marshall Clow
///

#ifndef BOOST_ALGORITHM_IS_INDIRECT_SORT
#define BOOST_ALGORITHM_IS_INDIRECT_SORT

Review comment:

Unusual include guard. Why not BOOST_ALGORITHM_INDIRECT_SORT?


#include <algorithm> // for std::sort (and others)
#include <cstddef> // for size_t
#include <functional> // for std::less
#include <iterator> // for std::distance, std::iterator_traits
#include <vector> // for std:;vector

Review comment:

Typo:

Suggested change
#include <vector> // for std:;vector
#include <vector> // for std::vector

But is that comment really required?


#include <boost/algorithm/cxx11/iota.hpp>

namespace boost { namespace algorithm {

namespace detail {

template <class Predicate, class Iter>
struct indirect_predicate {
    indirect_predicate (Predicate pred, Iter iter)
    : pred_(pred), iter_(iter) {}

    bool operator ()(size_t a, size_t b) const {
        return pred_(iter_[a], iter_[b]);
    }

    Predicate pred_;
    Iter iter_;
};

}

typedef std::vector<size_t> Permutation;

// ===== sort =====

/// \fn indirect_sort (RAIterator first, RAIterator last, Pred pred)
/// \returns a permutation of the elements in the range [first, last)
/// such that when the permutation is applied to the sequence,
/// the result is sorted according to the predicate pred.
///
/// \param first The start of the input sequence
/// \param last The end of the input sequence
/// \param pred The predicate to compare elements with
///
template <typename RAIterator, typename Pred>
std::vector<size_t> indirect_sort (RAIterator first, RAIterator last, Pred pred) {

Review comment:

Suggested change
std::vector<size_t> indirect_sort (RAIterator first, RAIterator last, Pred pred) {
Permutation indirect_sort (RAIterator first, RAIterator last, Pred pred) {

    Permutation ret(std::distance(first, last));
    boost::algorithm::iota(ret.begin(), ret.end(), size_t(0));
    std::sort(ret.begin(), ret.end(),
              detail::indirect_predicate<Pred, RAIterator>(pred, first));
    return ret;
}

/// \fn indirect_sort (RAIterator first, RAIterator las )

Review comment:

Suggested change
/// \fn indirect_sort (RAIterator first, RAIterator las )
/// \fn indirect_sort (RAIterator first, RAIterator last)

/// \returns a permutation of the elements in the range [first, last)
/// such that when the permutation is applied to the sequence,
/// the result is sorted according to the predicate pred.

Review comment:

Suggested change
/// the result is sorted according to the predicate pred.
/// the result is sorted in non-descending order.

///
/// \param first The start of the input sequence
/// \param last The end of the input sequence
///
template <typename RAIterator>
std::vector<size_t> indirect_sort (RAIterator first, RAIterator last) {

Review comment:

Suggested change
std::vector<size_t> indirect_sort (RAIterator first, RAIterator last) {
Permutation indirect_sort (RAIterator first, RAIterator last) {

    return indirect_sort(first, last,
        std::less<typename std::iterator_traits<RAIterator>::value_type>());
}

// ===== stable_sort =====
// ===== partial_sort =====
// ===== nth_element =====
}}

#endif // BOOST_ALGORITHM_IS_INDIRECT_SORT
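The header leaves `stable_sort` as a placeholder. One plausible shape for it, shown here purely as a sketch under assumptions, is to swap `std::sort` for `std::stable_sort` on the index vector. The name `indirect_stable_sort_sketch` and its signature are hypothetical and are not taken from the PR; only the standard library is used.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical name and signature; the PR does not define a stable variant yet.
template <typename RAIterator, typename Pred>
std::vector<std::size_t> indirect_stable_sort_sketch(RAIterator first, RAIterator last, Pred pred) {
    std::vector<std::size_t> perm(static_cast<std::size_t>(std::distance(first, last)));
    std::iota(perm.begin(), perm.end(), std::size_t(0));
    // The indices start in ascending order, so a stable sort keeps the
    // indices of equivalent elements in their original relative order.
    std::stable_sort(perm.begin(), perm.end(),
                     [first, pred](std::size_t a, std::size_t b) { return pred(first[a], first[b]); });
    return perm;
}
```

With plain `std::sort`, the relative order of indices whose elements compare equivalent is unspecified, which is the difference a stable variant would be expected to address.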
4 changes: 4 additions & 0 deletions test/Jamfile.v2
@@ -88,6 +88,10 @@ alias unit_test_framework

# Apply_permutation tests
[ run apply_permutation_test.cpp unit_test_framework : : : : apply_permutation_test ]

# Indirect_sort tests
[ run indirect_sort_test.cpp unit_test_framework : : : : indirect_sort_test ]

# Find tests
[ run find_not_test.cpp unit_test_framework : : : : find_not_test ]
[ run find_backward_test.cpp unit_test_framework : : : : find_backward_test ]
100 changes: 100 additions & 0 deletions test/indirect_sort_test.cpp
@@ -0,0 +1,100 @@
/*
Copyright (c) Marshall Clow 2011-2012.

Review comment:

Suggested change
Copyright (c) Marshall Clow 2011-2012.
Copyright (c) Marshall Clow 2023.


Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

For more information, see http://www.boost.org
*/

#include <boost/config.hpp>
#include <boost/algorithm/indirect_sort.hpp>
#include <boost/algorithm/apply_permutation.hpp>
#include <boost/algorithm/cxx11/is_sorted.hpp>

#define BOOST_TEST_MAIN
#include <boost/test/unit_test.hpp>

#include <iostream>
#include <string>
#include <vector>
#include <list>

typedef std::vector<size_t> Permutation;

// A permutation of size N is a sequence of N values in the range [0..N)
// such that no value appears more than once in the permutation.
bool isa_permutation(Permutation p, size_t N) {

Review comment:

Suggested change
bool isa_permutation(Permutation p, size_t N) {
bool is_a_permutation(Permutation p, size_t N) {

is more readable.

    if (p.size() != N) return false;

    // Sort the permutation, and ensure that each value appears exactly once.
    std::sort(p.begin(), p.end());
    for (size_t i = 0; i < N; ++i)
        if (p[i] != i) return false;
    return true;
}

template <typename Iter,
          typename Comp = typename std::less<typename std::iterator_traits<Iter>::value_type> >
struct indirect_comp {
    indirect_comp (Iter it, Comp c = Comp())
    : iter_(it), comp_(c) {}

    bool operator ()(size_t a, size_t b) const { return comp_(iter_[a], iter_[b]);}

    Iter iter_;
    Comp comp_;
};

template <typename Iter>
void test_one_sort(Iter first, Iter last) {
    Permutation perm = boost::algorithm::indirect_sort(first, last);
    BOOST_CHECK (isa_permutation(perm, std::distance(first, last)));
    BOOST_CHECK (boost::algorithm::is_sorted(perm.begin(), perm.end(), indirect_comp<Iter>(first)));

    // Make a copy of the data, apply the permutation, and ensure that it is sorted.
    std::vector<typename std::iterator_traits<Iter>::value_type> v(first, last);
    boost::algorithm::apply_permutation(v.begin(), v.end(), perm.begin(), perm.end());
    BOOST_CHECK (boost::algorithm::is_sorted(v.begin(), v.end()));
}

template <typename Iter, typename Comp>
void test_one_sort(Iter first, Iter last, Comp comp) {
    Permutation perm = boost::algorithm::indirect_sort(first, last, comp);
    BOOST_CHECK (isa_permutation(perm, std::distance(first, last)));
    BOOST_CHECK (boost::algorithm::is_sorted(perm.begin(), perm.end(),
                 indirect_comp<Iter, Comp>(first, comp)));

    // Make a copy of the data, apply the permutation, and ensure that it is sorted.
    std::vector<typename std::iterator_traits<Iter>::value_type> v(first, last);
    boost::algorithm::apply_permutation(v.begin(), v.end(), perm.begin(), perm.end());
    BOOST_CHECK (boost::algorithm::is_sorted(v.begin(), v.end(), comp));
}


void test_sort () {
BOOST_CXX14_CONSTEXPR int num[] = { 1,3,5,7,9, 2, 4, 6, 8, 10 };

Review comment:

Suggested change
BOOST_CXX14_CONSTEXPR int num[] = { 1,3,5,7,9, 2, 4, 6, 8, 10 };
int num[] = { 1,3,5,7,9, 2, 4, 6, 8, 10 };

Otherwise `int *first = &num[0];` is invalid, isn't it?

    const size_t sz = sizeof (num)/sizeof(num[0]); // size_t avoids a signed/unsigned comparison in the loop below
    int *first = &num[0];
    int const *cFirst = &num[0];

    // Test subsets
    for (size_t i = 0; i <= sz; ++i) {
        test_one_sort(first, first + i);
        test_one_sort(first, first + i, std::greater<int>());

        // test with constant inputs
        test_one_sort(cFirst, cFirst + i);
        test_one_sort(cFirst, cFirst + i, std::greater<int>());
    }

    // make sure we work with iterators as well as pointers
    std::vector<int> v(first, first + sz);
    test_one_sort(v.begin(), v.end());
    test_one_sort(v.begin(), v.end(), std::greater<int>());
}

BOOST_AUTO_TEST_CASE( test_main )

Review comment:

Why that extra method and not using BOOST_AUTO_TEST_CASE(test_sort) directly?

Reply from the PR author (Collaborator):

Because I expect there to be more test cases in the future.

Review comment:

But the whole idea of BOOST_AUTO_TEST_CASE is that you simply "decorate" each test case with it and do NOT have a "main" function. By default it runs each such function sequentially, and it even lets you filter tests by name from the CLI.

-->

BOOST_AUTO_TEST_CASE( test_sort ){
...
}

BOOST_AUTO_TEST_CASE( test_indirect_stable_sort ){
...
}

{
test_sort ();
}