-- To shuffle an array a of N elements (indices 0..N-1): for i from 0 to N−2 do j ← random integer such that i ≤ j < N exchange a[i] and a[j]
One needs an array a of data to shuffle. But what if the range is the sequence 0, 1, 2, ... N -1 ?
In that case the i:th element a[i] is i, before being touched by the shuffling. Perhaps we could avoid some work if one uses a sparse representation of a?
Only the elements where a[i]!=i need to be stored, the rest are implied.
Another observation is that the i:th step, older elements (those at index [0, i) are not used, so they could be dropped to store space if not needed.
This means it is possible to lazily generate a shuffled range!
I use a std::map for the sparse storage. Here is a ascii art visualization of the algorithm in action. Each row is after step i and N=10, a * means that element is not stored in a.
i = 0 j = 3 a[j](before) = 3 ***0****** size = 1 i = 1 j = 5 a[j](before) = 5 _**0*1**** size = 2 i = 2 j = 8 a[j](before) = 8 __*0*1**2* size = 3 i = 3 j = 5 a[j](before) = 1 ___0*0**2* size = 3 i = 4 j = 6 a[j](before) = 6 ____*04*2* size = 3 i = 5 j = 9 a[j](before) = 9 _____04*20 size = 4 i = 6 j = 8 a[j](before) = 2 ______4*40 size = 3 i = 7 j = 9 a[j](before) = 0 _______*47 size = 2 i = 8 j = 9 a[j](before) = 7 ________44 size = 2 last is 4
The last column is the number of elements stored in the sparse array a.
The amount of elements to store is small in the beginning and the end. I averaged out a number of cases and experimentally found that the amount of elements needed to be stored at step i out of N is:
i*(N-i)/(2N)
which has a maximum N/4 at i=N/2. That means on average, four times less storage (excluding the logarithmic overhead of std::map) need to be stored.
The worst case is min(i,N-i) and the best case is 0.
Perhaps more important, the work is done during iteration, instead of doing all work up front.
I whipped this up into two generic functions. The first one executes a callback on each shuffled number a[j], and the other one iterates through an iterator range in random order.
This is how to use it:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
int | |
main() | |
{ | |
std::random_device r; | |
std::minstd_rand rnd(r()); | |
std::cout << "Let's randomly print 0 through 4:\n"; | |
lazy_fisher_yates(5, rnd, [](auto i) { std::cout << i << '\n'; }); | |
std::cout << "Let's randomly select some words:\n"; | |
auto text = { "fish", "cow", "bird", "amoeba", "seal" }; | |
lazy_fisher_yates( | |
begin(text), end(text), rnd, [](auto it) { std::cout << *it << '\n'; }); | |
} |
Let's randomly print 0 through 4: 1 2 0 3 4 Let's randomly select some words: seal amoeba bird fish cow
I implemented this for fun, if you want to do this more efficiently you can use a block cipher instead which requires constant memory and no work up front. I just stumbled across this while doing research for a talk I am giving about using a block cipher for this purpose.
This is the full implementation:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
/* | |
* | |
* Lazy Fisher-Yates iteration | |
* | |
* This steps through a range randomly, using the idea from the Fisher Yates | |
* shuffle. It does not shuffle the range and then iterate, instead it | |
* stores a sparse representation of the index range. | |
* | |
* The amount of memory reguired increases the closer to the middle one is. | |
* Experimentally, it seems like the expected amount of memory at step i | |
* out of N is i*(N-i)/2N (so at most N/4 when i=N/2). | |
* The underlying store is however a std::map, so expect N log N memory need. | |
* | |
* By Paul Dreik 2019 https://www.pauldreik.se/ | |
* License: Boost 1.0 | |
* SPDX-License-Identifier: BSL-1.0 | |
*/ | |
#pragma once | |
#include <cassert> | |
#include <map> | |
#include <random> | |
#include <utility> | |
// invokes cb with a each integer in [0,N) in random order | |
template<typename Integer, typename URBG, typename Callback> | |
void | |
lazy_fisher_yates(Integer N, URBG&& rng, Callback cb) | |
{ | |
if (N == 0) { | |
return; | |
} | |
std::map<Integer, Integer> a; | |
for (Integer i = 0; i < N - 1; ++i) { | |
std::uniform_int_distribution<Integer> dist(i, N - 1); | |
Integer j = dist(rng); | |
// find a[i] | |
Integer a_i; | |
if (!a.empty() && a.begin()->first == i) { | |
a_i = a.begin()->second; | |
} else { | |
a_i = i; | |
} | |
// find a[j] | |
Integer a_j; | |
auto j_it = a.lower_bound(j); | |
if (j_it != a.end() && j_it->first == j) { | |
a_j = j_it->second; | |
j_it->second = a_i; | |
} else { | |
a_j = j; | |
a.insert(j_it, { j, a_i }); | |
} | |
// drop the first entry to save space if possible | |
if (a.begin()->first <= i) { | |
a.erase(a.begin()); | |
} | |
cb(a_j); | |
} | |
assert(a.size() <= 1); | |
// the last iteration | |
auto a_j = a.empty() ? N - 1 : a.rbegin()->second; | |
cb(a_j); | |
} | |
// invokes cb on each iterator in [first,last) in random order | |
template<typename RandomAccessIterator, typename URBG, typename Callback> | |
void | |
lazy_fisher_yates(RandomAccessIterator first, | |
RandomAccessIterator last, | |
URBG&& rng, | |
Callback cb) | |
{ | |
auto N = std::distance(first, last); | |
lazy_fisher_yates( | |
N, std::forward<URBG>(rng), [&cb, &first](auto i) { cb(first + i); }); | |
} |
Inga kommentarer:
Skicka en kommentar