söndag 29 december 2019

Iterating in random order with lazy Fisher Yates

While doing research for a talk, I stumbled across the Fisher-Yates shuffle. It is used for shuffling a range of data:


-- To shuffle an array a of N elements (indices 0..N-1):
for i from 0 to N−2 do
     j ← random integer such that ij < N
     exchange a[i] and a[j]


One needs an array a of data to shuffle. But what if the range is the sequence 0, 1, 2, ... N -1   ?
In that case the i:th element a[i] is i, before being touched by the shuffling. Perhaps we could avoid some work if one uses a sparse representation of a?
Only the elements where a[i]!=i need to be stored, the rest are implied.

Another observation is that the i:th step, older elements (those at index [0, i) are not used, so they could be dropped to store space if not needed.


This means it is possible to lazily generate a shuffled range!

I use a std::map for the sparse storage. Here is a ascii art visualization of the algorithm in action. Each row is after step i and N=10, a * means that element is not stored in a.

i = 0 j = 3 a[j](before) = 3 ***0******   size = 1
i = 1 j = 5 a[j](before) = 5 _**0*1****   size = 2
i = 2 j = 8 a[j](before) = 8 __*0*1**2*   size = 3
i = 3 j = 5 a[j](before) = 1 ___0*0**2*   size = 3
i = 4 j = 6 a[j](before) = 6 ____*04*2*   size = 3
i = 5 j = 9 a[j](before) = 9 _____04*20   size = 4
i = 6 j = 8 a[j](before) = 2 ______4*40   size = 3
i = 7 j = 9 a[j](before) = 0 _______*47   size = 2
i = 8 j = 9 a[j](before) = 7 ________44   size = 2
last is 4

The last column is the number of elements stored in the sparse array a.

The amount of elements to store is small in the beginning and the end. I averaged out a number of cases and experimentally found that the amount of elements needed to be stored at step i out of N is:

i*(N-i)/(2N)

which has a maximum N/4 at i=N/2. That means on average, four times less storage (excluding the logarithmic overhead of std::map) need to be stored.
The worst case is min(i,N-i) and the best case is 0.

Perhaps more important, the work is done during iteration, instead of doing all work up front.

I whipped this up into two generic functions. The first one executes a callback on each shuffled number a[j], and the other one iterates through an iterator range in random order.



This is how to use it:

which outputs
Let's randomly print 0 through 4:
1
2
0
3
4
Let's randomly select some words:
seal
amoeba
bird
fish
cow

I implemented this for fun, if you want to do this more efficiently you can use a block cipher instead which requires constant memory and no work up front. I just stumbled across this while doing research for a talk I am giving about using a block  cipher for this purpose.

This is the full implementation:


söndag 30 juni 2019

Fuzzing {fmt}

Earlier this year, I was mowing the lawn and listening to cppcast. Both activities are weekly, a fortunate coincidence? It was Victor Zverovich being interviewed, about {fmt}, a formatting library in the process of being standardized

I decided to see if fuzzing would catch anything, and the first bug was very fast to find. It was fixed almost instantly, and I found more errors. This followed the experience I have with fuzzing other projects - bugs are often found during the first minutes, or never.


I asked Victor if he would be interested in getting fuzzing into fmt, and further onto oss-fuzz. He was positive to both, and I started with my plan:
  1. get basic fuzzing running locally
  2. smoke out the initial bugs
  3. get the initial fuzzers onto oss fuzz
  4. get the fuzzers merged
  5. repoint oss fuzz to the upstream repo
This plan is nice, because it minimizes coordination. I don't have to worry about getting the fuzzers up to merge quality, there's no waiting for pull requests to be approved. Any issues are reported to me so I can fix them without bothering anyone. Eventually, when the fuzzers are mature, they can be polished enough to be accepted as a part of the upstream repo. I was open with my plan to Victor, and I think it worked very well.

An unexpected hurdle - std::chrono::duration_cast

Fmt is able to format std::chrono::durations. This turned out to be quite difficult to do correctly. I learned during this process that there are no guarantees on std::chrono::duration_cast in case there is for instance signed integral overflow, or other UB, in the internals. The fuzzer revealed a few cases of this, and there were several places in the {fmt} internals where this would creep in. I discuessed this with Victor, and we both agreed that {fmt} should either give the correct answer, or signal an error. Never UB or the wrong answer. Eventually I made a separate library for providing a UB free duration_cast. I used fuzzing to find the corner cases, and a tip from my C++ user group made me use cfenv for the first time, so I could catch the interesting cases and handle them.
This detour took a while, but eventually a condensed version ended up in {fmt}. I found it very interesting, that the performance overhead of this is zero! I guess the compiler and CPU team up to cover the extra checks in parallel.

With this in place, I could finally enable the chrono fuzzer on oss-fuzz.

Summing up

Overall this has been very fun and rewarding. Victor has been very responsive and helpful. I will definitely follow up with more fuzzing work! For instance there seems to be a parser on it's way in. Also, I think it is wise to follow the oss-fuzz guidelines of running through the corpus as a part of continuous integration.
The good thing with oss fuzz is that it will keep running, without me or anyone else paying attention. I think that is the greatest benefit, rather than the cpu time. All in all I found seven bugs, and contributed fixes to get chrono formatting UB free. 10/10 would do it again!