söndag 29 december 2019

Iterating in random order with lazy Fisher Yates

While doing research for a talk, I stumbled across the Fisher-Yates shuffle. It is used for shuffling a range of data:

-- To shuffle an array a of N elements (indices 0..N-1):
for i from 0 to N−2 do
     j ← random integer such that ij < N
     exchange a[i] and a[j]

One needs an array a of data to shuffle. But what if the range is the sequence 0, 1, 2, ... N -1   ?
In that case the i:th element a[i] is i, before being touched by the shuffling. Perhaps we could avoid some work if one uses a sparse representation of a?
Only the elements where a[i]!=i need to be stored, the rest are implied.

Another observation is that the i:th step, older elements (those at index [0, i) are not used, so they could be dropped to store space if not needed.

This means it is possible to lazily generate a shuffled range!

I use a std::map for the sparse storage. Here is a ascii art visualization of the algorithm in action. Each row is after step i and N=10, a * means that element is not stored in a.

i = 0 j = 3 a[j](before) = 3 ***0******   size = 1
i = 1 j = 5 a[j](before) = 5 _**0*1****   size = 2
i = 2 j = 8 a[j](before) = 8 __*0*1**2*   size = 3
i = 3 j = 5 a[j](before) = 1 ___0*0**2*   size = 3
i = 4 j = 6 a[j](before) = 6 ____*04*2*   size = 3
i = 5 j = 9 a[j](before) = 9 _____04*20   size = 4
i = 6 j = 8 a[j](before) = 2 ______4*40   size = 3
i = 7 j = 9 a[j](before) = 0 _______*47   size = 2
i = 8 j = 9 a[j](before) = 7 ________44   size = 2
last is 4

The last column is the number of elements stored in the sparse array a.

The amount of elements to store is small in the beginning and the end. I averaged out a number of cases and experimentally found that the amount of elements needed to be stored at step i out of N is:


which has a maximum N/4 at i=N/2. That means on average, four times less storage (excluding the logarithmic overhead of std::map) need to be stored.
The worst case is min(i,N-i) and the best case is 0.

Perhaps more important, the work is done during iteration, instead of doing all work up front.

I whipped this up into two generic functions. The first one executes a callback on each shuffled number a[j], and the other one iterates through an iterator range in random order.

This is how to use it:

which outputs
Let's randomly print 0 through 4:
Let's randomly select some words:

I implemented this for fun, if you want to do this more efficiently you can use a block cipher instead which requires constant memory and no work up front. I just stumbled across this while doing research for a talk I am giving about using a block  cipher for this purpose.

This is the full implementation:

söndag 30 juni 2019

Fuzzing {fmt}

Earlier this year, I was mowing the lawn and listening to cppcast. Both activities are weekly, a fortunate coincidence? It was Victor Zverovich being interviewed, about {fmt}, a formatting library in the process of being standardized

I decided to see if fuzzing would catch anything, and the first bug was very fast to find. It was fixed almost instantly, and I found more errors. This followed the experience I have with fuzzing other projects - bugs are often found during the first minutes, or never.

I asked Victor if he would be interested in getting fuzzing into fmt, and further onto oss-fuzz. He was positive to both, and I started with my plan:
  1. get basic fuzzing running locally
  2. smoke out the initial bugs
  3. get the initial fuzzers onto oss fuzz
  4. get the fuzzers merged
  5. repoint oss fuzz to the upstream repo
This plan is nice, because it minimizes coordination. I don't have to worry about getting the fuzzers up to merge quality, there's no waiting for pull requests to be approved. Any issues are reported to me so I can fix them without bothering anyone. Eventually, when the fuzzers are mature, they can be polished enough to be accepted as a part of the upstream repo. I was open with my plan to Victor, and I think it worked very well.

An unexpected hurdle - std::chrono::duration_cast

Fmt is able to format std::chrono::durations. This turned out to be quite difficult to do correctly. I learned during this process that there are no guarantees on std::chrono::duration_cast in case there is for instance signed integral overflow, or other UB, in the internals. The fuzzer revealed a few cases of this, and there were several places in the {fmt} internals where this would creep in. I discuessed this with Victor, and we both agreed that {fmt} should either give the correct answer, or signal an error. Never UB or the wrong answer. Eventually I made a separate library for providing a UB free duration_cast. I used fuzzing to find the corner cases, and a tip from my C++ user group made me use cfenv for the first time, so I could catch the interesting cases and handle them.
This detour took a while, but eventually a condensed version ended up in {fmt}. I found it very interesting, that the performance overhead of this is zero! I guess the compiler and CPU team up to cover the extra checks in parallel.

With this in place, I could finally enable the chrono fuzzer on oss-fuzz.

Summing up

Overall this has been very fun and rewarding. Victor has been very responsive and helpful. I will definitely follow up with more fuzzing work! For instance there seems to be a parser on it's way in. Also, I think it is wise to follow the oss-fuzz guidelines of running through the corpus as a part of continuous integration.
The good thing with oss fuzz is that it will keep running, without me or anyone else paying attention. I think that is the greatest benefit, rather than the cpu time. All in all I found seven bugs, and contributed fixes to get chrono formatting UB free. 10/10 would do it again!

söndag 7 januari 2018

Performance degradation after meltdown mitigation

If you haven't been living under a rock the last week, you already heard of the meltdown attack. I was worried about how slow my computer would be after applying the mitigations. So I measured the most important and common task for me - compiling c++ code.
My test system is an aging laptop, with an Intel Core i5-2410M CPU. It runs Debian Stretch with a fast SATA ssd with an ext4 filesystem on it.
I timed compilation of a c++14 source tree with roughly 60000 lines of code. This project is very relevant to me and is a mix of c with classes with more modern code.

Compilation from scratch

The exact commandline is
rm -rf * && ccache --clear --clean && cmake .. -GNinja && time ninja
When I run it three times in a row I get
real    4m14,130s
sys    0m49,784s
real    3m14,822s
sys    0m49,948s
real    3m21,067s
sys    0m49,760s
with the original system (without the mitigation, kernel 4.9.65-3+deb9u1).
The patched system (kernel 4.9.65-3+deb9u2) gives:
real    3m12,479s
sys    0m54,496s
real    3m17,868s
sys    0m54,756s

The total time (real) is smaller than the repeatability. The system time (time spent in the kernel) increased with approx. 5%.

Compilation with hot cache

This time I did not clear the ccache. The command line is
rm -rf * && cmake -GNinja .. && time ninja
With the original system, I get for three consecutive runs
real    0m13,342s
real    0m12,867s
real    0m12,700s

With the patched system, I get:

real    0m13,645s
real    0m13,637s
real    0m13,512s

Now when most of the work consists of getting cached results instead of compiling, a slowdown on the total time is noticed. It seems to be approximately 0.8 seconds which is 6% slower. I imagine this is the worst case.


This was far better than I feared. Thank you all hard working developers.

lördag 4 mars 2017

Working with user group feedback

I am a co organizer for Sweden C++, a user group in Stockholm. It has been a great experience, getting to know other interested C++ users.
I thought I would say a few words of how we work with feedback.

Feedback is essential. I have a background in automatic control, so I guess I have the authority to say that! Joke aside, feedback lets us improve and make better and better meetings. I attended cppcon 2015, and there was a daily feedback form, partially so urgent issues could be resolved already during the conference. Very professional!

After a major meeting in our user group, we send a request via email to answer a questionnaire at a third party provider.
  • The questions can be answered anonymously, so participants can feel free to express also negative feedback if necessary.
  • The questions are few, to increase the answer rate.
  • Allow for individual feedback to the talkers (I stole this idea from cppcon!)
  • Offer the speakers access to their feedback, not everyone wants it.
  • Allow for general feedback to the organizers
The answer rate has been quite low, but there is still much to learn from the answers. I figure we would get more feedback if people are upset or they have something particularly important to say. We know we have a good balance between length and number of talks etc. - most people are happy, a few want more or less.

One thing we learned from our first meeting was that almost everyone wants the talks to be held in English, or do not care if it is in Swedish or English. Our second big meeting had all the talks and discussion in English. Great, since this both means we can speak properly (talking about programming in Swedish is a bit awkward), all attendees can follow and the youtube videos we record and distribute have a much wider audience.

After the meeting, the organizer group meet over lunch (we do this regularly even when there is no meeting) and discuss the questionnaire, what went good and what we should change to next time. This discussion is great, because it brings up where we want to go and which goals or non-goals we have. I love these lunches, they often take more than two hours. Perhaps because the discussions often float away from the meeting and C++, the organizer group gets along well personally.

Getting involved in this user group was a great thing. Thank you Jens Weller for connecting us in the first place!

fredag 2 september 2016

Uppdatera bios gigabyte brix GB-BSi3-6100 i Ubuntu

Jag har nyss uppdaterat bios för Gigabyte Brix genom att använda unetbootin i Ubuntu 16.04.
Jag följde delvis denna guide:  http://askubuntu.com/a/458915

I korthet:
  • ta en usbsticka, se till att den är FAT32formaterad. markera den som bootbar i gnome-disks
  • ladda ner zipfilen från gigabytes hemsida: http://www.gigabyte.com/products/product-page.aspx?pid=5694&kw=GB-BSi3-6100#dl
  • packa upp zipfilen i roten på usbstickan.
  • installera unetbootin, kör unetbootin och välj freedos som OS och usbsticka som mål.
  • stoppa i usbstickan i brixen, tryck F12 under boot, välj usbstickan
  • freedos startar nu, jag tog nedersta alternativet "freedos live cd only"
  • när freedos startat, skriv c: för att växla enhet
  • skriv flash.bat för att starta flashen
  • flashen tog ett par minuter, sedan startade datorn om och det funkade som det skulle
Stort tack till författarna av Freedos och unetbootin!

lördag 5 december 2015

Using Lets Encrypt without root access

Lets encrypt issues TLS certificates for free and are recognized by almost every browser. Great!

Generation of the certificates however requires root access. I am not fond of running third party code as root, especially when it wants to touch configuration. Eventually, when the client is not beta anymore and ends up in mainstream Debian that will be another story. For now, this is how I managed to get certificates without executing the letsencrypt client with root priviliges on any of my trusted systems. I use a throwaway virtual machine instead, which executes the letsencrypt client during setup and then can be turned off.

My setup is as follows - I will call my domain www.example.com to make it easier to read. I have a pretty standard setup, with an internal web server answering requests, forwarded from my router. Connecting to external TCP port 443 ends up in an internal host on my network, running apache. Lets call that host webserver.local. I will also use another maching, workstation.local which is my normal desktop machine. It is not important where the machines are, as long as they can reach each other.

To shield myself, I use a virtual machine on my workstation, which will execute the letsencrypt client commands.

  1. disable anything listening to 443 on webserver.local temporarily. for me, service apache stop does the trick.
  2. forward port 443 to the virtual machine. On webserver.local, execute  socat TCP4-LISTEN:443,fork,reuseaddr TCP:workstation.local:4444
  3.  on workstation.local, forward port 4444 to the virtual machine port 443. I use virtualbox, which has a GUI under the network panel.
  4. inside the virtual machine, check out the letsencrypt client.
    you can do that as a normal user if you want.
  5. inside the virtual machine, execute the lets encrypt command as root:
    cd /path/to/letsencrypt/checkedout/repo
    ./letsencrypt-auto certonly --standalone --standalone-supported-challenges tls-sni-01 -d www.example.com -d alternatename.example.com
  6. if that went well, you get a congratulation message from the letsencrypt tool. transfer the /etc/letsencrypt directory, with rights preserved, to /etc/letsencrypt on webserver.local. I used tar, which preserves the file privileges which is important for local security.
  7. on webserver.local, configure the server to use the certs in /etc/letsencrypt/
    I use apache, here is an example configuration if you want to see the local paths to use for letsencrypt.
  8. disable the temporary port forward by socat
  9. enable the web server on webserver.local again, for me using apache that is service apache start
  10. verify that it works by opening www.example.com in your browser and checking the certificate information.

torsdag 10 september 2015