## Counting Bits

24 Aug 2019

What's the best way of counting the number of 1-bits in an integer?

On modern hardware, have your CPU do it for you. On Intel CPUs, the `popcnt`

instruction is what you want. I learned reading *Hacker's Delight* that some architectures
called counting the 1-bits *population count*, which presumably informed the name of
the Intel instruction.

Let's try to count the number of 1-bits in a 64-bit integer on an Intel CPU.

I'm trying to keep this as simple as possible (for me; I'm not an assembly guru), so I'll purposefully keep the number of 1-bits low, so that I can just return the count as the exit value from my little assembler program.

The following went into a file named `popcnt.s`

:

.section .data .globl _start _start: # I'll put my test value in one of the new-ish 64-bit registers. # I would have preferred to just us a literal arg to popcnt, but # apparently the instruction does not support that. Also, # thanks, GNU Assembler, for having binary literals! movq $0b1000000010000000100000001000000010000000100000001000000010000000, %r11 # Count the number of 1-bits in %r11 and put the answer in %rdi popcntq %r11, %rdi # Move the number 60 to %rax in preparation for a syscall. # Syscall number 60 is exit. The exit syscall requires the exit number # to be in %rdi, which we have done, above. movq $60, %rax syscall

Build the little assembly program...

$ as --gstabs popcnt.s -o popcnt.o $ ld popcnt.o -o popcnt

Run the little assembly program and ask for the exit number:

$ ./popcnt $ echo $? 8

And that's how we use `popcnt`

.

In Go 1.13rc1, we can also use a binary literal (thanks, Go Team!)
and we can use `bits.OnesCount64()`

to count the number of
1-bits:

package main import ( "fmt" "math/bits" ) func main() { var n uint64 = 0b10000000_10000000_10000000_10000000_10000000_10000000_10000000_10000000 fmt.Printf("OnesCount64(%064b) = %d\n", n, bits.OnesCount64(n)) }

$ go1.13rc1 build $ ./popcnt OnesCount64(1000000010000000100000001000000010000000100000001000000010000000) = 8

Let's ask Go to show us the assembly code for the program.

$ go1.13rc1 tool compile -S main.go > main.S

When we look in `main.S`

, we do indeed see the `popcnt`

instruction:

0x0049 00073 (main.go:10) POPCNTQ AX, CX

Cool!

Because of CPU instruction support for counting 1-bits, the rest of this discussion
is moot. But the book *Hacker's Delight* really does show a neat way of counting the number
of 1-bits in an integer. There was a clearer example on Stack Overflow that really
made this click for me.

Let's say I have the byte 10110110, and I want to count the number of 1-bits in it.

The approach explained in *Hacker's Delight* is to sum the number of 1-bits
in each pair of bits, and then sum the number of 1-bits in each quad of bits, and
then add both quads together. It's a basic divide-and-conquer algorithm, but seeing
it work in practice makes it way clearer.

As I said, the first step is to count the number of 1-bits in each pair of bits. For our example number, 10110110, that means we want to count the number of bits in each of these pairs:

10 11 01 10

The interesting thing is that we can store the sum of each pair using only
two bits! After all, each pair can have only 0 or 1 or 2 bits, and 0, 1, and 2
in binary are 00, 01, and 10 respectively. So not only can we sum the 1-bits
for each pair, we can overwrite each pair with the sum! *And* we can
sum all 4 pairs in parallel.

First, we need to arrange the bits in those 4 pairs so that we can add them together. Essentailly what we want to do is sum all of the odd bits to all of the event bits. If we find a way to arrange the bits of this number

10 11 01 10

like so,

0 1 1 0 1 1 0 1

then we could just sum them together like so:

0 1 1 0 1 1 0 1 ---------- 01 10 01 01

And if we could only arrange this result like so:

10 01 01 01

then we could sum those together like so:

10 01 01 01 ----- 11 10

And if we could only arrange that result like so:

10 11

then we could sum to get our final answer like so:

10 11 ---- 101which is 5 in decimal, and the number of 1-bits in the number we started with!

Happily, it turns out there is a way to do this. If we want to take our starting value, 10110110, and sum each pair of bits, we can do it like so.

First, isolate all the odd bits by masking them, like so:

10110110 & 01010101 -------- 00010100 <-- odd bits

Second, shift all of the even bits so that we can place them "under" the odd bits for summing...

10110110 >> 1 -------- 01011011

...but don't forget to isolate only the even bits! (Yes, I realize the even bits have now shifted into the odd place, but bear with me.)

01011011 & 01010101 -------- 01010001 <-- even bits

Now we have two sets of bits, even and odd, which are "aligned", so we can now stack one on top of the other and sum them.

00010100 <-- odd bits + 01010001 <-- even bits -------- 01100101 <-- result A

If we stack our original pairs of bits on top of their sums, we will see that each pair of bits in our result is the sum of 1-bits.

10 11 01 10 <-- starting bits 01 10 01 01 <-- result A, from above 1 2 1 1 <-- each pair's sum in decimal

Progress! The next step is to take each pair of sums, and sum those. The number of spaces we need to shift doubles, and the bitmask pattern, used to isolate each sum, changes too.

01100101 <-- result A, from above & 00110011 -------- 00100001 <-- even sums 01100101 <-- result A, from above >> 2 -------- 00011001 <-- shift result 00011001 <-- shift result from right above & 00110011 -------- 00010001 <-- odd sums 00100001 <-- even sums + 00010001 <-- odd sums -------- 00110010 <-- result B

If we were successful, we added two pairs of sums together:

binary | decimal 10 01 | 2 1 <-- even sums + 01 01 | 1 1 <-- odd sums --------- | --- 0011 0010 | 3 2 1011 0110 <-- starting bits

Looks good! Now all we have to do is add our final two sums together. Once more, we double our shift amount, and change the bit mask pattern.

00110010 <-- result B from above & 00001111 -------- 00000010 <-- right sum 00110010 <-- result B from above >> 4 -------- 00000011 <-- left sum shifted 00000011 <-- left sum shifted from right above & 00001111 -------- 00000011 <-- left sum 00000010 <-- right sum + 00000011 <-- left sum -------- 00000101 <-- final sum, 5 in decimal

And there we are! The starting bits were 10110110, and there are indeed 5 1-bits.

In go 1.13 code, the above would look like this:

package main import ( "fmt" ) func main() { var startBits uint8 = 0b10110110 x := startBits x = (x & 0b01010101) + ((x >> 1) & 0b01010101) x = (x & 0b00110011) + ((x >> 2) & 0b00110011) x = (x & 0b00001111) + ((x >> 4) & 0b00001111) fmt.Printf("Number of 1-bits in %08b: %d\n", startBits, x) }

$ go1.13rc1 build $ ./popcnt Number of 1-bits in 10110110: 5

In the Go example, I only count the 1-bits in a byte. For the more common
use-cases of 32-bit or 64-bit numbers, you can extend out the pattern above.
Also, the above code can be simplified (and therefore sped up) even more,
which books like *Hacker's Delight* will show you how to do.
Of course, I'll return to the advice from the top;
modern CPUs have an instruction to do this, so always favor that.