Well, it's been a while since I blogged; and there's been a good reason since I had nothing worthwhile to say.
I've now got two things that I want to talk about; one is technical and the other is about cooking.
On to the technical. I'll blog the cooking in another entry.
Just recently at work I had cause to revisit an old technique that I hadn't used in years and I thought that it would be worth recording here.
We've been working with small binary fields in the database containing information that needs to be parsed. So far within the system we have kept much of the data-centric heavy lifting within the database but we were having difficulty with the binary information. The database provided a set of procedures for manipulating the binary information, but they were not very efficient. They were designed to work with extremely large pieces of binary information and tended to keep the binary information out of memory and in the database.
In general the database was far more designed to work with strings.
The work around was one that had been used when binary information needed to be manipulated by string-centric languages in the past. We converted the binary into a hexadecimal string. This allowed us to make use of in-memory string datatypes, and all the very efficient string manipulation tools that the database made available to us (but with the offsets doubled!). In this case we were fortunate as the database was able to do direct conversions from hex into numbers and then perform AND operations on the numbers as needed.
If that capability hadn't been there we could have used a further technique: to do a fast AND from hex you just build a look up table. The table provides a look up between the two hex values that you want to AND. For example a look up table that does just one hex digit at at time would look like:
A hexadecimal AND lookup tableHEX - AND | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
---|
1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
---|
2 | 0 | 0 | 2 | 2 | 0 | 0 | 2 | 2 | 0 | 0 | 2 | 2 | 0 | 0 | 2 | 2 |
---|
3 | 0 | 1 | 2 | 3 | 0 | 1 | 2 | 3 | 0 | 1 | 2 | 3 | 0 | 1 | 2 | 3 |
---|
4 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 4 |
---|
5 | 0 | 1 | 0 | 1 | 4 | 5 | 4 | 5 | 0 | 1 | 0 | 1 | 4 | 5 | 4 | 5 |
---|
6 | 0 | 0 | 2 | 2 | 4 | 4 | 6 | 6 | 0 | 0 | 2 | 2 | 4 | 4 | 6 | 6 |
---|
7 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|
8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
---|
9 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 8 | 9 | 8 | 9 | 8 | 9 | 8 | 9 |
---|
A | 0 | 0 | 2 | 2 | 0 | 0 | 2 | 2 | 8 | 8 | A | A | 8 | 8 | A | A |
---|
B | 0 | 1 | 2 | 3 | 0 | 1 | 2 | 3 | 8 | 9 | A | B | 8 | 9 | A | B |
---|
C | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 4 | 8 | 8 | 8 | C | 8 | 8 | 8 | C |
---|
D | 0 | 1 | 0 | 1 | 4 | 5 | 4 | 5 | 8 | 9 | 8 | 9 | C | D | C | D |
---|
E | 0 | 0 | 2 | 2 | 4 | 4 | 6 | 6 | 8 | 8 | A | A | C | C | E | E |
---|
F | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|
If you don't mind sacrificing more memory a 2 hex digit or greater lookup table can be created and of course a lot of the time you can distill the lookup further as you only need to do a single bit mask operation and so you would only need on one axis 1,2,4,8 etcetera.
By turning the binary manipulation into a hexadecimal one, we managed to increase the performance by 500%. Of course for really heavy binary manipulation you can't beat a binary friendly language but this technique can be an useful 'good enough' solution.