Record batching
As explained earlier, the fetch response includes multiple resulting records batched together. But how many records should be batched? Sending too few records increases the number of round trips. Sending too many records causes the client to stall, waiting until it can acknowledge the delivery. It is also possible that the client does not need that many records and was going to close the cursor after receiving the first one. So some compromise is required here. The minimal practical batch size depends on the protocol buffer size, i.e. how many records can be cached before sending and then transmitted as a single packet. The buffer size for the TCP protocol is defined by the TcpRemoteBufferSize setting in firebird.conf. However, it often makes sense to send more records (i.e. a few protocol buffers) without waiting for an ACK, because there may be enough CPU power to process more records while the network is still transmitting the next batch.
Firebird has its batching logic optimized to transfer between 8 and 16 packets at once. The client library sends the requested batch size to the server, waits for that number of records to be transmitted, then starts caching the received records and returning them to the client application. The tricky thing here is that the batch size is expressed in records, and this value is calculated from the batch size in packets and the expanded (unpacked) record length. As soon as the record gets packed or compressed in some way, the calculation becomes wrong and results in sending fewer packets than expected. Also, the last packet may be sent incomplete. Currently, the only “compression” available is the trimming of VARCHAR values. So the batching can be either effective or somewhat sub-optimal, depending on how many VARCHAR values are fetched and how long the actual strings are compared to their declared lengths.
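The effect of trimming on the calculation can be illustrated with a small sketch. This is not Firebird's actual code: the function names, the record lengths and the buffer size of 8192 bytes are assumptions made for the example, and per-packet headers are ignored.

```python
import math

def records_per_batch(packets, buffer_size, unpacked_record_len):
    """Client-side estimate: records to request so that the batch fills
    `packets` buffers, assuming each record is sent at its full
    unpacked length."""
    return max(1, (packets * buffer_size) // unpacked_record_len)

def packets_actually_sent(records, buffer_size, wire_record_len):
    """Packets needed once each record takes only `wire_record_len` bytes."""
    return math.ceil(records * wire_record_len / buffer_size)

BUFFER_SIZE = 8192    # assumed TcpRemoteBufferSize value
PACKETS = 16          # upper end of the 8..16 packet range
DECLARED_LEN = 1024   # hypothetical record dominated by a wide VARCHAR
ACTUAL_LEN = 128      # hypothetical average length after trimming

requested = records_per_batch(PACKETS, BUFFER_SIZE, DECLARED_LEN)
sent = packets_actually_sent(requested, BUFFER_SIZE, ACTUAL_LEN)
print(requested, sent)
```

With these made-up numbers the client requests 128 records, which fit into 2 packets instead of the intended 16.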
The important thing to remember here is that it’s the client library that calculates the batch size. This means that if you need to change the number of records transmitted in a single fetch response, you need to alter the TcpRemoteBufferSize setting on the client side. The server-side setting does not matter here.
If the server waits for the next fetch request after sending the batch, or if the client asks for the next batch only after processing all the cached records, this is known as synchronous batching. Obviously, it wastes a lot of time on slow networks. So Firebird uses asynchronous batching, also known as pipelining. As soon as all records of the batch are sent to the client, the server starts to fetch new records from the engine and cache them for the next transmission. As soon as the client library has processed some part of the current batch, it asks the server for the next batch and continues processing the remaining records. This distributes the load more evenly and provides better overall throughput. The current (hardcoded) pipelining threshold is 1/2 of the batch size.
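To see why the threshold matters, here is a deliberately simplified timing model. It is only a sketch: it assumes a fixed round-trip time, a fixed per-record processing cost on the client, and it ignores transmission time and the server's own fetch time. The function name and all numbers are made up for illustration.

```python
def total_time(batches, batch_size, per_record, round_trip, threshold):
    """Estimate the time needed to consume `batches` batches.

    `threshold` is the fraction of the current batch the client processes
    before requesting the next one: 1.0 models synchronous batching,
    0.5 the half-batch threshold, 0.0 an immediate request.
    """
    # Work still left in the current batch when the next request goes out.
    remaining = (1.0 - threshold) * batch_size * per_record
    # Part of the round trip that is not hidden behind that work.
    stall = max(0.0, round_trip - remaining)
    # The very first batch cannot be overlapped with anything.
    return round_trip + batches * batch_size * per_record + (batches - 1) * stall

# Times in microseconds: 100 records per batch, 100 us per record, 20 ms round trip.
sync_time = total_time(10, 100, 100, 20_000, threshold=1.0)
half_time = total_time(10, 100, 100, 20_000, threshold=0.5)
eager_time = total_time(10, 100, 100, 20_000, threshold=0.0)
print(sync_time, half_time, eager_time)
```

Under these assumptions the synchronous case takes 300 ms, the half-batch threshold 255 ms, and an immediate request 210 ms. When the round trip is shorter than the time needed to process half a batch, the stall disappears and the threshold no longer makes a difference in this model.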
Now let’s review what could be enhanced in this area:
- Denser encoding of records is possible. The XDR encoding used currently is very sparse: all numerics are sent as at least four-byte values, all values are aligned on a four-byte boundary, NULL flags are transmitted as integers, etc. One solution could be to replace XDR with some other encoding that is aware of the data specifics and provides a denser representation while still respecting cross-platform interoperability (network byte order). Some computationally cheap compression like RLE could also be applied. The result would be more records per protocol buffer, hence fewer round trips to transmit the same number of records.
- The batch size calculation should take the record compression into account. Or, even better, the server should just send as many records as it can fit into the desired number of packets and forget about any tricky computations at all. It would allow a more honest transmission with packets filled properly.
- The pipelining threshold could be adjusted dynamically depending on the network specifics. For high-latency networks, it makes a lot of sense to ask for the next batch immediately after receiving the current one. Quick tests show that a performance gain of more than 50% is possible here.
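As a rough illustration of the first point, the sketch below compares a sparse, XDR-like layout with one possible dense layout for a record of three numeric fields. Both encoders are simplified models written for this example, not Firebird's real wire format: the sparse one only mimics the properties listed above (four bytes per value, NULL flags as integers), and the dense one (a NULL bitmap followed by natural-width values in network byte order) is just one hypothetical alternative.

```python
import struct

def encode_sparse(values):
    """XDR-like model: every value and every NULL flag takes four bytes."""
    out = b""
    for v in values:
        out += struct.pack(">i", 0 if v is None else v)  # value widened to 4 bytes
        out += struct.pack(">i", 1 if v is None else 0)  # NULL flag as an integer
    return out

def encode_dense(values, formats):
    """Hypothetical dense model: NULL bitmap first, then non-NULL values
    at their natural width, big-endian (network byte order)."""
    bitmap = 0
    body = b""
    for i, (v, fmt) in enumerate(zip(values, formats)):
        if v is None:
            bitmap |= 1 << i            # mark field i as NULL, send no data
        else:
            body += struct.pack(">" + fmt, v)
    bitmap_len = (len(values) + 7) // 8
    return bitmap.to_bytes(bitmap_len, "big") + body

# Hypothetical record: SMALLINT, SMALLINT (NULL), INTEGER.
record = (7, None, 100000)
formats = ("h", "h", "i")

sparse = encode_sparse(record)
dense = encode_dense(record, formats)
print(len(sparse), len(dense))
```

For this record the sparse model needs 24 bytes and the dense one 7, so more than three times as many such records would fit into the same protocol buffer.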
I’m working on a prototype that could demonstrate the ideas expressed here, so please expect a follow-up post with some performance figures in the not-so-distant future.