Calculating the XMODEM CRC J. LeVan, Ph.D. Introduction: I have been in the process of developing a terminal emulator program for my home computer. I was very unhappy with the Xmodem transfer rate, especially at high modem speeds. The checksum method is clearly faster, but cannot catch communication errors as effectively. The CRC method has an upper end throughput of 500 bytes per second. I place a help call on CompuServe and receive an intriquing response from Bela Lubkin. The algorithm he suggested appeared to be much quicker than the classical bit shift method. It was not at all clear to me how the algorithm worked. The purpose of this note is an explanation of how and why a table lookup algorithm can work. Throughout this note, I will use "effective throughput" to mean: (SizeOfPacket * NumberOfPackets)/(TimeOfEot - TimeOfInitialSOH) All source code will be in C. These programs were written to run on an Apple MacIntosh, which is equipped with a high speed serial port. Below is the standard bit shift and occasionally XOR algorithm. The condition for doing the XOR is simply, if a 1 is going to be shifted out of the sign bit then shift and XOR the old CRC with the generator. Note that the inner loop is performed exactly eight times. ________________________________________________________________ ___Fig.1: Bit Shift CRC_________________________________________ unsigned short ComputeCrc(crc,bufptr,len) register unsigned short crc; /* unsigned shorts are 16 bits */ register char *bufptr; register short len; { register int i; while (len--) { /* calculate CRC from end to start*/ crc ^= (unsigned short) (*bufptr++)<<8; for ( i=0 ; i<8 ; ++i ) { if (crc & 0x8000) crc = (crc << 1) ^ 0x1021; else crc <<= 1; } } return(crc); } ________________________________________________________________ After exactly eight shifts, there will be no carries from the low byte out of the top of the high byte. This leads us to a critical observation: Ob 1: Whether or not an XOR is done with the CRC does *not* depend on the low byte of the CRC. Let A, B, and C be sixteen bit unsigned quantities. We'll use the standard C notation for XOR, a carat (^). The following mathematical properties hold. Ob 2: XOR is Associative and Commutative. I.e: For any choice of A, B and C; these equations will always hold true: A^B = B^A (1) (A^B)^C = A^(B^C) (2) Likewize, using the standard C notation for a left shift ( A<> 8^(*bufptr++)] ^ crc<<8; return(crc); } ________________________________________________________________ Thus our third term is found by: ((P1<<7) ^ (P2<<6) ^ ... ^ P8) = CrcTable[crch>>8] the final crc is caluclated by: crc = crc_table[crch>>8] ^ (crcl<<8) Substituting this relation into the algorithm in Fig. 1, we easily obtain the algorithm in Fig. 2. That much having been said, it only remains to be seen, whether or not it has all been worthwhile. How much faster is table lookup? After building a 128 byte sector filled with random numbers, I ran a benchmark in which the CRC and Checksum were each calculated 100 times with the following results. Here each tick represents 1/60th of a second. ________________________________________________________________ ___Table.1: Comparison of methods for 100 sectors_______________ Slow CRC 120 ticks Table CRC 25 ticks Checksum 3 ticks ________________________________________________________________ The table driven method is an order of magnitude faster than the older CRC method, and only an order of magnitude slower than the Checksum method. This offers a much smoother trade off than originally available with the CRC check. What do these differences mean in terms of effective through put? My communications program also uses the following optimizations which can affect transfer speed: All disk I/O is double buffered. The transmitter sends the body of the Xmodem Packet while simultaneously calculating the CRC. This leads to an almost zero calculate time at low speeds. The machine on the receiving end of the line only calculates the CRC after a complete packet has arrived. ________________________________________________________________ ___Table.2: Transfer measurements_______________________________ Effective Throughput: (bytes/sec) Baud: 1200 2400 9600 19200 57600 Method: (128 byte packets) Slow CRC 106.......201.......598.......770.......773 Table CRC 109.......210.......692.......1113......1172 Checksum 110.......213.......709.......1155......1263 Method: (1k byte packets) Slow CRC ..........210.......628.......939.......994 Table CRC ..........220.......729.......1185......1185 Checksum ..........222.......745.......1226......1973 ________________________________________________________________ One should take benchmarks with a grain of salt. However, it is clear that the table lookup algorithm stays closer to the checksum method than the standard bit shift algorithm. The efficiency and speed of compiled code can vary greatly from machine to machine. These tests were made with two computers linked directly through their serial ports. I have found that CompuServe typically yields and effective throughput of 60-80 bytes per second at 1200 baud. Genie typically yields 70-80 bytes per second, and my neighborhood Vax 785 lets me run with an effective throughput of 103-105 bytes per second, all with Xmodem. In my first attempt at Xmodem, I used a function call for processing each character in the crc loop. In addition, I did not overlap the CRC calculation with transmission of the packet. This was a dreadful mistake. I could not get an effective throughput of more than 500 characters per second even with 4k buffers at any speed. The bottom line is that it take a number of optimizations to speed up a complex process.