
Base64 encoding and decoding principle and C language implementation

2022-07-18 14:00:00 【hwd00001】

Contents

0. Purpose of Base64 encoding
1. Base64 encoding principle
- 1.1 Example
- 1.2 Input shorter than 3 bytes
2. Base64 decoding principle
- 2.1 Worked example
- 2.2 Building the decoding index table
3. Complete code

References:
1. Principle: article "Thoroughly Understanding the Base64 Encoding Principle in One Article", by New perspective of procedure.
2. Code: article "Implementing Base64 Encoding and Decoding Functions in C", by ssmile.

0. Purpose of Base64 encoding

Base64 represents an arbitrary byte stream (each byte can take any value from 0 to 255) using only the following 64 printable characters, plus one additional padding character, '='.

“ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/”

1. Base64 encoding principle

Base64 encoding splits the input into groups of three 8-bit bytes and regroups the 24 bits of each group into four 6-bit values. Each 6-bit value is stored in a byte of its own (still 8 bits wide, but the two leftmost bits are always 0) and is used as an index into the Base64 encoding table. The characters found there are concatenated to form the encoded string.
After encoding, every 3 bytes become 4 bytes, so the size grows by one third.
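The size relationship can be written down directly. The following is a minimal sketch, not part of the article's code: `base64_encoded_len` is a name chosen here for illustration, and it assumes the output is padded with '=' so that every started 3-byte group yields 4 characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: number of Base64 characters produced for n input
// bytes, assuming '=' padding (every started 3-byte group gives 4 characters).
static size_t base64_encoded_len(size_t n) {
    assert(n <= SIZE_MAX / 4 * 3); // keep the result within size_t
    return (n + 2) / 3 * 4;        // round n up to whole groups, 4 chars each
}
```

For inputs of 1, 2, or 3 bytes this gives 4 characters; a 4-byte input starts a second group and gives 8.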
Let's illustrate with an example:

1.1 Example

The table in the figure below walks through the whole process.
[Figure: step-by-step Base64 encoding of "Man"]
【Step 1】"M", "a", and "n" have the ASCII codes 77, 97, and 110, which in binary are 01001101, 01100001, and 01101110. As the second and third rows of the figure show, together they form a 24-bit binary string.
【Step 2】As the red box shows, the 24 bits are divided into four groups of 6 bits.
【Step 3】Two 0 bits are prepended to each group, expanding the 24 bits to 32 bits, that is, four bytes: 00010011, 00010110, 00000101, 00101110. Their values (the Base64 indices) are 19, 22, 5, and 46.
【Step 4】Looking these values up in the Base64 encoding table gives T, W, F, and u. So the string "Man" is encoded as TWFu.
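The four steps can be sketched in a few lines of C. This is an illustration only: `encode_group3` and `alphabet` are names assumed here, and the sketch packs the bytes into one 24-bit integer instead of using the `cmove_bits` helper from the article's listing.

```c
#include <assert.h>
#include <stddef.h>

static const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: encode exactly 3 input bytes into 4 Base64
// characters and NUL-terminate the result (out needs room for 5 chars).
static void encode_group3(const unsigned char in[3], char out[5]) {
    assert(in != NULL && out != NULL);
    // Steps 1-2: concatenate the three bytes into one 24-bit value
    unsigned long bits = ((unsigned long)in[0] << 16)
                       | ((unsigned long)in[1] << 8)
                       | (unsigned long)in[2];
    // Steps 3-4: take 6 bits at a time, high to low, and look each index up
    out[0] = alphabet[(bits >> 18) & 0x3f];
    out[1] = alphabet[(bits >> 12) & 0x3f];
    out[2] = alphabet[(bits >> 6) & 0x3f];
    out[3] = alphabet[bits & 0x3f];
    out[4] = '\0';
}
```

Calling `encode_group3((const unsigned char *)"Man", out)` leaves "TWFu" in `out`, matching the indices 19, 22, 5, and 46 above.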

1.2 Input shorter than 3 bytes

The example above used exactly three bytes. What happens when fewer than three bytes are left?
[Figure: Base64 encoding of "A" and "BC" with padding]

One byte: the 8 bits are still grouped by the same rule, 6 bits at a time. The second group is 4 bits short and is filled with 0s, which gives two Base64 characters. The last two groups have no data at all and are both filled with "=". So "A" in the figure above becomes "QQ==".
Two bytes: the 16 bits are again grouped 6 bits at a time. The third group is 2 bits short and is filled with 0s, which gives three Base64 characters. The fourth group has no data and is filled with "=". So "BC" in the figure above becomes "QkM=".
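The two padding cases can be checked with a small sketch. It is not the article's implementation: `encode_tail` and `alphabet` are hypothetical names, and the helper handles only a final group of 1 or 2 bytes.

```c
#include <assert.h>
#include <stddef.h>

static const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: encode a final group of 1 or 2 bytes into 4
// characters with '=' padding, NUL-terminated (out needs room for 5 chars).
static void encode_tail(const unsigned char *in, size_t n, char out[5]) {
    assert(in != NULL && out != NULL && (n == 1 || n == 2));
    unsigned b0 = in[0];
    unsigned b1 = (n == 2) ? in[1] : 0; // a missing byte contributes 0 bits
    out[0] = alphabet[b0 >> 2];                        // top 6 bits of byte 0
    out[1] = alphabet[((b0 & 0x03) << 4) | (b1 >> 4)]; // 2 bits + 4 bits
    out[2] = (n == 2) ? alphabet[(b1 & 0x0f) << 2]     // 4 bits + 2 zero bits
                      : '=';
    out[3] = '=';
    out[4] = '\0';
}
```

With this sketch, "A" (01000001) yields "QQ==", and "BC" (01000010 01000011) yields the indices 16, 36, and 12, that is "QkM=" with a lowercase k.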
The C source code is as follows (largely copied from ssmile):

#include <stddef.h> // NULL

// Base64 encoding table: 64 characters in total
static const char base64_alphabet[] = {
    'A', 'B', 'C', 'D', 'E', 'F', 'G',
    'H', 'I', 'J', 'K', 'L', 'M', 'N',
    'O', 'P', 'Q', 'R', 'S', 'T',
    'U', 'V', 'W', 'X', 'Y', 'Z',
    'a', 'b', 'c', 'd', 'e', 'f', 'g',
    'h', 'i', 'j', 'k', 'l', 'm', 'n',
    'o', 'p', 'q', 'r', 's', 't',
    'u', 'v', 'w', 'x', 'y', 'z',
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
    '+', '/'};

// Shift src left by lnum bits and then right by rnum bits, all within 8 bits.
// This isolates a bit field of the byte and moves it to its output position.
static char cmove_bits(unsigned char src, unsigned lnum, unsigned rnum) {
    src <<= lnum;
    src >>= rnum;
    return src;
}

// Encode inlen bytes of indata as Base64 into outdata.
// outdata must have room for 4 characters per started 3-byte group;
// the output is not NUL-terminated.
// Returns 0 on success and -1 on invalid arguments.
int base64_encode(const char *indata, int inlen, char *outdata, int *outlen) {
    if (indata == NULL || outdata == NULL || inlen <= 0) {
        return -1;
    }

    int pad_num = 0; // bytes missing from the last group: 0, 1 or 2
    if (inlen % 3 != 0) {
        pad_num = 3 - inlen % 3;
    }
    int in_len = inlen + pad_num; // input length rounded up to a multiple of 3
    int out_len = in_len * 8 / 6; // encoded length

    char *p = outdata; // write position in the output buffer

    // Encode one 3-byte group per iteration
    for (int i = 0; i < in_len; i += 3) {
        // First character: the top 6 bits of byte 0. The cast keeps bytes
        // >= 0x80 from producing a negative index where char is signed.
        int value = (unsigned char)*indata >> 2;
        *p = base64_alphabet[value];

        if (i == in_len - 3 && pad_num != 0) {
            // Last group with fewer than 3 bytes of data
            if (pad_num == 1) {
                // 2 bytes of data: three characters followed by one '='
                *(p + 1) = base64_alphabet[(int)(cmove_bits(*indata, 6, 2) + cmove_bits(*(indata + 1), 0, 4))];
                *(p + 2) = base64_alphabet[(int)cmove_bits(*(indata + 1), 4, 2)];
                *(p + 3) = '=';
            } else if (pad_num == 2) {
                // 1 byte of data: two characters followed by two '='
                *(p + 1) = base64_alphabet[(int)cmove_bits(*indata, 6, 2)];
                *(p + 2) = '=';
                *(p + 3) = '=';
            }
        } else {
            // Full group of 3 bytes
            *(p + 1) = base64_alphabet[(int)(cmove_bits(*indata, 6, 2) + cmove_bits(*(indata + 1), 0, 4))];
            *(p + 2) = base64_alphabet[(int)(cmove_bits(*(indata + 1), 4, 2) + cmove_bits(*(indata + 2), 0, 6))];
            *(p + 3) = base64_alphabet[*(indata + 2) & 0x3f];
        }

        p += 4;
        indata += 3;
    }

    if (outlen != NULL) {
        *outlen = out_len;
    }

    return 0;
}

2. Base64 decoding principle

Decoding works in reverse: every 4 bytes (each carrying 6 significant bits) are merged into three 8-bit bytes.

2.1 Worked example

Take "TWFu" as the string to decode. Look back at the first figure and read it from the bottom up.
[Figure]

Approach
【Step 1】The characters of "TWFu" sit at positions 19, 22, 5, and 46 in the encoding table. In binary these are 00010011, 00010110, 00000101, and 00101110. The top 2 bits of each are not significant and are always 0, so only the low 6 bits are used.
【Step 2】The significant bits of the 4 values are 010011, 010110, 000101, and 101110.
【Step 3】Concatenate the significant bits into 24 bits, then split them into 3 bytes (marked with [] below).
[010011 01][0110 0001][01 101110]. In decimal these are 77, 97, and 110, the ASCII codes of "Man".
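The merge in the last step can be sketched as follows. `merge_indices` is a hypothetical helper introduced for illustration; it starts from the four 6-bit indices, so the character lookup of the first step is assumed to have happened already.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: merge four 6-bit Base64 indices (0-63) into 3 bytes.
static void merge_indices(const unsigned char idx[4], unsigned char out[3]) {
    assert(idx != NULL && out != NULL);
    unsigned long bits = 0;
    for (size_t k = 0; k < 4; k++) {
        assert(idx[k] < 64);         // only the low 6 bits are significant
        bits = (bits << 6) | idx[k]; // append 6 bits to the 24-bit value
    }
    out[0] = (unsigned char)((bits >> 16) & 0xff);
    out[1] = (unsigned char)((bits >> 8) & 0xff);
    out[2] = (unsigned char)(bits & 0xff);
}
```

Feeding it the indices 19, 22, 5, and 46 produces the bytes 77, 97, and 110, that is "Man".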

2.2 Building the decoding index table

To find a character's position in the encoding table, we could search the table every time. To be more efficient, we can instead build a 128-byte decoding index table (the complete code below uses a 256-entry table named base64_suffix_map, so that any byte value can serve as an index). Take 'T' from "TWFu" above: its character code is 84 and its position in the encoding table is 19, so entry 84 of the decoding table stores 19. Likewise, 'W' has code 87 and position 22, so entry 87 stores 22. In this way, the entry for each of the 64 encoding characters stores that character's index in the encoding table. We call the decoding table base64DecodeChars; in C terms, it gives the following correspondence:

 base64DecodeChars['T']    ---    19
 base64DecodeChars['W']    ---    22
 base64DecodeChars['F']    ---    5 
 base64DecodeChars['u']    ---    46
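Instead of writing such a table by hand, it can be filled from the encoding table at startup. The following is a sketch under assumptions: `build_decode_table` and `alphabet` are names chosen here, the character set is assumed to be ASCII, and 255 is used as the marker for characters outside the alphabet.

```c
#include <assert.h>
#include <stddef.h>

static const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: fill a 128-entry decoding table from the alphabet.
// Entries for characters outside the alphabet are set to 255 (assumed
// "invalid" marker).
static void build_decode_table(unsigned char table[128]) {
    assert(table != NULL);
    for (size_t c = 0; c < 128; c++) {
        table[c] = 255;
    }
    for (size_t i = 0; i < 64; i++) {
        // The character code is the subscript, its alphabet index the value
        table[(unsigned char)alphabet[i]] = (unsigned char)i;
    }
}
```

After the call, `table['T']` holds 19 and `table['u']` holds 46, as in the correspondence above. Building the table at run time trades a short initialization loop for not having to maintain a long literal array.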

3. Complete code

#include <stdio.h>
#include <stdlib.h>
 
// Base64 encoding table: 64 characters in total
static const char base64_alphabet[] = {
    'A', 'B', 'C', 'D', 'E', 'F', 'G',
    'H', 'I', 'J', 'K', 'L', 'M', 'N',
    'O', 'P', 'Q', 'R', 'S', 'T',
    'U', 'V', 'W', 'X', 'Y', 'Z',
    'a', 'b', 'c', 'd', 'e', 'f', 'g',
    'h', 'i', 'j', 'k', 'l', 'm', 'n',
    'o', 'p', 'q', 'r', 's', 't',
    'u', 'v', 'w', 'x', 'y', 'z',
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
    '+', '/'};
 
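// Decoding index table (called base64DecodeChars in section 2.2), indexed by character code.
// 0-63: Base64 index; 253: whitespace (LF, CR, space); 254: '='; 255: invalid character.
static const unsigned char base64_suffix_map[256] = {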
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 253, 255,
    255, 253, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 253, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255,  62, 255, 255, 255,  63,
    52,  53,  54,  55,  56,  57,  58,  59,  60,  61, 255, 255,
    255, 254, 255, 255, 255,   0,   1,   2,   3,   4,   5,   6,
    7,   8,   9,  10,  11,  12,  13,  14,  15,  16,  17,  18,
    19,  20,  21,  22,  23,  24,  25, 255, 255, 255, 255, 255,
    255,  26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,
    37,  38,  39,  40,  41,  42,  43,  44,  45,  46,  47,  48,
    49,  50,  51, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255 };
 
// Shift src left by lnum bits and then right by rnum bits, all within 8 bits.
// This isolates a bit field of the byte and moves it to its output position.
static char cmove_bits(unsigned char src, unsigned lnum, unsigned rnum) {
    src <<= lnum;
    src >>= rnum;
    return src;
}

// Encode inlen bytes of indata as Base64 into outdata.
// outdata must have room for 4 characters per started 3-byte group;
// the output is not NUL-terminated.
// Returns 0 on success and -1 on invalid arguments.
int base64_encode(const char *indata, int inlen, char *outdata, int *outlen) {
    if (indata == NULL || outdata == NULL || inlen <= 0) {
        return -1;
    }

    int pad_num = 0; // bytes missing from the last group: 0, 1 or 2
    if (inlen % 3 != 0) {
        pad_num = 3 - inlen % 3;
    }
    int in_len = inlen + pad_num; // input length rounded up to a multiple of 3
    int out_len = in_len * 8 / 6; // encoded length

    char *p = outdata; // write position in the output buffer

    // Encode one 3-byte group per iteration
    for (int i = 0; i < in_len; i += 3) {
        // First character: the top 6 bits of byte 0. The cast keeps bytes
        // >= 0x80 from producing a negative index where char is signed.
        int value = (unsigned char)*indata >> 2;
        *p = base64_alphabet[value];

        if (i == in_len - 3 && pad_num != 0) {
            // Last group with fewer than 3 bytes of data
            if (pad_num == 1) {
                // 2 bytes of data: three characters followed by one '='
                *(p + 1) = base64_alphabet[(int)(cmove_bits(*indata, 6, 2) + cmove_bits(*(indata + 1), 0, 4))];
                *(p + 2) = base64_alphabet[(int)cmove_bits(*(indata + 1), 4, 2)];
                *(p + 3) = '=';
            } else if (pad_num == 2) {
                // 1 byte of data: two characters followed by two '='
                *(p + 1) = base64_alphabet[(int)cmove_bits(*indata, 6, 2)];
                *(p + 2) = '=';
                *(p + 3) = '=';
            }
        } else {
            // Full group of 3 bytes
            *(p + 1) = base64_alphabet[(int)(cmove_bits(*indata, 6, 2) + cmove_bits(*(indata + 1), 0, 4))];
            *(p + 2) = base64_alphabet[(int)(cmove_bits(*(indata + 1), 4, 2) + cmove_bits(*(indata + 2), 0, 6))];
            *(p + 3) = base64_alphabet[*(indata + 2) & 0x3f];
        }

        p += 4;
        indata += 3;
    }

    if (outlen != NULL) {
        *outlen = out_len;
    }

    return 0;
}
 
 
// Decode inlen Base64 characters of indata into outdata.
// outdata must have room for inlen / 4 * 3 bytes.
// Returns 0 on success, -1 on invalid arguments or an invalid character,
// and -2 if inlen is not a multiple of 4.
int base64_decode(const char *indata, int inlen, char *outdata, int *outlen) {
    if (indata == NULL || inlen <= 0 || outdata == NULL || outlen == NULL) {
        return -1;
    }
    if (inlen % 4 != 0) { // the input length must be a multiple of 4
        return -2;
    }

    int t = 0; // accumulator for the 24 bits of the current group
    int x = 0; // read position in indata
    int y = 0; // characters collected for the current group
    int i = 0; // write position in outdata
    unsigned char c = 0;
    int g = 3; // data bytes in the current group; each '=' removes one

    while (x < inlen) {
        // Map the character to its Base64 index. The cast keeps bytes
        // >= 0x80 from producing a negative index where char is signed.
        c = base64_suffix_map[(unsigned char)indata[x++]];
        if (c == 255) return -1; // not a Base64 character
        if (c == 253) continue;  // line feed, carriage return or space: skip
        if (c == 254) { c = 0; g--; } // '=' padding
        t = (t << 6) | c; // append 6 bits to the accumulator
        if (++y == 4) {
            // A full group of 4 characters: emit up to 3 bytes
            outdata[i++] = (unsigned char)((t >> 16) & 0xff);
            if (g > 1) outdata[i++] = (unsigned char)((t >> 8) & 0xff);
            if (g > 2) outdata[i++] = (unsigned char)(t & 0xff);
            y = t = 0;
        }
    }

    *outlen = i;
    return 0;
}