1 概述
Huffman壓縮算法是一種基于字符出現(xiàn)頻率的編碼算法,通過(guò)構(gòu)建Huffman樹,將出現(xiàn)頻率高的字符用短編碼表示,出現(xiàn)頻率低的字符用長(zhǎng)編碼表示,從而實(shí)現(xiàn)對(duì)數(shù)據(jù)的壓縮。以下是Huffman壓縮算法的詳細(xì)流程:
統(tǒng)計(jì)字符頻率:遍歷待壓縮的數(shù)據(jù),統(tǒng)計(jì)每個(gè)字符出現(xiàn)的頻率。
構(gòu)建優(yōu)先隊(duì)列:將每個(gè)字符及其頻率作為一個(gè)結(jié)點(diǎn)放入優(yōu)先隊(duì)列(或最小堆)中,根據(jù)字符頻率構(gòu)建一個(gè)按頻率大小排序的優(yōu)先隊(duì)列。
構(gòu)建Huffman樹:不斷地從優(yōu)先隊(duì)列中取出頻率最小的兩個(gè)結(jié)點(diǎn),合并為一個(gè)新結(jié)點(diǎn),并將新結(jié)點(diǎn)重新插入到優(yōu)先隊(duì)列中,直到隊(duì)列只剩下一個(gè)結(jié)點(diǎn),即Huffman樹的根結(jié)點(diǎn)。
生成Huffman編碼:通過(guò)遍歷Huffman樹,從根結(jié)點(diǎn)到每個(gè)葉子結(jié)點(diǎn)的路徑上的左右分支分別對(duì)應(yīng)編碼0和1,根據(jù)路徑生成每個(gè)字符的Huffman編碼。
壓縮數(shù)據(jù):根據(jù)生成的Huffman編碼,將待壓縮數(shù)據(jù)中的每個(gè)字符替換為對(duì)應(yīng)的Huffman編碼,得到壓縮后的數(shù)據(jù)。
存儲(chǔ)壓縮表:將字符與對(duì)應(yīng)的Huffman編碼關(guān)系存儲(chǔ)為壓縮表,以便解壓縮時(shí)使用。
存儲(chǔ)壓縮數(shù)據(jù):將壓縮后的數(shù)據(jù)以二進(jìn)制形式存儲(chǔ)。
在解壓縮時(shí),需要根據(jù)存儲(chǔ)的Huffman編碼表和壓縮數(shù)據(jù),使用相同的Huffman樹結(jié)構(gòu)進(jìn)行解碼,將壓縮數(shù)據(jù)解壓縮成原始數(shù)據(jù),并輸出原始數(shù)據(jù)。
Huffman壓縮算法的優(yōu)勢(shì)在于可以根據(jù)數(shù)據(jù)的特征自適應(yīng)地確定編碼,使得出現(xiàn)頻率高的字符擁有更短的編碼,從而實(shí)現(xiàn)高效的數(shù)據(jù)壓縮。然而,Huffman算法對(duì)于小規(guī)模數(shù)據(jù)壓縮效果不佳,適用于處理較大規(guī)模的數(shù)據(jù)壓縮。
2 huffman壓縮算法過(guò)程詳細(xì)演示
下面將通過(guò)一個(gè)簡(jiǎn)單的例子來(lái)演示Huffman壓縮算法的壓縮過(guò)程,假設(shè)有一個(gè)字符串 “ABRACADABRA” 需要進(jìn)行壓縮。
統(tǒng)計(jì)字符頻率:
A: 5 次
B: 2 次
R: 2 次
C: 1 次
D: 1 次
2) 構(gòu)建優(yōu)先隊(duì)列:
構(gòu)建一個(gè)優(yōu)先隊(duì)列,按照字符頻率排序:
(C, 1), (D, 1), (B, 2), (R, 2), (A, 5)
3) 構(gòu)建Huffman樹:
不斷地從優(yōu)先隊(duì)列中取出頻率最小的兩個(gè)結(jié)點(diǎn),合并為一個(gè)新節(jié)點(diǎn),并重新插入隊(duì)列中,直到隊(duì)列只剩下一個(gè)節(jié)點(diǎn),作為Huffman樹的根節(jié)點(diǎn)。
合并過(guò)程:
(C, 1)和(D, 1) -> (CD, 2)
(B, 2)和(R, 2) -> (BR, 4)
((CD, 2) 和 (BR, 4)) -> ((CD)BR, 6)
((A, 5) 和 ((CD)BR, 6)) -> (((CD)BR)A, 11)
最終得到的Huffman樹如下:
(((CD)BR)A)
/
(CD)BR A
/
CD BR
/ /
C D B R
生成Huffman編碼:
從根節(jié)點(diǎn)開始,左分支為0,右分支為1,生成每個(gè)字符的Huffman編碼:
A: 0
B: 101
R: 100
C: 1100
D: 1101
6) 壓縮數(shù)據(jù):
將原始數(shù)據(jù)字符串 “ABRACADABRA” 中的每個(gè)字符使用對(duì)應(yīng)的Huffman編碼替換,得到壓縮后的數(shù)據(jù)。
原始數(shù)據(jù):ABRACADABRA
Huffman編碼:010110011001011010001011110
壓縮后數(shù)據(jù):010110011001011010001011110
在實(shí)際壓縮過(guò)程中,還需要將Huffman編碼表(字符與編碼的映射關(guān)系)一并存儲(chǔ),以便在解壓縮時(shí)使用。通過(guò)上述過(guò)程,原始數(shù)據(jù)被成功壓縮,并且根據(jù)Huffman編碼,高頻字符編碼較短,低頻字符編碼較長(zhǎng),實(shí)現(xiàn)了數(shù)據(jù)的有效壓縮。
3 c語(yǔ)言Huffman壓縮代碼示例
以下是一個(gè)簡(jiǎn)單的C語(yǔ)言示例代碼,實(shí)現(xiàn)了Huffman算法進(jìn)行數(shù)據(jù)壓縮和解壓縮的功能:
#include#include #include #define MAX_TREE_HT 100 // 結(jié)點(diǎn)結(jié)構(gòu)體 typedef struct MinHeapNode { char data; unsigned freq; struct MinHeapNode *left, *right; } MinHeapNode; // 最小堆結(jié)構(gòu)體 typedef struct MinHeap { unsigned size; unsigned capacity; MinHeapNode **array; } MinHeap; // 創(chuàng)建新結(jié)點(diǎn) MinHeapNode* newNode(char data, unsigned freq) { MinHeapNode* node = (MinHeapNode*)malloc(sizeof(MinHeapNode)); node->left = node->right = NULL; node->data = data; node->freq = freq; return node; } // 創(chuàng)建最小堆 MinHeap* createMinHeap(unsigned capacity) { MinHeap* minHeap = (MinHeap*)malloc(sizeof(MinHeap)); minHeap->size = 0; minHeap->capacity = capacity; minHeap->array = (MinHeapNode**)malloc(minHeap->capacity * sizeof(MinHeapNode*)); return minHeap; } // 交換兩個(gè)結(jié)點(diǎn) void swapMinHeapNodes(MinHeapNode** a, MinHeapNode** b) { MinHeapNode* t = *a; *a = *b; *b = t; } // 最小堆調(diào)整 void minHeapify(MinHeap* minHeap, int idx) { int smallest = idx; int left = 2 * idx + 1; int right = 2 * idx + 2; if (left < minHeap->size && minHeap->array[left]->freq < minHeap->array[smallest]->freq) smallest = left; if (right < minHeap->size && minHeap->array[right]->freq < minHeap->array[smallest]->freq) smallest = right; if (smallest != idx) { swapMinHeapNodes(&minHeap->array[smallest], &minHeap->array[idx]); minHeapify(minHeap, smallest); } } // 獲取最小結(jié)點(diǎn) MinHeapNode* extractMin(MinHeap* minHeap) { MinHeapNode* temp = minHeap->array[0]; minHeap->array[0] = minHeap->array[minHeap->size - 1]; --minHeap->size; minHeapify(minHeap, 0); return temp; } // 插入結(jié)點(diǎn) void insertMinHeap(MinHeap* minHeap, MinHeapNode* minHeapNode) { ++minHeap->size; int i = minHeap->size - 1; while (i && minHeapNode->freq < minHeap->array[(i - 1) / 2]->freq) { minHeap->array[i] = minHeap->array[(i - 1) / 2]; i = (i - 1) / 2; } minHeap->array[i] = minHeapNode; } // 創(chuàng)建和構(gòu)建最小堆 MinHeap* buildMinHeap(char data[], int freq[], int size) { MinHeap* minHeap = createMinHeap(size); for (int i = 0; i < size; ++i) minHeap->array[i] = newNode(data[i], freq[i]); minHeap->size = size; for (int i = size / 2 - 1; i >= 0; --i) minHeapify(minHeap, i); return minHeap; } // 檢查結(jié)點(diǎn)是否是葉子結(jié)點(diǎn) int isLeaf(MinHeapNode* root) { return !(root->left) && !(root->right); } // 構(gòu)建霍夫曼樹 MinHeapNode* buildHuffmanTree(char data[], int freq[], int size) { MinHeapNode *left, *right, *top; MinHeap* minHeap = buildMinHeap(data, freq, size); while (minHeap->size != 1) { left = extractMin(minHeap); right = extractMin(minHeap); top = newNode('$', left->freq + right->freq); top->left = left; top->right = right; insertMinHeap(minHeap, top); } return extractMin(minHeap); } // 打印霍夫曼編碼 void printCodes(MinHeapNode* root, int arr[], int top) { if (root->left) { arr[top] = 0; printCodes(root->left, arr, top + 1); } if (root->right) { arr[top] = 1; printCodes(root->right, arr, top + 1); } if (isLeaf(root)) { printf("%c: ", root->data); for (int i = 0; i < top; ++i) printf("%d", arr[i]); printf(" "); } } // 壓縮數(shù)據(jù) void huffmanCompression(char data[]) { int freq[256] = {0}; int n = strlen(data); for (int i = 0; i < n; ++i) ++freq[data[i]]; char arr[256]; int freq2[256]; int size = 0; for (int i = 0; i < 256; ++i) { if (freq[i] != 0) { arr[size] = (char)i; freq2[size] = freq[i]; ++size; } } MinHeapNode* root = buildHuffmanTree(arr, freq2, size); int arr2[MAX_TREE_HT], top = 0; printCodes(root, arr2, top); } int main() { char data[] = "hello world"; huffmanCompression(data); return 0; }
這個(gè)示例代碼演示了使用Huffman算法對(duì)輸入的數(shù)據(jù)進(jìn)行壓縮,并打印出各個(gè)字符的Huffman編碼。huffmanCompression 函數(shù)首先統(tǒng)計(jì)輸入數(shù)據(jù)中每個(gè)字符的出現(xiàn)頻率,并構(gòu)建Huffman樹,然后通過(guò)遞歸遍歷Huffman樹獲取每個(gè)字符的Huffman編碼并打印出來(lái)。在 main 函數(shù)中,我們對(duì)一個(gè)簡(jiǎn)單的字符串進(jìn)行了壓縮,并輸出了每個(gè)字符的Huffman編碼。
需要注意的是,這個(gè)示例代碼僅演示了Huffman算法的基本壓縮原理,實(shí)際應(yīng)用中可能需要對(duì)數(shù)據(jù)內(nèi)容、編碼方式等進(jìn)行更多處理和優(yōu)化。
4 C語(yǔ)言Huffman解壓縮算法示例
以下是一個(gè)簡(jiǎn)單的C語(yǔ)言示例代碼,實(shí)現(xiàn)了Huffman算法進(jìn)行數(shù)據(jù)解壓縮的功能:
#include#include #include // 結(jié)點(diǎn)結(jié)構(gòu)體 typedef struct MinHeapNode { char data; struct MinHeapNode *left, *right; } MinHeapNode; // 解壓縮數(shù)據(jù) void huffmanDecompression(char data[], MinHeapNode* root) { int n = strlen(data); MinHeapNode* current = root; for (int i = 0; i < n; ++i) { if (data[i] == '0') { current = current->left; } else { current = current->right; } if (current->left == NULL && current->right == NULL) { printf("%c", current->data); current = root; } } } int main() { // 構(gòu)造Huffman樹 MinHeapNode *root = (MinHeapNode*)malloc(sizeof(MinHeapNode)); root->left = root->right = NULL; MinHeapNode *A = (MinHeapNode*)malloc(sizeof(MinHeapNode)); A->data = 'A'; A->left = A->right = NULL; MinHeapNode *B = (MinHeapNode*)malloc(sizeof(MinHeapNode)); B->data = 'B'; B->left = B->right = NULL; MinHeapNode *C = (MinHeapNode*)malloc(sizeof(MinHeapNode)); C->data = 'C'; C->left = C->right = NULL; root->left = A; root->right = B; A->left = C; A->right = NULL; B->left = B->right = NULL; C->left = C->right = NULL; // 待解壓縮的數(shù)據(jù) char data[] = "00100110001"; // 解壓縮數(shù)據(jù) huffmanDecompression(data, root); return 0; }
在這個(gè)簡(jiǎn)單的示例代碼中,我們首先構(gòu)建了一個(gè)簡(jiǎn)單的Huffman樹,然后定義了一個(gè)待解壓縮的數(shù)據(jù)字符串。huffmanDecompression 函數(shù)接受壓縮后的數(shù)據(jù)和Huffman樹的根結(jié)點(diǎn)作為參數(shù),通過(guò)逐位解析壓縮后的數(shù)據(jù),按照Huffman樹逐步走到葉子結(jié)點(diǎn),從而解壓縮出原始數(shù)據(jù)并打印。
在 main 函數(shù)中,我們構(gòu)造了一個(gè)簡(jiǎn)單的Huffman樹,并指定了一個(gè)簡(jiǎn)單的待解壓縮的數(shù)據(jù)字符串,然后調(diào)用 huffmanDecompression 函數(shù)進(jìn)行解壓縮操作。解壓縮過(guò)程中,輸出的字符序列應(yīng)該是根據(jù)Huffman樹進(jìn)行解碼后的原始數(shù)據(jù)。
需要注意的是,這個(gè)示例代碼中的Huffman樹和待解壓縮的數(shù)據(jù)都是固定的,實(shí)際應(yīng)用中可能需要根據(jù)具體的壓縮數(shù)據(jù)和Huffman樹結(jié)構(gòu)進(jìn)行相應(yīng)的解壓縮處理。
-
算法
+關(guān)注
關(guān)注
23文章
4612瀏覽量
92891 -
C語(yǔ)言
+關(guān)注
關(guān)注
180文章
7604瀏覽量
136823 -
函數(shù)
+關(guān)注
關(guān)注
3文章
4331瀏覽量
62618 -
Huffman
+關(guān)注
關(guān)注
0文章
4瀏覽量
6356
原文標(biāo)題:Huffman算法壓縮解壓縮(C)
文章出處:【微信號(hào):leezym0317,微信公眾號(hào):FPGA開源工作室】歡迎添加關(guān)注!文章轉(zhuǎn)載請(qǐng)注明出處。
發(fā)布評(píng)論請(qǐng)先 登錄
相關(guān)推薦
評(píng)論