有一天,我在的一个微信群的人聊起了压缩相关的事情,然后群主组织了一个编程比赛,要求是压缩一段ASCII文本。我是用Ruby 以Huffman编码的方式实现了一下。刚开始的时候自以为很简单。结果用了好久才写好。用这个机会我也好好学了一下。

代码如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114

class TreeNode
attr_accessor :data, :size, :left, :right
def initialize(data, left=nil, right=nil)
self.data = data
self.left = left
self.right = right
end

def leaf?
left.nil? and right.nil?
end
end

def get_frequency_table(str)
table = {}
str.split("").each do |char|
if table[char]
table[char] += 1
else
table[char] = 1
end
end
table
end

def get_huffman_tree(frequency_table)
nodes = frequency_table.sort_by {|char, frequency| frequency}.map{|item| TreeNode.new(item)}

# add place holder
max_frequncy_node = TreeNode.new(["", nodes.last.data[1] + 1])
nodes << max_frequncy_node

while nodes.size > 1
node_0 = nodes[0]
node_1 = nodes[1]
new_node = TreeNode.new([nil, node_0.data[1] + node_1.data[1]], node_0, node_1)

# new arry
nodes = nodes[2, nodes.size] << new_node
nodes.sort_by! {|node| node.data[1]}
end
nodes.first
end

def get_huffman_coding_table(huffman_tree)
return {huffman_tree.data[0] => "0"} if huffman_tree.leaf?
depth_first_iterator(huffman_tree, "", {})
end

def depth_first_iterator(node, code, table)
if node.leaf?
table[node.data[0]] = code
end

table.merge(depth_first_iterator(node.left, code + "0", table)) if node.left
table.merge(depth_first_iterator(node.right, code + "1", table)) if node.right
table
end

def encode(str, table)
bin_str = str.split('').map {|char| table[char]}.join

while bin_str.size % 7 != 0
# add place holder
bin_str += table.to_a.last[1]
end

bin_str.scan(/.{7}/).map do |oct_string|
oct_string.to_i(2).chr
end.join
end

def decode(str, table)
table = Hash[table.map { |i| [i[1], i[0]] }]
result = ""

code = ""
str.split('').map do |char|
bin_str = char.ord.to_s(2)
"0" * (7 - bin_str.size) + bin_str
end.join.split('').each do |bit|
code += bit
if table[code]
result += table[code]
code = ""
end
end
result
end

str = %(
I’ll make this short: The thing you’re doing now, reading prose on a screen, is going out of fashion.

We’re taking stock of the internet right now, with writers who cover the digital world cataloging some of the most consequential currents shaping it. If you probe those currents and look ahead to the coming year online, one truth becomes clear. The defining narrative of our online moment concerns the decline of text, and the exploding reach and power of audio and video.

THIS MULTIMEDIA INTERNET has been gaining on the text-based internet for years. But last year, the story accelerated sharply, and now audio and video are unstoppable. The most influential communicators online once worked on web pages and blogs. They’re now making podcasts, Netflix shows, propaganda memes, Instagram and YouTube channels, and apps like HQ Trivia.

Consider the most compelling digital innovations now emerging: the talking assistants that were the hit of the holidays, Apple’s face-reading phone, artificial intelligence to search photos or translate spoken language, and augmented reality — which inserts any digital image into a live view of your surroundings.
)
table = get_huffman_coding_table(get_huffman_tree(get_frequency_table(str)))
encoded_string = encode(str, table)
decoded_srring = decode(encoded_string, table)

puts "============= Origin"
puts str
puts "============= Encoded"
puts encoded_string
puts "============= Decoded"
puts decoded_srring
puts "============= Result"
puts "Origin string length in ASCII: #{str.size}"
puts "Encoded string length in ASCII: #{encoded_string.size}"

运行结果:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
ruby practice.rb
============= Origin

I’ll make this short: The thing you’re doing now, reading prose on a screen, is going out of fashion.

We’re taking stock of the internet right now, with writers who cover the digital world cataloging some of the most consequential currents shaping it. If you probe those currents and look ahead to the coming year online, one truth becomes clear. The defining narrative of our online moment concerns the decline of text, and the exploding reach and power of audio and video.

THIS MULTIMEDIA INTERNET has been gaining on the text-based internet for years. But last year, the story accelerated sharply, and now audio and video are unstoppable. The most influential communicators online once worked on web pages and blogs. They’re now making podcasts, Netflix shows, propaganda memes, Instagram and YouTube channels, and apps like HQ Trivia.

Consider the most compelling digital innovations now emerging: the talking assistants that were the hit of the holidays, Apple’s face-reading phone, artificial intelligence to search photos or translate spoken language, and augmented reality — which inserts any digital image into a live view of your surroundings.
============= Encoded
Zd
PmSf_3khNo!Z,
5&[>A\XcA~My#
v:3o[w3z
hKGh8pP|i ?x
v[>C*!0)q_r$8(6z cA~@$6`H}_P
msG /[C7PFT mgePq
?A&StF;8!Z:>i^}CzDi-rGx6z>;`S+K\>,wt>:=lwg1o'd9;%Dvz`EB`S OG#BC7QC|9yywNaI(OZ7_0~OV%A!
^j2t8(}bNFcX9yy{[gPmt_,t8(>zC7SAF(8eA&|_ME`qKAvqa"t
O
A`9g%iiGx&_&PO~k}8?AT4}8=9_ ;%
Gd`aC0{CBa{<q`'n59;y&CDcM..Pmtxo^T/o\G6SpsfW6(j~E`s dx\H><.}3c;*Gb;=
cB`$4v
JD|v
4a>)_?#qg#sw mCGde"wQ8!by[m-6}p1BR
"m,"p8oQ 'fWd=} F.ex~~DoF 'M8?#]JD\
AbY3t I @D}HJK@ksb[C7QC|9ylD5"
puL8fXEA5tT
2G5^@K{A^|QKF->;z3"|%" E{bxkCz
sn. d!;:
s?Q{Cm9r
>O:'2\
"Eoh\ L;af|zQ Ks0Yw6?"s*jD
PX7EP\j>MgZ7[t`eA<K
============= Decoded

I’ll make this short: The thing you’re doing now, reading prose on a screen, is going out of fashion.

We’re taking stock of the internet right now, with writers who cover the digital world cataloging some of the most consequential currents shaping it. If you probe those currents and look ahead to the coming year online, one truth becomes clear. The defining narrative of our online moment concerns the decline of text, and the exploding reach and power of audio and video.

THIS MULTIMEDIA INTERNET has been gaining on the text-based internet for years. But last year, the story accelerated sharply, and now audio and video are unstoppable. The most influential communicators online once worked on web pages and blogs. They’re now making podcasts, Netflix shows, propaganda memes, Instagram and YouTube channels, and apps like HQ Trivia.

Consider the most compelling digital innovations now emerging: the talking assistants that were the hit of the holidays, Apple’s face-reading phone, artificial intelligence to search photos or translate spoken language, and augmented reality — which inserts any digital image into a live view of your surroundings.
============= Result
Origin string length in ASCII: 1158
Encoded string length in ASCII: 772