有一天,我在的一个微信群的人聊起了压缩相关的事情,然后群主组织了一个编程比赛,要求是压缩一段ASCII文本。我是用Ruby 以Huffman编码的方式实现了一下。刚开始的时候自以为很简单。结果用了好久才写好。用这个机会我也好好学了一下。

代码如下:


class TreeNode
    attr_accessor :data, :size, :left, :right
    def initialize(data, left=nil, right=nil)
        self.data = data
        self.left = left
        self.right = right
    end

    def leaf?
        left.nil? and right.nil?
    end
end

def get_frequency_table(str)
    table = {}
    str.split("").each do |char|
        if table[char]
            table[char] += 1
        else
            table[char] = 1
        end
    end
    table
end

def get_huffman_tree(frequency_table)
    nodes = frequency_table.sort_by {|char, frequency| frequency}.map{|item| TreeNode.new(item)}

    # add place holder
    max_frequncy_node = TreeNode.new(["", nodes.last.data[1] + 1])
    nodes << max_frequncy_node

    while nodes.size > 1
        node_0 = nodes[0]
        node_1 = nodes[1]
        new_node = TreeNode.new([nil, node_0.data[1] + node_1.data[1]], node_0, node_1)

        # new arry
        nodes = nodes[2, nodes.size] << new_node
        nodes.sort_by! {|node| node.data[1]}
    end
    nodes.first
end

def get_huffman_coding_table(huffman_tree)
    return {huffman_tree.data[0] => "0"} if huffman_tree.leaf?
    depth_first_iterator(huffman_tree, "", {})
end

def depth_first_iterator(node, code, table)
    if node.leaf?
        table[node.data[0]] = code
    end
    
    table.merge(depth_first_iterator(node.left, code + "0", table)) if node.left
    table.merge(depth_first_iterator(node.right, code + "1", table)) if node.right
    table
end

def encode(str, table)
    bin_str = str.split('').map {|char| table[char]}.join

    while bin_str.size % 7 != 0
        # add place holder 
        bin_str += table.to_a.last[1]
    end

    bin_str.scan(/.{7}/).map do |oct_string|
        oct_string.to_i(2).chr
    end.join
end

def decode(str, table)
    table = Hash[table.map { |i| [i[1], i[0]] }]
    result = ""

    code = ""
    str.split('').map do |char|
        bin_str = char.ord.to_s(2)
        "0" * (7 - bin_str.size) + bin_str
    end.join.split('').each do |bit|
        code += bit
        if table[code]
            result += table[code]
            code = ""
        end
    end
    result
end

str = %(
I’ll make this short: The thing you’re doing now, reading prose on a screen, is going out of fashion.

We’re taking stock of the internet right now, with writers who cover the digital world cataloging some of the most consequential currents shaping it. If you probe those currents and look ahead to the coming year online, one truth becomes clear. The defining narrative of our online moment concerns the decline of text, and the exploding reach and power of audio and video.

THIS MULTIMEDIA INTERNET has been gaining on the text-based internet for years. But last year, the story accelerated sharply, and now audio and video are unstoppable. The most influential communicators online once worked on web pages and blogs. They’re now making podcasts, Netflix shows, propaganda memes, Instagram and YouTube channels, and apps like HQ Trivia.

Consider the most compelling digital innovations now emerging: the talking assistants that were the hit of the holidays, Apple’s face-reading phone, artificial intelligence to search photos or translate spoken language, and augmented reality — which inserts any digital image into a live view of your surroundings.
)
table = get_huffman_coding_table(get_huffman_tree(get_frequency_table(str)))
encoded_string = encode(str, table)
decoded_srring = decode(encoded_string, table)

puts "============= Origin"
puts str
puts "============= Encoded"
puts encoded_string
puts "============= Decoded"
puts decoded_srring
puts "============= Result"
puts "Origin string length in ASCII: #{str.size}"
puts "Encoded string length in ASCII: #{encoded_string.size}"

运行结果:

ruby practice.rb
============= Origin

I’ll make this short: The thing you’re doing now, reading prose on a screen, is going out of fashion.

We’re taking stock of the internet right now, with writers who cover the digital world cataloging some of the most consequential currents shaping it. If you probe those currents and look ahead to the coming year online, one truth becomes clear. The defining narrative of our online moment concerns the decline of text, and the exploding reach and power of audio and video.

THIS MULTIMEDIA INTERNET has been gaining on the text-based internet for years. But last year, the story accelerated sharply, and now audio and video are unstoppable. The most influential communicators online once worked on web pages and blogs. They’re now making podcasts, Netflix shows, propaganda memes, Instagram and YouTube channels, and apps like HQ Trivia.

Consider the most compelling digital innovations now emerging: the talking assistants that were the hit of the holidays, Apple’s face-reading phone, artificial intelligence to search photos or translate spoken language, and augmented reality — which inserts any digital image into a live view of your surroundings.
============= Encoded
Zd
  PmSf_3khNo!Z,
5&[>A\XcA~My#
             v:3o[w3z
                     hKGh8pP|i	?x
                                  v[>C*!0)q_r$8(6z   cA~@$6`H}_P
                                                  msG           /[C7PFT mgePq
?A&StF;8!Z:>i^}CzDi-rGx6z>;`S+K\>,wt>:=lwg1o'd9;%Dvz`EB`S                    OG#BC7QC|9yywNaI(OZ7_0~OV%A!
                                                         ^j2t8(}bNFcX9yy{[gPmt_,t8(>zC7SAF(8eA&|_ME`qKAvqa"t
                                                                                                            O
                                                                                                             A`9g%iiGx&_&PO~k}8?AT4}8=9_ ;%
                                                                                                                                           Gd`aC0{CBa{<q`'n59;y&CDcM..Pmtxo^T/o\G6SpsfW6(j~E`s dx\H><.}3c;*Gb;=
                  cB`$4v
                        JD|v
                            4a>)_?#qg#sw mCGde"wQ8!by[m-6}p1BR
"m,"p8oQ 'fWd=}                                               F.ex~~DoF	'M8?#]JD\
               AbY3t I	@D}HJK@ksb[C7QC|9ylD5"
                                              puL8fXEA5tT
2G5^@K{A^|QKF->;z3"|%"                                   E{bxkCz
sn.                   d!;:
   s?Q{Cm9r
>O:'2\
      "Eoh\ L;af|zQ Ks0Yw6?"s*jD
                                PX7EP\j>MgZ7[t`eA<K
============= Decoded

I’ll make this short: The thing you’re doing now, reading prose on a screen, is going out of fashion.

We’re taking stock of the internet right now, with writers who cover the digital world cataloging some of the most consequential currents shaping it. If you probe those currents and look ahead to the coming year online, one truth becomes clear. The defining narrative of our online moment concerns the decline of text, and the exploding reach and power of audio and video.

THIS MULTIMEDIA INTERNET has been gaining on the text-based internet for years. But last year, the story accelerated sharply, and now audio and video are unstoppable. The most influential communicators online once worked on web pages and blogs. They’re now making podcasts, Netflix shows, propaganda memes, Instagram and YouTube channels, and apps like HQ Trivia.

Consider the most compelling digital innovations now emerging: the talking assistants that were the hit of the holidays, Apple’s face-reading phone, artificial intelligence to search photos or translate spoken language, and augmented reality — which inserts any digital image into a live view of your surroundings.
============= Result
Origin string length in ASCII: 1158
Encoded string length in ASCII: 772