Optimizing lossless compression by normalized data length in Huffman Algorithm

Bibliographic Details
Main Author: Tonny, Hidayat
Format: Thesis
Language: English
Published: 2022
Online Access:http://eprints.utem.edu.my/id/eprint/26986/1/Optimizing%20lossless%20compression%20by%20normalized%20data%20length%20in%20Huffman%20Algorithm.pdf
http://eprints.utem.edu.my/id/eprint/26986/2/Optimizing%20lossless%20compression%20by%20normalized%20data%20length%20in%20Huffman%20Algorithm.pdf
Description
Summary: Due to the growing need for storage space, the demand for efficient compression schemes has become increasingly important. One goal of lossless data compression is to archive raw audio data so that the file can be restored to its original form when it is reused. Raw audio data is generally stored as 16-bit samples (65,536 possible values). The Huffman algorithm, whose extensions can be grouped into Static, Dynamic, and Adaptive variants, is still very effective at compressing 8-bit data; however, its performance cannot be determined when it is applied to data with several variables and probabilities. Based on the literature review, compression performance for file archives is measured using the Compression Ratio (CR) and Compression Time (CT) indicators. These two indicators are used to calculate and analyse the file-size reduction and the ability of the file to be reconstructed to its original form without compromising its quality. This research produces a new entropy-coding scheme called Quaternary Arity (4-ary) Modification Quadtree (MQ), or 4-ary/MQ, which has its roots in other Huffman variants such as Binary/Static, Quadtree, Octatree, and Hexatree. The 4-ary/MQ method employs the characteristics of the Quadtree structure and extends the Dynamic Huffman coding mechanism (the FGK rule) for node arrangement, while adopting the Adaptive Huffman method's use of additional variable data. The novelty of this scheme lies in adding extra variables to maintain the branch root so that it always has exactly four branches. A descriptive analysis of 4-ary/MQ was performed on several audio datasets (Music, Mono Music, Stereo Music, Ripping CD, Speech, Noise, Sound Effects, and Instruments) to compare it with the other Huffman scheme variants. A comparative analysis with several lossless compression applications showed that its CR is significantly more optimal than those of PKZIP, WinZip, 7-Zip, and Monkey's Audio.
It was found that 4-ary/MQ compression benefits compressed data stored on local storage media as well as hosting and bandwidth optimization. The new algorithm also performs well, producing an optimal CR with fast CT on most of the 16-bit WAV audio datasets, and achieves a more optimal CR than the various Huffman-based lossless applications. It is also expected that this new scheme may work well on data above 16 bits in future research.
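The abstract's core idea, a Huffman code in which every internal node keeps exactly four branches, can be illustrated with a minimal sketch. This is not the thesis's 4-ary/MQ scheme (which adds Quadtree structure and FGK-style dynamic updates); it only shows the generic quaternary-Huffman technique of padding the symbol set with zero-frequency dummy symbols so the tree stays full. The function name and the example frequencies are hypothetical:

```python
import heapq
from itertools import count

def quaternary_huffman(freqs, arity=4):
    """Build a k-ary (k=4) Huffman code. Zero-frequency dummy symbols
    are added so that (n - 1) is divisible by (arity - 1), which
    guarantees every internal node has exactly `arity` children.
    Illustrative sketch only, not the thesis's 4-ary/MQ scheme."""
    items = list(freqs.items())
    # Pad with dummies until a full 4-ary tree is possible.
    while (len(items) - 1) % (arity - 1) != 0:
        items.append((None, 0))  # dummy symbol, never emitted
    tie = count()  # tie-breaker so equal weights never compare payloads
    heap = [(w, next(tie), sym) for sym, w in items]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the four lightest nodes into one parent node.
        children = [heapq.heappop(heap) for _ in range(arity)]
        weight = sum(c[0] for c in children)
        heapq.heappush(heap, (weight, next(tie), children))
    # Walk the tree, assigning one quaternary digit (0-3) per branch.
    codes = {}
    def walk(node, prefix):
        payload = node[2]
        if isinstance(payload, list):           # internal node
            for digit, child in enumerate(payload):
                walk(child, prefix + str(digit))
        elif payload is not None:               # real symbol (skip dummies)
            codes[payload] = prefix or "0"
    walk(heap[0], "")
    return codes

# Hypothetical symbol frequencies; frequent symbols get shorter codes.
codes = quaternary_huffman({'a': 45, 'b': 13, 'c': 12,
                            'd': 16, 'e': 9, 'f': 5})
```

The resulting code is prefix-free, and because each branch carries one of four digits, code lengths shrink roughly twice as fast as in binary Huffman coding; the padding step is the analogue of the abstract's "additional variables" that keep the root consistent with four branches. A CR indicator as used in the thesis would then compare the encoded size against the original size.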