VLSI Architecture of Arithmetic Coder Used in SPIHT pdf

VLSI Architecture of Arithmetic Coder Used in SPIHT
1VLSI Architecture.pdf (Size: 1.21 MB / Downloads: 62)
Abstract
A highthroughput memoryefficient arithmetic
coder architecture for the set partitioning in hierarchical trees
(SPIHT) image compression is proposed based on a simple context
model in this paper. The architecture benefits from various optimizations
performed at different levels of arithmetic coding from
higher algorithm abstraction to lower circuits’ implementations.
First, the complex context model used by software is mitigated
by designing a simple context model, which just uses the brother
nodes’ states in the coding zerotree of SPIHT to form context symbols
for the arithmetic coding. The simple context model results
in a regular access pattern during reading the wavelet transform
coefficients, which is convenient to the hardware implementation,
but at a cost of slight performance loss. Second, in order to avoid
rescanning the wavelet transform coefficients, a breadth first
search SPIHT without lists algorithm is used instead of SPIHT
with lists algorithm. Especially, the coding bitplanes of each zero
tree are processed in parallel. Third, an outoforder execution
mechanism for different types of context is proposed that can
allocate the context symbol to the idle arithmetic coding core with
a different order that of the input. For the balance of the input rate
of the wavelet coefficients, eight arithmetic coders are replicated in
the compression system. And in one arithmetic coder, there exists
four cores to process different contexts. Fourth, several dedicated
circuits are designed to further improve the throughput of the
architecture.
INTRODUCTION
AS arithmetic coding (AC) [1], [2] method can obtain optimal
performance for its ability to generate codes with
fractional bits, it is widely used by various image compression
algorithms, such as the QM in JPEG [3], the MQ in JPEG2000
[4]–[6], and the contextbased adaptive binary arithmetic coder
(CABAC) [7], [8] in H.264. Especially, the set partitioning in
hierarchical trees (SPIHT) [9] uses an AC method to improve
its peak signaltonoise ratio (PSNR) about 0.5 dB. Although
the theory and program code of AC are mature, the complicated
internal operations of AC limit its application for some real time
fields, such as satellite image and high speed camera image compressions.
In order to achieve performance gains, high speed architecture
of AC in compression scenarios must be designed to
meet the throughput requirement.
Thus both industrial and academic research groups have put
their efforts to AC hardware architectures for various image
compression systems. However, there are two main challenges
in hardware architecture design for high speed applications. One
is data dependencies in AC which require the result of iteration
before next run can commence during the adaptive model update
and internal loops. The other one is that AC requires increasingly
greater precision as more data arrive. In order to deal
with such difficulties, several architectures are proposed in the
past years.
SPIHT Image Compression
SPIHT with lists algorithm uses three different lists to store
significant information of wavelet coefficients for image coding
purpose. Three lists are list of insignificant sets (LIS), list of
insignificant pixels (LIP), and list of significant pixels (LSP).
At first, SPIHT combines nodes of a coefficient tree in wavelet
domain and its successor nodes into one set which is denoted
as insignificant. With traveling each tree node, sets in the LIS
are partitioned into four different subsets which are tested for
significant state.
Challenges With SPIHT Arithmetic Coding Architecture
As far as hardware architecture of arithmetic coder in SPIHT
is concerned, there are three main challenges for designers to
solve during realtime implementation. First, a large amount of
coding symbols is supplied to arithmetic coder which can be
a bottleneck for high speed realtime applications. Because the
scheme of SPIHT is a bitplane based method, which codes each
bitplane from the most significant bitplane (MSB) to the least
significant bitplane (LSB) sequentially, the quantity of context
symbols for arithmetic coder will be proportional to the coded
planes that are determined by the maximal wavelet coefficient.
For the 9/7 wavelet filter, the precision of wavelet coefficient
will be increased compared to the pixel precision after transformation
stage. Therefore in order to keep speed balance between
the wavelet transformation and the arithmetic coder, the
throughput of arithmetic coder must match the input rate of the
wavelet stage. For the design of arithmetic coder, we test some
typical images with different pixel precision and compression
ratio using the QccPackSPIHT software.
EXPERIMENTAL RESULTS
Software Results
The experimental results come in two folds, i.e., software and
hardware. First, PSNR results for typical images using different
SPIHT methods are recorded, including SPIHT with arithmetic,
SPIHT without arithmetic, SPIHT without lists and arithmetic
and our SPIHT prototype. Table VI lists the detailed data. From
the results, SPIHTHW is slightly lower than SPIHTAC as the
precision is limited during the wavelet transform and a simple
context model is involved.
CONCLUSION
Arithmetic coding makes itself a standard technique for its
high efficiency. However, as far as hardware implementation is
concerned, the complexity of calculation limits AC in the filed
of high speed realtime coding. For improvement of throughput
purpose, we propose a high speed architecture of AC used in
SPIHT without lists algorithm. In the architecture, a simple context
scheme is used first to reduce the memory size. Then high
speed calculation units are employed for speedup purpose. Especially,
a power control module can reduce the power dissipation
efficiently. It is a high parallelism and calculation device
that makes the speed of context processing fast. From the simulation
results, our AC architecture can meet many high speed
image compression requirements. And the degradation of performance
incurred by the fixed point calculation is slight. 

