Misplaced Pages

Binary Reed–Solomon encoding

Article snapshot taken from[REDACTED] with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.
This article has multiple issues. Please help improve it or discuss these issues on the talk page. (Learn how and when to remove these messages)
This article is an orphan, as no other articles link to it. Please introduce links to this page from related articles; try the Find link tool for suggestions. (August 2016)
This article may require cleanup to meet Misplaced Pages's quality standards. The specific problem is: broken tables, non-English diagram, unclear text. Please help improve this article if you can. (June 2018) (Learn how and when to remove this message)
This article includes a list of references, related reading, or external links, but its sources remain unclear because it lacks inline citations. Please help improve this article by introducing more precise citations. (November 2024) (Learn how and when to remove this message)
(Learn how and when to remove this message)

Binary Reed–Solomon coding (BRS), which belongs to a RS code, is a way of encoding that can fix node data loss in a distributed storage environment. It has maximum distance separable (MDS) encoding properties. Its encoding and decoding rate outperforms conventional RS coding and optimum CRS coding.

BRS coding

Background

RS coding is a fault-tolerant encoding method for a distributed storage environment. Suppose we wish to distribute data across k individual devices for improved storage capacity or bandwidth, for example in a hardware RAID setup. Such a configuration risks significant data loss in the event of device failure. The Reed-Solomon encoding produces a storage coding system which robust to the simultaneous failure of any subset of m nodes. To do this, we adding m additional nodes to the system, for a total of n = k + m storage nodes.

Traditional RS encoding method uses the Vandermonde matrix as a coding matrix and its inverse as the decoding matrix. Traditional RS encoding and decoding operations are all carried out on a large finite domain.

Because BRS encoding and decoding employ only shift and XOR operations, they are much faster than traditional RS coding. The algorithm of BRS coding is proposed by the advanced network technology laboratory of Peking University, and it also released the open source implementation of BRS coding. In the actual environment test, the encoding and decoding speed of BRS is faster than that of CRS. In the design and implementation of distributed storage system, using BRS coding can make the system have the characteristics of fault tolerant regeneration.

Principle

BRS encoding principle

The structure of traditional Reed–Solomon codes is based on finite fields, and the BRS code is based on the shift and XOR operation. BRS encoding is based on the Vandermonde matrix, and its specific encoding steps are as follows:

  1. Equally divides the original data blocks into k blocks, and each block of data has L-bit data, recorded as S = ( s 0 , s 1 , . . . , s k 1 ) {\displaystyle S=(s_{0},s_{1},...,s_{k-1})} where s i = s i , 0 s i , 1 . . . s i , L 1 {\displaystyle s_{i}=s_{i,0}s_{i,1}...s_{i,L-1}} , i = 0 , 1 , 2 , . . . , k 1 {\displaystyle i=0,1,2,...,k-1} .
  2. Builds the calibration data block M {\displaystyle M} , M {\displaystyle M} has a total of n k {\displaystyle n-k} blocks: M = ( m 0 , m 1 , . . . , m n k 1 ) {\displaystyle M=(m_{0},m_{1},...,m_{n-k-1})} where m i = j = 0 k 1 s j ( r j i ) {\displaystyle m_{i}=\sum _{j=0}^{k-1}s_{j}(r_{j}^{i})} , i = 0 , 1 , . . . , n k 1 {\displaystyle i=0,1,...,n-k-1} .
    The addition here are all XOR operation, where r j i {\displaystyle r_{j}^{i}} represents the number of bits of "0" added to the front of the original data block s j {\displaystyle s_{j}} . Thereby forming a parity data block m i {\displaystyle m_{i}} . r j i {\displaystyle r_{j}^{i}} is given by the following way: ( r 0 a , r 1 a , . . . , r k 1 a ) = ( 0 , a , 2 a , . . . ( k 1 ) a ) {\displaystyle (r_{0}^{a},r_{1}^{a},...,r_{k-1}^{a})=(0,a,2a,...(k-1)a)} where a = 0 , 1 , . . . n k 1 {\displaystyle a=0,1,...n-k-1} .
  3. Each node stores data, nodes N i ( i = 0 , 1 , . . . , n 1 ) {\displaystyle N_{i}(i=0,1,...,n-1)} store the data as s 0 , s 1 , . . . , s k 1 , m 0 , m 1 , . . . , m n k 1 {\displaystyle s_{0},s_{1},...,s_{k-1},m_{0},m_{1},...,m_{n-k-1}} .

BRS encoding example

If now n = 6 , k = 3 {\displaystyle n=6,k=3} , there I D 0 = ( 0 , 0 , 0 ) {\displaystyle ID_{0}=(0,0,0)} , I D 0 = ( 0 , 1 , 2 ) {\displaystyle ID_{0}=(0,1,2)} , I D 0 = ( 0 , 2 , 4 ) {\displaystyle ID_{0}=(0,2,4)} . The original data block are s i = s 0 , s 1 , . . . , s L 1 {\displaystyle s_{i}=s_{0},s_{1},...,s_{L-1}} , where i = 0 , 1 , . . . , k 1 {\displaystyle i=0,1,...,k-1} . The calibration data for each block are m i = m i , 0 m i , 1 . . . m x i , L + i × ( k 1 ) 1 {\displaystyle m_{i}=m_{i,0}m_{i,1}...mx_{i,L+i\times (k-1)-1}} , where i = 0 , 1 , . . . , k 1 {\displaystyle i=0,1,...,k-1} .

Calculation of calibration data blocks is as follows, the addition operation represents a bit XOR operation:

m 0 = s 0 ( 0 ) s 1 ( 0 ) s 2 ( 0 ) {\displaystyle m_{0}=s_{0}(0)\oplus s_{1}(0)\oplus s_{2}(0)} , so m 0 = ( m 0 , 0 m 0 , 1 . . . m 0 , 5 ) {\displaystyle m_{0}=(m_{0,0}m_{0,1}...m_{0,5})}

m 1 = s 0 ( 0 ) s 1 ( 1 ) s 2 ( 2 ) {\displaystyle m_{1}=s_{0}(0)\oplus s_{1}(1)\oplus s_{2}(2)} , so m 1 = ( m 1 , 0 m 1 , 1 . . . m 1 , 7 ) {\displaystyle m_{1}=(m_{1,0}m_{1,1}...m_{1,7})}

m 2 = s 0 ( 0 ) s 1 ( 2 ) s 2 ( 4 ) {\displaystyle m_{2}=s_{0}(0)\oplus s_{1}(2)\oplus s_{2}(4)} , so m 2 = ( m 2 , 0 m 2 , 1 . . . m 2 , 9 ) {\displaystyle m_{2}=(m_{2,0}m_{2,1}...m_{2,9})}

BRS decoding principle

In the structure of BRS code, we divide the original data blocks into k {\displaystyle k} blocks. They are S = ( s 0 , s 1 , . . . , s k 1 ) {\displaystyle S=(s_{0},s_{1},...,s_{k-1})} . And encoding has been n {\displaystyle n} block calibration data blocks, there are M = ( m 0 , m 1 , . . . , m n k 1 ) {\displaystyle M=(m_{0},m_{1},...,m_{n-k-1})} .

During the decoding process, there is a necessary condition: The number of undamaged calibration data blocks have to be greater than or equal to the number of the original data blocks that missing, if not, it cannot be repaired.

The following is a decoding process analysis:

Might as well make n = 6 {\displaystyle n=6} , k = 3 {\displaystyle k=3} . Then

m 0 = s 0 + s 1 + s 2 {\displaystyle m_{0}=s_{0}+s_{1}+s_{2}}

m 1 = s 0 + x s 1 + x 2 s 2 {\displaystyle m_{1}=s_{0}+xs_{1}+x^{2}s_{2}}

m 1 = s 0 + x 2 s 1 + x 4 s 2 {\displaystyle m_{1}=s_{0}+x^{2}s_{1}+x^{4}s_{2}}

Supposed s 0 {\displaystyle s_{0}} is intact, s 1 , s 2 {\displaystyle s_{1},s_{2}} miss, choose m 1 {\displaystyle m_{1}} , m 2 {\displaystyle m_{2}} to repair, make

m 1 = m 1 + s 0 {\displaystyle m_{1}^{*}=m_{1}+s_{0}}

m 2 = m 2 + s 0 {\displaystyle m_{2}^{*}=m_{2}+s_{0}}

Because m 1 {\displaystyle m_{1}} m 2 {\displaystyle m_{2}} s 0 {\displaystyle s_{0}} are known, m 1 {\displaystyle m_{1}^{*}} m 2 {\displaystyle m_{2}^{*}} are known. So that

s 1 , i 2 = m 2 , i + s 2 , i 4 {\displaystyle s_{1,i-2}=m_{2,i}^{*}+s_{2,i-4}}

s 2 , i 2 = m 1 , i + s 1 , i 1 {\displaystyle s_{2,i-2}=m_{1,i}^{*}+s_{1,i-1}}

According to the above iterative formula, each cycle can figure out two bit values ( s 1 , s 2 {\displaystyle s_{1},s_{2}} can get a bit). Each of the original data block length ( L {\displaystyle L} bit), so after repeating L {\displaystyle L} times, We can work out all the unknown bit in the original data block. by parity of reasoning, we can completed the data decoding.

Performance

Some experiments shows that, considering the encoding rate, BRS encoding rate is about 6-fold as much as RS encoding rate and 1.5-fold as much as CRS encoding rate in the single core processor, which meets the conditions that compare to RS encoding, its encoding speed upgrades no less than 200%.

Under the same conditions, for the different number of deletions, BRS decoding rate is about 4-fold as much as RS encoding rate, about 1.3-fold as much as CRS encoding rate, which meets the conditions that compare to RS encoding, the decoding speed promotes 100%.

Applications

In the current situation, the application of distributed systems is commonly used. Using erasure code to store data in the bottom of the distributed storage system can increase the fault tolerance of the system. At the same time, compared to the traditional replica strategy, erasure code technology can exponentially improve the reliability of the system for the same redundancy.

BRS encoding can be applied to distributed storage systems, for example, BRS encoding can be used as the underlying data encoding while using HDFS. Due to the advantages of performance and similarity of the encoding method, BRS encoding can be used to replace the CRS encoding in distributed systems.

Usage

There are open source codes to implement BRS encoding written in C and available on GitHub. In the design and implementation of a distributed storage system, we can use BRS encoding to store data and to achieve the system's own fault tolerance.

References

Category:
Binary Reed–Solomon encoding Add topic