|
| 1 | +#### Overview |
| 2 | + |
| 3 | +Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics. |
| 4 | + |
| 5 | +Fixed-width decimal data in Arrow is usually represented using the Decimal128 data type.
| 6 | +This data type has non-trivial memory costs (16 bytes per value) and computational costs (operations on 128-bit integers must be emulated on most if not all architectures). |
| 7 | + |
| 8 | +Arrow recently gained Decimal32 and Decimal64 data types which, as their names suggest, encode fixed-width decimal data more compactly. |
| 9 | +Decimal32 can represent up to 9 decimal digits of precision and Decimal64 up to 18, which is sufficient in many applications.
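The 9- and 18-digit limits follow from the range of the underlying signed integers. The sketch below is a minimal illustration, assuming the unscaled decimal value is stored in a signed 32-bit or 64-bit integer. `MaxDecimalPrecision` is a hypothetical helper written for this example, not part of the Arrow C++ API.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not an Arrow API): returns the largest digit count d
// such that every d-digit value (up to 10^d - 1) fits in the signed type T.
template <typename T>
constexpr int MaxDecimalPrecision() {
  T limit = std::numeric_limits<T>::max();
  int digits = 0;
  // Each division by 10 strips one digit. The leading digit is not counted,
  // because it cannot take every value from 0 to 9 without overflowing.
  while (limit >= 10) {
    limit /= 10;
    ++digits;
  }
  return digits;
}

// 2^31 - 1 = 2147483647 has 10 digits, so 9 full digits are always safe.
static_assert(MaxDecimalPrecision<std::int32_t>() == 9,
              "a 32-bit storage integer holds 9 full decimal digits");
// 2^63 - 1 = 9223372036854775807 has 19 digits, so 18 full digits are safe.
static_assert(MaxDecimalPrecision<std::int64_t>() == 18,
              "a 64-bit storage integer holds 18 full decimal digits");
```

The checks run at compile time, so the block needs C++14 or later for the loop inside the `constexpr` function.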
| 10 | + |
| 11 | +However, while basic support is present, Decimal32 and Decimal64 are not yet supported by all Arrow components.
| 12 | + |
| 13 | +We propose to finish implementing support for Decimal32 and Decimal64 types in all components of Arrow C++: |
| 14 | + |
| 15 | +* scalar compute kernels: |
| 16 | + - `abs` |
| 17 | + - `round` |
| 18 | + - `is_in`, `index_in` |
| 19 | + - `coalesce` |
| 20 | + - `min_element_wise`, `max_element_wise` |
| 21 | + |
| 22 | +* vector compute kernels: |
| 23 | + - `dictionary_encode`, `unique`, `value_counts` |
| 24 | + - `pairwise_diff` |
| 25 | + - `select_k_unstable` |
| 26 | + - `replace_with_mask` |
| 27 | + - `fill_null_forward`, `fill_null_backward` |
| 28 | + |
| 29 | +* aggregate compute kernels: |
| 30 | + - `sum`, `mean`, `mode`, `tdigest` |
| 31 | + - `first`, `last`, `min`, `max` |
| 32 | + - `index` |
| 33 | + |
| 34 | +* CSV reader and writer |
| 35 | + |
| 36 | +* ORC reader and writer |
| 37 | + |
| 38 | +Funders can decide to fund the entire package, or choose the components they are interested in. |
| 39 | + |
| 40 | +##### Interested in funding this project, entirely or partially? Contact us for more information on how to help.