Commit f8b629c

Merge pull request #326 from pitrou/fundable-arrow-binary-view
Add BinaryView in Arrow C++ as a fundable project
2 parents 4917272 + d602564 commit f8b629c

6 files changed: 86 additions and 1 deletion

src/components/fundable/MenuSideBar.tsx

Lines changed: 2 additions & 1 deletion
@@ -5,7 +5,8 @@ import styles from "./styles.module.css";
 const sections = [
   { id: 'jupyter-ecosystem', label: 'Jupyter ecosystem' },
   { id: 'package-management', label: 'Package management' },
-  { id: 'scientific-computing', label: 'Scientific computing' }
+  { id: 'scientific-computing', label: 'Scientific computing' },
+  { id: 'apache-arrow', label: 'Apache Arrow and Parquet' }
 ];

 export default function MenuSideBar() {
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+#### Overview
+
+Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics.
+
+Representation of string and binary data in Arrow traditionally uses the Binary layout, where the entire string data resides in a separate buffer that is accessed using indirect indexing from a buffer of offsets.
+
+Recently, the Arrow project added the Binary View layout, a more efficient layout inspired by modern execution engines where the beginning of each string is packed directly within the offsets buffer. This allows short strings to be read and processed directly without going through an additional indirection.
+
+However, while basic support is present, Binary View is not universally supported by all Arrow components.
+
+We propose to finish implementing support for Binary View and String View types in all components of Arrow C++:
+
+* scalar compute kernels:
+  - `equal`, `less_equal`, etc.
+  - `is_in`, `index_in`
+  - `ascii_*`, `binary_*`, `utf8_*`
+  - `string_is_ascii`
+  - `count_substring`
+  - `extract_regex`, `extract_regex_span`
+  - `split_pattern`, `split_pattern_regex`
+  - `coalesce`
+
+* vector compute kernels:
+  - `take`, `filter`, `scatter`
+  - `run_end_encode`, `run_end_decode`
+  - `sort_indices`, `rank`, `rank_normal`, `rank_quantile`
+  - `partition_nth_indices`
+  - `select_k_unstable`
+  - `replace_with_mask`
+  - `fill_null_forward`, `fill_null_backward`, `drop_null`
+
+* aggregate compute kernels:
+  - `count_distinct`
+  - `first`, `last`, `min`, `max`
+  - `index`
+
+* CSV reader and writer
+
+* ORC reader and writer
+
+Funders can decide to fund the entire package, or choose the components they are interested in.
+
+##### Are you interested in this project? Either entirely or partially, contact us for more information on how to help us fund it
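
Reviewer note on the description above: the contrast between the Binary layout and the Binary View layout may be easier to follow with a small sketch. The TypeScript below is illustrative only and is not Arrow's implementation. The helper names (`encodeBinary`, `encodeBinaryView`, `readView`, `ascii`) are hypothetical, and the sketch assumes ASCII input, little-endian encoding, a single data buffer and no validity bitmap. The 16-byte view with a 12-byte inline threshold follows the Arrow columnar format specification.

```typescript
// Illustrative sketch, not Arrow code: all helper names are hypothetical.
const VIEW_SIZE = 16;  // each Binary View entry is 16 bytes
const INLINE_MAX = 12; // values of up to 12 bytes are stored inside the view

interface BinaryLayout { offsets: Int32Array; data: Uint8Array; }
interface BinaryViewLayout { views: Uint8Array; data: Uint8Array; }

// ASCII-only encoder, an assumption that keeps the sketch dependency-free.
function ascii(s: string): Uint8Array {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) out[i] = s.charCodeAt(i);
  return out;
}

function concat(chunks: Uint8Array[]): Uint8Array {
  let total = 0;
  chunks.forEach((c) => { total += c.length; });
  const out = new Uint8Array(total);
  let pos = 0;
  chunks.forEach((c) => { out.set(c, pos); pos += c.length; });
  return out;
}

// Classic Binary layout: n + 1 offsets into one contiguous data buffer.
function encodeBinary(values: string[]): BinaryLayout {
  const chunks = values.map(ascii);
  const offsets = new Int32Array(values.length + 1);
  let end = 0;
  chunks.forEach((c, i) => { end += c.length; offsets[i + 1] = end; });
  return { offsets, data: concat(chunks) };
}

// Binary View layout: a 4-byte length followed by either the inlined bytes
// or a 4-byte prefix, a 4-byte buffer index and a 4-byte offset.
function encodeBinaryView(values: string[]): BinaryViewLayout {
  const views = new Uint8Array(values.length * VIEW_SIZE);
  const dv = new DataView(views.buffer);
  const longChunks: Uint8Array[] = [];
  let dataOffset = 0;
  values.forEach((v, i) => {
    const bytes = ascii(v);
    const base = i * VIEW_SIZE;
    dv.setInt32(base, bytes.length, true);
    if (bytes.length <= INLINE_MAX) {
      views.set(bytes, base + 4);                // short value: fully inlined
    } else {
      views.set(bytes.subarray(0, 4), base + 4); // 4-byte prefix
      dv.setInt32(base + 8, 0, true);            // buffer index (single buffer here)
      dv.setInt32(base + 12, dataOffset, true);  // offset into the data buffer
      longChunks.push(bytes);
      dataOffset += bytes.length;
    }
  });
  return { views, data: concat(longChunks) };
}

// Read value i back; short values never touch the data buffer.
function readView(layout: BinaryViewLayout, i: number): string {
  const v = layout.views;
  const dv = new DataView(v.buffer, v.byteOffset, v.byteLength);
  const base = i * VIEW_SIZE;
  const length = dv.getInt32(base, true);
  let bytes: Uint8Array;
  if (length <= INLINE_MAX) {
    bytes = v.subarray(base + 4, base + 4 + length);
  } else {
    const offset = dv.getInt32(base + 12, true);
    bytes = layout.data.subarray(offset, offset + length);
  }
  let s = "";
  bytes.forEach((b) => { s += String.fromCharCode(b); });
  return s;
}
```

With the input `["arrow", "a considerably longer string"]`, the Binary layout stores both values in the data buffer, while the Binary View layout stores only the long value there and keeps `"arrow"` entirely inside its 16-byte view.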

src/components/fundable/index.tsx

Lines changed: 6 additions & 0 deletions
@@ -37,6 +37,12 @@ export function MainAreaFundableProjects() {
           projectCategory={fundableProjectsDetails.scientificComputing}
         />
       </section>
+      <section id="apache-arrow">
+        <ProjectCategory
+          projectCategoryName={"Apache Arrow and Parquet"}
+          projectCategory={fundableProjectsDetails.apacheArrow}
+        />
+      </section>
       <section id="propose-and-fund-a-project">
         <h2 className={styles.project_category_header} style={{ margin: "0px" }}>Can't find a project?</h2>
         <p style={{ marginTop: "var(--ifm-spacing-lg)" }}>If you have a project in mind that you think would be relevant to our expertise, please contact us to discuss it.</p>

src/components/fundable/projectsDetails.ts

Lines changed: 17 additions & 0 deletions
@@ -4,6 +4,7 @@ import JupyterGISToolsForPythonAPIMD from "@site/src/components/fundable/descrip
 import EmscriptenForgePackageRequestsMD from "@site/src/components/fundable/descriptions/EmscriptenForgePackageRequests.md"
 import SVE2SupportInXsimdMD from "@site/src/components/fundable/descriptions/SVE2SupportInXsimd.md"
 import MatrixOperationsInXtensor from "@site/src/components/fundable/descriptions/MatrixOperationsInXtensor.md"
+import BinaryViewInArrowCpp from "@site/src/components/fundable/descriptions/BinaryViewInArrowCpp.md"

 export const fundableProjectsDetails = {
   jupyterEcosystem: [
@@ -84,6 +85,22 @@ export const fundableProjectsDetails = {
       currentFundingPercentage: 0,
       repoLink: "https://github.com/xtensor-stack/xtensor"
     }
+  ],
+
+  apacheArrow: [
+    {
+      category: "Apache Arrow and Parquet",
+      title: "Complete BinaryView / StringView support in Arrow C++",
+      pageName: "BinaryViewInApacheArrow",
+      shortDescription: "BinaryView is a more recent and more efficient alternative to Arrow's standard Binary type. It allows for inlined storage of short strings and fast prefix comparison.",
+      description: BinaryViewInArrowCpp,
+      price: "TBD",
+      maxNbOfFunders: 4,
+      currentNbOfFunders: 0,
+      currentFundingPercentage: 0,
+      repoLink: "https://github.com/apache/arrow"
+    }
   ]
+
 }

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+import useDocusaurusContext from '@docusaurus/useDocusaurusContext';
+import GetAQuotePage from '@site/src/components/fundable/GetAQuotePage';
+
+export default function FundablePage() {
+  const { siteConfig } = useDocusaurusContext();
+  return (
+    <GetAQuotePage/>
+  );
+}
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+import useDocusaurusContext from '@docusaurus/useDocusaurusContext';
+import LargeProjectCardPage from '@site/src/components/fundable/LargeProjectCardPage';
+
+export default function FundablePage() {
+  const { siteConfig } = useDocusaurusContext();
+  return (
+    <LargeProjectCardPage/>
+  );
+}
