Description
Superset can use async queries to support long-running queries and to offload heavy workloads.
I started spiking here: https://github.com/stackabletech/superset-operator/tree/feat/enable-celery-workers
- This is a working POC with a working integration test.
- Currently only the CeleryWorker is deployed via a Deployment (no Beat).
- Redis and Celery settings are currently done via overrides. This still requires:
- Code restructuring
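The Redis/Celery overrides mentioned above can be sketched as a minimal `superset_config.py` fragment. This is a hedged sketch, not the POC's actual override: the Redis hostname, DB indices, and worker settings are illustrative assumptions.

```python
# Minimal superset_config.py fragment enabling async queries via Celery.
# Hostname "superset-redis" and the Redis DB indices are illustrative assumptions.

class CeleryConfig:
    broker_url = "redis://superset-redis:6379/0"      # Celery message broker
    result_backend = "redis://superset-redis:6379/1"  # Celery task results
    imports = ("superset.sql_lab",)                   # register SQL Lab tasks
    worker_prefetch_multiplier = 1                    # one task at a time per worker

# Superset picks up the Celery settings from this module-level name.
CELERY_CONFIG = CeleryConfig
```

Both the webserver and the worker pods would read the same file, so keeping this in one place fits the "one superset_config.py" approach below.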
Requirements
We have to add two new roles:
- CeleryWorker: heavy data processing (DEPLOYMENT, OPTIONAL)
- CeleryBeat: SINGULAR instance for scheduling jobs (DEPLOYMENT, one single Pod, OPTIONAL). Not required for SQL Lab testing, but it drives:
- Scheduled Reports: If you want a dashboard emailed to you every Monday at 9:00 AM, the Beat is what "wakes up" at 9:00 AM to start that process.
- Alerts: If you have a rule that says "Alert me if sales drop below $100," the Beat triggers the "check" task every few minutes.
- Cache Warming: If you've configured Superset to pre-load dashboard data into the cache overnight, that is a scheduled task.
- Thumbnail Generation: While some thumbnails are generated on-demand, periodic cleanup or bulk processing tasks are often scheduled.
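The scheduled jobs above all hang off Celery's `beat_schedule`, which in Superset lives inside the `CeleryConfig` class of `superset_config.py`. A minimal sketch, assuming the task names from Superset's reports scheduler; the intervals and the Redis hostname are illustrative assumptions (Superset's docs use `crontab(...)` schedules, replaced here with plain seconds to keep the sketch dependency-free):

```python
# Sketch: CeleryBeat schedule inside superset_config.py's CeleryConfig.
# Hostname and intervals are illustrative assumptions.

class CeleryConfig:
    broker_url = "redis://superset-redis:6379/0"
    imports = ("superset.sql_lab", "superset.tasks.scheduler")
    beat_schedule = {
        # drives Scheduled Reports and Alerts checks
        "reports.scheduler": {
            "task": "reports.scheduler",
            "schedule": 60.0,  # every minute (docs use crontab(minute="*", hour="*"))
        },
        # periodic cleanup of old report execution logs
        "reports.prune_log": {
            "task": "reports.prune_log",
            "schedule": 600.0,
        },
    }
```

Without a running Beat Pod, these entries are never triggered, which is why the role must exist exactly once per cluster.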
Currently:
#[derive(Clone, CustomResource, Debug, Deserialize, JsonSchema, PartialEq, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct SupersetClusterSpec {
// no doc - docs in the struct.
pub image: ProductImage,
// [...]
// no doc - docs in the struct.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub nodes: Option<Role<v1alpha1::SupersetConfigFragment, SupersetRoleConfig>>,
// no doc - docs in the struct.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub workers: Option<Role<v1alpha1::SupersetConfigFragment, SupersetRoleConfig>>,
// no doc - docs in the struct.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub beat: Option<Role<v1alpha1::SupersetConfigFragment, SupersetRoleConfig>>,
}
Configuration
The worker and beat instances should usually be configured the same as the webserver (one shared superset_config.py), which is not much effort.
Logging
I had some problems using our logging framework and configuring logging in general. With plain defaults the log level is WARNING, so effectively nothing is logged.
I tried the following in the CeleryConfig, without success:
worker_hijack_root_logger = False
worker_log_color = False
worker_redirect_stdouts = True
worker_redirect_stdouts_level = "DEBUG"
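The symptom matches Python's default logging behavior: a logger left at WARNING silently drops every INFO record, regardless of what the worker emits. A minimal standalone illustration (no Celery involved, logger name is arbitrary):

```python
import io
import logging

# Capture log output in a buffer instead of stderr.
buf = io.StringIO()
log = logging.getLogger("demo")
log.addHandler(logging.StreamHandler(buf))
log.setLevel(logging.WARNING)  # the effective default level

log.info("mingle: searching for neighbors")  # dropped: INFO < WARNING
log.warning("worker shutting down")          # emitted

print(buf.getvalue())
```

This is why forcing the level on the process itself (the `--loglevel=INFO` flag below) works even when the `CeleryConfig` settings above appear to have no effect.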
The only thing that made the worker produce any logs for me was adding --loglevel=INFO to the start command:
celery --app=superset.tasks.celery_app:app worker --loglevel=INFO
which produced some logs when started in async mode:
superset [2026-03-03 11:26:30,361: INFO/MainProcess] mingle: searching for neighbors
superset [2026-03-03 11:26:31,366: INFO/MainProcess] mingle: all alone
superset [2026-03-03 11:26:31,372: INFO/MainProcess] celery@superset-worker-default-7448b7c6c-7rs4h ready.
superset [2026-03-03 11:26:55,172: INFO/MainProcess] Task sql_lab.get_sql_results[8c7952d5-ffe8-464d-a3c6-4601d364f9bb] received
superset [2026-03-03 11:26:55,241: INFO/ForkPoolWorker-15] Query 1: Executing 1 statement(s)
superset [2026-03-03 11:26:55,241: INFO/ForkPoolWorker-15] Query 1: Set query to 'running'
superset [2026-03-03 11:26:55,258: INFO/ForkPoolWorker-15] Query 1: Running statement 1 out of 1
superset [2026-03-03 11:26:55,282: INFO/ForkPoolWorker-15] Query 1: Storing results in results backend, key: faa5f5c9-04da-4393-a9b9-51035ab33f81
superset [2026-03-03 11:26:55,286: INFO/ForkPoolWorker-15] Task sql_lab.get_sql_results[8c7952d5-ffe8-464d-a3c6-4601d364f9bb] succeeded in 0.11297486900002696s: None
Monitoring
Flower adds extra monitoring capabilities for a Celery cluster: pip install flower (not tested)
celery --app=superset.tasks.celery_app:app flower
Further discussions
- The logging must be consolidated with the Stackable approach.
- How do we want to deploy the Beat instance (there must always be exactly one)?
- In Kubernetes it is a Deployment; the question is rather how we want the "role" to work. We have an example with a workaround for when we do not need everything: https://github.com/stackabletech/spark-k8s-operator/blob/ce425074763568079c243b55bf8f2a9d7493317e/rust/operator-binary/src/connect/crd.rs#L113
- We have to check which queues / brokers we support (e.g. Redis).
- We need e.g. Redis and want to use the generic database connection mechanism.