Support Async Queries via Celery #698

@maltesander

Description

Superset can use async queries to support long-running queries and to offload heavy workloads.

I started spiking here https://github.com/stackabletech/superset-operator/tree/feat/enable-celery-workers

Requirements

We have to add two new roles:

  • CeleryWorker: heavy data processing (DEPLOYMENT, OPTIONAL)
  • CeleryBeat: a SINGULAR instance for scheduling jobs; not required for SQL Lab testing, but needed for the following (DEPLOYMENT, one single Pod, OPTIONAL):
    • Scheduled Reports: If you want a dashboard emailed to you every Monday at 9:00 AM, the Beat is what "wakes up" at 9:00 AM to start that process.
    • Alerts: If you have a rule that says "Alert me if sales drop below $100," the Beat triggers the "check" task every few minutes.
    • Cache Warming: If you've configured Superset to pre-load dashboard data into the cache overnight, that is a scheduled task.
    • Thumbnail Generation: While some thumbnails are generated on-demand, periodic cleanup or bulk processing tasks are often scheduled.
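The scheduled tasks above correspond to beat_schedule entries in the CeleryConfig inside superset_config.py. A minimal sketch, assuming the task names and timings that Superset's documentation uses for reports (the surrounding broker/results configuration is omitted here):

```python
# Sketch of beat_schedule entries for the CeleryBeat role; the task names
# ("reports.scheduler", "reports.prune_log") follow Superset's documented
# defaults, the cron timings are illustrative.
from celery.schedules import crontab

class CeleryConfig:
    beat_schedule = {
        # Evaluates scheduled reports/alerts every minute.
        "reports.scheduler": {
            "task": "reports.scheduler",
            "schedule": crontab(minute="*", hour="*"),
        },
        # Cleans up old report execution logs once a day at midnight.
        "reports.prune_log": {
            "task": "reports.prune_log",
            "schedule": crontab(minute=0, hour=0),
        },
    }
```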

Currently:

#[derive(Clone, CustomResource, Debug, Deserialize, JsonSchema, PartialEq, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct SupersetClusterSpec {
    // no doc - docs in the struct.
    pub image: ProductImage,

    // [...]

    // no doc - docs in the struct.
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub nodes: Option<Role<v1alpha1::SupersetConfigFragment, SupersetRoleConfig>>,

    // no doc - docs in the struct.
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub workers: Option<Role<v1alpha1::SupersetConfigFragment, SupersetRoleConfig>>,

    // no doc - docs in the struct.
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub beat: Option<Role<v1alpha1::SupersetConfigFragment, SupersetRoleConfig>>,
}

Configuration

The worker and beat instances should usually share the webserver configuration (a single superset_config.py), which is not much effort.
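To illustrate what such a shared configuration could look like, here is a hedged sketch of a CeleryConfig fragment for superset_config.py. The Redis service name is a placeholder for whatever broker/results backend the operator would wire up; the remaining settings follow Superset's documented Celery example:

```python
# Hypothetical shared superset_config.py fragment used by webserver,
# worker, and beat alike. The Redis URL is an assumed placeholder.
class CeleryConfig:
    broker_url = "redis://superset-redis:6379/0"
    result_backend = "redis://superset-redis:6379/0"
    # Task modules the worker must import to find SQL Lab and scheduler tasks.
    imports = ("superset.sql_lab", "superset.tasks.scheduler")
    worker_prefetch_multiplier = 10
    task_acks_late = True

# Superset picks the Celery configuration up via this module-level name.
CELERY_CONFIG = CeleryConfig
```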

Logging

I had some problems using our framework to configure logging in general. With the defaults, the log level is WARNING, so effectively nothing is logged.

I tried the following in the CeleryConfig without success:

worker_hijack_root_logger = False 
worker_log_color = False
worker_redirect_stdouts = True
worker_redirect_stdouts_level = "DEBUG"
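A possible alternative (untested here) is Celery's setup_logging signal: connecting any receiver to it stops Celery from configuring the root logger itself, so a configuration applied in superset_config.py stays in effect. A minimal sketch:

```python
# Untested workaround sketch: connecting to setup_logging makes Celery
# skip its own logging setup entirely, leaving ours in place.
import logging
from celery.signals import setup_logging

@setup_logging.connect
def on_setup_logging(**kwargs):
    # Keep (or set) the root logger level we actually want.
    logging.getLogger().setLevel(logging.INFO)
```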

The only thing that made the worker produce any logs for me was adding --loglevel=INFO to the start command, like:

celery --app=superset.tasks.celery_app:app worker --loglevel=INFO

which produced some logs when started in async mode:

superset [2026-03-03 11:26:30,361: INFO/MainProcess] mingle: searching for neighbors
superset [2026-03-03 11:26:31,366: INFO/MainProcess] mingle: all alone
superset [2026-03-03 11:26:31,372: INFO/MainProcess] celery@superset-worker-default-7448b7c6c-7rs4h ready.
superset [2026-03-03 11:26:55,172: INFO/MainProcess] Task sql_lab.get_sql_results[8c7952d5-ffe8-464d-a3c6-4601d364f9bb] received
superset [2026-03-03 11:26:55,241: INFO/ForkPoolWorker-15] Query 1: Executing 1 statement(s)
superset [2026-03-03 11:26:55,241: INFO/ForkPoolWorker-15] Query 1: Set query to 'running'
superset [2026-03-03 11:26:55,258: INFO/ForkPoolWorker-15] Query 1: Running statement 1 out of 1
superset [2026-03-03 11:26:55,282: INFO/ForkPoolWorker-15] Query 1: Storing results in results backend, key: faa5f5c9-04da-4393-a9b9-51035ab33f81
superset [2026-03-03 11:26:55,286: INFO/ForkPoolWorker-15] Task sql_lab.get_sql_results[8c7952d5-ffe8-464d-a3c6-4601d364f9bb] succeeded in 0.11297486900002696s: None

Monitoring

Flower provides extra monitoring capabilities for a Celery cluster: pip install flower (not tested)

celery --app=superset.tasks.celery_app:app flower

Further discussions
