diff --git a/docs/decisions/0024-competency-criteria-versioning.rst b/docs/decisions/0024-competency-criteria-versioning.rst new file mode 100644 index 000000000..47c89d593 --- /dev/null +++ b/docs/decisions/0024-competency-criteria-versioning.rst @@ -0,0 +1,83 @@ +24. How should versioning be handled for CBE competency achievement criteria? +============================================================================= + +Context +------- +Course Authors and/or Platform Administrators will be entering the competency achievement criteria rules in Studio that learners are required to meet in order to demonstrate competencies. Depending on the institution, these Course Authors or Platform Administrators may have a variety of job titles, including Instructional Designer, Curriculum Designer, Instructor, LMS Administrator, Faculty, or other Staff. + +Typically, only one person would be responsible for entering competency achievement criteria rules in Studio for each course, though this person may change over time. However, entire programs could have many different Course Authors or Platform Administrators with this responsibility. + +Typically, institutions and instructional designers do not change the mastery requirements (competency achievement criteria) for their competencies frequently over time. However, the ability to do historical audit logging of changes within Studio can be a valuable feature to those who have mistakenly made changes and want to revert or those who want to experiment with new approaches. + +Currently, Open edX always displays the latest edited version of content in the Studio UI and always shows the latest published version of content in the LMS UI, despite having more robust version tracking on the backend (Publishable Entities). + +Authoring data (criteria definitions) and runtime learner data (status) have different governance needs. The former is long-lived and typically non-PII, while the latter is user-specific, can be large (learners x criteria/competencies x time), and may require stricter retention and access controls. These differing lifecycles can make deep coupling of authoring and runtime data harder to manage at scale. Performance is also a consideration as computing or resolving versioned criteria for large courses could add overhead in Studio authoring screens or LMS views. + +Decision +-------- +For the initial implementation, versioning and traceability of competency achievement criteria will be handled with a combination of model history and lifecycle guardrails: + +1. Apply ``django-simple-history`` to competency criteria definition moodels/tables: + + - ``CompetencyCriteriaGroup`` + - ``CompetencyCriteria`` + - ``CompetencyRuleProfile`` + + This provides historical row snapshots and audit metadata for authored criteria definitions, without adopting the full publishable framework for this phase. + +2. Do not apply ``django-simple-history`` to ``oel_tagging_tag``, ``oel_tagging_taxonomy``, or ``CompetencyTaxonomy`` in this phase. + + These models are treated as non-evaluative display/metadata for competency criteria purposes; edits to names or metadata in these tables are not intended to change evaluation outcomes. + +3. ``oel_tagging_objecttag`` associations used by competency criteria follow post-use archive rules: + + - Before any related learner status exists, edits and deletes are allowed. + - After any related learner status exists, disassociation/deletion is archive-only (soft delete), not hard delete. + - Archived rows remain queryable so learner status records can continue to be traced back to their source association. + +4. Authoring guardrails must warn on potentially impactful edits: + + - If a user edits competency criteria definitions or competency object/tag associations after related learner status exists, Studio must display an explicit warning that student statuses have already been set, and these changes will be applied going forward, so existing learner statuses will not be retroactively updated. + - Applying these changes requires explicit user confirmation. + +5. Learner status models/tables are append-only history and do not use ``django-simple-history``: + + - For ``StudentCompetencyCriteriaStatus``, ``StudentCompetencyCriteriaGroupStatus``, and ``StudentCompetencyStatus``, each status change is stored as a new row with ``created`` as the write timestamp. + - Existing learner status rows are not updated in place. + - Current status is determined by the most recent row for a given learner + target entity (ordered by ``created``, with ``id`` as a tie-breaker). + - Older rows represent the learner status history and remain available for audit/tracing. + + +Rejected Alternatives +--------------------- + +1. Defer competency achievement criteria versioning for the initial implementation. Store only the latest authored criteria and expose the latest published state in the LMS, consistent with current Studio/LMS behavior. + - Pros: + - Keeps the initial implementation lightweight + - Cons: + - There is no built-in rollback or audit history + - Adding versioning later will require data migration and careful choices about draft vs published defaults +2. Each model indicates version, status, and audit fields + - Pros: + - Simple and familiar pattern (version + status + created/updated metadata) + - Straightforward queries for the current published state + - Can support rollback by marking an earlier version as published + - Stable identifiers (original_ids) can anchor versions and ease potential future migrations + - Cons: + - Requires custom conventions for versioning across related tables and nested groups + - Lacks shared draft/publish APIs and immutable version objects that other authoring apps can reuse + - Not necessarily consistent with existing patterns in the codebase (though these are already not overly consistent). +3. Publishable framework in openedx-learning + - Pros: + - First-class draft/published semantics with immutable historical versions + - Consistent APIs and patterns shared across other authoring apps + - Cons: + - Requires modeling criteria/groups as publishable entities and wiring Studio/LMS workflows to versioning APIs + - Adds schema and migration complexity for a feature that does not yet require full versioning +4. Append-only audit log table (event history) + - Pros: + - Lightweight way to capture who changed what and when + - Enables basic rollback by replaying or reversing events + - Cons: + - Requires custom tooling to reconstruct past versions + - Does not align with existing publishable versioning patterns diff --git a/requirements/base.txt b/requirements/base.txt index 3331be547..2dea2eec0 100644 --- a/requirements/base.txt +++ b/requirements/base.txt @@ -12,7 +12,7 @@ attrs==26.1.0 # via -r requirements/base.in billiard==4.2.4 # via celery -celery==5.6.2 +celery==5.6.3 # via -r requirements/base.in certifi==2026.2.25 # via requests @@ -35,7 +35,7 @@ click-plugins==1.1.1.2 # via celery click-repl==0.3.0 # via celery -cryptography==46.0.5 +cryptography==46.0.6 # via pyjwt django==5.2.12 # via @@ -60,7 +60,7 @@ django-waffle==5.0.0 # via # edx-django-utils # edx-drf-extensions -djangorestframework==3.17.0 +djangorestframework==3.17.1 # via # -r requirements/base.in # drf-jwt @@ -80,7 +80,7 @@ edx-opaque-keys==3.1.0 # via # edx-drf-extensions # edx-organizations -edx-organizations==7.3.0 +edx-organizations==8.0.0 # via -r requirements/base.in idna==3.11 # via requests @@ -106,7 +106,7 @@ pynacl==1.6.2 # via edx-django-utils python-dateutil==2.9.0.post0 # via celery -requests==2.32.5 +requests==2.33.0 # via edx-drf-extensions rules==3.5 # via -r requirements/base.in diff --git a/requirements/ci.txt b/requirements/ci.txt index a4f7fd0fe..131e8a92c 100644 --- a/requirements/ci.txt +++ b/requirements/ci.txt @@ -28,11 +28,13 @@ pluggy==1.6.0 # via tox pyproject-api==1.10.0 # via tox -python-discovery==1.2.0 - # via virtualenv +python-discovery==1.2.1 + # via + # tox + # virtualenv tomli-w==1.2.0 # via tox -tox==4.50.3 +tox==4.51.0 # via -r requirements/ci.in virtualenv==21.2.0 # via tox diff --git a/requirements/dev.txt b/requirements/dev.txt index fe3279d4b..70eba1ff5 100644 --- a/requirements/dev.txt +++ b/requirements/dev.txt @@ -23,7 +23,7 @@ billiard==4.2.4 # via # -r requirements/quality.txt # celery -build==1.4.0 +build==1.4.2 # via # -r requirements/pip-tools.txt # pip-tools @@ -31,7 +31,7 @@ cachetools==7.0.5 # via # -r requirements/ci.txt # tox -celery==5.6.2 +celery==5.6.3 # via -r requirements/quality.txt certifi==2026.2.25 # via @@ -42,7 +42,7 @@ cffi==2.0.0 # -r requirements/quality.txt # cryptography # pynacl -chardet==7.2.0 +chardet==7.4.0.post2 # via diff-cover charset-normalizer==3.4.6 # via @@ -90,7 +90,7 @@ coverage[toml]==7.13.5 # via # -r requirements/quality.txt # pytest-cov -cryptography==46.0.5 +cryptography==46.0.6 # via # -r requirements/quality.txt # pyjwt @@ -153,7 +153,7 @@ django-waffle==5.0.0 # -r requirements/quality.txt # edx-django-utils # edx-drf-extensions -djangorestframework==3.17.0 +djangorestframework==3.17.1 # via # -r requirements/quality.txt # drf-jwt @@ -190,7 +190,7 @@ edx-opaque-keys==3.1.0 # -r requirements/quality.txt # edx-drf-extensions # edx-organizations -edx-organizations==7.3.0 +edx-organizations==8.0.0 # via -r requirements/quality.txt filelock==3.25.2 # via @@ -293,7 +293,7 @@ mypy-extensions==1.1.0 # mypy mysqlclient==2.2.8 # via -r requirements/quality.txt -nh3==0.3.3 +nh3==0.3.4 # via # -r requirements/quality.txt # readme-renderer @@ -355,7 +355,7 @@ pycparser==3.0 # cffi pydocstyle==6.3.0 # via -r requirements/quality.txt -pygments==2.19.2 +pygments==2.20.0 # via # -r requirements/quality.txt # diff-cover @@ -418,9 +418,10 @@ python-dateutil==2.9.0.post0 # -r requirements/quality.txt # celery # freezegun -python-discovery==1.2.0 +python-discovery==1.2.1 # via # -r requirements/ci.txt + # tox # virtualenv python-slugify==8.0.4 # via @@ -435,7 +436,7 @@ readme-renderer==44.0 # via # -r requirements/quality.txt # twine -requests==2.32.5 +requests==2.33.0 # via # -r requirements/quality.txt # edx-drf-extensions @@ -496,7 +497,7 @@ tomlkit==0.14.0 # via # -r requirements/quality.txt # pylint -tox==4.50.3 +tox==4.51.0 # via -r requirements/ci.txt twine==6.2.0 # via -r requirements/quality.txt diff --git a/requirements/doc.txt b/requirements/doc.txt index 07c0631e1..fbbc93c22 100644 --- a/requirements/doc.txt +++ b/requirements/doc.txt @@ -28,7 +28,7 @@ billiard==4.2.4 # via # -r requirements/test.txt # celery -celery==5.6.2 +celery==5.6.3 # via -r requirements/test.txt certifi==2026.2.25 # via @@ -71,7 +71,7 @@ coverage[toml]==7.13.5 # via # -r requirements/test.txt # pytest-cov -cryptography==46.0.5 +cryptography==46.0.6 # via # -r requirements/test.txt # pyjwt @@ -121,7 +121,7 @@ django-waffle==5.0.0 # -r requirements/test.txt # edx-django-utils # edx-drf-extensions -djangorestframework==3.17.0 +djangorestframework==3.17.1 # via # -r requirements/test.txt # drf-jwt @@ -159,7 +159,7 @@ edx-opaque-keys==3.1.0 # -r requirements/test.txt # edx-drf-extensions # edx-organizations -edx-organizations==7.3.0 +edx-organizations==8.0.0 # via -r requirements/test.txt freezegun==1.5.5 # via -r requirements/test.txt @@ -214,7 +214,7 @@ mypy-extensions==1.1.0 # mypy mysqlclient==2.2.8 # via -r requirements/test.txt -nh3==0.3.3 +nh3==0.3.4 # via readme-renderer packaging==26.0 # via @@ -251,7 +251,7 @@ pycparser==3.0 # cffi pydata-sphinx-theme==0.16.1 # via sphinx-book-theme -pygments==2.19.2 +pygments==2.20.0 # via # -r requirements/test.txt # accessible-pygments @@ -298,7 +298,7 @@ pyyaml==6.0.3 # code-annotations readme-renderer==44.0 # via -r requirements/doc.in -requests==2.32.5 +requests==2.33.0 # via # -r requirements/test.txt # edx-drf-extensions diff --git a/requirements/pip-tools.txt b/requirements/pip-tools.txt index d89516b82..7da5f28cb 100644 --- a/requirements/pip-tools.txt +++ b/requirements/pip-tools.txt @@ -4,7 +4,7 @@ # # make upgrade # -build==1.4.0 +build==1.4.2 # via pip-tools click==8.3.1 # via pip-tools diff --git a/requirements/quality.txt b/requirements/quality.txt index 6115f162f..dcde6d8de 100644 --- a/requirements/quality.txt +++ b/requirements/quality.txt @@ -22,7 +22,7 @@ billiard==4.2.4 # via # -r requirements/test.txt # celery -celery==5.6.2 +celery==5.6.3 # via -r requirements/test.txt certifi==2026.2.25 # via @@ -71,7 +71,7 @@ coverage[toml]==7.13.5 # via # -r requirements/test.txt # pytest-cov -cryptography==46.0.5 +cryptography==46.0.6 # via # -r requirements/test.txt # pyjwt @@ -123,7 +123,7 @@ django-waffle==5.0.0 # -r requirements/test.txt # edx-django-utils # edx-drf-extensions -djangorestframework==3.17.0 +djangorestframework==3.17.1 # via # -r requirements/test.txt # drf-jwt @@ -156,7 +156,7 @@ edx-opaque-keys==3.1.0 # -r requirements/test.txt # edx-drf-extensions # edx-organizations -edx-organizations==7.3.0 +edx-organizations==8.0.0 # via -r requirements/test.txt freezegun==1.5.5 # via -r requirements/test.txt @@ -232,7 +232,7 @@ mypy-extensions==1.1.0 # mypy mysqlclient==2.2.8 # via -r requirements/test.txt -nh3==0.3.3 +nh3==0.3.4 # via readme-renderer packaging==26.0 # via @@ -271,7 +271,7 @@ pycparser==3.0 # cffi pydocstyle==6.3.0 # via -r requirements/quality.in -pygments==2.19.2 +pygments==2.20.0 # via # -r requirements/test.txt # pytest @@ -328,7 +328,7 @@ pyyaml==6.0.3 # code-annotations readme-renderer==44.0 # via twine -requests==2.32.5 +requests==2.33.0 # via # -r requirements/test.txt # edx-drf-extensions diff --git a/requirements/test.txt b/requirements/test.txt index 903027ef4..437cd993d 100644 --- a/requirements/test.txt +++ b/requirements/test.txt @@ -18,7 +18,7 @@ billiard==4.2.4 # via # -r requirements/base.txt # celery -celery==5.6.2 +celery==5.6.3 # via -r requirements/base.txt certifi==2026.2.25 # via @@ -61,7 +61,7 @@ coverage[toml]==7.13.5 # via # -r requirements/test.in # pytest-cov -cryptography==46.0.5 +cryptography==46.0.6 # via # -r requirements/base.txt # pyjwt @@ -107,7 +107,7 @@ django-waffle==5.0.0 # -r requirements/base.txt # edx-django-utils # edx-drf-extensions -djangorestframework==3.17.0 +djangorestframework==3.17.1 # via # -r requirements/base.txt # drf-jwt @@ -136,7 +136,7 @@ edx-opaque-keys==3.1.0 # -r requirements/base.txt # edx-drf-extensions # edx-organizations -edx-organizations==7.3.0 +edx-organizations==8.0.0 # via -r requirements/base.txt freezegun==1.5.5 # via -r requirements/test.in @@ -199,7 +199,7 @@ pycparser==3.0 # via # -r requirements/base.txt # cffi -pygments==2.19.2 +pygments==2.20.0 # via # pytest # rich @@ -234,7 +234,7 @@ python-slugify==8.0.4 # via code-annotations pyyaml==6.0.3 # via code-annotations -requests==2.32.5 +requests==2.33.0 # via # -r requirements/base.txt # edx-drf-extensions diff --git a/src/openedx_content/applets/components/models.py b/src/openedx_content/applets/components/models.py index 3ec181fa2..de93680fd 100644 --- a/src/openedx_content/applets/components/models.py +++ b/src/openedx_content/applets/components/models.py @@ -1,5 +1,5 @@ """ -The model hierarchy is Component -> ComponentVersion -> Content. +The model hierarchy is Component -> ComponentVersion -> Media. A Component is an entity like a Problem or Video. It has enough information to identify the Component and determine what the handler should be (e.g. XBlock @@ -10,7 +10,7 @@ publishing app. Component maps 1:1 to PublishableEntity and ComponentVersion maps 1:1 to PublishableEntityVersion. -Multiple pieces of Content may be associated with a ComponentVersion, through +Multiple pieces of Media may be associated with a ComponentVersion, through the ComponentVersionMedia model. ComponentVersionMedia allows to specify a ComponentVersion-local identifier. We're using this like a file path by convention, but it's possible we might want to have special identifiers later. @@ -91,7 +91,7 @@ class Component(PublishableEntityMixin): Problem), but little beyond that. A Component will have many ComponentVersions over time, and most metadata is - associated with the ComponentVersion model and the Content that + associated with the ComponentVersion model and the Media that ComponentVersions are associated with. A Component belongs to exactly one LearningPackage. @@ -199,7 +199,7 @@ class ComponentVersion(PublishableEntityVersionMixin): """ A particular version of a Component. - This holds the media using a M:M relationship with Content via + This holds the media using a M:M relationship with Media via ComponentVersionMedia. """ @@ -225,18 +225,18 @@ class Meta: class ComponentVersionMedia(models.Model): """ - Determines the Content for a given ComponentVersion. + Determines the Media for a given ComponentVersion. An ComponentVersion may be associated with multiple pieces of binary data. For instance, a Video ComponentVersion might be associated with multiple transcripts in different languages. - When Content is associated with an ComponentVersion, it has some local + When Media is associated with an ComponentVersion, it has some local key that is unique within the the context of that ComponentVersion. This allows the ComponentVersion to do things like store an image file and reference it by a "path" key. - Content is immutable and sharable across multiple ComponentVersions. + Media is immutable and sharable across multiple ComponentVersions. """ component_version = models.ForeignKey(ComponentVersion, on_delete=models.CASCADE) diff --git a/src/openedx_content/applets/components/readme.rst b/src/openedx_content/applets/components/readme.rst index b79134bc6..eb97d22f2 100644 --- a/src/openedx_content/applets/components/readme.rst +++ b/src/openedx_content/applets/components/readme.rst @@ -1,7 +1,7 @@ Components App ============== -The ``components`` app holds the versioned data models for the lowest-level pieces of content that can be stored in Open edX: Components (e.g. XBlocks), as well as the individual pieces of raw data content that they reference. +The ``components`` app holds the versioned data models for the lowest-level pieces of content that can be stored in Open edX: Components (e.g. XBlocks), as well as the individual pieces of raw data Media that they reference. Motivation ---------- @@ -18,6 +18,6 @@ Architecture Guidelines * We're keeping nearly unlimited history, so per-version metadata (i.e. the space/time cost of making a new version) must be kept low. * Do not assume that all Components will be XBlocks. -* Encourage other apps to make models that join to (and add their own metadata to) Component, ComponentVersion, Content, etc. But it should be done in such a way that this app is not aware of them. +* Encourage other apps to make models that join to (and add their own metadata to) Component, ComponentVersion, Media, etc. But it should be done in such a way that this app is not aware of them. * Always preserve the most raw version of the data possible, e.g. OLX, even if XBlocks then extend that with more sophisticated data models. At some point those XBlocks will get deprecated/removed, and we will still want to be able to export the raw data. * Exports should be fast and *not* require the invocation of plugin code. \ No newline at end of file diff --git a/src/openedx_content/migrations/0005_containertypes.py b/src/openedx_content/migrations/0005_containertypes.py index 65250188a..ad9ceed51 100644 --- a/src/openedx_content/migrations/0005_containertypes.py +++ b/src/openedx_content/migrations/0005_containertypes.py @@ -29,6 +29,14 @@ def backfill_container_types(apps, schema_editor): raise ValueError(f"container {unknown_containers[0]} is of unknown container type. Cannot apply migration.") +def clear_container_types(apps, schema_editor): + """ + Drop the contents of the container_type field for the reverse migration. + """ + Container = apps.get_model("openedx_content", "Container") + Container.objects.update(container_type=None) + + class Migration(migrations.Migration): dependencies = [ ("openedx_content", "0004_componenttype_constraint"), @@ -68,7 +76,10 @@ class Migration(migrations.Migration): ), ), # 3. Populate the container_type column, which is currently NULL for all existing containers - migrations.RunPython(backfill_container_types), + migrations.RunPython( + code=backfill_container_types, + reverse_code=clear_container_types, + ), # 4. disallow NULL values from now on migrations.AlterField( model_name="container", diff --git a/src/openedx_core/__init__.py b/src/openedx_core/__init__.py index 03fd9fd50..2d323bdeb 100644 --- a/src/openedx_core/__init__.py +++ b/src/openedx_core/__init__.py @@ -6,4 +6,4 @@ """ # The version for the entire repository -__version__ = "0.38.0" +__version__ = "0.38.1" diff --git a/src/openedx_tagging/data.py b/src/openedx_tagging/data.py index 20ba7324b..1ab8e9876 100644 --- a/src/openedx_tagging/data.py +++ b/src/openedx_tagging/data.py @@ -21,7 +21,6 @@ class TagData(TypedDict): value: str external_id: str | None child_count: int - descendant_count: int depth: int parent_value: str | None # Note: usage_count may or may not be present, depending on the request. diff --git a/src/openedx_tagging/models/base.py b/src/openedx_tagging/models/base.py index 21c0b6229..882a4ec3f 100644 --- a/src/openedx_tagging/models/base.py +++ b/src/openedx_tagging/models/base.py @@ -6,7 +6,8 @@ import logging import re -from typing import List, Self +from collections import Counter, defaultdict +from typing import List, Self, cast from django.core.exceptions import ValidationError from django.db import models @@ -173,7 +174,7 @@ def descendant_count(self) -> int: ).count() return 0 - def save(self, *args, **kwargs): + def save(self, *args, **kwargs) -> None: """ Compute and persist depth and lineage before saving, then cascade any changes to descendants. """ @@ -187,8 +188,8 @@ def save(self, *args, **kwargs): self.lineage = self.value + "\t" else: if Tag.parent.is_cached(self): # pylint: disable=no-member - parent_depth = self.parent.depth - parent_lineage = self.parent.lineage + parent_depth = self.parent.depth # type: ignore[union-attr] + parent_lineage = self.parent.lineage # type: ignore[union-attr] else: parent_vals = Tag.objects.values("depth", "lineage").get(pk=self.parent_id) parent_depth = parent_vals["depth"] @@ -216,7 +217,9 @@ def save(self, *args, **kwargs): } if depth_delta != 0: update_kwargs["depth"] = F("depth") + depth_delta - self.taxonomy.tag_set.filter(lineage__startswith=old_values["lineage"]).exclude(pk=self.pk).update( + self.taxonomy.tag_set.filter( # type: ignore[union-attr] + lineage__startswith=old_values["lineage"] + ).exclude(pk=self.pk).update( **update_kwargs ) @@ -492,7 +495,6 @@ def _get_filtered_tags_free_text( qs = qs.annotate( depth=Value(0), child_count=Value(0), - descendant_count=Value(0), external_id=Value(None, output_field=models.CharField()), parent_value=Value(None, output_field=models.CharField()), _id=Value(None, output_field=models.CharField()), @@ -524,31 +526,15 @@ def _get_filtered_tags_one_level( qs = self.tag_set.filter(parent=None) qs = qs.annotate(parent_value=Value(None, output_field=models.CharField())) qs = qs.annotate(child_count=models.Count("children", distinct=True)) # type: ignore[no-redef] - # Count all descendants at any depth using depth + lineage prefix. - # depth__gt correctly excludes self; lineage prefix matches all descendants. - descendants_sq = ( - self.tag_set.filter(depth__gt=models.OuterRef("depth"), lineage__startswith=models.OuterRef("lineage")) - .order_by() - .annotate(count=models.Func(F("id"), function="Count")) - ) - qs = qs.annotate(descendant_count=models.Subquery(descendants_sq.values("count"))) # type: ignore[no-redef] # Filter by search term: if search_term: qs = qs.filter(value__icontains=search_term) qs = qs.annotate(_id=F("id")) # ID has an underscore to encourage use of 'value' rather than this internal ID - qs = qs.values("value", "child_count", "descendant_count", "depth", "parent_value", "external_id", "_id") + qs = qs.values("value", "child_count", "depth", "parent_value", "external_id", "_id") qs = qs.order_by("value") if include_counts: - # We need to include the count of how many times this tag is used to tag objects. - # You'd think we could just use: - # qs = qs.annotate(usage_count=models.Count("objecttag__pk")) - # but that adds another join which starts creating a cross product and the children and usage_count become - # intertwined and multiplied with each other. So we use a subquery. - obj_tags = ObjectTag.objects.filter(tag_id=models.OuterRef("pk")).order_by().annotate( - # We need to use Func() to get Count() without GROUP BY - see https://stackoverflow.com/a/69031027 - count=models.Func(F('id'), function='Count') - ) - qs = qs.annotate(usage_count=models.Subquery(obj_tags.values('count'))) + return self._add_counts(list(cast(list, qs))) # type: ignore[return-value] + return qs # type: ignore[return-value] def _get_filtered_tags_deep( @@ -562,11 +548,6 @@ def _get_filtered_tags_deep( Implementation of get_filtered_tags() for closed taxonomies, where we're including tags from multiple levels of the hierarchy. """ - # Note: we ignore a lot of "no-redef" warnings here because we use annotations to pre-load fields like - # `child_count`, and `descendant_count` for all tags in a single query rather than computing them later for each - # Tag, with additional queries. Also, we are converting the result to a values query (that returns a dict), not - # returning actual Tag objects at the end, but mypy doesn't know that. - if parent_tag_value: # Get a subtree below this tag: main_parent_tag = self.tag_for_value(parent_tag_value) @@ -611,51 +592,56 @@ def _get_filtered_tags_deep( # frontend. # Count the direct children, and annotate the result on each row as "child_count". - # The query below produces the same results as: - # qs = initial_qs.annotate(child_count=models.Count("children")) - # However, this correlated subquery avoids a JOIN + GROUP BY, and is far more efficient in practice. - # This also lets us use the same code path whether there's a search_term or not. child_count_sq = ( initial_qs.filter(parent_id=models.OuterRef("pk")) .order_by() .annotate(count=models.Func(F("id"), function="Count")) ) - # Count all descendants at any depth using the lineage prefix trick. - descendants_sq = ( - initial_qs.filter( - depth__gt=models.OuterRef("depth"), - lineage__startswith=models.OuterRef("lineage"), - ) - .order_by() - .annotate(count=models.Func(F("id"), function="Count")) - ) - qs = initial_qs.annotate( # type: ignore[no-redef] - child_count=models.Subquery(child_count_sq.values("count")), - descendant_count=models.Subquery(descendants_sq.values("count")), - ) + qs = initial_qs.annotate(child_count=models.Subquery(child_count_sq.values("count"))) # type: ignore[no-redef] # Add the parent value qs = qs.annotate(parent_value=F("parent__value")) qs = qs.annotate(_id=F("id")) # ID has an underscore to encourage use of 'value' rather than this internal ID qs = qs.values( # type: ignore[assignment] - "value", "child_count", "descendant_count", "depth", "parent_value", "external_id", "_id" + "value", "child_count", "depth", "parent_value", "external_id", "_id" ) # lineage is a case-insensitive column storing "Root\tParent\t...\tThisValue\t", so # ordering by it gives the tree sort order that we want. qs = qs.order_by("lineage") if include_counts: - # Including the counts is a bit tricky; see the comment above in _get_filtered_tags_one_level() - obj_tags = ( - ObjectTag.objects.filter(tag_id=models.OuterRef("pk")) - .order_by() - .annotate( - # We need to use Func() to get Count() without GROUP BY - see https://stackoverflow.com/a/69031027 - count=models.Func(F("id"), function="Count") - ) - ) - qs = qs.annotate(usage_count=models.Subquery(obj_tags.values("count"))) + return self._add_counts(list(cast(list, qs))) # type: ignore[return-value] + return qs # type: ignore[return-value] + def _add_counts(self, tag_data: list[dict]) -> list[dict]: + """ + Add usage counts to a list of tag data dictionaries. For performance + reasons, we call this function with the list result of the + QuerySet so we can then add the counts in-memory rather than to a + QuerySet which would require a very expensive annotation to join the + in-memory data to the original QuerySet. + """ + + tag_lineage_dict = dict(self.tag_set.all().filter(taxonomy_id=self.id).values_list("value", "lineage")) + object_tags = self.objecttag_set.all().filter(taxonomy_id=self.id).values_list("_value", "object_id") + tag_counts: Counter[str] = Counter() + object_tag_lineage_seen: defaultdict[str, set] = defaultdict(set) + + for tag_value, object_id in object_tags: + # split the lineages to get a dict of {tag.value: [lineages]} + lineage_tags = (t for t in tag_lineage_dict.get(tag_value, "").split('\t') if t) + # de-duplicate based on if the lineage is already 'seen' per object + unseen_tags = [t for t in lineage_tags if t not in object_tag_lineage_seen[object_id]] + + tag_counts.update(unseen_tags) + object_tag_lineage_seen[object_id].update(unseen_tags) + + # In-memory 'annotation'; this is faster than using annotate() on the QuerySet. + for row in tag_data: + row["usage_count"] = tag_counts.get(row["value"], 0) + + return tag_data + def add_tag( self, tag_value: str, diff --git a/src/openedx_tagging/rest_api/v1/serializers.py b/src/openedx_tagging/rest_api/v1/serializers.py index 761e0b141..a70f6d20f 100644 --- a/src/openedx_tagging/rest_api/v1/serializers.py +++ b/src/openedx_tagging/rest_api/v1/serializers.py @@ -226,7 +226,6 @@ class TagDataSerializer(UserPermissionsSerializerMixin, serializers.Serializer): value = serializers.CharField() external_id = serializers.CharField(allow_null=True) child_count = serializers.IntegerField() - descendant_count = serializers.IntegerField() depth = serializers.IntegerField() parent_value = serializers.CharField(allow_null=True) usage_count = serializers.IntegerField(required=False) diff --git a/tests/openedx_tagging/import_export/test_template.py b/tests/openedx_tagging/import_export/test_template.py index 9b34947e5..8baf16ddd 100644 --- a/tests/openedx_tagging/import_export/test_template.py +++ b/tests/openedx_tagging/import_export/test_template.py @@ -53,17 +53,17 @@ def test_import_template(self, template_file, parser_format): 'Electronic instruments (ELECTRIC) (None) (children: 2)', ' Synthesizer (SYNTH) (Electronic instruments) (children: 0)', ' Theramin (THERAMIN) (Electronic instruments) (children: 0)', - 'Percussion instruments (PERCUSS) (None) (children: 3 + 6)', + 'Percussion instruments (PERCUSS) (None) (children: 3)', ' Chordophone (CHORD) (Percussion instruments) (children: 1)', ' Piano (PIANO) (Chordophone) (children: 0)', ' Idiophone (BELLS) (Percussion instruments) (children: 2)', ' Celesta (CELESTA) (Idiophone) (children: 0)', ' Hi-hat (HI-HAT) (Idiophone) (children: 0)', - ' Membranophone (DRUMS) (Percussion instruments) (children: 2 + 1)', + ' Membranophone (DRUMS) (Percussion instruments) (children: 2)', ' Cajón (CAJÓN) (Membranophone) (children: 1)', ' Pyle Stringed Jam Cajón (PYLE) (Cajón) (children: 0)', ' Tabla (TABLA) (Membranophone) (children: 0)', - 'String instruments (STRINGS) (None) (children: 2 + 5)', + 'String instruments (STRINGS) (None) (children: 2)', ' Bowed strings (BOW) (String instruments) (children: 2)', ' Cello (CELLO) (Bowed strings) (children: 0)', ' Violin (VIOLIN) (Bowed strings) (children: 0)', @@ -71,7 +71,7 @@ def test_import_template(self, template_file, parser_format): ' Banjo (BANJO) (Plucked strings) (children: 0)', ' Harp (HARP) (Plucked strings) (children: 0)', ' Mandolin (MANDOLIN) (Plucked strings) (children: 0)', - 'Wind instruments (WINDS) (None) (children: 2 + 5)', + 'Wind instruments (WINDS) (None) (children: 2)', ' Brass (BRASS) (Wind instruments) (children: 2)', ' Trumpet (TRUMPET) (Brass) (children: 0)', ' Tuba (TUBA) (Brass) (children: 0)', diff --git a/tests/openedx_tagging/test_api.py b/tests/openedx_tagging/test_api.py index 12bd917b7..ad7751d58 100644 --- a/tests/openedx_tagging/test_api.py +++ b/tests/openedx_tagging/test_api.py @@ -145,8 +145,8 @@ def test_get_tags(self) -> None: "Bacteria (children: 2)", " Archaebacteria (children: 0)", " Eubacteria (children: 0)", - "Eukaryota (children: 5 + 8)", - " Animalia (children: 7 + 1)", + "Eukaryota (children: 5)", + " Animalia (children: 7)", " Arthropoda (children: 0)", " Chordata (children: 1)", " Mammalia (children: 0)", @@ -175,7 +175,7 @@ def test_get_root_tags(self): assert pretty_format_tags(root_life_on_earth_tags, parent=False) == [ 'Archaea (children: 3)', 'Bacteria (children: 2)', - 'Eukaryota (children: 5 + 8)', + 'Eukaryota (children: 5)', ] @override_settings(LANGUAGES=test_languages) @@ -755,17 +755,17 @@ def get_object_tags(): "Archaea (used: 1, children: 2)", " Euryarchaeida (used: 0, children: 0)", " Proteoarchaeota (used: 0, children: 0)", - "Bacteria (used: 0, children: 1)", # does not contain "cha" but a child does + "Bacteria (used: 1, children: 1)", # does not contain "cha" but a child does " Archaebacteria (used: 1, children: 0)", ]), ("ar", [ "Archaea (used: 1, children: 2)", " Euryarchaeida (used: 0, children: 0)", " Proteoarchaeota (used: 0, children: 0)", - "Bacteria (used: 0, children: 1)", # does not contain "ar" but a child does + "Bacteria (used: 1, children: 1)", # does not contain "ar" but a child does " Archaebacteria (used: 1, children: 0)", - "Eukaryota (used: 0, children: 1 + 2)", - " Animalia (used: 1, children: 2)", # does not contain "ar" but a child does + "Eukaryota (used: 6, children: 1)", + " Animalia (used: 4, children: 2)", # does not contain "ar" but a child does " Arthropoda (used: 1, children: 0)", " Cnidaria (used: 0, children: 0)", ]), @@ -773,9 +773,9 @@ def get_object_tags(): "Archaea (used: 1, children: 2)", " Euryarchaeida (used: 0, children: 0)", " Proteoarchaeota (used: 0, children: 0)", - "Bacteria (used: 0, children: 1)", # does not contain "ae" but a child does + "Bacteria (used: 1, children: 1)", # does not contain "ae" but a child does " Archaebacteria (used: 1, children: 0)", - "Eukaryota (used: 0, children: 1)", # does not contain "ae" but a child does + "Eukaryota (used: 6, children: 1)", # does not contain "ae" but a child does " Plantae (used: 1, children: 0)", ]), ("a", [ @@ -783,11 +783,11 @@ def get_object_tags(): " DPANN (used: 0, children: 0)", " Euryarchaeida (used: 0, children: 0)", " Proteoarchaeota (used: 0, children: 0)", - "Bacteria (used: 0, children: 2)", + "Bacteria (used: 1, children: 2)", " Archaebacteria (used: 1, children: 0)", " Eubacteria (used: 0, children: 0)", - "Eukaryota (used: 0, children: 4 + 8)", - " Animalia (used: 1, children: 7 + 1)", + "Eukaryota (used: 6, children: 4)", + " Animalia (used: 4, children: 7)", " Arthropoda (used: 1, children: 0)", " Chordata (used: 0, children: 1)", " Mammalia (used: 0, children: 0)", @@ -1107,10 +1107,10 @@ def test_deep_taxonomy(self) -> None: assert pretty_format_tags( tagging_api.get_tags(taxonomy) ) == [ - "Root - depth 0 (None) (children: 1 + 4)", - " Child - depth 1 (Root - depth 0) (children: 1 + 3)", - " Grandchild - depth 2 (Child - depth 1) (children: 1 + 2)", - " Great-Grandchild - depth 3 (Grandchild - depth 2) (children: 1 + 1)", + "Root - depth 0 (None) (children: 1)", + " Child - depth 1 (Root - depth 0) (children: 1)", + " Grandchild - depth 2 (Child - depth 1) (children: 1)", + " Great-Grandchild - depth 3 (Grandchild - depth 2) (children: 1)", " Great-Great-Grandchild - depth 4 (Great-Grandchild - depth 3) (children: 1)", " Great-Great-Great-Grandchild - depth 5 (Great-Great-Grandchild - depth 4) (children: 0)", ] @@ -1128,7 +1128,7 @@ def test_deep_taxonomy(self) -> None: # Or even load a subtree: deep_result = taxonomy.get_filtered_tags(depth=None, parent_tag_value=tag_2.value) assert pretty_format_tags(deep_result, parent=False) == [ - ' Great-Grandchild - depth 3 (children: 1 + 1)', + ' Great-Grandchild - depth 3 (children: 1)', ' Great-Great-Grandchild - depth 4 (children: 1)', ' Great-Great-Great-Grandchild - depth 5 (children: 0)', ] diff --git a/tests/openedx_tagging/test_models.py b/tests/openedx_tagging/test_models.py index cde558582..8f801c77f 100644 --- a/tests/openedx_tagging/test_models.py +++ b/tests/openedx_tagging/test_models.py @@ -51,6 +51,7 @@ def setUp(self): self.chordata = get_tag("Chordata") self.mammalia = get_tag("Mammalia") self.animalia = get_tag("Animalia") + self.eukaryota = get_tag("Eukaryota") self.system_taxonomy_tag = get_tag("System Tag 1") self.english_tag = self.language_taxonomy.tag_for_external_id("en") self.user_1 = get_user_model()( @@ -329,9 +330,9 @@ def test_get_root(self) -> None: del r["_id"] # Remove the internal database IDs; they aren't interesting here and a other tests check them assert result == [ # These are the root tags, in alphabetical order: - {"value": "Archaea", "child_count": 3, "descendant_count": 3, **common_fields}, - {"value": "Bacteria", "child_count": 2, "descendant_count": 2, **common_fields}, - {"value": "Eukaryota", "child_count": 5, "descendant_count": 13, **common_fields}, + {"value": "Archaea", "child_count": 3, **common_fields}, + {"value": "Bacteria", "child_count": 2, **common_fields}, + {"value": "Eukaryota", "child_count": 5, **common_fields}, ] def test_get_child_tags_one_level(self) -> None: @@ -345,11 +346,11 @@ def test_get_child_tags_one_level(self) -> None: del r["_id"] # Remove the internal database IDs; they aren't interesting here and a other tests check them assert result == [ # These are the Eukaryota tags, in alphabetical order: - {"value": "Animalia", "child_count": 7, "descendant_count": 8, **common_fields}, - {"value": "Fungi", "child_count": 0, "descendant_count": 0, **common_fields}, - {"value": "Monera", "child_count": 0, "descendant_count": 0, **common_fields}, - {"value": "Plantae", "child_count": 0, "descendant_count": 0, **common_fields}, - {"value": "Protista", "child_count": 0, "descendant_count": 0, **common_fields}, + {"value": "Animalia", "child_count": 7, **common_fields}, + {"value": "Fungi", "child_count": 0, **common_fields}, + {"value": "Monera", "child_count": 0, **common_fields}, + {"value": "Plantae", "child_count": 0, **common_fields}, + {"value": "Protista", "child_count": 0, **common_fields}, ] def test_get_grandchild_tags_one_level(self) -> None: @@ -363,13 +364,13 @@ def test_get_grandchild_tags_one_level(self) -> None: del r["_id"] # Remove the internal database IDs; they aren't interesting here and a other tests check them assert result == [ # These are the Eukaryota tags, in alphabetical order: - {"value": "Arthropoda", "child_count": 0, "descendant_count": 0, **common_fields}, - {"value": "Chordata", "child_count": 1, "descendant_count": 1, **common_fields}, - {"value": "Cnidaria", "child_count": 0, "descendant_count": 0, **common_fields}, - {"value": "Ctenophora", "child_count": 0, "descendant_count": 0, **common_fields}, - {"value": "Gastrotrich", "child_count": 0, "descendant_count": 0, **common_fields}, - {"value": "Placozoa", "child_count": 0, "descendant_count": 0, **common_fields}, - {"value": "Porifera", "child_count": 0, "descendant_count": 0, **common_fields}, + {"value": "Arthropoda", "child_count": 0, **common_fields}, + {"value": "Chordata", "child_count": 1, **common_fields}, + {"value": "Cnidaria", "child_count": 0, **common_fields}, + {"value": "Ctenophora", "child_count": 0, **common_fields}, + {"value": "Gastrotrich", "child_count": 0, **common_fields}, + {"value": "Placozoa", "child_count": 0, **common_fields}, + {"value": "Porifera", "child_count": 0, **common_fields}, ] def test_get_depth_1_search_term(self) -> None: @@ -381,7 +382,6 @@ def test_get_depth_1_search_term(self) -> None: { "value": "Archaea", "child_count": 3, - "descendant_count": 3, "depth": 0, "usage_count": 0, "parent_value": None, @@ -400,7 +400,6 @@ def test_get_depth_1_child_search_term(self) -> None: { "value": "Archaebacteria", "child_count": 0, - "descendant_count": 0, "depth": 1, "parent_value": "Bacteria", "external_id": None, @@ -416,10 +415,12 @@ def test_depth_1_queries(self) -> None: """ with self.assertNumQueries(1): self.test_get_root() - with self.assertNumQueries(1): + + # 2 queries including a query to get 1. max depth and 2. objs for tag counts + with self.assertNumQueries(3): self.test_get_depth_1_search_term() # When listing the tags below a specific tag, there is one additional query to load the parent tag: - with self.assertNumQueries(2): + with self.assertNumQueries(4): self.test_get_child_tags_one_level() with self.assertNumQueries(2): self.test_get_depth_1_child_search_term() @@ -441,8 +442,8 @@ def test_get_all(self) -> None: "Bacteria (None) (children: 2)", " Archaebacteria (Bacteria) (children: 0)", " Eubacteria (Bacteria) (children: 0)", - "Eukaryota (None) (children: 5 + 8)", - " Animalia (Eukaryota) (children: 7 + 1)", + "Eukaryota (None) (children: 5)", + " Animalia (Eukaryota) (children: 7)", " Arthropoda (Animalia) (children: 0)", " Chordata (Animalia) (children: 1)", " Mammalia (Chordata) (children: 0)", @@ -478,7 +479,7 @@ def test_search_2(self) -> None: """ result = pretty_format_tags(self.taxonomy.get_filtered_tags(search_term="chordata")) assert result == [ - "Eukaryota (None) (children: 1 + 1)", # Has one child that matches, plus one additional matching descendant + "Eukaryota (None) (children: 1)", # Has one child that matches " Animalia (Eukaryota) (children: 1)", " Chordata (Animalia) (children: 0)", # this is the matching tag. ] @@ -492,8 +493,8 @@ def test_search_3(self) -> None: assert result == [ "Archaea (None) (children: 1)", " Proteoarchaeota (Archaea) (children: 0)", - "Eukaryota (None) (children: 2 + 2)", # 2 direct matching children, 2 additional matching descendants - " Animalia (Eukaryota) (children: 2)", + "Eukaryota (None) (children: 2)", # 2 direct matching children + " Animalia (Eukaryota) (children: 2)", # also 2 matching children " Arthropoda (Animalia) (children: 0)", # match " Gastrotrich (Animalia) (children: 0)", # match " Protista (Eukaryota) (children: 0)", # match @@ -511,7 +512,6 @@ def test_tags_deep(self) -> None: "depth": 3, "usage_count": 0, "child_count": 0, - "descendant_count": 0, "external_id": None, "_id": 21, # These IDs are hard-coded in the test fixture file } @@ -524,8 +524,10 @@ def test_deep_queries(self) -> None: """ with self.assertNumQueries(1): self.test_get_all() - # Searching below a specific tag requires an additional query to load that tag: - with self.assertNumQueries(2): + # Searching below a specific tag requires an additional query to load that tag, + # 4 queries including a query to get the 1.max depth for tag counts and 2 objs + # for counts: + with self.assertNumQueries(4): self.test_tags_deep() # Keyword search requires an additional query: with self.assertNumQueries(2): @@ -545,7 +547,9 @@ def test_get_external_id(self) -> None: def test_usage_count(self) -> None: """ - Test that the usage count in the results is right + Test that the usage count in the results is right for a basic case; + many objects tagged separately should return a simple usage count that + reflects lineage de-duplication (or lack thereof, in this case) """ api.tag_object(object_id="obj01", taxonomy=self.taxonomy, tags=["Bacteria"]) api.tag_object(object_id="obj02", taxonomy=self.taxonomy, tags=["Bacteria"]) @@ -554,7 +558,7 @@ def test_usage_count(self) -> None: # Now the API should reflect these usage counts: result = pretty_format_tags(self.taxonomy.get_filtered_tags(search_term="bacteria", include_counts=True)) assert result == [ - "Bacteria (None) (used: 3, children: 2)", + "Bacteria (None) (used: 4, children: 2)", " Archaebacteria (Bacteria) (used: 0, children: 0)", " Eubacteria (Bacteria) (used: 1, children: 0)", ] @@ -563,7 +567,217 @@ def test_usage_count(self) -> None: self.taxonomy.get_filtered_tags(search_term="bacteria", include_counts=True, depth=1) ) assert result1 == [ - "Bacteria (None) (used: 3, children: 2)", + "Bacteria (None) (used: 4, children: 2)", + ] + + def test_usage_count_lineage_count_across_same_course(self) -> None: + """ + Test that the usage count is correct and parent counts are included based on + child tags being added to an object. However, we de-duplicate and only count + 1 parent tag towards each object even if 2 children are applied to that object + """ + self.taxonomy.allow_multiple = True + self.taxonomy.save() + api.tag_object(object_id="obj01", taxonomy=self.taxonomy, tags=["Bacteria", "Archaebacteria", "Eubacteria"]) + api.tag_object(object_id="obj02", taxonomy=self.taxonomy, tags=["Archaebacteria"]) + # Now the API should reflect these usage counts: + result = pretty_format_tags(self.taxonomy.get_filtered_tags(search_term="bacteria", include_counts=True)) + assert result == [ + "Bacteria (None) (used: 2, children: 2)", + " Archaebacteria (Bacteria) (used: 2, children: 0)", + " Eubacteria (Bacteria) (used: 1, children: 0)", + ] + # Same with depth=1, which uses a different query internally: + result1 = pretty_format_tags( + self.taxonomy.get_filtered_tags(search_term="bacteria", include_counts=True, depth=1) + ) + assert result1 == [ + "Bacteria (None) (used: 2, children: 2)", + ] + + def test_usage_count_rolls_up_to_ancestors_deep(self) -> None: + """ + When a child tag (depth 3) is applied to an object, it should + roll up the count to all its ancestors when using _get_filtered_tags_deep. + The child tag and each of its ancestors should have usage_count=1. + """ + api.tag_object("obj:1", self.taxonomy, [self.mammalia.value]) + result = pretty_format_tags(self.taxonomy.get_filtered_tags(include_counts=True)) + assert result == [ + "Archaea (None) (used: 0, children: 3)", + " DPANN (Archaea) (used: 0, children: 0)", + " Euryarchaeida (Archaea) (used: 0, children: 0)", + " Proteoarchaeota (Archaea) (used: 0, children: 0)", + "Bacteria (None) (used: 0, children: 2)", + " Archaebacteria (Bacteria) (used: 0, children: 0)", + " Eubacteria (Bacteria) (used: 0, children: 0)", + "Eukaryota (None) (used: 1, children: 5)", + " Animalia (Eukaryota) (used: 1, children: 7)", + " Arthropoda (Animalia) (used: 0, children: 0)", + " Chordata (Animalia) (used: 1, children: 1)", + " Mammalia (Chordata) (used: 1, children: 0)", + " Cnidaria (Animalia) (used: 0, children: 0)", + " Ctenophora (Animalia) (used: 0, children: 0)", + " Gastrotrich (Animalia) (used: 0, children: 0)", + " Placozoa (Animalia) (used: 0, children: 0)", + " Porifera (Animalia) (used: 0, children: 0)", + " Fungi (Eukaryota) (used: 0, children: 0)", + " Monera (Eukaryota) (used: 0, children: 0)", + " Plantae (Eukaryota) (used: 0, children: 0)", + " Protista (Eukaryota) (used: 0, children: 0)", + ] + + def test_usage_count_multiple_objects_same_tag_deep(self) -> None: + """ + When two distinct objects (e.g. separate courses, modules, etc.) are tagged + with the same child tag, it should count 2 for that tag (and roll up 2 + to ancestors). Each distinct object should contribute exactly 1 to the count. + """ + api.tag_object("obj:1", self.taxonomy, [self.chordata.value]) + api.tag_object("obj:2", self.taxonomy, [self.chordata.value]) + result = pretty_format_tags( + self.taxonomy.get_filtered_tags(search_term="chordata", include_counts=True) + ) + assert result == [ + "Eukaryota (None) (used: 2, children: 1)", + " Animalia (Eukaryota) (used: 2, children: 1)", + " Chordata (Animalia) (used: 2, children: 0)", + ] + + def test_usage_count_sibling_tags_same_object_deduplication_deep(self) -> None: + """ + When one object is tagged with two sibling tags (both children of the same + parent), the parent's usage_count should be 1, not 2. It should de-duplicate. + """ + self.taxonomy.allow_multiple = True + self.taxonomy.save() + # Eubacteria and Archaebacteria are both children of Bacteria + api.tag_object("obj:1", self.taxonomy, [self.eubacteria.value, self.archaebacteria.value]) + result = pretty_format_tags( + self.taxonomy.get_filtered_tags(search_term="bacteria", include_counts=True) + ) + assert result == [ + "Bacteria (None) (used: 1, children: 2)", + " Archaebacteria (Bacteria) (used: 1, children: 0)", + " Eubacteria (Bacteria) (used: 1, children: 0)", + ] + + def test_usage_count_sibling_tags_different_objects_deep(self) -> None: + """ + When two different objects are each tagged with a different sibling tag, + the parent's usage_count should be 2, not 1. + """ + api.tag_object("obj:1", self.taxonomy, [self.eubacteria.value]) + api.tag_object("obj:2", self.taxonomy, [self.archaebacteria.value]) + result = pretty_format_tags( + self.taxonomy.get_filtered_tags(search_term="bacteria", include_counts=True) + ) + assert result == [ + "Bacteria (None) (used: 2, children: 2)", + " Archaebacteria (Bacteria) (used: 1, children: 0)", + " Eubacteria (Bacteria) (used: 1, children: 0)", + ] + + def test_usage_count_one_level_root_tags(self) -> None: + """ + _get_filtered_tags_one_level (depth=1) with include_counts=True should + reflect the rolled-up usage count, not just direct usage. + Tagging an object with a child tag should increment the root tag's count. + """ + api.tag_object("obj:1", self.taxonomy, [self.eubacteria.value]) # child of Bacteria + result = pretty_format_tags( + self.taxonomy.get_filtered_tags(depth=1, include_counts=True) + ) + assert result == [ + "Archaea (None) (used: 0, children: 3)", + "Bacteria (None) (used: 1, children: 2)", + "Eukaryota (None) (used: 0, children: 5)", + ] + + def test_usage_count_one_level_child_tags(self) -> None: + """ + When listing children of a tag (depth=1, parent_tag_value=...), the + usage_count of each child should only reflect the objects tagged with + that child or any of its descendants. + """ + api.tag_object("obj:1", self.taxonomy, [self.mammalia.value]) # grandchild of Animalia via Chordata + api.tag_object("obj:2", self.taxonomy, [self.chordata.value]) # direct child of Animalia + result = pretty_format_tags( + self.taxonomy.get_filtered_tags(depth=1, parent_tag_value="Animalia", include_counts=True) + ) + assert result == [ + " Arthropoda (Animalia) (used: 0, children: 0)", + " Chordata (Animalia) (used: 2, children: 1)", + " Cnidaria (Animalia) (used: 0, children: 0)", + " Ctenophora (Animalia) (used: 0, children: 0)", + " Gastrotrich (Animalia) (used: 0, children: 0)", + " Placozoa (Animalia) (used: 0, children: 0)", + " Porifera (Animalia) (used: 0, children: 0)", + ] + + def test_usage_count_three_levels_deep_rollup(self) -> None: + """ + Tagging an object with a depth-3 tag (Chordata) should roll up + to grandparent (Animalia) and great-grandparent (Eukaryota), + verifying the full 3-level lineage query in add_counts_query. + """ + api.tag_object("obj:1", self.taxonomy, [self.animalia.value]) + api.tag_object("obj:1", self.taxonomy, [self.chordata.value]) + result = pretty_format_tags( + self.taxonomy.get_filtered_tags(search_term="chordata", include_counts=True) + ) + assert result == [ + "Eukaryota (None) (used: 1, children: 1)", + " Animalia (Eukaryota) (used: 1, children: 1)", + " Chordata (Animalia) (used: 1, children: 0)", + ] + + def test_usage_count_returns_zero_not_none_deep(self) -> None: + """ + When no object has been tagged with a tag or any of its + descendants, usage_count must be 0 (integer), not None. + """ + result = pretty_format_tags(self.taxonomy.get_filtered_tags(include_counts=True)) + assert result == [ + "Archaea (None) (used: 0, children: 3)", + " DPANN (Archaea) (used: 0, children: 0)", + " Euryarchaeida (Archaea) (used: 0, children: 0)", + " Proteoarchaeota (Archaea) (used: 0, children: 0)", + "Bacteria (None) (used: 0, children: 2)", + " Archaebacteria (Bacteria) (used: 0, children: 0)", + " Eubacteria (Bacteria) (used: 0, children: 0)", + "Eukaryota (None) (used: 0, children: 5)", + " Animalia (Eukaryota) (used: 0, children: 7)", + " Arthropoda (Animalia) (used: 0, children: 0)", + " Chordata (Animalia) (used: 0, children: 1)", + " Mammalia (Chordata) (used: 0, children: 0)", + " Cnidaria (Animalia) (used: 0, children: 0)", + " Ctenophora (Animalia) (used: 0, children: 0)", + " Gastrotrich (Animalia) (used: 0, children: 0)", + " Placozoa (Animalia) (used: 0, children: 0)", + " Porifera (Animalia) (used: 0, children: 0)", + " Fungi (Eukaryota) (used: 0, children: 0)", + " Monera (Eukaryota) (used: 0, children: 0)", + " Plantae (Eukaryota) (used: 0, children: 0)", + " Protista (Eukaryota) (used: 0, children: 0)", + ] + + def test_usage_count_with_search_term_deep(self) -> None: + """ + When using get_filtered_tags() with both a search_term and + include_counts=True, the usage_count returned should still + reflect the true count for each matching tag, not be affected + by the search filter. + """ + api.tag_object("obj:1", self.taxonomy, [self.eubacteria.value]) + api.tag_object("obj:2", self.taxonomy, [self.archaebacteria.value]) + result = pretty_format_tags( + self.taxonomy.get_filtered_tags(search_term="bacteria", include_counts=True) + ) + assert result == [ + "Bacteria (None) (used: 2, children: 2)", + " Archaebacteria (Bacteria) (used: 1, children: 0)", + " Eubacteria (Bacteria) (used: 1, children: 0)", ] def test_tree_sort(self) -> None: @@ -575,7 +789,7 @@ def test_tree_sort(self) -> None: taxonomy = self.create_sort_test_taxonomy() result = pretty_format_tags(taxonomy.get_filtered_tags()) assert result == [ - "1 (None) (children: 4 + 1)", + "1 (None) (children: 4)", " 1 A (1) (children: 0)", " 11 (1) (children: 0)", " 11111 (1) (children: 1)", @@ -595,43 +809,6 @@ def test_tree_sort(self) -> None: " azure (ALPHABET) (children: 0)", ] - def test_descendant_counts(self) -> None: - """ - Test getting the descendant count on a taxonomy known to cause aggregation - bugs unless the aggregations are correctly specified with distinct=True - - https://docs.djangoproject.com/en/5.0/topics/db/aggregation/#combining-multiple-aggregations - """ - taxonomy = api.create_taxonomy("ESDC Subset") - api.add_tag_to_taxonomy(taxonomy, "Interests") # root tag - api.add_tag_to_taxonomy(taxonomy, "Holland Codes", parent_tag_value="Interests") # child tag - # Create the grandchild tag: - g_tag = api.add_tag_to_taxonomy(taxonomy, "Interests - Holland Codes", parent_tag_value="Holland Codes") - # Create the 6 great-grandchild tags: - api.add_tag_to_taxonomy(taxonomy, "Artistic", parent_tag_value=g_tag.value) - api.add_tag_to_taxonomy(taxonomy, "Conventional", parent_tag_value=g_tag.value) - api.add_tag_to_taxonomy(taxonomy, "Enterprising", parent_tag_value=g_tag.value) - api.add_tag_to_taxonomy(taxonomy, "Investigative", parent_tag_value=g_tag.value) - api.add_tag_to_taxonomy(taxonomy, "Realistic", parent_tag_value=g_tag.value) - api.add_tag_to_taxonomy(taxonomy, "Social", parent_tag_value=g_tag.value) - - result = pretty_format_tags(taxonomy.get_filtered_tags(depth=1, include_counts=True)) - assert result == [ - "Interests (None) (used: 0, children: 1 + 7)", # 1 child + (1 grandchild and 6 great grandchild tags) - ] - result2 = pretty_format_tags(taxonomy.get_filtered_tags(depth=None, include_counts=True)) - assert result2 == [ - "Interests (None) (used: 0, children: 1 + 7)", - " Holland Codes (Interests) (used: 0, children: 1 + 6)", - " Interests - Holland Codes (Holland Codes) (used: 0, children: 6)", - " Artistic (Interests - Holland Codes) (used: 0, children: 0)", - " Conventional (Interests - Holland Codes) (used: 0, children: 0)", - " Enterprising (Interests - Holland Codes) (used: 0, children: 0)", - " Investigative (Interests - Holland Codes) (used: 0, children: 0)", - " Realistic (Interests - Holland Codes) (used: 0, children: 0)", - " Social (Interests - Holland Codes) (used: 0, children: 0)", - ] - class TestFilteredTagsFreeTextTaxonomy(TestCase): """ diff --git a/tests/openedx_tagging/test_views.py b/tests/openedx_tagging/test_views.py index 63de07155..041eec03d 100644 --- a/tests/openedx_tagging/test_views.py +++ b/tests/openedx_tagging/test_views.py @@ -1455,8 +1455,8 @@ def test_small_taxonomy(self): "Bacteria (None) (children: 2)", " Archaebacteria (Bacteria) (children: 0)", " Eubacteria (Bacteria) (children: 0)", - "Eukaryota (None) (children: 5 + 8)", - " Animalia (Eukaryota) (children: 7 + 1)", + "Eukaryota (None) (children: 5)", + " Animalia (Eukaryota) (children: 7)", " Arthropoda (Animalia) (children: 0)", " Chordata (Animalia) (children: 1)", " Mammalia (Chordata) (children: 0)", @@ -1504,7 +1504,7 @@ def test_small_taxonomy_paged(self): assert next_response.status_code == status.HTTP_200_OK next_data = next_response.data assert pretty_format_tags(next_data["results"]) == [ - "Eukaryota (None) (children: 5 + 8)", + "Eukaryota (None) (children: 5)", ] assert next_data.get("current_page") == 2 @@ -1614,7 +1614,8 @@ def test_large_taxonomy(self): self.client.force_authenticate(user=self.staff) url = self.large_taxonomy_url + "?include_counts" - with self.assertNumQueries(3): + # 4 queries, including 1 for max depth for counts + with self.assertNumQueries(4): response = self.client.get(url) assert response.status_code == status.HTTP_200_OK @@ -1623,16 +1624,16 @@ def test_large_taxonomy(self): results = data["results"] assert pretty_format_tags(results) == [ - "Tag 0 (None) (used: 0, children: 12 + 144)", - "Tag 1099 (None) (used: 0, children: 12 + 144)", - "Tag 1256 (None) (used: 0, children: 12 + 144)", - "Tag 1413 (None) (used: 0, children: 12 + 144)", - "Tag 157 (None) (used: 0, children: 12 + 144)", - "Tag 1570 (None) (used: 0, children: 12 + 144)", - "Tag 1727 (None) (used: 0, children: 12 + 144)", - "Tag 1884 (None) (used: 0, children: 12 + 144)", - "Tag 2041 (None) (used: 0, children: 12 + 144)", - "Tag 2198 (None) (used: 0, children: 12 + 144)", + "Tag 0 (None) (used: 0, children: 12)", + "Tag 1099 (None) (used: 0, children: 12)", + "Tag 1256 (None) (used: 0, children: 12)", + "Tag 1413 (None) (used: 0, children: 12)", + "Tag 157 (None) (used: 0, children: 12)", + "Tag 1570 (None) (used: 0, children: 12)", + "Tag 1727 (None) (used: 0, children: 12)", + "Tag 1884 (None) (used: 0, children: 12)", + "Tag 2041 (None) (used: 0, children: 12)", + "Tag 2198 (None) (used: 0, children: 12)", # ... there are 41 more root tags but they're excluded from this first result page. ] @@ -1703,8 +1704,8 @@ def test_large_search(self): data = response.data results = data["results"] assert pretty_format_tags(results)[:20] == [ - "Tag 0 (None) (children: 3 + 10)", # First 2 results don't match but have children that match - # Note the count here ---^ is not the total number of matching descendants, just the number of children + "Tag 0 (None) (children: 3)", # First 2 results don't match but have children that match + # Note the count here ---^ is not the total number of matching children, just the number of children # once we filter the tree to include only matches and their ancestors. " Tag 1 (Tag 0) (children: 1)", " Tag 11 (Tag 1) (children: 0)", @@ -1719,7 +1720,7 @@ def test_large_search(self): " Tag 117 (Tag 105) (children: 0)", " Tag 118 (Tag 0) (children: 1)", " Tag 119 (Tag 118) (children: 0)", - "Tag 1099 (None) (children: 9 + 93)", + "Tag 1099 (None) (children: 9)", " Tag 1100 (Tag 1099) (children: 12)", " Tag 1101 (Tag 1100) (children: 0)", " Tag 1102 (Tag 1100) (children: 0)", @@ -1741,16 +1742,16 @@ def test_large_search(self): results2 = data2["results"] assert pretty_format_tags(results2) == [ # Notice that none of these root tags directly match the search query, but their children/grandchildren do - "Tag 0 (None) (children: 3 + 10)", - "Tag 1099 (None) (children: 9 + 93)", - "Tag 1256 (None) (children: 2 + 2)", - "Tag 1413 (None) (children: 1 + 1)", - "Tag 157 (None) (children: 2 + 2)", - "Tag 1570 (None) (children: 2 + 2)", - "Tag 1727 (None) (children: 1 + 1)", - "Tag 1884 (None) (children: 2 + 1)", - "Tag 2041 (None) (children: 1 + 10)", - "Tag 2198 (None) (children: 2 + 2)", + "Tag 0 (None) (children: 3)", + "Tag 1099 (None) (children: 9)", + "Tag 1256 (None) (children: 2)", + "Tag 1413 (None) (children: 1)", + "Tag 157 (None) (children: 2)", + "Tag 1570 (None) (children: 2)", + "Tag 1727 (None) (children: 1)", + "Tag 1884 (None) (children: 2)", + "Tag 2041 (None) (children: 1)", + "Tag 2198 (None) (children: 2)", ] assert data2.get("count") == 51 assert data2.get("num_pages") == 6 diff --git a/tests/openedx_tagging/utils.py b/tests/openedx_tagging/utils.py index 3e959d960..7b038ac42 100644 --- a/tests/openedx_tagging/utils.py +++ b/tests/openedx_tagging/utils.py @@ -22,10 +22,6 @@ def pretty_format_tags(result, parent=True, external_id=False) -> list[str]: if "usage_count" in t: line += f"used: {t['usage_count']}, " line += f"children: {t['child_count']}" - # In addition to the direct children, how many total descendants are there? - deeper_descendants = t['descendant_count'] - t['child_count'] - if deeper_descendants != 0: - line += f" + {deeper_descendants}" line += ")" pretty_results.append(line) return pretty_results