@kangtastic
@kangtastic kangtastic commented Mar 16, 2023

Synopsis

Add Ascii85, Base85, and Z85 encoder and decoder functions implemented in C to binascii and use them to greatly improve the performance and reduce the memory usage of the existing Ascii85, Base85, and Z85 codec functions in base64.

No API or documentation changes are necessary with respect to any functions in base64, and all existing unit tests for those functions continue to pass without modification.

Resolves: gh-101178

Discussion

The base85-related functions in base64 are now wrappers for the new functions in binascii, as envisioned in the docs:

The binascii module contains a number of methods to convert between binary and various ASCII-encoded binary representations. Normally, you will not use these functions directly but use wrapper modules like uu or base64 instead. The binascii module contains low-level functions written in C for greater speed that are used by the higher-level modules.
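
As a small illustration of the wrapper relationship the docs describe, the sketch below checks two existing `base64` functions against the `binascii` calls they are built on. It only uses functions that already exist in the standard library; the sample data is arbitrary.

```python
import base64
import binascii

data = b"binascii does the heavy lifting"

# base64.b64encode() produces what binascii.b2a_base64() produces
# when the trailing newline is suppressed.
b64_matches = base64.b64encode(data) == binascii.b2a_base64(data, newline=False)

# base64.b16encode() is an upper-cased binascii.hexlify().
b16_matches = base64.b16encode(data) == binascii.hexlify(data).upper()

print(b64_matches, b16_matches)  # True True
```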

Splitting Ascii85 out from Base85 and Z85 was warranted in my testing, despite the code duplication, because of the various performance-murdering special cases in Ascii85.

Comments and questions are welcome.

Benchmarks

Updated December 28, 2025.

# bench_b85.py

# Note: EXTREMELY SLOW on unmodified mainline CPython
#       when tracing malloc on the base-85 functions.

import base64
import sys
import timeit
import tracemalloc

funcs = [(base64.b64encode, base64.b64decode),  # sanity check/comparison
         (base64.a85encode, base64.a85decode),
         (base64.b85encode, base64.b85decode),
         (base64.z85encode, base64.z85decode)]

def mb(n):
    return f"{n / 1024 / 1024:.3f} MB"

def stats(func, data, t, m):
    name, n, bps = func.__qualname__, len(data), len(data) / t
    print(f"{name} : {n} b in {t:.3f} s ({mb(bps)}/s) using {mb(m)}")

if __name__ == "__main__":
    data = b"a" * int(sys.argv[1]) * 1024 * 1024
    for fenc, fdec in funcs:
        tracemalloc.start()
        enc = fenc(data)
        menc = tracemalloc.get_traced_memory()[1] - len(enc)
        tracemalloc.stop()
        tenc = timeit.timeit("fenc(data)", number=1, globals=globals())
        stats(fenc, data, tenc, menc)

        tracemalloc.start()
        dec = fdec(enc)
        mdec = tracemalloc.get_traced_memory()[1] - len(dec)
        tracemalloc.stop()
        tdec = timeit.timeit("fdec(enc)", number=1, globals=globals())
        stats(fdec, enc, tdec, mdec)

# Python 3.15.0a3+ (heads/main:0efbad60e13, Dec 28 2025, 11:02:16)
# ./configure --enable-optimizations --with-lto

# Unmodified
$ time ./python bench_b85.py 64
b64encode : 67108864 b in 0.092 s (693.266 MB/s) using 42.667 MB
b64decode : 89478488 b in 0.234 s (364.961 MB/s) using 56.889 MB
a85encode : 67108864 b in 7.163 s (8.935 MB/s) using 2664.401 MB
a85decode : 83886080 b in 14.478 s (5.526 MB/s) using 3332.254 MB
b85encode : 67108864 b in 6.965 s (9.189 MB/s) using 2664.401 MB
b85decode : 83886080 b in 10.082 s (7.935 MB/s) using 3332.254 MB
z85encode : 67108864 b in 7.245 s (8.834 MB/s) using 2664.102 MB
z85decode : 83886080 b in 9.666 s (8.277 MB/s) using 3332.254 MB

real    9m44.382s
user    9m27.271s
sys     0m12.747s


# With this PR
b64encode : 67108864 b in 0.085 s (753.375 MB/s) using 42.667 MB
b64decode : 89478488 b in 0.230 s (371.282 MB/s) using 56.889 MB
a85encode : 67108864 b in 0.094 s (681.709 MB/s) using 0.000 MB
a85decode : 83886080 b in 0.191 s (418.019 MB/s) using 0.000 MB
b85encode : 67108864 b in 0.075 s (850.118 MB/s) using 0.000 MB
b85decode : 83886080 b in 0.141 s (567.490 MB/s) using 0.000 MB
z85encode : 67108864 b in 0.074 s (864.559 MB/s) using 0.000 MB
z85decode : 83886080 b in 0.173 s (462.854 MB/s) using 0.000 MB

real    0m1.865s
user    0m1.726s
sys     0m0.126s

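The old pure-Python implementation is roughly two orders of magnitude slower (about 9 MB/s versus 680 to 865 MB/s when encoding) and needs more than 40 times the input size in temporary memory (about 2,664 MB for a 64 MB input).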

@ghost
ghost commented Mar 16, 2023

All commit authors signed the Contributor License Agreement.
CLA signed

@kangtastic kangtastic changed the title Add Ascii85 and base85 support to binascii gh-101178: Add Ascii85 and base85 support to binascii Mar 16, 2023
@arhadthedev arhadthedev added the stdlib Standard Library Python modules in the Lib/ directory label Mar 23, 2023
@kangtastic
Author

kangtastic commented Mar 19, 2024

It's a year later, and Z85 support has been added to base64 in the meantime. So while bringing this PR up to date with main, I added Z85 support to it as well.

For reference, this is the benchmark run that led me to do so.

# After merging main but before adding Z85 support to this PR
(cpython-b85) $ python bench_b85.py 64
b64encode : 67108864 b in 0.121 s (527.435 MB/s) using 42.667 MB
b64decode : 89478488 b in 0.309 s (276.188 MB/s) using 56.889 MB
a85encode : 67108864 b in 0.297 s (215.150 MB/s) using 0.000 MB
a85decode : 83886080 b in 0.205 s (390.751 MB/s) using 0.000 MB
b85encode : 67108864 b in 0.106 s (604.359 MB/s) using 0.000 MB
b85decode : 83886080 b in 0.204 s (393.040 MB/s) using 0.000 MB
z85encode : 67108864 b in 0.204 s (313.610 MB/s) using 80.000 MB
z85decode : 83886080 b in 0.300 s (266.670 MB/s) using 100.000 MB

The existing Z85 implementation translates from the standard base85 alphabet to Z85 after the fact and within Python, so it was already benefiting from this PR but with substantial performance and memory usage overhead. That overhead is now gone.
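
A minimal sketch of the "translate after the fact" approach described above, assuming the two alphabets below (the RFC 1924 Base85 alphabet and the Z85 alphabet). The names `B85_TO_Z85` and `z85encode_via_translate` are hypothetical and only illustrate the idea; the extra pass over the encoded output is where the additional full-size copy comes from.

```python
import base64

B85_ALPHABET = (b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                b"abcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~")
Z85_ALPHABET = (b"0123456789abcdefghijklmnopqrstuvwxyz"
                b"ABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#")
assert len(B85_ALPHABET) == len(Z85_ALPHABET) == 85

# Byte-for-byte mapping from the Base85 alphabet to the Z85 alphabet.
B85_TO_Z85 = bytes.maketrans(B85_ALPHABET, Z85_ALPHABET)

def z85encode_via_translate(data):
    """Encode as Base85 first, then remap every output byte to Z85."""
    # translate() walks the whole encoded buffer and returns a new bytes object.
    return base64.b85encode(data).translate(B85_TO_Z85)

# Test vector from the Z85 specification.
sample = bytes([0x86, 0x4F, 0xD2, 0x6F, 0xB5, 0x59, 0xF7, 0x5B])
print(z85encode_via_translate(sample))  # b'HelloWorld'
```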

@kangtastic kangtastic force-pushed the gh-101178-rework-base85 branch from 71f1955 to 7b4aba1 Compare March 19, 2024 09:27
@python-cla-bot
python-cla-bot bot commented Apr 18, 2025

All commit authors signed the Contributor License Agreement.

CLA signed

Add Ascii85, base85, and Z85 encoders and decoders to `binascii`,
replacing the existing pure Python implementations in `base64`.

No API or documentation changes are necessary with respect to
`base64.a85encode()`, `b85encode()`, etc., and all existing unit
tests for those functions continue to pass without modification.

Note that attempting to decode Ascii85 or base85 data of length 1 mod 5
(after accounting for Ascii85 quirks) now produces an error, as no
encoder would emit such data. This should be the only significant
externally visible difference compared to the old implementation.

Resolves: pythongh-101178
@kangtastic kangtastic force-pushed the gh-101178-rework-base85 branch from 7b4aba1 to 05ae5ad Compare April 21, 2025 05:16
@kangtastic
Author

PR has been rebased onto main at 78cfee6 with squashing.

@kangtastic kangtastic changed the title gh-101178: Add Ascii85 and base85 support to binascii gh-101178: Add Ascii85. base85, and Z85 support to binascii Apr 21, 2025
@kangtastic kangtastic changed the title gh-101178: Add Ascii85. base85, and Z85 support to binascii gh-101178: Add Ascii85, base85, and Z85 support to binascii Apr 21, 2025
@sergey-miryanov
Contributor

Note that attempting to decode Ascii85, base85, or Z85 data of length 1 mod 5 now produces an error, as no encoder would emit such data. This should be the only significant externally visible difference compared to the old implementations.

I believe you have to document this change.

@kangtastic
Author

Note that attempting to decode Ascii85, base85, or Z85 data of length 1 mod 5 now produces an error, as no encoder would emit such data. This should be the only significant externally visible difference compared to the old implementations.

I believe you have to document this change.

Fair point, I could do that.

In case anyone argues for keeping the old behavior (silently ignoring length 1 mod 5), I won't do it just yet.
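
For context on why "no encoder would emit such data": a trailing group of n input bytes (1 to 3) is encoded as n + 1 characters, so the encoded length modulo 5 can be 0, 2, 3, or 4 but never 1. The sketch below checks this against the existing `base64` encoders. It uses `0xff` bytes so that Ascii85 never folds a group into `z` or `y`; `tail_lengths` is a hypothetical helper name.

```python
import base64

def tail_lengths(encode, max_len=64):
    """Collect len(encoded) % 5 for inputs of every length up to max_len."""
    # 0xff bytes avoid the Ascii85 'z'/'y' shortcuts, which change the length.
    return {len(encode(b"\xff" * n)) % 5 for n in range(max_len + 1)}

print(sorted(tail_lengths(base64.a85encode)))  # [0, 2, 3, 4]
print(sorted(tail_lengths(base64.b85encode)))  # [0, 2, 3, 4]
```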

If we were strictly following PEP-0399, _base64 would be a C
module for accelerated functions in base64. Due to historical
reasons, those should actually go in binascii instead.

We still want to preserve the existing Python code in base64.
Parting out facilities for accessing the C functions into a
module named _base64 shouldn't risk a naming conflict and
will simplify testing.
This is done differently to PEP-0399 to minimize the number of
changed lines.
As we're now keeping the existing Python base 85 functions, the C
implementations should behave exactly the same, down to exception
type and wording. It is also no longer an error to try to decode
data of length 1 mod 5.
@kangtastic
Author

The PR has been updated to preserve the existing base 85 Python functions in base64 and modify the new base 85 C functions in binascii to closely match their behavior. Notably, trying to decode data of length 1 mod 5 is no longer an error.

Importing update_wrapper() from functools to copy attributes
is expensive. Do it ourselves for only the most relevant ones.
This requires some code duplication, but oh well.
Using a decorator complicates function signature introspection.
Do we really need to test the legacy API twice?
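
A rough sketch of copying wrapper metadata by hand instead of importing `functools.update_wrapper()`, as the commit message above describes. All names here (`copy_metadata`, `reference_encode`, `fast_encode`) are hypothetical stand-ins, not the PR's code, and the chosen attribute list is an assumption about what "most relevant" means.

```python
def copy_metadata(wrapper, wrapped):
    """Copy only a few commonly inspected attributes; no functools import."""
    for attr in ("__name__", "__qualname__", "__doc__"):
        setattr(wrapper, attr, getattr(wrapped, attr))
    return wrapper

def reference_encode(data):
    """Return a bytes copy of data (stand-in for a documented function)."""
    return bytes(data)

def fast_encode(data):
    # Stand-in for an accelerated implementation with no docstring of its own.
    return bytes(data)

copy_metadata(fast_encode, reference_encode)
print(fast_encode.__name__, "-", fast_encode.__doc__)
```

Unlike `update_wrapper()`, this does not set `__wrapped__` or merge `__dict__`, which is the trade-off for skipping the import.
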
@picnixz
Member

picnixz commented Dec 26, 2025

How about separate functions for Base85 and Z85 only on the Python API side? I'm inclined to keep them combined on the C side to avoid code duplication (with wrapper functions for the z85 parameter).

It may be better to separate them for PGO, though I have no idea whether there will be an impact or not (this needs to be measured). If you're worried about parts of the code being duplicated, it can be refactored into macros (or into smaller functions). However, we should avoid a user interface that exposes different flavors through boolean switches (we usually try to avoid this, as illustrated by filter/itertools.filterfalse and, more recently, fnmatch.filter/fnmatch.filterfalse, where the API was explicitly designed to avoid switches).

@serhiy-storchaka
Member

Interested to see how you did it.

I pushed these changes. The code is similar to b2a_base85(), with additional special cases. The difference is so small (10-25%) compared with the 100x gap to the Python implementation that it would not be a crime to use the same code for all three encoders (I do not suggest this; it is just an option if we want to minimize the C code).

See also #143216 -- it reuses the same code for wrapping lines in b2a_base64().

How about separate functions for Base85 and Z85 only on the Python API side?

Yes, of course. b2a_base85() and b2a_z85() can use the same C code with different arguments.
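
The design being agreed on here (one shared core, two public functions, no boolean switch) could look roughly like the pure-Python sketch below. It is only an illustration of the API shape, not the C implementation; `_encode85`, `b2a_base85_sketch`, and `b2a_z85_sketch` are hypothetical names.

```python
import base64

_B85 = (b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        b"abcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~")
_Z85 = (b"0123456789abcdefghijklmnopqrstuvwxyz"
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZ.-:+=^!/*?&<>()[]{}@%$#")

def _encode85(data, alphabet):
    """Shared core: emit base-85 digits drawn from the given alphabet."""
    data = bytes(data)
    padding = -len(data) % 4          # zero bytes needed to fill the last group
    data += b"\0" * padding
    out = bytearray()
    for i in range(0, len(data), 4):
        word = int.from_bytes(data[i:i + 4], "big")
        group = bytearray(5)
        for j in range(4, -1, -1):    # least significant digit goes last
            word, digit = divmod(word, 85)
            group[j] = alphabet[digit]
        out += group
    # n trailing input bytes are represented by n + 1 characters.
    return bytes(out[:len(out) - padding])

def b2a_base85_sketch(data):
    return _encode85(data, _B85)

def b2a_z85_sketch(data):
    return _encode85(data, _Z85)

print(b2a_z85_sketch(bytes([0x86, 0x4F, 0xD2, 0x6F, 0xB5, 0x59, 0xF7, 0x5B])))
# b'HelloWorld'
```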

@kangtastic
Author

Hi @picnixz, thanks for reviewing.

A few comments. I'd appreciate, as it's new code, that PEP-7 is followed.

I did attempt to follow PEP-7; did you have something specific in mind? If it's about the old-style variable declarations at the top of each C function, the reason for that was to match much of the rest of binascii.c as PEP-7 allows.

But since @serhiy-storchaka partially undid that in 0df9a40 (apparently following ca99af3) I don't mind cleaning up the style a bit more.

@picnixz
Member

picnixz commented Dec 27, 2025

It's mainly to avoid } else { or } else if { and avoid many statements on the same line (AFAICT, it was introduced in this PR, e.g., if (cond) then;). Surrounding consistency is more with respect to the function itself rather than an entire file. I personally don't mind } else { or } else if { but I do mind the following:

    Py_ssize_t out_len = 5 * ((bin_len + 3) / 4);
    if (wrap)                   out_len += 4;
    if (!pad && (bin_len % 4))  out_len -= 4 - (bin_len % 4);
    if (width && out_len)       out_len += (out_len - 1) / width;

I prefer clear if (...) {<nl>...<nl>} in this case.

@kangtastic
Author

kangtastic commented Dec 27, 2025

I personally don't mind } else { or } else if { but I do mind the following:

    Py_ssize_t out_len = 5 * ((bin_len + 3) / 4);
    if (wrap)                   out_len += 4;
    if (!pad && (bin_len % 4))  out_len -= 4 - (bin_len % 4);
    if (width && out_len)       out_len += (out_len - 1) / width;

I prefer clear if (...) {<nl>...<nl>} in this case.

Will do.

@kangtastic kangtastic changed the title gh-101178: Add Ascii85, base85, and Z85 support to binascii gh-101178: Add Ascii85, Base85, and Z85 support to binascii Dec 29, 2025
@kangtastic
Author

kangtastic commented Dec 29, 2025

The PR has been updated. To summarize the changes:

  • C code style was updated and modernized
  • Z85 functions were added to binascii to get rid of the z85 parameter in e.g. binascii.a2b_base85()
  • pure-Python base-85 codepaths in base64 were removed along with the _base64 module

I also updated the PR description with another round of benchmarks. Notably, the optimizations to binascii.b2a_ascii85() by @serhiy-storchaka are visible on my end as well :)

Regarding C code organization in binascii, I tested separate codepaths for Base85 and Z85, both explicitly by duplicating functions with just the lookup table names changed and by adding the inline keyword to what I have now. With PGO/LTO enabled it doesn't seem to make much of a difference.

Weirdly, Z85 decoding is about 10% slower for me with PGO/LTO enabled. That might mean the Z85 tests actually are causing the compiler to optimize differently, or it might be an artifact of my ancient dev machine or my crude benchmarking methods. In any case the performance gains compared to the pure-Python implementations are still quite substantial so I'm going to leave it at that.

Member

@serhiy-storchaka serhiy-storchaka left a comment

Fantastic!

I will make final reviews after merging #143262 and #143216 (and maybe some other PRs). Some code, docs and tests can be rewritten after them. Here are some general comments.

Comment on lines +701 to +706
    fold_spaces: bool = False
        Allow 'y' as a short form encoding four spaces.
    wrap: bool = False
        Expect data to be wrapped in '<~' and '~>' as in Adobe Ascii85.
    ignore: Py_buffer(c_default="NULL", py_default="b''") = None
        An optional bytes-like object with input characters to be ignored.
Member

I have a question -- why do you use different parameter names than in base64.a85encode()?

Author

I just chose some names that sounded OK to me. I didn't like adobe in base64.a85decode() because it seemed too specific, and there wasn't commonality between binascii and base64 parameter names (lots of legacy code everywhere).

After reading #143262 and #143216 I gather that reusing CPython interned strings is implicitly good practice, so I will try to do that if you prefer.
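
For reference, these are the existing keyword names on the `base64` side that the question refers to: `foldspaces`, `adobe`, and `ignorechars`. The snippet only exercises the current public `base64` API; reading `fold_spaces`, `wrap`, and `ignore` as their `binascii` counterparts is an interpretation of the clinic input quoted above.

```python
import base64

zeros, spaces = b"\0" * 4, b" " * 4

print(base64.a85encode(zeros))                     # b'z'
print(base64.a85encode(spaces, foldspaces=True))   # b'y'

framed = base64.a85encode(zeros, adobe=True)       # wrapped in <~ and ~>
print(framed)                                      # b'<~z~>'
print(base64.a85decode(framed, adobe=True))        # b'\x00\x00\x00\x00'

# ignorechars lists input bytes the decoder should skip.
print(base64.a85decode(b"z z", ignorechars=b" "))  # eight zero bytes
```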

    ascii_len -= 2;
    if (ascii_len >= 2
        && ascii_data[0] == BASE85_A85_PREFIX
        && ascii_data[1] == BASE85_A85_AFFIX) {
Member

PEP 7: if the condition takes multiple lines, move { at a separate line.

Suggested change
-        && ascii_data[1] == BASE85_A85_AFFIX) {
+        && ascii_data[1] == BASE85_A85_AFFIX)
+    {

Author

Thanks, missed this, will do.

Comment on lines 745 to 753
    Py_ssize_t bin_len = ascii_len;
    unsigned char this_ch = 0;
    for (Py_ssize_t i = 0; i < ascii_len; i++) {
        this_ch = ascii_data[i];
        if (this_ch == 'y' || this_ch == 'z') {
            bin_len += 4;
        }
    }
    bin_len = 4 * ((bin_len + 4) / 5);
Member

This calculation is not accurate, because the zs and ys are already accounted for in the total number of characters. It is also possible to get an integer overflow on a 32-bit platform -- try decoding a string of 2**29 zs.

Count the number of zs and ys separately; the output size is then (bin_len - count + 4) / 5 * 4 + count * 4, but check for integer overflow.

Author

Could you explain what you mean? z and y are respectively equivalent to !!!!! and +<VdL, so it's correct to add 4 to ascii_len for each one encountered.

To be thorough I wrote the following to investigate if there are differences between the current and proposed output length calculation. It prints diffs: 0.

#include <stdio.h>

int main() {
  int diffs = 0;

  for (int ascii_len = 1; ascii_len <= 1024; ascii_len++) {
    for (int count_yz = 0; count_yz <= ascii_len; count_yz++) {
      /* Current output length calculation */
      int adj_ascii_len = ascii_len + 4 * count_yz;
      int bin_len = 4 * ((adj_ascii_len + 4) / 5);

      /* Proposed output length calculation */
      int bin_len_2 = (ascii_len - count_yz + 4) / 5 * 4 + count_yz * 4;

      if (bin_len != bin_len_2) {
        printf("ascii_len %d count_yz %d bin_len %d bin_len_2 %d\n",
          ascii_len, count_yz, bin_len, bin_len_2);
        diffs++;
      }
    }
  }

  printf("diffs: %d\n", diffs);

  return 0;
}

I didn't check for integer overflow because the rest of binascii doesn't, but I can add it.
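
The agreement between the two formulas also follows algebraically: ascii_len + 4 * count + 4 equals (ascii_len - count + 4) + 5 * count, so the integer division by 5 differs by exactly count. The overflow concern is separate. The sketch below restates both formulas in Python and compares the intermediate values against a signed 32-bit limit, assuming Py_ssize_t is 32 bits wide on such a platform; Python integers themselves never overflow, so the limit is checked explicitly. The function names are hypothetical.

```python
INT32_MAX = 2**31 - 1  # assumed Py_ssize_t maximum on a 32-bit build

def current_len(ascii_len, count_yz):
    """Output size as computed in the PR at this point."""
    return 4 * ((ascii_len + 4 * count_yz + 4) // 5)

def proposed_len(ascii_len, count_yz):
    """Output size as proposed in the review comment."""
    return (ascii_len - count_yz + 4) // 5 * 4 + count_yz * 4

mismatches = sum(current_len(n, c) != proposed_len(n, c)
                 for n in range(1, 257) for c in range(n + 1))
print("mismatches:", mismatches)  # mismatches: 0

# A string of 2**29 'z' characters: the adjusted length is 5 * 2**29.
n = 2**29
adjusted = n + 4 * n
print(adjusted > INT32_MAX)            # True
print(proposed_len(n, n) > INT32_MAX)  # True
```

Both the adjusted input length and the final output size exceed the 32-bit limit in that case, so either formula needs an explicit overflow check.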

        Allow 'y' as a short form encoding four spaces.
    wrap: bool = False
        Expect data to be wrapped in '<~' and '~>' as in Adobe Ascii85.
    ignore: Py_buffer(c_default="NULL", py_default="b''") = None
Member

None is not an acceptable value, is it? Use b''.

Author

If the signature of a2b_ascii85(string, /, *, fold_spaces=False, wrap=False, ignore=b"") was ignore=None, this would be acceptable. Because it isn't, this clinic input should be ignore: Py_buffer(c_default="NULL", py_default="b''") = b'' instead. Is that what you meant?

        Emit 'y' as a short form encoding four spaces.
    wrap: bool = False
        Wrap result in '<~' and '~>' as in Adobe Ascii85.
    width: unsigned_int(bitwise=True) = 0
Member

bitwise=True makes it wrap around, so 2**32 is interpreted as 0.

There is also no benefit in restricting the range to C's int; it has to support sys.maxsize. So use size_t here. We can also use a special converter, because the current Python code has no such limitation, and negative values are interpreted as 1 (although this is an implementation detail, not documented and not tested, so it is not necessary to support this).

Author

@kangtastic kangtastic Dec 30, 2025

I'll investigate this further and either use size_t or a custom converter.
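
To make the wrap-around concrete, the sketch below models the effect the reviewer describes: only the low bits of the Python integer survive the conversion. It assumes a 32-bit C unsigned int, and `as_bitwise_unsigned_int` is a hypothetical helper that mimics the arithmetic rather than calling Argument Clinic.

```python
UINT_BITS = 32  # assumption: C unsigned int is 32 bits wide

def as_bitwise_unsigned_int(value):
    """Keep only the low 32 bits, as a wrapping conversion would."""
    return value & ((1 << UINT_BITS) - 1)

print(as_bitwise_unsigned_int(2**32))       # 0
print(as_bitwise_unsigned_int(2**32 + 76))  # 76
print(as_bitwise_unsigned_int(-1))          # 4294967295
```

A width of 2**32 would therefore silently behave like "no wrapping", and a negative width like a huge one, instead of raising an error.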

}

/* Allocate output buffer.
XXX: Do a pre-pass above some threshold estimate (cf. 'yz')?
Member

I do not think it is worth it.

data: ascii_buffer
/
*
strict_mode: bool = False
Member

Do we need this? It is always True in base64. strict_mode=False is legacy.

Author

Maybe not so legacy, a strict_mode parameter was added to binascii.a2b_base64() in #24402 for what seem like good security-related reasons that are possibly relevant for binascii.a2b_base85() and binascii.a2b_z85() as well.

    *
    pad: bool = False
        Pad input to a multiple of 4 before encoding.
    newline: bool = True
Member

Do we need this? It is always False in base64. newline=True is legacy.

Author

For this one I agree we probably don't really need it. I put it in following the example of binascii.b2a_base64(). Note that binascii.b2a_uu() always appends a newline but binascii.b2a_hex() doesn't. If you prefer I can remove it.
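
The inconsistency mentioned above is easy to see with the existing `binascii` encoders; this snippet only uses functions already in the standard library.

```python
import binascii

data = b"abc"

with_newline = binascii.b2a_base64(data)                    # newline=True is the default
without_newline = binascii.b2a_base64(data, newline=False)
uu_line = binascii.b2a_uu(data)                             # always ends with a newline
hex_line = binascii.b2a_hex(data)                           # never appends one

print(with_newline, without_newline)  # b'YWJj\n' b'YWJj'
print(uu_line.endswith(b"\n"), hex_line.endswith(b"\n"))  # True False
```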

@kangtastic
Author

The PR has been updated to address most reviewer comments. I even found some minor decoding performance gains. A bit more polish and this one will be done!

Labels

awaiting review stdlib Standard Library Python modules in the Lib/ directory

Development

Successfully merging this pull request may close these issues.

base64.b85encode uses significant amount of RAM

8 participants