gh-146192: Add base32 support to binascii by kangtastic · Pull Request #146193 · python/cpython

kangtastic · 2026-03-20T08:00:51Z

Synopsis

Add base32 encoder and decoder functions implemented in C to binascii and use them to greatly improve the performance and reduce the memory usage of the existing base32 codec functions in base64.

No API or documentation changes are necessary with respect to any functions in base64, and all existing unit tests for those functions continue to pass without modification.

Resolves: gh-146192

Discussion

The base32-related functions in base64 are now wrappers for the new functions in binascii, as envisioned in the docs:

The binascii module contains a number of methods to convert between binary and various ASCII-encoded binary representations. Normally, you will not use these functions directly but use wrapper modules like uu or base64 instead. The binascii module contains low-level functions written in C for greater speed that are used by the higher-level modules.
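The existing Base64 pair already follows this wrapper pattern, which gives a feel for the relationship this PR sets up for Base 32. The sketch below checks it using only functions present in current releases; the new Base 32 functions are deliberately not called, since they may be absent from the interpreter running it.

```python
import base64
import binascii

data = b"any binary payload \x00\xff"

# base64.b64encode delegates to the C function and suppresses its trailing newline.
assert base64.b64encode(data) == binascii.b2a_base64(data, newline=False)

# Called directly, the C function appends b"\n" by default.
assert binascii.b2a_base64(data) == base64.b64encode(data) + b"\n"

# Decoding goes through binascii.a2b_base64 as well.
encoded = base64.b64encode(data)
assert base64.b64decode(encoded) == binascii.a2b_base64(encoded) == data
```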

Comments and questions are welcome.

Benchmarks

Benchmark script

# bench_b32.py

# Note: Can be EXTREMELY SLOW on unmodified mainline CPython.

import base64
import sys
import timeit
import tracemalloc

funcs = [(base64.b64encode, base64.b64decode), # sanity check/comparison
         (base64.b32encode, base64.b32decode),
         (base64.b32hexencode, base64.b32hexdecode)]

def mb(n):
    return f"{n / 1024 / 1024:.3f}"

def stats(func, data, t, m):
    name, n, bps = func.__qualname__, len(data), len(data) / t
    print(f"{name:<16}{n:<16}{t:<11.3f}{mb(bps):<13}{mb(m)}")

if __name__ == "__main__":
    print(f"Python {sys.version}\n")
    print(f"function        processed (b)   time (s)   avg (MB/s)   mem (MB)\n")
    data = b"a" * int(sys.argv[1]) * 1024 * 1024
    for fenc, fdec in funcs:
        tracemalloc.start()
        enc = fenc(data)
        menc = tracemalloc.get_traced_memory()[1] - len(enc)
        tracemalloc.stop()
        tenc = timeit.timeit("fenc(data)", number=1, globals=globals())
        stats(fenc, data, tenc, menc)

        tracemalloc.start()
        dec = fdec(enc)
        mdec = tracemalloc.get_traced_memory()[1] - len(dec)
        tracemalloc.stop()
        tdec = timeit.timeit("fdec(enc)", number=1, globals=globals())
        stats(fdec, enc, tdec, mdec)

Unmodified mainline CPython

$ ./python bench_b32.py 16
Python 3.15.0a7+ (heads/main:d357a7dbf38, Mar 19 2026, 23:22:25) [GCC 15.2.0]

function        processed (b)   time (s)   avg (MB/s)   mem (MB)

b64encode       16777216        0.015      1088.370     0.000
b64decode       22369624        0.017      1264.389     0.000
b32encode       16777216        2.308      6.933        17.382
b32decode       26843552        3.389      7.553        27.787
b32hexencode    16777216        2.338      6.843        17.379
b32hexdecode    26843552        3.388      7.557        27.787

With this PR

$ ./python bench_b32.py 16
Python 3.15.0a7+ (heads/base32-accel:72fd0f0302a, Mar 20 2026, 00:04:23) [GCC 15.2.0]

function        processed (b)   time (s)   avg (MB/s)   mem (MB)

b64encode       16777216        0.015      1084.957     0.000
b64decode       22369624        0.016      1363.524     0.000
b32encode       16777216        0.017      967.528      0.000
b32decode       26843552        0.016      1581.002     0.000
b32hexencode    16777216        0.016      995.277      0.000
b32hexdecode    26843552        0.016      1588.353     0.000

Encoding throughput improves by roughly 140x to 145x and decoding throughput by roughly 210x, and tracemalloc reports no auxiliary memory use (0.000 MB) for any of the Base 32 functions.
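The speedup factors can be recomputed directly from the avg (MB/s) column. The figures below are copied verbatim from the two runs above; nothing is re-measured.

```python
# Throughput in MB/s, copied from the "avg (MB/s)" column of the two runs above.
before = {"b32encode": 6.933, "b32decode": 7.553,
          "b32hexencode": 6.843, "b32hexdecode": 7.557}
after = {"b32encode": 967.528, "b32decode": 1581.002,
         "b32hexencode": 995.277, "b32hexdecode": 1588.353}

# Speedup is simply the ratio of new throughput to old throughput.
speedup = {name: after[name] / before[name] for name in before}
for name, factor in speedup.items():
    print(f"{name:<14}{factor:7.1f}x")
```

This works out to about 140x for `b32encode`, 145x for `b32hexencode`, 209x for `b32decode` and 210x for `b32hexdecode`.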

📚 Documentation preview 📚: https://cpython-previews--146193.org.readthedocs.build/


serhiy-storchaka · 2026-03-20T15:15:33Z

You can now update your PR, @kangtastic.

kangtastic · 2026-03-20T15:50:58Z

@serhiy-storchaka Already on it 😄

- Use the new `alphabet` parameter in `binascii`
- Remove `binascii.a2b_base32hex()` and `binascii.b2a_base32hex()`
- Change value for `.. versionadded::` ReST directive in docs for new `binascii` functions to "next" instead of "3.15"

serhiy-storchaka

I added some suggestions, but the core LGTM.

Please add assertions for new alphabets in test_constants.

serhiy-storchaka · 2026-03-21T09:29:36Z

Doc/library/binascii.rst

+
+.. function:: b2a_base32(data, /, *, alphabet=BASE32_ALPHABET)
+
+   Convert binary data to a line(s) of ASCII characters in base32 coding,


It is a single line.

I will add wrapcol in a separate issue.

serhiy-storchaka · 2026-03-21T09:37:23Z

Doc/library/binascii.rst

+
+   Convert base32 data back to binary and return the binary data.
+
+   Valid base32 data:


This list is incomplete and redundant. I think it would be better to follow the example of ascii85 and base85 (with a reference to the RFC). Mention that the mapping is case-sensitive and that no optional mapping of the digits "0" and "1" to the letters "O", "I", or "l" is used.
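To illustrate what "case-sensitive, with no optional 0/1 mapping" means in practice: the existing `base64.b32decode` already behaves this way by default and only relaxes it through its `casefold` and `map01` flags. This sketch uses only the `base64` API available today (the new `binascii.a2b_base32` is not called, since it may be missing from the running interpreter); `accepted_by_default` is a hypothetical helper.

```python
import base64
import binascii

def accepted_by_default(text: bytes) -> bool:
    """Hypothetical helper: True if base64.b32decode accepts `text` with no flags set."""
    try:
        base64.b32decode(text)
    except binascii.Error:
        return False
    return True

# In the RFC 4648 standard alphabet, b"p" (0x70) encodes to b"OA======".
assert base64.b32encode(b"p") == b"OA======"

# Default decoding is strict: lowercase letters and the digits 0 and 1 are rejected.
assert accepted_by_default(b"OA======")
assert not accepted_by_default(b"oa======")
assert not accepted_by_default(b"0A======")

# Leniency is opt-in at the base64 layer.
assert base64.b32decode(b"oa======", casefold=True) == b"p"
assert base64.b32decode(b"0A======", map01=b"I") == b"p"  # "0" is always mapped to "O"
assert base64.b32decode(b"1A======", map01=b"I") == b"@"  # "1" -> "I" (value 8)
assert base64.b32decode(b"1A======", map01=b"L") == b"X"  # "1" -> "L" (value 11)
```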

serhiy-storchaka · 2026-03-21T09:51:12Z

Doc/library/binascii.rst


+.. data:: BASE32_ALPHABET
+
+   The base32 alphabet according to :rfc:`4648`.


Suggested change

The base32 alphabet according to :rfc:`4648`.

The Base 32 alphabet according to :rfc:`4648`.

serhiy-storchaka · 2026-03-21T09:54:19Z

Doc/library/binascii.rst

+
+.. data:: BASE32HEX_ALPHABET
+
+   The "Extended Hex" base32hex alphabet according to :rfc:`4648`.


Suggested change

The "Extended Hex" base32hex alphabet according to :rfc:`4648`.

The "Extended Hex" Base 32 alphabet according to :rfc:`4648`.

These are the names used in the captions of tables 3 and 4 of RFC 4648.

Oh, we can even refer directly to the table:

Suggested change

The "Extended Hex" base32hex alphabet according to :rfc:`4648`.

The "Extended Hex" Base 32 alphabet according to :rfc:`4648`, table 4.

Add this for the Base 64 alphabets too, if you choose this variant.

I was wondering if this would come up. RFC 4648 uses all four of the terms "Base 32", "Base32", "base 32", and "base32" to refer to this encoding at various points, but it also states e.g.:

This encoding may be referred to as "base32hex". This encoding should not be regarded as the same as the "base32" encoding and should not be referred to as only "base32".

and e.g.:

One property with this alphabet, which the base64 and base32 alphabets lack...

thus implying that "base32" and "base32hex" are preferred, even if the rest of the document doesn't adhere to the implication.

Anyway, I'll refer to it as "Base 32" in docs for now to fit what's already there, and not reference the table number or touch any Base64 stuff so as to keep the scope of this PR limited.

serhiy-storchaka · 2026-03-21T10:06:42Z

Lib/base64.py

    if len(s) % 8:
        raise binascii.Error('Incorrect padding')


Shouldn't this be handled in the C code?
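For reference, the quoted check rejects any input whose length is not a multiple of 8, since one Base 32 quantum is 8 symbols encoding 5 bytes. Whether the check stays in `base64.py` or moves into the C decoder, these inputs should presumably keep raising `binascii.Error`; a small sketch using only the public `base64` API:

```python
import base64
import binascii

valid = base64.b32encode(b"hi")  # one full 8-symbol quantum, padding included
assert len(valid) % 8 == 0

# Truncated, over-padded and unpadded variants: none is a multiple of 8 symbols long.
for bad in (valid[:-1], valid + b"=", valid.rstrip(b"=")):
    try:
        base64.b32decode(bad)
    except binascii.Error as exc:
        print(f"{bad!r}: rejected ({exc})")
    else:
        raise AssertionError(f"{bad!r} was unexpectedly accepted")
```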

serhiy-storchaka · 2026-03-21T10:18:44Z

Lib/base64.py

-        _b32rev[alphabet] = {v: k for k, v in enumerate(alphabet)}
+
+def _b32decode_prepare(s, casefold=False, map01=None):
    s = _bytes_from_decode_data(s)


This is only needed if map01 is not None.

serhiy-storchaka · 2026-03-21T10:20:00Z

Lib/base64.py

-    if alphabet not in _b32rev:
-        _b32rev[alphabet] = {v: k for k, v in enumerate(alphabet)}
+
+def _b32decode_prepare(s, casefold=False, map01=None):


I suggest inlining this function. map01 handling is only needed for the standard alphabet, and the code for casefold is trivial.
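A minimal sketch of what the inlined preprocessing might look like, assuming a strict decoder underneath. The function name `b32decode_sketch` and its `strict_decode` parameter are hypothetical, and `base64.b32decode` with no flags stands in for the C decoder. Only the standard-alphabet path needs `map01` (`b32hexdecode` has no such parameter).

```python
import base64

def b32decode_sketch(s, casefold=False, map01=None, *, strict_decode=base64.b32decode):
    """Hypothetical wrapper: apply the optional lenient rewrites, then call a strict decoder.

    `strict_decode` stands in for the C decoder; base64.b32decode with no flags is
    strict about case and about the digits 0 and 1, so it works as a placeholder.
    """
    if map01 is not None:
        # bytes.translate() needs bytes, so the str -> bytes conversion lives only here.
        if isinstance(s, str):
            s = s.encode("ascii")
        if isinstance(map01, str):
            map01 = map01.encode("ascii")
        # "0" always becomes "O"; "1" becomes map01. maketrans raises ValueError
        # unless map01 is exactly one byte long.
        s = s.translate(bytes.maketrans(b"01", b"O" + map01))
    if casefold:
        s = s.upper()  # str.upper() and bytes.upper() both work
    return strict_decode(s)

print(b32decode_sketch("0a======", casefold=True, map01="l"))
```

Keeping the conversion inside the `map01` branch follows the earlier review comment, which appears to refer to the `_bytes_from_decode_data` call; other inputs are passed straight through, relying on the decoder to accept ASCII `str` as `base64.b32decode` does.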

serhiy-storchaka · 2026-03-21T10:33:06Z

Modules/binascii.c

+    *
+    alphabet: Py_buffer(c_default="{NULL, NULL}") = BASE32_ALPHABET
+
+base32-code line of data.


Suggested change

base32-code line of data.

Base32-code line of data.

- Update docs to refer to "Base 32" and "Base32"
- Update docs to better explain `binascii.a2b_base32()`
- Inline helper function in `base64`
- Add forgotten tests for presence of alphabet module globals

bedevere-app bot mentioned this pull request Mar 20, 2026

C accelerator for Base32 character encoding #146192

Open

serhiy-storchaka requested review from gpshead and serhiy-storchaka March 20, 2026 09:00

Update PR for python#145981

bf1308f


kangtastic force-pushed the base32-accel branch from db96a3f to bf1308f March 20, 2026 16:01

kangtastic marked this pull request as ready for review March 20, 2026 16:03

bedevere-app bot added the awaiting review label Mar 20, 2026

serhiy-storchaka reviewed Mar 21, 2026


kangtastic added 2 commits March 21, 2026 07:56

Address reviewer feedback

a9a7d26


Update generated files

6f80c54

kangtastic wants to merge 4 commits into python:main from kangtastic:base32-accel

2 participants

