@nhumrich
Created January 20, 2023 20:42
    PEP: 501
    Title: General purpose string template literals
    Version: $Revision$
    Last-Modified: $Date$
    Author: Nick Coghlan <ncoghlan@gmail.com>, Nick Humrich <nick@humrich.us>
    Status: Draft
    Type: Standards Track
    Content-Type: text/x-rst
    Requires: 498
    Created: 08-Aug-2015
    Python-Version: 3.12
    Post-History: 08-Aug-2015, 23-Aug-2015, 30-Aug-2015

    Abstract
    ========

    PEP 498 added new syntactic support for string interpolation that is
    transparent to the compiler, giving name references in the interpolation
    operation full access to the containing namespaces (as with any other expression),
    rather than limiting them to explicit name references. These are referred
    to in the PEP as "f-strings" (a mnemonic for "formatted strings").

    Since the acceptance of PEP 498, f-strings have become well established and very
    popular, but eager rendering has its limitations. For example, the eagerness of
    f-strings has made code like the following both likely and common::

        os.system(f"echo {message_from_user}")

    This kind of code is superficially elegant, but poses a significant problem
    if the interpolated value ``message_from_user`` is in fact provided by an
    untrusted user: it's an opening for a form of code injection attack, where
    the supplied user data has not been properly escaped before being passed to
    the ``os.system`` call.

    To address that problem (and a number of other concerns), this PEP proposes
    the complementary introduction of "t-strings" (a mnemonic for "template literal strings"),
    where ``f"Message with {data}"`` would produce the same
    result as ``format(t"Message with {data}")``.

    Some possible examples of the proposed syntax::

        mycommand = sh(t"cat {filename}")
        myquery = sql(t"SELECT {column} FROM {table} WHERE column={value};")
        myresponse = html(t"<html><body>{response.body}</body></html>")
        logging.debug(t"Message with {detailed} {debugging} {info}")


    History
    =======

    This PEP was previously in deferred status, pending further experience with PEP 498's
    simpler approach of only supporting eager rendering without the additional
    complexity of also supporting deferred rendering. Since then, f-strings have become very popular
    and this PEP has been updated to reflect the experience gained with them.


    Summary of differences from PEP 498
    ===================================

    The key additions this proposal makes relative to PEP 498:

    * the "t" (template literal) prefix indicates delayed rendering, but
      otherwise uses the same syntax and semantics as formatted strings
    * template literals are available at runtime as a new kind of object
      (``types.TemplateLiteral``)
    * the default rendering used by formatted strings is invoked on a
      template literal object by calling ``format(template)`` rather than
      implicitly
    * while the f-string ``f"Message {here}"`` would be *semantically* equivalent to
      ``format(t"Message {here}")``, f-strings currently avoid the runtime overhead of using
      the delayed rendering machinery that is needed for t-strings

    NOTE: This proposal spells out a draft API for ``types.TemplateLiteral``.

    Proposal
    ========

    This PEP proposes the introduction of a new string prefix that declares the
    string to be a template literal rather than an ordinary string::

        template = t"Substitute {names:>10} and {expressions()} at runtime"

    This would effectively be interpreted as::

        _raw_template = "Substitute {names:>10} and {expressions()} at runtime"
        _parsed_template = (
            ("Substitute ", "names"),
            (" and ", "expressions()"),
            (" at runtime", None),
        )
        _field_values = (names, expressions())
        _format_specifiers = (">10", "")
        template = types.TemplateLiteral(_raw_template,
                                         _parsed_template,
                                         _field_values,
                                         _format_specifiers)

    The ``__format__`` method on ``types.TemplateLiteral`` would then
    implement the following ``str.format`` inspired semantics::

        >>> import datetime
        >>> name = 'Jane'
        >>> age = 50
        >>> anniversary = datetime.date(1991, 10, 12)
        >>> format(t'My name is {name}, my age next year is {age+1}, my anniversary is {anniversary:%A, %B %d, %Y}.')
        'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
        >>> format(t'She said her name is {repr(name)}.')
        "She said her name is 'Jane'."

    As with formatted strings, the template literal prefix can be combined with
    single-quoted, double-quoted and triple-quoted strings, including raw strings.
    It does not support combination with bytes literals.

    Similarly, this PEP does not propose to remove or deprecate any of the existing
    string formatting mechanisms, as those will remain valuable when formatting
    strings that are not present directly in the source code of the application.


    Rationale
    =========

    PEP 498 makes interpolating values into strings with full access to Python's
    lexical namespace semantics simpler, but it does so at the cost of creating a
    situation where interpolating values into sensitive targets like SQL queries,
    shell commands and HTML templates will enjoy a much cleaner syntax when handled
    without regard for code injection attacks than when they are handled correctly.

    This PEP proposes to provide the option of delaying the actual rendering
    of a template literal to its ``__format__`` method, allowing the use of
    other template renderers by passing the template around as a first class object.

    While very different in the technical details, the
    ``types.TemplateLiteral`` interface proposed in this PEP is
    conceptually quite similar to the ``FormattableString`` type underlying the
    `native interpolation <https://msdn.microsoft.com/en-us/library/dn961160.aspx>`__ support introduced in C# 6.0,
    as well as to `template literals in JavaScript <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals>`__ introduced in ES6.


    Specification
    =============

    This PEP proposes the introduction of ``t`` as a new string prefix that
    results in the creation of an instance of a new type,
    ``types.TemplateLiteral``.

    Template literals are Unicode strings (bytes literals are not
    permitted), and string literal concatenation operates as normal, with the
    entire combined literal forming the template literal.

    The template string is parsed into literals, expressions and format specifiers
    as described for f-strings in PEP 498. Conversion specifiers are handled
    by the compiler, and appear as part of the field text in template
    literals.

    However, rather than being rendered directly into a formatted string, these
    components are instead organised into an instance of a new type with the
    following semantics::

        class TemplateLiteral:
            __slots__ = ("raw_template", "parsed_template",
                         "field_values", "format_specifiers")

            def __new__(cls, raw_template, parsed_template,
                        field_values, format_specifiers):
                self = super().__new__(cls)
                self.raw_template = raw_template
                self.parsed_template = parsed_template
                self.field_values = field_values
                self.format_specifiers = format_specifiers
                return self

            def __repr__(self):
                return (f"<{type(self).__qualname__} {repr(self.raw_template)} "
                        f"at {id(self):#x}>")

            def __format__(self, format_specifier):
                # When formatted, render to a string, and use string formatting
                return format(self.render(), format_specifier)

            def render(self, *, render_template=''.join,
                       render_field=format):
                # See definition of the template rendering semantics below
                ...

    The result of a template literal expression is an instance of this
    type, rather than an already rendered string. Rendering only takes
    place when the instance's ``render`` method is called (either directly, or
    indirectly via ``__format__``).

    The compiler will pass the following details to the template literal for
    later use:

    * a string containing the raw template as written in the source code
    * a parsed template tuple that allows the renderer to render the
      template without needing to reparse the raw string template for substitution
      fields
    * a tuple containing the evaluated field values, in field substitution order
    * a tuple containing the field format specifiers, in field substitution order

    This structure is designed to take full advantage of compile time constant
    folding by ensuring the parsed template is always constant, even when the
    field values and format specifiers include variable substitution expressions.

    The raw template is just the template literal as a string. By default,
    it is used to provide a human readable representation for the
    template literal.

    The parsed template consists of a tuple of 2-tuples, with each 2-tuple
    containing the following fields:

    * ``leading_text``: a leading string literal. This will be the empty string if
      the current field is at the start of the string, or immediately follows the
      preceding field.
    * ``field_expr``: the text of the expression element in the substitution field.
      This will be ``None`` for a final trailing text segment.
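
    The parsed template structure described above can be approximated at runtime
    for simple templates with the standard library's ``string.Formatter().parse``,
    which splits a ``str.format`` style string into literal text, field text,
    format specifier and conversion. This is only a sketch: ``parse_template`` is a
    hypothetical helper rather than part of this proposal, conversion specifiers
    are ignored, and ``str.format`` field parsing is far more limited than the
    f-string expression parsing this PEP relies on (for example, a ``:`` or ``!``
    inside an expression would be misread or rejected)::

        import string


        def parse_template(raw_template):
            """Hypothetical helper: build the parsed template and format specifiers."""
            parsed_template, format_specifiers = [], []
            pending = ""  # literal text seen since the previous field
            for literal, field, spec, _conversion in string.Formatter().parse(raw_template):
                pending += literal
                if field is not None:
                    parsed_template.append((pending, field))
                    format_specifiers.append(spec or "")
                    pending = ""
            # Always finish with a trailing text segment whose field_expr is None
            parsed_template.append((pending, None))
            return tuple(parsed_template), tuple(format_specifiers)


        parsed, specifiers = parse_template(
            "Substitute {names:>10} and {expressions()} at runtime")
        print(parsed)
        # (('Substitute ', 'names'), (' and ', 'expressions()'), (' at runtime', None))
        print(specifiers)
        # ('>10', '')

    Accumulating literal text in ``pending`` also folds escaped ``{{`` and ``}}``
    braces (which ``parse`` reports as separate literal-only chunks) into a single
    leading text segment, so field positions stay aligned with the values tuple.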

    The tuple of evaluated field values holds the *results* of evaluating the
    substitution expressions in the scope where the template literal appears.

    The tuple of format specifiers holds the *results* of evaluating the field
    specifiers as f-strings in the scope where the template literal appears.

    The ``TemplateLiteral.render`` implementation then defines the rendering
    process in terms of the following renderers:

    * an overall ``render_template`` operation that defines how the sequence of
      literal template sections and rendered fields are composed into a fully
      rendered result. The default template renderer is string concatenation
      using ``''.join``.
    * a per-field ``render_field`` operation that receives the field value and
      format specifier for substitution fields within the template. The default
      field renderer is the ``format`` builtin.

    Given an appropriate parsed template representation and internal methods of
    iterating over it, the semantics of template rendering would then be equivalent
    to the following::

        def render(self, *, render_template=''.join,
                   render_field=format):
            iter_fields = enumerate(self.parsed_template)
            values = self.field_values
            specifiers = self.format_specifiers
            template_parts = []
            for field_pos, (leading_text, field_expr) in iter_fields:
                template_parts.append(leading_text)
                if field_expr is not None:
                    value = values[field_pos]
                    specifier = specifiers[field_pos]
                    rendered_field = render_field(value, specifier)
                    template_parts.append(rendered_field)
            return render_template(template_parts)
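
    To experiment with these semantics before any compiler support exists, a
    template literal can be built by hand. The sketch below assumes a hypothetical
    ``TemplateLiteral`` stand-in (a frozen dataclass, not the proposed ``types``
    implementation) holding the four attributes described above, and hard-codes
    the values the compiler would produce for
    ``t"Substitute {names:>10} and {expressions()} at runtime"``, assuming
    ``names`` is ``'Jane'`` and ``expressions()`` returns ``'stuff'``::

        from dataclasses import dataclass


        @dataclass(frozen=True)
        class TemplateLiteral:
            """Hypothetical stand-in for the proposed types.TemplateLiteral."""
            raw_template: str
            parsed_template: tuple
            field_values: tuple
            format_specifiers: tuple

            def render(self, *, render_template=''.join, render_field=format):
                template_parts = []
                for field_pos, (leading_text, field_expr) in enumerate(self.parsed_template):
                    template_parts.append(leading_text)
                    if field_expr is not None:
                        value = self.field_values[field_pos]
                        specifier = self.format_specifiers[field_pos]
                        template_parts.append(render_field(value, specifier))
                return render_template(template_parts)

            def __format__(self, format_specifier):
                return format(self.render(), format_specifier)


        # Hand-built equivalent of
        # t"Substitute {names:>10} and {expressions()} at runtime"
        template = TemplateLiteral(
            raw_template="Substitute {names:>10} and {expressions()} at runtime",
            parsed_template=(("Substitute ", "names"),
                             (" and ", "expressions()"),
                             (" at runtime", None)),
            field_values=("Jane", "stuff"),
            format_specifiers=(">10", ""),
        )

        print(format(template))
        # Substitute       Jane and stuff at runtime
        print(template.render(render_template=tuple))
        # ('Substitute ', '      Jane', ' and ', 'stuff', ' at runtime')

    Passing ``render_template=tuple`` exposes the individual parts before they are
    joined, which is also the hook a renderer would use to produce a non-string
    result.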

    Conversion specifiers
    ---------------------

    NOTE:

    Appropriate handling of conversion specifiers is currently an open question.
    Exposing them more directly to custom renderers would increase the
    complexity of the ``TemplateLiteral`` definition without providing an
    increase in expressiveness (since they're redundant with calling the builtins
    directly). At the same time, they *are* made available as arbitrary strings
    when writing custom ``string.Formatter`` implementations, so it may be
    desirable to offer similar levels of flexibility of interpretation in
    template literals.

    The ``!a``, ``!r`` and ``!s`` conversion specifiers supported by ``str.format``
    and hence PEP 498 are handled in template literals as follows:

    * they're included unmodified in the raw template to ensure no information is
      lost
    * they're *replaced* in the parsed template with the corresponding builtin
      calls, in order to ensure that ``field_expr`` always contains a valid
      Python expression
    * the corresponding field value placed in the field values tuple is
      converted appropriately *before* being passed to the template literal

    This means that, for most purposes, the difference between the use of
    conversion specifiers and calling the corresponding builtins in the
    original template literal will be transparent to custom renderers. The
    difference will only be apparent if reparsing the raw template, or attempting
    to reconstruct the original template from the parsed template.
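
    The conversion handling described above can be sketched as a hypothetical
    runtime helper. ``apply_conversion`` is illustrative only and not a proposed
    API (under this PEP the compiler performs the equivalent step); it rewrites a
    field's expression text and its value together::

        # Conversion specifiers and the builtins they are replaced with
        CONVERSIONS = {"r": repr, "s": str, "a": ascii}


        def apply_conversion(field_expr, value, conversion):
            """Hypothetical helper: mimic how ``{expr!c}`` ends up in a template literal.

            Returns the (field_expr, field_value) pair that would be stored in the
            parsed template and the field values tuple respectively.
            """
            if conversion is None:
                return field_expr, value
            builtin = CONVERSIONS[conversion]
            # The expression text becomes an explicit builtin call, so it remains
            # a valid Python expression, and the value is converted eagerly.
            return f"{builtin.__name__}({field_expr})", builtin(value)


        print(apply_conversion("name", "Jane", "r"))  # ('repr(name)', "'Jane'")
        print(apply_conversion("age", 50, None))      # ('age', 50)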

    Writing custom renderers
    ------------------------

    Writing a custom renderer doesn't require any special syntax. Instead,
    custom renderers are ordinary callables that process a template literal
    directly, either by calling the ``render()`` method with alternate
    ``render_template`` or ``render_field`` implementations, or by accessing the
    template's data attributes directly.

    For example, the following function would render a template using objects'
    ``repr`` implementations rather than their native formatting support::

        def reprformat(template):
            def render_field(value, specifier):
                return format(repr(value), specifier)
            return template.render(render_field=render_field)

    When writing custom renderers, note that the return type of the overall
    rendering operation is determined by the return type of the passed-in
    ``render_template`` callable. While this is expected to be a string in most
    cases, producing non-string objects *is* permitted. For example, a custom
    template renderer could involve an ``sqlalchemy.sql.text`` call that produces
    an `SQLAlchemy query object <http://docs.sqlalchemy.org/en/rel_1_0/core/tutorial.html#using-textual-sql>`__.

    Non-strings may also be returned from ``render_field``, as long as it is paired
    with a ``render_template`` implementation that expects that behaviour.
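
    As an illustration of such a pairing, the sketch below uses a ``render_field``
    that wraps each value in a marker object and a ``render_template`` that
    assembles a ``(query_text, parameters)`` pair in the ``?`` placeholder style
    accepted by the standard library's ``sqlite3`` module. The ``TemplateLiteral``
    stand-in, the ``sql`` function and the ``_Param`` marker are all hypothetical,
    and only values are parameterized: identifiers such as table or column names
    would still need separate validation::

        import sqlite3
        from dataclasses import dataclass


        @dataclass(frozen=True)
        class TemplateLiteral:
            """Hypothetical stand-in for the proposed types.TemplateLiteral."""
            raw_template: str
            parsed_template: tuple
            field_values: tuple
            format_specifiers: tuple

            def render(self, *, render_template=''.join, render_field=format):
                parts = []
                for pos, (leading_text, field_expr) in enumerate(self.parsed_template):
                    parts.append(leading_text)
                    if field_expr is not None:
                        parts.append(render_field(self.field_values[pos],
                                                  self.format_specifiers[pos]))
                return render_template(parts)


        @dataclass(frozen=True)
        class _Param:
            """Marks a rendered field so render_template can tell it from literal text."""
            value: object


        def sql(template):
            """Hypothetical renderer: returns (query_text, parameters) using '?' placeholders."""
            def render_field(value, specifier):
                if specifier:
                    raise ValueError("format specifiers are not supported in SQL fields")
                return _Param(value)  # keep the raw value; never splice it into the text

            def render_template(parts):
                query, params = [], []
                for part in parts:
                    if isinstance(part, _Param):
                        query.append("?")
                        params.append(part.value)
                    else:
                        query.append(part)
                return "".join(query), tuple(params)

            return template.render(render_template=render_template,
                                   render_field=render_field)


        # Hand-built equivalent of t"SELECT name FROM users WHERE name={value}"
        value = "x' OR '1'='1"
        template = TemplateLiteral(
            "SELECT name FROM users WHERE name={value}",
            (("SELECT name FROM users WHERE name=", "value"), ("", None)),
            (value,),
            ("",),
        )
        query, params = sql(template)
        print(query)  # SELECT name FROM users WHERE name=?

        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE users (name TEXT)")
        conn.execute("INSERT INTO users VALUES ('alice')")
        rows = conn.execute(query, params).fetchall()
        print(rows)  # [] because the injected text is compared as a plain value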

    Expression evaluation
    ---------------------

    As with f-strings, the subexpressions that are extracted from the template
    literal are evaluated in the context where the template literal
    appears. This means the expression has full access to local, nonlocal and global variables.
    Any valid Python expression can be used inside ``{}``, including
    function and method calls.

    Because the substitution expressions are evaluated where the string appears in
    the source code, there are no additional security concerns related to the
    contents of the expression itself, as you could have also just written the
    same expression and used runtime field parsing::

        >>> bar = 10
        >>> def foo(data):
        ...     return data + 20
        ...
        >>> format(t'input={bar}, output={foo(bar)}')
        'input=10, output=30'

    This is essentially equivalent to::

        >>> 'input={}, output={}'.format(bar, foo(bar))
        'input=10, output=30'

    Handling code injection attacks
    -------------------------------

    The PEP 498 formatted string syntax makes it potentially attractive to write
    code like the following::

        runquery(f"SELECT {column} FROM {table};")
        runcommand(f"cat {filename}")
        return_response(f"<html><body>{response.body}</body></html>")

    These all represent potential vectors for code injection attacks, if any of the
    variables being interpolated happen to come from an untrusted source. The
    specific proposal in this PEP is designed to make it straightforward to write
    use case specific renderers that take care of quoting interpolated values
    appropriately for the relevant security context::

        runquery(sql(t"SELECT {column} FROM {table} WHERE column={value};"))
        runcommand(sh(t"cat {filename}"))
        return_response(html(t"<html><body>{response.body}</body></html>"))
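
    For instance, an ``html`` renderer like the one used in these examples might
    escape every interpolated value with the standard library's ``html.escape``
    while leaving the literal template text untouched. This is a minimal sketch
    under stated assumptions: ``TemplateLiteral`` is a hypothetical hand-built
    stand-in, ``html`` is a hypothetical function, and a real HTML renderer would
    also need to consider context (attributes, URLs, scripts) beyond plain text
    escaping::

        from dataclasses import dataclass
        from html import escape


        @dataclass(frozen=True)
        class TemplateLiteral:
            """Hypothetical stand-in for the proposed types.TemplateLiteral."""
            raw_template: str
            parsed_template: tuple
            field_values: tuple
            format_specifiers: tuple

            def render(self, *, render_template=''.join, render_field=format):
                parts = []
                for pos, (leading_text, field_expr) in enumerate(self.parsed_template):
                    parts.append(leading_text)
                    if field_expr is not None:
                        parts.append(render_field(self.field_values[pos],
                                                  self.format_specifiers[pos]))
                return render_template(parts)


        def html(template):
            """Hypothetical renderer: escape interpolated values, trust literal text."""
            def render_field(value, specifier):
                return escape(format(value, specifier))
            return template.render(render_field=render_field)


        # Hand-built equivalent of t"<html><body>{response.body}</body></html>"
        body = "<script>alert('hi')</script>"
        template = TemplateLiteral(
            "<html><body>{response.body}</body></html>",
            (("<html><body>", "response.body"), ("</body></html>", None)),
            (body,),
            ("",),
        )
        page = html(template)
        print(page)
        # <html><body>&lt;script&gt;alert(&#x27;hi&#x27;)&lt;/script&gt;</body></html>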

    This PEP does not cover adding such renderers to the standard library
    immediately, but rather proposes to ensure that they can be readily provided by
    third party libraries, and potentially incorporated into the standard library
    at a later date.

    For example, a renderer that aimed to offer a POSIX shell style experience for
    accessing external programs, without the significant risks posed by running
    ``os.system`` or enabling the system shell when using the ``subprocess`` module
    APIs, might provide an interface for running external programs similar to that
    offered by the
    `Julia programming language <http://julia.readthedocs.org/en/latest/manual/running-external-programs/>`__,
    only with the backtick based ``\`cat $filename\``` syntax replaced by
    ``t"cat {filename}"`` style template literals.

    Format specifiers
    -----------------

    Aside from separating them out from the substitution expression during parsing,
    format specifiers are otherwise treated as opaque strings by the template
    literal parser. Assigning semantics to those (or, alternatively,
    prohibiting their use) is handled at runtime by the field renderer.
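
    As an illustration, the hypothetical field renderer below (it could be passed
    as ``render_field`` to ``render()``) gives one specifier a renderer-specific
    meaning and rejects every other non-empty specifier at runtime, something the
    parser itself never checks. The ``"upper"`` specifier is an invented example,
    not part of this proposal::

        def render_field(value, specifier):
            """Hypothetical field renderer with its own, deliberately tiny, specifier set."""
            if specifier == "":
                return format(value)        # default formatting
            if specifier == "upper":
                return str(value).upper()   # a renderer-defined meaning
            # Anything else is rejected at runtime rather than passed to format()
            raise ValueError(f"unsupported format specifier: {specifier!r}")


        print(render_field("jane", "upper"))  # JANE
        print(render_field(42, ""))           # 42
        try:
            render_field(3.14159, ".2f")
        except ValueError as exc:
            print(exc)  # unsupported format specifier: '.2f'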

    Error handling
    --------------

    Either compile time or run time errors can occur when processing substitution
    expressions. Compile time errors are limited to those errors that can be
    detected when parsing a template string into its component tuples. These
    errors all raise ``SyntaxError``.

    Unmatched braces::

        >>> t'x={x'
        File "<stdin>", line 1
        SyntaxError: missing '}' in template literal expression

    Invalid expressions::

        >>> t'x={!x}'
        File "<fstring>", line 1
        !x
        ^
        SyntaxError: invalid syntax

    Run time errors occur when evaluating the expressions inside a
    template string before creating the template literal object. See PEP 498
    for some examples.

    Different renderers may also impose additional runtime
    constraints on acceptable interpolated expressions and other formatting
    details, which will be reported as runtime exceptions.


    Changes to subprocess module
    ============================

    With the introduction of template literals, the ``subprocess`` module could be
    updated to accept them as an additional input type to ``run``, alongside the
    sequences and strings it already accepts (each of which already has different
    behavior). Template literals would let ``subprocess.run`` accept a command
    string more safely.
    For example::

        subprocess.run(t'cat {myfile}', shell=True)

    would have the same behavior as::

        subprocess.run('cat ' + shlex.quote(myfile), shell=True)

    ``Popen`` would be modified to pass all ``field_values`` through ``shlex.quote`` first.
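
    A minimal sketch of that quoting step, assuming a hypothetical hand-built
    ``TemplateLiteral`` stand-in and a hypothetical ``sh`` helper (neither is an
    existing ``subprocess`` API): each field value is converted with ``format`` and
    passed through ``shlex.quote``, while the literal text of the template is
    trusted as written::

        import shlex
        from dataclasses import dataclass


        @dataclass(frozen=True)
        class TemplateLiteral:
            """Hypothetical stand-in for the proposed types.TemplateLiteral."""
            raw_template: str
            parsed_template: tuple
            field_values: tuple
            format_specifiers: tuple

            def render(self, *, render_template=''.join, render_field=format):
                parts = []
                for pos, (leading_text, field_expr) in enumerate(self.parsed_template):
                    parts.append(leading_text)
                    if field_expr is not None:
                        parts.append(render_field(self.field_values[pos],
                                                  self.format_specifiers[pos]))
                return render_template(parts)


        def sh(template):
            """Hypothetical renderer: shell-quote every interpolated value."""
            def render_field(value, specifier):
                return shlex.quote(format(value, specifier))
            return template.render(render_field=render_field)


        # Hand-built equivalent of t'cat {myfile}'
        myfile = "notes.txt; rm -rf ~"
        template = TemplateLiteral(
            "cat {myfile}",
            (("cat ", "myfile"), ("", None)),
            (myfile,),
            ("",),
        )
        command = sh(template)
        print(command)  # cat 'notes.txt; rm -rf ~'
        # subprocess.run(command, shell=True) would pass the file name as one word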

    Alternatively, when ``subprocess.run`` is run without ``shell=True``, template
    literals could still give subprocess calls a nicer syntax. For example::

        subprocess.run(t'cat {myfile} --flag {value}')

    would be equivalent to::

        subprocess.run(['cat', str(myfile), '--flag', str(value)])

    The code for such would be::

        if openargs
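
    A minimal sketch of such a conversion, using a hypothetical ``to_argv`` helper
    and a hand-built ``TemplateLiteral`` stand-in (neither is an existing
    ``subprocess`` API): the literal text is tokenized with ``shlex.split``, and
    each field value becomes exactly one argument, matching the list form shown
    above::

        import shlex
        from dataclasses import dataclass


        @dataclass(frozen=True)
        class TemplateLiteral:
            """Hypothetical stand-in holding the attributes described in this PEP."""
            raw_template: str
            parsed_template: tuple
            field_values: tuple
            format_specifiers: tuple


        def to_argv(template):
            """Hypothetical helper: build an argument list without invoking a shell.

            Literal text is split shell-style, while each field value becomes exactly
            one argument. Assumes fields are separated from literal text by
            whitespace (``--flag={value}`` would need extra joining logic).
            """
            argv = []
            for pos, (leading_text, field_expr) in enumerate(template.parsed_template):
                argv.extend(shlex.split(leading_text))
                if field_expr is not None:
                    value = template.field_values[pos]
                    argv.append(format(value, template.format_specifiers[pos]))
            return argv


        # Hand-built equivalent of t'cat {myfile} --flag {value}'
        myfile, value = "my notes.txt", 3
        template = TemplateLiteral(
            "cat {myfile} --flag {value}",
            (("cat ", "myfile"), (" --flag ", "value"), ("", None)),
            (myfile, value),
            ("", ""),
        )
        print(to_argv(template))  # ['cat', 'my notes.txt', '--flag', '3']

    Because the file name is never re-split, a value containing spaces or shell
    metacharacters still arrives as a single argument.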

    Possible integration with the logging module
    ============================================

    One of the challenges with the logging module has been that we have previously
    been unable to devise a reasonable migration strategy away from the use of
    printf-style formatting. The runtime parsing and interpolation overhead for
    logging messages also poses a problem for extensive logging of runtime events
    for monitoring purposes.

    While beyond the scope of this initial PEP, template literal support
    could potentially be added to the logging module's event reporting APIs,
    permitting relevant details to be captured using forms like::

        logging.debug(t"Event: {event}; Details: {data}")
        logging.critical(t"Error: {error}; Details: {data}")

    Rather than the current mod-formatting style::

        logging.debug("Event: %s; Details: %s", event, data)
        logging.critical("Error: %s; Details: %s", error, data)

    As the template literal is passed in as an ordinary argument, other
    keyword arguments would also remain available::

        logging.critical(t"Error: {error}; Details: {data}", exc_info=True)
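
    Until any such integration exists, a similar deferred-rendering effect can be
    approximated with the current ``logging`` module, because
    ``LogRecord.getMessage`` calls ``str()`` on the message object only when a
    handler actually formats the record. The sketch below assumes a hand-built
    ``TemplateLiteral`` stand-in and a hypothetical ``LazyMessage`` wrapper; it
    defers *rendering* only, since the field values were already evaluated when
    the template was created::

        import io
        import logging
        from dataclasses import dataclass


        @dataclass(frozen=True)
        class TemplateLiteral:
            """Hypothetical stand-in for the proposed types.TemplateLiteral."""
            raw_template: str
            parsed_template: tuple
            field_values: tuple
            format_specifiers: tuple

            def render(self, *, render_template=''.join, render_field=format):
                parts = []
                for pos, (leading_text, field_expr) in enumerate(self.parsed_template):
                    parts.append(leading_text)
                    if field_expr is not None:
                        parts.append(render_field(self.field_values[pos],
                                                  self.format_specifiers[pos]))
                return render_template(parts)

            def __format__(self, format_specifier):
                return format(self.render(), format_specifier)


        class LazyMessage:
            """Hypothetical wrapper: renders its template only when str() is called."""
            def __init__(self, template):
                self.template = template

            def __str__(self):
                return format(self.template)


        class CountingValue:
            """Field value that counts how often it is formatted."""
            renders = 0

            def __format__(self, specifier):
                CountingValue.renders += 1
                return "event-data"


        stream = io.StringIO()
        logger = logging.getLogger("pep501.sketch")
        logger.setLevel(logging.INFO)
        logger.addHandler(logging.StreamHandler(stream))
        logger.propagate = False

        data = CountingValue()
        # Hand-built equivalent of t"Details: {data}"
        template = TemplateLiteral("Details: {data}",
                                   (("Details: ", "data"), ("", None)),
                                   (data,), ("",))

        logger.debug(LazyMessage(template))  # below INFO: never rendered
        logger.info(LazyMessage(template))   # rendered once, when the record is emitted
        print(CountingValue.renders)         # 1
        print(stream.getvalue(), end="")     # Details: event-data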

    As part of any such integration, a recommended approach would need to be
    defined for "lazy evaluation" of interpolated fields, as the ``logging``
    module's existing delayed interpolation support provides access to
    `various attributes <https://docs.python.org/3/library/logging.html#logrecord-attributes>`__ of the event ``LogRecord`` instance.

    For example, since template literal expressions are arbitrary Python expressions,
    string literals could be used to indicate cases where evaluation itself is
    being deferred, not just rendering::

        logging.debug(t"Logger: {'record.name'}; Event: {event}; Details: {data}")

    This could be further extended with idioms like using inline tuples to indicate
    deferred function calls to be made only if the log message is actually
    going to be rendered at current logging levels::

        logging.debug(t"Event: {event}; Details: {expensive_call, raw_data}")

    This kind of approach would be possible as having access to the actual *text*
    of the field expression would allow the logging renderer to distinguish
    between inline tuples that appear in the field expression itself, and tuples
    that happen to be passed in as data values in a normal field.
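
    A sketch of such a renderer, assuming a hand-built ``TemplateLiteral``
    stand-in and a hypothetical ``render_deferred`` function: it reads the
    template's data attributes directly and uses ``ast.parse`` on each field's
    expression text to detect tuples written inline. In a logging integration,
    such a renderer would presumably only be invoked once the message was known to
    be emitted at the current logging level::

        import ast
        from dataclasses import dataclass


        @dataclass(frozen=True)
        class TemplateLiteral:
            """Hypothetical stand-in holding the attributes described in this PEP."""
            raw_template: str
            parsed_template: tuple
            field_values: tuple
            format_specifiers: tuple


        def render_deferred(template):
            """Hypothetical logging renderer: call inline ``{func, arg}`` tuples lazily."""
            parts = []
            for pos, (leading_text, field_expr) in enumerate(template.parsed_template):
                parts.append(leading_text)
                if field_expr is None:
                    continue
                value = template.field_values[pos]
                # Inspect the expression *text*: only a tuple written inline in the
                # field is a deferred call; a tuple passed in as data is left alone.
                node = ast.parse(field_expr.strip(), mode="eval").body
                if isinstance(node, ast.Tuple):
                    func, *args = value
                    value = func(*args)
                parts.append(format(value, template.format_specifiers[pos]))
            return "".join(parts)


        def expensive_call(raw):
            return f"summary of {len(raw)} items"


        event, raw_data, pair = "startup", [1, 2, 3], (1, 2)
        # Hand-built equivalent of t"Event: {event}; Details: {expensive_call, raw_data}"
        deferred = TemplateLiteral(
            "Event: {event}; Details: {expensive_call, raw_data}",
            (("Event: ", "event"), ("; Details: ", "expensive_call, raw_data"), ("", None)),
            (event, (expensive_call, raw_data)),
            ("", ""),
        )
        # Hand-built equivalent of t"Pair: {pair}", where the tuple is ordinary data
        plain = TemplateLiteral("Pair: {pair}", (("Pair: ", "pair"), ("", None)),
                                (pair,), ("",))

        print(render_deferred(deferred))  # Event: startup; Details: summary of 3 items
        print(render_deferred(plain))     # Pair: (1, 2)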


    Discussion
    ==========

    Refer to PEP 498 for additional discussion, as several of the points there
    also apply to this PEP.

    Deferring support for binary interpolation
    ------------------------------------------

    Supporting binary interpolation with this syntax would be relatively
    straightforward (the elements in the parsed fields tuple would just be
    byte strings rather than text strings, and the default renderer would be
    markedly less useful), but poses a significant likelihood of producing
    confusing type errors when a text renderer was presented with
    binary input.

    Since the proposed syntax is useful without binary interpolation support, and
    such support can be readily added later, further consideration of binary
    interpolation is considered out of scope for the current PEP.

    Interoperability with str-only interfaces
    -----------------------------------------

    For interoperability with interfaces that only accept strings, template
    literals can still be prerendered with ``format``, rather than delegating the
    rendering to the called function.

    This reflects the key difference from PEP 498, which *always* eagerly applies
    the default rendering, without any way to delegate the choice of renderer to
    another section of the code.

    Preserving the raw template string
    ----------------------------------

    Earlier versions of this PEP failed to make the raw template string available
    on the template literal. Retaining it makes it possible to provide a more
    attractive template representation, as well as providing the ability to
    precisely reconstruct the original string, including both the expression text
    and the details of any eagerly rendered substitution fields in format specifiers.

    Creating a rich object rather than a global name lookup
    -------------------------------------------------------

    Earlier versions of this PEP used an ``__interpolate__`` builtin, rather than
    creating a new kind of object for later consumption by interpolation
    functions. Creating a rich descriptive object with a useful default renderer
    made it much easier to support customisation of the semantics of interpolation.

    Building atop PEP 498, rather than competing with it
    ----------------------------------------------------

    Earlier versions of this PEP attempted to serve as a complete substitute for
    PEP 498, rather than building a more flexible delayed rendering capability on
    top of PEP 498's eager rendering.

    Assuming the presence of f-strings as a supporting capability simplified a
    number of aspects of the proposal in this PEP (such as how to handle substitution
    fields in format specifiers).

    Deferring consideration of possible use in i18n use cases
    ---------------------------------------------------------

    The initial motivating use case for this PEP was providing a cleaner syntax
    for i18n translation, as that requires access to the original unmodified
    template. As such, it focused on compatibility with the substitution syntax used
    in Python's ``string.Template`` formatting and Mozilla's l20n project.

    However, subsequent discussion revealed there are significant additional
    considerations to be taken into account in the i18n use case, which don't
    impact the simpler cases of handling interpolation into security sensitive
    contexts (like HTML, system shells, and database queries), or producing
    application debugging messages in the preferred language of the development
    team (rather than the native language of end users).

    Due to the original design of the ``str.format`` substitution syntax in PEP
    3101 being inspired by C#'s string formatting syntax, the specific field
    substitution syntax used in PEP 498 is consistent not only with Python's own ``str.format`` syntax, but also with string formatting in C#, including the
    native "$-string" interpolation syntax introduced in C# 6.0 (released in July
    2015). The related ``IFormattable`` interface in C# forms the basis of a
    `number of elements <https://msdn.microsoft.com/en-us/library/system.iformattable.aspx>`__ of C#'s internationalization and localization
    support.

    This means that while this particular substitution syntax may not
    currently be widely used for translation of *Python* applications (losing out
    to traditional %-formatting and the designed-specifically-for-i18n
    ``string.Template`` formatting), it *is* a popular translation format in the
    wider software development ecosystem (since it is already the preferred
    format for translating C# applications).

    Acknowledgements
    ================

    * Eric V. Smith for creating PEP 498 and demonstrating the feasibility of
      arbitrary expression substitution in string interpolation
    * Barry Warsaw, Armin Ronacher, and Mike Miller for their contributions to
      exploring the feasibility of using this model of delayed rendering in i18n
      use cases (even though the ultimate conclusion was that it was a poor fit,
      at least for current approaches to i18n in Python)

    References
    ==========

    .. [#] %-formatting
       (https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting)

    .. [#] str.format
       (https://docs.python.org/3/library/string.html#formatstrings)

    .. [#] string.Template documentation
       (https://docs.python.org/3/library/string.html#template-strings)

    .. [#] PEP 215: String Interpolation
       (https://www.python.org/dev/peps/pep-0215/)

    .. [#] PEP 292: Simpler String Substitutions
       (https://www.python.org/dev/peps/pep-0292/)

    .. [#] PEP 3101: Advanced String Formatting
       (https://www.python.org/dev/peps/pep-3101/)

    .. [#] PEP 498: Literal string formatting
       (https://www.python.org/dev/peps/pep-0498/)

    .. [#] FormattableString and C# native string interpolation
       (https://msdn.microsoft.com/en-us/library/dn961160.aspx)

    .. [#] IFormattable interface in C# (see remarks for globalization notes)
       (https://msdn.microsoft.com/en-us/library/system.iformattable.aspx)

    .. [#] Running external commands in Julia
       (http://julia.readthedocs.org/en/latest/manual/running-external-programs/)

    Copyright
    =========

    This document has been placed in the public domain.


    ..
       Local Variables:
       mode: indented-text
       indent-tabs-mode: nil
       sentence-end-double-space: t
       fill-column: 70
       coding: utf-8
       End: