nhumrich · January 20, 2023 20:42 · Jan 20, 2023
diff --git a/gistfile1.txt b/gistfile1.txt
@@ -0,0 +1,630 @@
+PEP: 501
+Title: General purpose string template literals
+Version: $Revision$
+Last-Modified: $Date$
+Author: Nick Coghlan <ncoghlan@gmail.com>, Nick Humrich <nick@humrich.us>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Requires: 498
+Created: 08-Aug-2015
+Python-Version: 3.12
+Post-History: 08-Aug-2015, 23-Aug-2015, 30-Aug-2015
+
+Abstract
+========
+
+PEP 498 added new syntactic support for string interpolation that is
+transparent to the compiler, allow name references from the interpolation
+operation full access to containing namespaces (as with any other expression),
+rather than being limited to explicit name references. These are referred
+to in the PEP as "f-strings" (a mnemonic for "formatted strings").
+
+Since acceptance of the PEP 498, f-strings have become well-established and very popular,
+but eager rending has its limitations. For example, the eagerness of f-strings
+has made code like the following very likely and common::
+
+    os.system(f"echo {message_from_user}")
+
+This kind of code is superficially elegant, but poses a significant problem
+if the interpolated value ``message_from_user`` is in fact provided by an
+untrusted user: it's an opening for a form of code injection attack, where
+the supplied user data has not been properly escaped before being passed to
+the ``os.system`` call.
+
+To address that problem (and a number of other concerns), this PEP proposes
+the complementary introduction of "t-strings" (a mnemonic for "template literal strings"),
+where ``f"Message with {data}"`` would produce the same
+result as ``format(t"Message with {data}")``.
+
+Some possible examples of the proposed syntax::
+
+    mycommand = sh(t"cat {filename}")
+    myquery = sql(t"SELECT {column} FROM {table} WHERE column={value};")
+    myresponse = html(t"<html><body>{response.body}</body></html>")
+    logging.debug(t"Message with {detailed} {debugging} {info}")
+
+
+History
+============
+
+This PEP was previously in deferred status, pending further experience with PEP 498's
+simpler approach of only supporting eager rendering without the additional
+complexity of also supporting deferred rendering. Since then, f-strings have become very popular
+and this PEP has been updated to reflect knowledge of f-strings.
+
+
+Summary of differences from PEP 498
+===================================
+
+The key additions this proposal makes relative to PEP 498:
+
+* the "t" (template literal) prefix indicates delayed rendering, but
+  otherwise uses the same syntax and semantics as formatted strings
+* template literals are available at runtime as a new kind of object
+  (``types.TemplateLiteral``)
+* the default rendering used by formatted strings is invoked on an
+  template literal object by calling ``format(template)`` rather than
+  implicitly
+* while  f-string ``f"Message {here}"`` would be *semantically* equivalent to
+  ``format(t"Message {here}")``, f-strings currently avoid the runtime overhead of using
+  the delayed rendering machinery that is needed for t-strings
+
+NOTE: This proposal spells out a draft API for ``types.TemplateLiteral``
+
+Proposal
+========
+
+This PEP proposes the introduction of a new string prefix that declares the
+string to be a template literal rather than an ordinary string::
+
+    template = t"Substitute {names} and {expressions()} at runtime"
+
+This would be effectively interpreted as::
+
+    _raw_template = "Substitute {names:>10} and {expressions()} at runtime"
+    _parsed_template = (
+        ("Substitute ", "names"),
+        (" and ", "expressions()"),
+        (" at runtime", None),
+    )
+    _field_values = (names, expressions())
+    _format_specifiers = (">10", "")
+    template = types.InterpolationTemplate(_raw_template,
+                                           _parsed_template,
+                                           _field_values,
+                                           _format_specifiers)
+
+The ``__format__`` method on ``types.TemplateLiteral`` would then
+implement the following ``str.format`` inspired semantics::
+
+  >>> import datetime
+  >>> name = 'Jane'
+  >>> age = 50
+  >>> anniversary = datetime.date(1991, 10, 12)
+  >>> format(t'My name is {name}, my age next year is {age+1}, my anniversary is {anniversary:%A, %B %d, %Y}.')
+  'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
+  >>> format(t'She said her name is {repr(name)}.')
+  "She said her name is 'Jane'."
+
+As with formatted strings, the template literal prefix can be combined with single-quoted, double-quoted and triple quoted strings, including raw strings.
+It does not support combination with bytes literals.
+
+Similarly, this PEP does not propose to remove or deprecate any of the existing
+string formatting mechanisms, as those will remain valuable when formatting
+strings that are not present directly in the source code of the application.
+
+
+Rationale
+=========
+
+PEP 498 makes interpolating values into strings with full access to Python's
+lexical namespace semantics simpler, but it does so at the cost of creating a
+situation where interpolating values into sensitive targets like SQL queries,
+shell commands and HTML templates will enjoy a much cleaner syntax when handled
+without regard for code injection attacks than when they are handled correctly.
+
+This PEP proposes to provide the option of delaying the actual rendering
+of a template literal to its ``__format__`` method, allowing the use of
+other template renderers by passing the template around as a first class object.
+
+While very different in the technical details, the
+``types.TemplateLiteral`` interface proposed in this PEP is
+conceptually quite similar to the ``FormattableString`` type underlying the
+`native interpolation <https://msdn.microsoft.com/en-us/library/dn961160.aspx>`__ support introduced in C# 6.0,
+as well as `template literals in Javascript <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals>`__ introduced in es6.
+
+
+Specification
+=============
+
+This PEP proposes the introduction of ``t`` as a new string prefix that
+results in the creation of an instance of a new type,
+``types.TemplateLiteral``.
+
+Template literals are Unicode strings (bytes literals are not
+permitted), and string literal concatenation operates as normal, with the
+entire combined literal forming the template literal.
+
+The template string is parsed into literals, expressions and format specifiers
+as described for f-strings in PEP 498. Conversion specifiers are handled
+by the compiler, and appear as part of the field text in interpolation
+templates.
+
+However, rather than being rendered directly into a formatted strings, these
+components are instead organised into an instance of a new type with the
+following semantics::
+
+    class InterpolationTemplate:
+        __slots__ = ("raw_template", "parsed_template",
+                     "field_values", "format_specifiers")
+
+        def __new__(cls, raw_template, parsed_template,
+                         field_values, format_specifiers):
+            self = super().__new__(cls)
+            self.raw_template = raw_template
+            self.parsed_template = parsed_template
+            self.field_values = field_values
+            self.format_specifiers = format_specifiers
+            return self
+
+        def __repr__(self):
+            return (f"<{type(self).__qualname__} {repr(self._raw_template)} "
+                    f"at {id(self):#x}>")
+
+        def __format__(self, format_specifier):
+            # When formatted, render to a string, and use string formatting
+            return format(self.render(), format_specifier)
+
+        def render(self, *, render_template=''.join,
+                            render_field=format):
+            # See definition of the template rendering semantics below
+
+The result of an template literal expression is an instance of this
+type, rather than an already rendered string - rendering only takes
+place when the instance's ``render`` method is called (either directly, or
+indirectly via ``__format__``).
+
+The compiler will pass the following details to the template literal for
+later use:
+
+* a string containing the raw template as written in the source code
+* a parsed template tuple that allows the renderer to render the
+  template without needing to reparse the raw string template for substitution
+  fields
+* a tuple containing the evaluated field values, in field substitution order
+* a tuple containing the field format specifiers, in field substitution order
+
+This structure is designed to take full advantage of compile time constant
+folding by ensuring the parsed template is always constant, even when the
+field values and format specifiers include variable substitution expressions.
+
+The raw template is just the template literal as a string. By default,
+it is used to provide a human readable representation for the
+template literal.
+
+The parsed template consists of a tuple of 2-tuples, with each 2-tuple
+containing the following fields:
+
+* ``leading_text``:  a leading string literal. This will be the empty string if
+  the current field is at the start of the string, or immediately follows the
+  preceding field.
+* ``field_expr``: the text of the expression element in the substitution field.
+  This will be None for a final trailing text segment.
+
+The tuple of evaluated field values holds the *results* of evaluating the
+substitution expressions in the scope where the template literal appears.
+
+The tuple of field specifiers holds the *results* of evaluating the field
+specifiers as f-strings in the scope where the template literal appears.
+
+The ``TemplateLiteral.render`` implementation then defines the rendering
+process in terms of the following renderers:
+
+* an overall ``render_template`` operation that defines how the sequence of
+  literal template sections and rendered fields are composed into a fully
+  rendered result. The default template renderer is string concatenation
+  using ``''.join``.
+* a per field ``render_field`` operation that receives the field value and
+  format specifier for substitution fields within the template. The default
+  field renderer is the ``format`` builtin.
+
+Given an appropriate parsed template representation and internal methods of
+iterating over it, the semantics of template rendering would then be equivalent
+to the following::
+
+    def render(self, *, render_template=''.join,
+                        render_field=format):
+        iter_fields = enumerate(self.parsed_template)
+        values = self.field_values
+        specifiers = self.format_specifiers
+        template_parts = []
+        for field_pos, (leading_text, field_expr) in iter_fields:
+            template_parts.append(leading_text)
+            if field_expr is not None:
+                value = values[field_pos]
+                specifier = specifiers[field_pos]
+                rendered_field = render_field(value, specifier)
+                template_parts.append(rendered_field)
+        return render_template(template_parts)
+
+Conversion specifiers
+---------------------
+
+NOTE:
+
+   Appropriate handling of conversion specifiers is currently an open question.
+   Exposing them more directly to custom renderers would increase the
+   complexity of the ``TemplateLiteral`` definition without providing an
+   increase in expressiveness (since they're redundant with calling the builtins
+   directly). At the same time, they *are* made available as arbitrary strings
+   when writing custom ``string.Formatter`` implementations, so it may be
+   desirable to offer similar levels of flexibility of interpretation in
+   template literals.
+
+The ``!a``, ``!r`` and ``!s`` conversion specifiers supported by ``str.format``
+and hence PEP 498 are handled in template literals as follows:
+
+* they're included unmodified in the raw template to ensure no information is
+  lost
+* they're *replaced* in the parsed template with the corresponding builtin
+  calls, in order to ensure that ``field_expr`` always contains a valid
+  Python expression
+* the corresponding field value placed in the field values tuple is
+  converted appropriately *before* being passed to the template literal
+
+This means that, for most purposes, the difference between the use of
+conversion specifiers and calling the corresponding builtins in the
+original template literal will be transparent to custom renderers. The
+difference will only be apparent if reparsing the raw template, or attempting
+to reconstruct the original template from the parsed template.
+
+Writing custom renderers
+------------------------
+
+Writing a custom renderer doesn't require any special syntax. Instead,
+custom renderers are ordinary callables that process an interpolation
+template directly either by calling the ``render()`` method with alternate ``render_template`` or ``render_field`` implementations, or by accessing the
+template's data attributes directly.
+
+For example, the following function would render a template using objects'
+``repr`` implementations rather than their native formatting support::
+
+    def reprformat(template):
+        def render_field(value, specifier):
+            return format(repr(value), specifier)
+        return template.render(render_field=render_field)
+
+When writing custom renderers, note that the return type of the overall
+rendering operation is determined by the return type of the passed in ``render_template`` callable. While this is expected to be a string in most
+cases, producing non-string objects *is* permitted. For example, a custom
+template renderer could involve an ``sqlalchemy.sql.text`` call that produces
+an `SQL Alchemy query object <http://docs.sqlalchemy.org/en/rel_1_0/core/tutorial.html#using-textual-sql>`__.
+
+Non-strings may also be returned from ``render_field``, as long as it is paired
+with a ``render_template`` implementation that expects that behaviour.
+
+Expression evaluation
+---------------------
+
+As with f-strings, the subexpressions that are extracted from the interpolation
+template are evaluated in the context where the template literal
+appears. This means the expression has full access to local, nonlocal and global variables.
+Any valid Python expression can be used inside ``{}``, including
+function and method calls.
+
+Because the substitution expressions are evaluated where the string appears in
+the source code, there are no additional security concerns related to the
+contents of the expression itself, as you could have also just written the
+same expression and used runtime field parsing::
+
+  >>> bar=10
+  >>> def foo(data):
+  ...   return data + 20
+  ...
+  >>> str(t'input={bar}, output={foo(bar)}')
+  'input=10, output=30'
+
+Is essentially equivalent to::
+
+  >>> 'input={}, output={}'.format(bar, foo(bar))
+  'input=10, output=30'
+
+Handling code injection attacks
+-------------------------------
+
+The PEP 498 formatted string syntax makes it potentially attractive to write
+code like the following::
+
+    runquery(f"SELECT {column} FROM {table};")
+    runcommand(f"cat {filename}")
+    return_response(f"<html><body>{response.body}</body></html>")
+
+These all represent potential vectors for code injection attacks, if any of the
+variables being interpolated happen to come from an untrusted source. The
+specific proposal in this PEP is designed to make it straightforward to write
+use case specific renderers that take care of quoting interpolated values
+appropriately for the relevant security context::
+
+    runquery(sql(t"SELECT {column} FROM {table} WHERE column={value};"))
+    runcommand(sh(t"cat {filename}"))
+    return_response(html(t"<html><body>{response.body}</body></html>"))
+
+This PEP does not cover adding such renderers to the standard library
+immediately, but rather proposes to ensure that they can be readily provided by
+third party libraries, and potentially incorporated into the standard library
+at a later date.
+
+For example, a renderer that aimed to offer a POSIX shell style experience for
+accessing external programs, without the significant risks posed by running
+``os.system`` or enabling the system shell when using the ``subprocess`` module
+APIs, might provide an interface for running external programs similar to that
+offered by the
+`Julia programming language <http://julia.readthedocs.org/en/latest/manual/running-external-programs/>`__,
+only with the backtick based ``\`cat $filename\``` syntax replaced by
+``t"cat {filename}"`` style template literals.
+
+Format specifiers
+-----------------
+
+Aside from separating them out from the substitution expression during parsing,
+format specifiers are otherwise treated as opaque strings by the interpolation
+template parser - assigning semantics to those (or, alternatively,
+prohibiting their use) is handled at runtime by the field renderer.
+
+Error handling
+--------------
+
+Either compile time or run time errors can occur when processing interpolation
+expressions. Compile time errors are limited to those errors that can be
+detected when parsing a template string into its component tuples. These
+errors all raise SyntaxError.
+
+Unmatched braces::
+
+  >>> t'x={x'
+    File "<stdin>", line 1
+  SyntaxError: missing '}' in template literal expression
+
+Invalid expressions::
+
+  >>> t'x={!x}'
+    File "<fstring>", line 1
+      !x
+      ^
+  SyntaxError: invalid syntax
+
+Run time errors occur when evaluating the expressions inside a
+template string before creating the template literal object. See PEP 498
+for some examples.
+
+Different renderers may also impose additional runtime
+constraints on acceptable interpolated expressions and other formatting
+details, which will be reported as runtime exceptions.
+
+
+Changes to subprocess module
+============================
+
+With this change to templates, the subprocess modules can be changed to handle accepting template literals
+as an additional input type to ``run``, as it already accepts a sequence, or a string,
+with different behavior for each.
+With the addition of template literals, ``subprocess.run`` could accept a string in a more safe way.
+For example::
+
+  subprocess.run(t'cat {myfile}', shell=True)
+
+would have the same behavior as::
+
+  subprocess.run('cat' + shlex.quote(myfile), shell=True)
+
+Popen would be modified to pass all field_values through ``shlex.quote`` first.
+
+Alternatively, when ``subprocess.run`` is ran without ``shell=True``, it could still make using
+subprocess have a nicer syntax. For example::
+
+  subprocess.run(t'cat {myfile} --flag {value}')
+
+would be equivalent to::
+
+  subprocess.run(['cat', str(myfile), '--flag', str(value)])
+
+The code for such would be::
+
+  if openargs
+
+Possible integration with the logging module
+============================================
+
+One of the challenges with the logging module has been that we have previously
+been unable to devise a reasonable migration strategy away from the use of
+printf-style formatting. The runtime parsing and interpolation overhead for
+logging messages also poses a problem for extensive logging of runtime events
+for monitoring purposes.
+
+While beyond the scope of this initial PEP, template literal support
+could potentially be added to the logging module's event reporting APIs,
+permitting relevant details to be captured using forms like::
+
+    logging.debug(i"Event: {event}; Details: {data}")
+    logging.critical(i"Error: {error}; Details: {data}")
+
+Rather than the current mod-formatting style::
+
+    logging.debug("Event: %s; Details: %s", event, data)
+    logging.critical("Error: %s; Details: %s", event, data)
+
+As the template literal is passed in as an ordinary argument, other
+keyword arguments would also remain available::
+
+    logging.critical(i"Error: {error}; Details: {data}", exc_info=True)
+
+As part of any such integration, a recommended approach would need to be
+defined for "lazy evaluation" of interpolated fields, as the ``logging``
+module's existing delayed interpolation support provides access to
+`various attributes <https://docs.python.org/3/library/logging.html#logrecord-attributes>`__ of the event ``LogRecord`` instance.
+
+For example, since template literal expressions are arbitrary Python expressions,
+string literals could be used to indicate cases where evaluation itself is
+being deferred, not just rendering::
+
+    logging.debug(t"Logger: {'record.name'}; Event: {event}; Details: {data}")
+
+This could be further extended with idioms like using inline tuples to indicate
+deferred function calls to be made only if the log message is actually
+going to be rendered at current logging levels::
+
+    logging.debug(t"Event: {event}; Details: {expensive_call, raw_data}")
+
+This kind of approach would be possible as having access to the actual *text*
+of the field expression would allow the logging renderer to distinguish
+between inline tuples that appear in the field expression itself, and tuples
+that happen to be passed in as data values in a normal field.
+
+
+Discussion
+==========
+
+Refer to PEP 498 for additional discussion, as several of the points there
+also apply to this PEP.
+
+Deferring support for binary interpolation
+------------------------------------------
+
+Supporting binary interpolation with this syntax would be relatively
+straightforward (the elements in the parsed fields tuple would just be
+byte strings rather than text strings, and the default renderer would be
+markedly less useful), but poses a significant likelihood of producing
+confusing type errors when a text renderer was presented with
+binary input.
+
+Since the proposed syntax is useful without binary interpolation support, and
+such support can be readily added later, further consideration of binary
+interpolation is considered out of scope for the current PEP.
+
+Interoperability with str-only interfaces
+-----------------------------------------
+
+For interoperability with interfaces that only accept strings, interpolation
+templates can still be prerendered with ``format``, rather than delegating the
+rendering to the called function.
+
+This reflects the key difference from PEP 498, which *always* eagerly applies
+the default rendering, without any way to delegate the choice of renderer to
+another section of the code.
+
+Preserving the raw template string
+----------------------------------
+
+Earlier versions of this PEP failed to make the raw template string available
+on the template literal. Retaining it makes it possible to provide a more
+attractive template representation, as well as providing the ability to
+precisely reconstruct the original string, including both the expression text
+and the details of any eagerly rendered substitution fields in format specifiers.
+
+Creating a rich object rather than a global name lookup
+-------------------------------------------------------
+
+Earlier versions of this PEP used an ``__interpolate__`` builtin, rather than
+a creating a new kind of object for later consumption by interpolation
+functions. Creating a rich descriptive object with a useful default renderer
+made it much easier to support customisation of the semantics of interpolation.
+
+Building atop PEP 498, rather than competing with it
+----------------------------------------------------
+
+Earlier versions of this PEP attempted to serve as a complete substitute for
+PEP 498, rather than building a more flexible delayed rendering capability on
+top of PEP 498's eager rendering.
+
+Assuming the presence of f-strings as a supporting capability simplified a
+number of aspects of the proposal in this PEP (such as how to handle substitution
+fields in format specifiers)
+
+Deferring consideration of possible use in i18n use cases
+---------------------------------------------------------
+
+The initial motivating use case for this PEP was providing a cleaner syntax
+for i18n translation, as that requires access to the original unmodified
+template. As such, it focused on compatibility with the substitution syntax used
+in Python's ``string.Template`` formatting and Mozilla's l20n project.
+
+However, subsequent discussion revealed there are significant additional
+considerations to be taken into account in the i18n use case, which don't
+impact the simpler cases of handling interpolation into security sensitive
+contexts (like HTML, system shells, and database queries), or producing
+application debugging messages in the preferred language of the development
+team (rather than the native language of end users).
+
+Due to the original design of the ``str.format`` substitution syntax in PEP
+3101 being inspired by C#'s string formatting syntax, the specific field
+substitution syntax used in PEP 498 is consistent not only with Python's own ``str.format`` syntax, but also with string formatting in C#, including the
+native "$-string" interpolation syntax introduced in C# 6.0 (released in July
+2015).  The related ``IFormattable`` interface in C# forms the basis of a
+`number of elements <https://msdn.microsoft.com/en-us/library/system.iformattable.aspx>`__ of C#'s internationalization and localization
+support.
+
+This means that while this particular substitution syntax may not
+currently be widely used for translation of *Python* applications (losing out
+to traditional %-formatting and the designed-specifically-for-i18n
+``string.Template`` formatting), it *is* a popular translation format in the
+wider software development ecosystem (since it is already the preferred
+format for translating C# applications).
+
+Acknowledgements
+================
+
+* Eric V. Smith for creating PEP 498 and demonstrating the feasibility of
+  arbitrary expression substitution in string interpolation
+* Barry Warsaw, Armin Ronacher, and Mike Miller for their contributions to
+  exploring the feasibility of using this model of delayed rendering in i18n
+  use cases (even though the ultimate conclusion was that it was a poor fit,
+  at least for current approaches to i18n in Python)
+
+References
+==========
+
+.. [#] %-formatting
+       (https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting)
+
+.. [#] str.format
+       (https://docs.python.org/3/library/string.html#formatstrings)
+
+.. [#] string.Template documentation
+       (https://docs.python.org/3/library/string.html#template-strings)
+
+.. [#] PEP 215: String Interpolation
+       (https://www.python.org/dev/peps/pep-0215/)
+
+.. [#] PEP 292: Simpler String Substitutions
+       (https://www.python.org/dev/peps/pep-0292/)
+
+.. [#] PEP 3101: Advanced String Formatting
+       (https://www.python.org/dev/peps/pep-3101/)
+
+.. [#] PEP 498: Literal string formatting
+       (https://www.python.org/dev/peps/pep-0498/)
+
+.. [#] FormattableString and C# native string interpolation
+       (https://msdn.microsoft.com/en-us/library/dn961160.aspx)
+
+.. [#] IFormattable interface in C# (see remarks for globalization notes)
+       (https://msdn.microsoft.com/en-us/library/system.iformattable.aspx)
+
+.. [#] Running external commands in Julia
+       (http://julia.readthedocs.org/en/latest/manual/running-external-programs/)
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End:
No results found