# Dummy
I just read [this trick](https://gist.github.com/VictorTaelin/d293328f75291b23e203e9d9db9bd136) for text compression, which aims to save tokens in subsequent interactions during a long conversation, or in a long text to be summarized later.
## SHORT VERSION:
It's useful to provide a mapping between common words (or phrases) in a given long text that one intends to pass to the model later. Then pass that long text to GPT-4, encoded with that mapping. The idea is that the encoded version contains fewer tokens than the original text.
There are several techniques for identifying frequent or salient words and phrases in a text, such as TF-IDF or named-entity recognition (NER).
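A minimal Python sketch of the mapping idea, for illustration only. It assumes plain word n-gram frequency counting as the phrase finder (a simpler stand-in for TF-IDF or NER) and an arbitrary `§n§` code format; the names `frequent_phrases`, `encode` and `decode` are hypothetical and not part of the linked trick. Because decoding is a plain lookup, the round trip is lossless as long as the original text never contains `§`.

```python
import re
from collections import Counter


def frequent_phrases(text, max_n=3, min_count=2, top_k=8):
    """Return up to top_k word n-grams (2..max_n words) that occur at least
    min_count times, ranked by a rough estimate of characters saved."""
    words = text.split()
    counts = Counter(
        " ".join(words[i:i + n])
        for n in range(2, max_n + 1)
        for i in range(len(words) - n + 1)
    )
    frequent = [p for p, c in counts.items() if c >= min_count]
    # A code like "§0§" costs ~3 characters, so each occurrence of a phrase
    # saves roughly len(phrase) - 3 characters.
    frequent.sort(key=lambda p: (len(p) - 3) * counts[p], reverse=True)
    return frequent[:top_k]


def encode(text, phrases):
    """Replace each phrase with a short code; return (encoded_text, legend)."""
    if "§" in text:
        raise ValueError("text already contains the code delimiter '§'")
    legend = {}
    # Longer phrases first, so replacing a shorter phrase cannot break
    # occurrences of a longer phrase that contains it.
    for phrase in sorted(phrases, key=len, reverse=True):
        if phrase in text:
            code = f"§{len(legend)}§"
            legend[code] = phrase
            text = text.replace(phrase, code)
    return text, legend


def decode(encoded, legend):
    """Expand every §n§ code back into its phrase (left to right)."""
    return re.sub(r"§\d+§", lambda m: legend[m.group(0)], encoded)


sample = (
    "Typed lambda calculi are weaker than the untyped lambda calculus. "
    "On the other hand, typed lambda calculi allow more things to be proven."
)
encoded, legend = encode(sample, frequent_phrases(sample))
print(legend)   # {'§0§': 'lambda calculi'}
print(encoded)  # Typed §0§ are weaker than the untyped lambda calculus. On the other hand, typed §0§ allow more things to be proven.
assert decode(encoded, legend) == sample
```

The legend has to be sent along with the encoded text, so the net saving is only positive when phrases repeat often enough. Fewer characters also does not guarantee fewer tokens: an unusual symbol like `§` may itself cost one or more tokens.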
## LONG VERSION:
I learned this by asking GPT-4 to explain the logic behind it.
Compress the following text in a way such that you (GPT-4) can reconstruct it as close as possible to the original. This is for yourself. Do not make it human readable. Aggressively compress it, while still keeping ALL the information to fully reconstruct it. Also use the fewest token possible, your text should be way smaller than the one I give you.
## Text to compress:
Lambda calculus may be untyped or typed. In typed lambda calculus, functions can be applied only if they are capable of accepting the given input's "type" of data. Typed lambda calculi are weaker than the untyped lambda calculus, which is the primary subject of this article, in the sense that typed lambda calculi can express less than the untyped calculus can. On the other hand, typed lambda calculi allow more things to be proven. For example, in the simply typed lambda calculus it is a theorem that every evaluation strategy terminates for every simply typed lambda-term, whereas evaluation of untyped lambda-terms need not terminate. One reason there are many different typed lambda calculi has been the desire to do more (of what the untyped calculus can do) without giving up on being able to prove strong theorems about the calculus.
> lmbC_untyped/typed,typed_functype,weaker,primSubj,lessExpr,moreProof,simplyTypedLC,evalStratTerm,untypedNTerm,manyTypedLC,strongThrmDesire

I asked you to compress a long text using your own abbreviations. You replied with:
lmbC_untyped/typed,typed_functype,weaker,primSubj,lessExpr,moreProof,simplyTypedLC,evalStratTerm,untypedNTerm,manyTypedLC,strongThrmDesire
Reconstruct the original text.
> Lambda calculus may be untyped or typed. In typed lambda calculus, functions can be applied only if they are capable of accepting the given input's "type" of data. Typed lambda calculi are weaker than the untyped lambda calculus, which is the primary subject of this article, in the sense that typed lambda calculi can express less than the untyped calculus can. On the other hand, typed lambda calculi allow more things to be proven. For example, in the simply typed lambda calculus it is a theorem that every evaluation strategy terminates for every simply typed lambda-term, whereas evaluation of untyped lambda-terms need not terminate. One reason there are many different typed lambda calculi has been the desire to do more (of what the untyped calculus can do) without giving up on being able to prove strong theorems about the calculus.
I did some more testing, and GPT-4 gave its take on whether the compression worked well:
"Yes, it would be fair to say that the compression system succeeded in maintaining the core information and ideas from the original text. The uncompressed version (Text 2) still discusses the main differences between typed and untyped lambda calculus and covers various aspects like function types, evaluation strategies, and expressive power. While the structure and phrasing are different, the core ideas are preserved, indicating that the AI compression algorithm effectively retained the essential information from the original text (Text 1)."
How many tokens are saved using this method?
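One way to answer this is to count tokens for both versions of the lambda-calculus example. The sketch below uses OpenAI's `tiktoken` library when it is available and otherwise falls back to the common rule of thumb of roughly 4 characters per token for English text. The fallback is only an estimate, and camelCase abbreviations such as `evalStratTerm` usually split into more tokens per character than ordinary prose. The function names are hypothetical.

```python
ORIGINAL = (
    "Lambda calculus may be untyped or typed. In typed lambda calculus, functions "
    "can be applied only if they are capable of accepting the given input's \"type\" "
    "of data. Typed lambda calculi are weaker than the untyped lambda calculus, "
    "which is the primary subject of this article, in the sense that typed lambda "
    "calculi can express less than the untyped calculus can. On the other hand, "
    "typed lambda calculi allow more things to be proven. For example, in the "
    "simply typed lambda calculus it is a theorem that every evaluation strategy "
    "terminates for every simply typed lambda-term, whereas evaluation of untyped "
    "lambda-terms need not terminate. One reason there are many different typed "
    "lambda calculi has been the desire to do more (of what the untyped calculus "
    "can do) without giving up on being able to prove strong theorems about the "
    "calculus."
)
COMPRESSED = (
    "lmbC_untyped/typed,typed_functype,weaker,primSubj,lessExpr,moreProof,"
    "simplyTypedLC,evalStratTerm,untypedNTerm,manyTypedLC,strongThrmDesire"
)

try:
    import tiktoken

    _enc = tiktoken.encoding_for_model("gpt-4")

    def count_tokens(text):
        """Token count using the GPT-4 tokenizer."""
        return len(_enc.encode(text))

except Exception:  # tiktoken not installed, or its encoding file could not be fetched

    def count_tokens(text):
        """Rough estimate: about 4 characters per token for English text."""
        return max(1, round(len(text) / 4))


def token_savings(original, compressed):
    """Return (original_tokens, compressed_tokens, fraction_saved)."""
    before, after = count_tokens(original), count_tokens(compressed)
    return before, after, 1 - after / before


before, after, saved = token_savings(ORIGINAL, COMPRESSED)
print(f"{before} -> {after} tokens ({saved:.0%} fewer)")
```

The count only measures size: as the replies further down show, the abbreviations alone do not reliably reconstruct the text, so these tokens are saved at the cost of information.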
This is very interesting and promising. However, although it is lossless, the shortened version loses the tonality and personality of the agent. My prompt requires the agent (AI) to behave in a certain way and follow steps in a certain order. The compressed version disregards all that and behaves in a very standard way. Any thoughts on how I can restore that, or why this happens?
Many thanks
Me:
I asked you to compress a long text using your own abbreviations. You replied with: lmbC_untyped/typed,typed_functype,weaker,primSubj,lessExpr,moreProof,simplyTypedLC,evalStratTerm,untypedNTerm,manyTypedLC,strongThrmDesire
Reconstruct the original text.
ChatGPT:
I'm sorry, but I cannot directly expand these abbreviations without the context in which they were used. The abbreviations you provided seem to be related to topics in computer science, particularly type theory or lambda calculus, but without the original context, I can only guess at their full forms.
@endolith makes a great point. This method is useless, as the model was only able to "reconstruct" the text because the unshortened text was already in its memory.
No, I tried to decompress it from a separate OpenAI account, and it reconstructed the text almost identically.
Me:
I asked you to compress a long text using your own abbreviations. You replied with: lmbC_untyped/typed,typed_functype,weaker,primSubj,lessExpr,moreProof,simplyTypedLC,evalStratTerm,untypedNTerm,manyTypedLC,strongThrmDesire
Reconstruct the original text.
GPT-3.5:
Certainly! Here's the reconstructed text from the abbreviations:
"I am looking for a comparison between untyped and typed lambda calculus. Specifically, I'm interested in understanding the differences in function types, the notion of weaker typing, the primary subjects of study, the level of expressiveness in each calculus, the amount of proof theory involved, the concept of simply typed lambda calculus, evaluation strategies for terms, untyped lambda calculus, various aspects of typed lambda calculus, and the desire for stronger theorems."
GPT-4:
Based on the abbreviations you provided, it seems like the original text was discussing various concepts related to programming languages and type systems. Here's a possible reconstruction of the original text based on the abbreviations:
lmbC_untyped/typed: This could refer to "Lambda Calculus, untyped and typed." Lambda Calculus is a formal system in mathematical logic and computer science for expressing computation based on function abstraction and application.
typed_functype: This likely refers to "typed functions and types." In programming, this means that functions and variables have specified types, which determines the kind of data they can handle.
weaker, primSubj: This might mean "weaker primitives subject to certain conditions." It implies a discussion about basic elements in a system that have less powerful capabilities under specific constraints.
lessExpr, moreProof: This could be "less expressive, more proof-oriented." This suggests a trade-off between the expressiveness of a language and its suitability for formal proofs or verification.
simplyTypedLC: Refers to "simply typed Lambda Calculus," a variant of Lambda Calculus where every expression is assigned a type, and operations incompatible with these types are prohibited.
evalStratTerm: This is likely "evaluation strategies and termination." It discusses how expressions in a programming language are evaluated and whether these evaluations come to a stop or continue indefinitely.
untypedNTerm: This probably stands for "untyped non-terminating terms." It refers to expressions in an untyped system that do not have a defined endpoint in their evaluation.
manyTypedLC: This could mean "Lambda Calculus with many types," suggesting a system where a variety of types are used, possibly including complex type structures.
strongThrmDesire: This is likely "strong theorem desire," which could refer to a preference or need for strong theoretical foundations, possibly in the context of language design or type systems.
This reconstruction is speculative and assumes that the original text was discussing concepts in computer science, particularly around Lambda Calculus and type systems. If the context was different, the interpretation of these abbreviations could change significantly.
Another attempt.
Me:
I asked you to compress a long text using your own abbreviations. You replied with: lmbC_untyped/typed,typed_functype,weaker,primSubj,lessExpr,moreProof,simplyTypedLC,evalStratTerm,untypedNTerm,manyTypedLC,strongThrmDesire
Give me your guess on what is the original text. Don't explain me anything, simply guess the original text.
GPT-4:
The original text likely discussed various concepts related to lambda calculus, including untyped and typed versions, evaluation strategies, the simplicity of typed lambda calculus, and possibly theorems or proofs related to these topics.
I am very interested in this subject. Maybe we could ask GPT to create the perfect prompt that would allow it to reconstruct the text later.
But I am particularly interested in compressing programming code.
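For source code, the same mapping idea could be applied to identifiers instead of phrases. The sketch below is a hypothetical illustration for Python code only. It uses the standard `tokenize` module so that only NAME tokens are rewritten (strings, comments and whitespace are left alone), maps long identifiers that appear at least twice to aliases like `_0`, `_1`, skips keywords and builtins, and reverses the mapping with the same legend. The thresholds and alias format are arbitrary assumptions, and the compressed code is meant to be read by a model, not executed.

```python
import builtins
import io
import keyword
import tokenize
from collections import Counter
from itertools import count


def _name_tokens(source):
    """Yield NAME tokens (identifiers and keywords) from Python source."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            yield tok


def _rename(source, mapping):
    """Replace NAME tokens found in mapping; strings, comments and spacing stay intact."""
    # Split lines the same way tokenize reads them, so row numbers line up.
    lines = io.StringIO(source).readlines()
    # Edit right-to-left so earlier column offsets on a line remain valid.
    for tok in sorted(_name_tokens(source), key=lambda t: t.start, reverse=True):
        if tok.string in mapping:
            (row, start), (_, end) = tok.start, tok.end
            line = lines[row - 1]
            lines[row - 1] = line[:start] + mapping[tok.string] + line[end:]
    return "".join(lines)


def compress_code(source, min_len=6, min_count=2):
    """Alias long, repeated identifiers; return (compressed_source, legend)."""
    names = Counter(tok.string for tok in _name_tokens(source))
    skip = set(dir(builtins))
    candidates = [
        name for name, n in names.most_common()
        if len(name) >= min_len and n >= min_count
        and not keyword.iskeyword(name) and name not in skip
    ]
    # Fresh aliases that do not clash with any name already in the source.
    aliases = (f"_{i}" for i in count() if f"_{i}" not in names)
    legend = {next(aliases): name for name in candidates}
    forward = {name: alias for alias, name in legend.items()}
    return _rename(source, forward), legend


def decompress_code(compressed, legend):
    """Undo compress_code using its legend (alias -> original name)."""
    return _rename(compressed, legend)


src = (
    "def total_price(items):\n"
    "    subtotal = sum(item.price for item in items)  # subtotal before tax\n"
    "    return subtotal\n"
)
packed, legend = compress_code(src)
print(legend)  # {'_0': 'subtotal'}
print(packed)
assert decompress_code(packed, legend) == src
```

Working on tokens rather than raw text avoids renaming words that happen to appear inside comments or string literals, which a plain find-and-replace would not.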
The original example works in the same chat window in ChatGPT.
It does not work in a new window, where there is no memory from which to recover the original text:
The abbreviations you've provided seem to represent concepts related to lambda calculus and theoretical computer science. Here's a possible reconstruction of the original text based on these abbreviations:
lmbC: Lambda Calculus
untyped/typed: This refers to the distinction between untyped and typed lambda calculus. Typed lambda calculus includes systems like simply typed lambda calculus, while untyped does not impose these constraints.
typed_functype: In typed lambda calculus, functions have specific types.
weaker: This might refer to the weaker expressive power of simply typed lambda calculus compared to its untyped counterpart.
primSubj: The primary subject of discussion is lambda calculus.
lessExpr: Less expressive power in certain typed systems.
moreProof: More robust in terms of proof theory.
simplyTypedLC: Simply Typed Lambda Calculus, a specific type system in lambda calculus.
evalStratTerm: Evaluation strategies and termination properties in lambda calculus.
untypedNTerm: Untyped lambda calculus and its non-termination characteristics.
manyTypedLC: Variants of typed lambda calculus, possibly referring to systems like System F or the Calculus of Constructions.
strongThrmDesire: A desire for stronger theorems or properties in the context of these systems.
This reconstruction attempts to make sense of the abbreviations in a coherent way, assuming they are related to lambda calculus and its various properties and systems.
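The fresh-window result above suggests the abbreviations alone do not carry enough information; the SHORT VERSION's remedy is to send the mapping together with the text. Below is a minimal sketch of a self-contained prompt that does this. The prompt wording, the `§n§` code format and the function names are assumptions, not a tested recipe. Because the legend is explicit, the text can also be expanded deterministically without a model, which gives a reference to check the model's reconstruction against.

```python
import re


def build_decompression_prompt(encoded, legend):
    """Bundle a legend (code -> phrase) with encoded text so that a model
    with no prior context can expand it."""
    parts = [
        "Expand every code in the text using the legend. "
        "Output only the expanded text.",
        "",
        "Legend:",
    ]
    parts += [f"{code} = {phrase}" for code, phrase in legend.items()]
    parts += ["", "Text:", encoded]
    return "\n".join(parts)


def expand_locally(encoded, legend):
    """Deterministic reference expansion of §n§ codes (left to right)."""
    return re.sub(r"§\d+§", lambda m: legend[m.group(0)], encoded)


legend = {"§0§": "lambda calculi", "§1§": "untyped"}
encoded = "Typed §0§ are weaker than the §1§ lambda calculus."
print(build_decompression_prompt(encoded, legend))
print(expand_locally(encoded, legend))
```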
Great stuff! Would you mind adding examples of running this on different kinds of inputs?