
@arthurafarias
Last active January 25, 2026 09:45
Encoding URI and URI Component in C++

Encode and Decode HTTP URIs and URI components in C++

What is a URI?

A Uniform Resource Identifier (URI) is a string of characters that unambiguously identifies a particular resource. To guarantee uniformity, all URIs follow a predefined set of syntax rules,[1] but also maintain extensibility through a separately defined hierarchical naming scheme (e.g. "http://").

Such identification enables interaction with representations of the resource over a network, typically the World Wide Web, using specific protocols. Schemes specifying a concrete syntax and associated protocols define each URI. The most common form of URI is the Uniform Resource Locator (URL), frequently referred to informally as a web address. More rarely seen in usage is the Uniform Resource Name (URN), which was designed to complement URLs by providing a mechanism for the identification of resources in particular namespaces.

The common parts of a URI are described below.

 foo://example.com:8042/over/there?name=ferret#nose
 \_/   \______________/\_________/ \_________/ \__/
  |           |            |            |        |
scheme     authority       path        query   fragment

A URI has five parts:

  • scheme: names the scheme, which usually identifies the protocol (e.g. http, https, ftp).
  • authority: in URLs, the part that identifies the server; it has up to three sub-parts, described below.
  • path: the path to the resource being accessed.
  • query: additional data for the server, commonly encoded as key=value pairs separated by &.
  • fragment: a reference to a location within the resource; the client handles it and does not send it to the server.
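As a rough sketch, the five parts can be pulled apart with the regular expression given in RFC 3986, Appendix B. The struct UriParts and the function splitUri are hypothetical names used only for illustration; they are not part of encode.h, and the regex only splits the string without validating or decoding it.

```cpp
#include <regex>
#include <string>

// Hypothetical helper type: the five top-level parts of a URI.
struct UriParts {
    std::string scheme, authority, path, query, fragment;
};

// Splits a URI using the regular expression from RFC 3986, Appendix B.
// It does not validate or percent-decode anything; a missing part and an
// empty part are both returned as "".
inline UriParts splitUri(const std::string &uri)
{
    static const std::regex re(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    std::smatch m;
    UriParts parts;
    if (std::regex_match(uri, m, re)) {
        parts.scheme = m[2].str();    // text before the first ':'
        parts.authority = m[4].str(); // text after "//", up to '/', '?' or '#'
        parts.path = m[5].str();
        parts.query = m[7].str();     // without the leading '?'
        parts.fragment = m[9].str();  // without the leading '#'
    }
    return parts;
}
```

For the example URI above, splitUri("foo://example.com:8042/over/there?name=ferret#nose") yields scheme "foo", authority "example.com:8042", path "/over/there", query "name=ferret" and fragment "nose".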

The authority has three parts:

john:doe@example.com:8042
\______/ \_________/ \__/
    |         |        |
userinfo    host     port
  • userinfo: optional credentials used for authentication (usually a user name; RFC 3986 deprecates the user:password form)
  • host: the domain name or IP address of the server
  • port: the TCP port on which the resource is served
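A minimal sketch of splitting an authority into these three parts, assuming the input is already a syntactically valid authority. Authority and splitAuthority are hypothetical names; IPv6 host literals in brackets (e.g. [::1]:8080) are handled only by ignoring colons inside the brackets, and nothing is validated or decoded.

```cpp
#include <string>

// Hypothetical helper type for the three authority sub-parts.
struct Authority {
    std::string userinfo, host, port;
};

// Splits "userinfo@host:port"; userinfo and port are optional.
inline Authority splitAuthority(const std::string &authority)
{
    Authority a;
    std::string rest = authority;

    // userinfo is everything before the '@' (a valid host never contains '@').
    std::size_t at = rest.rfind('@');
    if (at != std::string::npos) {
        a.userinfo = rest.substr(0, at);
        rest = rest.substr(at + 1);
    }

    // The port follows the last ':' that is not inside an IPv6 "[...]" literal.
    std::size_t colon = rest.rfind(':');
    std::size_t bracket = rest.rfind(']');
    if (colon != std::string::npos &&
        (bracket == std::string::npos || colon > bracket)) {
        a.port = rest.substr(colon + 1);
        rest = rest.substr(0, colon);
    }

    a.host = rest;
    return a;
}
```

With the example above, splitAuthority("john:doe@example.com:8042") gives userinfo "john:doe", host "example.com" and port "8042".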

What is the difference between a URI and a URI component?

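A URI is what we described above. A URI component is a single piece of a URI, such as a path segment or a query value. Encoding a component escapes every character that has a special meaning in URI syntax, so arbitrary text, even a complete URI, can be carried inside another URI.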
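The title mentions both kinds of encoding, and the difference is easiest to see side by side. JavaScript's encodeURI leaves the reserved delimiters ; / ? : @ & = + $ , # untouched, while encodeURIComponent escapes them. The sketch below mimics both for UTF-8 input; percentEncode, encodeURISketch and encodeComponentSketch are hypothetical names, not functions from encode.h.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: percent-encodes every byte that is not an ASCII
// letter or digit and does not appear in `keep`.
inline std::string percentEncode(const std::string &in, const std::string &keep)
{
    std::ostringstream oss;
    oss << std::uppercase << std::hex << std::setfill('0');
    for (unsigned char c : in) {
        bool alnum = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                     (c >= '0' && c <= '9');
        if (alnum || keep.find(static_cast<char>(c)) != std::string::npos)
            oss << static_cast<char>(c);
        else
            oss << '%' << std::setw(2) << static_cast<int>(c); // always two digits
    }
    return oss.str();
}

// Like JavaScript's encodeURI: the reserved delimiters are preserved.
inline std::string encodeURISketch(const std::string &uri)
{
    return percentEncode(uri, "-_.!~*'();/?:@&=+$,#");
}

// Like JavaScript's encodeURIComponent: the delimiters are escaped too.
inline std::string encodeComponentSketch(const std::string &component)
{
    return percentEncode(component, "-_.!~*'()");
}
```

encodeURISketch("http://example.com/a b?q=1#top") keeps the URI usable and only escapes the space ("http://example.com/a%20b?q=1#top"), whereas encodeComponentSketch escapes every delimiter and produces "http%3A%2F%2Fexample.com%2Fa%20b%3Fq%3D1%23top", which is safe to embed as a single query value.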

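Parsing a string like this

http://google.com/path?key=http://google.com

can be ambiguous, because the nested URI reuses the delimiters of the outer one. Reserved characters such as :, /, ?, @, =, & and # are therefore percent-encoded inside a component (written as % followed by two hex digits, e.g. / becomes %2F), so the value above would be sent as key=http%3A%2F%2Fgoogle.com.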
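To see the ambiguity concretely, here is a sketch of a naive query parser that splits on &. splitQuery is a hypothetical helper, and the nested URL http://google.com/?a=1&b=2 is an assumed example. Left raw, the nested & splits the value in two; percent-encoded, it stays a single key=value pair.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits a query string on '&' into "key=value" pieces.
inline std::vector<std::string> splitQuery(const std::string &query)
{
    std::vector<std::string> pairs;
    std::size_t start = 0;
    while (true) {
        std::size_t amp = query.find('&', start);
        std::size_t len = (amp == std::string::npos) ? std::string::npos
                                                     : amp - start;
        pairs.push_back(query.substr(start, len));
        if (amp == std::string::npos)
            break;
        start = amp + 1;
    }
    return pairs;
}
```

splitQuery("key=http://google.com/?a=1&b=2") returns two pieces, "key=http://google.com/?a=1" and "b=2", which is not what the sender meant. splitQuery("key=http%3A%2F%2Fgoogle.com%2F%3Fa%3D1%26b%3D2") returns one piece, whose value decodes back to the full nested URL.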

How to use this header

Copy encode.h into your include directory, add #include "encode.h" to your sources, and call encodeURIComponent and decodeURIComponent as needed.

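#ifndef ENCODE_H_
#define ENCODE_H_

#include <cctype>
#include <iomanip>
#include <sstream>
#include <string>

// Replaces every %XX escape (hex digits in either case) with the byte it
// encodes. Malformed escapes such as "%G1" or a trailing "%" are copied
// through unchanged (JavaScript's decodeURIComponent would throw instead).
inline std::string decodeURIComponent(const std::string &encoded)
{
    std::string decoded;
    decoded.reserve(encoded.size());
    for (std::size_t i = 0; i < encoded.size(); ++i)
    {
        if (encoded[i] == '%' && i + 2 < encoded.size() &&
            std::isxdigit(static_cast<unsigned char>(encoded[i + 1])) &&
            std::isxdigit(static_cast<unsigned char>(encoded[i + 2])))
        {
            decoded += static_cast<char>(std::stoi(encoded.substr(i + 1, 2), nullptr, 16));
            i += 2; // skip the two hex digits just consumed
        }
        else
        {
            decoded += encoded[i];
        }
    }
    return decoded;
}

// Percent-encodes every byte except the characters JavaScript's
// encodeURIComponent leaves alone: A-Z a-z 0-9 - _ . ! ~ * ' ( )
// Each escape is always two uppercase hex digits (e.g. '\n' -> "%0A").
inline std::string encodeURIComponent(const std::string &decoded)
{
    static const std::string unreservedMarks = "-_.!~*'()";
    std::ostringstream oss;
    oss << std::uppercase << std::hex << std::setfill('0');
    for (unsigned char c : decoded)
    {
        bool keep = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                    (c >= '0' && c <= '9') ||
                    unreservedMarks.find(static_cast<char>(c)) != std::string::npos;
        if (keep)
            oss << static_cast<char>(c);
        else
            oss << '%' << std::setw(2) << static_cast<int>(c);
    }
    return oss.str();
}

#endif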