elliasmatheus/C Concepts.md

Introduction
Compilers
Hello World
Memory Allocation with different Types
Constants vs Directives
Char Type, null terminator and memory allocation

Introduction

This isn't a 'How to program in C' tutorial. It's just a collection of topics/concepts from the C language that I found interesting while learning it (from the perspective of an already established developer). Some of the things I mention might appear obvious, but sometimes it's best to avoid ambiguity.

A program written in a language like C is not executable by itself (i.e. you can't run a C file). You need to convert the C source code into machine code (i.e. something the computer's CPU can understand).

Machine code is as low-level as you can get when interacting with a computer. So the C language is considered a 'higher-level' abstraction to save us from having to write machine code ourselves. A language like Python is an even 'higher-level' abstraction to save us from having to write C (i.e. the reference Python interpreter, CPython, is itself written in C).

In order to convert C code into machine code, we need a compiler.

Strictly speaking you also need a linker, which takes one or more compiled object files and combines them into a single executable file. Generally speaking, when we say "compile a C file", we're really combining two separate processes (compiling and linking) into the single generic term "compile".

Compilers

To compile C source code into an executable you need a compiler, and there are many options. The two most popular are LLVM's clang and GNU's gcc.

You might find there is a cc available, but typically this is aliased to an existing compiler.

Also, macOS doesn't provide a compiler by default. But if you install Xcode you'll get LLVM's suite of compilers. Below we see that we get quite a few aliases, and all of them point to the same embedded LLVM compiler:

$ gcc --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 8.0.0 (clang-800.0.38)
Target: x86_64-apple-darwin15.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
                                                                   
$ llvm-gcc --version
Apple LLVM version 8.0.0 (clang-800.0.38)
Target: x86_64-apple-darwin15.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
                                                                   
$ clang --version
Apple LLVM version 8.0.0 (clang-800.0.38)
Target: x86_64-apple-darwin15.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
                                                                   
$ cc --version
Apple LLVM version 8.0.0 (clang-800.0.38)
Target: x86_64-apple-darwin15.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

The first two aliases, gcc and llvm-gcc, are confusing and a bit misleading as they're not GNU's compiler. They're still LLVM's compiler, but in the first instance (gcc) the output shows it was configured with an additional C++ include directory (--with-gxx-include-dir).

It's worth noting that even with a plain C file all these aliases work to compile the source code into an executable. It's just that they allow you to utilise additional extensions not provided by the standard C language.

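LLVM's licensing is permissive (BSD-style for the version shown above; newer releases use the Apache 2.0 license with LLVM exceptions), meaning Apple can embed it within their own software that is not GPL-licensed. LLVM's compiler is often reported to compile faster, but in some cases it might not support all the same targets as GNU's.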

For more comparison details see http://clang.llvm.org/comparison.html

Hello World

#include <stdio.h>          // pre-processor directive to include code file at compile time
#define NAME "World"        // pre-processor directive to substitute any reference to NAME _before_ compilation

// returns an int type and takes in no arguments (void)
int main(void) {
  printf("Hello %s\n", NAME); // string literals need double quotes (single quotes are for a single char)
  return 0;                   // zero indicates no problems
}

It's important to note that the directives #include and #define are processed at the start of the compilation process. The compiler triggers this itself: one of its first steps is to run the preprocessor, which ensures the file is set up ready for the rest of the compilation.

You compile it like so:

cc hello-world.c -o hw

Now you have a macOS compatible executable:

./hw # prints the message "Hello World"

To build an executable for another OS (e.g. Linux), you can compile inside Docker or a VM running that OS.

Memory Allocation with different Types

See here for understanding RAM and bits

In C you can define an integer to be either signed or unsigned. The former means the number can be both negative and positive as well as zero.

So typically, if a number is negative, you'll prefix it with -. If the number is positive, then it is just the number. For example, -1 and 1.

You don't need to explicitly provide the signed keyword (e.g. signed int <var_name>), it's just implied.

The latter (unsigned) is an integer that can only be zero or positive. So if you need to store an integer and you know the value will never be negative, you can define it as unsigned, which documents that intent and lets the compiler rely on it.

So although the underlying memory allocation is the same for signed or unsigned, the range of values differs: unsigned can store a maximum value roughly twice as large as signed, because half of signed's bit patterns are used for negative numbers. For example, with an 8-bit type, signed char typically covers -128 to 127 while unsigned char covers 0 to 255.

Constants vs Directives

We saw in the above 'Hello World' example the use of the directive #define which allowed us to use a single identifier (NAME in this case) throughout our program. The benefit is that we can change the value once and have it updated everywhere.

But do not get this confused with a variable. It is not. This is just a sequence of characters that is blindly replaced at the pre-processing stage. The value assigned to NAME is substituted into your program regardless of whether the result is valid code, which means it can cause compiler errors that are hard to trace back to the macro.

On the other hand you can define a proper constant like so:

#include <stdio.h>

int main(void) {
  const char NAME[] = "World";
  printf("Hello %s", NAME);
  return 0;
}

What this gives you is a variable that has an actual type assigned to it, meaning the compiler can help you identify an incorrect value much more easily than with the #define directive.

Char Type, null terminator and memory allocation

The char type is used when storing characters such as A.

But it's important to realise that a string is not stored in a single char. It's stored in an array of char, even if the string you provide is just one character.

This is because of what's known as the null terminator.

Consider the following code:

char my_string[2] = "a";

The reason we specify a length of 2 is because the array my_string looks like this in memory:

['a', '\0'] // yes it has two elements

The last element is known as the null terminator. Because it marks the end of the string, code can start at the location in memory where the string is stored and step through memory one char at a time until it reaches the null terminator.