A simple C "hello world" brings up many possible learning opportunities. To fully comprehend the C "Hello World", one must understand all of the pieces which make it work :^)
Here's an exploration of C using "Hello World" as a starting point!
#include <stdio.h>   /* declares printf */
int main()
{
    printf("Hello World\n");   /* print the greeting and a newline */
    return 0;                  /* report success to the system */
}
You must first understand that main() is the entry point to your program. When you run the program, your computer looks to the contents of main for instructions to execute. In main there is a line that says printf("Hello World\n"); and this line is a statement. The semicolon ; is a "statement terminator" and acts as a sequence point. You can think of it like ending an English sentence with a period. This line is a "call" to a "function". Functions are pieces of code which can be executed by invocation: "calling" the function by appending parentheses () after its name, and "passing" arguments (data for it to use) inside of the parens. "Parameters" are the names the function gives to the values it receives, and multiple arguments are separated by a comma , delimiter. On many systems, a call means your program will push the data that is to be passed onto the "stack" (a special region of your computer's memory) or place it in CPU registers, jump to a piece of code, and begin executing it. That code can then read the data back to use it. Code in a function is executed from top to bottom, in order. When the computer begins executing main(), it enters the function's body (designated by the {} braces), and in this case its first instruction is to make a call to printf.
But there is a perplexing question to ask: what is this printf() function and where is its code? You didn't write it... The answer is that the code was written by someone else, and is stored in what's known as a "library". To use that code, you must first include the library's header. To understand how this "black magic" works, you need to understand the preprocessor and #include, which effectively copy-pastes a header (a plain text file) into your source. The header declares functions and their parameters so that you may call them. This will begin a down-the-rabbit-hole adventure through the compiler toolchain, including the preprocessor, compiler, and linker, and how they work with headers and pre-compiled libraries to allow you to call a function that you didn't write. It's also a good opportunity to talk about what a standard is, how the standard library is distinct from the core language (though the same standard describes both), and how compiler vendors are free to implement things differently so long as they conform to the language and library specifications.
So, printf() is the function that's being called. But what's this "Hello World\n" argument that's being passed to it? Now you need to understand C-style "strings". Looking up the documentation for that function shows that the first argument it expects is a const char * (a pointer to char, where the const means printf promises not to modify what it points at). What's that? First, you need to know that a variable can be seen as a name for a memory address, at which data can be stored. Variables are defined as a data-type followed by a name (ex. char MyLetter;). Variables can be assigned a value (data to store at that address) by using the = operator, and their address can be retrieved by putting an & in front of the variable name. A pointer is a type of variable whose data is the memory address of another variable. It is defined by adding an asterisk * after the data-type, to declare that the pointer points to said data-type (ex. char* MyPtr = &MyLetter;). "Dereferencing" a pointer is the operation we do to access the memory it's pointing at. With a char variable, we can store a single character like MyLetter = 'C'; (assigning the character 'C' to the variable MyLetter). Note that the apostrophes (aka single-quotes) ' specify that the enclosed character is to be interpreted as char data instead of the name of another variable. To dereference a pointer, we use the variable name but prepend an asterisk *. Without the asterisk, the pointer's name gives the address it holds. With the asterisk, we're saying "I'm not talking about this pointer - I'm talking about what it points at!". As such, *MyPtr = 'W'; will assign the W character to the memory MyLetter represents.
printf expects an address that points to a char, but here we're passing a sentence contained within double-quotes " with some weird \n at the end. I said we needed to understand strings, so what's all this "char pointer" business? A C-style string is defined as a run of characters in memory that ends with a null terminator... What the heck is that? To make it easier to understand, let's have a look at arrays.
Storing a sentence in single variables is obviously a silly idea. Instead, we'll use an array. An array is a region of contiguous memory, declared as a variable name followed by square brackets [] containing an integer which defines the length of the array. Each unit (element) is the size of the data-type the array contains. For example, char a[13]; declares a char array with 13 elements, each one big enough to store a character and individually addressable. To "index" an array (access a single element), you write its name followed by square brackets [] containing the integer index of the element you want to access. Arrays begin at the zeroth element [0], so our a[13] array has 0 to 12 as valid indices. For example, a[0] = 'H'; a[1] = 'e'; a[2] = 'l'; and so on, spelling out Hello World, which fills indices 0 through 10. In Hello World, we have an extra \n in the string. The \ is an "escape character". It says that the following character isn't to be interpreted as a regular character. Instead, the pair is treated as a special "escape code" which has some pre-defined meaning. \n is the code for the newline character, which is what gets inserted when you hit your enter/carriage-return key in a text editor: your caret goes to the next line. So, the element at index 11 of our array is the '\n' character. Next, the final element, at index 12. A string is defined as being "null-terminated". That means the string ends with a null character, which tells whatever's reading it that it has reached the end of the string and should stop. This null character can be written using the null-terminator escape code: '\0'. Without this null terminator, whatever is reading our string will run off the end of the valid data and begin reading junk. This is undefined behavior and can crash your program. This kind of bug is called a buffer over-read, a close relative of the "buffer overflow". Bad bad bad.
Now, we could pass our new array to printf like this: printf(a); or alternatively like this: printf(&a[0]); and both are valid and mean the same thing. In the first form, the array is implicitly converted to a char* for printf to accept. This is often called array "decay", and it isn't really a type-cast. A type-cast is an explicit conversion that tells the compiler to treat one data-type as another, and is written in C by prefixing the expression with parentheses containing the data-type you want to convert to. In this case, it's acting as if you typed printf( (char*)a ); because there is an implicit conversion from char[] to char* going on that it's not showing you. This is easy for it to do because, in most expressions, an array's name is converted to a pointer to its first (zeroth) element. (An array is not literally a pointer, though: sizeof a gives the size of the whole array, for instance.) Indexing the array with square brackets [] and a number is doing what's known as "pointer arithmetic". It uses a base address (that of the zeroth element) and an offset (the index), which represents how many units of the element size to move past. For example, using made-up numbers: if 120 is the address of a, and each element of a's data-type is 4 units of memory, indexing a[2] would dereference the data stored at address 128. This kind of pointer arithmetic can be done manually and doesn't require array indexing syntax. If we made char* b = &a[0]; and dereferenced *(b+2), we'd be saying "add to the address stored in b 2 times the width of b's pointed-to type, and dereference it". With the made-up 4-unit elements, that would also arrive at 128. With a real char array, where each element is 1 byte wide, it would land on 122. Note the parens, which explicitly dictate the order of operations, ensuring the addition happens before the dereference. They are required here: without them, *b+2 would dereference b first and then add 2 to the character it found.
That's awesome and all, but what about the simple double-quotes in printf? There isn't even a \0 at the end! Well, the quotes make a "string literal". This string is stored, in whole, in your program's executable file. It's then loaded into memory, and can be accessed by a pointer to its first character. This data lives for the entire run of the program and must not be altered: trying to modify a string literal is undefined behavior. (This would be a segue into the const, static, restrict, and volatile keywords, but I'm lazy.) As the double-quotes are designed to create a string, they include the null-terminator at the end for you. Where the string appears in code, you could imagine it being a char*, that is to say, the address of its first (zeroth) character. So, you could write const char* h = "Hello World\n"; and then printf(h); and it would work as you expect. So, when we say printf("Hello World\n"); we are in fact passing a char* as it expects. However, that char* isn't stored anywhere... It's used as an argument to printf, and then lost forever. The string itself stays in memory, but we have no way to refer to it again unless we stored its address, like we did with our const char* h.
Finally, we have return 0; at the end of our main. Functions can "return" a value of a given data-type. The int main() declares a function named main which returns an integer, as per the specification for the entry-point function. The return statement ends the function. Returning from main terminates the process with a return code of 0, essentially reporting to the system that everything went well and the process terminated normally.
Obviously, there's a lot more stuff to dive into, and this is a very small part of C. But it gives you an example of how much there is to learn as a beginner - even from the simplest of programs.