Intermediate C Programming Notes

apaprocki · on July 20, 2014

Solid resource, but definitely a little dated (i.e. no C99). The 'Returning Arrays' part talks about this:

    #include <stdio.h>

    char *itoa(int n, char buf[]) {
      sprintf(buf, "%d", n);
      return buf;
    }

... but you'd really want to take a length parameter and call snprintf to prevent buffer overflows. This pre-C99 style of C is one of the reasons C has a bad reputation for memory hazards.
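A minimal sketch of what that could look like, assuming the caller passes the buffer size. The name `itoa_n` and the choice to return NULL on truncation are my own, not from the book:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical bounded variant: writes at most len bytes (including the
 * terminating NUL) and returns NULL if the output was truncated. */
char *itoa_n(int n, char *buf, size_t len)
{
    int written = snprintf(buf, len, "%d", n);

    /* snprintf returns the length it wanted to write (excluding the NUL),
     * or a negative value on an encoding error. */
    if (written < 0 || (size_t)written >= len)
        return NULL;
    return buf;
}
```

Truncated output is still NUL-terminated as long as len > 0, so the caller can decide whether a partial result is acceptable.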

edit: Anyone interested in this should probably read through something newer, like http://c.learncodethehardway.org/book/

mtdewcmu · on July 20, 2014

It's converting an int to decimal, so there's an upper bound on how long the string will need to be. You could argue that the length parameter would be superfluous.
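To make that bound concrete, one common trick (a sketch, not from the book) derives it from the width of `int`: every 3 bits of value need at most one decimal digit, since 2^3 < 10, plus one byte for the sign and one for the terminator. The macro name `INT_DEC_BUFSIZE` and the function `itoa_bounded` are hypothetical:

```c
#include <limits.h>
#include <stdio.h>

/* Upper bound on the bytes "%d" can produce for any int:
 * (value bits / 3) + 1 digits, + 1 for '-', + 1 for the NUL terminator.
 * Slight overestimate: 13 for a 32-bit int, where 12 is the exact need. */
#define INT_DEC_BUFSIZE ((sizeof(int) * CHAR_BIT - 1) / 3 + 3)

/* C99 "static" in the parameter says buf must have at least
 * INT_DEC_BUFSIZE elements; some compilers can warn on violations. */
char *itoa_bounded(int n, char buf[static INT_DEC_BUFSIZE])
{
    snprintf(buf, INT_DEC_BUFSIZE, "%d", n);
    return buf;
}
```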

Someone · on July 20, 2014

Yeah, you can assume everybody will pass a buffer of length at least ten (2^31 has 10 digits).

Oops, make that eleven, for the zero terminator.

Oops, make that twelve, for the sign.

Oops, int can be 64 bits, better make that twenty-one.

Oops, sprintf honors locale. There may be thousands separators in there. Make it twenty-seven (or twenty-eight? I lost count. Also, looking at lconv, decimal points and thousands separators need not be at every third digit and may be more than one character long)

Yeah, I am sure that length parameter is superfluous.
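If you'd rather not reason about bounds at all, C99's snprintf can measure the output first: called with a NULL buffer and a size of 0, it returns the number of characters it would have written. (For what it's worth, ISO C's plain `%d` never inserts thousands separators; grouping only happens with the POSIX `'` flag, as in `%'d`. Measuring first sidesteps the question either way.) A sketch, where `itoa_alloc` is a hypothetical name and the caller must free the result:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd decimal string, or NULL on error.
 * The caller owns the result and must free() it. */
char *itoa_alloc(int n)
{
    /* First pass: measure. The return value excludes the NUL. */
    int len = snprintf(NULL, 0, "%d", n);
    if (len < 0)
        return NULL;

    char *s = malloc((size_t)len + 1);
    if (s == NULL)
        return NULL;

    /* Second pass: format into a buffer that is exactly big enough. */
    snprintf(s, (size_t)len + 1, "%d", n);
    return s;
}
```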

jabits · on July 20, 2014

Thanks for the link!

advocaat23 · on July 20, 2014

I think it's always weird how multi-dimensional arrays are implemented in most C programming books. Do you guys really use a pointer-to-pointer approach to represent them? Because nowadays I always allocate a flat array and just calculate the indices. E.g.

  struct dmatrix {
    size_t rows, cols;
    double *data;
  };

and then a simple

  double dmatrix_get(struct dmatrix *m, size_t r, size_t c) {
    return m->data[r * m->cols + c];
  }

is all you need? In particular, I find explicit array sizes in function signatures very contrived, and I have never used them in serious code. Normally it's just a pointer and a length argument, similar to, e.g., the POSIX read/write functions.
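Fleshed out a little, the flat-array approach might look like this. The `dmatrix_create`/`dmatrix_free`/`dmatrix_set` helpers are hypothetical additions to the struct and getter above, and the overflow check on `rows * cols` is my own addition:

```c
#include <stdint.h>
#include <stdlib.h>

struct dmatrix {
    size_t rows, cols;
    double *data;   /* rows * cols elements, row-major */
};

/* Allocate a rows x cols matrix with all-bits-zero storage
 * (0.0 on IEEE 754 platforms); returns 0 on success. */
int dmatrix_create(struct dmatrix *m, size_t rows, size_t cols)
{
    /* Guard against rows * cols overflowing size_t. */
    if (cols != 0 && rows > SIZE_MAX / cols)
        return -1;
    m->rows = rows;
    m->cols = cols;
    m->data = calloc(rows * cols, sizeof *m->data);
    return (m->data == NULL && rows * cols != 0) ? -1 : 0;
}

void dmatrix_free(struct dmatrix *m)
{
    free(m->data);
    m->data = NULL;
    m->rows = m->cols = 0;
}

/* Row-major indexing: element (r, c) lives at r * cols + c. */
double dmatrix_get(const struct dmatrix *m, size_t r, size_t c)
{
    return m->data[r * m->cols + c];
}

void dmatrix_set(struct dmatrix *m, size_t r, size_t c, double v)
{
    m->data[r * m->cols + c] = v;
}
```

One allocation, one free, and the whole matrix is contiguous, which is what makes the cache and vectorization reasoning in the reply below straightforward.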

silentvoice · on July 20, 2014

I actually believe this is the preferable approach for performance portability reasons. It's easier to reason about cache/vectorization performance across architectures and compilers when dealing with a single pointer and calculated indices. If you absolutely have to have a pointer to, say, the first element of a row (resp. column if using column major), you can always do this:

    double *dmatrix_get_row(struct dmatrix *m, size_t r) {
        return &m->data[r * m->cols];
    }

A pointer-to-pointer approach I think is most suitable for an actual array-of-arrays, where the length of each subarray can be different. Then you store an array of pointers to the first element of each subarray, and if the data is contiguous you can use that to calculate lengths as well.
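A sketch of that last idea, assuming all rows are packed back to back in one buffer: keeping one extra pointer past the last row lets you recover each row's length by pointer subtraction. The `jagged` type and its functions are illustrative names, not from any library:

```c
#include <stddef.h>
#include <stdlib.h>

struct jagged {
    size_t nrows;
    int *data;    /* all rows packed contiguously */
    int **row;    /* nrows + 1 pointers; row[nrows] points one past the end */
};

/* Build from an array of row lengths; returns 0 on success.
 * (No overflow checks on the total, for brevity.) */
int jagged_create(struct jagged *j, const size_t *lens, size_t nrows)
{
    size_t total = 0;
    for (size_t i = 0; i < nrows; i++)
        total += lens[i];

    j->nrows = nrows;
    j->data = malloc((total ? total : 1) * sizeof *j->data);
    j->row = malloc((nrows + 1) * sizeof *j->row);
    if (j->data == NULL || j->row == NULL) {
        free(j->data);
        free(j->row);
        return -1;
    }

    /* Each row starts where the previous one ended. */
    j->row[0] = j->data;
    for (size_t i = 0; i < nrows; i++)
        j->row[i + 1] = j->row[i] + lens[i];
    return 0;
}

/* Length of row i, computed from adjacent row pointers. */
size_t jagged_len(const struct jagged *j, size_t i)
{
    return (size_t)(j->row[i + 1] - j->row[i]);
}

void jagged_free(struct jagged *j)
{
    free(j->data);
    free(j->row);
}
```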

mgraczyk · on July 20, 2014

I did not like the "Returning Arrays" section. There are perfectly reasonable ways of returning small arrays in C.

    #include <stdio.h>

    typedef struct {
        char c_str[25];
    } number_string_t;

    number_string_t
    itoa(int i) {
        number_string_t retval = {0};
        int written = snprintf(&retval.c_str[0], sizeof(retval.c_str), "%d", i);
        // Error checking.....

        return retval;
    }

nly · on July 20, 2014

"Error checking" seems pointless when you've got no means to return an error code. You shouldn't need it anyway, as converting any integer to base 10 in a preallocated buffer should be a no-fail operation.

apaprocki · on July 20, 2014

There are performance reasons why you may not want to do so, though. GCC has a warning option (-Waggregate-return) to detect this.

DSMan195276 · on July 20, 2014

I'm curious about the performance issues. IIRC, the common x86 C calling conventions return structs by requiring the caller to allocate memory for the struct (on the stack or otherwise) and then pass a pointer to it. If that happens, it is no different than if you had simply allocated the array on the stack and passed a pointer to it to the function.

Or do the performance issues only arise on architectures that return aggregates differently?
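To make the comparison concrete, here are the two styles side by side. Whether the first actually compiles down to the second (a hidden pointer to caller-allocated storage) depends on the ABI and the struct size; the function names are illustrative:

```c
#include <stdio.h>

typedef struct {
    char c_str[25];
} number_string_t;

/* Style 1: return the struct by value. Depending on the ABI, the caller
 * may reserve the space and pass its address as a hidden argument. */
number_string_t num_to_string(int i)
{
    number_string_t r;
    snprintf(r.c_str, sizeof r.c_str, "%d", i);
    return r;
}

/* Style 2: the explicit equivalent, where the caller provides the storage. */
void num_to_string_out(int i, number_string_t *out)
{
    snprintf(out->c_str, sizeof out->c_str, "%d", i);
}
```

Comparing the assembly for both (e.g. with `gcc -O2 -S`) on a given target is the most direct way to see what that platform does.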

apaprocki · on July 20, 2014

If you know your target and know your compiler very well (and make sure that nothing changes the assumptions made over time) then I suppose you can take advantage of that. C++ also has return value optimization but there is no way for the author of the code to know if the various platforms/compilers will actually optimize it, so blindly relying on it can result in poor performance.

And yes, different platforms handle it differently. IIRC SysV PowerPC only supports up to 64 bits in r3,r4 but AIX POWER forces all returned structures into memory. I think SPARC ABI prior to V9 did not use registers either.

fit2rule · on July 20, 2014

>I'm curious on the performance issues.

Whether you can put your return value in a register, or whether you need to housekeep the stack. That's a performance difference.

tpush · on July 20, 2014

Note that, according to the SysV AMD64 calling convention, structs with integer/pointer members up to 128 bits (including padding) are passed and returned in registers. 64-bit ARM is similar in this respect, I believe.

DSMan195276 · on July 20, 2014

This looks like a fairly good read for C programming, but it does seem to be showing its age and contains some information that is now outdated. The copyright is 1996 to 1999, so from what I've looked at this is all C89, with no C99, which is what you'll probably see at this point if you look at C code.

I admit to not having read the entire thing, but I've looked through most of it. A few things that stood out to me:

1. They dance around what a multidimensional array in C actually is, and the line "C does not have true multidimensional arrays." is simply false. I'd let it go, but there's a disconnect when they go from sections 23 and 23.1 to section 23.2. Section 23.2 is most definitely not a multidimensional array. Allocating a real multidimensional array is much simpler: just malloc a block of memory into a pointer to an array:

    int (*a)[3] = malloc(2 * 3 * sizeof(int));

This allocated array works just like any other multidimensional array in C and can be passed as such, and it's much easier to use than the pointer-to-pointer approach. Yet the pointer-to-pointer approach is the only one shown, as though it were the only option, even though it's not equivalent.
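A slightly fuller sketch, assuming C99 so the column count can be a runtime value (C11 made VLAs optional, though GCC and Clang support them). `fill` and `demo` are hypothetical helpers:

```c
#include <stdlib.h>

/* C99 variably modified parameter: a points to arrays of cols ints,
 * so the compiler does the r * cols + c arithmetic for a[r][c]. */
void fill(size_t rows, size_t cols, int a[rows][cols])
{
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            a[r][c] = (int)(r * cols + c);
}

/* Allocates a rows x cols array in one block, fills it, and returns
 * element [r][c] (or -1 if allocation fails). */
int demo(size_t rows, size_t cols, size_t r, size_t c)
{
    int (*a)[cols] = malloc(rows * cols * sizeof(int));
    if (a == NULL)
        return -1;
    fill(rows, cols, a);
    int v = a[r][c];
    free(a);   /* one allocation, one free */
    return v;
}
```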

2. In section 18.1.7, type qualifiers, they say 'const' is fairly rare to see. That may have been true when this was written, but in properly written code today it should be fairly common, mostly for functions that receive pointers to data they shouldn't change and for string literals that shouldn't be modified. Not using const correctly is a source of lots of annoying compilation warnings and runtime errors.
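A small illustration of the kind of const use meant here (`count_char` is just an example name): the qualifier documents that the function only reads through the pointer, and the compiler rejects attempts to write through it:

```c
#include <stddef.h>

/* const char *: count_char promises not to modify the string. */
size_t count_char(const char *s, char ch)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == ch)
            n++;
    /* *s = 'x';  would not compile: s points to const char */
    return n;
}

/* String literals must not be modified, so point at them through const. */
const char *greeting = "hello, world";
```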

3. The very end of section 20.1 notes that there is no way to define a variable argument macro. That was true at the time of writing (kinda), but it's not true with C99.
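For reference, C99 variadic macros use `...` in the parameter list and `__VA_ARGS__` in the body. The macro names below are made up for illustration:

```c
#include <stdio.h>

/* Everything passed to EPRINTF is forwarded to fprintf after stderr. */
#define EPRINTF(...) fprintf(stderr, __VA_ARGS__)

/* Named parameters can precede the variadic part. buf must be an actual
 * array (not a pointer) for sizeof(buf) to give its size. */
#define FORMAT_INTO(buf, ...) snprintf((buf), sizeof(buf), __VA_ARGS__)
```

Keeping the format string inside `...` means `FORMAT_INTO(b, "hi")` works too, since C99 requires at least one argument for the variadic part.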

4. In section 18.3.1, the switch statement, everything's perfectly fine except where it says it's common to use a list of #defines for the values being switched on. It's much better to use an enum type for that: it documents your code better, acts basically the same way, gets you better compiler warnings, and in most cases lets the compiler assign the numbers for you, so you don't risk conflicts when adding and removing options. For whatever reason, I couldn't find any info on enums in the entire read, which is a huge red flag.
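Something like the following is what I mean. With an enum, GCC's and Clang's `-Wswitch` (enabled by `-Wall`) warns when a switch over the enum misses one of its values and has no `default`. The names are illustrative:

```c
/* The compiler assigns 0, 1, 2 automatically, so adding a value can't collide. */
enum shape { SHAPE_CIRCLE, SHAPE_SQUARE, SHAPE_TRIANGLE };

const char *shape_name(enum shape s)
{
    /* No default: with -Wall, forgetting a case triggers a -Wswitch warning. */
    switch (s) {
    case SHAPE_CIRCLE:   return "circle";
    case SHAPE_SQUARE:   return "square";
    case SHAPE_TRIANGLE: return "triangle";
    }
    return "unknown";   /* only reached for out-of-range values */
}
```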

5. A lot of their function pointer information involves pointers defined as:

    int (*p)();

The huge issue with that is that they declared it with an unknown parameter list, and the author did so intentionally. There are basically zero times when you actually want to do that. In most cases you should either include the argument types, such as:

    int (*p)(int);

Or put void to indicate no arguments:

    int (*p)(void);
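With a full prototype, the compiler checks every call made through the pointer, and a typedef keeps the declarations readable. The function names here are just examples:

```c
#include <stddef.h>

static int twice(int x)  { return 2 * x; }
static int square(int x) { return x * x; }

/* Pointer to a function taking one int and returning int. */
typedef int (*int_op)(int);

/* Apply op to each element in place. */
void apply_all(int *v, size_t n, int_op op)
{
    for (size_t i = 0; i < n; i++)
        v[i] = op(v[i]);
}

/* With int (*p)() (before C23), a call like p("oops") needs no diagnostic;
 * through an int_op it must be diagnosed, since the parameter type is known. */
```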

canadev · on July 20, 2014

Could you recommend some modern books on C programming?

pwaring · on July 20, 2014

There's 21st Century C from O'Reilly:

http://shop.oreilly.com/product/0636920025108.do

It has mixed reviews - my impression from reading it was that if you agree with the author's choice of tools, it's a good read. There's supposedly a new edition coming out in September.

There's also Learn C the Hard Way:

http://c.learncodethehardway.org/book/

I've struggled to find any other modern books on C programming.

clarry · on July 20, 2014

> There's also Learn C the Hard Way

Mixed reviews applies to this book as well. For how often I see it recommended on the net these days, I am disappointed at how many annoying little lies it has in it; it's littered with them. It's also very opinionated (and in my opinion some of these opinions are bullshit) and again it likes to impose the author's choice of tools on you, to the point where he doesn't really even bother explaining how to live without them.

It could do a better job of introducing standard C terminology instead of only presenting the ideas the author came up with, and which often conflict with the concepts as defined by C. It could also do a better job of explaining C's pitfalls and UB instead of just making the reader break certain things and see how his Linux/OS X/Valgrind responds.

EDIT n+1: I just skimmed again through half of the book. Zed actually uses bstring in one of his examples. I don't know whether that should be taken as an implicit endorsement or not, but shame on him. Also, his ideas about secure string handling make me cringe. Also, his "safer" string copying function is not correct. In fact the entire book is littered with lies about strings which no doubt contribute to the confusion people have about them. I'm sure he's well intentioned and he makes good points but for someone wanting to learn how to do C cleanly and securely, I just cannot recommend this book.
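For contrast, a sketch of a bounded copy with strlcpy-like semantics: it always NUL-terminates when the size is nonzero, and it returns the source length so the caller can detect truncation. `copy_bounded` is a hypothetical name; where the BSD `strlcpy` is available, it does this job:

```c
#include <stddef.h>

/* Copies at most size - 1 bytes of src into dst and NUL-terminates
 * (if size > 0). Returns strlen(src); a result >= size means truncation. */
size_t copy_bounded(char *dst, const char *src, size_t size)
{
    size_t i = 0;

    if (size > 0) {
        for (; i < size - 1 && src[i] != '\0'; i++)
            dst[i] = src[i];
        dst[i] = '\0';
    }
    /* Keep counting so the caller learns the full source length. */
    while (src[i] != '\0')
        i++;
    return i;
}
```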

cremno · on July 20, 2014

Besides being a terrible book, it also contains many small mistakes. My favorite one is checking the return value of memcpy() for non-NULL.
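For anyone wondering why that's a mistake: memcpy is specified to return its destination argument, so it carries no error information and can't report failure. The check that matters is on the allocation. A sketch, with `dup_bytes` as a hypothetical helper:

```c
#include <stdlib.h>
#include <string.h>

/* Duplicate n bytes of src into a new buffer, or return NULL. */
void *dup_bytes(const void *src, size_t n)
{
    void *dst = malloc(n ? n : 1);
    if (dst == NULL)          /* this is the check that can actually fail */
        return NULL;
    memcpy(dst, src, n);      /* always returns dst; nothing to check */
    return dst;
}
```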

http://c.learncodethehardway.org/book/ex44.html

cremno · on July 20, 2014

I've noticed the first sentence is very weird and doesn't make much sense. Also that isn't just a small mistake. But I can't edit my post anymore. I'm sorry.

I originally wanted to write more and point out some other important things. For example, “long long” never gets mentioned! This is not a good book to learn C. I believe it could be an adequate (or even good) one to train existing C skills, but it isn't even that at the moment.

Okay, it isn't a final version. We have to wait until next year for that, but LCTHW gets too much positive attention for being just another less-than-mediocre tutorial book on the Internet.

clarry · on July 20, 2014

Hilarious, I didn't pick that one up since it's in the later chapters I skipped.

The bstring thing and his safer copy function are still my favorites for the irony of it. He sounds so condescending when he attacks C strings and an old K&R example. And then he goes on to show how to do it wrong, without really even solving the problem he's trying to solve. And he keeps repeating the lie that C strings are nothing but an array of chars.

aninteger · on July 20, 2014

Ok, I'll bite. What is wrong with bstring and what do you think is a better replacement? I tried to look up criticism on Google about bstrlib and found nothing. I've used bstring in several applications without problem. I have started using antirez's sds though for newer applications.

clarry · on July 20, 2014

https://news.ycombinator.com/item?id=7192044

I don't know what'd be a better replacement; it depends so much on the use case. I'd say there is no one-size-fits-all solution to string handling in C. For most code which occasionally deals with strings but isn't really focusing on text, the usual standard library functions and BSD extensions are just fine. I mean malloc, free, strdup, strlcat, strlcpy, strchr, strstr, snprintf, and so on.

For my code editor, I wrote my own dynamic text buffer with the goal of supporting large files with binary data in it. This is highly application specific code and wouldn't make a good general purpose string library.

I skimmed through the bstring code long ago, and I recall seeing quite a bit of questionable or hairy code. But abusing undefined behavior for security purposes was the final straw, so I gave up and decided I do not want to use or endorse that library. Also, using ints for string lengths is just stupid if your code ever has to interact with the outside world. Why does bstring do this?

EDIT: Actually, let me just quote the (in)security statement here:

> Bstrlib is, by design, impervious to memory size overflow attacks. The reason is it is resiliant to length overflows is that bstring lengths are bounded above by INT_MAX, instead of ~(size_t)0. So length addition overflows cause a wrap around of the integer value making them negative causing balloc() to fail before an erroneous operation can occurr.
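The problem with that reasoning is that overflow of a signed int is undefined behavior in C, so there is no guaranteed wraparound to a negative value for balloc() to catch, and an optimizing compiler may assume it never happens. A sketch of checking before adding instead (`add_len` and `add_size` are hypothetical names):

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Adds two non-negative int lengths; returns -1 instead of overflowing. */
int add_len(int a, int b, int *sum)
{
    if (a < 0 || b < 0 || a > INT_MAX - b)
        return -1;            /* would exceed INT_MAX: reject up front */
    *sum = a + b;
    return 0;
}

/* Same idea for size_t, where wraparound is defined but still wrong. */
int add_size(size_t a, size_t b, size_t *sum)
{
    if (a > SIZE_MAX - b)
        return -1;
    *sum = a + b;
    return 0;
}
```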

clarry · on July 20, 2014

Oh crap. Reading further, he does indeed recommend bstring. It wasn't just a random library in a one-off example. Some of the examples also show he doesn't care about arithmetic overflows much. I didn't find him talking about these anywhere much, either (though I skipped a few chapters).

ch_123 · on July 20, 2014

I read through some of "C Programming, a Modern Approach 2nd Edition" by K.N. King. The parts I read were good.

ejr · on July 20, 2014

This looks to be very good. One tiny, and I do mean tiny, wish is that the code followed BSD/KNF style. I know that's all down to preference anyway, but it does make things easier to absorb.