
Unfortunately, much of the pain with C surrounds dealing with strings. It’s been a bit of a theme on Hacker News for the past few days, but it’s actually a pretty good spotlight on something I feel is not always appreciated - strings in C are actually hard, and even the safest standard functions, like strlcpy and strlcat, are still only good if truncation is a safe option in a given circumstance (it isn’t always).

(~~Technically~~ Optionally, C11 has strcpy_s and strcat_s which fail explicitly on truncation. So if C11 is acceptable for you, that might be a reasonable option, provided you always handle the failure case. Apparently, though, it is not usually implemented outside of the Microsoft CRT.)
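
For illustration, the truncation check looks something like this (a minimal sketch; copy_name is a made-up helper, and strlcpy itself is a BSD extension, available on glibc only since 2.38 or via libbsd):

  #include <string.h>   /* strlcpy: BSD libc; on glibc 2.38+ or via libbsd */

  /* Copy src into a fixed-size buffer, treating truncation as an error. */
  int copy_name(char *dst, size_t dstsize, const char *src)
  {
      if (strlcpy(dst, src, dstsize) >= dstsize)
          return -1;    /* src did not fit; the caller decides how to handle it */
      return 0;
  }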

edit: Updated notes regarding C11.




Whenever I review C code, I first look at the string function uses. Almost always I'll find a bug. It's usually an off by one error dealing with the terminating 0. It's also always a tangled bit of code, and slow due to repeatedly running strlen.

But strings in BASIC are so simple. They just work. I decided when designing D that it wouldn't be good unless string handling was as easy as in BASIC.


In the case of C, it's a design decision Dennis Ritchie made that came down to the particular instruction set of the PDP-11, which could efficiently process zero-terminated strings.

So a severely memory limited architecture of the 70s led to blending of data with control - which is never a safe idea, see naked SQL. We now perpetuate this madness of nul-terminated strings on architectures that have 4 to 6 orders of magnitude more memory than the original PDP-11.

It's also highly inefficient, because the length of a string is a fundamental property that must be recomputed frequently if not cached.

Bottom line, unless you work on non-security sensitive embedded systems like microwave ovens or mice, there is absolutely no place for nul-terminated strings in today's computing.


Mr. Bright, I just want to thank you for creating D.

It is by far my favorite language, because it is filled with elegant solutions to hard language problems.

As a perfectionist, there are very few things I would change about it. People rave about Rust these days, but I rave about D in return.

Just wanted to say thanks (and that I bought a D hoodie).


Your words have just convinced me to try out D. Maybe some good will come out of it :)


Thanks for the kind words!


Hello Walter! All things considered, you are probably the best person to ask for tips on string handling in C.

Would you mind sharing the things that you look for, from the obvious to the subtle? I would love to see some rejected pull requests if possible. If I were writing C under your direction, what would you drill into me?

Thank you, it is an honour to address you here.


1. whenever you see strncpy(), there's a bug in the code. Nobody remembers if the `n` includes the terminating 0 or not. I implemented it, and I never remember. I always have to look it up. Don't trust your memory on it. Same goes for all the `n` string functions.

2. be aware of all the C string functions that do strlen. Only do strlen once. Then use memcmp, memcpy, memchr.

3. assign strlen result to a const variable.

4. for performance, use a temporary array on the stack rather than malloc. Have it fail over to malloc if it isn't long enough. You'd be amazed how this speeds things up. Use a shorter array length for debug builds, so your tests are sure to trip the fail over (see the sketch after this list).

5. remove all hard-coded string length maximums

6. make sure size_t is used for all string lengths

7. disassemble the string handling code you're proud of after it compiles. You'll learn a lot about how to write better string code that way

8. I've found subtle errors in online documentation of the string functions. Never use them. Use the C Standard. Especially for the `n` string functions.

9. If you're doing 32 bit code and dealing with user input, be wary of length overflows.

10. check again to ensure your created string is 0 terminated

11. check again to ensure adding the terminating 0 does not overflow the buffer

12. don't forget to check for a NULL pointer

13. ensure all variables are initialized before using them

14. minimize the lifetime of each variable

15. do not recycle variables - give each temporary its own name. Try to make these temporaries const, refactor if that'll enable it to be const.

16. watch out for `char` being either signed or unsigned

17. I structure loops so the condition is <. Avoid using <=, as odds are high that will result in a fencepost error

That's all off the top of my head. Hope it's useful for you!
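
To make a few of those concrete, here is a minimal sketch of points 2-4 and 9-12 applied to joining two strings (the function name and the SCRATCH_LEN values are arbitrary, chosen just for the example):

  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  #ifdef NDEBUG
  #define SCRATCH_LEN 256   /* release: big enough for the common case (point 4) */
  #else
  #define SCRATCH_LEN 4     /* debug: tiny on purpose, so tests hit the malloc path */
  #endif

  void print_joined(const char *a, const char *b)
  {
      const size_t alen = strlen(a);          /* strlen once, into const size_t (points 2, 3, 6) */
      const size_t blen = strlen(b);

      if (blen >= SIZE_MAX - alen)            /* point 9: guard against length overflow */
          return;
      const size_t total = alen + blen + 1;   /* +1 for the terminating 0 (point 11) */

      char scratch[SCRATCH_LEN];
      char *buf = total <= sizeof scratch ? scratch : malloc(total);
      if (!buf)                               /* point 12: NULL check */
          return;

      memcpy(buf, a, alen);                   /* no further strlen calls (point 2) */
      memcpy(buf + alen, b, blen);
      buf[alen + blen] = '\0';                /* point 10: explicitly 0-terminate */

      puts(buf);
      if (buf != scratch)
          free(buf);
  }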


So many potential pitfalls to string functions. But memcpy and friends can have pitfalls too.

I was working on a RISC processor and somebody started using various std lib functions like memcpy from a linux tool chain. I got a bug report - it crashed on certain alignments. Made sense - this processor could only copy words on word alignment etc.

So I wrote a test program for memcpy. Copy 0-128 bytes from a source buffer from offsets 0-128 to a destination buffer at offset 0-128, all combinations of that. Faulted on an alignment issue in code that tried to save cycles by doing register-sized load and store without checking alignment. That was easy! Fixed it. Ran again. Faulted again - different issue, different place.

Before I was done, I had to fix 11 alignment issues. A total fail for whoever wrote that memcpy implementation.

What was the lesson? Well, writing exhaustive tests is a good one. Not blindly trusting std intrinsic libraries is another.

But the one I took with me was, why the hell isn't there an instruction in every processor to efficiently copy from arbitrary source to arbitrary destination with maximum bus efficiency? Why was this a software issue at all! I've been facing code issues like this for decades, and it seems like it will never end.

</rant>
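
Something along these lines is enough to reproduce the sweep (a sketch; it checks the memcpy under test against a plain byte-by-byte copy rather than trusting the library):

  #include <stdio.h>
  #include <string.h>

  /* Every (length, source offset, destination offset) combination in 0..128. */
  int main(void)
  {
      static unsigned char src[300], dst[300], ref[300];
      for (size_t i = 0; i < sizeof src; i++)
          src[i] = (unsigned char)(i * 37 + 1);        /* nonzero fill pattern */

      for (size_t len = 0; len <= 128; len++)
          for (size_t s = 0; s <= 128; s++)
              for (size_t d = 0; d <= 128; d++) {
                  memset(dst, 0, sizeof dst);
                  memset(ref, 0, sizeof ref);
                  for (size_t i = 0; i < len; i++)     /* reference result, byte by byte */
                      ref[d + i] = src[s + i];
                  memcpy(dst + d, src + s, len);       /* routine under test */
                  if (memcmp(dst, ref, sizeof dst) != 0) {
                      printf("FAIL: len=%zu src+%zu dst+%zu\n", len, s, d);
                      return 1;
                  }
              }
      puts("all combinations passed");
      return 0;
  }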


The x86 does have a builtin memcpy instruction. But whether it is best to use it or not depends on which iteration of the x86 you're targeting. Sigh.


>why the hell isn't there an instruction in every processor to efficiently copy from arbitrary source to arbitrary destination with maximum bus efficiency?

Uh, you're not a hardware designer and it shows.. What if there's a page fault during the copy, do you handle it in the CPU? That said, have a look at the RISC-V vector instructions (not yet stable AFAIK) and ARM's SVE2: both should allow very efficient memcpy (among other things) much more easily than with current SIMD ISAs.


Do they manage alignment? Say a source string starting at offset 3 inside a dword, to a destination at offset 1? That's the issue. Not just block copy of align register-sized memory.

Page fault is irrelevant. It already can happen in block copy instructions.


No, they don't provide alignment handling, but they provide a way to write the code once, whatever the size of the implementation's vector registers.

As for a block copy instruction, AFAIK there's no such thing in RISC-V for example.


So, no, they don't have anything like an arbitrary block copy that adjusts for alignment. Not surprising; nobody does. So we struggle in software, and have libraries with 11 bugs etc.


strncpy() suffers from its naming. It never was a string function in reality. It is a function to write and clear a fixed size buffer. It was invented to write filenames into the 14 character buffer of a directory entry in early Unix. It should have been named mem-something and people would never have come to the idea of using it for general string routines.
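
Its original habitat looked roughly like this (a sketch; the struct layout here is approximate):

  #include <string.h>

  struct v7_dirent {             /* early Unix directory entry, roughly */
      unsigned short d_ino;      /* inode number */
      char           d_name[14]; /* fixed 14-byte name field; NOT always 0-terminated */
  };

  void fill_entry(struct v7_dirent *e, unsigned short ino, const char *name)
  {
      e->d_ino = ino;
      /* copies the name and zero-fills the rest of the 14 bytes;
         a full-length name leaves no terminating 0, by design here */
      strncpy(e->d_name, name, sizeof e->d_name);
  }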


If it respects null terminator, then it is a string function.


It basically expects a string as the source and a fixed-size, not necessarily zero-terminated, buffer as the destination.


Super insightful list.

What would be the alternative to strncpy/strncat? I thought they were a safer strcpy/strcat, but now I need something to replace them.

I assume snprintf for sprintf, vsnprintf for vsprintf.

No idea what to do with gmtime/localtime/ctime/ctime_r/asctime/asctime_r, any alternatives for them too?


My alternative is to do a strlen for each string, then use memcpy memset memchr instead.

> I thought they're a safer strcpy/strcat

Let's look at the documentation for strncpy, from the C Standard:

"The strncpy function copies not more than n characters (characters that follow a null character are not copied) from the array pointed to by s2 to the array pointed to by s1."

There's a subtle gotcha there. It may not result in a 0 terminated string!

"If the array pointed to by s2 is a string that is shorter than n characters, null characters are appended to the copy in the array pointed to by s1, until n characters in all have been written."

A performance problem if you're using a large buffer.
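
For example, the non-termination gotcha in a toy case (my example, not from the Standard):

  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      char dst[4];
      strncpy(dst, "hello", sizeof dst);  /* copies 'h','e','l','l'; no terminating 0 */
      printf("%s\n", dst);                /* reads past the end of dst: undefined behavior */
      return 0;
  }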

Yeah, always prefer snprintf.

The time functions? I'm just very careful using them.


Also most of the time people have serious performance regressions with strncpy() as the function overwrites with 0 all the rest of the buffer.

     char buffer[2000];
     strncpy(buffer, "hello", sizeof buffer);
writes "hello" followed by 1995 zero bytes to the buffer.


Thank you Walter! I will be sure to internalize this. There are some terrific tips in here, such as using shorter array lengths for debug builds and avoiding <= as a loop condition. And I don't recall ever seeing char signed, but now I'm terrified.

Thank you, have a great weekend!


char being signed used to be commonplace. But it is allowed by the C Standard, and it's best not to assume one way or the other.
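
A toy example of where the signedness difference bites (assuming an 8-bit char and Latin-1 data):

  #include <stdio.h>

  int main(void)
  {
      char c = (char)0xE9;        /* 'é' in Latin-1 */
      if (c == 0xE9)              /* c promotes to int: -23 where char is signed, 233 where unsigned */
          puts("matched");
      else
          puts("did not match");  /* the usual result on x86, where char is signed */
      return 0;
  }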


Thank you for the great list. Could you give examples of 8. subtle errors in online documentation?


The trouble stems from the C Standard being copyrighted. Hence, anyone writing online documentation is forced to rewrite and rephrase what the Standard says. The Standard is written in very precise language, and is vetted by the best and most persnickety C programmers.

But the various rewrites and rephrases? Nope. If you absolutely, positively want to get it right, refer to the C Standard.

printf is particularly troublesome. The interactions between argument types and the various formatting flags are not at all simple.

Other sources of error are the 0 handling of `n` functions, and behavior when a NaN is seen.
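
Two toy examples (mine, not from the Standard) of the argument-type/format interactions I mean; both are undefined behavior even though they often appear to work:

  #include <stdio.h>

  int main(void)
  {
      size_t n = 42;
      printf("%d\n", n);     /* %d expects int, n is size_t: undefined; use %zu */

      long double x = 1.5L;
      printf("%f\n", x);     /* %f expects double, x is long double: undefined; use %Lf */
      return 0;
  }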


With so many gotchas, it irks me when they still teach C to undergraduates.


IIRC, in the early days of the Commodore PET, it used a method of keeping track of strings that was fine in an 8k machine but was too slow in a 32k machine. They had to make a change that avoided quadratic time on the larger machine. So string handling in BASIC wasn't always that simple.


It always blows my mind when I remember 8-bit computers had garbage-collected strings.


+1 for the PET mention since it was my first "computer". much overlooked in favour of the 64


Ah, yes. I recall the luxury of a Commodore of my very own (a C128), after using PETs in school. We had a whole three of them at the time, with a shared, dual-floppy drive for the set.

Naturally, our teacher wisely pushed hard on figuring what you could out on paper first.


  > Naturally, our teacher wisely pushed hard on figuring what
  > you could out on paper first.
Specifically in the case of the Commodores (I grew up on a C128) I find this observation backwards. Sure, if you only had three machines for twenty students then time on the machine was valuable. But on those machines there was so much to explore with poke (and peek to know what to put back). From changing the colours of the display to changing the behaviour of the interpreter.

I think that I discovered the for loop at eight years old just to poke faster!


I’m not sure if it was the original purpose of C, or if it’s what made C popular, but compared to BASIC, processing strings in C was much faster.


Everything was faster in C - it was compiled and BASIC was interpreted.

Better comparison would be between C and Turbo Pascal strings in DOS times. TP strings were limited to 255 characters but they were almost as fast as C strings, in some operations (like checking length) they were faster, and you had to work very hard to create a memory leak or security problem using them.

I've learnt Pascal before C and the whole mess with arrays/strings/pointers was shocking to me.


UCSD and Turbo Pascal had it easy with the 255 byte strings. They had real strings, but these were compiler extensions. Real Pascal didn't have string support: you could only work with packed arrays of char of fixed size, and as the language was extremely strongly typed, two packed char arrays of different lengths were considered different types, so you had to write procedures and functions for all used packed array sizes.


Brian Kernighan on "Why Pascal is not my Favorite Programming Language" (https://www.lysator.liu.se/c/bwk-on-pascal.html) [1981].

Turbo Pascal wasn't released until 1983, if the wiki is to be believed.


I find it strange that he complained about Pascal's lack of dynamic arrays, when the Pascal solution is to use pointers (exactly what C does for all arrays and strings anyway).

Many of his other points are solved by Turbo Pascal and Delphi/Object Pascal.

But of course nowadays there are better languages for real world programming. It's just a shame that there's nothing as simple and elegant for teaching programming (*).

(*) lisp is even more elegant, but it has a lot of gotchas and it's so far from mainstream that using it for teaching isn't a good idea IMHO


I learned C before Pascal and having to write so much code to deal with 255 character limits was kind of jarring.


I teach at university as an external lecturer. Teaching strings in C is the hardest thing I have to do every time. The university decided to teach C to first-year students without previous experience. My feedback was to do a pre-course in Python to let them relax a bit with programming as a concept and then teach C in a second course.


> I teach at university as external lecturer. Teaching strings in C is the hardest thing I have to do every time.

But if you keep up the good work you will one day go from

  extern void *lecturer;
to

  static const lecturer;


More commonly

     volatile unsigned short lecturer;


Actually it usually ends up being much simpler than a compiled language. Something like this:

    delete from schema.hr.employee

    where employee.employee_type = 'Lecturer'

    having rownum = cast(dbms_random.value(1,count(*)) as int)
Most Deans' computers have it mapped to alt-delete. They don't even know what it does-- it's just called the "reduce budget function". Which is really unfortunate because when they hit ctrl-alt-delete on a frozen system, but miss the ctrl key by accident, some poor lecturer gets fired and at the end of the semester the Dean says "Huh, wonder where that budget surplus came from.".

Once an entire physics department was disbanded when their Dean's keyboard had a broken ctrl key.


In C++ we'd have to decide if lecturer needs to support move semantics.


Probably just delete.


Not if they're tenured. Then you can assume they'll never move.


Minor detail: lecturers don't get tenure.

The job role of 'professor' may be able to get tenure (I think these roles usually do), but 'lecturer' really means 'full time temporary teacher, with a contract for a specified amount of time'.


I occasionally adjunct. What students call me at the beginning of the semester is always awkward:

Them: "Hello Professor"

Me: "Technically I'm not a professor."

Them: "Okay, we'll just call you Doctor."

Me: "Yeah, about that... not a doctor either."

Them: "So why are we paying you?"

Me: "Technically, you're paying the school. And the school is paying me... very little"

Them: "Answer the question"

Me: "Because I know stuff that you don't."

Mostly they still just call me professor and I feel awkward every time.


I knew someone who was TA'ing a class back when they were in grad school. I heard a story about him - to get ahead of this uncertainty he gave the class three options for what to call him:

1) 'Steve' (his first name)

2) 'Mr. Wolfman' (his last name)

3) 'Darth Wolfman' (funny, obviously not meant to be taken seriously, option)

Guess what the class overwhelmingly voted for? :)


I don't think you should feel awkward. I refer to all my teachers in emails as professor ( unless I want to list more detailed honorifics ). My current analytics guy is clearly very smart, seems to be in that adjunct zone, but I address him as professor out of sheer respect.


For all the complicated social protocols in that neck of the woods, this would be simple in Japan. You're just 先生 (sensei) and that's it.


If they ever ask "What do we call you?" you should answer,

"God-Boss."

(Pace Steven Brust.)


I teach first-years in Australia, where boys from private schools call me "sir". When I'm feeling mean, I tell them to drop and give me ten pushups.


> ten pushups

I'm guessing you don't teach computer science


OK. 10 push_backs() then.



Not the case in the UK at least.


    Professor(Professor&&) = delete;


I'm not really const. I'm definitely volatile depending on the budget. It's definitely a side gig.


I need a side gig, for shits and giggles. I miss uni a lot, for the community of it. Would you recommend it?


I love teaching students what I know. I would love it to be a full time job. But then I realized I got it due to my work experience so...


I can confirm, this is exactly what happened to me.


In my school, we had two days to understand the basics of text editors, git (add, commit, rebase, reset, push) and basic bash functions (ls, cd, cp, mv, diff and patch, find, grep...) + pipes, then a day to understand how while, if/else and function calls work, then a day to understand how pointers work, then a day to understand how malloc(), free() and strings work (we had to remake strlen, strcpy, and protect them). Two days, over the weekend, to do a small project to validate this.

Then on the Monday, it was makefiles if I remember correctly, then open(), read(), close() and write(). Then linking (and new libc functions, like strcat). A day to consolidate everything, including bash and git (a new small project every hour for 24 hours; you could of course wait until the end of the day to execute each of them). And then some recursion and the 8 queens problem. Then a small weekend project, a sudoku solver (the hard part was to work with people you never met before tbh).

The 3rd week was more of the same: basic struct/enum exercises, then linked lists the next day, maybe static and other keywords in between. I used the Btree day to understand how linked lists worked (and to understand how pointer incrementation and casting really work), and I don't remember the last day (I was probably still on linked lists). Then a big, 5-day project, and either you're in, or you're out.

I assure you, strings were not the hardest part. Not having any leaks was.
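
(For the curious, "remake and protect" meant roughly this kind of thing; a sketch, not the actual assignment code:)

  #include <stddef.h>

  /* "Protected" remakes: tolerate NULL instead of crashing. */
  size_t my_strlen(const char *s)
  {
      size_t n = 0;
      if (!s)
          return 0;
      while (s[n] != '\0')
          n++;
      return n;
  }

  char *my_strcpy(char *dst, const char *src)
  {
      size_t i = 0;
      if (!dst || !src)
          return dst;
      while ((dst[i] = src[i]) != '\0')
          i++;
      return dst;
  }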


This heavily filters for people who have had experience with programming in high-school or even before that, there's no way for a programming novice to pass that grueling routine.

And then people rhetorically ask themselves why students coming from economically disadvantaged households are under-represented in this industry (one of the best paying industries in this time and age). Stuff like that has got to change.


> one of the best paying industries in this time and age

Medicine is still better paid, and better paid universally. Silicon Valley is really the outlier here; in most of Europe and the rest of the world, programmers don't get paid that much in comparison.


Software development is usually better paid in Poland than medicine. Medicine starts to pay well way later in life, and only in certain specialisations.


And that's ultimately why we had the most excess deaths in 2020 in the EU.


Medicine also requires, after college, medical school and a residency - typically 6 to 9 years work. Programming requires none of this.


In the US. In many other places, medicine is an undergraduate field of study, you're in a hospital from your first year, and by year 3 you're being paid.


> Programming requires none of this.

But it requires you to refresh your knowledge constantly so from this point of view it's similar


If you're arguing that medicine does not, then I hope you're doing software engineering.


> so from this point of view it's similar


In The Netherlands this seems to be true. However, as a programmer you can work from home in many cases, especially now. So suppose that a junior psychiatrist makes 5000 EUR gross in NL [1] and a junior developer 2600 EUR gross [2].

A few things though:

1. A psychiatrist has to commute 1 to 2 hours per day. So that salary is not for 8 hours per day, but 9 hours at minimum. Adjusting their salary to an 8 hour basis, it needs to be multiplied by 8/9, or even less, like 8/10.

2. The psychiatrist has to be on location. The cost associated with that is hard to quantify, but it is there. For example, I always sleep during the afternoon for 20 minutes, a psychiatrist can't do that. Also, I can take a break whenever I want, a psychiatrist can be on call for 24 hours straight in severe cases. Let's suppose this gives a cost of 1/16 as a multiplier (half an hour of extra work per day).

So the minimum overhead multiplier a psychiatrist has is 16/19 (8 paid-equivalent hours out of the 9.5 hours actually committed), which takes the 5000 EUR down to about 4200 EUR. This can be amazing or not so much, considering your own personal preference. My personal multiplier is 0.8 on top of all of this, so for me a 5000 EUR salary is worth 3360 EUR if it's working as a psychiatrist.

As a developer I experience something different, which is:

1. I do not have to commute, I can if I want to, but don't have to.

2. I do not have to be on location, nor do I have a strict schedule for going client after client. I can take random breaks during the day if it helps me be more productive.

So a developer's salary of 2600 EUR is much more like an actual 2600 EUR in that sense. Moreover, my personal multiplier for being a developer is a 1. There are some things I dislike and some things I absolutely love about being a dev (e.g. being a true netizen in the sense that you can randomly interact with APIs if you want to).

To conclude: the absolute values are far apart, but the relative values might not. It differs on a person by person basis, and I haven't discussed the whole picture of course (e.g. needing to stay sharp as a dev, I don't know how that works for psychiatrists).

[1] https://www.monsterboard.nl/vacatures/zoeken/?q=Psychiater&w...

[2] https://www.glassdoor.nl/Salarissen/junior-web-developer-sal...


At what age are you a junior developer and at what age are you a junior psychiatrist in NL? A bachelor's developer could be as young as 21 I guess, but at least for most jobs in medicine you can't work independently until much later. Maybe it's different for psychiatry?


It's not different for psychiatry. You need to have done a bachelor + master in medicine and on top of that a specialization. I don't know how long that takes though, but I wouldn't be surprised if it's 8+ years.


Junior developers aren't really a thing in many places. They're just developers.


In NL they are.


Medicine has been more poorly paid than FAANG software engineering in the last two places I've lived (South Africa, Australia)


FAANG engineers are also massively overpaid, compared to your average software engineer in any random country.

When people on HN discuss salaries, or I see a job posting from a Silicon Valley company, I can't help but think that we don't even pay our CTO that much. Frequently you could get two developers for the same price here in Denmark.


Those companies have one heck of a combined market position and it is all built with software. I would say their software engineers are paid more but I doubt that "overpaid" applies.

Think about what your CTO could do in that setting and realize that he's probably worth more to FAANG shareholders than to you hence the salary differential.

For the record, I do not work at a FAANG.


> Medicine has been more poorly paid than FAANG software engineering in the last two places I've lived (South Africa, Australia)

Interesting. I'm in South Africa, right now. The largest offer for a senior C# dev *right now* on www.pnet.co.za is R960k/a.

Twelve years ago, the GP that I was dating, who worked in a *state hospital* (i.e. not making as much as she could have in private practice) was making more than that.

I don't believe that doctors' salaries over the last 12 years have effectively been lowered. OTOH, if you know of places where they are offering more than R1.8m/a for senior developers, then by all means give me their contact details.


Happy to refer you to AWS! I was making over 1m rand (TC) in a mid-level non-dev position. I had friends in development making above the number you're talking about.


Medicine has other filtering systems.


Having gone through the same experience, I can tell you that it isn't necessarily the case. More often than not, those who had some programming experience in some high-level language would often get discouraged with the difficulty and drop out.

In the end, it was mostly those who didn't get discouraged and who socialized with the other students that remained.

I myself did not have any programming experience before going through that ordeal.


My experience with C courses with this structure of automatically validated homework is that they filter not only "the weak" but also people with previous (especially C on Unix) experience, because nobody with any kind of practical Unix experience will write code that will pass these kinds of rigorous C-standard conformance and memory leak checks, because for practical applications doing all that is actually not only unnecessary but also detrimental to runtime efficiency.


I think a passing test suite, no diff after clang-format, clean valgrind and clang-analyze checks are not too much to ask for. As long as the requirements are documented and the system is transparent and allows resubmission.

But I agree there is a risk of academic instructors going way overboard in practice, e.g. by flagging actually useful minor standard conformance violations (like zero length arrays or properly #ifdef'd code assuming bitfield order).


My aversion to such systems is primarily motivated by the fact that every one of them somehow penalized resubmissions. I probably don't have anything against "you have to write a program that is compiled by this gcc/llvm command line without producing any diagnostics and then passes this intentionally partially documented test suite". But in most cases the first part ends up meaning something like "cc -Werror -std=c89 -ansi -strict", where the real definition of what that means depends on what exactly the "cc" is, and the teachers usually don't document that and don't even see why it is required (i.e. you can probably produce some set of edge-case inputs to gcc to prove that gcc is or isn't a valid implementation of some definition of C, but this conjecture does not work the other way around).


In most of my courses that did something like this there was no resubmission.* The professor supplied a driver program, sample input the driver would use to test your program, expected output, and a Makefile template that gave you the 3 compilers + their flags that your program was expected to compile against and execute without issue. His server would do the compile-and-run for all 3 against the sample input and against hidden input revealed with the grade. He used the same compiler versions as were on the school lab computers.

* As a potentially amusing aside, a different course in a different degree program had a professor rage-quit after his first semester because he didn't want to deal with children -- he had a policy of giving 0s on papers with no name or class info on them, and enough students ("children") failed to do that correctly but complained hard enough to overturn the policy and get a resubmit.


You shouldn't underestimate the novice. The professors who do such weeder classes will have the data though, so you don't have to believe anyone's experiences if you can instead ask a professor... For what it's worth though I'll add to the sibling comments and state in my experience too prior programming experience is less correlative than you seem to think. (I had it, though I quickly found out after my first week "I thought I knew C, I do not know C.")

Those who have been subjected to such programs can also probably agree that the filtering of the first semester (and there is a filter, but again we think it's a fair one not dependent on prior programming experience or other such privilege) ends up normalizing everyone, for the benefit of everyone. For the people who started at 0, they're now Somewhere nearby everyone else, ready for the next (harder) material, and for people who started with some "advantages" they've discovered they... are also now Somewhere, not Somewhere Else ahead of everyone like they might have been at the very start. In these sorts of programs, people with prior experience find that they couldn't sleep through their classes and get A's like they might have pulled off in high school, their advantages were not actually that significant after all, and indeed some from-nothings can and do perform better than they.

For anyone who just wants access to the software industry's wealth, I'd encourage them to ignore college entirely. There may be a case-by-case basis to consider college, especially if you need economic relief now in the form of scholarships/grants/loans only accessible through the traditional college protocol, but in general, avoid.

(If you want something besides just access to the wealth, you have more considerations to make.)


I went through a very similar gauntlet in my first undergrad computers class. I didn’t know anything about programming or linux, but it was fine.

I think the filter is more effective for finding those who can quickly adapt, learn, and grok a methodical mindset. Not necessary characteristics to be a programmer, but necessary characteristics to excel at programming.


Does it though, or is it more survivorship bias and maybe lucking into finding someone who will spend hours mentoring you?

I've mentored quite a few first semester students (in my spare time, to help. Not as a job) and there is no way some of them would've passed without serious help.

At some point I used to think privately that CS should have a programming test as an admission exam, because these students did drag everyone down. If medicine and law have admission restrictions, why not CS too?

But I have changed my opinion because I think everyone deserves a real opportunity, and our school system does not provide a level playing field sadly. (Also the medicine & law admission criteria are GPA based and that is the last thing I'd want for CS.)

Anyway the real filter was always maths.


I don't understand your concern. How is teaching programming in school discriminatory? Would it be better to not teach programming at all?


Depends how you teach it. Imagine teachers at primary schools started teaching English by analyzing Shakespeare.

Or imagine math was taught by giving kids all the axioms and requiring them to derive the other rules needed to solve tasks as needed :)

Kids from well off families would be ok - it would just be considered another random thing you have to teach your kids to help them make it.

But other kids would suffer and think "English and math is not for me".


I understand what you mean. This has nothing to do with programming, it's a general (and difficult!) concern regarding everything that is taught at schools.

By that same argument, schools should not teach anything that is not widely known by 100% of the parents of each kid. Otherwise, it would be discrimination to those kids whose parents cannot help. I disagree very strongly with this principle.

I have two kids, and the best things that they learn in school are precisely those that I'm unable to teach them. For a start: mastery of the language, since I'm not a native speaker of the place where we live. I would be frankly enraged if the school lowered its language standards to accommodate the needs of my kids, who do not speak it at home!


> By that same argument, schools should not teach anything that is not widely known by 100% of the parents of each kid. Otherwise, it would be discrimination to those kids whose parents cannot help. I disagree very strongly with this principle.

Not at all what I mean. I mean schools (at least primary schools) should be designed for the top 80% or 90%, not for the top 10% or 20%. You can never get to 100%, but giving up from the start and going for 20% makes no sense.

You should expect people taking math at university to be able to solve linear equations, and explaining them is a waste of time; but you shouldn't expect kids in primary school to be able to do the same, and it is your responsibility to prepare them in case they want to pursue an academic career.

If public schools teach linear equations it's ok to assume that knowledge at university.

If they don't - it's not.

It should be the same with teaching programming, and anything else is just funding rich people's kids' education with everybody's taxes.

The whole point of common public low-level education is to maximize the number of people participating in the economy. It's much better if everybody can read and write. Whole industries are impossible without this. And so is democracy.

It's the same with basic programming and math literacy. It benefits the whole society if vast majority of people have it.

If you "weed out" 60% or 80% of population just because they happen to be born in the wrong environment or went to the wrong school - you lose massive amounts of money and economic/scientific potential. Then you have to import these people from countries which don't fuck their own citizens in such a way.


I agree that public school should not leave any kids behind. I also want my taxes to be raised to fund a higher-level education for kids who may find it useful, even if it's only a small percentage of kids.


Sure but that's only fair if the assumed skills at higher levels are attainable for an average person that went to a public school.

BTW "no child left behind" isn't practical, there are people who can't learn basic stuff no matter how hard you try. But "less than X% kids left behind" is for some low value of X.


paganel doesn't think it's discriminatory to teach programming. Rather, he thinks orwin describes a class that's too fast paced - a class that wouldn't teach much, and would mostly weed out kids who hadn't self-taught themselves before they reached college.

He fears while a professor might imagine they're weeding out people who lack 'dedication' or 'aptitude' they're actually weeding out people who didn't grow up with a PC at home.


My professor (head of CS dept) referred to these as 'weed out classes'.

If that sounds evil, imagine the grief, wasted money, time, frustration, and stress of letting people get 3-4 years into computer science and then dropping out because it's fucking hard.

So my second hardest classes were freshman year. 3rd year (micro-architecture and assembler) finally bested them.


I don't really get the correlation between household income and programming experience in high school.

Their parents can't afford a laptop? They can't afford an Internet connection? The kids don't have a good place to learn in their house? They don't have time?

Is programming affected more than other subjects like math, English/grammar, science, etc?


> Their parents can't afford a laptop?

Yes! There are millions of kids in the US whose parents can't afford a cheap $300 laptop. The federal government pays for school lunches because there are so many kids who otherwise wouldn't even be getting decent food otherwise.

> They can't afford an Internet connection?

See above. Also, there are many places in the US where getting broadband service is very difficult. Including places just an hour outside of Washington, DC. My parents were only able to get conventional broadband service a few years ago. Prior to that they paid exorbitant fees for satellite internet service with a 500mb per month cap.

> The kids don't have a good place to learn in their house?

Imagine being a kid with 3 siblings and your parent(s) living in a studio apartment. Or a kid that doesn't have a stable "home" at all.

> They don't have time?

That can be an issue too, depending on age. A teenager may be working outside of school hours to help take care of the family's financial needs.


> Their parents can't afford a laptop? They can't afford an Internet connection? The kids don't have a good place to learn in their house? They don't have time?

All of the above, and it's surprising this isn't obvious. It may be hard to notice or internalize if you've never seen it and only know privilege, but possession of all or even some of those things is not a guarantee for everyone. Believe it or not, there are some who don't come home to a computer, caring (or even existent!) parents, stable meals, or free time.


It isn't obvious to me because I've never lived in the US. I was genuinely asking, not trying to shame people who can't afford a computer.

Maybe I didn't use the right words to formulate my question.


Thanks for clarifying the context of your question. Very helpful, and changes the tone completely. 1/2 of the people in the U.S. "don't pay taxes", that is, don't make enough money to owe taxes. So that's one issue. I mentor a hispanic kid whose mother's English was so weak, and her knowledge of 'the system' so weak, that she couldn't take advantage of programs to provide used computers to her kids, or low-cost Internet access to her household. And the $10/month for low-cost Internet access WAS out of reach. 10 people in a two bedroom apartment was also their norm.


> I don't really get the correlation between household income and programming experience in high school.

> Their parents can't afford a laptop?

Holy crap, the amount of privilege shown off in just two sentences is absolutely astounding.

This may come as a shock to you, but a very significant number of people don't have a couple hundred dollars to buy a low-end used laptop. 40% of Americans would struggle to come up with $400 for an emergency expense [0], let alone save $400 for a laptop.

[0] https://www.cnbc.com/2019/07/20/heres-why-so-many-americans-...


> 40% of Americans would struggle to come up with $400 for an emergency expense [0], let alone save $400 for a laptop.

It actually doesn't say that, it says they don't have $400 in cash equivalents but may be able to produce it by selling "assets". So a person who keeps all their savings in CDs or investments also counts, although only for expenses you can't put on credit cards.


The working poor simply don't have the safety net of good credit or wealthy families that the vast majority of HN commenters do. If you're living paycheck to paycheck and get a $400 surprise expense, you don't have $400 just sitting in some money market account, or two shares of SPY they can just sell, because the poverty wages being paid to millions of working people leave zero margin to build any sort of financial security, leading to a kind of precariousness that is unimaginable to the educated professional with a comfortable upbringing. "Assets" means things like a wedding ring, some power tools, a computer, or maybe even a 1995 Dodge Neon; if you go to a pawn shop, you can see the sorts of things people pawn (or sell) when they desperately need $400 for an emergency expense.

They often take payday loans, mortgaging their next minimum-wage paycheck; since the next paycheck minus payment no longer covers their regular living expenses, they take another predatory loan or pawn another heirloom. 80% of people who take a payday loan have to renew it because they can't repay it. I have a deep personal dislike for Dave Ramsey, but he does a good job of explaining how even minor emergency expenses can lead to a cycle of debt and further despair. (https://www.daveramsey.com/blog/get-out-payday-loan-trap)

There is so much more instability and precarity in this country than most PMC people can imagine.


Only 19% of the people who would not be able to pay with cash or its equivalent said they would be able to sell something. 29% said they would be unable to pay.

Figure 12 on page 21 of the underlying report - https://www.federalreserve.gov/publications/files/2017-repor... .


Do you think people living paycheck-to-paycheck, constantly having to decide which bill to pay and which to allow to collect a late fee, have investments or CDs?


74% of American families own a computer. The other 26% do not. Incomes are also highly clustered- chances are any given cohort of high school students either has 90% or more or 10% or less kids with computers, without a lot in between. Relatively poor districts may have a very general computer literacy class (typing, word processing, spreadsheets) but won't have a programming class, because the logistics of getting them adequate time at a computer to complete the work is impossible.

https://www.statista.com/statistics/756054/united-states-adu...


This may be less so now than it was 10 years ago, but I absolutely promise you that having a decent computer (not great, not a gaming pc, but just something that a kid can feel comfortable experimenting with) that is readily available (and not being shared with siblings) is absolutely a luxury.


In the UK, every student does Math, English and Science to a basic level. Maths and English in particular were held up as non-negotiable if you ever wanted a job, I suspect so you had a reasonable level of literacy and numeracy to be able to count money, read letters, etc.

Conversely, programming was not available in my fairly middle-class school. In terms of money, we only have to look to the laptops schools are providing to students (or not depending on government funding) to see how many children don't have access to a laptop. A good place to learn can also be hard to find for large families in small houses which is sadly all too common for low income households.


The cost of computers has come down a ton, but it was a much bigger deal in the 90's and earlier. A lot of people didn't have computers at home. A decent x86 system (like a 486 with VGA etc) was at least $2500 or so. That's without any programming tools... compilers weren't free. When I meet fellow developers who didn't have computers growing up, I realize how privileged and lucky I was.


"Is programming affected more than other subjects like math, English/grammar, science, etc?"

Probably a bit more, as it is common to learn other subjects by a book, but learning programming without a computer ... sounds hard.


Dijkstra probably would disagree, but his isn't a common opinion.


Ooh, the Epitech curriculum. Nice.

Also, I'd say "not having segfaults" is the hardest thing to get right when you're going through that.


Eh, not really possible in my experience... more like ‘incidentally becoming a gdb wizard in order to be productive with C’!


Seeing valgrind come up with 0 leaks after like 10 hours straight on a lab was such a good feeling


F. This sounds too hard. I mean, I know how to turn code into money but I'd fail this.


I don't think this looks like a beginner's course though. My students have zero experience.


Let's face it, the Moulinette was the hardest part.


No, the hardest part is not punching the asteks in the face when you ask them for help on a problem that has stumped you for two hours, they take one look at your code and they go "C'est pas à la norme!" AND THEY WON'T EVEN TELL YOU WHERE.


Piscine ?


Most of the C I wrote was while in college. I think understanding the question, "why are strings in C hard?" is a good gateway to understanding how programming languages and memory work generally. I agree with you though that teaching C as introductory is probably not the best — our "Programming in C" course was taken in sophomore year.

I wouldn't want to use it my day job, but I'm glad that it was taught in university just to give the impression that string manipulation is not quite as straightforward as it's made to appear in other languages.

The early days of Swift also reminded me of this problem – strings get even more challenging when you begin to deal with unicode characters, etc.


It's also because other languages have better designed strings. D, go, rust, etc. have pointers too but their string handling is based on slices and arrays, which are approximately 10,000 times less footgunny.


I am seeing Python becoming the go-to language for many academics because it's easy to hack something together that somehow works.

Unfortunately most of those developers don't care much about efficiency, and Python is out of the box inefficient compared to other high-level languages like Java [1] or C#. The OO Java courses circulating in academia lack modern functional, and to be frank educational, concepts and must be refreshed first.

I personally would recommend to start with Java and Maven because it's still faster than C# [2], open source, and has a proven track record in regards of stability and backwards compatibility. Plus quickly introduce Spring Framework and Lombok to reduce boiler plate code.

For advanced systems programming I suggest looking into Rust instead of C/C++.

And last but not least the use of IDE's should be encouraged and properly introduced, so aspiring developers are not overwhelmed by them and learn how to use them properly (e.g. refactoring, linting, dead code detection, ...). I recommend Eclipse with Darkest Theme DevStyle Plugin [3] for a modern look.

[1] https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

[2] https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

[3] https://marketplace.eclipse.org/content/darkest-dark-theme-d...


I generally agree with you (especially on updating the Java guides), though I think it is important to teach C/C++ after some experience with a higher level language, if for nothing else than the large amount of already existing code bases.

I also like the newfound interest in some FP languages. I for example had a mandatory Haskell course in first year — we did not cover monads in this course yet, but I think it is a great introduction for students to a different take from the more imperative world.


Python and Java fill very different niches in the ecosystem. You're not going to cobble something together quickly in Java, the language just isn't designed to do that. Python is the software equivalent of cardboard-and-hot-glue prototyping, which is fairly common in academia.


I've studied IT in early 00s in Poland, the course was a little outdated, but it had some advantages.

We've started from 0 with no assumption of any computer knowledge and first 2 years most courses were using Delphi (console only, no GUI stuff, basically it could just as well have been Turbo Pascal, some Linux enthusiasts used FPC instead of Delphi and it worked).

We all complained that we wanted C++ then, but I've learnt to appreciate Pascal later. After the first few months we knew every nook and cranny and there were very few corner cases and gotchas. So basically we focused on algorithms and not on avoiding the traps the language set for us.

Most people had no programming experience and after a few weeks they wrote correct programs no problem.

I doubt this would happen if we started with C++ as most people wanted, and I think it's better than Python as a starting language because it teaches about static typing and difference between compile- and run-time.

Sadly it's a dead language now.


I've actually had more success teaching assembly as a first language than C. There's less magic, and you borderline have to start with the indirection of pointers in a way that people seem to grok a lot easier than the last month of the semester of learning C.


+1, my university's program seemed to work well with "program anything" (Python), "program with objects" (Java), "program some cool lower-level stuff" (C)


If I had to choose a language to teach programmers to absolute beginners, I think I'd actually go with Go.

I understand the predilection for Python but there are some parts of Python that are just... odd.


Python is great fun, and you can be really productive with it, but for people first coming into programming, a language with an explicit and strict type system is invaluable.

I used to think that everyone should be taught python first, because it lets you focus on the meat of computer science - algorithms, data manipulation, actually _doing_ something - but after helping my girlfriend out with some comp sci 101-104 projects, I really think Go, Java, or Rust should be everyone's first language. It's hard for someone new to the field to understand the nuances that come with python's corner cutting. You can work yourself into some weird corners because of how permissive the language is, where in a (strongly) typed language, the compiler just says no.


I use Python a lot professionally these days, after having been a C#/Java developer for a while (and some experience with C and Free Pascal). I absolutely love the language.

I always feel a little iffy when people talk about Python like it's a language ideally suited to beginners.

Dynamic typing puts so much power in your hands to create expressive structures. But it requires discipline to use properly. It's a great trade off for me but I don't think it would be for beginners.


My reason for suggesting Python compared to Java, for example, is that I teach Electrotechnical engineers and there are plenty of libraries to experiment with Raspberry Pi and such, and it's a little bit higher level than C. Every language has its own difficulty to teach, but the fact that companies have a banned.h is basically saying "well, C gives you functions for C, but don't use them". It makes it unnecessarily harder to explain something to people with no experience.


> You can work yourself into some weird corners because of how permissive the language is, where in a (strongly) typed language, the complier just says no.

Could you share an example?


Here's one my (intro programming, non-major) students have just been tripping over this week:

  if word == "this" or "that":
      ...
Not an error, always runs. Very mysterious to a beginner. (Shared with C/C++) Another one:

  counter = "0"
  for thing in things:
      if matches(thing):
          counter += 1
The error is in the init, by someone who is overzealous with their quoting, but the error is reported, as a runtime error, on the attempted increment, which throws a TypeError and helpfully tells them "must be str, not int", and of course I know exactly why it's reporting the problem there and why it's giving that error, but it's a bit confusing to the newbie programmer and it doesn't even turn up until they actually test their code effectively, which they are also still just learning how to do.


There are parts of Go that are similarly odd. Arrays and slices, and the hoops you have to jump through to do something as simple as adding a new item to a list, are very unlike anything else, for example.

In Python, the weird stuff is generally easy to avoid/ignore until it's actually needed.


Very true, but if you're teaching CS then this also exposes students to the topic of ownership-versus-reference in a much gentler way than C.


If I remember correctly, slices in Go also have ownership semantics, in a sense that so long as any slice exists, the array is kept alive by the garbage collector.

Or do you mean value vs reference semantics? In that case, I think C pointers are simpler as a fundamental concept, and slices are best defined in terms of pointer + size.


In Java you do the exact same reallocation dance as append does behind the scenes when using arrays.


Java standard library does that, not you directly. The issue here is not performance, but rather ergonomics. For example, in Go, you can forget to assign the result of append to the variable, and it'll even work most of the time (because there was still some unused capacity in the array, so there was no need for reallocation).


Which method? Note I'm talking about the language-level array, not ArrayList. Also this was years ago.


Java arrays don't have add() at all - they're fixed-size once allocated.

ArrayList etc do have add(), and they implement it by re-allocating the backing array once capacity is exceeded.

In practice, you'd use the ArrayList anyway. I don't think it's worthwhile comparing Go and Java "language-only", because the standard library is as much a part of the language definition as the fundamental syntax; indeed, what goes where is largely arbitrary. E.g. maps in Go are fundamental, but the primary reason is that they couldn't be implemented as a library in Go with proper type safety, due to the lack of generics.


Sorry to bug you since this is unrelated. I'm a huge fan of teaching others and I was wondering how you got to be an external lecturer at a college? I'd love to teach classes related to software engineering and data structures. Would you mind emailing me (in my profile) about this?


So this is what happened for me: I went for a walk in the forest with my wife and some of her friends. There was one friend who had a husband working at a university.

We started talking and basically we discovered that what he was teaching really related to what I do for work, so he asked me to become a "mentor", meaning a professional who helps students with their theses.

In the meantime I went to talk during his class about product management as an engineer where basically I said "I'm an engineer like you, go and talk to customers, it's part of the job", plus extreme programming stuff etc...

After that there was a position open and this professor recommended me because I told him it was one of my goals to be a teacher as well.

And then from there I met the head of dept. He was happy with me being versatile, I usually handle C, database design or java.

But the usual stuff is: go to the university you like and look for open positions.

I need to get confirmed every semester and apply again. Usually this job is done by people with a main job and sometimes it happens you don't have time in a semester.


I’d imagine you look for job openings for “adjunct professor” or “instructor” at universities. You can look forward to part-time employment with no benefits and no chance for tenure (this isn’t a dig at adjunct faculty, it’s unfair how it works in the US). Depending on the field of study and school, you need anywhere from bachelors to a PhD to qualify.


You are a good teacher.

20 years ago I was in the exact situation of one of your students, i.e. I was put in front of the C language in the first semester of the first year. I barely, barely passed, failed with glory a similar course in the second semester, which I only passed (with an A, to put it in US university terms) a couple of years later, after I had managed to learn Python by myself in the meantime.


Thanks! I just think that you can teach practical programming concepts (like loops and conditions) without the need to understand that a string is an array of chars and that chars are actually integers.

Because if you cover the basics of programming with something like Python, you can fully concentrate in the second course on low-level hardware stuff, how to use memory etc., which is really important for my students, them being Electrotechnical engineers.


Yep, agree. I used a lot of assembler on C64 and Amiga until I touched so called high level programming languages for the first time. For me thinking in strings was really a weird concept.

Nowadays I find it extremely strange to think of bits and bytes when being confronted with strings.


Question: how do you teach for-loops?

That is something I have a hard time convey as a teacher. My problem is that I have done this so long that I have no idea what there is not to understand about loops ... it's such a simple thing. But my (undergrad biology) students regularly have a hard time groking the concept no matter what explanation I use.


Not OP but I'm teaching undergrad C. I'm assuming you have covered while loops before hand, if not, start there and cover the constructs with them that you would normally use a for loop for, example:

  int i = 0; //a
   
  while(i < 10) //b
  {
      printf("%d\n", i); //c
      ++i; //d
  }
and introduce for loops as a special case of the while loop:

      //a        //b     //d
  for(int i = 0; i < 10; ++i)
  {
      printf("%d\n",i); //c
  }
Then outline situations when you would use a for loop over a while loop: a fixed number of repetitions, use with arrays, etc.


Well, I took the time and made some simple animations of how that works. For that and for pointers I use animations so they can visualize what's happening. (Wasn't my idea; I googled "how to explain pointers".)


That's probably because pointers are considered "hard", too hard to exist in other languages. It's interesting that in the 90s the C++ standard library modeled iterators on pointer semantics because it was assumed everyone could do pointer arithmetic, but nowadays the concept is not mainstream at all.
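For what it's worth, here's a tiny sketch of that pointer-as-iterator pattern, counting spaces by walking a string with a moving pointer (the function name is just for illustration):

  #include <stdio.h>

  /* Walk the string with a pointer; ++p is the "iterator" step
     that C++ iterators were modeled on. */
  size_t count_spaces(const char *s)
  {
      size_t n = 0;
      for (const char *p = s; *p != '\0'; ++p)
          if (*p == ' ')
              ++n;
      return n;
  }

  int main(void)
  {
      printf("%zu\n", count_spaces("a b c"));   /* prints 2 */
      return 0;
  }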


But that's the point, right? I have two hours per week for a semester. So basically I tried to move really fast at the beginning with if/else and loops, gave them an exercise that counted towards the final score, and then it was pointers and pointer-related stuff.


I don’t think the reason for hiding pointers is that they are hard; it’s just that arbitrary pointer arithmetic is especially error-prone and can be avoided in most codebases.


Well, hard to get right anyway. In any case, newer generations of programmers are less likely to be familiar with the pattern.


My partner was on a doctoral training course at Oxford and they had to learn C over a few days; string manipulation is the hardest thing she remembers doing out of any of the medical science crash courses they studied over two terms.


> Technically C11 has strcpy_s and strcat_s

"Theoretically" is the word you're looking for: they're part of the optional Annex K so technically you can't rely on them being available in a portable program.

And they're basically not implemented by anyone but Microsoft (which created them and lobbied for their inclusion).


Microsoft doesn't actually implement Annex K! Annex K is based on MSFT's routines, but they diverged. So Annex K is portable nowhere, in addition to having largely awful APIs.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1967.htm#:~...

> Microsoft Visual Studio implements an early version of the APIs. However, the implementation is incomplete and conforms neither to C11 nor to the original TR 24731-1. For example, it doesn't provide the set_constraint_handler_s function but instead defines a _invalid_parameter_handler _set_invalid_parameter_handler(_invalid_parameter_handler) function with similar behavior but a slightly different and incompatible signature. It also doesn't define the abort_handler_s and ignore_handler_s functions, the memset_s function (which isn't part of the TR), or the RSIZE_MAX macro. The Microsoft implementation also doesn't treat overlapping source and destination sequences as runtime-constraint violations and instead has undefined behavior in such cases.

> As a result of the numerous deviations from the specification the Microsoft implementation cannot be considered conforming or portable.


I didn’t know that it was Microsoft that lobbied for them; that perplexes me, since I thought Microsoft’s versions of them were a bit different (for example, I think C11’s explicitly fail on overlapping inputs, where Microsoft specifies undefined behavior) and because Microsoft didn’t bother supporting C99 for the longest time. (They probably still don’t, since VLAs were not optional in C99, IIRC. I think Microsoft was right to avoid VLAs, though.)


VLA syntax can be useful because you can cast other pointers to VLA types; for instance you can cast an int* to a pointer to int[w] and then access it as [y][x] instead of [y*w+x].

As a bonus this crashes icc if you do it.
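For the curious, a minimal sketch of that cast (C99), assuming a flat buffer of w*h ints and omitting error handling:

  #include <stdio.h>
  #include <stdlib.h>

  /* View a flat w*h buffer as a 2-D array via a VLA pointer type. */
  void fill(int *flat, int w, int h)
  {
      int (*grid)[w] = (int (*)[w])flat;   /* the cast in question */
      for (int y = 0; y < h; y++)
          for (int x = 0; x < w; x++)
              grid[y][x] = y * w + x;      /* instead of flat[y*w + x] */
  }

  int main(void)
  {
      int w = 4, h = 3;
      int *buf = malloc(sizeof(int) * w * h);
      fill(buf, w, h);
      printf("%d\n", buf[2 * w + 3]);      /* prints 11 */
      free(buf);
      return 0;
  }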


>As a bonus this crashes icc if you do it.

I thought this was a pretty funny thing but unfortunately when I tried this on ICC it seemed to compile just fine.

Though I am amused by one thing: the VLA version generates worse code on all compilers I've tried. Seems to validate the common refrain that VLAs tend to break optimizations. (Surely it's worse when you have an on-stack VLA though.)


This does it: https://gcc.godbolt.org/z/5fz8sM

I'm not sure if they take bug reports if you're not a customer, but this one goes back at least 8 years.


Microsoft's implementations are distinct and incompatible (and they haven't changed to be compatible with the standard versions because of backwards compatibility).


It's important to note that strcpy_s doesn't truncate; by default it aborts your app if it fails:

> "if the destination string size dest_size is too small, the invalid parameter handler is invoked"

> "The invalid parameter handler dispatch function calls the currently assigned invalid parameter handler. By default, the invalid parameter calls _invoke_watson, which causes the application to close and generate a mini-dump."

https://docs.microsoft.com/en-us/cpp/c-runtime-library/refer...


They are also implemented by embedded compilers such as IAR and Keil.


The issue is pretending that C even has strings as a semantic concept. It just doesn't. C has sugar to obtain a contiguous block of memory storing a set number of bytes and to initialize them with values you can understand as the string you want. Then you are passing a memory address around and hoping the magic value byte is where it should be.

C is semantically so poor that I find it hard to understand why people use it for new projects today. C++ is overcomplicated, but at least you can find a good subset of it.
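To illustrate the "sugar" point above, a minimal sketch: the two declarations below produce identical 6-byte arrays.

  #include <stdio.h>

  int main(void)
  {
      /* A string literal is just sugar for a brace-initialized char
         array with a trailing '\0'. */
      char a[] = "hello";
      char b[] = { 'h', 'e', 'l', 'l', 'o', '\0' };
      printf("%zu %zu\n", sizeof a, sizeof b);   /* prints "6 6" */
      return 0;
  }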


C is a good language for solo projects because of its simplicity. By simplicity, I mean understanding what it’s doing under the hood. ‘Portable assembly’ is not an unfitting title.

Big C projects work well when they are carefully maintained (like Git).


> but at least you can find a good subset of it.

It's a constantly shifting subset, though. Moving slowly is a feature of C for some.


You can pick a subset of C++ now and it will not be worse in the future. Better ways of doing things may be added later, but that seems like a weird thing to complain about, and you can just ignore them if you want.


> strings in C are actually hard,

Strings in C are more like a lie. You get a pointer to a character and the hope that there is a null somewhere before you hit a memory protection wall, or a buffer for something completely unrelated to your string.

And that's with ASCII, where a character fits inside a byte. Don't even think about UTF-8 or any other variable-length character representation.

In fairness, the moment you realize ASCII strings are a tiny subset of what a string can be, you also understand why strings are actually very complicated.
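A quick sketch of the variable-length issue, assuming the source file and execution charset are UTF-8:

  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      /* strlen counts bytes, not characters: "é" is two bytes in UTF-8. */
      printf("%zu\n", strlen("é"));       /* prints 2 */
      printf("%zu\n", strlen("héllo"));   /* prints 6, not 5 */
      return 0;
  }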


> In fairness, the moment you realize ASCII strings are a tiny subset of what a string can be, you also understand why strings are actually very complicated.

Oh absolutely, but it's a pretty reasonable expectation that any contemporary language should handle that complexity for you. The entire job of a language is to make the fundamental concepts easier to work with.


C very much does make the fundamental concepts easier to work with, it merely disagrees with you about exactly which concepts are fundamental :).


Sadly, strings are, at the same time, complicated enough to be left outside the fundamental concepts of a language, but far too useful to be left outside the fundamental concepts of a realistically viable language.


What I don’t understand is why C programmers use the built-in strings. It’s like rolling your own sorting algorithm every time you need it. Surely someone could write a better string library in C that hides the complexity. The real problem is that C programmers are apparently allergic to using other people’s code.


Because most projects involve interfacing with other third-party libraries that will undoubtedly not know about this other third-party library that implements a nice string.


It can contain a to_cstring() function and problem solved.

If it uses a struct with the length of the string and a pointer to a C-style string, even the conversion can be elided (at the price of some inflexibility/unnecessary copying while in use).
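A minimal sketch of what that could look like (the names str_t, str_from, and str_cstr are hypothetical, and error handling is omitted): keep the length, but also keep the buffer 0-terminated so handing it to C APIs costs nothing.

  #include <stdlib.h>
  #include <string.h>

  typedef struct {
      size_t len;
      char  *data;   /* always len bytes plus a terminating '\0' */
  } str_t;

  str_t str_from(const char *s)
  {
      str_t r;
      r.len  = strlen(s);
      r.data = malloc(r.len + 1);
      memcpy(r.data, s, r.len + 1);
      return r;
  }

  /* "Conversion" to a C string is just returning the pointer. */
  const char *str_cstr(const str_t *s) { return s->data; }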


That would mean you lose the ability to do all sorts of optimizations and memory sharing. Or at least, you can do them, but then the c_string() function requires copying the data. And that also means that it's a one-way thing: you can't use the copy on something that wants to modify the string, and expect your FancyString instance to reflect the modification.


I think nearly every C programmer has gone through the "Oooh, I'll write my own string library!" phase. Sure, that works, except you have to call system libraries and all kinds of other external functions, all of which naturally assume the conventional char arrays. So you spend a bunch of time converting back and forth until you eventually realize it's silly; just learn the convention and go with it.


There are a large number of those libraries. Every large C project eventually seems to grow its own string class.


... except for libc, which apparently is hardly ever questioned.


>>> Surely someone could write a better string library in C that hides the complexity.

In short, it's not possible to write a nice string library in C because C simply doesn't support objects, and by extension doesn't support libraries.

Strings are a perfect example of an "object" in what later became known as object-oriented programming. C doesn't have objects; it's the last mainstream language that's simply not object-oriented at all, and that prevents you from making things like a nice string library.

If you're curious, the closest thing you will see in the C world is something like GTK, a massive C library including string and much more (it's more known as a GUI toolkit but there are many lower level building blocks). It's an absolute nightmare to use because everything is some abuse of void pointers and structs.


What rubbish! You do not need objects to make a library. Structs, typedefs, and functions do just fine. There are even techniques in C to define abstract data types if you want!

Take another look at https://developer.gnome.org/glib/stable/glib-Strings.html#g-... . That’s all C, baby, and could be replicated in a completely independent strings-only library built on the standard library if you wished. The reasons no such library exists are ecological, not technical.
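For what it's worth, a minimal sketch of the abstract-data-type technique mentioned above, using a hypothetical mystr type (error handling omitted): the header exposes only an opaque pointer, and the representation lives in one .c file.

  #include <stddef.h>
  #include <stdlib.h>
  #include <string.h>

  /* Header part: callers only ever see a pointer to an incomplete type. */
  typedef struct mystr mystr;
  mystr *mystr_new(const char *init);
  size_t mystr_len(const mystr *s);
  void   mystr_free(mystr *s);

  /* Implementation part: the layout stays private. */
  struct mystr {
      size_t len;
      char   data[];              /* C99 flexible array member */
  };

  mystr *mystr_new(const char *init)
  {
      size_t len = strlen(init);
      mystr *s = malloc(sizeof *s + len + 1);
      s->len = len;
      memcpy(s->data, init, len + 1);
      return s;
  }

  size_t mystr_len(const mystr *s) { return s->len; }
  void   mystr_free(mystr *s)      { free(s); }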


I think you mean GLib as seen here https://developer.gnome.org/glib/stable/glib-Strings.html

GLib and GTK are closely aligned parts of GNOME so they are easy to get mixed up.


Right. The string library is in glib.

There were a few big libraries in the ecosystem, if I remember correctly: GTK, GLib, and another two. They're from the same origin and often mixed together.

It's been almost a decade since I dabbled in this stuff day-to-day. I think being forced to use GLib is the turning point in a developer's life where you realize you simply have to move on to a more usable language.


So true


Strings have nothing to do with objects. You can write a string library, e.g. sds (https://github.com/antirez/sds). It's just not standard.


The challenge is not to write a string library, but to write a "nice" string library.

Let's say, something that's easier to use and doesn't have all the footguns of the char arrays.

The library you link doesn't come anywhere close to that. It's 99% like the standard library and it has the exact same issues.


I would love to see what you mean by "exact same issues".

sds strings contain their lengths, so operating on them you don't have to rely on null termination, which (to my knowledge as a lower-midlevel C programmer) is the most prevalent reason why people take issue with C strings.

If you mean that they're not really "strings" but byte arrays I would say that I agree, but to all intents and purposes that's what the C ecosystem considers as strings.

Keeping an API which is very similar to the standard library is also a plus, as it doesn't force developers to change the way they reason about the code.
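For reference, a small usage sketch of sds's core calls (sdsnew, sdscat, sdslen, sdsfree), written from memory of the project's README, so check it against sds.h before relying on it:

  #include <stdio.h>
  #include "sds.h"

  int main(void)
  {
      sds s = sdsnew("Hello");
      s = sdscat(s, ", world");                    /* may reallocate: reassign */
      printf("%s (%zu bytes)\n", s, sdslen(s));    /* length is O(1), no strlen */
      sdsfree(s);
      return 0;
  }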


> sds strings contain their lengths, so operating on them you don't have to rely on null termination, which (to my knowledge as a lower-midlevel C programmer) is the most prevalent reason why people take issue with C strings.

Wait, haven't I seen that idea somewhere else...?

> If you mean that they're not really "strings" but byte arrays I would say that I agree, but to all intents and purposes that's what the C ecosystem considers as strings.

Aha, strings as byte arrays but with a built-in length marker.

But yeah, Pascal is sooo outmoded and inferior to C...

Sigh.


You’re moving the goalposts now. Just earlier you wrote that you couldn’t write a library in C because C does not support objects, not that you couldn’t write a nice library (for whatever definition of “nice” you want to use, which will be different from someone else’s).

In fact there are several libraries for string-like objects; the main barrier to using them is that none of them is standard. You can at least acknowledge that before talking about nice-ness, which is a whole other point.


I'm partial to https://github.com/antirez/sds these days


The only problem I have with antirez's lib is that he didn't make it into a single-header library.


Is it so hard to add a single source file to your build system?

If yes, then you can do #include "sds.c" in some random source file. In fact, that's what so-called header-only libraries in C implicitly do. shudder


A C file implies a compilation unit. For the projects I write, I like to have a single compilation unit per binary (what's called a unity build). In the case of C, this doesn't bring much speed to the table, but it allows for a simpler build toolchain nonetheless.
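In case it helps, a sketch of what that looks like in practice (file names are hypothetical): one wrapper file includes the other .c files and is the only thing handed to the compiler.

  /* everything.c -- build with: cc everything.c -o app */
  #include "sds.c"
  #include "parser.c"
  #include "main.c"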


strcpy is a coding challenge we use for interviews where I work. I typically ask candidates to write it like the standard version and ask why they might not want to use it, to see if they are aware of the risks. After that I ask them to modify the code to be buffer-safe. And for those claiming C++ knowledge, I ask them to make it work for wchar_t as well, to see if they can write a template. Some people really struggle with this.
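One shape the "buffer safe" part of the answer might take (a sketch, not the only acceptable solution): take the destination size, always 0-terminate, and report truncation.

  #include <stddef.h>

  /* Copies at most dstsize-1 chars, always 0-terminates,
     and returns -1 if the source did not fit. */
  int my_strcpy(char *dst, size_t dstsize, const char *src)
  {
      if (dstsize == 0)
          return -1;
      size_t i = 0;
      for (; i + 1 < dstsize && src[i] != '\0'; i++)
          dst[i] = src[i];
      dst[i] = '\0';
      return src[i] == '\0' ? 0 : -1;
  }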


If only C had followed the Pascal way of storing the size with the string - so much human suffering could have been avoided!


It was considered:

> C designer Dennis Ritchie chose to follow the convention of null-termination, already established in BCPL, to avoid the limitation on the length of a string and because maintaining the count seemed, in his experience, less convenient than using a terminator.[1][2]

* https://en.wikipedia.org/wiki/Null-terminated_string#History

Ritchie et al. had experience with the B language:

> In BCPL, the first packed byte contains the number of characters in the string; in B, there is no count and strings are terminated by a special character, which B spelled *e. This change was made partially to avoid the limitation on the length of a string caused by holding the count in an 8- or 9-bit slot, and partly because maintaining the count seemed, in our experience, less convenient than using a terminator.

* https://www.bell-labs.com/usr/dmr/www/chist.html


> ~~Technically~~ Optionally, C11 has strcpy_s and strcat_s which fail explicitly on truncation. So if C11 is acceptable for you, that might be the a reasonable option, provided you always handle the failure case.

One of the big problems with C programmers is that they often neglect to check for and handle those failure cases. Did you know that printf() can fail, and that it has a return value you can check for errors? (Not you personally, but the generic "HN reader" you.) Do you check for this error in your code? Many of the string functions return special values on error, but I frequently see code that never checks. Unfortunately, there isn't a great way to audit your code for ignored return values with the compiler, as far as I know. GCC has -Wunused-result, but it only outputs a warning if the offending function is attributed with "warn_unused_result".

I'm not a huge fan of using return values for error checking, but we have the C library that we have.
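A minimal sketch of both points, assuming GCC or Clang: printf-family calls report failure via their return value, and the warn_unused_result attribute is what makes -Wunused-result catch ignored returns on your own functions.

  #include <stdio.h>

  __attribute__((warn_unused_result))
  static int write_greeting(FILE *f)
  {
      if (fprintf(f, "hello\n") < 0)   /* printf-family calls can fail */
          return -1;
      return 0;
  }

  int main(void)
  {
      if (write_greeting(stdout) != 0)
          return 1;
      write_greeting(stdout);   /* GCC/Clang warn: result ignored */
      return 0;
  }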


Truncation, even if it is wrong in an application logic sense, is strictly superior to UB (and in practice, buffer overruns, which can be exploitable). That's the main benefit of strlcpy/strlcat. It is certainly possible to construct a security bug through truncation! But it is much more common to have security bugs from uncontrolled buffer overruns.


Yeah. I just avoid string manipulation in general in C, and when I have to do it, I fuzz it ... (but still, the perf cliff was definitely something new to learn over the past few days).



