RetroBSD

2.11BSD operating system for microcontrollers

All times are UTC




Posted: Tue Dec 15, 2015 7:38 am
Contributor

Joined: Mon Nov 12, 2012 1:34 pm
Posts: 1092
Hi Matt,

Nice research :) :).

It has always been interesting to me who comes to defend what when a 'bug' or other deficiency is found.

Is it a bug or is it a feature?

At least in my opinion it should be obvious what happened and why. Kinda along the lines of the Ada language.

At least in my mind, this makes a very strong case for reverse assembly and compilation. So I can see clearly where the 'mistake' was made and much more importantly how to correct things so I always get reliable code in the future.

IMHO it is a pretty hard problem in the case of compilers.

Lots of fun :).

Wiz


Posted: Tue Dec 15, 2015 9:17 am
Contributor

Joined: Mon Apr 29, 2013 1:56 am
Posts: 196
wiz wrote:
It has always been interesting to me who comes to defend what when a 'bug' or other deficiency is found.

Is it a bug or is it a feature?


If the language isn't misused/abused (permitted by C/C++ and assembly), it's easy to tell (it's a yes! :) ). If it is misused/abused, you're bound to have lots of fun figuring out the root cause of the observed problem.

wiz wrote:
At least in my opinion it should be obvious what happened and why. Kinda along the lines of the Ada language.


C is very lax in terms of what can be constructed and written, much like assembly language, its ancestor, but at the same time it has quite a few implicit behaviors, implementation-specific/defined behaviors and undefined behaviors, all of which complicate life, especially if you're new to them. Most CPU instructions have very precisely defined behavior (only a few are documented as producing undefined/unspecified results or having an undefined effect under certain conditions) and aren't disastrous, except when accessing memory at wrong addresses or trying to do something without having the necessary privileges. So, at each instruction you can check with the CPU manual whether the instruction is safe. Is MIPS "addu" safe? Yes. Is "+" in "a + b" safe in C? It depends. It depends on the operand types and values; it's not about "addu" anymore, even if both operands are 32-bit integers and can indeed be added with this very instruction (signed integer overflow gets special treatment in C: it's one of the wonderful undefined behaviors). Can "a + b + c" be somehow better or worse than "a + c + b"? Yes, it can, for the same reason (changing the order may introduce or eliminate overflow). There are language rules that aren't enforced but are expected to be followed nonetheless. Those rules come on top of logic, math (as you learned it at school and should unlearn or relearn for floating point arithmetic) and how the CPU works. If you don't follow them, things break. Not always; maybe they'll break with another compiler or on another machine.
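To make the "a + b + c" versus "a + c + b" point concrete, here is a minimal sketch. The helper names add_overflows and sum3_is_safe are made up for illustration (they are not library functions), and the checks are written so that they never perform an overflowing addition themselves.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if a + b would overflow int.
   Only compares against INT_MAX/INT_MIN, so it never overflows itself. */
bool add_overflows(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* a + b would exceed INT_MAX */
    if (b < 0)
        return a < INT_MIN - b;   /* a + b would fall below INT_MIN */
    return false;                 /* adding zero never overflows */
}

/* Hypothetical helper: true if (x + y) + z, evaluated left to right,
   stays within the range of int at every step. */
bool sum3_is_safe(int x, int y, int z)
{
    if (add_overflows(x, y))
        return false;
    return !add_overflows(x + y, z);   /* x + y is known to be in range here */
}
```

With x = INT_MAX, the order (x + 1) + (-1) is reported as unsafe, while (x + (-1)) + 1 is reported as safe, although both have the same mathematical result.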

At any rate, no compiler, Ada or not, can tell if your program is correct, even if it's known that it's not misusing or abusing the language or resulting in undefined behavior.

wiz wrote:
At least in my mind, this makes a very strong case for reverse assembly and compilation. So I can see clearly where the 'mistake' was made and much more importantly how to correct things so I always get reliable code in the future.


Not necessarily so. Like I said above, things may break immediately or much later, seemingly by themselves. The fact that your program has compiled and is apparently working right now does not by itself guarantee that you didn't make a mistake in its overall logic or that you didn't misuse/abuse the language while implementing said logic. You can disassemble this program and find no flaw in its machine code. It is only when it doesn't work as you expect it to that you can discover something interesting in the disassembly. But then what you discover these days may drive you crazy for hours or days. See, modern C/C++ compilers assume that you do not misuse/abuse the language by invoking undefined behavior (UB). [If you invoke UB, it's your problem, you're expected to have learned UBs and ways of avoiding them.] This lets the compiler better optimize your code. Assuming there's no UB, the compiler can reason that "a + 1" isn't ever going to overflow when "a" is a signed integer and so, if you write code to check for overflow like "if (a + 1 < a) /* ¡Ay, caramba, überflow! */;", the compiler can remove the whole if statement. This removal may be somewhere in the middle of a chain of decisions that the compiler's optimizer is making and so, not only can the overflow check get removed, but something else dependent on it can get removed or otherwise changed as well. Looking at such after-aftereffects in the disassembly can be quite puzzling.
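For comparison, a check that the optimizer has no license to remove tests the operand before the addition instead of testing the result after it. This is only a sketch; checked_increment is a made-up name, not a standard function.

```c
#include <limits.h>

/* Hypothetical helper: stores a + 1 in *out and returns 1,
   or returns 0 (leaving *out untouched) if a + 1 would overflow.
   The comparison happens before the addition, so no undefined
   behavior is ever invoked. */
int checked_increment(int a, int *out)
{
    if (a == INT_MAX)
        return 0;       /* a + 1 is not representable as int */
    *out = a + 1;       /* safe: a < INT_MAX here */
    return 1;
}
```

Because the overflow never actually happens, the compiler cannot use the "no UB" assumption to discard the test.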

wiz wrote:
IMHO it is a pretty hard problem in the case of compilers.


However, not all is lost, even in C/C++. Modern compilers (clang, gcc, Microsoft's Visual C++) can catch many (but not all) problems at compile time, but you usually have to ask for this explicitly via compiler options. There exist tools (various "sanitizers" and such) that can instrument your compiled code and catch a bunch more problems (but again, not all) at run time. This used to be nearly impossible because of hardware limitations that imposed limitations on cleverness and usability of software (by software I mean compilers and that extra instrumentation code). Not anymore. So, while there's still no compiler to tell you of all of the problems in your code (or your head :) ), there are some very handy tools to identify many common problems, which are then easy to fix when identified. IMO, these tools should not serve as an alternative to learning the language and learning to write proper code as prescribed by the language standard/specification/definition/etc.

HTH


Posted: Tue Dec 15, 2015 9:55 am
Committer

Joined: Wed Oct 10, 2012 11:01 pm
Posts: 1081
Location: Sunnyvale, CA
Hi Wiz,

The 'exit' issue is essentially a feature of the linker. It would arise in any language, even in assembler. Imagine file A.s defines the symbol 'exit' as a variable, and file B.s assumes it's a function and jumps to it. The linker will not catch this bug, and will produce an executable which will crash at run time.

Bugs of this kind are easy to avoid in C. Just make a habit of declaring all external variables and functions in an include file, common to all C files of the program.
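A minimal sketch of that habit, squeezed into a single listing so it can be compiled as is. The header name "globals.h" and the names tick_count and tick are made up for illustration; in a real program the first part would sit in the shared header and the second part in exactly one .c file that includes it.

```c
/* --- would live in a shared header, e.g. "globals.h" (hypothetical name) --- */
extern int tick_count;      /* declaration only: no storage is allocated here */
void tick(void);            /* prototype */

/* --- would live in exactly one .c file that includes the header --- */
int tick_count = 0;         /* definition; must agree with the extern above */

void tick(void)             /* definition; must agree with the prototype */
{
    tick_count++;
}
```

Because the compiler sees the declaration and the definition in the same translation unit, changing the definition to, say, "long tick_count" is rejected with a conflicting-types error instead of slipping through to the linker.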

Regards,
--Serge


Posted: Tue Dec 15, 2015 10:48 am
Contributor

Joined: Mon Apr 29, 2013 1:56 am
Posts: 196
To make it entirely clear, there's some code that executes before main(). What this code does is "exit(main(argc, argv));", IOW, it passes the value returned from main() into exit().
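As a rough sketch of what that startup code boils down to (this is not the actual RetroBSD crt0 source; start_sketch, demo_main, record_status and last_status are made-up names, and function pointers stand in for the real main and exit so the example can run without terminating the process):

```c
/* Stand-ins for the types of main() and exit(). */
typedef int  (*entry_fn)(int argc, char **argv);
typedef void (*finish_fn)(int status);

/* Hypothetical model of the startup code: call the entry point and
   hand its return value to the "exit" routine, whatever that is. */
void start_sketch(entry_fn entry, finish_fn finish, int argc, char **argv)
{
    finish(entry(argc, argv));
}

/* Demo pieces, again hypothetical. */
int last_status = -1;

void record_status(int status)      /* plays the role of exit() */
{
    last_status = status;
}

int demo_main(int argc, char **argv)    /* plays the role of main() */
{
    (void)argc;
    (void)argv;
    return 7;
}
```

The startup code blindly trusts that whatever is bound to the name exit is a function taking an int; nothing in this call chain checks that.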

At link time the linker pulls in the file containing this startup code, the files where the exit and main symbols are defined, and all other files that define the symbols referenced by main() and so on. If you redefine exit (as a function or a variable of your own), the linker can (and in this case it does) use your exit instead of the one defined in the standard C library. The linker is as dumb as the compiler. There are no safeguards to prevent situations like this. The C standard allows this dumbness (and many others) so as to make C available on as many systems as possible, including the slow ones and the ones with little memory, where looking for every identifiable programmer's mistake is prohibitive. We take advantage of this and implement just enough functionality for things to work when the code is correct. But when the code is grossly incorrect, as is the case here, you get screwed. Sorry. You can blame it on Dennis Ritchie, if you want to. :) Though, it's too late.


Posted: Tue Dec 15, 2015 9:15 pm
Contributor

Joined: Mon Nov 12, 2012 1:34 pm
Posts: 1092
Hi Serge and Alex and all,

WOW! I guess I struck a nerve :). I guess it is getting towards holiday time :).

I gotta say that right now I have much less confidence that reliable programs can be written in C at all?

Somehow it reminds me of an explanation I had from someone many years ago that it was not a bug when a browser crashed with an unaligned memory error.

Alex and Serge I will study what you have written more carefully. But my take right now from what Alex has written is that it is not possible to write reliable code in C.

But let me give the matter some more study....

More fun than I expected :).

Wiz


Posted: Tue Dec 15, 2015 9:28 pm
Committer

Joined: Wed Oct 10, 2012 11:01 pm
Posts: 1081
Location: Sunnyvale, CA
Well, it's not just C. The nature is so arranged that it's not possible to write reliable programs at all. :)

It's much like Gödel's theorem: any language capable of expressing computer programs cannot be both consistent and complete. :)

C language by itself is similar to a sharp axe. You need to get a skilled hand on it, otherwise it can be extremely dangerous. :)


Posted: Tue Dec 15, 2015 9:48 pm
Contributor

Joined: Mon Nov 12, 2012 1:34 pm
Posts: 1092
Hi Serge,

p.s.- Just read your latest comment. Thanks. I guess it really is code and pray? And I gather there is nothing I can do to make sure this type of error does not occur in future code I write ??!! And that something like this may be a currently hidden problem in the code I have just written....

Hmmm.... Lots to ponder.... I have never had these sorts of problems in all the assembly code I have written.... Maybe I just write simple programs?

--- my original response ---- edited now

Further to your comment. I don't see how a header file as Alex suggested would have changed things?

My program defined a variable that was apparently also a name used as a function defined in the c library?

I probably don't understand how things really work, but how would the presence of a header file to my single C program have changed things? Rather than say including the whole header file within my single C program itself. Or does loading parts of the c library by the linker somehow know that 'exit' should not be used this way?

How do I know this won't happen again with some yet unknown to me problem word in the c library somewhere?

Is the above really correct?

Lots of fun :).

Wiz


Posted: Wed Dec 16, 2015 12:22 am
Committer

Joined: Thu Oct 11, 2012 8:45 am
Posts: 1801
Location: Room 217, Floor 8, Arm 8, Wheel S7, Mars Base Alpha 3
Including the header that declares the exit() function (stdlib.h) would have (or should have) caused the compiler to throw an error about redefinition of a symbol. You should always get into the habit of including the right headers for the functions you are using.

I always compile with -Wall and -Werror so that if a function is undeclared (by not including the right header) it throws a complete wobbly and refuses to compile.



Posted: Wed Dec 16, 2015 9:09 am
Contributor

Joined: Mon Apr 29, 2013 1:56 am
Posts: 196
wiz wrote:
I gotta say that right now I have much less confidence that reliable programs can be written in C at all?


It depends on how you look at it. In principle, it is possible to write reliable software (in many languages) to the extent that Serge has indicated. Namely, to the extent of your paranoia and to the point where you say enough is enough and give up trying to detect and correct or accommodate every single error possible and to the point of realization that you can't gracefully handle multiple failures, occurring in parallel or in an odd sequence with a variety of interdependencies. For example, you can detect that a sensor in your system is giving you unusual readings, but you can't know for sure if it's because of a real but unusual physical input, because the sensor is somehow obstructed, because the sensor is faulty or damaged, because the system is on low power or because (horrors!) the CPU or the RAM is malfunctioning and nearly nothing can be trusted. You can add redundancy in the system to cope with some problems, but this adds complexity and changes the requirements and costs and the redundancy itself has its limitations. A great example is the File Allocation Table (FAT) file system from Microsoft in MS-DOS and similar ones. The data structure, which organizes disk sectors into files and directories (essentially a set of linked lists), is duplicated: there are two parallel copies, which are supposed to always be identical, the idea being that you can restore stuff if one of the two goes bad. So, if one of the copies is damaged (worse if both), this can be detected, but you can't always be sure which copy to trust, or which parts of each copy to trust, because there are no checksums or anything of the sort in the file system itself. Oopsie! Contemporary file systems are better, but more complex. And they still aren't perfect! :)
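The limitation of plain duplication can be shown with a toy model. This is not the real on-disk FAT layout; the function name first_mismatch and the use of unsigned short entries are assumptions made purely for illustration.

```c
#include <stddef.h>

/* Toy model of two mirrored allocation tables.
   Returns the index of the first entry where the copies differ,
   or n if the copies are identical. */
size_t first_mismatch(const unsigned short *copy_a,
                      const unsigned short *copy_b,
                      size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (copy_a[i] != copy_b[i])
            return i;   /* damage detected at entry i ...             */
    }                   /* ... but nothing here says which copy is right */
    return n;
}
```

The comparison tells you that entry i is damaged in at least one copy, and that is all it can tell you. Deciding which value to keep needs extra information, such as a checksum per copy, which this model (like the scheme described above) does not have.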

If you set those fundamental issues aside and consider just the language, then it again depends. The language is documented and has been around for a long time and hasn't had drastic changes between its revisions since its initial standardization in 1989. In this sense the language is generally (to the collective programming community) well known and stable and there are no secrets about it. The language standard and a number of good books and articles are available and the language can be learned from them. So, again, in principle it's possible to write proper C code. In practice, however, for this to actually happen you need to be a grammar nazi of sorts, you need to know the language inside-out or use only those constructs that you know are correct. There are a lot of quirks in C that you need to know about and that may not make sense at first sight (and some make no sense at the 100th sight :) ). It is this very combination of the programmer's ignorance and C compilers not always catching their hand (you have to ask for that and you have to know that you can and you have to know how) that makes things appear harder than they really are. When you don't know the language well you can't explain why your code isn't working and why rearranging it or changing something seemingly innocuous makes a ton of difference. Some continue to live with these mysteries. Others try to reason about the stuff and more often than not surround themselves with a number of misconceptions about the language. But the right way is to learn the fncking tool. There's no magic or god's will or anything of the kind behind the way it operates. It's documented and has been verified. [Sure, there are bugs in compilers, but when something doesn't work it's 99+% chance a bug in the input to the compiler rather than in the compiler itself.]

The C standard (the ultimate language reference) is a bit dull. Like any technical reference, it was never meant to entertain the reader the way a good poem does, for it has a different purpose. But like I said, there are gentler reads and intros into the language, with more examples and less lawyer-esque lingo. A couple of titles have been suggested to you already, I believe. You can start from there. If you can't find something in those books, your next best shot is the standard (or Stack Overflow, but before asking a question there see if something similar has been asked and answered before, and chances are it has, so search Stack Overflow first :) ). At some point you should be able to read the standard, it's not too bad (C++ is much much worse). You will find yourself increasingly familiar and comfortable with it, knowing which sections of it contain what important stuff. If/when you get there, navigating C (pun semi-intended!) will become easy as you'll be able to find answers to your questions more quickly and often by yourself.

wiz wrote:
Somehow it reminds me of an explanation I had from someone many years ago that it was not a bug when a browser crashed with an unaligned memory error.


I've debugged code on a buggy CPU several times and such a crash is pretty normal in the situation. :)

But with C it's a different level of funny. The standard says that certain things result in undefined behavior at compile time, meaning that the compiler is allowed to crash or hang or whatever if it encounters such and such input. Nice, innit? And while I don't recall Microsoft's compiler crashing while compiling C code, it is notorious for crashing or stopping with an internal error while compiling C++ code. :)

wiz wrote:
Alex and Serge I will study what you have written more carefully. But my take right now from what Alex has written is that it is not possible to write reliable code in C.


You need to be careful when writing C/C++ code. Not in the sense of touching the keys gently and always being alert and on the lookout of your physical surroundings, but questioning whether you know something for sure or are wandering into an unfamiliar territory. Begin with simple things, follow the K&R or H&S order, don't use things you haven't read about yet. Don't try to be smart as you may outsmart yourself. :) Bookmark things that you haven't understood yet or think may get forgotten while being important. Bookmark odd things. Revisit them as needed and just to refresh.

wiz wrote:
But let me give the matter some more study....


Certainly.

wiz wrote:
More fun than I expected :).


Absolutely! :)


Posted: Wed Dec 16, 2015 10:01 am
Contributor

Joined: Mon Nov 12, 2012 1:34 pm
Posts: 1092
Hi Serge and Alex and Matt,

So if I understand what you all are saying, the basic problem is which 'header' 'needs' to be included with which [possibly built-in] library functions that your program happens to be using?

You include the wrong library or get the wrong header and all bets are off as to whether your code will have hidden bugs. If you are 'lucky' your program will have some obvious problem.

And you need to 'just learn' [by trial and error?] which those are?

As far as CPU bugs are concerned, most that I have encountered remain the same at least until the chip is laid out again. Very occasionally a given CPU has its own unique problem. As long as chips or boards can be easily swapped this has not been much of a problem.

Somewhere around here I have one of the earliest 6502s, which did not have a ROR instruction. It was added in a panic soon after its real need was discovered (division speed).

But the C language problem that we are talking about is certainly much more common. No wonder Bill Gates came to the conclusion that 'all code has bugs'. Really just a language design problem.

Unless I have missed something you all are saying, the nature of the problem is starting to make sense to me.

So thank you all for your comments and help!

Lots more fun :).

Wiz


Posted: Wed Dec 16, 2015 10:09 am
Contributor

Joined: Mon Apr 29, 2013 1:56 am
Posts: 196
wiz wrote:
Maybe I just write simple programs?


Yep, do so. And for the purpose of learning the language, please do it not on RetroBSD with Smaller C, but on your PC with gcc or clang (and use the options to show you compilation warnings about your code, it's been mentioned).

Smaller C isn't quite an educational tool to help with learning the language.

wiz wrote:
I don't see how a header file as Alex suggested would have changed things?


It wasn't my advice. And I'm not very sure it's a very sound one. While on one hand the compiler should bark at conflicting declarations and definitions of the same thing (and at other violations), to be really sure there are no such naming collisions you'd need to include every standard and system header file. Which is kind of stupid and pointless, you should include only what you use. Think of it, just to do that you'd need to know about all these files. A better way may be finding a reference of reserved identifiers (macros and functions) and using that. [I may look in what I have to help with such a list.]

The other problem is that you can't expect Smaller C to do that. At the moment it does not check for conflicts in names. It turns out that it's easier not to check than to check. The reason lies not only in having to write code for the checks, but also in that if you get them wrong (or just incomplete), you'll prevent correct code from compiling. But if you don't check correct code, it will compile. :) I'm sorry, but that's how things are with Smaller C right now. I may fix this eventually, but it's not in the plans for the near future. I'm myself amazed that gcc can compile Smaller C to fit into 96KB of RetroBSD's user RAM and it works. Fortunately, there aren't too many important deviations from the language (as defined in the standard) in Smaller C.

wiz wrote:
My program defined a variable that was apparently also a name used as a function defined in the c library?


Yes.

wiz wrote:
I probably don't understand how things really work, but how would the presence of a header file to my single C program have changed things?


The compiler (not Smaller C, or at least not now) would see, in some order, the two following lines:

Code:
int exit;
void exit(int);


And it would have to attempt to memorize both for two reasons:
  • When they are referenced further in the code, the compiler would know they exist and what they are/how to "use" them
  • When they are misused or incorrectly redeclared/redefined, the compiler can catch that

So, even if the compiler does not treat exit specially (by attaching a specific meaning to this symbol like: this symbol is in the library, it must be of such and such type and be never redeclared differently and never redefined anywhere else), it will still see a conflict in the type, which it must issue an error for, regardless of exit being special in other ways.

IOW, you say "Hey, compila, I promise you exit as an int" and the compiler says back "OK". And then you give it exit, which is a function returning nothing, and the compiler goes "WTF, dude, you just promised me an int!". Smaller C has amnesia here. It always says "OK". :)
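A small sketch of the compilable side of that conversation. The conflicting line is left commented out on purpose; exit_status, set_exit_status and get_exit_status are made-up names chosen only because they do not collide with anything in the standard library.

```c
#include <stdlib.h>     /* brings in the declaration: void exit(int); */

/* int exit; */         /* uncommenting this makes gcc and clang stop with an
                           error, because exit is already declared as a function */

int exit_status;        /* hypothetical name that does not collide with the library */

void set_exit_status(int status)
{
    exit_status = status;
}

int get_exit_status(void)
{
    return exit_status;
}
```

The check only happens because the header is included in the same file as the variable; without the #include, the compiler never sees the two declarations side by side.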

wiz wrote:
Rather than say including the whole header file within my single C program itself. Or does loading parts of the c library by the linker somehow know that 'exit' should not be used this way?


I've told you already that there's code that gets executed prior to main(). This code does some things (e.g. initializes .bss, sets the gp register, flosses its teeth), calls main(), takes the value returned by main() and calls exit() with this value. This code is in src/startup-mips/crt0.c. exit() is in src/libc/stdio/exit.c. Your main() is in a third file. All three are compiled into object files. The linker first pulls crt0.o, because it contains the start symbol (and the start code) of the program. Then it looks into crt0.o further and sees that it wants symbols main and exit. And it looks for object files that contain them and goes over the object files in an order. And so, if it finds both main and exit in your object file (made from the source code you wrote), it won't be looking for any other main or exit, 'cause there can't be two of each in a single program, so why bother. And so you get a variable in place of a subroutine. The variable occupies memory alright, has a size and some byte values inside. And you can execute those as code. Just as you can view code as a sequence of data bytes. But obviously not all data is good code and vice versa and when you confuse code with data things start to look, sound and smell funny. Lots of fun.
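The lookup order can be modelled with a toy "linker" that returns the first object file defining a symbol. This is a deliberately simplified sketch and not how a real linker is implemented; struct toy_object, toy_resolve and the object file names used with it are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_SYMS 4

/* Toy object file: a name plus a NULL-terminated list of defined symbols. */
struct toy_object {
    const char *file;
    const char *defined[TOY_MAX_SYMS];
};

/* Returns the name of the first object (in the given order) that defines
   `symbol`, or NULL if none does. Later definitions are never examined,
   and the kind of symbol (function or variable) is never checked. */
const char *toy_resolve(const struct toy_object *objs, size_t n,
                        const char *symbol)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < TOY_MAX_SYMS && objs[i].defined[j] != NULL; j++) {
            if (strcmp(objs[i].defined[j], symbol) == 0)
                return objs[i].file;
        }
    }
    return NULL;
}
```

Given a list such as crt0.o (defining start), user.o (defining main and exit) and a library object defining exit, in that order, a lookup of exit lands on user.o and the library's exit is never consulted.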

wiz wrote:
How do I know this won't happen again with some yet unknown to me problem word in the c library somewhere?


You kind of get a list. The C standard has most of it (most, because there are functions outside the C standard whose names are restricted as well).

You know the "Seven Words You Can't Say On TV", don't you? :)

wiz wrote:
Is the above really correct?


Yes (a bit simplified at times, but still yes).


Posted: Wed Dec 16, 2015 10:35 am
Contributor

Joined: Mon Apr 29, 2013 1:56 am
Posts: 196
wiz wrote:
So if I understand what you all are saying, the basic problem is which 'header' 'needs' to be included with which [possibly built-in] library functions that your program happens to be using?


Yes, you should know what comes from where and go there and not elsewhere. Again, it's documented. The C standard and decent books teaching C will tell you to #include <stdlib.h> for exit() just like they tell you to #include <stdio.h> for printf().

wiz wrote:
You include the wrong library or get the wrong header and all bets are off as to whether your code will have hidden bugs. If you are 'lucky' your program will have some obvious problem.


That is a very common outcome, yes. If the header is non-existent, you get an obvious error right away. If the header exists and there's a library for it, but you only include the header and forget to link the corresponding library, you also get an obvious error, almost immediately. But if you mix and match things you shouldn't be mixing and matching (like redefining exit), all bets are off. If you don't like UB (undefined behavior), you can use ABAO (all bets are off :) ).

wiz wrote:
And you need to 'just learn' [by trial and error?] which those are?


While I can't guarantee you will never enter the holy waters of trial and error, as by now you already have, I'd discourage you from making that the main learning strategy or methodology. Like I said, many/most things are documented.

wiz wrote:
But the C language problem that we are talking about is certainly much more common. No wonder Bill Gates came to the conclusion that 'all code has bugs'. Really just a language design problem.


It was a reasonable trade-off back in the day. And it still sometimes is. But now we have more choice and there are better languages than C (perhaps, not better in every and all aspects, but in many), all thanks to C that made it happen.

wiz wrote:
Unless I have missed something you all are saying, the nature of the problem is starting to make sense to me.

So thank you all for your comments and help!


You're welcome.

There's a document by Dennis Ritchie, which describes the early days of and design decisions in C, which explain why it's so weird (English is weird too and it too had a tough childhood:) ). I don't know how much of it you can understand and appreciate now, but I think it may help understand the nature of the language. It's a 30 page pdf file titled

C Reference Manual
Dennis M. Ritchie
Bell Telephone Laboratories
Murray Hill, New Jersey 07974

It used to be hosted on DMR's page, where he worked, but the page's gone now. I can't believe people would remove it so quickly, perhaps in less than 5 years after DMR's death. So much for cheap storage and clouds. But I digress.

There are other enlightening reads, but I won't quote them now. You've got enough to read for xmas and for some time beyond. :)


Posted: Wed Dec 16, 2015 3:17 pm
Contributor

Joined: Mon Nov 12, 2012 1:34 pm
Posts: 1092
Hi Alex and all,

Thanks :). Again it will take me a little bit to carefully read and re-read what you have written.

Another observation is also making sense. To get my debugger to work correctly I copied the library routine xtoa and some other routines into my code. Then things worked.

Apparently I was [somehow] calling whatever incorrectly and it was not working. When I copied the library C code into my debugger I could see what it was doing. So I easily got things working correctly. The buffer bug [really a kernel problem] and the other bugs took a little longer to figure out.

My PIC32 comes to life in my own code including my debugger. I start RetroBSD by calling 0x9d000000. This has always worked without a problem. Now I appear to have my debugger working as a PIC32 application [maybe?]. And have learned a bit more about C.

Right now I am guessing that I cannot load programs into user RAM to test them since they may well be overwritten as swapping takes place? I guess I should try that. Programs do load OK into upper kernel RAM and run fine that way. So that is what I am now doing. In my system, kernel RAM is clear from 0x80006100 up to the kernel stack regions.

Programming in our RetroBSD system is quick and easy.

I wonder how much smaller my debugger would be written in MIPS assembly? Probably a question I 'should' answer some day :).

My first 6502 debugger ran in under 255 bytes.

It continues to amaze me that a 'real' browser somehow uses up 1/2 GB of RAM. Shameful.

Right now I am using Dillo. Works fine. Fast. Ignores the Google Giggle.

Lots of fun :).

Wiz

