Today's Soapbox


Where Programmers are Born

It's been quite a while since my last soapbox.  Don't you feel drained?  More accurately, I'm the one who should feel drained.  Well, I decided I'd have a little fun and put in another entry.

This is one topic that suddenly came to mind: what does a guy do to get into programming, or at least to become halfway capable?  In my own case, I just grew more and more interested in little subtopics, but that doesn't always happen.  Pretty often, you see people who program to make their own little handy-dandy tools for entirely separate purposes.  But in the end, you still need to have a clue what you're doing.  That alone puts you on a completely separate level from the computer-illiterates who rarely delve outside of Word and Internet Exploder.

The average programmer in the industry is nothing like me.  I was taken by programming.  I can't imagine doing without it.  To the average programmer or IT guy, programming is a route to a paycheck.  It's their own business if they want to be that way, and I could respect the person if I knew enough about them, but I can't respect that attitude.  What's annoying is that people pick, from day one, a field that can make them money without first considering whether they've even found their life's passion.  And supposedly, Computer Science is the easiest road, since there is a seemingly endless demand for programmers and you can get a job with a 4-year degree.  Sure, you could make a lot more in Law or Medicine, but in the earliest stages of life, nobody wants to sign up for 6-10 years of college.  What irks me even more are the people who go into CS/CE/EE for their undergrad with the immediate intention of following up with an MBA in grad school.  It's one thing if you've been in the industry for years and suddenly decide, "hmmm...  I don't think I want to stare at a screen from 9 to 5 for the rest of my life."  It's entirely another thing if you go right into an engineering curriculum and look upon it from day one as a stepping stone that isn't worthy of any respect.

The person who is only interested in computers as a money-maker goes to school and makes sure to take classes on databases, software engineering, and networking/admin.  A person like me will take courses on graphics, information theory, numerical analysis, digital signal processing, and other things.  The money-minded programmer goes out and gets certifications like Oracle, MCSE, Cisco, etc.  A guy like me goes out, does research, publishes papers, and constantly studies and learns new things.  More than that, he tries to hide his annoyance with everyone who can't seem to respect that idea.  Most of all, and most ignorant of all, the gold-digging programmer, along with 99% of the computer-illiterate population of the world, does not see programming as a creative process.

Here I am to offer my views from one who thinks money-grubbers of the world are to be dragged out and shot.

Start with BASIC

As the name suggests, this is a pretty good starting point.  Beginner's All-purpose Symbolic Instruction Code...  Well, I don't know if I'd go as far as to say "all-purpose."  Why do I suggest starting with BASIC?  First of all, it's an imperative language.  When you get right down to it, an imperative language is important for learning to code because that is fundamentally how your computer runs.  An instruction goes through the pipeline, and it produces an absolute change based on what everything was before and what the instruction is.  If you run that same instruction from the same starting state, you'll get the same result every time.  Machine code itself is an imperative language.  Imperative programming is basically characterized by the idea of a state (that is, the information the program holds at any given moment) and commands that modify (or use) that state.

The second reason I suggest starting with BASIC is that it has several things that make life easy for a new programmer.  A seasoned programmer often has a hard time remembering how hard it was to learn to program in the first place.  And so we can't really accept that it is actually difficult.  Especially not with something so ridiculously limited and low-grade as BASIC.  BASIC is easy partially because so many of its commands are simple, with simple syntax.  At the same time, it introduces the very concept of syntactical correctness.  Another thing (if we look back to GW-BASIC) is the line numbers.  Remember that imperative programming depends on order, and line numbers are a nice illustration of order.  Of course, having goto's and gosub's kind of changes things, but that's another matter.  Most important of all, what really makes BASIC easy is the absence of concepts like scope, dynamic memory allocation, pointers, etc.  A beginner would not be able to appreciate these concepts without first having an idea of programming.  Somewhere down the line, a newbie who has gotten the hang of BASIC will think, "Damn, if only I could have a list that grew in size as I needed it to."

If someone can't really get the hang of BASIC, that probably indicates that programming is not for them.  On the other hand, someone who starts at C++ and can't seem to grasp templates or AVL trees on their earliest ventures doesn't necessarily have any disabling problem.  The thing is that learning such matters requires some preconceived notions and some prior motivation.  How does that exist if the person had no clue how to program in the first place?

Start with Pascal

Well, Pascal is another option for a first language which I also happen to advocate.  Pascal often proves to be a great learning tool.  First and most important is that it can teach you about structured programming and keeping things in order.  One of the annoying things in BASIC is that, say you want a little subtask that you keep coming back to.  If you want to maintain a regular, linear order to your code in BASIC, you've got to keep copying that code wherever you need it.  The only other option is goto's and gosub's, but that destroys the very linearity the line numbers are meant to reinforce.  In a structured language, you've divided things into separate tasks.  In the actual machine code, this is implemented just the same as a goto/gosub (albeit with a different addressing mode).  And that separation and division of tasks is part of what programming is about.  We don't have a Do-What-I-Mean language, so we have to break things down until we can think in terms of the specific instructions and operations that need to be done.  And when subtasks can be separated into procedures/functions, it actually makes for less work.

Another lovely thing about Pascal is its appearance.  It looks so much like plain pseudocode that it's very readable.  On top of that, it also has plenty of power compared to BASIC.  There's no need to rush into pointers and objects, but since Pascal has them, you can eventually learn these concepts without having to learn a whole new language.  In theory, this is still possible with C, but C has a far more bizarre syntactic construction and a far more context-sensitive grammar.  Try to imagine a line of code like " r = *q <= 1.0 ? p : p**q; "  In plenty of so-called "tutorials" on C, you'll probably see things like that.  Can you imagine how bizarre that might look to a new programmer?  And yet, the books by Kernighan and Ritchie, the very creators of the language, will tell you that's the best way to write C, because it's so compact.  In that spirit, I also don't know if I'd say Stroustrup touts the very best model of C++ code, but at the very least, he has some very ingenious tricks here and there that could only have come from the fellow who created the language.
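For what it's worth, that one-liner can be unpacked.  A sketch, assuming p and r are doubles and q is a pointer to double (the original line gives no types, so those are my guesses):

```c
/* The terse one-liner from the text.  Note that "p**q" is NOT
   exponentiation in C -- it parses as p * (*q). */
double terse(double p, double *q)
{
    double r = *q <= 1.0 ? p : p**q;
    return r;
}

/* The same logic, spelled out the way a newcomer could read it. */
double readable(double p, double *q)
{
    double r;
    if (*q <= 1.0)
        r = p;           /* the value q points at is small: keep p */
    else
        r = p * (*q);    /* otherwise scale p by what q points at */
    return r;
}
```

Both compute the same thing; only one of them advertises it.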

Prior to actually owning my own legal copy of Visual C++, I was forced to live in Pascal long beyond its useful life (although the use of inline assembler certainly extended that life).  However, even before Pascal's useful life had ended, I was often on the Pascal side of the Pascal vs. C holy wars on the newsgroups.  The one argument I had that nobody could fight was that almost every complaint about Pascal was about the compilers available, not about the language itself.  Granted, those were very valid reasons not to use Pascal in the real world, but my point was that they were not reasons to condemn the language.  The problem is that everything in the newsgroup world tends to turn into a holy war, and it all becomes a crusade to entertain one's own ego.  I've fallen into that trap as much as anybody else.  Anyone who's regularly posted to coding newsgroups would be lying if they said otherwise.

Pascal, you have to realize, was born out of ALGOL, which was primarily meant to be a highly structured language for scientific computing.  The point of Pascal was never to make fast or tight code.  It was meant to be an easier and cleaner means of writing larger-scale applications compared to Fortran and the like.  While Pascal as a language is quite dead in the professional and academic world, it's still a good learning tool.  As far as I'm concerned, the only thing that killed Pascal was history.  People were writing good-quality C and C++ compilers while, at the time, the very best Pascal compilers still produced 16-bit code.  There were, of course, outside examples like FPK Pascal, Stonybrook Pascal, and so on, but they were buggy, done by random small groups, and so only found favour with enthusiasts.  TMT Pascal and Delphi, however, are very much usable today, and as Pascal compilers they are very good learning tools, but as for the business world, they became powerful and stable far too late to overthrow C++.

Real programmers are self-taught

One of the things that makes self-teaching of programming difficult is theory.  It's very easy on your own to find information on coding.  It's not so easy on your own to find information on computer science.  I grew up in a house of nothing but engineers, so for me personally, it was not that hard.  Academic publications, meanwhile, are almost always pure theory rather than code, and they're of no use to someone who doesn't already have the knowledge to follow the topic.  Which still leaves newbies dead in the water.

Academia is at the opposite extreme.  It extends the theory far beyond the realistic or the ever-usable.  And in almost all baccalaureate programs, excluding trade schools like DeVry or Full Sail, that level of theory is a requirement.  I can't say I'll throw a party the day someone proves that P=NP.  If you can pick up a good amount of theory by yourself, then you basically don't need higher education for more than a piece of paper.  If your entire knowledge of programming came from hacking about and looking through tutorials without any theory or mathematics to back them up, you're basically fit to be an underpaid, overworked whipping boy for the work-a-day world.  If your entire knowledge of programming comes from classes and textbooks, you're fit only to be an underpaid, overworked whipping boy for academia.  If you managed to learn a whole lot of theory AND practice and you're completely self-taught without any higher education, you're basically ideal for everything, but no one will appreciate that initially, because academia gets 10 times the respect it deserves.  More importantly, to an employer, academia is a guarantee of minimum specs.  With a talented and knowledgeable self-taught programmer, it's a gamble.  In short, a self-taught code+CS-god will be an underpaid, overworked whipping boy with a bright future.  Someone who is largely self-taught in coding but still went to school for CS still has a future to speak of.  This person has the same future potential as the "self-taught code+CS-god," but because he got the piece of paper, he is less of a gamble, so he will not start his future from the dungeons.  And then, of course, there's the self-taught code+CS-god who has no use for higher education but went anyway; this person is no different from a pure coder/hacker who went to school.

The basic rule I'm getting at here is that self-teaching is more important than anything else when learning to program.  There are plenty of things floating around on the web that will NEVER, and I do mean "NEVER", be taught in school.  But no matter how much you've taught yourself, that knowledge will never be appreciated by those who don't know you.  It takes a long time for anyone to realize that you know anything, and that kind of time is not something the employing world is ever going to be willing to invest in a new hire.  Regardless of how much you care about the art of programming over salary, you still have to pay bills.

The other side of the self-teaching coin is experience.  Programming is one field where a mere 3-4 years of experience is worth more than any degree.  In some industries, in fact, it can be as little as 2 years.  But of course, you actually have to GET a job before you can have experience, so it never really adds up to an easy road.  One of the things I'm glad I did along my self-teaching road was being part of a demo group.  The value was not so much that we learned |\/|(-|d 1337 5ki11z, though that benefit did exist.  The value was working in a team.  When working in a team, you are forced to write clean, neat code that others can read and build upon.  Lately, the demo scene seems to be filled with loners who tie their code together for little more reason than a common API, and the sort of synergy that was once there has dwindled.  A seasoned real programmer can point to anyone in academia and say, "I have a real-world answer for all your pointless, insignificant ramblings."  And at the same time, point to any hacker and say, "I have clean, re-usable, modifiable, scalable code that does what your spaghetti-code does."

Assembler is Not Crap

For the past ten years, going back to my demoscene days, I have been carrying around the name CPIasminc, or C.P.I. of ASMiNC.  And as you can probably guess, the heading above is what ASMiNC means.  Anyone who says that assembler is obsolete and not worth learning is a beginner...  a beginner forever, if you dare hold true to that.  I don't recommend that newbies jump straight into assembly, but no programmer should ever be permitted to go on without learning it.

The biggest, most important, and most overlooked benefit of assembler is that it gives you a very deep understanding of how your machine actually works.  Assembler, you have to realize, is basically machine code with names applied to everything.  That said, most assemblers also support directives for various purposes, along with pseudo-instructions and common macros.  The point is, you're looking at the lowest-level language short of trying to write everything in 0's and 1's.

The second big thing with assembler is your compiler.  Most compilers today produce amazingly well-optimized code...  up to a point.  Making use of the parallel pipes, loop unrolling, inlining, adjusting calling conventions for special cases, etc. are all done very well by the compilers of today.  What they are NOT good at is SIMD optimizations.  Now Intel's compiler and Codeplay's VectorC are trying to change that, but even they are not perfect.  One of the things that makes humans so much better at this is that we can look at code and understand its meaning, whereas a compiler is never going to look at very large blocks of code at once.  A single loop is pretty meaningless on its own.  Because we can understand large quantities of code, we can analyze the operations and think of other ways to do them.  Moreover, even the Intel and VectorC compilers do not catch 100% of the possibilities.  In real life, when you want SIMD optimizations, it's best to ignore the majority of the code and apply them only to the most critical functions -- the ones that are either major bottlenecks or frequently used.
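To show the flavor of what a human does by hand here, without dragging in actual SIMD instructions: below is a sketch in plain C of restructuring a hot loop into independent accumulator chains -- the kind of rewrite that exposes data parallelism whether the lanes end up in SIMD registers or not.  The functions are mine, purely illustrative:

```c
/* Naive version: every addition waits on the previous one, a single
   serial dependency chain that no vector unit can split up. */
int sum_naive(const int *a, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hand-restructured version: four independent accumulators per pass,
   which a vectorizing compiler -- or a hand-written SIMD routine --
   can run in parallel lanes. */
int sum_unrolled(const int *a, int n)
{
    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i = 0;
    for (; i + 3 < n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)    /* leftover tail */
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```

Seeing that the two are equivalent requires understanding what the loop means, which is exactly the part compilers of the day struggled with.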

Thirdly, saying assembler is dead leaves out all the little guys.  That is, things like game consoles, where much of the code is going to be assembler.  Writing firmware for some small-scale embedded platform?  All assembler.  The software that sits in the ROM/Flash on your cellphone?  That's going to be straight C initially, with touches of assembler, and then later optimized into pure assembler so that the code fits in less memory.  Writing DSP platform tools: assembler.  Writing drivers: assembler.  Writing BIOS firmware: assembler.  And the thing is, these are mostly all entirely different instruction sets, too.  So it pays not just to know the assembler of a particular architecture, but to know it inside and out and have a feel for it.  Once you've understood an architecture to that extent, the transition to a different platform is much easier.

Also, never be dogmatic.  I've seen too many h4x0r5 who have gotten a good handle on straight x86 and x87 ASM.  The moment they look at MIPS code, though, they react...  "whoa, no machine code at all!  Well, it's kinda like assembler, but weirdly different."  *sigh~~~~*  Lesson to you morons: MIPS IS assembler.  It's a different instruction set for a different platform!  Assembler does not mean IA-32!!!

Other types of languages

Y'know, I don't mind when there are 25 or 30 different programming languages for a couple of different purposes.  But one of the things that really bothers me is that there are thousands just for ONE particular class of tasks!  It's probably better to support a language than not to, since obviously somebody's got to use it.  It just gets annoying when there's a new language every week touted as the end-all be-all of programming.  Ruby was one such example...  after the initial excitement, I can't recall hearing a word about it.

Well, as I said, an imperative language is characterized by a state and commands that alter the state.  There are two other main types of languages -- functional and logical.  Frankly, I think ALL beginners should start in imperative languages.  One, the general flow of activity is very predictable and straightforward in an imperative language.  Everything happens exactly as you've told it to happen, and in that exact order.  Two, machine code is imperative, so you gain a deeper understanding of the nature of the CPU.  Other kinds of languages tend to gloss things over and hide what is actually being produced.  Say I wrote a neural network library in Scheme...  the output code could be multi-threaded without my knowing about it.  And so I end up missing information about what I've written and why it's performing differently than I expected.  If I were to do the same in C, I'd have to initialize and enable threads myself, so I'd know why my code performs the way it does.  Three, imperative languages are easier to read and write on the whole when you're doing large-scale tasks and tools.  Just try looking at some Scheme code for GUI manipulation and imagine what DirectX or OpenGL bindings for Scheme might look like.  Four, the real world is dominated by C++, not Scheme or Prolog.  The question of whether or not you'll ever need them later is quite simply always going to have the same answer -- "maybe."

A functional language is characterized by values and functional forms.  By values, I refer not to stored information, but to information that could come through to a function at any time in any form.  And by functional forms, I mean that a function does not act on data, return to the call site, and give out a value...  instead, a function is an expression that can be applied to whatever data fits it and composed with other expressions.  In the end, an entire program is reduced to a single expression, and running the program means evaluating that one expression.

For an example (taken from an advocacy page on Haskell) --

qsort []     = []
qsort (x:xs) = qsort elts_lt_x ++ [x] ++ qsort elts_greq_x
                 where
                   elts_lt_x   = [y | y <- xs, y < x]
                   elts_greq_x = [y | y <- xs, y >= x]
What you see above is the quicksort algorithm implemented in Haskell.
Below is the same algorithm implemented in C.
void qsort(int a[], int lo, int hi)
{
  int h, l, p, t;

  if (lo < hi) {
    l = lo;
    h = hi;
    p = a[hi];

    do {
      while ((l < h) && (a[l] <= p))
          l = l+1;
      while ((h > l) && (a[h] >= p))
          h = h-1;
      if (l < h) {
          t = a[l];
          a[l] = a[h];
          a[h] = t;
      }
    } while (l < h);

    t = a[l];
    a[l] = a[hi];
    a[hi] = t;

    qsort( a, lo, l-1 );
    qsort( a, l+1, hi );
  }
}
Well, the C one probably looks a lot worse, doesn't it?  Five lines of Haskell code doing the same task.
One of the nice things about the Haskell code is that it really gives you a better idea of how the quicksort algorithm itself works.  But the really good thing about the C code, as opposed to the Haskell code, is that it does the sort IN PLACE.  The Haskell code keeps allocating lists and joining lists.  The C code will clearly run faster, save a lot of memory, and prevent fragmentation of memory.  On the other hand, the C code has a certain weakness...  it is bound to a single type.  It can only sort integers.  I'd have to rewrite that code again to do anything with floats or structs.  This is largely why the built-in quicksort in C has you pass in the element size as well as a pointer to a comparison function.  The Haskell version is not bound to any one type.  It can sort lists of anything.  It can even sort lists of lists.  Anything on which you can use comparison operators.  Another thing that is nice about the Haskell version is that it's crash-proof.  It's still strongly typed in that you cannot use non-comparable types -- e.g., you can't sort booleans or pointers.  Also, Haskell makes it impossible to typecast, so you can't substitute a pointer for a 32-bit integer like you could in C/C++.

Anyway, for anyone accustomed to imperative languages, the transition to functional languages is quite painful.  And I personally still can't stand most of them.  What I showed you was a single task, a single routine.  Doing data structure management in a functional language is always very clean.  But dealing with things like files or GUI management in a functional language makes for a really hideous mess.  Especially since most APIs out there are C or C++ bindings.  We've all heard about Microsoft's .NET, which would theoretically allow us to mix our languages very seamlessly, but one little problem is that everything goes through objects.  That's fine for C++ or Java bindings, but something like Haskell doesn't have objects, and so the language itself has to be slightly mutated for the sake of the .NET platform.  How seamless is that?

In my days in lower-level school, the first functional language that people got introduced to was Logo.  We all remember Logo, right?  That slow-ass Apple II turtle-graphics interpreter?  But it was relatively painless because it was limited in its function.  Similarly, the world's most common functional language, HTML, is also relatively painless because it's limited to a certain set of tasks.  In the case of HTML, since you're describing the flow of a document, an interpreter is rather simple, because you cannot reduce the expressions any further than they already are.  One day, someone might develop such a thing as an "algebra of typesets," but until then, HTML will be a linearly interpreted functional language that assumes all expressions are pre-reduced.  That is also exactly why HTML is simple.  On the other hand, something like Haskell or Lisp is made to be general-purpose with a flair for a certain class of tasks.  They are made to be languages in which one might write an executable.  As a result, they are stretched far thinner, and far more prone to producing things that hardly look like code at all.

The third class of languages is logical languages.  These are characterized by relationships and inferences.  In an imperative language, we'd define squaring a number as y = x * x.  In a functional language, we'd define a function as a mapping from number --> number, whose body is the expression x * x.  In a logical language, we'd declare a relation -- conceptually an infinite set of pairs (X, Y), represented rather than actually stored -- such that X is some number and Y is X * X.

Again, I've lived in imperative languages for so very long, and my brainstorming is always in the form of Assembler and C++ and C (in that order)...  more than anything else.  So it's very hard for me to accept other styles of programming.  Also, most of my learning of languages like Scheme and Prolog was for individual classes in college...  all of which had the most amazingly poor examples of teaching I've ever known.  Early on, I'd often vent my frustrations at the pitiful example of "higher education" displayed before me in the class material.  As that passed, I just found that logical languages like Prolog were so utterly foreign to me that I could never develop any liking for them, for any functionality I might even give a damn about.  As a strongly imperative programmer, I think in terms of pure mathematics and not abstractions.  I think in terms of actions producing results rather than relationships being determined and sets being scanned.  So to me, logical languages are like drifting dinghies in the middle of the ocean, carried along by the currents.  Functional languages I see as high-class yachts with plenty of amenities but a poor capacity for passengers, and only a single sail for power.  Imperative languages I see as gigantic oceanliners that offer the most enjoyable cruise vacations conceivable, except that the liner is powered by a nuclear reactor using core fuel of weapons-grade purity.   [If you don't get it, just think: bad]

You can argue with me all you like, but even today I personally regret ever having learned anything about Prolog and Scheme.  They have their places, especially in AI and all...  but I just don't like them.  If I were completely against their very existence, though, I probably would have just kept my mouth shut and not let the newbie know anything about them.  All programming languages still technically suck very much, but I can't hate the languages that give me direct power and control and more understandable flow...  at least not as much as I hate the other languages.

What is a programmer?

One of the biggest things a programmer is, is a mathematician.  Everything you do boils down to computations and methodologies and ways to solve things.  The majority of tasks involve computing various systems of equations and moving the results along to another place.  Some of the mathematics is quite on the order of traditional "ugly math."  Other tasks are far more abstract.  Something like writing a compiler is still very much mathematics, but it's a purely abstract mathematics involving grammar rules and their expressivity.  Even a largely memory-intensive task can still require unique mathematical tricks to achieve good performance.  Note, though, that the thinking of a computer scientist is not quite identical to that of a pure mathematician.  A computer scientist is able to deliver shortcuts.  We do not think of 3520 as 11 * 320...  We think (11 * 256) + (11 * 64).  When a mathematician thinks of quaternions, he thinks in terms of the algebraic side.  A computer scientist will know the algebraic side, but also see a data structure that holds four floating-point members, which can in turn be used to store the same parameters that are used in the computations themselves.  A pure mathematician works to reduce things to equations.  A computer scientist reduces things to algorithms and equivalencies.

This gets me to the second big thing a programmer is: a problem-solver.  No matter what it is, you are coming up with ways to solve a problem.  You're coming up with a way to do a task.  Even for a simple thing like "I want this list sorted," you will still be deciding which sorting algorithm is best suited to the problem.  You probably won't be inventing your own sorting algorithm, but you are still solving a problem with a priori knowledge.  And anything that looks really difficult to solve calls for a programmer who can come up with an ingenious way to deal with it, or at least an approximate solution.  And it won't just be task methodologies.  There will also be performance problems and efficiency problems to solve.  And along the way, you'll hit many bumps, and you'll have to deal with them, too.

And those bumps come in spades.  We've all heard the phrase that "90% of programming is debugging."  That means another role for the programmer: detective.  All those bugs have to be sniffed out, and searching for the sources of problems takes up a good chunk of your time.  And this is not easy.  It involves more intuition than anything else.  The tools referred to as "debugging" tools are simply there to give you hints and clues as to what's going wrong.  From those, you piece together what's happening.  Moreover, there are always cases where they cannot help you at all, and then it boils down to your understanding of how the code functions.

That leaves one big thing that people always overlook.  A programmer is an artist.  This is not all about some down-and-dirty math and data manipulation and algorithms that are apparently the bane of the layman.  There is a degree to which you are doing things by feel.  There is a degree to which you are really trying to come up with something ingenious.  There is a degree to which you're trying to make things clean and accessible for others.  Every so often, a true programmer works his way down, has an "Aha!" moment, and his creative juices are suddenly flowing.  The computer-illiterates of the world see programmers as guys who make the machine do what it's supposed to do while simply following the instructions of their bosses.  At no point do they see that a programmer is not just some robot who's earning a living.  And similarly, a money-grubbing programmer who sees computing only as a road to money will never appreciate the idea of programming because you want to.  Someone who resides in databases day in and day out only to produce an interface is much less likely to see the side where people are being creative in their coding and really making computer programming into an art.

Those who make an art of it also have another aspect.  A real programmer, just like any other scientist, is a hopeless romantic.  There is a certain way of thinking that makes something in the world of computing interesting to someone.  Only those people can actually look at an idea and see it unfold.  Only a true romantic can think out an algorithm and picture every last detail of its workings without writing a single line of code.  Most importantly, this is the kind of person who really enjoys his work.  And that is exactly how I see the meaning of doing something for a living.  How easily can you say you are living if you don't even see your life's work as a part of life?  If you're just there to earn bucks, then what meaning does it have?  It's little more than an accessory that keeps your bills paid.  It's a rather pointless existence when something you do everyday is carrying weight only in your wallet and not in your heart.

