Category: Algorithms

Ad infinitum

Quality is fundamental in any job, and software is no exception. Fairly good software is relatively easy to write, but really good software is an art that few truly master.

While in some places you see a complete lack of understanding of the minimal standards of software development, in others you see an excess of it. Neither is any good. In the end, as we all know, the only thing that prevails is common sense. Quality management, all sorts of tests and refactoring are fundamental to agile development, but being agile doesn’t mean being time-proof.

One might argue that, if you keep refactoring your code, one day it’ll be perfect. That if you have unit tests, regression tests and usability tests (and they’re also being constantly refactored), you won’t be able to revive old bugs. That if you have a team always testing your programs, building a huge infrastructure to assure everything is user-proof, users will never get a product they can’t handle. It won’t happen.

It’s like special relativity: the more speed you gain, the heavier you become, and the harder it gets to gain more speed. Unlike physics, though, there is a clear ceiling to your growth curve, from which you fall rather than stabilize. It’s the moment when you have to let go, take what you’ve learned and start all over again, probably making the same mistakes and certainly making new ones.

Cost

It’s all about cost analysis. It’s not just about money; it’s also about time, passion, hobbies. It’s about what you’re going to show your children when they grow up. You don’t have much time (they grow up pretty fast!), so you need to be efficient.

Being efficient is quite different from achieving the best quality possible, and being efficient locally can also be very deceiving. Hacking your way through every problem, unworried about the near future, is one way of screwing things up pretty badly, but being agile can lead you to the same places, just over prettier roads.

When the team is bigger than one person, you can’t possibly know everything that is going on. You trust other people’s judgement, you understand things differently and you end up assuming too much about some things. Those little things add up to the amount of testing and refactoring you have to run for each and every little change, and your system will inevitably grind to a halt.

Time

For some, time is money. For me, it’s much more than that. I won’t have time to do everything I want, so I’d better choose wisely, giving the right weight to the things I love or must do. We’re not alone, nor is everything we do just for ourselves, so it’s pretty clear that we all want our things to last.

Time, for software, is not a trivial concept. Some good software never even gets a chance, while some really bad software is still massively used. Take the OS/2 vs. Windows case. But some good software (or algorithms, or protocols) has also proven to be much more valuable and stable than anyone ever predicted. Take IPv4 networking and the Unix operating system (wearing new clothes nowadays) as examples.

We desperately need to move to IPv6, but there’s a big fear of doing so. Some people have been advocating for decades that Unix is long deprecated, and still it’s by far the best operating system we have available today. Is it really necessary to deprecate Unix? Is hardware really ready to get the best out of micro-kernels written in functional programming languages?

For how long does software live, then?

It depends on so many things that it’s impossible to answer that question, but there are some general rules:

  • Is it well written enough to be easy to enhance at users’ request? AND
  • Is it stable enough that it won’t drive people away with constant errors? AND
  • Does it really make a difference in people’s lives? AND
  • Are people constantly being reminded that your software exists (both intentionally and unintentionally)? AND
  • Is there nothing else around that is much better? AND
  • Is the community big enough to make migration difficult?

If you answered no to two or more of these questions, be sure to review your strategy: you might already be losing users.

There is another path along which you might find your answers:

  • Is the code so bad that no one (not even its creator) understands it anymore? OR
  • Is the dependency chain so unbearably complicated and recursive that it fails (or works) sporadically? OR
  • Has the creator left the company/group and doesn’t give a damn about answering your emails? OR
  • Are you relying on closed-source/proprietary libraries/programs/operating systems? OR
  • Does your library or operating system have no support anymore?

If you answered yes to two or more of these questions, be sure to review your strategy: you might already be in a one-way dead end.

Ad infinitum

One thing is for sure: the only thing that is really unlimited is stupidity. There are some things that are infinite but limited. Take a sphere: you can walk along a great circle until the end of all universes and you won’t reach an edge, yet the sphere is limited in radius and thus in size. Things are, ultimately, limited in the number of dimensions in which they’re unlimited.

Stupidity is unlimitedly unlimited. If the universe really has 10 dimensions, stupidity has 11. Or more. The only thing that will endure, while the last creature on the last planet of the last galaxy is still alive, is its own stupidity. It’ll probably have the chance to propagate itself and the universe for another age, but it won’t.

In software, then, bugs are bound to happen. Bad design will play its part, and there will come a time when you have to let your software rest in peace. Use your time in a more creative way, because for you there is no infinite time or resources. Your children (and other people’s children too) will grow up quickly and deprecate you.

Calliper, chalks and the axe!

Years ago, when I was still studying physics at university in São Paulo, a biochemist friend stated one of the biggest truths about physics: “A physicist is someone who measures with a calliper, marks with chalk and cuts with an axe!”

I didn’t get it until I had gone through some of the courses that teach you how to use the mathematical tools available: extrapolate to the most infamous case, then expand in a series, take the first term and prove the theorem. If you get to the second term, you’re doing fine physics (but floating-point precision will screw it up anyway).

Only recently have I learnt that some scientists are achieving a lot by going in the opposite direction. While most molecular dynamics simulations are going to the quantum level, taking ages to reach an averagely reasonable result (by quantum standards), some labs are actually beating them in both speed and quality of results by focusing on software optimization rather than going berserk on the physical model.

It’s not like the infamous Russian pen (which is a hoax, by the way); it’s just the usual over-engineering we see when people are trying to impress the rest of the world. The Russians themselves can produce some pretty dumb simplifications, like the cucumber picker, or over-engineering, like the Screw Drive, which in the end created more problems than it solved.

Clearly, in software development the situation can be just as bad. The complexity of over-designed interfaces or over-engineered libraries can render a project useless in a matter of months. Working around them would grow the big ball of mud, and rewriting from scratch would take a long time, not to mention introduce more bugs than it solves.

Things that I’ve recently seen as over-engineering include:

  • Immutable objects (as arguments or in polymorphic lists): when you build some objects and feed them into polymorphic immutable lists (when creating a bigger object, for instance) and then need to change them afterwards, you have to copy, change and write back.
    This is not only annoying but utterly inefficient when the list is big (and thousands of objects need to be copied back and forth); see the first sketch after this list. The way out is to use the bridge pattern and create several (RW) implementations of your objects, lists and whatever else you have, but that also adds a lot of code complexity and maintenance.
    My view of the matter is: protect your stuff from other people, not from yourself. As in “library-consistent” or “package-wise consistent”.
  • Abuse of “standard algorithms”: OK, one of the important concepts in software quality is the use of standards. I’ve written it myself, over and over. But, like water, using no standards will kill your project just as surely as abusing them.
    So, if you create a std::set that gives you the power of O(log N) searches, why on earth would you use std::find_if ( begin(), end(), MyComparator() );, which gives you a linear search? Worse, that find was actually run before each and every insert! std::set guarantees O(log N) per insertion, roughly O(N log N) to build the whole set, but this “standard fail-safe assurance” turned the build into O(N²). For what? To ensure no duplicate entries were ever inserted into the set, which was yet another thing already guaranteed by the container in question (see the second sketch after this list).
    All in all, the programmer was only trying to follow the same pattern over the entire code. A noble cause, indeed.
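
As a first sketch of the immutable-list point above, here is a minimal C++ illustration. The Reading type and the scaling operation are hypothetical, invented only to make the copy-change-write-back cycle visible:

    #include <memory>
    #include <vector>

    // Hypothetical immutable value type, for illustration only.
    class Reading {
    public:
        explicit Reading(double v) : value_(v) {}
        double value() const { return value_; }   // no setters: the object is immutable
    private:
        double value_;
    };

    // "Changing" elements of an immutable polymorphic list means rebuilding it:
    // copy every element, apply the change, write the whole list back.
    std::vector<std::shared_ptr<const Reading>>
    scale_all(const std::vector<std::shared_ptr<const Reading>>& readings, double factor) {
        std::vector<std::shared_ptr<const Reading>> updated;
        updated.reserve(readings.size());
        for (const auto& r : readings)             // thousands of copies if the list is big
            updated.push_back(std::make_shared<const Reading>(r->value() * factor));
        return updated;
    }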
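
And a second sketch, this time of the std::set abuse, contrasting the redundant find-before-insert pattern with what the container already guarantees. The Entry type is made up for the example:

    #include <algorithm>
    #include <set>

    // Hypothetical element type, just for the example.
    struct Entry {
        int id;
        bool operator<(const Entry& other) const { return id < other.id; }
    };

    // The pattern described above: a linear scan before every insert,
    // even though std::set already rejects duplicates in O(log N).
    void insert_with_failsafe(std::set<Entry>& entries, const Entry& e) {
        auto dup = std::find_if(entries.begin(), entries.end(),
                                [&](const Entry& x) { return x.id == e.id; });  // O(N)
        if (dup == entries.end())
            entries.insert(e);                                                  // O(log N)
    }

    // What the container already does on its own: insert() is O(log N) and
    // simply refuses the element if an equivalent key is already present.
    void insert_plain(std::set<Entry>& entries, const Entry& e) {
        entries.insert(e);   // the returned pair's .second tells whether it was actually inserted
    }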

Now, I’m still deciding which is worse: over-engineering or under-engineering… Funny, though, both have very similar effects on our lives…

Serial thinking

I wonder why the human race is so tied up with serial thinking… We are so limited that even when we think in parallel, each parallel line is serial!

What?

Take the universe. Every single particle in the universe knows all the rules (not many) that it needs to follow. On their own, the rules are dumb: you have mass and charge and can move freely through empty space. But join several particles together and they form a complex atom with many more rules (combined from the first ones) which, if combined again, form molecules, which form macro-molecules, which form cells, which form organs, which form organisms, which form societies, and so on. Each level makes an exponential leap in the number of rules over the previous one.

Then, the stupid humanoid looks at reality and says: “That’s too complex, I’ll do one thing at a time.” That’s complete rubbish! His zillions of cells are each doing zillions of different things, his brain is interconnecting everything at the same time, and that’s the only reason he can breathe, wee and whistle at the same time.

Now take machines. Industrialization revolutionized the world by putting one thing after the other; Alan Turing revolutionized it again by putting one cell after the other on the Turing tape. Today’s processors can only think of one thing after another because of that.

Today you have multi-core processors doing different things, but each one is still working in serial (Intel’s HyperThreading is inefficiently working in serial). Vector processors like graphics cards and big machines like the old Crays do exactly the same thing over a list of different values, and quantum computers will do the same operation over an entangled bunch of qubits (which is quite impressive), but still, all of it is serial thinking!

Optimizing code means reducing the number of serial steps; parallelizing code means putting smaller sets of serial instructions to work at the same time. Even message passing is serial on each node, and the same goes for functional programming and asynchronous communication: everything is serial at some point.
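
A minimal sketch of that point, assuming nothing beyond the C++ standard library: the “parallel” sum below is really just two serial loops running side by side.

    #include <numeric>
    #include <thread>
    #include <vector>

    // "Parallel" code as a few serial loops running at the same time:
    // each thread still walks its chunk one element after the other.
    long long parallel_sum(const std::vector<int>& data) {
        const size_t half = data.size() / 2;
        long long left = 0, right = 0;

        std::thread worker([&] {                       // serial loop #1
            left = std::accumulate(data.begin(), data.begin() + half, 0LL);
        });
        right = std::accumulate(data.begin() + half, data.end(), 0LL);  // serial loop #2
        worker.join();

        return left + right;
    }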

Trying to map today’s programming languages or machines to work at the holographic level (as the universe does) is not just difficult, it’s impossible. The Turing machine is serial by concept, so everything built on top of it will be serial at some point. There must be a new concept of a holographic (or fractal) machine, where each part knows all the rules but only in volume do they produce meaningful results, and where code is written not by organizing high-level rules but by creating a dynamic for the simple rules that leads to the expected result.

How then?

Such a holographic machine would have a few very simple “machine instructions” like “the weight of a photon is 0x000” or “the charge of an electron is 1.60217646 × 10^-19”, and time would define the dynamics. Functions would be pre-defined arrangements of basic rules that must be stable, otherwise they’d blow up (like too many protons in a nucleus); but they wouldn’t blow up the universe (as in throwing exceptions), they would blow up the group itself, which would split into lots of smaller groups, down to the indivisible particle.

The operating system of such a machine should take care of the smaller groups and try to keep the groups as big as possible by rearranging them in a stable manner, pretty much as a god would do to its universe when it goes crazy. Programs running on this operating system would be able to use the god’s power (GodOS libraries) to manipulate the groups at their own discretion, creating higher beings able to interact, think and create new things… maybe another machine… maybe a machine able to answer the ultimate question of Life, the Universe and Everything.

I know that letting the machine live would be the proper way of doing it, but that could take a few billion years, and I’ll be quite tired of engineering the machine and its OS by then and will just want to get the job done quickly after that…

Why?

There is a big fuss about NP-complete problems, those we don’t know how to solve in reasonable (polynomial) time. The classic example is the travelling salesman problem, where a salesman has to visit each of a number of cities: which path visits all of them over the smallest possible distance? With 3 or 4 cities it’s quite simple, but with many, say 300, it becomes impossible for normal (serial) computers to solve exactly.
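
To make the explosion concrete, here is a minimal brute-force sketch (the coordinates are made up): it tries every tour starting from a fixed city, which is (N-1)! tours in total, and that factorial is exactly why a few hundred cities are hopeless for this approach.

    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <numeric>
    #include <vector>

    struct Point { double x, y; };

    // Total length of the closed tour that visits cities in the given order.
    double tour_length(const std::vector<Point>& cities, const std::vector<int>& order) {
        double total = 0.0;
        for (size_t i = 0; i < order.size(); ++i) {
            const Point& a = cities[order[i]];
            const Point& b = cities[order[(i + 1) % order.size()]];
            total += std::hypot(a.x - b.x, a.y - b.y);
        }
        return total;
    }

    int main() {
        // Hypothetical coordinates, just to make the sketch runnable.
        std::vector<Point> cities = {{0, 0}, {1, 5}, {4, 2}, {6, 6}, {3, 7}};
        std::vector<int> order(cities.size());
        std::iota(order.begin(), order.end(), 0);

        double best = tour_length(cities, order);
        while (std::next_permutation(order.begin() + 1, order.end()))  // fix city 0 as the start
            best = std::min(best, tour_length(cities, order));

        std::printf("shortest tour: %.3f\n", best);
        return 0;
    }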

Another rather fancy problem is the Steiner tree problem, where you have some points and you want to connect them using the least total length of string. This is as hard as the problem above and can take forever (longer than the age of the universe) for relatively small sets of points, but if you use water and soap the problem is solved almost instantly.

Of course, soap films cannot calculate the last digit of pi, but because every part of the film knows a small list of basic rules (surface tension, shaped by the soap molecules and ultimately by the charges between atoms), every particle of the machine works together at the same time, and the result is achieved only because the dynamics of the system reach their least energy (least total length of string) in that state.

It’s true that today’s computers are very efficient at a wide range of problems (thanks to Turing proving which classes of problems his tape could solve), but there are some they can’t handle, given that we only have a few billion years of universe left to spare. Such problems could be solved if there were a holographic machine.

UPDATE:

More or less what I said was put into practice here. Thanks, André, for the link; this video is great!

True wisdom from randomness

You can live a whole life and remain stupid, but a stupid program using a pseudo-random number generator and a clever algorithm (a Markov chain) can outdo us quite easily:

  • Input: The GNU GPL license
  • Output:

    “GNU General Public License along with you add to sue for details.”

  • Input: man perl
  • Output:

    “PERL (higher numbers usually being affected by wraparound).”

  • Input: My own wiki
  • Output:

    “Bioinformatics is a physicist (definitions of enforcing standards but it puts wrong things can build complex information systems and nothing is totally unacceptable for every new piece of giving generic answers.”
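
For reference, a minimal sketch of the kind of generator that produces sentences like these (word-level, order one; the 12-word output length is an arbitrary choice): record which words follow each word in the input, then walk the table picking a random successor at every step.

    #include <cstdlib>
    #include <ctime>
    #include <iostream>
    #include <map>
    #include <sstream>
    #include <string>
    #include <vector>

    // Word-level Markov chain: map each word to the words that followed it
    // in the input text, then wander through the map at random.
    int main() {
        std::string text, word, prev;
        std::getline(std::cin, text, '\0');            // read the whole input (e.g. the GPL text)

        std::map<std::string, std::vector<std::string>> next;
        std::istringstream words(text);
        while (words >> word) {
            if (!prev.empty())
                next[prev].push_back(word);
            prev = word;
        }
        if (next.empty())
            return 1;

        std::srand(static_cast<unsigned>(std::time(nullptr)));
        std::string current = next.begin()->first;     // arbitrary starting word
        for (int i = 0; i < 12 && next.count(current); ++i) {
            std::cout << current << ' ';
            const std::vector<std::string>& options = next[current];
            current = options[std::rand() % options.size()];
        }
        std::cout << current << '\n';
        return 0;
    }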