CSAM Back to Basics: Software

Software makes the machines hum, whether it is the now-ubiquitous mobile apps, the enterprise-grade software suites, or the under-the-hood workings of the operating system. There are two classic types of software, compiled and interpreted (along with many other variations and groupings). Compiled code is written as a complete unit, whose instructions are then treated as data by a program called a compiler, which translates them into binary code. This is an expensive process, and not very flexible, so why does this approach persist? Good compilers can optimize the performance of the resulting binary code and catch many errors before the program ever runs, so code that is used often, consumes a lot of computing resources, or must contain as few errors as possible is often compiled. Coding languages such as C++ and FORTRAN are generally compiled languages, as are LISP and Java in many implementations (Java is usually compiled to bytecode for a virtual machine rather than directly to machine code). In theory you could “prove” these programs are correct, that is, that they do exactly what you want and nothing else, although this is rare in practice since it is very, very expensive.
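To make the idea of "instructions treated as data" concrete, here is a minimal sketch using Python's built-in compile() function. It is only an analogy: Python's compiler produces bytecode for its own virtual machine, not native binary code, and the helper name try_compile and the sample source strings are invented for this illustration. The point is that the compiler receives the program as plain text and can reject a faulty one before any of it runs.

```python
# Source code is just data (a string) until a compiler translates it.
good_source = "total = price * quantity"
bad_source = "total = price * (quantity"   # deliberate mistake: unclosed parenthesis


def try_compile(source):
    """Hypothetical helper: return a compiled code object, or None if rejected."""
    try:
        # compile() translates the whole unit up front, without executing it.
        return compile(source, "<example>", "exec")
    except SyntaxError:
        # The error is caught at compile time, before anything has run.
        return None


good_code = try_compile(good_source)
bad_code = try_compile(bad_source)

# Only the successfully compiled unit can be executed.
namespace = {"price": 3, "quantity": 4}
exec(good_code, namespace)
print(namespace["total"])
```

Note that this compiler only catches a malformed program here; compilers for languages like C++ check far more (types, for example), which is part of why they are favored when errors are costly.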

Interpreted languages, on the other hand, support much faster development and work by simply executing instructions as input is received. This is common in applications such as browsers, which wait for you to hit the “Return” key and then take you to the desired web page (makes you wonder what they are up to the rest of the time…). Perl, Python, Ruby and JavaScript are all examples of interpreted languages. Faster, more flexible, interactive code is a huge positive, but it is difficult or impossible to test the infinite, arbitrary set of possible inputs, and so the code is more prone to behaving in unexpected ways. Some cybersecurity attacks are in fact specifically designed to find and exploit these unexpected behaviors. In modern software practice these distinctions are increasingly blurred: just-in-time compilers optimize interpreted code while it runs, and line-by-line debuggers let developers step through compiled code interactively, so the approach can be matched to the purpose.
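A small sketch of what "behaving in unexpected ways" can look like in an interpreted language such as Python. The functions double and double_checked are made up for this illustration, and the surprise here comes from Python checking types only while the code runs, which is one common source of such behavior rather than the only one.

```python
def double(value):
    """Intended use: double a number."""
    return value * 2


def double_checked(value):
    """Same idea, but refuse any input that is not a plain number."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("expected a number")
    return value * 2


print(double(21))      # the case the author had in mind
print(double("21"))    # text is silently repeated instead of doubled
print(double([1]))     # a list is silently repeated as well

try:
    double_checked("21")
except TypeError as err:
    print("rejected:", err)
```

Nothing stops the first function from accepting text or a list, and it returns a plausible-looking result instead of failing. The checked version narrows the set of accepted inputs, which is the general defensive idea: the fewer inputs a piece of code accepts, the fewer there are to test.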

Software errors are often the entry point for cyberattacks. So why are there so many errors? Sometimes it is simply bad programming. Defenses against one type of attack, cross-site scripting, have been well known and publicized for a number of years, and preventing such attacks is simply a matter of using good coding practices; yet the vulnerabilities persist. Much more often, however, it is systemic issues with the coding process that create the errors and occasionally trip up even the best software engineers. Here are a few:

– Complexity. Yes, there it is again. It is almost impossible to write error-free code for anything but the simplest of systems, much less the enormously complex systems commonly in use today. It’s like writing a long novel without any spelling errors. But it is worse than that, since not only the spelling and grammar have to be correct; so too do the logical constructs implemented in the code. Perhaps this is more akin to writing the perfect mystery novel, where all the clues, both necessary and misleading, have to be present in exactly the right order and proportion so that readers have the information required to determine “Whodunit”, but generally do not.

– Size. Software development is a team sport, often involving large, distributed teams over an extended period of time. One study looked at over 4,000 software projects completed since 1994 and analyzed the team sizes involved. On average, teams of 30 or more people took just under 9 months for a project of 100,000 lines of code. Astonishingly, teams of 5 or fewer people completed 100,000 lines of code in just over 9 months…only one additional week. The difference was in the rate at which errors were introduced, and then discovered and fixed. This is certainly a management issue, but also an indicator of the penalties that size and complexity introduce.

– Functionality. The first order of business when writing software is to get it to do the functions you intend, and this is often tricky enough. The negative, making sure that the resulting software doesn’t do anything you did NOT intend, is both far more complicated and often under-appreciated by businesses whose revenue depends on shipping working product.

– Test incompleteness. Basically, you can’t test quality into completed code. There are now good statistics on the expected number of errors per 1,000 lines of code, and on how much testing at various stages of development reduces that number (never to zero). The best practices are to write good code in the first place, and to try to find errors early.

– Shared code. Almost nobody writes large applications from scratch these days. Open source code and shared code libraries, such as those hosted on GitHub, are rational, positive ways to reduce the cost of software development. But it means that nobody fully understands the code they deliver, or can be sure of the types of errors that may lurk within. Crowdsourced testing of open source components seems to result in better code, but the last couple of years have also seen cyberattacks built around errors in some of the ubiquitous underlying modules from the open source libraries.
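As for the cross-site scripting example mentioned earlier, the "good coding practice" usually comes down to never placing user-supplied text into a web page without escaping it first. The sketch below uses html.escape from Python's standard library; the function render_comment and the sample attack string are invented for illustration, and real web applications normally rely on a templating system that does this escaping automatically.

```python
import html


def render_comment(comment):
    """Hypothetical helper: wrap user-supplied text in a paragraph tag.

    html.escape() replaces characters such as < > & and quotes with
    harmless entities, so the browser displays them instead of running them.
    """
    return "<p>" + html.escape(comment) + "</p>"


# Text a malicious visitor might type into a comment box.
attack = "<script>alert('hi')</script>"

safe = render_comment(attack)
print(safe)
```

Without the call to html.escape(), the visitor's script tag would be delivered to every reader's browser as live code. With it, the same text arrives as inert characters.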

So what can a user do? Keep your computer’s code base current by installing the patches routinely pushed by software vendors. Be careful where you get your software. For example, Apple has been well known for controlling its app publishing process much more closely than Android, which at least initially allowed almost anybody to publish code. The result is that Android phones are attacked much more frequently than Apple phones. And be aware of the unexpected ways in which software behaves. For example, a number of articles are now being written discussing how Google Maps tracks the location of users, and how to prevent this if you desire. Mobile devices allow you reasonable control over many of these types of functionality, so it is worth the time to configure them properly.

