Wednesday, 23 January, 2019 UTC


Summary

In a library, there are various kinds of books on a very wide range of topics. There are books on Agriculture, Science, Arts, Politics, Nature, Software Engineering (as i must especially mention… 🙂 **) and so on. These books can be called data. To store this data effectively, libraries are organized/structured using various techniques **into departments, catalogs, shelves or other collections used to show relationship.
Examples of such relationships are:
  • similar genre
  • same author
  • same publisher
  • alphabetical order etc.
The aim here is to group related content together and show order and hierarchy so that items may be located quickly(time), books ordered neatly and optimally(space) and collections browsed efficiently.
Libraries and software are actually very similar. We shall now take a closer look at the relationship between data structures and algorithms. We would consider how essential they are to computers, and in the process we will find striking similarities between software and a library. Both very distinct, yet significantly similar.
Daily, we build software that stores data in some way(structure) and manipulates that data through various algorithms consistently. Perhaps you haven't noticed, but the same thing happens in the library.
How so?
First, lets get some terminologies out of the way.
Demystifying Terminologies

Data Structures

A data structure is simply an organized way of storing data for use by computer processes. It is a pattern of collecting and organizing data for performing various operations correctly and efficiently.
Just like books in a library are ordered in certain patterns like genre, difficulty, and fiction/nonfiction, computer programs are also organized in meaningful ways. They are designed to receive, process, organize and output data in meaningful and efficient ways.
Just like the shelves help us organize books better at the library, programs have their own in-built ways of keeping data organized and working with data easily. Some of them are:
  • variables
  • conditionals
  • iterators
  • functions
  • scope
  • arrays
  • objects
You may already be familiar with the above terms.
Imagine you were writing a program where you had to store details about a person. Intuitively, what may come to mind is very likely an array or an object as shown below:
// Array Implementation
let user = ['Obosi', 'Philip', 200, 'Nigeria', 'Frontend', ['English', 'Igbo','French']]

// Object Implementation
let user = {
    firstName: 'Philip',
    lastName: 'Obosi',
    contributions: 200,
    country: 'Nigeria',
    role: 'Frontend',
    languagesSpoken: ['English', 'Igbo','French']
}
Note: The snippet above is just sample code that may be used anywhere within an actual application to store data about a person.
In the code snippet above, we make use of various things that ensure that data is meaningfully handled within an application. Let us highlight a few:

Variables and Scope

In JavaScript, let and const are used to declare variables for use within the block of code where they exist. Notice how we use let in line 2 to scope that variable to a certain block while keeping it mutable. This at a very basic level provides context as to how this block of code should be handled. Naming the variable user also adds meaning/context to the data as it tells us that this is information about a specific user.

Arrays

A pretty technical definition of arrays would probably be that they are “linear collections of elements that are accessed via indices which are usually integers used to compute offsets”. Now, i do not consider myself very dumb, but this really doesn't sound simple to me.
let user =['Obosi', 'Philip', 200, 'Nigeria', 'Frontend', ['English', 'Igbo','French'] ]
In simpler terms, an array is a container object that is used to hold a list of items usually of same type. Looking at the snippet above, we declare a variable user and assign a list of information about the user to it in the form of an array. Just like a book from our library analogy will contain a list of pages numbered in ascending order for easy navigation.
Note: In counting the items in an array, by convention we always start from 0. This means that each element holds a position with a negative offset of one.

Objects

Objects are used to store a collection of related data or functionality in the form of key-value pairs. Unlike arrays, they are designed to hold more complex entities usually of varying types. Here, we assign an object with information about the user to the variable user.
let user = {
    firstName: 'Philip',
    lastName: 'Obosi',
    contributions: 200,
    country: 'Nigeria',
    role: 'Frontend',
    languagesSpoken: ['English', 'Igbo','French']
}
A striking similarity exists again as on the Engineering shelve in the library, we find lots of publications i.e books, journals, magazines and more(values) identified by their titles(key).
Now the big question; Should it be an array or an object?
As mentioned before, data structures ought to hold and manipulate data in a meaningful way that keeps the relationship between them. Looking at the array implementation above, you’d observe that it provides little insight into the relationship between the various chunks of data and thus isn't very meaningful to a person.
The object implementation on the other hand, shows the relationship with the use of hierarchy and key-value pairs. With one glance, you can tell what each piece of information means and how they are all connected.
This example goes to emphasize that computer programs at nearly every level store and manipulate data. However, key points of consideration in how they do this are whether the relationship is preserved and data is easily accessible. This is where data structures come in.
The best data structures consume minimal resources while storing data in a meaningful way for various operations.
As is the case for most Software engineering concepts though, it is a case of “different strokes for different folks”, hence what is fitting varies with the kind of data that is being handled.
Algorithms
A quick search on Google will almost definitely return a definition like this:
a process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer
Let’s picture this in the context of a library. If you were to get a book at library and needed to pick the book up for study from the Engineering section, how would you go about this?
Would you
  • pick up every book on the shelve until you find what you want?
  • pick a book at random till you find it?
  • try to find it by alphabetical order?
  • use the library’s search guide as a reference?
Bear in mind that these are all ways to achieve the goal eventually. Also, each method highlighted above can be broken down into smaller steps that cumulatively achieve the task at hand.
For the first, you would have to:
  • walk up to the shelve
  • pick up the first book
  • check if it matches what you’re looking for
  • If it does, yayyyyy!!! (that was quick)
  • If it doesn't, you drop the book
  • pick up another and check again.
This process is continuously repeated until you find the desired book. This step by step break down of the whole process into actionable steps is what we called pseudocode in computer programming.
The four ways we considered however, can be examined carefully to determine the most efficient specifically by calculating/estimating time spent and energy consumed in the course of searching through each method.
The most optimal solution will thus be one that consumes minimal energy and takes little time to complete the task.
Algorithms are just like Granny’s recipe book that contains all the steps required to make that special delicacy. In more technical terms, they are “well-defined computational procedures that takes some value, or set of values, as input and produces some value, or set of values as output”.
This definition highlights the main properties of an algorithm which we will now consider:
  • Input- An algorithm must possess 0 or more well-defined inputs supplied externally to the algorithm.
  • Output- An algorithm should have 1 or more well-defined outputs as desired.
  • Correctness- Every step of the algorithm must generate a correct output.
  • Definiteness-Algorithms should be clear and unambiguous, thus every step of the algorithm should be clear and well defined.
  • Finiteness- The algorithm should have a finite number of steps that must be carried out to achieve the task at hand.
One thing that makes algorithms worth studying is that in most cases, there exists more than one algorithm capable of handling the task at hand as in the library case above. Again in this context, efficiency is key and is determined by considering Time complexity(the execution time of an operation) and Space complexity(memory usage of an operation) of an algorithm.
The Story so far
We can briefly summarize all we’ve learnt so far in two points:
  • Just as a library organizes books in a meaningful way for easy and efficient access, computer programs organize data in various ways too. This is so that computer processes can access and use them easily and efficiently.
  • At the library, there are various ways to search out a book, some more efficient than others. In computer programs there are various ways to manipulate data and get tasks done(algorithms). For these too, some are more efficient than others and efficiency here is determined by considering the time taken and the amount of memory that is used in the process.
Now we have gained an overview of data structures and algorithms. Why then should you worry about these things?
Modern Day Applications
Most applications today are data-driven. Intrinsically at the basic level they tend to do these four things:
  1. Accept data
  2. Store data
  3. Process data
  4. Output data
Handling such data efficiently in many cases goes beyond the capabilities of existing data structures that come packed within a language. It is a software engineer’s responsibility to create models of the problem domain at hand so that data is organized and manipulated in a consistent manner with respect to the problem at hand.
Why Data Structures?
Often, the need to handle complex data results in the creation of Abstract Data Types(ADT) which are designed to create a logical description of how data is viewed and the operations that can be carried out on them. These ADTs allow us handle data more efficiently such that we worry about what the data represents and not how it is constructed. This process is known as Data Encapsulation. Examples of such data types include the Array, List, Map, Queue, Set, Stack, Table, Tree, and Vectors.
Why Algorithms?
A fundamental understanding of various algorithms and data structures better equips a software developer to use them properly. Software Engineering is learnt and mastered through practice, so an examination of different problem-solving techniques better prepares you to handle challenges in the future. Eventually, you begin to recognize patterns and that helps you transfer your knowledge and understanding across various domains and problem sets.
Moreover, a primary concern when dealing with algorithms as earlier noted is efficiency. How is the efficiency of an algorithm determined?
Algorithmic Efficiency
There are two stages at which algorithmic efficiency is evaluated. You can think of this basically as one being a theoretical analysis and the other being practical.
  • Priori Analysis. This is the theoretical evaluation in which factors such as processor speed are assumed to be constant and are not considered as having any effect on the algorithmic implementation.
  • Posteriori Analysis. This is the practical evaluation carried out by implementing the algorithm using a selected programming language. The algorithm is then tested on a computer while observing the time taken and memore consumed.
Priori analysis makes use of asymptotic notation. It is a standard way of expressing an algorithm in terms of its time and space complexity. This is due to the fact that these are characteristics that cannot be measured or compared directly. The most commonly used type is David Knuth’s Big O Notation.
The Big O Notation also known as the upper bound of an algorithm or worst case tells us that a certain function will never exceed or take more than a specific time to execute irrespective of the value of the input(n). This is considered a very effective way to approach analysis as it is able to tell us how our algorithms will perform in the worse case, perhaps with very large input.
Conclusion
We have seen how data structures and algorithms are finely interwoven in the design of good software.
Data structures help us organize data meaningfully and algorithms manipulate data for us efficiently.
In this course, our study of algorithms will teach us analysis techniques that will allow us compare various solutions based on their own characteristics and not the characteristics of the overall program.
We will often do a posteriori analysis in order to ascertain the efficiency of the algorithms considered, however the results of a priori analysis will be considered in fitting cases.
In the end, there often exists multiple approaches to solving a problem. As a software engineer, it is your responsibility to explore these solutions and then test them to decide which is fitting in each situation and this is a recurring process indeed. This course is designed to help JavaScript developers do just that!
Further Reading
To learn more about data structures, algorithms and algorithmic efficiency, here are a few hand-picked resources:
  • Problem Solving with Data Structures and Algorithms- Interactivepython.org
  • The Importance of algorithms- Topcoder
  • Algorithmic Efficiency - Wikipedia