USER FRIENDLY EXPOSITION OF COMPUTER SCIENCE SUBJECTS FOR MASS EDUCATION
The rapid proliferation of computer science training and educational requirements in India, catering to the explosively growing outsourcing industry, has led to many problems. A major problem has been the proliferation of engineering colleges without the requisite dedicated, competent and skilled faculty in core subjects. An experiment conducted over the last decade in computer science instruction for mass formal and informal education in a core subject is reviewed. Successful courseware and material have been created over the last few years for not-for-profit mass distribution and instruction in the core subject, for either formal instruction or self-study at the undergraduate and beginning graduate level in computer science. The target is a variety of engineering and computer applications colleges and computer training institutions of varying quality in the interior regions of India, in remote towns and villages. It is expected that, with the growing global village and the rising costs of real estate and infrastructure in the overburdened cities, a major shift of outsourcing to the interior regions of the Third World is imminent.
The Y2K problem and the dot-com boom and bust led to a major revolutionary change in the international software scene: the Indian IT professional became identified with outsourcing at the lower levels and phases of the software development life cycle (SDLC). This led to an explosion in outsourcing to India over the last decade. In parallel, engineering colleges mushroomed for formal education in computer science, information technology and computer applications. The insatiable need of the software outsourcing industry for personnel has led to an acute shortage of trained and competent faculty in the engineering and computer applications colleges in core computer science subjects.
The rapid mushrooming of engineering and computer applications colleges has led to a dichotomy in the mass-production-of-graduates oriented educational process over the last two decades in India. This is best explained by an example drawn from the area of Modeling and Simulation. On his visit to India in the seventies, Prof. C. A. R. Hoare of Oxford pointed out the unreality of the simulation process. When the actual machine being simulated is actively and noisily doing the work with utmost vigor, the simulator maintains a ghostlike silence. When the machine is idle and silent, the simulator grinds away furiously. Thus many of the formal educational and training institutions, and the associated plethora of statutory bodies regulating them, are like the simulator: concentrating, in all forums with enormous fanfare, on the academic requirements, paperwork, administration and other formalities with utmost vigor, without the students and teachers being really involved. At times the teachers and students may be mostly or entirely absent. When the simulator is totally silent (in the wee hours of the morning, in the evening and late hours of the day, on holidays, or during the seasonal or yearly vacation), the actual process of acquiring information and knowledge by the students takes place. Many coaching and training institutions have sprung up all over for imparting the actual educational requirements of teaching, coaching and all that goes into actual information and knowledge being imparted to the student community. This is like the machine grinding away while the simulator disdainfully closes its eyes and remains totally silent. Our study caters more to the machine, consisting of the young generations, and not to the simulator, which primarily draws personnel from the older generations.
We have of course extensively and intensively used the material presented here in the framework of the formal educational processes and institutions.
The major skill requirements of a software professional for outsourcing in the IT industry were determined almost 40 years ago. These requirements were enunciated by Prof. C. A. R. Hoare of Oxford University on his visit to India in the seventies. Software outsourcing requires that, at the higher levels of the software development life cycle (SDLC), the entire work to be done in the outsourcing situation be thought out to the last detail. Then the best possible persons, with the highest IQ, are to be employed, and they should implement the details decided upon without the slightest variation being allowed. It is this strategy that was initially proven, with enormous success, in the history of outsourcing: in the maintenance of tens to hundreds of millions of lines of legacy software code.
Software engineering specialists add to the above the personnel requirements of youth, competent education, good communication skills, the ability to work in a team, some skill in programming, a good command of English, a basic two years' industrial experience, and the ability to work for low wages. The BRIC countries can supply masses of personnel with these characteristics.
It appears that some industrial and academic environments did not understand the crucial need for a mass of motivated, young, high-IQ personnel, working for low wages, to carry out the software drudgery activities on large legacy software, and so failed to prevent outsourcing to the BRIC countries. Advanced economies seem to have underestimated the requirement of high IQ and high intrinsic skills at low wages, judging it not crucial for routine maintenance activity. It appears this misjudgement caused the outsourcing to the BRIC countries, though this is not certain.
In India the requirements of the outsourcing industry are met by only a small percentage of the students turned out by engineering colleges at present. The bulk of the graduates does not easily find a place in the IT outsourcing industry. This can be attributed largely to the dichotomy of the educational process, i.e. the actual knowledge acquired when one goes through the educational process for a degree versus the formal administrative and academic requirements to be satisfied to obtain it. Another major reason is the fundamental requirement of high IQ, or at least an IQ as high as the cut-off point demanded by the software outsourcing industry.
Computer science has the doubtful distinction of having some core areas that were overdeveloped and overstudied in the sixties and seventies of the last century. This has led to certain compilations which have remained classic textbooks at the undergraduate level for the last 40 years. Among these are the White Book for C, the Cinderella Book for automata, and the Dragon Book for language processors. The knowledge contained therein is more than sufficient; as far as the last two are concerned, they are not necessary for most jobs in the outsourcing industry. However, it is essential to note that quite a few of these subjects have been ordained, by the powers that be, to be essential to the curriculum, and students all over have to clear them. The mass of persons required for outsourcing runs into millions, and requires a mass of self-taught instructors.
In computer science education, a core subject we have selected, for which there is an acute shortage of faculty who can impart instruction, is Design and Analysis of Algorithms, popularly called DAA by the student community. For a major portion of the subject in the Indian computer science and software area, the classic text by Horowitz et al. has remained the favorite with the boards of studies of various educational institutions, though the international scene may have accepted other renderings of the area. Combined with data structures, the subject is essential in induction training programs in the training and outsourcing software industry.
The intake of the Indian engineering colleges over the entire country runs into lakhs of students. Assuming that the average teaching scope of one instructor is about 100-200 students, tens of thousands of instructors are required per subject, apart from the requirements of the computer training industry. This requires that, for core subjects like DAA, mass methodologies be in place for the mass creation of instructors. We suggest a scheme which we created after extensive experimentation over the last two decades, and crystallized into working courseware over the last five years, at M J College of Engineering.
The basic content of a DAA course can be defined to consist of about 30 basic algorithms. These algorithms can be described either by a programming language like C, C++ or JAVA or by informal algorithmic notations. We suggest a method used historically in dBase in the childhood days of the personnel computer: description by example, called query by example. If a properly formulated instance of an algorithm is simulated in detail, then by generalization we have the entire description of the algorithm. This reduces the core instruction in DAA to the simulation of about 30 algorithms, in step-by-step detail, each with a properly chosen sample instance.
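The description-by-example idea can be sketched in a few lines of code. The following is a minimal illustration only: the choice of insertion sort and of the sample instance are ours, not drawn from the 30-algorithm courseware itself. One properly chosen instance is executed, and a full snapshot, i.e. one slide, is recorded after every step.

```python
def insertion_sort_slides(a):
    """Run insertion sort on one sample instance, recording a 'slide'
    (a full snapshot of the array) after every step."""
    a = list(a)
    slides = [list(a)]               # the opening slide: the raw input
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:  # shift larger elements right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
        slides.append(list(a))       # one slide per completed step
    return slides

# A sample instance, as one might draw from a standard text or a GATE paper.
for step, state in enumerate(insertion_sort_slides([5, 2, 4, 6, 1, 3])):
    print(step, state)
```

Generalizing from the printed sequence of snapshots recovers the full algorithm, which is exactly the pedagogical claim of query by example.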
The instances of the 30 examples should be capable of wide reuse. It is seen that if examples are drawn from widely available resources or the standard texts, then there is wide uniformity in their use in the Indian environment. Algorithms not covered in the texts can be tackled using GATE problems, which can be considered a free resource. GATE is a competitive examination for entrance to postgraduate education at the IITs and some leading institutions in India. The previous question papers are in the public domain, and the problems on algorithms there can also serve as a good, standard, widely available and acceptable resource.
The selection of the problems has also been influenced by the requirements of the SWITCH group of companies when conducting induction training programs in data structures and programming in C, C++ and JAVA. These requirements are drawn from a company in whose outsourcing activities one of the authors was actively involved during the period 1996-2003. The number of trainees experimented upon was about 1500.
The requirement that a mass of instructors be available in the engineering and computer applications colleges, and that too at short notice, is a severe constraint. Also to be taken into account is the rapid attrition of this mass of instructors to the IT outsourcing industry. The major challenge is that the instructors should be made acceptably competent in teaching and instructional techniques in a small time frame and with minimum effort on their part.
The first line of attack is the exposition of the algorithms itself. In his directives for the development of computer science education and the Indian software industry, Prof. C. A. R. Hoare had pointed out that there was a major need to do away with the unnecessary abstraction computer science has wound itself into. The basic concepts of computer science were simple, but made unnecessarily complex. He suggested that common sense expositions are the way to tackle topics like concepts and algorithms.
Experimentation in common sense expositions in the rendering of algorithms was attempted by the authors over the last two decades at public sector undertakings like ECIL, various professional societies in Hyderabad like the IETE and CSI, educational institutions under Osmania University, JNTU, BITS, Kakatiya University, Nagarjuna University etc., outsourcing industries like Satyam Computers, and charitable and social expositions under the auspices of the Lions Club and other forums. These common sense expositions were related to the social and cultural requirements in an Indian setting. The audience is drawn from the feudal agrarian environments of towns and villages, without the sophistication that the industrial and information revolutions bring. Over a period of time we have found that common sense expositions and renderings of the problems in the algorithms were possible, and were widely well accepted and received by the audiences.
Since the area of algorithms is basically mathematical, it is possible to have an interpretation of an algorithm in a particular setting. Mathematical abstractions have a ubiquitous component, in that common sense interpretations of a mathematical concept or problem can be found in any area of human thought and action. Thus, regardless of whether the environment is feudal or modern, whether it belongs to the agrarian, industrial or information revolutions, and whether it is urban or rural, we can find common sense interpretations of algorithms. However, common sense interpretations are culture dependent: what may be a common sense interpretation in one culture or geographical region may not be considered common sense in another. Thus common sense interpretations have to be localized, like the search results in Google and other search engines.
Visual methods are widely held to be superior to text alone; as the ancient Chinese saying goes, a picture is worth a thousand words. Early expositions of data structures and algorithms, in the initial formative days of programming and computer science, were highly mathematical. An example is the use of Iverson's APL as a tool for DAA. The classic work of Knuth culminated in popularizing visual methods for data structures. The fundamental statement, using ADTs, of almost all algorithms of practical interest is known; to this set a few may be added to complete the collection.
Completeness and rigor are claimed in the exposition of algorithms by some sources. However, these may not be requirements in mass education for the outsourcing software industry. Expositions where the formally stated algorithms and associated data structures can be naturally visualized and easily mapped to C, C++ or JAVA programs seem more relevant to the outsourcing industry. It has been noticed that mediocre and above-average students feel that rigorous and formal expositions give the impression that the last word on the algorithm, or its formal treatment, has been said. Some expositions allow a continuous development, modification and varied implementation of the algorithm to be easily visualized. This allows easy implementation in the major popular programming languages with varied data structures, and is more suitable for preparing the student for a professional career in the software outsourcing industry.
Algorithms are to be studied along with the data structures used by the algorithm. Even if ADTs are used, they can be visually represented in colorful, rich, user-friendly form using tools easily and widely available worldwide with MS-Office.
In our experimentation over the last five years, a method initially used was to have custom plastic pieces of various colors for arrows, and numbered colored round pieces for nodes, to simulate algorithms using tree data structures. Square numbered plastic pieces of various colors were used for the simulation of linked lists. This was necessitated because at times there was no electrical connection in the remote geographical area being considered, no access to paper or a blackboard, and no access to a PC, as the battery charge would not last a reasonable amount of time. Attempts to use these plastic pieces for instructional purposes in data structures and algorithms had the disadvantage that an error in the presentation of a detailed simulation of an algorithm can have catastrophic pedagogical effects. It was also difficult to keep track of the instantaneous descriptions, or snapshots, of the computation, and one could not backtrack to a previous position.
The situation has changed over the last few years with the advent of netbooks and the rapid growth of disk space in the ubiquitous personnel computer. (Note we have said personnel computer, rather than personal computer, as it is a shared resource in a class, a family or a social circle in most of India.) A decade ago the use of presentations with the easily available MS-Office was restricted to megabytes of storage; large graphical images and content could not be used. The prohibitively high price in India of writing a CD-ROM a decade ago, the low storage capacity of floppy drives, low disk capacities, small RAM memories and, compared to today, low processing speeds did not allow bulky, highly graphic presentations to be used in the environment. The recent explosion in disk capacities, extending to terabytes, allows gigabytes of storage to be easily spared for a PowerPoint presentation, presented in parts. The ability of many peripheral devices to draw power from the USB port has been a boon. There has been a rapid rise in RAM capacity to gigabytes, and a rapid increase in the processing speeds and methods of processing in currently affordable first- or second-hand personnel computers. This allows rich, colorful, user-friendly, high-graphic content to be routinely created by persons of ordinary skill and talent in a few hours or days. The availability and affordability of high-capacity pen drives, to all levels of society, allows easy transportation and distribution of presentations requiring gigabytes of storage at very little or no cost.
The technology as available now allows multiple snapshots of the entire computation. This is akin to the instantaneous descriptions of the computation being recorded at each time unit by a slide, or a small ordered set of slides, in the presentation. The use of such snapshots is very common in theoretical computer science, as in the simulation of an automaton like a Turing machine. In fact it is useful for the entire simulation, in excruciating detail, of a finite computation, or for the sketch of some infinite computations, possible on many deterministic and nondeterministic computational devices.
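The notion of an instantaneous description can be made concrete with a small sketch. The DFA below is an assumed stock example (a machine accepting binary strings with an even number of 1s), not one of the course's automata: each time unit yields one snapshot of (current state, unread input), which is exactly what one slide of the presentation would show.

```python
def run_dfa(delta, start, accept, w):
    """Simulate a DFA, recording the instantaneous description
    (current state, unread input) at every time unit."""
    ids = [(start, w)]                  # the initial snapshot
    state = start
    for i, ch in enumerate(w):
        state = delta[(state, ch)]      # one transition per time unit
        ids.append((state, w[i + 1:]))  # one snapshot, i.e. one slide
    return ids, state in accept

# A hypothetical DFA: accept binary strings with an even number of 1s.
delta = {('even', '0'): 'even', ('even', '1'): 'odd',
         ('odd', '0'): 'odd', ('odd', '1'): 'even'}
ids, accepted = run_dfa(delta, 'even', {'even'}, '1011')
for snapshot in ids:
    print(snapshot)
print('accepted' if accepted else 'rejected')
```

The list of snapshots is precisely the sequence of instantaneous descriptions that the slides record, and the same scheme extends to richer machines such as Turing machines by enlarging what each snapshot contains.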
Thus the technology widely, easily and cheaply available in India allows the medium of high-graphics PowerPoint presentations for all computer science and software oriented subjects. We have found that in our presentations on algorithms each algorithm, with suitable sample data as input, can be described in detail in about 500 to 1000 PowerPoint slides using up to half a gigabyte of storage, in either one presentation or a group of them. The exposition proceeds by noting that what an instructor should do in the class, in a period of one or two hours, is a step-by-step development of the algorithm by executing it on a suitably chosen sample input. This is translated into a slow, step-by-step sequence of annotated PowerPoint slides, in a method akin to the classic development of Walt Disney movies like Snow White and the Seven Dwarfs. Each step is described in excruciating detail by a new slide, thus requiring hundreds of slides per algorithm. This allows the student or faculty to reuse the presentations indefinitely for future recapitulation and revision. The normal development time is 20 man-hours of work per hour of presentation, on average. The basic requirement, however, is mastery of an excruciatingly detailed simulation process of an algorithm and its data structures, for a finely grained presentation to mirror the same.
Alternatives have been seen on the internet for a long time: simulations of algorithms by JAVA applets abound there. These do give a visual representation. However, an algorithm has to be simulated by hand for an average human being to understand it; there is no escape from this requirement. The alternative we suggest is that the simulation can be watched, as the presentation gives it in detail as if by hand. Historically this was done with pencil and paper, and the use of the same was advocated, as opposed to just doing it mentally. Thus the JAVA applet simulations on the internet do not result in the knowledge of how the algorithm works being transmitted to the user. Nor do they give the entire snapshots of the computation in excruciating detail, tailor-made for use by human teachers of different cultures. It may be suggested that the presentations be automatically generated. The software may be developed, but what is required is different: one human being should go through the excruciating process in great detail, with dynamically varying commentary suited to the audience, for every instantaneous description of the computation, i.e. every slide of the presentation. This involves stepping through the instantaneous descriptions and explaining them to another human being in detail, without error.
To make the presentations widely reusable and acceptable, the data for the algorithms is drawn primarily from previous GATE papers and secondarily from standard texts. This allows the presentations to be used as notes supplementing the standard syllabus contents of DAA in most colleges and universities in India.
The presentations have been used with a variety of audiences. They have been adapted to the DAA syllabi of about half a dozen universities so far, and have been experimented with in GATE coaching classes. The instruction of students in rural engineering colleges has proved to be a challenge. Some colleges have students with ranks in excess of a lakh in EAMCET, the common entrance test for engineering in Hyderabad. In experimenting with the instruction of DAA in rural engineering colleges it was found that the low ranks of the students were not a problem of IQ. In most cases they were the effect of a sudden change of the medium of instruction, from Telugu or Urdu to English, at the college level. By generalization the situation is the same throughout India: the bulk of the population is educated in regional languages at the school level, and a transition to English as the medium of education occurs at the college level.
In this connection a major motivation was the directive of the legendary Prof. E. W. Dijkstra, both orally and in writing (through an EWD), to one of the authors. He held that which natural language a human being uses for elaboration in software and computer science does not matter; what is required is mastery of that natural language. Thus the medium of instruction being a regional language does not matter in the long run. A temporary discomfort of about a year is felt by the student when the transition to English as the medium of education occurs at college level. In parts of India the regional language is employed by software engineers involved in outsourcing, while the client may be in Europe, America or Japan.
It was found that the presentations are basically pictorial in nature. Though they require mastery of mathematical concepts, these can be stated in common sense terms. The presentations are basically independent of natural language and can be adapted to expositions in any natural language. Thus their use aids the understanding of other areas of computer science and software.
The rapid growth of the global village is evident around us. The last decade, and especially the last five years, have seen the global village effect reach remote villages and towns in India. On the other hand, the rapid rise in real estate prices and the strain on the infrastructure of large cities are evident. The shift of the outsourcing industry to the interior regions of India and the Third World is being actively considered. In such a case, the availability of skilled and competent software professionals in the interior becomes essential. We feel that the rapid growth of technology has to be exploited; this will allow rapid growth of the software outsourcing industry for the employment of the graduates of the colleges. Our effort is a step in this direction, catering to the mass development of instructors in computer science and, as a parallel line of approach, to mass training and education in computer science subjects.
The aim is to distribute the presentations on a no-profit basis to a wide variety of students and faculty and obtain feedback on their use for mass training and education. The total storage requirement of the presentations runs into gigabytes. Wide distribution over the unreliable and not always available Web in India is not possible, and the bandwidth economically available on the Internet to the common person is limited. The alternative is free distribution via the medium of pen drives. Either of the authors can be contacted for a copy of the presentations. They can easily be rearranged to cater to most of the curriculum requirements in the subject for most educational and training institutions. Free modification, adaptation and expansion of the presentations is left to the choice and taste of the student or faculty. We would welcome a natural language translation of the presentations into the major regional languages of India, and experimentation with expositions at the primary and secondary school level. It may be noted that we have said even the primary school level: the classic exposition is that all the algorithms can be reduced to about 30 easily stated problems.
The 30 problems considered by us have aspects where a simple common sense rendering can be easily explained to primary school children. It may not normally be possible to explain the intricacies of the algorithms and data structures in full detail to this level of audience. However, we have found that graphic simulations of backtracking algorithms, like the 4-queens problem, the Hamiltonian cycle problem and the graph coloring problem, are easily grasped by primary school children when presented through the medium of our presentations. The problem statements of many of the algorithms can be introduced at this level with simple common sense renderings. For this purpose algorithmic notation is not used; we follow the tradition used by innumerable cultures for centuries in the pre-ALGOL-60 era and explain the algorithms in natural languages with common sense explanations. The sacred constructs and concepts of ALGOL-60, PASCAL and JAVA, the development of algorithms using pidgin varieties of these, and the associated venerable formal verification proofs are left for a much later stage in the life of the primary school children.
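The 4-queens simulation mentioned above can be mirrored in a short sketch. This is our own illustrative code, not the courseware itself: it records every placement and every retreat, so that each trace entry corresponds to one slide of a presentation replaying the search.

```python
def four_queens(n=4):
    """Backtracking for the n-queens problem, recording each placement
    and each retreat so the search can be replayed slide by slide."""
    trace, cols = [], []   # cols[r] = column of the queen on row r

    def safe(row, col):
        # No queen already placed shares this column or a diagonal.
        return all(c != col and abs(c - col) != row - r
                   for r, c in enumerate(cols))

    def place(row):
        if row == n:
            return True                     # all rows filled: solved
        for col in range(n):
            if safe(row, col):
                cols.append(col)
                trace.append(('place', row, col))
                if place(row + 1):
                    return True
                cols.pop()                   # undo and retreat
                trace.append(('backtrack', row, col))
        return False

    place(0)
    return cols, trace

solution, trace = four_queens()
print(solution)      # one column index per row
for step in trace:
    print(step)      # the slide-by-slide story of the search
```

The trace makes the pedagogically important retreats visible, which is exactly what children find easy to follow when each step becomes one picture.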
Our presentations span 30 problems which are basic to the DAA area and form the core of any undergraduate DAA curriculum in a computer science or information technology engineering baccalaureate course, or a degree in computer applications, in India. The presentations were prepared with an effort of about 1000 person-hours over a physical duration of six months. They have been tried with audiences of various backgrounds, drawn from the entire country, totaling about 3000 persons in various universities, colleges, coaching and training centers in the southern states of India.
For a detailed understanding of the proof and time complexity of an algorithm, common sense explanations can be given. This requires that the culture-independent formal mathematical apparatus be dismantled, a suitable common sense interpretation of the problem be chosen to suit the local cultural environment, and the construction be recast in the local setting. The construction, if properly chosen, can be made very user friendly, intuitive and common sense based. Care should be taken not to offend the cultural sensibilities of the local environment. Comparisons, statements and comments on ways of life that are acceptable in one environment may not be acceptable at all when the environment, culture, geographical or economic situation of the audience varies even slightly. What is common sense in one part of India with one type of audience may not be understood in another part of India, or in the same part with slight variations in the type of audience.
A particular cultural environment that seems to be universal is the world of nursery and primary school children. Common sense examples from this world seem to have a universal appeal and are basically culture independent. They are also powerful and can cater to the interpretation of any mathematical construct, concept or theorem; thus they can be used for common sense renderings of algorithms. The world of Winnie the Pooh and Tigger seems to have universal appeal and will not result in the instructor stepping on someone's cultural toes.
For a soft copy of the presentations, the logistics of obtaining them, and tailoring them to a particular curriculum, the authors may be contacted by e-mail. As pointed out earlier, in the existing Indian environment the Web cannot be used cost effectively for distribution. The presentations are too large for normal CDs; DVDs would perhaps do, but the cost of burning them involves administrative, economic and time overhead. Hence the ancient snail mail or its variants, with high-capacity pen drives, the social network and the public transportation network, have to be used.
We now proceed to exhibit that a commonsense rendering of many concepts of computer science is possible. In particular we take twelve computationally related problems and restate them in a commonsense way.
In all areas of knowledge in the arts, sciences, engineering and technology, a common modeling device is combinatorial optimization. In this area some 3000 hard problems have been isolated by computer scientists and mathematicians which, though deceptively easy, are extremely hard to solve by computer or by hand. Among these problems are the NP-complete problems.
The NP-complete problems arise practically in computation in industry, research and all activities of human endeavor. They have the interesting property that all attempts to solve them over the centuries have yielded only algorithms of exponential time complexity. This restricts the feasible input size to about 40 elements, while practical models of problems have 100 to 1000 elements.
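The exponential blow-up can be made concrete with a toy brute-force search. The sketch below uses subset sum, one of Karp's classic problems; the particular instance is a made-up example of ours. Trying all 2^n subsets is fine at n = 6 but hopeless near n = 40, since 2^40 is about a trillion subsets.

```python
from itertools import combinations

def subset_sum_bruteforce(weights, target):
    """Exhaustive search for subset sum: in the worst case
    it examines all 2**n subsets of the n weights."""
    tried = 0
    for r in range(len(weights) + 1):
        for combo in combinations(weights, r):
            tried += 1
            if sum(combo) == target:
                return combo, tried      # found a subset with the target sum
    return None, tried                   # no subset works

# A hypothetical small instance: which weights add up to exactly 9?
combo, tried = subset_sum_bruteforce([3, 34, 4, 12, 5, 2], 9)
print(combo, tried)

# The wall: subset counts for the sizes mentioned in the text.
print(2 ** 40)    # roughly a trillion subsets at n = 40
print(2 ** 100)   # astronomically many at n = 100
```

The point is not the particular instance but the growth law: each extra element doubles the worst-case work, which is why no amount of hardware rescues the brute-force approach at practical problem sizes.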
To understand the concept of NP-complete problems, let us consider it from a user friendly point of view. Consider the actions of the early Christians and Jesuits in their pursuit of new converts to their religion. When they went round the world to a new kingdom or tribe consisting of a million or a couple of thousand persons, a long, laborious and time consuming method would be to convert each one of the million or thousands of persons. Alternatively a much simpler method may exist, and this is what they followed: isolate a kingpin person in the new kingdom or tribe. This could be the king, the chief himself or a very influential noble. If that person could be converted, then all the million people of the kingdom, or the entire tribe, would fall in line and get converted. The king, the chief or the noble involved is called the “-complete” person. The same technique, as per sociology, is followed when one religion wants to swallow up another, or one culture would like to swallow up another: isolate a kingpin and convert him or her, and all fall in line. This is what happened, as per Russian history, in Russia: the kingpin converted himself to European culture and everybody fell in line and switched.
The same approach is followed by a teacher in the class. If the class is unruly, the long way is to convince all the students in the class to behave; for undergraduates this could be 200-300 students. This is long and laborious. Alternatively, get hold of one kingpin student, the leader, and discipline him; then all the students fall in line. This is what we mean when we say “make an example of”.
The method is followed by the police to control the goons in a town, or to break up an agitation led by some politician who is the eye of the cyclone. Isolate the kingpin and discipline him, and everyone falls in line. This is the concept behind the NP-complete problems.
The existence of these kingpin problems was originally shown by Cook, who exhibited the first NP-complete problem, called satisfiability. Karp then isolated 21 classic problems which were also shown to be kingpins, that is, NP-complete problems. For centuries these problems had led countless mathematicians, of all calibers, to lifetimes of despair, having been found extremely hard to solve in a reasonable way in a short span of time.
In simple terms, we have two classes of problems: the goons and the ordinary citizens. We want to reform the goons and make them ordinary citizens. Cook isolated the venerable kingpin goon, the satisfiability problem. If this goon can be reformed into an ordinary citizen, then all goons can be reformed. If the kingpin goon is incorrigible and cannot be converted by any means into an ordinary citizen, then the goons form a distinct class which will contain many goons who cannot be reformed.
The general process of showing that a problem is NP-complete can be described in a user friendly way. Let us take the kingdom of the goons. The basic requirement for becoming a kingpin goon is that one must be a bonafide goon of the kingdom. This is like countries that allow anyone to become President, with the basic requirement that one must be a bonafide citizen of the country. The next step is that there is already a club of kingpin goons. Cook started the initial membership with the satisfiability problem, and Karp further populated it with an aristocracy of 21 goons. This club has exclusive membership, by invitation only. An existing and known member of the club, a bonafide kingpin goon, must go through the procedure of converting a goon from the kingdom into a kingpin goon. He must demonstrate that the goon chosen for elevation is a bonafide goon in the first place, and then that he is a peer of the kingpin goons, thereby elevating him to the exclusive membership of the club. After Karp, large classes of goons were shown to be kingpins, and the club effectively became a democracy. Today thousands of goons in the kingdom are known to wear the 'dunce cap' of being kingpin problems. It was shown by Sahni that certain kingpin goons cannot even be approximately reformed to behave almost like ordinary citizens. Once a kingpin goon, always a kingpin goon.
We classify problems in two categories, decision problems and optimization problems. Decision problems have a binary answer, yes or no. A solution exists or does not exist. NP-complete problems are decision problems.
Optimization problems are slightly different. Let us consider a user friendly example. We would like to know the maximum weight a weightlifter can lift. We call this an optimization problem. We convert it to a decision problem by giving the weightlifter two weights, one small and one large. He easily lifts the small weight. If he cannot lift the heavier one, we try a weight midway between the two. This partitions the range between the light and heavy weights into a lower and an upper half. We proceed to the appropriate half and repeat the process, and at the end we arrive at the maximum weight he can lift. Thus we convert an optimization problem to a decision problem. Likewise, the maximum height one can jump, the maximum number of doughnuts a person can polish off in one sitting, the distance from which one can hear one's teacher shout at a student, the minimum bribe to be given to a government employee for a routine job, etc., are all examples of optimization problems. In principle they can all be converted to decision problems as in the weightlifting example, though in practice it may not work with a rapacious person in the bribe case.
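The halving process above is an ordinary binary search on a yes/no oracle. A minimal sketch in Python, assuming a hypothetical oracle can_lift(w) that answers the decision question for a given weight:

```python
# Converting an optimization problem to a sequence of decision problems.
# can_lift is an assumed yes/no oracle, not part of the original text.

def max_liftable(can_lift, low, high):
    """Find the largest integer weight in [low, high] the lifter can lift,
    by repeatedly halving the interval (binary search on the yes/no oracle)."""
    assert can_lift(low), "the small weight must be liftable"
    while low < high:
        mid = (low + high + 1) // 2   # bias upward so the loop terminates
        if can_lift(mid):
            low = mid                  # mid is liftable: the answer is >= mid
        else:
            high = mid - 1             # mid is too heavy: the answer is < mid
    return low

# Usage: suppose the lifter can handle anything up to 137 kg.
print(max_liftable(lambda w: w <= 137, 1, 1000))   # 137
```

Each halving asks one decision question, so about log2 of the range suffices to pin down the optimum.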
From the set of 3000 NP-complete problems augmented by a couple of thousand approximations to the same, we select a basic set of six problems and an augmented set of another six problems, making a dozen overall. We list the problems below with both a user friendly rendering of the problem and a formal statement of the problem.
The First problem is the Boolean satisfiability problem. This can be related to circuit design: given a combination of AND, OR and NOT gates, will we get an output signal? It can also be related to a statement made up of clauses which all have to be true, where a clause in turn is a combination of variables which can be true or false, and the entire statement is made up of n variables. It is a decision problem, as we only have to decide if some combination can yield the value true.
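The statement of clauses can be sketched by brute force, assuming (as is conventional) a formula in conjunctive normal form: a list of clauses, each a list of literals, where +i means variable i is true and -i means it is false. The encoding is our own illustration:

```python
# Brute-force satisfiability: try all 2**n truth assignments.
from itertools import product

def satisfiable(clauses, n):
    """Return True if some assignment makes every clause contain
    at least one true literal."""
    for bits in product([False, True], repeat=n):
        def lit_true(lit):
            value = bits[abs(lit) - 1]
            return value if lit > 0 else not value
        if all(any(lit_true(l) for l in clause) for clause in clauses):
            return True
    return False

# (x1 OR x2) AND (NOT x1 OR x2) is satisfied by x2 = True.
print(satisfiable([[1, 2], [-1, 2]], 2))    # True
# x1 AND NOT x1 can never be true.
print(satisfiable([[1], [-1]], 1))          # False
```

The 2**n loop is exactly the exponential behavior that makes the problem intractable beyond about 40 variables.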
The Second problem is the clique problem. In any social network there will be some special individuals who form a clique. For example, in the area of Information Science we have the concept of the Invisible College, the High Priests in some area of knowledge who form a clique; we learn that all organizations have their Invisible Colleges, which keep track of all information in the organization. This can be modeled by a graph where the nodes are the people, one node per person. Two nodes have an edge, or are connected, if the two persons have a connection in the social network. We are to determine if a clique of some given size k exists in the graph and hence in the social network it models. This requires the determination of a subset of k nodes in the graph where each node is connected to all the remaining nodes of the subset. This is a decision problem, as we want to know if a clique of size k exists—yes or no.
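A brute-force sketch of the clique decision question, assuming the graph is given as a list of undirected edges (our own encoding for illustration):

```python
# Brute-force clique: try every k-subset of the nodes.
from itertools import combinations

def has_clique(nodes, edges, k):
    """Return True if some k nodes are pairwise connected."""
    edge_set = {frozenset(e) for e in edges}
    for subset in combinations(nodes, k):
        if all(frozenset(p) in edge_set for p in combinations(subset, 2)):
            return True
    return False

# A triangle 1-2-3 plus a pendant node 4: a clique of size 3 exists,
# but no clique of size 4.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(has_clique([1, 2, 3, 4], edges, 3))   # True
print(has_clique([1, 2, 3, 4], edges, 4))   # False
```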
The Third problem is the vertex cover problem. In a social network we isolate a number n of persons. These are key persons with a wide network in the social circle. The problem is to determine if some k of these n persons cover every connection in the social circle. In formal terms, we take a graph and have to determine if a subset of k of its n nodes covers all the edges of the graph. This is a decision problem, as we want to know if a vertex cover of size k exists—yes or no.
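The covering question can likewise be sketched by brute force over k-subsets (the edge-list encoding is our own illustration):

```python
# Brute-force vertex cover: try every k-subset of the nodes.
from itertools import combinations

def has_vertex_cover(nodes, edges, k):
    """Return True if some k nodes touch every edge of the graph."""
    for subset in combinations(nodes, k):
        chosen = set(subset)
        if all(u in chosen or v in chosen for (u, v) in edges):
            return True
    return False

# A path 1-2-3-4: the two middle nodes {2, 3} cover all edges,
# but no single node does.
edges = [(1, 2), (2, 3), (3, 4)]
print(has_vertex_cover([1, 2, 3, 4], edges, 2))  # True
print(has_vertex_cover([1, 2, 3, 4], edges, 1))  # False
```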
The Fourth problem is the Hamiltonian cycle problem. We are to start from home, visit the houses of n relatives one after another and return home, with no relative visited twice. This is modeled as a graph problem where we are to visit the n nodes of a graph, entering and leaving each node exactly once. This naturally occurring problem is extremely difficult to solve. Consider the genome map assembly problem. We have millions of small fragments. Each fragment can be modeled as a node of a graph. If a fragment comes after another, put a directed edge from the former to the latter. Finally, put an edge from the last fragment to the first. If we can find a Hamiltonian cycle in the graph, we have found the correct assembly. For another example consider the old Walt Disney cartoon movie, Snow White and the Seven Dwarfs. This movie is made up of thousands of laboriously hand painted slides, which vary slightly from each other and are to be shown in the proper sequence. Suppose they got mixed up and we are to rearrange these thousands of slides in the proper order. Each slide can be modeled as a node of a graph. If a slide follows another, put an edge from the former to the latter. Finally, put an edge from the final slide to the first one. If a Hamiltonian cycle exists in the graph, then we have found the proper order of assembling the mixed up slides. This is a decision problem, as we want to know if a Hamiltonian cycle exists—yes or no.
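A brute-force sketch of the Hamiltonian cycle decision question: try every ordering of the nodes and check that consecutive nodes are adjacent (the encoding is our own illustration, and the (n-1)! orderings show why the problem blows up):

```python
# Brute-force Hamiltonian cycle on an undirected graph.
from itertools import permutations

def has_hamiltonian_cycle(nodes, edges):
    """Return True if some cyclic tour visits every node exactly once."""
    adj = {frozenset(e) for e in edges}
    first, rest = nodes[0], nodes[1:]
    for order in permutations(rest):
        tour = [first] + list(order) + [first]
        if all(frozenset((tour[i], tour[i + 1])) in adj
               for i in range(len(tour) - 1)):
            return True
    return False

# A square 1-2-3-4-1 has a Hamiltonian cycle; remove one side and it does not.
print(has_hamiltonian_cycle([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)]))  # True
print(has_hamiltonian_cycle([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)]))          # False
```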
The Fifth problem is the Travelling Salesman Problem (TSP). This is a generalization of the Hamiltonian cycle problem. Let each bus stop in the city be a node of a graph; if a bus runs from one stop to another, introduce an edge between the nodes, with the cost of travel between the stops as the weight of the edge. We are to make a circuit or 'chakkar' in the graph of least cost, starting from an initial bus stop, visiting all the bus stops once and coming back to the first stop, with the total cost of travel a minimum over all the possible cycles. This problem has been used in the USA to minimize the cost of travel of a person who had to collect all the money from the pay phones every day in a city or town. A milkman in the Third World will find it useful if he has to deliver milk to a hundred houses every day, as will a newspaper boy who has to deliver the newspaper to a couple of hundred houses every day with the minimum recurring cost. Globalization and liberalization have burdened the milkmen and newspaper boys of the Third World with gas guzzling vehicles for transport, which requires them in due course to use the Travelling Salesman Problem. As another example consider the problem of donkeys. For centuries the washerman used these beasts of burden extensively in India, and lakhs of donkeys were around. Globalization and liberalization have burdened the washerman with a gas guzzling vehicle. The population of donkeys has reduced to a couple of thousands; these poor things are now being eaten up by some humans who relish their meat and will become extinct shortly. The washerman, on the other hand, is now saddled with a personal computer and has to use the Travelling Salesman Problem to control his fuel costs. This is an optimization problem, as we want the circuit of least cost. To convert it to a decision problem we ask if a circuit of cost less than k exists—yes or no—and then proceed as in the weightlifting problem. Thus we convert the optimization problem to a decision problem.
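The milkman's circuit can be sketched by brute force, assuming a complete graph given as a cost matrix (our own illustrative encoding); the factorial number of orderings is the whole difficulty:

```python
# Brute-force Travelling Salesman: try every circuit through all the stops.
from itertools import permutations

def cheapest_tour(cost):
    """Return the minimum cost of a circuit that starts at stop 0, visits
    every other stop exactly once, and returns to stop 0."""
    n = len(cost)
    best = None
    for order in permutations(range(1, n)):
        tour = (0,) + order + (0,)
        c = sum(cost[tour[i]][tour[i + 1]] for i in range(n))
        if best is None or c < best:
            best = c
    return best

# Four stops; the optimal circuit 0-1-3-2-0 costs 10 + 25 + 30 + 15 = 80.
cost = [[0, 10, 15, 20],
        [10, 0, 35, 25],
        [15, 35, 0, 30],
        [20, 25, 30, 0]]
print(cheapest_tour(cost))   # 80
```

Asking "is there a circuit of cost below k?" for varying k, as in the weightlifting example, turns this optimization into a sequence of decision questions.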
The Sixth problem is the Graph Coloring problem. If the map of the world/universe is to be colored such that two adjacent countries/worlds are colored differently, we are to determine if, for N countries/worlds, a palette of k colors is sufficient. Another rendering of the problem is to determine if k colors are sufficient for the traffic lights at a junction of N roads. The problem is formalized by a graph with one node for each country; two adjacent countries are modeled by an edge between the nodes representing them. We are to color the nodes of the graph with the palette of k colors under the constraint that two adjacent nodes do not have the same color. This is a decision problem: we want to know if k colors will do—yes or no.
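A brute-force sketch of the coloring decision question, trying every assignment of k colors to the nodes (k**N assignments, hence again exponential):

```python
# Brute-force graph coloring: try every assignment of k colors.
from itertools import product

def colorable(nodes, edges, k):
    """Return True if k colors suffice so that no edge joins
    two same-colored nodes."""
    for assignment in product(range(k), repeat=len(nodes)):
        color = dict(zip(nodes, assignment))
        if all(color[u] != color[v] for (u, v) in edges):
            return True
    return False

# A triangle of three mutually adjacent countries needs 3 colors;
# 2 colors are not enough.
triangle = [(1, 2), (2, 3), (1, 3)]
print(colorable([1, 2, 3], triangle, 3))   # True
print(colorable([1, 2, 3], triangle, 2))   # False
```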
The Seventh problem is the Feedback Node Set (FNS) problem. The Indian Railways run one of the largest railway networks in the world. Let us count the railway stations and assign to them the numbers 1 to N. We are to determine if, for a given number k, we can select k stations such that every circular tour on the Indian Railways, made by taking connecting trains and ending up at the starting station, passes through at least one of the k stations. The problem has applications in code generation in compilers. A compiler translates a high level language program into the machine language of a computer. Data resides either in locations in RAM or in registers. If data is stored in registers, the computation is ten times faster than if it is stored in RAM, but registers require extra circuitry and are expensive. The problem of minimizing the number of registers required to generate the code can be reduced to the feedback node set problem. The problem is modeled by a graph of N nodes. We are to determine if, for a given k, we can select k nodes such that every cycle in the graph contains at least one of the k selected nodes. This is a decision problem: we want to know if an FNS of size k exists—yes or no.
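A brute-force sketch of the FNS decision question on an undirected graph (our own simplified encoding): does removing some k nodes leave the graph cycle-free? Acyclicity is checked with a union-find pass over the surviving edges.

```python
# Brute-force feedback node set on an undirected graph.
from itertools import combinations

def acyclic(nodes, edges):
    """Union-find check that the undirected graph has no cycle."""
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False       # this edge closes a cycle
        parent[ru] = rv
    return True

def has_feedback_node_set(nodes, edges, k):
    """Return True if removing some k nodes breaks every cycle."""
    for removed in combinations(nodes, k):
        gone = set(removed)
        rest = [v for v in nodes if v not in gone]
        kept = [(u, v) for (u, v) in edges if u not in gone and v not in gone]
        if acyclic(rest, kept):
            return True
    return False

# Two triangles sharing station 3: removing station 3 alone breaks both cycles.
rails = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5)]
print(has_feedback_node_set([1, 2, 3, 4, 5], rails, 1))   # True
```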
The Eighth problem is the Feedback Edge Set (FES) problem. Consider again the Indian railway network. Let there be |E| trains and N stations. We are to determine if, for a given k, we can select k trains such that every tour on the Indian Railways includes at least one of them. This is modeled by a graph of |E| edges and N nodes, and we are to determine, for a given k, if we can select k edges such that every cycle includes at least one of them. This is posed as a decision problem: we want to know if an FES of size k exists—yes or no.
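Since trains run in a direction, a brute-force sketch on a directed graph is natural here (again our own encoding): does removing some k edges leave no directed cycle? Cycles are detected with a standard depth-first search.

```python
# Brute-force feedback edge set on a directed graph.
from itertools import combinations

def has_directed_cycle(nodes, edges):
    """DFS-based check for a directed cycle (grey = on the current path)."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
    WHITE, GREY, BLACK = 0, 1, 2
    state = {v: WHITE for v in nodes}
    def dfs(v):
        state[v] = GREY
        for w in adj[v]:
            if state[w] == GREY or (state[w] == WHITE and dfs(w)):
                return True
        state[v] = BLACK
        return False
    return any(state[v] == WHITE and dfs(v) for v in nodes)

def has_feedback_edge_set(nodes, edges, k):
    """Return True if removing some k edges breaks every directed cycle."""
    for removed in combinations(range(len(edges)), k):
        kept = [e for i, e in enumerate(edges) if i not in removed]
        if not has_directed_cycle(nodes, kept):
            return True
    return False

# Two directed loops 1->2->1 and 2->3->2 share no edge,
# so one removal is not enough but two are.
rails = [(1, 2), (2, 1), (2, 3), (3, 2)]
print(has_feedback_edge_set([1, 2, 3], rails, 1))   # False
print(has_feedback_edge_set([1, 2, 3], rails, 2))   # True
```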
The Ninth problem is the Set Cover problem. We have N goons, who form themselves into gangs G1, G2, ..., Gn, with the possibility of one goon being in more than one gang. We are to determine a subset of the gangs that contains all the goons. The problem is modeled by a set S and a family of subsets of S. We are to determine if some subsets from the family cover the set S. This is a decision problem.
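The gangs-and-goons question can be sketched by brute force over collections of j gangs (j and the gang contents below are our own illustrative choices):

```python
# Brute-force set cover: try every choice of j subsets.
from itertools import combinations

def has_set_cover(universe, subsets, j):
    """Return True if some j of the given subsets together cover the universe."""
    for chosen in combinations(subsets, j):
        if set().union(*chosen) >= set(universe):
            return True
    return False

goons = {1, 2, 3, 4, 5}
gangs = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
print(has_set_cover(goons, gangs, 2))   # True: {1,2,3} together with {4,5}
print(has_set_cover(goons, gangs, 1))   # False: no single gang has everyone
```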
The Tenth problem is the Exact Cover problem. We have N goons, who form themselves into gangs G1, G2, ..., Gn, with the possibility of one goon being in more than one gang. We are to determine a collection of gangs which are pairwise disjoint and together contain all the goons. A collection of gangs is pairwise disjoint if any two gangs in the collection have no goon in common. The problem is modeled by a set S and a family of subsets of S. We are to determine pairwise disjoint subsets from the family that cover the set S. This is a decision problem: does an exact cover exist—yes or no. The next step would be to compute all the solutions.
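A brute-force sketch of the exact cover decision question. A handy fact drives the check: if the chosen gangs' sizes add up exactly to the number of goons and their union is the whole set, they must be pairwise disjoint. The gang contents are our own illustration:

```python
# Brute-force exact cover: try every collection of subsets.
from itertools import combinations

def has_exact_cover(universe, subsets):
    """Return True if some pairwise disjoint subsets exactly cover the universe."""
    universe = set(universe)
    for r in range(1, len(subsets) + 1):
        for chosen in combinations(subsets, r):
            if sum(len(s) for s in chosen) != len(universe):
                continue                     # sizes must add up exactly
            if set().union(*chosen) == universe:
                return True                  # disjoint and covering
    return False

goons = {1, 2, 3, 4, 5, 6}
gangs = [{1, 2}, {3, 4}, {5, 6}, {2, 3}, {1, 4, 5}]
print(has_exact_cover(goons, gangs))   # True: {1,2}, {3,4}, {5,6}
```

Knuth's Algorithm X with dancing links is the classical way to enumerate all such solutions efficiently in practice.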
The Eleventh problem is the Sum of Subsets problem. We are to determine, from a given set of integers, a subset that sums up to an initially specified value M. This problem is akin to finding, in hard times, monies that add up to M from the bits and pieces that have been tucked away over the years in various places. Alternatively, we want to pool an amount M for a splurge from various sources: which sources will add up to the amount? This problem has been called the easiest hard problem and has proved difficult to solve. It can be solved efficiently by dynamic programming if the numbers are small in size, but if the numbers have as many digits as the cardinality of the given set, the problem becomes difficult. Normal requirements are for 100 or 1000 digits, whereas we can handle about 40 digits easily. This is a decision problem: does a subset exist—yes or no.
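The dynamic programming approach mentioned above can be sketched in a few lines, tracking the set of sums reachable so far; it is efficient precisely when M is small (the numbers below are our own illustration):

```python
# Sum of subsets by dynamic programming over reachable sums.
def subset_sum(numbers, M):
    """Return True if some subset of the given positive integers sums to M."""
    reachable = {0}                      # sums achievable so far
    for x in numbers:
        reachable |= {s + x for s in reachable if s + x <= M}
    return M in reachable

# Pooling monies from scattered sources:
print(subset_sum([3, 34, 4, 12, 5, 2], 9))    # True: 4 + 5
print(subset_sum([3, 34, 4, 12, 5, 2], 30))   # False
```

The work is proportional to the number of distinct reachable sums, which is bounded by M; with huge numbers that bound is useless, which is exactly the hardness the text describes.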
The Twelfth problem is the 0/1 Knapsack problem. We are to pack our suitcase for a long international air journey with some of N allowed items. Each item has a profit if it is packed and transported, but associated with it is a weight, and we have to stay within the weight restriction of the airline while maximizing our profit. Let the n items have profits p1, p2, ..., pn and weights w1, w2, ..., wn, with the weight restriction M. We are to determine a subset of the items 1..n whose cumulative weight is less than or equal to M and whose profit is maximal. This is an optimization problem. We convert it to a decision problem by asking whether a solution with profit at least P exists—yes or no—and, just as in the weightlifting example, we home in on the optimal profit by repeatedly bisecting the range between a lower bound and an upper bound. Thus we convert the optimization problem to a decision problem.
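The packing can be sketched with the classical dynamic programming recurrence, assuming integer weights (the item values below are our own illustration; like subset sum, the table grows with M, so this is no cure for NP-completeness):

```python
# 0/1 knapsack by dynamic programming over the weight capacity.
def knapsack(profits, weights, M):
    """Return the maximum total profit with total weight at most M."""
    best = [0] * (M + 1)                 # best[w] = best profit at capacity w
    for p, w in zip(profits, weights):
        for cap in range(M, w - 1, -1):  # backwards: each item used at most once
            best[cap] = max(best[cap], best[cap - w] + p)
    return best[M]

# Three items: the best choice under weight 50 is items 2 and 3, profit 220.
print(knapsack([60, 100, 120], [10, 20, 30], 50))   # 220
```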
The process of demonstrating that each of the above problems is NP-complete lends itself to user friendly, graphical restatements of the constructions and proofs. For this we have found that we may have to recast the constructions and proofs with common sense explanations, and modify history a wee bit: we have to redo how the ancient problems were shown to be NP-complete in the first place. It also turns out that one specific problem, the exact cover problem, is some sort of a golden goose. Many NP-complete problems can naturally, visually, pictorially and hence graphically be expressed using the exact cover problem. This becomes easier if we augment the elements of the sets in the exact cover problem: one way is to allow some of the elements to be multisets; the other is to attach semantics and semantic routines to the elements, akin to the strategy followed in syntax directed compilers. Such an augmented exact cover problem lends itself to nice graphical presentations of NP-complete problems. In fact, we can digress and note that computations using the automata and grammars of the Chomsky hierarchy can naturally and neatly be expressed in this augmented exact cover problem. Thus the venerable satisfiability problem can perhaps be retired, and the more visually colorful, active and younger exact cover problem and its variants used in user friendly presentations. The augmented exact cover problem with semantic routines is akin to a special case of constraint databases.
Horowitz, E. and Sahni, S., Computer Algorithms, Galgotia, 1993.
Knuth, D. E., The Art of Computer Programming, Vol. I, Addison-Wesley, 1971.
Aho, A. V., Hopcroft, J. E., and Ullman, J. D., The Design and Analysis of Computer Algorithms, Addison-Wesley, 1974.
Iverson, K. E., A Programming Language, John Wiley and Sons, 1962.
Kernighan, B. W. and Ritchie, D. M., The C Programming Language, Prentice-Hall, 1978 (the White Book).
Hopcroft, J. E. and Ullman, J. D., Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, 1979 (the Cinderella Book).
Aho, A. V. and Ullman, J. D., Principles of Compiler Design, Addison-Wesley, 1977 (the Dragon Book).
Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C., Introduction to Algorithms, MIT Press.
Cook, S. A., The Complexity of Theorem-Proving Procedures, University of Toronto, 1971.
Karp, R. M., Reducibility Among Combinatorial Problems, University of California at Berkeley, 1972.