



























draft
USER FRIENDLY EXPOSITION
OF COMPUTER SCIENCE SUBJECTS FOR MASS EDUCATION
ABSTRACT
The
rapid proliferation of computer science training and educational requirements
in India to
cater to the explosively growing outsourcing industry has led to many problems.
A major problem has been the proliferation of engineering colleges without the
requisite dedicated, competent and skilled faculty in core subjects. An experiment
conducted over the last decade in computer science instruction for mass formal
and informal education in a core subject is reviewed. Successful courseware and
material has been created over the last few years for not for profit mass distribution and instruction in the core
subject. This is for either
formal instruction or self study of the subject at undergraduate and beginning
graduate educational level in computer
science. The target is a variety of engineering, computer
applications colleges and computer training institutions of various degrees of quality in the interior
regions of India in remote towns and villages. It is expected that with the
growing global village, the rising and high costs of real estate and
infrastructure in the overburdened cities a major shift is of outsourcing is
evident to the interior regions of the Third World.
The Y2K problem and the
dot com boom & bust led to the major revolutionary change in the
international software scene of the Indian IT professional being identified
with outsourcing at the lower levels and phases of the software development life cycle(SDLC).
This led to an
explosion in outsourcing to India over
the last decade. In a parallel line this also resulted in engineering colleges
mushrooming for formal education in computer science, information technology
and computer applications. The insatiable needs of the software outsourcing
industry for personnel has led to an acute shortage of trained and competent
faculty in the engineering and computer
applications colleges in core computer science subjects.
The rapid mushrooming of
engineering and computer applications colleges has led to a dichotomy in the mass production of
graduates oriented educational process over the last two decades in India. This
is best explained by an example drawn from the area of Modeling and Simulation.
In his visit to India
in the seventies Prof C A R Hoare of Oxford pointed out the unreality of the
simulation process. When the actual machine being simulated is actively and noisily doing the work
with utmost vigor the simulator is silent, maintaining a ghostlike silence. When the
machine is idle and silent the simulator grinds away furiously.
Thus many of the formal educational and training institutions and the associated plethora of statutory
bodies regulating the same are like the
simulator concentrating, in all forums with enormous fanfare, on the academic
requirements, formalities, paper work,
administration and all formalities with utmost vigor without the students and
teachers being really involved. At times the teachers and students may be
mostly or partially not be present at all. When the simulator is totally silent
like in the wee hours of the morning, evening and late hours of the day, holidays or off on
a seasonal/ yearly vacation, the actual process of acquiring information and
knowledge by the students takes place. Many coaching and training institutions
have sprung up all over for imparting the actual educational requirements of
teaching, coaching and all that goes into the basic requirements of actual
information and knowledge being imparted to the student community. This is like
the simulator closing its eyes, being disdainfully and totally silent while the
machine is grinding away. Our study is to cater more to the machine consisting
of the young generations and
not the simulator which primarily draws personnel from the older
generations. We have of course extensively and intensively used the material
presented here in the framework of the formal educational processes and
institutions.
The major skill
requirements of a software professional for outsourcing in the IT industry have been determined
almost 40 years ago. These requirements have been enunciated by Prof C A R Hoare of Oxford University in his visit to India in the
seventies. Software outsourcing requires that at the higher levels of the
software development life cycle(SDLC) the entire work
to be done in the outsourcing situation has to be thought out to the last
detail. Then the best possible persons with the highest IQ are to be employed,
and they should implement the details determined upon without the slightest
variation being allowed. It is this
strategy that was initially proven in the history of outsourcing with enormous
success. This was in the maintenance of tens to hundreds of millions of lines
of legacy software code.
Software Engineering specialists add to
the above requirements the personnel requirements of youth, competent
education, good communication skills, ability to work in a team, some skill in
programming, good command of English, a basic two years industrial experience
and the ability to work for low wages. The BRIC countries satisfy the need for
masses of personnel with these characteristics being available.
It appears that some industrial
and academic environments that did not understand the crucial need for a mass
of motivated, young, high
IQ personnel for carrying
out the software drudgery activities of
large legacy software, for low wages,
have failed to prevent outsourcing to the BRIC countries. It appears advanced economies underestimated the requirement of high IQ, high intrinsic skills and low wages. They decided it was not crucial for routine maintenance activity. It appears this wrong judgement caused outsourcing to the BRIC countries, but this is not clear.
In India the requirements of
the outsourcing industry are met only by a small percentage of the students
turned out by engineering colleges at present. A very large bulk of the
graduates is not easily finding a place in the IT outsourcing industry. This can be
attributed largely to the dichotomy of the educational process i.e. the actual knowledge acquired
when one goes through the educational process for a degree versus the formal
administrative and academic requirements to be satisfied to obtain the same. Another major reason is the fundamental requirement of high IQ or as high as the cut off point demanded by the software outsourcing industry.
Computer science has the doubtful distinction
of having some core areas having over developed and over studied in the sixties and
seventies of the last century. This has led to certain compilations which have
remained classic text books at the undergraduate level for the last 40 years.
Among these are the White Book for C, the Cinderella Book for Automata, the
Dragon Book for
Language Processors. The knowledge contained herein is more than sufficient. As far as the last two are concerned they not necessary for most jobs in the outsourcing industry. However it is essential to note that quite a few of these subjects have been ordained, by the powers that be, to be essential to the curriculum and the students all over have to clear the same. The masses of persons required for outsourcing is in millions and requires a mass of self-taught instructors.
In computer science
education a core subject we have selected for which there is an acute shortage
of faculty is the area of Design and Analysis of Algorithms, popularly called
DAA by the student community. For a major portion of the subject in the Indian
computer science and software area the classic text by Horowitz et al[1] has remained
the favorite with the boards of studies of various educational institutions. The
international scene may have accepted other renderings of the area. The subject
also has the difficulty of having an acute shortage of faculty who can impart
instruction in the same. Combined with data structures it is essential in
induction training programs in the training and outsourcing software industry.
The intake of the Indian
engineering colleges over the entire country runs into lakhs
of students. Assuming that the average teaching scope of one instructor is
about 100-200 students, tens of thousands of instructors are required per
subject. This is apart from the requirements of the computer training industry.
This requires that for some core subjects like DAA mass methodologies must be
in place for the mass creation of instructors. We suggest a scheme which we
have created after extensive experimentation over the last two decades, and
crystallized into working courseware over the last five years, at M J College
of Engineering.
The basic contents of a
DAA course can be defined to consist of about 30 basic algorithms. These
algorithms can be described employing either by a programming language like C,
C+ or JAVA or by informal algorithmic notations. We suggest a method used
historically in dBase in the childhood days of the personnel computer. This
was description by example called query by example. If a properly formulated
instance of an algorithm is simulated in detail then we have by generalization
the entire description of the algorithm. This reduces the core instruction in
DAA to a simulation of about 30 algorithms in step by step detail with a
properly chosen sample instance.
The instances of the 30
examples should be capable of wide reuse. It is seen that if examples are drawn
from widely available resources or the standard texts then there is a wide uniformity in their use
in the Indian environment. Algorithms not covered in the texts can be tackled
by using GATE problems which can be considered a free resource. GATE is a
competitive examination for entrance to IITs for postgraduate education and
some leading institutions in India. The previous question papers are in the
public domain and the problems on Algorithms there can also serve as a good,
standard, widely available and acceptable resource.
The selection of the
problems has also been influenced by the requirements of the SWITCH group of companies when
conducting induction training programs in data structures and programming in C,
C++ and JAVA. This is drawn from the requirements of a company in which one of
the authors has been actively involved in outsourcing during the period
1996-2003. The number of trainees experimented upon was about 1500.
The requirement of a mass
of instructors to be available in the engineering & computer applications colleges, and that too at short notice is a severe
limitation. To be taken into account is the rapid attrition of this mass of
instructors to the IT outsourcing industry. The major challenge is that the
instructors should be made acceptably competent in teaching and instructional
techniques in a small time frame and with the minimum effort on their part.
The first line of attack
is the exposition of the algorithms itself. In his directives for the
development of computer science education and the Indian software industry,
Prof C A R Hoare had pointed out that there was a major need to do away with
the unnecessary abstraction computer science has wound itself into. The basic
concepts of computer science were simple, made unnecessarily complex. He
suggested that common sense expositions are the way to proceed to tackle the
results of topics like concepts and algorithms.
Experimentation in common
sense expositions in the rendering of algorithms over the last two decades at
the public sector undertakings like
ECIL, various professional societies in Hyderabad like the IETE and
CSI, educational institutions under
Osmania University, JNTU, BITS, Kakateya University, Nagarjuna university etc., outsourcing industries like
Satyam computers, social charitable and
social expositions under the auspices of
the Lion’s club and other forums were attempted
by the authors. These common sense expositions were related to the
social and cultural requirements in an Indian setting. The audience is drawn
from feudal agrarian environments of towns, and villages without the necessary sophistication
of environments the industrial and information
revolutions bring. Over a period of time we have found that common sense
expositions and rendering of the problems in the algorithms were possible and
widely well accepted and received by the audiences.
Since the area of
algorithms is basically mathematical it is possible to have an interpretation
of an algorithm in a particular setting. Mathematical abstractions have an
ubiquitous component in that common sense interpretations of a mathematical concept
or problem can be found in any area of human thought and action. Thus
regardless of the environment being feudal or modern, whether it belongs to the
agrarian, industrial or information revolutions, whether it is urban or rural
we can find common sense interpretations to algorithms. However, commonsense
interpretations are culture dependent. Thus what may be a commonsense
interpretation in a culture or in a particular geographical region may not be
considered commonsense in another culture or in another geographical region.
Thus common sense interpretations have to be localized like the search results
in Google and other search engines.
It is a well known fact
that visual methods are superior to text. In the ancient Chinese saying: a
picture is worth a thousand words. Early expositions of Data Structures and
Algorithms in the initial formative days of programming and computer science
were highly mathematical. An example is the use of Iverson’s APL as a tool for DAA[4]. The classic work of Knuth[2]
culminated in popularizing visual methods for data structures. The fundamental
statement of almost all algorithms of practical interest using ADT’s is known[3]. To this set may be added a few to complete the collection[8].
Completeness and rigor are
claimed in the exposition of algorithms by some sources[8].
However these may not be requirements in the mass educational requirements for
the outsourcing software industry. Expositions where the formally stated
algorithms and associated data structures can be naturally visualized and easily
mapped to the C, C++ or JAVA programs seem more relevant to the outsourcing
industry[1]. It has been noticed that mediocre and above average students feel
the expositions with rigor and formality give the impression of the last word
in the algorithm or its formal treatment being said. Some expositions allows a continuous
development, modification and varied implementation of the algorithm to be
easily visualized[1]. This allows easy implementation in major popular
programming languages
with varied data structures. This is more suitable for preparing the
student for a professional career in the software outsourcing industry.
Algorithms are to be
studied along with the data structures used for the algorithm[1]. Even if ADT’s
are used, they can be visually represented in colorful, rich user friendly using tools easily
and widely available worldwide with MS-Office.
In our experimentation
over the last five years a method initially used was to have custom plastic pieces of various colors
for arrows and numbered colored round pieces for nodes to simulate algorithms
using the tree data structures. Square numbered plastic pieces of various
colors were used for the simulation of linked lists. This was necessitated as
there was at times no electrical connection in the remote geographical area
being considered. There was at times no access to paper or a blackboard. There
was no access to the PC as the charge on the same lasting for a reasonable amount of time was not
possible. Attempts to use these plastic
pieces for instructional purposes in the areas of data structures and
algorithms had the disadvantage that an error in the presentation with a
detailed simulation of the algorithm can have catastrophic pedagogical effects.
Also it was difficult to keep track of the instantaneous descriptions or
snapshots of the computation, and one could not backtrack to a previous
position.
The situation has changed
over the last few years with the advent of netbooks
with rapid growth of disk space in the ubiquitous personnel computer. (Note we have said personnel computer, rather than personal computer, as it is a shared resource in a class, a family or in a social circle, in most of India.) The use
of presentations using the easily available MS-Office was restricted to
megabytes of storage a decade ago. Large graphical images and content could not
be used. The high prohibitive price in India of writing on a CD-ROM a decade
ago, the low storage capacity of floppy drives, low disk capacities, small RAM
memories and compared to today the low processing speeds did not allow bulky, highly graphic
presentations to be used in the environment. The recent explosion in disk
capacities extending to terabytes allows gigabytes of storage to be easily
spared for a powerpoint presentation, presented in parts.
The ability of many peripheral devices to draw power from the USB port has been
a boon. The has been rapid rise in RAM capacity to gigabytes and the rapid
increase in the processing speeds and methods of processing in current easily
affordable first or second hand, personnel computers. This allows routine rich, colorful, user
friendly, high graphic content to be created by persons of ordinary skill and
talents, in a few hours or days. The availability and affordability of high capacity
pendrives, by all levels of society, allows easy transportation and distribution of
presentations requiring gigabytes of storage at very little cost or no cost at
all.
The technology
as available now allows multiple
snapshots of the entire computation. This is akin to the instantaneous
descriptions of the computation being recorded at each time unit by a slide, or a small ordered set of slides, in
the presentation. The use of such snapshots is very common in theoretical
computer science like the simulation of an automata like a Turing Machine[6]. In fact it is useful for the entire simulation,
in excruciating detail of a finite computation, or the sketch of some infinite
computations, possible on many
deterministic and nondeterministic computational devices.
Thus the
technology widely, easily and
cheaply available in India allows the medium of high graphics oriented powerpoint
presentations for all computer science
and software oriented subjects. We have found that in our presentations
on algorithms each algorithm, with suitable sample data as input, can be described in detail in about 500 to
1000 power point slides using upto half a of gigabyte of storage, in either one
presentation or a group of them. The exposition proceeds by noting that what an
instructor should do in the class, in a period of one or two hours, is a step
by step development of the algorithm by executing it for a suitably chosen
sample input. This is translated into a slow step by step sequence of annotated
power point slides in a method akin to the classic development of Walt Disney
movies like Snow White and the Seven Dwarfs. Each step is described in
excruciating detail by a new power point slide, thus requiring hundreds of
slides per algorithm. This allows the student or faculty to reuse the
presentations indefinitely for future recapitulation and revision. The normal
development time is 20 manhours of work for an hour
of presentation on the average. The basic requirement however is a mastery of
an excruciatingly detailed simulation process of an algorithm and its data
structures for a finely grained presentation to mirror the same.
Alternatives
to the same are seen on the internet for a long time. Simulations by JAVA applets of algorithms abound on
the internet. These perhaps give a visual representation. An algorithm has to
be simulated by hand for an average human being to understand it. There is no
escape from this requirement. The alternative we suggest is that it can be
watched as the presentation gives this simulation in detail as if by hand.
Historically this was done by pencil and paper and the use of the same was
advocated, as opposed to just doing it mentally[2].
Thus the JAVA simulations on the
internet by applets do not result in the knowledge of how the algorithm works
to be transmitted to the user. They do not also give the entire snapshots of
the computation in excruciating detail, tailor made for use by human teachers of different cultures.
It may be suggested that the presentations be automatically generated. The
software may be developed but what is required is different. One human being
should go through the excruciating process in great detail, with dynamically
varying commentary suited to the audience, for every instantaneous description
of the computation or slide of the powerpoint
presentation. This involves
stepping through the instantaneous descriptions and explain them
to another human being in detail without error.
To make the presentations
widely reusable and acceptable the data for the algorithms is drawn primarily
from previous GATE papers and secondarily from standard texts. This
allow the presentations to be used for notes as a supplement to the
standard syllabus contents of DAA, in most colleges and universities in India.
The presentations have
been used with a variety of audiences. They have been adapted to the curriculum
of the DAA syllabus of about half a dozen universities so far. They have been
experimented with instruction in GATE classes. The instruction of students in
rural engineering colleges has proved to be a challenge. Some colleges have
students who have ranks in excess of a lakh in EAMCET,
the common entrance test for engineering in Hyderabad. In experimenting with
the instruction of DAA in rural engineering colleges it was found that the low
ranks of the students were not problems of IQ. The low ranks in most cases has
been the effect of a sudden change to the medium of instruction changing from
Telugu or Urdu to English at the college level. By generalization the situation
is the same throughout India. The bulk of the population is educated in regional languages
at the school level. A transition to English as a medium of education occurs at
the college level.
In this connection a major motivation was the directive of the legendary Prof E W Dijkstra, both orally and in writing (through a EWD) to one of the authors. He held that the natural language used by a human being for elaboration in software and computer science did not matter. What was required was a mastery of the natural language. Thus the medium of instruction being in regional languages does not matter, in the long run. A temporary discomfort of about a year is felt by the student when a transition to English as the medium of education occurs at college level. In parts of India the regional language is employed by software engineers involved in outsourcing. The client outsourcing may be in Europe, America or Japan.
It was found that as the presentations are
basically pictorial in nature. They have a high requirement of mastery of mathematical concepts which can be stated in
common sense terms. They are basically natural language independent. They can
be adapted to expositions in any natural language. Thus the use of the presentations aids in the
understanding of other areas of computer science and software.
The rapid growth of the
global village is evident around us. The last decade and especially the last
five years have seen the global village effect reach remote villages and towns
in India. On the other hand the rapid rise in real estate, the collapsing
strain on the infrastructure of large cities is evident. The shift of the
outsourcing industry to the interior regions of India and the Third World is
being actively considered. In such a case the requirements of skilled and
competent software professionals in the interior becomes essential. We feel
that the use of the rapid growth of technology has to be used. This will allow
rapid growth of the software outsourcing industry for employment of the
graduates of the colleges. Our effort is a step in this direction to cater to
mass development of instructors in computer science. As a parallel line of
approach we tackle mass training and education in computer science subjects.
The aim is to distribute
the presentations on a no profit basis to a wide variety of students and
faculty and obtain feedback on their use for mass training and education. The
total storage capacity requirements of the presentations runs
into gigabytes. Wide distribution over the unreliable and not always available
Web in India is not possible. Also the bandwith
widely available economically on the Internet to the common person is limited. The
alternative is free distribution via the medium of pen drives. Either of the
authors can be contacted for a copy of the presentations. They can
easily be rearranged to cater to most of the curriculum requirements in the
subject for most educational and training institutions. Free modification, adaptation and expansion of
the presentations can be to the choice and taste of the student or faculty. We would
welcome a natural language translation of the presentations into the major
regional languages of India, and experimentation with expositions at the
primary and secondary school level. It may be noted that we have said even the
primary school level. The classic exposition is that all algorithms can be
reduced to about 30 easily stated problems[3].
The 30 problems considered by us have some
aspects where a simple common sense rendering can be easily explained to primary
school children. It may not be normally possible to explain the intricacies of
the algorithm and data structures in full detail to this level of audience. We
have found that graphic simulations of backtracking algorithms like the 4 queens problem, the Hamiltonian cycle problem and the graph
coloring problem are easily grasped by primary school children when presented
by the medium of our presentations. The problem statements of many of the algorithms can be introduced at this level with simple
commonsense renderings. For this purpose
algorithmic notation is not used. We follow the tradition used by innumerable
cultures for centuries in
the pre ALGOL-60 era and
explain the algorithms in natural languages with commons sense
explanations. The sacred constructs and concepts of ALGOL-60, PASCAL, JAVA, the
development of algorithms using pidgin
varieties of these and the associated venerable formal verification proofs are
left for a much later stage in the life
of the primary school children.
Our presentations span 30
problems which are basic to the DAA area and form the core of any curriculum in
DAA at the undergraduate level of a computer science or information technology
engineering baccalaureate course or a degree in computer applications in India.
The presentations have been prepared with an effort of about 1000 person hours
over a physical time duration of six months. They have
been experimented upon with audiences of various backgrounds, drawn from the
entire country, totaling about 3000 persons in various universities, colleges, coaching and training
centers in the southern states of India.
For detailed understanding
of the proof and time complexity of the algorithm common sense explanations can
be given. This requires that the culture independent formal mathematical
apparatus be dismantled. A particular suitable commonsense interpretation of
the problem be taken suited to the local cultural
environment. The construction then be recast in the
local setting. The
construction if properly chosen can be made to be very user
friendly, intuitive, and common sense based. Care should be taken no to offend
the cultural sensibilities of the local environment. Acceptable comparisons,
statements, and comments on ways of life in one environment may not be
acceptable at all when the environment, culture, geographical or economic situation of
the audience varies even slightly. What is common sense in one part of India
with one type of audience, may not be understood in
another part of India or same part of India with slight variations in the type
of audience.
A particular cultural
environment that seems to be universal is to use the world of nursery and
primary school children. Common sense examples from this world seem to have an universal appeal and are basically culture independent.
They are also universal and powerful and can cater to the interpretation of any
mathematical construct, concept or theorem. Thus they can be used for
commonsense renderings of algorithms. The world of Winnie the Pooh and Tigger seem to have universal appeal and will not result in
the instructor stepping on someone’s cultural toes.
For a soft copy of the
presentations, the logistics of obtaining the same and tailoring the same to
cater to a particular curriculum the authors may be contacted by e-mail. As
pointed out earlier in the existing Indian environment, the Web cannot be used
cost effectively for distribution, the presentations are too large for normal
CD’s, perhaps DVD’s would do but the cost of cutting them involves
administrative, economic and time overhead. Hence the ancient snail mail or its
variants, with high capacity pen drives, the social network and the public
transportation network have to be used.
APPENDIX
We now proceed to exhibit
that commonsense rendering of many concepts of computer science is possible. In
particular we take twelve computationally related problems and restate them in
a commonsense way.
In all areas of knowledge
in Arts, Sciences, Engineering and Technology a common modeling device is
combinatorial optimization. In the area of combinatorial optimization some 3000
hard problems have been isolated by Computer Scientists and mathematicians
which though deceptively easy are extremely hard to solve by the computer or by
hand. Among these problems are the NP-complete problems.
The
NP-complete problems are problems practically arising in computation in
industry, research and all activities of human endeavor. They have the
interesting property that all attempts to solve them over the centuries have
yielded problems of exponential time complexity. This restricts the input
problem size to about 40 elements. Practical models of problems have 100 to
1000 elements.
To
understand the concept of NP-complete problems let us consider the same from a user friendly
point of view. Let us consider the actions of the early Christians and Jesuits
in their pursuit of new converts to their religion. When they go round the
world to a new kingdom or tribe consisting of a million or couple of thousand persons, a
long, laborious and time consuming
method is to convert each one of the million or thousands of persons.
Alternatively a much simpler method may exist and this is what they
followed. What they did was to isolate a
kingpin person in the new kingdom or tribe. This could be the king, the chief himself or
a very influential noble. If that person could be converted then all the
million people of the kingdom or the entire tribe will fall in line and get converted. The king,
the chief or the
noble involved is called the “-complete” person. The same technique as per
Sociology is followed when one religion wants to swallow up another, or one
culture would like to swallow up another. Isolate a kingpin and convert him/her
then all fall in line. This is what happened as per Russian history in Russia.
The kinpin converted himself
to European culture and everybody fell in line and switched.
The
same problem is followed by a teacher in the class. If the class if unruly the
long way is to convince all the students in the class to behave. For
undergraduates this could be 200-300 students to be convinced. This is a long
and laborious. Alternatively, get hold of one kingpin student, the leader and
discipline him. Then all the students fall in line. This is what we mean when
we say -- “make an example of”.
The
method is followed by the police to control the goons in town, or break up an
agitation led by some politician as the eye of the cyclone. Isolate the kingpin
and discipline him. Everyone falls in line. This is the concept behind the
NP-complete problems.
The
existence of these kingpin problems was originally shown by Cook[9],
who showed the existence of the first NP-complete problem, called satisfiability. Karp[10] isolated
21 classic problems which were also shown to be kingpins or NP-complete
problems. For centuries these
problems had led countless
mathematicians, of all calibers, to lifetimes
of despair and found to be extremely hard to solve in a reasonable way in a
short span of time.
In
simple terms, we have two classes of problems: the goons and ordinary citizens.
We want to reform the goons and make them ordinary citizens. Cook[9] isolated the venerable kingpin goon, the satisfiability problem. If this goon can be reformed to an
ordinary citizen then all goons can be reformed. If the kingpin goon is incorrigible and
cannot be converted by any means to an ordinary citizen then the goons form a
distinct class and will contain many goons who cannot be reformed.
The
general process of showing that a problem is NP-complete can be described in a
user friendly way. Let us take the
kingdom of the goons. The basic requirement to be a kingpin goon is that one
must be a bonafide goon in the kingdom. This is like
some countries that allow anyone to become the President of the country but the
basic requirement is that one must be a bonafide citizen
of the country. The next step is that there is already a club of kingpin goons.
Cook started the initial membership with the satisfiability
problem. Karp further populated it by an aristocracy of 21 goons. This club has
exclusive membership. Membership is by invitation. An existing and known member
of the club, a bonafide kingpin goon, must go through
the procedures of converting a goon from the kingdom to a kingpin goon. He must
demonstrate that the goon chosen for elevation is a bonafide goon
in the first place, then that he is a peer of the kingpin goons, thus elevate him to the exclusive membership of the club. After Karp
large classes of goons were shown to be kingpins[1],
and the club effectively became a
democracy . Today thousands of goons in the kingdom are known to be wearing the
‘dunce cap’ of being kingpin problems. It was shown by Sahni[1] that there are no
approximations to the kingpin goons
being reformed to behave almost like ordinary citizens. Once a kingpin goon always a kingpin goon.
We
classify problems in two categories, decision problems and optimization
problems. Decision problems have a binary answer, yes or no. A solution exists
or does not exist. NP-complete problems are decision problems.
Optimization
problems are slightly different. Let us consider a user friendly example. We
would like to know what weight a weightlifter can lift. We call this an
optimization problem. We convert it to a decision problem by giving the
weightlifter two weights on small and one large. He easily lifts the small
weight. Then if he cannot lift the
heavier one then we try something midway. This partitions the weights between
the heavy and light weight into a lower and upper half. We proceed
to that half and repeat the process. At the end we end up with the weight he
can lift. Thus we convert an optimization problem to a decision problem. This
like the maximum height one can jump,
the maximum number of doughnuts a person can polish off in one sitting,
from how far away can one hear one’s teacher shout at a student, the minimum
bribe to be given to the government employee for a routine job etc. are all
examples of optimization problems. In principle they can all be converted to
decision problems like in the weightlifting example, though in practice it may
not work with a rapacious person in the bribe case.
From
the set of 3000 NP-complete problems augmented by a couple of thousand approximations
to the same, we select a basic set of six problems and an augmented set of
another six problems, making a dozen overall. We list the problems below with
both a user friendly rendering of the problem and a formal statement of the
problem.
The First problem is the
Boolean satisfiability problem. This can be related
to circuit design where if we have a combination of AND, OR and NOT gates will
we get an output signal. It can be related to a statement made up of clauses
which all have to be true. A clause in turn is a combination of variables which
can be true or false. The entire statement is made up of n variables. It is a
decision problem as we only have to decide if a combination can yield the value
true.
The Second problem is the
clique problem. In any social network there will be some special individuals
who form a clique. For example in the area of Information Science we have the
concept of the Invisible College or High Priests in some area of knowledge who
form a clique. In the area of Information Science we learn that all
organizations have their Invisible Colleges, which keep track of all
information in the organization. This can be modeled by a graph where the nodes
are the people with one node per person. Two nodes have an edge or are connected
if the two persons have a connection in the social network. We are to determine
if a clique of some give size k exists in the graph and hence in the social
network it models. This requires the determination of a subset of k nodes in
the graph where each node is connected to the remaining nodes. This is a
decision problem as we want to know if a clique of size k exists-yes or no.
The Third problem is the
vertex cover problem. In a social network we isolate a number n of persons. These persons
are key persons with a wide network in the social circle. The problem is to
determine is a subset k of these n persons cover the entire population in the
social circle. In formal terms we take a graph and have to determine if a
subset k of the n nodes of the graph covers all the edges of the graph. This is
a decision problem as we want to know is a vertex cover of size k exists—yes or
no.
The Fourth problem is the
Hamiltonian cycle problem. We are to start from home, visit the houses of n relatives one after
another and reach home. No relative should be visited twice. This problem is modeled as a graph problem
where we are to visit the n nodes of a graph, enter and leave each node exactly
once. This naturally occurring problem is extremely difficult to solve. Consider
the genome map assembly problem. We have millions of small pair lengths. Each
pair length can be modeled as a node of a graph. If a pair length comes after
another then put a directed
edge between the former
and latter. Finally, have an edge between the last pair and the first. If we
can find a Hamiltonian cycle in the graph we have found the correct assembly.
For another example consider the old Walt Disney cartoon movie, Snow White and
the Seven Dwarfs. This movie is made up of thousands of laboriously hand
painted slides, which vary slightly from each other. They are to be shown in
the proper sequence. Suppose they got mixed up and we are to rearrange these
thousands of slides in the proper order. Each slide can be modeled as a node of
a graph. If a slide follows another then put a edge
between the former
and latter. Finally, we have an edge between the final slide and the first one.
If a Hamiltonian cycle exists in the graph then we have found the proper order
of assembling the mixed up slides. This is a decision problem as we want to know
if a Hamiltonian cycle exists—yes or no.
The Fifth problem is the
Travelling Salesman Problem(TSP). This is a
generalization of the Hamiltonian cycle problem. If each bus stop in the city
is the node of a graph, and a bus from one bus stop to another exists then
introduce an edge between the nodes. The cost of travel between the stops is
the weight of the edge. We are to make a circuit or ‘chakkar’
in the graph of least cost, starting from an initial bus stop visiting all the
bus stops once and coming back to the first stop, with the total cost of travel
a minimum over all the possible cycles. This problem has been used in USA for
minimize the cost of travel
of a person who had to collect all the money from the pay phones
every day in a city or town. A milkman in the Third World will find it useful
if he has to deliver milk in a hundred
houses every day, or newspaper boy who
has to deliver the newspaper in over a couple of hundred houses every day with
the minimum recurring cost. Globalization and liberalization has burdened the
milkmen and newspaper boys in the Third World with gas guzzling vehicles for
transport. This requires them in due course to use the Travelling Salesman
Problem. As another example consider the problem of donkeys. For centuries the washereman used to use these beasts of burden extensively
in India. Lakhs of donkeys
were around. Globalisation and
liberalization have burdened the washerman with a gas
guzzling vehicle. The population of donkeys has reduced to a couple of
thousands. These poor things are now being eaten up by some humans who relish
their meat and will become extinct shortly. The washerman
on the other hand is now saddled with a personal computer and has to use the Traveling Salesman Problem to
control his fuel costs. This is an optimization problem as we want the circuit
of least cost. To convert to a decision problem we ask if a Hamiltonian cycle
of cost less than k exists---yes or no. We proceed as in the weight lifting
problem. Then we convert the problem to a decision problem.
The Sixth problem is the
Graph Coloring problem. If the map of the world/universe is to be colored and
two adjacent countries/worlds are to be colored differently, we are to determine if for N
countries/worlds a palette of k colors is sufficient. Another rendering of the
problem is the determination if k colors are sufficient for traffic lights, in a junction of N roads. The problem is formalized by a graph with
one node for each country. Two adjacent countries are modeled by an edge
between the nodes representing the countries. We are to color the nodes of the
graph with the palette of k colors with the constraint that two adjacent nodes
do not have the same color. This is a decision problem. We want to know if k
colors will do—yes or no.
The Seventh problem is the
Feedback Node
Set(FNS) problem. The Indian Railways have one of the largest railway networks in the world. Let us
count the railway stations and assign to them the number 1 to N. We are to
determine if for a given number k, we can select k railway stations with the
constraint that a person can make all the
tours in the Indian Railways,
originating from one of the k stations, taking connecting trains and
ending up finally at the starting station. The problem has applications in code generation in compilers. A compiler
translates a high level language program to the machine language of a computer.
The computer has
either locations in RAM or in registers. If data is stored in the
registers the computation is ten times faster than if the data is stored in the
RAM. Registers require extra
circuitry and are expensive. The problem is to minimize the number of registers
required to generate the code. This problem can be reduced to the feedback node
set problem. The problem is modeled by having a graph of N nodes. We are to
determine if for a given k, we can select k nodes such that all
cycles in the graph contain at least one
node from the k selected. This is a decision problem we want to know if
a FNS of size k exists—yes or no.
The
Eighth problem is the Feedback Edge Set(FES) problem. Consider again the Indian
railway network. Let there be |E| trains and N stations. We are to determine if
for a given k, we can select k trains such that they include all tours in the
Indian railways. This is modeled by a graph of |E| edges and N nodes, and we
are to determine for a given k, if we can select k edges such that all tours
include at least one edge. This is posed as a decision problem. We want to know
if a FES of size k exists---yes or no.
The Ninth problem is the
Set Cover problem. We have N goons, who
form themselves into gangs G1, G2,---,Gn, with the
possibility one goon being in more than one gang. We are to determine a subset
of gangs that contain all the goons. The problem is modeled by having a set S
and a family of subsets of S. We are to determine from the family of subsets, if some of them
cover the set S. This is a decision problem.
The Tenth problem is the
Exact Cover problem. We have N goons,
who form themselves into gangs G1 G2,---,Gn, with the
possibility of one goon being in more than one gang. We are to determine a
subset of gangs which are pairwise disjoint that
contain all the goons. A collection of gangs is said to be pairwise
disjoint if when we take any two gangs in the collection they do not have a
goon in common. The problem is modeled by having a set S and a family of
subsets of S. We are to determine
disjoint subsets from the family of subsets to cover the set S. The
problem is a decision problem. Does an exact cover exist? The answer is yes or
no. The next step is to compute all
the solutions.
The Eleventh problem is the Sum of Subsets problem. We
are to determine from a given set of integers a subset that sums up to a sum M
that is initially specified. This problem is akin to finding some money in hard
times that adds up to M. This is with
the bits and pieces of monies that have been tucked away over the years in
various places. This problem has been called the easiest hard problem and has
proved difficult to solve. Alternatively we want to pool up an amount M for a
splurge from various sources. What are the sources that will add up to the
amount? It can be efficiently solved by dynamic programming methods if the
numbers are small in size but if the numbers have as many digits as the
cardinality of the given set then the problem becomes difficult. Normal
requirements are for 100 or 1000 digits wheras we can
handle 40 digits easily. This is a decision problem. Does a subset exist? The
answer should be yes or no.
The Twelfth problem is the
0/1 Knapsack problem. We are to pack up our suitcase for a long international
air journey with some allowed N items. Each item has a profit if it is packed
and transported but associated with it is a weight. We have to stay within the
weight restriction of the airlines and maximize our profit. Let the n items
have the profits p1,p2,…pn and weights w1,w2,--wn and the weight restriction is M. We are to determine a
subset of the items 1..n, whose
cumulative weight is less than or equal to M and the profit is maximal. This is
an optimization problem. We convert it to a decision problem by asking whether
a solution exists greater than for a lower bound of Ml and less than an upper
bound of Mu. The answer is yes or no in each case. So it is a decision problem.
Just as in the weightlifting example we
proceed to obtain the solution, by considering a weight in between. Thus
we convert the optimization problem to a decision problem.
The process of demonstrating
that each one of the above problems is NP-complete lends itself to user
friendly, graphical restatements of the
constructions and proofs. For this we
have found that we may have to recast the constructions and proofs with common sense explanations. Also we have to
modify history a wee bit. We have to
redo how ancient problems were determined to be NP-complete in the first place.
Also it turns out that one specific problem, the exact cover problem, is some sort of a golden goose. Many NP-complete
problems can naturally, visually, pictorially and hence graphically be
expressed using the exact cover problem. This becomes easier if we augment the
elements of the sets in the exact cover problem. One way is to allow some of
the elements to be multisets. The other method is to
attach semantics and semantic routines to the elements. This is akin to the
strategy followed in syntax directed compilers[7]. Such an augmented exact
cover problem lends itself to nice graphical presentations of NP-complete problems.
In fact we can digress and note that computations using the automata &
grammars of the Chomsky hierarchy can naturally and neatly be expressed in this
augmented exact cover problem[6]. Thus the venerable satifiability
problem can perhaps be retired. The more visually colorful, active, and younger
exact cover problem and its variants can be used in user friendly presentations.
The augmented exact cover problem with semantic routines is akin to a special
case of constraint data bases.
References:
[1] Horowitz , E. and Sahni, S. K., Computer
Algorithms, Galgotia, 1993.
[2]
Knuth, D. E., The Art of Computer Progamming, Vol I, Addison-Wesely, 1971.
[3]Aho, V., Hopcroft, J. E., and Ullman, J.D., The Design and Analysis of Computer
Algorithms, Addison-Wesely,1979.
[4]
Iverson, K. E., A Programming Language, John-Wiley and sons, 1962.
[5]
Kernighan, B. and Ritchie, D., The C programming language, 1968, THE WHITE
BOOK.
[6]
Hopcroft, J. E., and Ullman,
J. D., An introduction to Automata, Languages and Computation, 1977, THE
CINDERELLA BOOK.
[7]
Aho, A.V., and Ullman,
J.D., Principles of Compiler Design, 1977, THE DRAGON BOOK.
[8]
Thomas H. Cormen,
Charles E. Leiserson, Ronald L. Rivest,
and Clifford Stein, Introduction to Algorithms.
[9]
The
Complexity of Theorem-Proving Procedures.
Stephen A. Cook.
University of Toronto. 1971.
[10] REDUCIBILITY AMONG COMBINATORIAL
PROBLEMS. Richard M. Karp.
University of California at Berkeley.1972.



























