Key Limitations of Knowledge Base Systems (in 200 words or less)

Knowledge-Based Systems (KBS) denotes a field of artificial intelligence research concerned with encoding expert knowledge in computer logic as a repository of “if-then” rules. Though successful instances of such systems are worthy of note (e.g. MYCIN, DENDRAL and PROSPECTOR), KBS have key limitations. Namely, an expert may establish a semantic narrative relating the rules that they apply when addressing a problem, but when compiled into a machine the relationship between the rules becomes confused. When experts apply rules, they follow an overall strategy for solving a specific class of problem; computer implementations take a more probabilistic approach to selecting which rules to ‘fire’. This leads to a second significant limitation: when choosing which rules to apply, the computer will attempt to exhaustively search the knowledge base, whereas experts are able to focus on the applicability of specific rules. Finally, KBS are not able to self-acquire any semantic knowledge of the rule base, or even gain the experience needed to know when rules should be broken and why. KBS are expensive to develop, and require continuing maintenance to grow and evolve the way human experts naturally do.

Working things out backwards with Bayes

In this post, I’ll delve into Bayesian Inference, which is used to determine the probability of cause and effect. That is: when I observe an event, with what certainty can I deduce the reason for it?

This is based on a probability theorem discovered by the English statistician Thomas Bayes, though it would be wrong to attribute all the glory to him alone; Bayes’ notes were corrected, compiled and published posthumously by Richard Price (a Fellow of the Royal Society), and it was Pierre-Simon Laplace who understood the wide-ranging reach of (what is now called) Bayesian Logic.

(Thomas Bayes)

First, a little review of the basics: probability is all about measuring the likelihood of events occurring, which can be calculated using the following ratio:

\textbf{probability of event occurring} = \frac{\textbf{number of ways the event can occur}}{\textbf{number of all possible outcomes}}

In technical notation, the “probability of event occurring” is denoted P(\textbf{event}). So, for example, the probability of rolling a 1 on a 6-sided die would be calculated as follows:

P(\textbf{1}) = \frac{\textbf{number of sides on a die with value 1}}{\textbf{total number of sides on a die}} = \frac{1}{6} \approx 0.167
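
To make the ratio concrete in code, here is a minimal Python sketch (my own illustration; the function name and die example are just assumptions for demonstration):

```python
from fractions import Fraction

def probability(favourable, total):
    """Probability of an event: ways the event can occur / all possible outcomes."""
    return Fraction(favourable, total)

# A fair 6-sided die: exactly one side shows a 1.
p_one = probability(1, 6)
print(p_one, float(p_one))  # 1/6 0.16666666666666666
```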

Taking another example, the probability of someone having an Apple iPhone would be expressed as follows:

P(\textbf{iPhone}) = \frac{\textbf{number of people who have an iPhone}}{\textbf{total number of people}}

Similarly, the probability of owning an iMac would be:

P(\textbf{iMac}) = \frac{\textbf{number of people who have an iMac}}{\textbf{total number of people}}

This is all very interesting, but Bayesian logic allows us to take things a step further so that we can calculate the probability of something being true (e.g. that someone has an iPhone) if we already know something else is true (i.e. they already have an iMac). The probability of something (A) being true given that something else (B) is already true is denoted as P(\textbf{A} |\textbf{B}), or in our case P(\textbf{iPhone} |\textbf{iMac}).

In probability, the denominator (the thing on the bottom of the fraction) represents the population under consideration. When we were just considering the probability of having an iPhone, the denominator was the total number of all people. However, now our denominator will be the probability of the person having an iMac.

The numerator (the stuff on the top of the fraction) is the subpopulation we’re trying to focus on. In our case, this is the group of people who have an iPhone and, having an iPhone, also have an iMac.

So Bayes’ theorem in full is written as follows:

P(\textbf{iPhone} |\textbf{iMac}) = \frac{P(\textbf{iMac} |\textbf{iPhone}) \times P(\textbf{iPhone})}{P(\textbf{iMac})}

It seems a little backwards that, in order to determine the probability that someone who has an iMac also has an iPhone, you need to know the probability that someone who has an iPhone also has an iMac. However, it is necessary, since our numerator is really capturing two distinct events (each with its own probability of occurrence): first, that the person has an iPhone (i.e. P(\textbf{iPhone})), and second, that given someone has an iPhone, they also have an iMac (i.e. P(\textbf{iMac} |\textbf{iPhone})).
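
To see the theorem in action, here is a small Python sketch. The ownership figures below are entirely made up for illustration; only the formula itself comes from the post:

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical figures (assumptions, not real market data):
p_imac_given_iphone = 0.15  # P(iMac | iPhone)
p_iphone = 0.30             # P(iPhone)
p_imac = 0.08               # P(iMac)

# P(iPhone | iMac) = 0.15 * 0.30 / 0.08 = 0.5625
print(bayes(p_imac_given_iphone, p_iphone, p_imac))
```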

Unfortunately, there are quite a lot of different variables in this theorem, which makes it difficult to apply. However, this can be simplified (though at first it will appear that we’ve made it more complicated).

The total probability of having an iMac can be expressed as the probability of having an iMac given the person already has an iPhone, multiplied by the probability of having an iPhone, plus the probability of having an iMac given no iPhone, multiplied by the probability of not having an iPhone.

This is written as follows:

P(\textbf{iMac}) = P(\textbf{iMac} |\textbf{iPhone}) \times P(\textbf{iPhone}) + P(\textbf{iMac} |\neg\textbf{iPhone}) \times P(\neg\textbf{iPhone})

This may seem hyper-pedantic, but note two things:

  1. We can now express the probability of having an iMac without P(\textbf{iMac}) appearing anywhere in the formula.
  2. Also, we know that P(\textbf{iPhone}) + P(\neg\textbf{iPhone}) = 1, so if we know one we can easily calculate the other.

Now we can replace the denominator of Bayes’ theorem as follows:

P(\textbf{iPhone} |\textbf{iMac}) = \frac{P(\textbf{iMac} |\textbf{iPhone}) \times P(\textbf{iPhone})}{P(\textbf{iMac} |\textbf{iPhone}) \times P(\textbf{iPhone}) + P(\textbf{iMac} |\neg\textbf{iPhone}) \times P(\neg\textbf{iPhone})}

This may look more complicated than the original equation, but we’ve reduced the number of independent quantities we need to know (P(\textbf{iMac}) is gone, and P(\neg\textbf{iPhone}) follows directly from P(\textbf{iPhone})), so it should (in theory) be easier to compute.
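
Here is the expanded form as a Python sketch, reusing the made-up figures from the earlier example. Note that P(iMac) never appears directly: the denominator reconstructs it from the conditional probabilities and P(iPhone):

```python
def bayes_expanded(p_b_given_a, p_b_given_not_a, p_a):
    """Bayes' theorem with the denominator expanded via total probability:
    P(A|B) = P(B|A)*P(A) / (P(B|A)*P(A) + P(B|~A)*P(~A))."""
    p_not_a = 1 - p_a              # P(A) + P(~A) = 1
    numerator = p_b_given_a * p_a  # P(B|A) * P(A)
    denominator = numerator + p_b_given_not_a * p_not_a
    return numerator / denominator

# Hypothetical figures again (illustrative assumptions only):
p_imac_given_iphone = 0.15     # P(iMac | iPhone)
p_imac_given_no_iphone = 0.05  # P(iMac | ~iPhone)
p_iphone = 0.30                # P(iPhone)

print(bayes_expanded(p_imac_given_iphone, p_imac_given_no_iphone, p_iphone))
```

With these numbers the denominator works out to 0.15 * 0.30 + 0.05 * 0.70 = 0.045 + 0.035 = 0.08, which is exactly the P(iMac) used in the earlier sketch, so both forms return the same answer of 0.5625.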

History of Artificial Intelligence – Raiders of the Lost Arts

Some previous posts provide a quick postcard from the early days of AI and the rise of the first commercial AI applications: Expert Systems. However, for all the initial hype around expert systems, their domain of expertise was (by definition) limited, they were expensive to build and maintain, and impossible to formally prove complete or correct. Moreover, their most glaring shortcoming, one that threw serious doubt on whether expert systems could be classified as intelligent machines, was their inability to learn from the problem domain or from experience.

Feeling as if they had all rushed down a blind alley, researchers once again looked to the functioning of the human brain for inspiration and resumed work on replicating its neural structure inside machines. Even though the work of early pioneers had documented the essential concepts of neural networks, they had lacked the powerful computing infrastructure needed to implement the theory. Furthermore, mathematical analysis of neural network models appeared to demonstrate the computing limitations of such structures.

However, with more modern computing technology and the renewed enthusiasm directed at neural networks, fresh breakthroughs seemed to happen simultaneously. Important theoretical advances were made in the 1980s, such as Adaptive Resonance Theory (Grossberg), Hopfield Networks (Hopfield), Self-Organising Maps (Kohonen), Reinforcement Learning (Barto), and the highly influential Back-Propagation Learning Algorithm (Bryson and Ho). All these resulted in a new breed of neural networks that could be trained and could learn for themselves.

Other AI research postulated that since human intelligence emerged from evolutionary forces at work in the natural world, i.e. Charles Darwin’s theory of evolution and natural selection, then intelligent machines could arise from a synthetic evolutionary environment. This approach to developing AI solutions involves simulating a population of objects, allowing evolutionary operations to occur (selection, crossover and mutation), adding a healthy amount of entropy, and letting generations evolve. The evolution-based approach encapsulates three main techniques: genetic algorithms, evolutionary strategies, and genetic programming, where the computer doesn’t produce answers but outputs programmes as the solution.
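
To make the evolutionary loop concrete, here is a toy genetic algorithm sketch in Python. It is purely illustrative (the bit-string target, population size and mutation rate are all arbitrary assumptions) and is not taken from any of the systems mentioned above:

```python
import random

TARGET = [1] * 16  # toy goal: evolve a bit string of all 1s

def fitness(genome):
    """Count the bits that match the target."""
    return sum(g == t for g, t in zip(genome, TARGET))

def crossover(a, b):
    """Single-point crossover between two parent genomes."""
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(genome, rate=0.05):
    """Flip each bit with a small probability (the 'healthy amount of entropy')."""
    return [1 - g if random.random() < rate else g for g in genome]

population = [[random.randint(0, 1) for _ in TARGET] for _ in range(20)]
for generation in range(200):
    # Selection: keep the fitter half of the population as parents.
    population.sort(key=fitness, reverse=True)
    parents = population[:10]
    # Crossover and mutation refill the other half with children.
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(10)]
    population = parents + children
    if fitness(population[0]) == len(TARGET):
        print(f"solved in generation {generation}")
        break
```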

It would be unfair to say that neural networks, with their ability to learn from experience, discover patterns, and operate in the face of incomplete information, superseded expert systems. In fact, the two technologies complement each other rather well. As was discussed in a previous post, knowledge elicitation from a human expert is time-consuming, can be expensive, and may lead to contradictions if multiple experts contribute to the knowledge base. Furthermore, experts may themselves make decisions in the face of a great amount of uncertainty and may only be able to justify their actions with vague explanations, lacking the precision that rule-based systems need. Neural networks can be used to discover hidden knowledge in the system, manage vagueness in the rule definitions, and also correct rules where the entered expertise is contradictory.

It seems that human experts are able to reason and make decisions in the face of uncertainty because the natural language used in the reasoning process supports the expression of concepts which are vague and subjective. And so the theory of fuzzy logic became of primary interest to expert system developers. Fuzzy logic and fuzzy set theory are not new discoveries; they were established by Lotfi Zadeh in 1965. However, the concept of fuzzy logic was not well received by Zadeh’s contemporaries, possibly because the word “fuzzy” was off-putting to scientists who wanted to be taken seriously. By the 1980s, the idea had travelled east to Japan, where it had been successfully implemented in consumer goods (such as air conditioners and washing machines). Hence fuzzy logic had a proven commercial track record, and it significantly reduced the development effort and complexity of expert systems.

Nowadays expert systems use fuzzy rules and neural networks to create more powerful AI solutions. The field has matured, and new expert system development is based on existing theories rather than the expression of new ones. But processing potential has taken an exponential leap forwards with the advent of cloud computing, resulting in powerful new AI frameworks and solutions (e.g. deep learning). Though it may take an infinity of computers to replicate the power of the human mind, such a superlative seems to be within our grasp, and AI is now more relevant in society than ever.

This post is based on the first chapter of “Artificial Intelligence: A Guide to Intelligent Systems” (2nd Edition) by Michael Negnevitsky.

History of Artificial Intelligence – Bringing in the Experts

As explained in a previous post, the initial drive in Artificial Intelligence attempted to deliver generic reasoning machines with little or no domain knowledge. Finding a solution was reduced to a “searching problem”: the program would try different permutations of incremental steps until a path to the answer was found. Even though this worked in practice for small “toy” domains, such an approach would not scale up to real-world situations. As soon as a problem could no longer be solved in polynomial time, but required exponential time to solve, such AI programs proved completely ineffective. As a consequence, in the early 1970s government funding for AI projects was cut.

However, the foundations of knowledge engineering, along with very useful tools and AI programming languages, had been inherited from the founding fathers of this new science. The next generation of researchers soon discovered that for generic problem domains only weak solutions would emerge, but if the domain was sufficiently restricted, stronger heuristics could be built into the system, resulting in stronger solutions.

DENDRAL was one of the first implementations of such a system. NASA was planning to send an unmanned spacecraft to Mars and needed a machine to analyse the molecular structure of the Martian soil. The brute-force generate-and-test approach of producing all possible solutions given the potential molecular input, and then comparing them with the field observations, proved NP-complete (a problem that can only be solved in exponential time, and which becomes intractable even for modest input sizes).

(This picture was taken by the Viking Lander 1 on February 11, 1978)

However, such tasks could be solved by human experts who were able to reduce the problem and make it tractable. As the project team set about encoding this expertise into the system, it became apparent that the expert’s knowledge was not limited to the laws of chemistry, but relied on personal heuristics (rules of thumb) and guesswork. In fact, extracting knowledge from the expert became a significant bottleneck in the development of the system. Nonetheless, DENDRAL was a success and ended up being marketed commercially in the USA.

Following on from the success of DENDRAL, the research team at Stanford University set to work on a medical expert system for the diagnosis of infectious blood diseases. The project was called MYCIN and started to formalise the approach to developing expert systems. Once completed, it performed on a par with human experts (in the narrow field of infectious blood diseases) and outperformed junior doctors. It incorporated some 450 rules, which were clearly separated from the reasoning procedure. Using this software design, the team also developed a generic expert system, EMYCIN, which had all the features of the initial software but none of the rules. New expert applications could be developed by just adding rules to the system.

One of MYCIN’s features was its ability to reason in the face of uncertainty. This approach was taken up in PROSPECTOR, an expert system for mineral exploration. In geological investigation, crucial decisions have to be made in the face of uncertainty. To automate such decision making, PROSPECTOR implemented Bayes’s rule of evidence to propagate uncertainties throughout the system. The system had over 1,000 rules and also featured a knowledge acquisition system.

Though I’ve only presented three early examples, expert systems were very popular and successful in the 1980s and 1990s. However, such rule-based AI proved to have significant limitations:

  • The restricted problem domain within which expert systems operate significantly limits their usefulness. For example, MYCIN would not perform well in situations where the patient suffered from multiple health conditions simultaneously.
  • Furthermore, an expert system is not aware of the narrow boundaries limiting it, and attempts to solve problems outside its domain yield unpredictable results.
  • Being rule-based, an expert system may have limited capacity to explain its findings, and may find it impossible to derive heuristic links between its rules to gain deeper insight into the problem domain.
  • The completeness, robustness and soundness of an expert system are thus far impossible to formally establish.
  • Finally, many early expert systems were not able to learn from experience and had to be provided with all the rules applicable to the problem domain by developers. The development effort in building a useful expert system is thus prohibitively large.

This post is based on the first chapter of “Artificial Intelligence: A Guide to Intelligent Systems” (2nd Edition) by Michael Negnevitsky.

History of Artificial Intelligence – Foundations of a Science

Philosophers have been considering how the human mind works, and whether non-humans can have minds, since the dawn of time (sorry about the platitude, but I have to start somewhere). On one side of the argument, some consider that machines can do everything that humans can, while the other side believes that the complex and sophisticated behaviour commonplace in humans (e.g. love, creativity, morality) will never be obtainable by machines.

The objective of the field of Artificial Intelligence is to create machines that can perform tasks which require intelligence when humans perform them. Consequently, the question of whether machines can think for themselves becomes a vitally important one. Fundamentally, considering this question forces us to define very precisely what ‘intelligence’ and ‘thinking’ are, and to wonder whether these concepts only make sense in the context of the human brain.

An early yet significant paper on machine intelligence, entitled “Computing Machinery and Intelligence”, was published by Alan Turing in 1950. Turing is the iconic father of AI and computer science; his achievements have shaped the discipline ever since. Turing wisely didn’t attempt to define what ‘intelligence’ is or how it could be embodied within a machine. He instead defined a test to see if a machine could fool a human into believing that it was human too. He also described an abstract machine able to manipulate symbols on a tape as it moves from state to state.


(Statue of Alan Turing, by Stephen Kettle)

The field of AI as we know it today was established by successive generations of groundbreaking scientists. Some of the key first-generation fathers are introduced below:

  • Alan Turing, whose test for machine intelligence and abstract symbol-manipulating machine are described above.
  • Warren McCulloch and Walter Pitts defined a neural network model where each neuron in the network has a binary state. They showed that neural networks were instances of Turing Machines, and hence could become the medium through which AI could be developed.
  • John von Neumann, a friend and colleague of Alan Turing, helped design some of the earliest stored-program machines and formalised computer architecture concepts which are still used to the present day.
  • Claude Shannon demonstrated (through the idea of a computer playing chess) that a brute force approach to AI would never be computationally possible, and that a heuristic approach was needed.
  • John McCarthy was instrumental in organising the 1956 AI workshop at Dartmouth College, sponsored by IBM. During this workshop the field of AI was established as a science. Though attendance at the workshop was sparse, the next 20 years of AI research would be dominated by its attendees and their students. He also designed LISP, one of the oldest computer languages.
  • Marvin Minsky worked on a non-logic-based approach to knowledge representation and reasoning.

The early days of AI focused on simulating cognitive processes by defining general methods for solving a wide range of problems. The emphasis was on general-purpose search and reasoning approaches. Unfortunately, this resulted in weak performance of the AI programmes produced. However, the great minds attracted to the field of AI helped set the foundations of knowledge representation, learning algorithms, neural computing, and natural language computing.

This post is based on the first chapter of “Artificial Intelligence: A Guide to Intelligent Systems” (2nd Edition) by Michael Negnevitsky.