
Tuesday 22 February 2011

GENETIC ALGORITHM



ABSTRACT:
This paper examines how a Genetic Algorithm (GA) can be employed to find the best solution among a given number of possible solutions, the basic principle being drawn from the evolution and adaptation of organisms. The idea of the Genetic Algorithm originates primarily from the insight of Charles Darwin, who argued that the combination of selection and variation is what makes organisms evolve to be well adapted to their environment. As generations pass, better-adapted organisms are born and the system slowly approaches a most favourable point. The GA makes use of this principle and eventually converges to the best, "optimal" solution among a set of given solutions. In searching for a solution, a population of candidates is generated, evaluated, selected, and reproduced with modification to produce the next candidate population, until no further improvement can be made or until a certain number of generations has evolved, according to the needs of the problem.



KEYWORDS:
Evolution,  Selection, Mutation, Crossover, Optimum, Adaptation, Fitness.

1. FOUNDATION IN SCIENCE:
Charles Darwin's thoughts on evolution and survival, set out in the 1859 publication of "The Origin of Species", laid the groundwork for later developments; most notably, Darwin proposed that all creatures, including humans, evolved from other creatures. Over time, creatures change to adapt to their environment in order to survive and thrive, and those with higher fitness survive longer and produce more offspring. This continues generation after generation, and individuals become better and better suited to their environment. It was this constant improvement that inspired computer scientists, one of the most renowned being John Holland, author of "Adaptation in Natural and Artificial Systems", who presented the seminal work in the field of Genetic Algorithms in 1975.

2. INTRODUCTION:
Genetic Algorithms are probabilistic search approaches founded on the ideas of evolutionary processes. The procedure is based on the Darwinian principle of the survival of the fittest: it uses selection and evolution to produce several candidate solutions to a given problem. Note that the underlying principle of the Genetic Algorithm arises from evolution theory alone; it is the simplicity and flexibility of this principle that has made the Genetic Algorithm so popular in the optimization field and in finding solutions to real-world problems. Unlike traditional methods, Genetic Algorithms are suited to many real-world problems that involve finding optimal parameters. Not only does the Genetic Algorithm provide an alternative method of problem solving, it often outperforms traditional methods on such problems. Genetic Algorithms do an excellent job of searching through large and complex search spaces and are effective at finding "the optimum", be it in a state-space search or on a surface. Selection, variation and elimination are the three bricks upon which the whole structure of the Genetic Algorithm is built.

3. WHO CAN BENEFIT FROM GENETIC ALGORITHM:
Genetic Algorithms have proven to be an enormously powerful and successful problem-solving strategy, demonstrating the power of evolutionary principles. Moreover, the solutions they produce are often more competent, more elegant, and sometimes more complex than those produced by traditional problem-solving techniques. Genetic Algorithms can benefit nearly anyone, once the correct representation for the problem is chosen and the relative fitness of solutions is compared correctly. Genetic Algorithms are useful and efficient when:
1. The search space is large, complex or poorly understood.
2. Traditional non-linear search methods fail.
3. No mathematical analysis is available.
4. Fitness landscape is non-linear and changes over time.
5. Multi-modal or n-dimensional search space exists.

4. THE PRINCIPLE, THE ALGORITHM:
The Genetic Algorithm is, in its true sense, modelled on sexual reproduction, in which the genes of two parents combine to form those of their children. When it is applied to problem solving, the basic logic is that we create an initial population of individuals representing possible solutions to the problem we are trying to solve. Each of these individuals has an associated fitness measure (fitness function) that makes it more or less fit as a member of the population. The fittest members have a higher probability of mating than the less fit members, producing offspring that have a significant chance of retaining the desirable characteristics of their parents. The algorithm thereby favours the individuals with the best fitness values, while those with lower fitness are naturally discarded from the population. Successive generations improve the fitness of individuals in the population until the optimal solution is reached. Hence, the algorithm can be framed as follows:
1. Encode the problem as a binary string or an array of integers or decimal numbers.
2. Randomly generate a population representing a group of possible solutions.
3. Calculate the fitness value of each individual.
4. Select the fitter individuals as parents for the next generation (Section 6.1).
5. Apply crossover and mutation to the selected parents to produce new offspring (Sections 6.2 and 6.3).
6. Repeat steps 3-5 until no further improvement can be made or a set number of generations has evolved.
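The loop above can be sketched in a few lines of Python. This is only a minimal sketch: the fitness function is a toy one that counts 1-bits (the classic "OneMax" illustration), and the population size, chromosome length and operator rates are illustrative assumptions rather than recommendations.

# Minimal genetic algorithm sketch; the fitness function and all constants
# are placeholder assumptions for illustration only.
import random

POP_SIZE = 20          # assumed population size
GENOME_LEN = 16        # assumed chromosome length (bit string)
CROSSOVER_RATE = 0.7   # roughly the 60-70% figure from Section 6.2
MUTATION_RATE = 0.002  # roughly the 0.1-0.2% figure from Section 6.3
GENERATIONS = 50

def fitness(ind):
    return sum(ind)                          # step 3: evaluate each individual

def roulette_select(pop):
    total = sum(fitness(i) for i in pop)
    pick = random.uniform(0, total)
    running = 0.0
    for ind in pop:
        running += fitness(ind)
        if running >= pick:
            return ind
    return pop[-1]

def crossover(p1, p2):
    if random.random() < CROSSOVER_RATE:
        point = random.randint(1, GENOME_LEN - 1)    # single crossover point
        return p1[:point] + p2[point:], p2[:point] + p1[point:]
    return p1[:], p2[:]                      # parents copied when no crossover

def mutate(ind):
    return [(1 - g) if random.random() < MUTATION_RATE else g for g in ind]

# step 2: randomly generate the initial population
population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]

for gen in range(GENERATIONS):               # steps 4-6: evolve the population
    next_pop = []
    while len(next_pop) < POP_SIZE:
        a, b = roulette_select(population), roulette_select(population)
        c1, c2 = crossover(a, b)
        next_pop += [mutate(c1), mutate(c2)]
    population = next_pop[:POP_SIZE]

best = max(population, key=fitness)
print(best, fitness(best))

Each of the operators used here is described in more detail in Section 6.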

5. METHODS OF REPRESENTATION:
Before employing a Genetic Algorithm to find the solution to a problem, the problem (its possible solutions) needs to be encoded in a form the computer can work with. There are many ways of representing the problem, each with its own advantages and disadvantages; a few of them are listed below (a short encoding sketch follows the list):
1. Encoding the problem as a binary string, i.e. a sequence of 1's and 0's, where the digit at each position makes up some part of the solution.
2. Encoding the problem as an array of decimal or integer numbers, where again each element represents some part of the solution.
3. Encoding the problem as an array of strings. This method is used in the "grammatical encoding" approach employed to generate neural networks.
4. Making use of "genetic programming", in which the problem is represented by a data structure known as a tree.
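As a small illustration of the first two options, the sketch below encodes one made-up candidate (three parameter values in the range 0-15) both as an integer array and as a 12-bit binary string; the problem and the 4-bit field width are assumptions chosen only for the example.

# Hypothetical example: the same candidate (three parameter values in 0-15)
# encoded first as an integer array and then as a 12-bit binary string.
params = [3, 12, 7]                                  # integer-array encoding
bits = "".join(format(p, "04b") for p in params)     # binary-string encoding
print(bits)                                          # -> 001111000111
decoded = [int(bits[i:i + 4], 2) for i in range(0, len(bits), 4)]
print(decoded)                                       # -> [3, 12, 7]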

6. GENETIC OPERATORS:
Genetic operators are applied to improve the performance of the population. One cycle of testing all of the candidates is defined as a generation, and the cycle is repeated until a sufficiently good solution emerges that can be applied to the real-world problem. The operators are described here in terms of the "evolution and adaptation" procedure: parents are selected based on their fitness, and they reproduce to create offspring that are better adapted in the next generation. The same concept applies when extracting a better solution from the given possible solutions to a real-world problem.

6.1) SELECTION
The selection procedure chooses individuals from the current population, with a bias towards fitter candidates, to act as parents of the next generation. Various selection schemes are available, and the programmer may choose whichever is best suited to the problem; a few of them are roulette-wheel selection, generational selection, hierarchical selection and rank selection. A roulette-wheel sketch is given below.
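A minimal roulette-wheel sketch in Python, assuming a tiny made-up population with pre-computed fitness values; fitter candidates get a proportionally larger slice of the wheel.

# Roulette-wheel (fitness-proportionate) selection sketch; candidates and
# fitness values are invented for illustration.
import random

population = {"A": 1.0, "B": 3.0, "C": 6.0}      # candidate -> fitness
candidates = list(population)
weights = [population[c] for c in candidates]    # fitter -> larger slice of the wheel

# Each call picks one parent; "C" is selected about 60% of the time (6 / 10).
parent = random.choices(candidates, weights=weights, k=1)[0]
print(parent)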

6.2) CROSSOVER
The crossover procedure combines two selected individuals about a crossover point, thereby creating two new individuals that go into the next generation. In the case of asexual reproduction, a single individual is simply replicated into the new population. There are many different kinds of crossover; the most common is single-point crossover, in which each child takes one section of the chromosome from each parent, thus inheriting good traits from both. Sometimes parents are copied directly into the new population when no crossover occurs. The probability of crossover occurring is usually 60% to 70%.
Row 1 (Parent 1):          1 0 1 1 0 0 1 0
Row 2 (Parent 2):          1 1 1 0 0 1 0 1
Row 3 (Crossover child):   1 0 1 1 0 1 0 1
Row 4 (Copy of Parent 1):  1 0 1 1 0 0 1 0
Row 5 (Mutated copy):      1 0 1 1 1 0 1 0
Row 3 is a new hybrid produced by single-point crossover between the 4th and 5th positions of the chromosomes of its parents (rows 1 and 2). Row 4 is a parent copied directly when no crossover occurs, and row 5 shows that copy after mutation at the 5th position. (Here the problem is coded as strings of binary digits, 0's and 1's.)
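The crossover in row 3 can be reproduced with a few lines of Python; the 8-bit strings are taken from rows 1 and 2 above, and the crossover point between positions 4 and 5 is the one shown in the figure.

# Single-point crossover of the two 8-bit parents from rows 1 and 2,
# with the crossover point between positions 4 and 5.
parent1 = "10110010"
parent2 = "11100101"
point = 4
child1 = parent1[:point] + parent2[point:]   # "10110101", i.e. row 3 above
child2 = parent2[:point] + parent1[point:]   # "11100010", the second offspring
print(child1, child2)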

6.3) MUTATION
After selection and crossover, there is a new population of candidates representing the next generation (a merely good solution so far). Some individuals were produced by crossover, and others were copied directly when no crossover occurred. To ensure that the new individuals are not all exact copies, a provision for mutation is made. The mutation procedure randomly modifies the genes of an individual, subject to a small mutation probability, introducing further randomness into the population. In other words, it periodically makes random changes to one or more members of the current population, yielding new candidate solutions. The probability of mutation is usually between 0.1% and 0.2%. Mutation is important for maintaining genetic diversity within the population.
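A bit-flip mutation sketch for the binary-string encoding used above (flipping the 5th bit of row 4 is what produces row 5); the high rate passed in the demo call is only there to make a flip likely in such a short string, not a recommended setting.

# Bit-flip mutation sketch for a binary-string chromosome.
import random

def mutate(chromosome, rate=0.002):          # rate reflects the 0.1-0.2% figure above
    genes = []
    for gene in chromosome:
        if random.random() < rate:
            genes.append('0' if gene == '1' else '1')   # flip this bit
        else:
            genes.append(gene)                          # leave it unchanged
    return "".join(genes)

print(mutate("10110010", rate=0.5))   # unrealistically high rate, just to show flips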

7. BREEDING EXAMPLE:
The fact that the Genetic Algorithm uses only the good characteristics of each parent, based on fitness, when producing new offspring is best illustrated with a breeding example.
Consider a fitness function of a single parameter that takes values between 0.0 and 4.0. We select two individuals based on their fitness and combine them to obtain a better solution. Suppose that in a certain run the numbers 3.273672 and 3.173294 are the two selected candidates, which now undergo single-point crossover to produce new offspring (solutions).
The steps are as follows:
-Encode the two numbers into binary strings.
-Randomly determine the crossover point.
-Swap the two tail ends of the binary strings.
-Recombine the two binary strings to get two new offspring.
-Decode the binary strings back into floating point numbers.

1) Encoding the two numbers into binary strings
Parent 1=3.273672 =>11.0100011000001
Parent 2=3.173294 =>11.0010110001011
2) Randomly choose a crossover point; let us suppose it is at bit 6, and we split each gene at that position.
Parent 1=>3.273672=>11.010---0011000001
Parent 2=>3.173294=>11.001---0110001011
3) Swapping the two tail ends of the binary strings.
Child 1=>11.010---0110001011
Child 2=>11.001---0011000001
4) Recombining the two binary strings to get two new offspring.
Child 1=>11.0100110001011
Child 2=>11.0010011000001
5) Decoding the binary strings back into floating point numbers.
Child 1=3.298218
Child 2=3.148560
The resulting offspring are close to, but not exact replicas of, the two parents. Note that the first three bits (11.0) were bound to be the same, so the offspring of these two parents had to lie between 3.0 and 3.5. In this way good genes are passed down from generation to generation, while bad genes are bred out of the population, because parents with bad genes are rarely given the chance to breed.
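The whole breeding example can be checked with a short sketch. The fixed-point encoding assumed here (2 integer bits and 13 fractional bits, with truncation on encoding) is an inference that reproduces the bit strings shown above; it is not stated in the original example.

# Fixed-point sketch of the breeding example: 2 integer bits, 13 fractional
# bits, values truncated on encoding (an assumption that reproduces the
# bit strings shown above).
FRAC_BITS = 13

def encode(x):
    return format(int(x * 2**FRAC_BITS), "015b")      # 15-bit string, no point

def decode(bits):
    return int(bits, 2) / 2**FRAC_BITS

p1, p2 = encode(3.273672), encode(3.173294)           # step 1: encode
print(p1, p2)                       # 110100011000001 110010110001011
point = 5                           # step 2: split after "11.010" / "11.001"
c1 = p1[:point] + p2[point:]        # steps 3-4: swap and recombine the tails
c2 = p2[:point] + p1[point:]
print(round(decode(c1), 6), round(decode(c2), 6))     # step 5: 3.298218 3.14856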

8. WHY GENETIC ALGORITHM???
Genetic Algorithms can identify and exploit regularities in the environment and converge on solutions (which may be local maxima) that approach the global optimum. The method is very effective at finding optimal or near-optimal solutions to a wide variety of problems, because it does not impose the limitations required by traditional methods such as gradient search or random search. The Genetic Algorithm technique therefore has advantages over traditional non-linear solution techniques, which cannot always achieve an optimal solution. The method differs from "classical" optimization algorithms in that:
a) It uses an encoding of the parameters, not the parameters themselves.
b) The search is more exhaustive in a given amount of time.
c) Because it is probabilistic rather than deterministic, it may yield different solutions on different runs.
d) It explores the solution space in multiple directions rather than in a single direction.

9.0) APPLICATIONS:
The Genetic Algorithm can be employed primarily for problem solving and for modelling. Its flexibility and efficiency in solving problems make it a favoured alternative to traditional methods such as gradient search and random search. Genetic Algorithms have been widely studied and experimented with, and can be applied to many scientific and engineering problems, as well as in business and entertainment, including:

9.1) ARTIFICIAL LIFE:
Genetic Algorithms are among the most widely and prominently used computational models of evolution in artificial-life systems. Research on Genetic Algorithms has given illustrative examples in which the algorithm is used to study how learning and evolution interact, and to model ecosystems, immune systems, cognitive systems, and social systems.

9.2) ARTIFICIAL NEURAL NETWORK APPLICATIONS:
Genetic Algorithms are increasingly applied to ANN design in several ways: topology optimization, genetic training algorithms and control-parameter optimization. In addition, Genetic Algorithms have been used in many other innovative ways, for instance creating new indicators based on existing ones, selecting good indicators, and complementing fuzzy logic.

9.3) INFORMATION SYSTEM APPLICATIONS:
Distributed computer-network topologies can be designed by a Genetic Algorithm, using three different objective functions to optimize network reliability parameters, namely diameter, average distance, and computer-network reliability. Genetic Algorithms have successfully designed networks with on the order of 100 nodes.

9.4) GENETIC ALGORITHM IN BUSINESS AND THEIR SUPPORTIVE ROLE IN DECISION MAKING:
Genetic Algorithms can be used to solve many different types of business problems in functional areas such as finance, marketing, information systems, and production/operations. Within these functional areas, Genetic Algorithms can perform a variety of applications such as tactical asset allocation, job scheduling, machine-part grouping, and computer-network design.

9.5) ASTRONOMY AND ASTROPHYSICS:
Genetic Algorithms are useful in solving three types of problems:
fitting the rotation curve of a galaxy based on observed rotational velocities of its components, determining the pulsation period of a variable star based on time-series data, and solving for the critical parameters in a magnetohydrodynamic model of the solar wind.

9.6) MACHINE AND ROBOT LEARNING:
Genetic Algorithms can be used for many machine-learning applications, including classification and prediction, and protein-structure prediction. They have also been used to design neural networks, to evolve rules for learning classifier systems or symbolic production systems, and to design and control robots.

9.7) ECONOMIC AND FINANCIAL MARKETS:
Genetic Algorithms have been used to model processes of innovation, the development of bidding strategies, and the emergence of economic markets. An investment firm in California that manages over $2.2 billion uses Genetic Algorithms to make investment decisions for all of its financial services. Genetic Algorithms can also be used to predict exchange rates of foreign currencies up to one month ahead.

9.8) MATHEMATICS AND ALGORITHMICS:
Genetic Algorithms can be of great help in solving "pure" mathematical problems such as high-order non-linear partial differential equations. They can also be used to design "sorting networks", fixed lists of comparisons performed on a set of a given size, where in each comparison two elements are compared and swapped if they are out of order.
9.9) OPTIMIZATION:
Genetic Algorithms have been used in a wide variety of optimization tasks, including numerical optimization and combinatorial optimization problems such as the travelling salesman problem (TSP), circuit design, job-shop scheduling, and video and sound quality optimization.
In addition, Genetic Algorithms find application in aerospace engineering, geophysics, military and law enforcement, data mining, cryptography, immune-system models, ecological models, and machine and robot learning, and can be employed intensively in many other fields.

10. LIMITATIONS:
Although, because of their simplicity and elegance, Genetic Algorithms have proven themselves an efficient problem-solving strategy, they cannot be considered a universal remedy. Some limitations persist:
1) The method chosen for representing the problem must be robust; it must withstand random changes, or we may not obtain the desired solution.
2) The fitness function must be chosen carefully. It should evaluate the true fitness of each candidate. If the fitness function is chosen poorly or defined vaguely, the Genetic Algorithm may be unable to find a solution to the problem, or may end up solving the wrong problem.
3) Genetic Algorithms do not work well when the population size is small and the rate of change is too high.
4) Another drawback is that a solution is "better" only in comparison with other, presently known solutions; the algorithm cannot prove on its own that it has found "the optimum solution".
5) Sometimes, if an individual that is far fitter than its competitors appears too early, it can rapidly take over the population and reduce its diversity, leading the algorithm to converge on a local optimum without examining the rest of the search space. This problem is popularly known as "premature convergence".

11. CONCLUSION AND FUTURE WORK:
The power of evolution has refined every step it has taken along the way, and its usefulness cannot be denied: without it, none of the immeasurable advances that we owe to genetic algorithms would have been possible. The main driving force, of course, is Charles Darwin's simple but powerful insight: that the random chance of variation, together with the law of selection, is a problem-solving technique of immense and almost limitless application. The algorithm is one of the best problem-solving "tools" in the present scientific and commercial world.
Throughout its theoretical journey, as research continued to be productive, the genetic algorithm soon moved into the commercial sector. Today, Genetic Algorithms are applied to problems of everyday interest in many diverse fields. Given their intrinsic parallelism, the breadth with which the algorithm is applied across so many areas is no less than astounding. However, several improvements could make Genetic Algorithms still more generally applicable. Future work will continue on building robotic systems through evolution and on many more specific tasks, and as research goes on we will surely witness further remarkable advances in the fields where Genetic Algorithms are applied.







           

Data Mining – Automated Pattern Discovery









       ABSTRACT


In this paper we discuss Data Mining: the methods in use, how it links with data-warehouse activity to ferret out hitherto unknown patterns in statistical data, its objectives and benefits, and its distinction from and interdependency with OLAP and query/reporting.
Data Mining methods and systems are based on years of classical work in statistics, pattern recognition and information theory. The availability of on-line, enterprise-wide, historical and therefore frequently huge data, together with increasingly inexpensive computing power and access, has facilitated the process of probing the statistical data mine in search of new perspectives and patterns that were previously hidden.
The proper use of Data Mining methods focuses on finding the effects that existed so that the causes can be reasoned from them. Trends exposed in this way can be analyzed and exploited to beneficial effect, thereby improving the quality of decision making to meet single or multiple objectives.

 INTRODUCTION

What is a Data Warehouse?
What do companies hope to gain from them?

Companies hope to find information that will help them remain viable in the business market by providing better services, more timely product introductions, better customer relations, or improved business processes. A business would be able to discern customers' tastes and predict trends in future buying decisions. It could analyze customer data, determine which customers are the most profitable, and concentrate on them. Services such as travel packages, telephone banking, and others could be customized to specific customers to help keep their goodwill. Using buying-pattern data, companies could target mailing, telephone, and other advertising campaigns at specific groups. This would help companies get a better return on each dollar spent and remain competitive.
Decision Support – Key components
Decision Support is a broad term referring to the use of information as a strategic corporate asset, enabling companies to utilize their databases to make better decisions.
There are three separate components of an enterprise-wide decision support system:
- The Data Warehouse, where the mountain of corporate data is stored all in one place. Here data volumes are very high, as multi-terabyte data warehouses are beginning to appear more frequently. These designs are usually either star schemas (or snowflakes, etc.) or highly normalized data structures.
- The Data Mart, where departmental data is stored and various external data items are often added. The data volumes are usually 15% to 30% of warehouse sizes, and the envelope is being pushed towards the terabyte limit. These databases are also usually either based on star schemas or in a normalized form. They mostly deal with the data space, but at times some multi-dimensional analysis is performed.
- The Data Mine, where the data is re-organized for analysis and information is extracted from the data. The data volumes here are similar to the Data Mart's, but the data is much more richly structured and is no longer just departmental. The data here refers to a specific business objective and is analyzed for the purpose of information extraction. While the data structures used within the warehouse and the mart may be similar, the data structures used within the data mine are significantly different. The data mine differs from the data warehouse not just in the size of the data it manages but in the structure of the data. The content of the data in the mine is also often different from the data in the warehouse, because it is often enriched by additional external data not found within the warehouse. However, content aside, the key issue about data-mining architecture is that the existing theories of data structuring do not apply to it.
Decision support systems have traditionally relied on three types of analyses:
- Query and Reporting, where a user asks a question, e.g. "what were the sales for motorcycles in Bangalore?"
- OLAP (On-Line Analytical Processing), which amounts to the processing of queries along multiple dimensions such as state, month, etc. For instance: "categorize sales by month, by brand, by store chain, by salesperson."
- Data Mining, which provides influence factors and relationships in data, e.g. "what impacts sales in Bangalore for motorcycles?"
Data warehouse query and reporting, OLAP and data mining have often been viewed as related activities. However, they take place in conceptually distinct computational spaces. Data access operations such as query and reporting deal with the data space, OLAP uses the multi-dimensional space, and data mining takes place in the influence space. With statistics and reports, only summaries of data were available to business users, and the data could only be obtained by request from an analyst, e.g. sales summaries per quarter. With data warehouses, some query and reporting could be performed by business users on their own, e.g. product and store performance reports. With OLAP, multi-dimensional summary questions could be addressed by business users, e.g. the total of sales by product, by channel, by month. With data mining, analysts and a sophisticated subset of business users could gain insight into the influence factors and trends in data, but often significant analysis was needed before key questions could be answered. With knowledge access, almost all the relevant patterns in the data are found beforehand and stored for use by business users such as marketing analysts, bank branch managers, store managers, etc. Business users get the interesting patterns of change every week or month, or can query the pattern base at will. Because large databases often provide too much of a good thing, approaches based on query and OLAP usually encounter a problem known as "the maze of a million graphs": a user can build a million pie charts and yet not see the forest for the trees, because there is so much data. Data mining, on the other hand, draws its power from the ability to search through the data on its own initiative, discovering key patterns by itself.
Function                                Asks the question
Querying and Reporting                  What happened?
On-line Analytical Processing (OLAP)    What happened and why?
Executive Information Systems (EIS)     What do I need to know now?
Data Mining                             What's interesting? What might happen?
Source: Simon, Alan. "Better Clients, Better Decisions." Byte, January 1997: 91.

DATA MINING
Data Mining is the "automatic" extraction of patterns of information from historical data, enabling companies to focus on the most important aspects of their business -- telling them what they did not know and had not even thought of asking. Industry surveys clearly indicate that over 80% of Fortune 500 companies view data mining as a critical factor for business success by the year 2000. Most such companies now collect and refine massive quantities of data in data warehouses.
These companies realize that to succeed in a fast-paced world, business users need to be able to get information on demand, and they need to be pleasantly surprised by unexpected but useful information. There is never enough time to think of all the important questions -- the computer should do this itself. It can provide the winning edge in business by exploring the database itself and bringing back invaluable nuggets of information. Many organizations now view information as one of their most valuable assets, and data mining allows a company to make full use of these information assets.
Data mining employs techniques such as rule-tree induction, laws of probability, and neural networks. Even with these techniques, spurious patterns can result. For example, one data-mining company found a connection between dog-food sales and cold drinks in a supermarket database. The result was written off as a quirk, but not all such connections can be dismissed so easily.

Data Mining Above, Beside and Within the Warehouse
Once we accept the fact that the data mine is distinct from the data warehouse, the next logical question is: "Where does the data mine actually exist? Is it a separate repository next to the warehouse, a set of views above the warehouse, or just part of the warehouse?" We can answer this question in each of these three ways and get a different architecture for the data mine.
The data mine can exist in three basic forms:
- Above the warehouse, as a set of conceptual views.
- Beside the warehouse, as a separate repository.
- Within the warehouse, as a distinct set of resources.
Data mining "above the warehouse" provides a minimal architecture for discovery and analysis. It is suitable only where data mining is not a key objective for the warehouse. In this approach SQL statements are used to build a set of conceptual views above the warehouse tables, and additional external data from other tables may be merged as part of the views. The views built above the warehouse may either be materialized (i.e. saved to disk as new tables) or not. Therein lies the fundamental problem (if not contradiction) built into this approach. If the views are not of significant size, then serious data mining cannot take place. However, if the views are of significant size, then without materialization the effort of computing them again and again will require very large amounts of processing power -- in some cases significantly affecting the availability of the warehouse resources and interfering with other applications performing indexed retrievals.
On the other hand, if the views are of significant size and they are materialized, we are no longer data mining "above" the warehouse and are instead using a disorganized form of the third approach, i.e. data mining within the warehouse. If the views are materialized, the third approach will almost always work better, because it can utilize a suitable data-distribution approach and a specific processor-allocation strategy, as well as different data structures for data mining. Without these precautions, the number of potential pitfalls increases rapidly, sacrificing both performance and functionality. Hence data mining above the warehouse should be restricted to applications in which data mining is only of peripheral business interest, and not a key objective. However, holding this view is often a big business mistake in itself -- why have so much data in a warehouse and not understand it?
In most cases, data mining is effectively performed beside the warehouse, with data structures that lend themselves to detailed analyses. And, in most cases additional data suitable for the analyses is merged with the warehoused data in order to perform specific analyses for focused business needs.
Data mining beside the warehouse both overcomes and sidesteps several problems at once. To begin with, it allows data mining to be done with the right data structures, avoiding the problems associated with the structures of the data space. Moreover, the paradox of warehouse patterns can be avoided by selecting specific data segments corresponding to specific business objectives. And the interactive exploratory analyses that are often performed in the data mine with "wild rides" through the data no longer interfere with the warehouse resources that are responsible for routine processes such as query and reporting. In fact, different business departments can use their own data mines that address their specific needs, e.g. direct marketing vs. claim analysis. The data is moved from the large warehouse to the mine, restructured during the transformation, and analyzed. It is, however, important to design the transfer and transformation methods carefully, in order to allow for optimal "refresh methods" that require minimal computing. For instance, as we bring new data into the data mine every day or every week, the overhead for re-aggregation should be minimized.
In some cases, where the warehouse is a very large, massively parallel processing (MPP) computer, the data mine may actually reside as a separate repository within the large warehouse. This is very similar to a data mine beside the warehouse, where the mine uses a portion of the physical warehouse but is independent of the warehouse structures, in effect being a "republic within a republic". In this scenario, the disk space and the processors for the data mine are specifically allocated and separately managed. For instance, on a "shared nothing" MPP machine with 32 processors, the disk space for the data mine might be separately allocated on 8 of the 32 nodes, with 8 processors dedicated to data mining while the other 24 processors manage the rest of the warehouse. And, when needed, additional processing capability may be directed towards the needs of data mining.
Although this idea may sound attractive based on arguments for centralization and scalability, in practice it usually leads to a loss of flexibility without providing any significant benefits for data mining. In most cases, when we consider the technical, marketing and business issues, the problems with mining within the warehouse multiply quite rapidly, and the data mines planned for use within the warehouse will eventually find themselves beside it.
The key point is that the likelihood of serving the needs of many people within the data space is much higher than the likelihood of serving their needs within the multi-dimensional and influence spaces. While the data elements may be almost the same for several departments, the dimensions, the influence relationships and the predictive models they need will vary far more than their simple data needs. Hence, a data mine within the warehouse will soon become the lowest common denominator of all designs. Therefore, while the design of the data space may be subject to compromises to please the various user groups, there should be no compromises in the design of the data mine, where serious and detailed analyses take place. The data mine should be optimized to deliver effective results by focusing on specific business needs, because influence analysis is so much harder than data access.
Steps in Data Mining
Data Mining may involve some or all of the following steps, depending on the amount of data,
condition of the data and results desired.
Data Selection
This involves choosing the types of data to be used. A database may contain data about customer purchases, lifestyles, demographics, census data, state taxes, etc. If a retailer wanted to decide how to lay out the display shelves in a store, they might only need to use purchase and demographic data.
Data Transformation
Once the data has been selected, it often needs to be cleaned up or transformed into values that can be operated on by the type of data-mining operation to be performed and the technique to be used. Data may need to be converted into numeric values for use in a neural network, or new attributes may need to be defined or derived. In one case the database included 500 different ways of identifying which state of the U.S. the information came from.
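A hedged sketch of that kind of clean-up, assuming pandas is available; the column name, the spellings and the mapping table are invented for illustration.

# Hypothetical clean-up: collapse many spellings of a U.S. state to one code,
# then one-hot encode the field so a neural network can consume it.
import pandas as pd

raw = pd.DataFrame({"state": ["CA", "Calif.", "california", "WA", "Wash"]})
canonical = {"ca": "CA", "calif.": "CA", "california": "CA",
             "wa": "WA", "wash": "WA"}                  # invented mapping table
raw["state"] = raw["state"].str.lower().map(canonical)  # normalise the spellings
numeric = pd.get_dummies(raw, columns=["state"])        # 0/1 columns per state
print(numeric)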


Data Mining
The data is then mined using the desired technique in an effort to extract the information.
There are many methods of mining data; the method used is often based on the type of information you are seeking and the type of data you have. Some of the methods are: association, sequence-based analysis, clustering, classification, estimation, fuzzy logic, neural networks, fractal-based transforms, and genetic algorithms.
For example, to develop a symbolic classification model that predicts whether a magazine subscriber will renew their subscription, you might first use clustering to segment the database and then apply rule induction to create a classification model for each desired cluster.
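A sketch of that two-stage approach, assuming scikit-learn as the tool of choice; the subscriber data is random stand-in data and the feature names are invented, so this only shows the shape of the workflow, not a real model.

# Sketch: cluster the subscribers first, then fit a shallow decision tree
# (a rule-style classifier) per cluster. Random stand-in data throughout.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 4))                         # e.g. tenure, price, usage, age
renewed = (X[:, 0] + X[:, 2] > 1.0).astype(int)  # invented "will renew" target

segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
models = {}
for seg in np.unique(segments):
    mask = segments == seg
    models[seg] = DecisionTreeClassifier(max_depth=3).fit(X[mask], renewed[mask])
print({seg: m.get_depth() for seg, m in models.items()})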
Data mining can also be:
- Multilevel: able to compare between texts of any size.
- Multimedia: supports analysis of text and images.
- Multimode: operation includes interactive client-side and automatic server-side processing.
- Multilingual: analysis is designed to support direct comparison of texts in different languages.
Result Interpretation
Once the information has been extracted, it is analyzed according to the end user's requirements, and the relevant information is identified and presented to the decision maker via the decision support system. The purpose of interpretation is to visualize the output (logically or graphically) and filter the information to be presented to the decision maker. It is not uncommon to find during the interpretation step that the rules of data selection need to be modified. Some of the decisions to be made may involve large amounts of money, and management is not very enthusiastic about embracing ideas that they cannot understand or analyze for themselves. If management cannot understand the rules, it is hard to explain to a client how a decision was reached.
Who is using Data Mining?
The Washington State Dept. of Health and Social Services, the National Basketball Association, the U.S. Treasury, scientists, banks, retailers, and service providers are some of the organizations that have implemented Data Mining and hope to benefit from its use. Because the cost of implementing Data Mining systems with massively parallel computers can run from $350,000 to a few million dollars, some companies, like IBM, offer data-mining services. A company sends its data to the vendor, who then cleans and processes it. The results can then be sent back to the company or posted on the Internet, where the company can access them. Service fees have not been set yet, but the software costs are expected to be around $40,000. Data mining is not just for large corporations; it can be powerful for small businesses also. A restaurant owner who serves 500 meals a week may want to know what new dish to recommend to a customer based on past selections.
What have they actually found?

Washington State Dept. of Social and Health Services

They had no idea how many children were in the foster-care system, how long they were in the system, or whether there was any abuse. Their concern was finding the answers to these questions without basic data or adequate analysis. One of the data analysts posed the question to the KDD Nuggets electronic newsletter and was then given a contact at Columbia University. Working with the researchers at Columbia University and a 500-MB database of foster caregiver payment data, they were able to find out how many children stay in long-term care and who they are. They also found that 30% of the children who leave the system come back. Using this new information, the State of Washington is working to improve resources so that children who enter the foster-care system as infants get adopted into permanent homes rather than staying in foster care. One of their other goals is to install support services for children when they go home, to help keep them from returning to the foster-care system. They continue to look for more information to help improve the foster-care system.

National Basketball Association

The NBA has teamed up with IBM to deploy a KDD system, called Advanced Scout, that helps coaches strategize for each upcoming game. Last season 14 of the 29 teams used Advanced Scout and the rest were slated to be given copies for this season. During games the NBA collects all of the data on shots, fouls, free throws, turnovers, and other game activities. This data is then posted to a bulletin board so that each team can have access to it and download it. The Orlando Magic used it to better understand rebounding problems and find the best team combination for rebounding. They then used this new information when substituting players. It used to take days to interpret data that can now be analyzed in minutes. In some cases just analyzing the data can lead to misinterpretation, but when tied in with viewing the game videos, it can be used to make strategic decisions. The Mavericks and Magic are both working on systems that will tie the data to digital video with some form of time stamping. This will allow the coaching staff to view the footage instantly. Their plan is to implement this on a laptop computer which will make it very easy to analyze.

U.S. Treasury

The Treasury's Financial Crimes Enforcement Network (FinCEN) uses an Artificial Intelligence Tracking System (AITS), a data-mining application, to detect suspicious monetary transactions which could point to money laundering. AITS processes about 200,000 large cash transactions (over $10,000) posted by casinos, banks and other organizations each week under federal laws. The results have been turned over to various local, state and federal agencies to assist in thousands of investigations. Because of the volume of data, they currently use large computer systems to break down the data, and data analysts then do further extraction to drill down farther. They are working on a next-generation system with an automated explorer. Because the designer of the system made it user friendly, new workers are able to become productive within a couple of days, and what used to be impossible to analyze can now be searched in times ranging from a few seconds to 30 minutes.

California Institute of Technology

The Palomar Observatory (San Diego County, Ca.) is taking pictures that will result in a 3-terabyte data set. Knowing that the amount of data would be so large, about the equivalent of six million books, George Djorgovski and his team decided to digitize the pictures. They realized that with all of this data they would need new software to turn the raw pixel data into stars, galaxies and other objects. He and one of his doctoral students hooked up with the JPL. This collaboration resulted in Skicat, a KDD system that automatically measures and classifies the millions of objects detected in the survey images. Because of the new system they have been able to catalog about 250,000 stars, where the typical study covers about 2,000. Djorgovski commented that they now have a living, breathing catalog and that with the huge databases they need to explore new ways to structure, store and use them. He feels that where astronomy used to be data poor, it is now becoming data rich.

Watson Research Center

Using the same 40 financial indicators as those used by stockbrokers at the American Securities Market, they tried to find the simplest rules for spotting the best investment bet each month. The researchers used a technique known as disjunctive normal logic, which is a way of connecting the descriptions of data so that contradictions can be quickly found. By following the simple rules they were able to average a return of 270 per cent over a five year period compared to the market average of 110 per cent.
The Paradox of Warehouse Patterns
The data warehouse is a natural place for storing data; the data mine is the natural place for performing influence-related analyses. To understand the paradox, note that the concepts of "large warehouse" and "useful pattern" often interact in a seemingly contradictory way. On one hand, the larger a warehouse, the richer its pattern content, i.e. as the warehouse grows it includes more patterns. On the other hand, after a point, if we analyze "too large" a portion of a warehouse, patterns from different data segments begin to dilute each other and the number of useful patterns begins to decrease. So the paradox may be stated as follows: "The more data in the warehouse, the more patterns there are, and yet after a point the more data we analyze, the fewer patterns we find!"
A few simple examples easily clarify this further. First, consider a vehicle warranty database. In order to find patterns for customer claims it is essential to store details of each claim in a large data warehouse. But does it make sense to analyze all of the warehouse at the same
time? Does it make sense to ask: "what causes brake problems?" No. In practice, cars are built at different plants and different models of cars use different parts -- and some parts are now discontinued. Moreover, over the course of years the parts used in cars change, so analyzing the entire warehouse may tell us less than analyzing part of it. What works best in practice is to analyze the claims for a given model year for cars built at a given plant – again a segmentation task. Once again, the paradox of the warehouse comes into play here in that by analyzing all of the warehouse at once we reduce the number of useful patterns we are likely to get!
As another example, consider a large data warehouse that includes details of a bank's customer accounts, marketing promotions, etc. There can be several business objectives for mining this data, including campaign analysis, customer retention, profitability, risk assessment, etc. To begin with, these are distinct business tasks and it does not make sense to mix the analyses -- hence each of the data-mining exercises needs to be performed separately, and will require different data structures as well, because some are association analyses, some are clustering, etc.
However, even the campaign analysis task itself should often not be performed on the entire warehouse. The bank may have undertaken 30 different marketing campaigns over the years, and these campaigns will have usually involved different products and gone to different
customer segments -- some of the products are even discontinued now. To understand who responds best to marketing promotions, we need to analyze each campaign (or group of campaigns) separately, because each case will involve patterns with distinct signatures. Mixing the analyses into one data-mining exercise will simply dilute the differences between these signatures. And the campaigns are often different enough that mixing them simply may not make sense. So we need to have a separate "Analysis Session" for each group of campaigns. To demonstrate this with a simple example, let us assume that customers who are over 40 years old and have more than 2 children have a high response rate to credit-card promotions. Now, let us also assume that customers who are less than 40 years old and have only 1 child are good prospects for new checking accounts. If we combine these campaigns within the same data-mining study and simply look for customers who have a high response rate, these two patterns will dilute each other.
Of course, we can get a rule that separates these campaigns and still display the patterns, but in a large warehouse so many of these rules will appear that they will overwhelm the user.
Thus, the smaller patterns may be found in the warehouse if we are prepared to accept large amounts of conditional segment information, e.g. "If Campaign = C12 and ... Then ...". However, in a large warehouse, there are so many of these that the user will be overloaded with them. The best way is to analyze each group of campaigns separately.
The need for segmentation is even more clear when we consider predictive modeling. When trying to predict the response to a new campaign, it simply does not make sense to base the predictions on all previous campaigns that have ever taken place, but only on those campaigns which are most similar to the one being considered. For instance, responses to campaigns for a new checking account may have little bearing on responses to campaigns for a new credit card or for refinancing a home. In this case, the paradox of warehouse patterns comes into play in that by considering more data, we lose accuracy. This is, of course, because some of the data will not be relevant to the task we are considering. But what happens if there are one or two key indicators that are common to all of the campaigns? Will they be lost if we just analyze the campaigns a few at a time? Of course not. If a pattern holds strongly enough in the entire database, it will also hold in the segments.
For instance, if the people with more than 5 children never respond to campaigns, this fact will also be true in each individual campaign. Hence, most of the time it does not make sense to analyze all of a large warehouse because patterns are lost through dilution. To find useful patterns in a large warehouse, we usually have to select a segment (and not a sample) of data that fits a business objective, prepare it for analysis and then perform data mining. Looking at all of the data at once often hides the patterns, because the factors that apply to distinct business objectives often dilute each other. Hence, the thirst for information can go unquenched by looking at too much data.
The Concept of an Analysis Session
When using a data mine, we bring a segment (and not a sample) of data from a warehouse (or
other sources) to the data mine and perform discovery or prediction. The process of mining this data segment is called an Analysis Session. For example, we may want to predict the response to a proposed direct mail campaign by analyzing previous campaigns for similar products, or we may want to know how customer retention has varied over various geographic regions, etc.
An analysis session may be either "structured" or "unstructured". A structured session is a
more formal activity in which we set out with a specific task, e.g. analyzing profitability by customer segments and/or products. In fact, structured sessions are often performed in a routine manner, e.g. we may analyze costs, revenues or expenses every quarter and understand the reasons for the trends. Or we may routinely perform forecasting for various items such as product demand in various markets. Or we may look for unusual transactions that have taken place in the past 30 days. In short, a structured analysis session usually takes one of three forms: a discovery, prediction or forensic-analysis activity in which we perform a specific task.
An unstructured session is a "wild ride" through the database, where the user wanders around without a goal, hoping to uncover something of interest by serendipity -- or with help from an "exploration agent". This type of abstract wild ride usually uncovers some very wild facts hidden in the data. And the mine is a natural place for this activity, because the unexpected nature of the queries might otherwise interfere with the more routine tasks for which the warehouse was designed, e.g. looking up the history of a specific claim. The data in the data mine often needs to be enriched with aggregations. Again, let me emphasize that these are not just summaries, but additional elements added to the data. How these aggregations are built is partly decided by business analysis. For instance, we may need to look at the number of credit cards a customer has as an item, and we may want to look at the "volume" of transactions the customer has had. We may also want to look at the number of claims a customer has had in an insurance setting, etc. These aggregations enrich the data and co-exist with the atomic-level data in the mine.
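A small sketch of that kind of enrichment, assuming pandas; the customer, card and amount columns are invented, and the two aggregates mirror the examples mentioned above (number of cards, transaction volume).

# Hypothetical enrichment: per-customer aggregates (number of cards, transaction
# volume) added alongside the atomic transaction rows.
import pandas as pd

tx = pd.DataFrame({"customer": [1, 1, 2, 2, 2],
                   "card_id":  [11, 12, 21, 21, 22],
                   "amount":   [50.0, 20.0, 5.0, 75.0, 30.0]})
aggregates = (tx.groupby("customer")
                .agg(n_cards=("card_id", "nunique"), tx_volume=("amount", "sum"))
                .reset_index())
enriched = tx.merge(aggregates, on="customer")   # aggregates co-exist with atomic rows
print(enriched)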

CONCLUSION

Data Mining offers great promise to organizations, helping them uncover hidden patterns in their historical data and use the patterns discovered to analyze trends that can be exploited to support executive decision making.
However, the proper use of data-mining tools has to be ensured by putting them in the hands of users who understand the business, the data, the pros and cons of the analytical methods used, and the advantages and disadvantages of the tools used.
Building the right model is one step, and it makes sense to try various models and tools and choose the one that suits the specific business problem as well as the skill set of the end user. It is equally important to begin early and ensure that the data warehouse is built to facilitate data mining. The data has to be properly collected, cleaned and prepared. With this ensured, one can expect rewarding results, from improving revenues to reducing costs.




Artificial intelligence (AI)







AI AND EXPERT SYSTEMS
 


                                                                    


Abstract
Is it possible to automate human intelligence? It is very difficult, but not impossible. The subject that deals with such difficult tasks is AI.

Expert systems are meant to solve real problems which would normally require a specialised human expert.

Today's expert systems deal with domains of narrow specialization. For expert systems to perform competently over a broad range of tasks, they will have to be given very much more knowledge. ... The next generation of expert systems ... will require large knowledge bases. How will we get them?
           
The paper gives a brief description of AI, its tasks and techniques, and also shows how a computer-based expert system emulates the behavior of a human advisor, presents terminology unique to the field, and introduces the activities that must be accomplished to build expert systems.

The design of expert systems, and the issues involved in designing them, are described.
We have to develop a set of guidelines to determine whether or not a problem is suitable for an expert-system solution. Such guidelines are clearly set out.

Expert systems have many applications. They have been used to solve a wide range of problems in domains such as medicine, mathematics, engineering, geology, computer science, business, law, defence, and education. Within each domain, they have been used to solve problems of different types. Problem types include diagnosis (e.g. of a system fault, disease or student error); design (of computer systems, hotels, etc.); and interpretation (of, for example, geological data).


Introduction
Artificial intelligence (AI) is a broad field, and means different things to different people. It is concerned with getting computers to do tasks that require human intelligence. However, having said that, there are many tasks which we might reasonably think require intelligence - such as complex arithmetic - which computers can do very easily. Conversely, there are many tasks that people do without even thinking - such as recognising a face - which are extremely complex to automate. AI is concerned with these difficult tasks, which seem to require complex and sophisticated reasoning processes and knowledge.

People might want to automate human intelligence for a number of different reasons. One reason is simply to understand human intelligence better. For example, we may be able to test and refine psychological and linguistic theories by writing programs which attempt to simulate aspects of human behaviour. Another reason is simply so that we have smarter programs. We may not care if the programs accurately simulate human reasoning, but by studying human reasoning we may develop useful techniques for solving difficult problems.

AI is a field that overlaps with computer science rather than being a strict subfield of it. Different areas of AI are more closely related to psychology, philosophy, logic, linguistics, and even neurophysiology. Here, however, we will emphasise the computational techniques used and put less emphasis on psychological modelling and philosophical issues.

Some AI Tasks
Human intelligence involves both "mundane" and "expert" reasoning. By mundane reasoning we mean all those things which (nearly) all of us can routinely do (to varying degrees) in order to act and interact in the world. This includes:
·        Vision: The ability to make sense of what we see.
·        Natural Language: The ability to communicate with others in English or another natural language.
·        Planning: The ability to decide on a good sequence of actions to achieve your goals.
·        Robotics: The ability to move and act in the world, possibly responding to new perceptions.

By expert reasoning I mean things that only some people are good at, and which require extensive training. It can be especially useful to automate these tasks, as there may be a shortage of human experts. Expert reasoning includes:
·        Medical diagnosis.
·        Equipment repair.
·        Computer configuration.
·        Financial planning.

Expert Systems are concerned with the automation of these sorts of tasks.
AI research is concerned with automating both these kinds of reasoning. It turns out, however, that it is the mundane tasks that are by far the hardest to automate.

AI Techniques
There are various techniques that have evolved that can be applied to a variety of AI tasks. These techniques are concerned with how we represent, manipulate and reason with knowledge in order to solve problems. The two central ones are:
·        Knowledge representation
·        Search
Artificial Intelligence And Expert Systems
Expert Systems are computer programs that are derived from a branch of computer science research called Artificial Intelligence (AI). AI's scientific goal is to understand intelligence by building computer programs that exhibit intelligent behavior. It is concerned with the concepts and methods of symbolic inference, or reasoning, by a computer, and how the knowledge used to make those inferences will be represented inside the machine.

Of course, the term intelligence covers many cognitive skills, including the ability to solve problems, learn, and understand language; AI addresses all of those. But most progress to date in AI has been made in the area of problem solving -- concepts and methods for building programs that reason about problems rather than calculate a solution.

AI programs that achieve expert-level competence in solving problems in task areas by bringing to bear a body of knowledge about specific tasks are called knowledge-based or expert systems. Often, the term expert systems is reserved for programs whose knowledge base contains the knowledge used by human experts, in contrast to knowledge gathered from textbooks or non-experts. More often than not, the two terms, expert systems (ES) and knowledge-based systems (KBS), are used synonymously. Taken together, they represent the most widespread type of AI application. The area of human intellectual endeavor to be captured in an expert system is called the task domain. Task refers to some goal-oriented, problem-solving activity. Domain refers to the area within which the task is being performed. Typical tasks are diagnosis, planning, scheduling, configuration and design.

Building an expert system is known as knowledge engineering and its practitioners are called knowledge engineers. The knowledge engineer must make sure that the computer has all the knowledge needed to solve a problem. The knowledge engineer must choose one or more forms in which to represent the required knowledge as symbol patterns in the memory of the computer -- that is, he (or she) must choose a knowledge representation. He must also ensure that the computer can use the knowledge efficiently by selecting from a handful of reasoning methods. The practice of knowledge engineering is described later. We first describe the components of expert systems.
The Building Blocks of Expert Systems
Every expert system consists of two principal parts: the knowledge base and the reasoning, or inference, engine.
The knowledge base of expert systems contains both factual and heuristic knowledge.
Factual knowledge is that knowledge of the task domain that is widely shared, typically found in textbooks or journals, and commonly agreed upon by those knowledgeable in the particular field.
Heuristic knowledge is the less rigorous, more experiential, more judgmental knowledge of performance. In contrast to factual knowledge, heuristic knowledge is rarely discussed, and is largely individualistic. It is the knowledge of good practice, good judgment, and plausible reasoning in the field. It is the knowledge that underlies the "art of good guessing."

Knowledge representation formalizes and organizes the knowledge. One widely used representation is the production rule, or simply rule. A rule consists of an IF part and a THEN part (also called a condition and an action). The IF part lists a set of conditions in some logical combination. The piece of knowledge represented by the production rule is relevant to the line of reasoning being developed if the IF part of the rule is satisfied; consequently, the THEN part can be concluded, or its problem-solving action taken. Expert systems whose knowledge is represented in rule form are called rule-based systems.
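
As a rough illustration, a production rule might be represented in a program along the following lines. This is only a minimal sketch in Python (a language chosen here purely for illustration; the symptom and conclusion names are invented):

from dataclasses import dataclass

@dataclass
class Rule:
    # A production rule: IF all the conditions hold THEN the conclusion can be drawn.
    conditions: tuple   # the IF part (here a simple conjunction of facts)
    conclusion: str     # the THEN part

# Two invented rules from a hypothetical medical task domain.
rules = [
    Rule(conditions=("fever", "stiff_neck"), conclusion="suspect_meningitis"),
    Rule(conditions=("fever", "runny_nose"), conclusion="suspect_common_cold"),
]

def applicable(rule, facts):
    # The rule is relevant to the current line of reasoning if its IF part is satisfied.
    return all(condition in facts for condition in rule.conditions)

print(applicable(rules[0], {"fever", "stiff_neck"}))   # True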

Another widely used representation, called the unit (also known as frame, schema, or list structure) is based upon a more passive view of knowledge. The unit is an assemblage of associated symbolic knowledge about an entity to be represented. Typically, a unit consists of a list of properties of the entity and associated values for those properties.

Since every task domain consists of many entities that stand in various relations, the properties can also be used to specify relations, and the values of these properties are the names of other units that are linked according to the relations. One unit can also represent knowledge that is a "special case" of another unit, or some units can be "parts of" another unit.
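
For example, a unit might be sketched as a simple dictionary of properties; the entities, properties, and values below are hypothetical, and a real frame system would offer a much richer structure:

# A hypothetical "canary" unit: properties, a relation to another unit, and an
# is-a link marking it as a special case of the "bird" unit.
bird = {
    "is_a": "animal",
    "has_wings": True,
    "can_fly": True,
}
canary = {
    "is_a": "bird",          # special case of the "bird" unit
    "colour": "yellow",
    "part_of": "aviary",     # relation: the value names another unit
}
frames = {"bird": bird, "canary": canary}

def lookup(frames, name, prop):
    # Look a property up in a unit, falling back on the unit it is a special case of.
    frame = frames.get(name, {})
    if prop in frame:
        return frame[prop]
    if "is_a" in frame and frame["is_a"] in frames:
        return lookup(frames, frame["is_a"], prop)
    return None

print(lookup(frames, "canary", "has_wings"))   # True, inherited from the "bird" unit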
The problem-solving model, or paradigm, organizes and controls the steps taken to solve the problem. An example will be given later.

One common but powerful paradigm involves chaining of IF-THEN rules to form a line of reasoning. If the chaining starts from a set of conditions and moves toward some conclusion, the method is called forward chaining. If the conclusion is known (for example, a goal to be achieved) but the path to that conclusion is not known, then reasoning backwards is called for, and the method is backward chaining. These problem-solving methods are built into program modules called inference engines or inference procedures that manipulate and use knowledge in the knowledge base to form a line of reasoning.
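
The forward direction is easy to sketch: starting from the known facts, keep firing any rule whose conditions are all satisfied until nothing new can be concluded. The minimal Python sketch below (with invented rule and fact names) shows the idea; a backward-chaining counterpart is sketched after the worked car-diagnosis example later on.

# Minimal forward chaining: conditions are sets of facts, conclusions are single facts.
rules = [
    ({"fuel_in_tank"}, "engine_gets_fuel"),
    ({"engine_gets_fuel", "engine_turns_over"}, "engine_should_start"),
]

def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:                       # keep going until a full pass adds nothing new
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)    # fire the rule: add its THEN part
                changed = True
    return facts

print(forward_chain({"fuel_in_tank", "engine_turns_over"}, rules))
# the result includes 'engine_gets_fuel' and 'engine_should_start'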

The knowledge base an expert uses is what he learned at school, from colleagues, and from years of experience. Presumably the more experience he has, the larger his store of knowledge. Knowledge allows him to interpret the information in his databases to advantage in diagnosis, design, and analysis.

Though an expert system consists primarily of a knowledge base and an inference engine, a couple of other features are worth mentioning: reasoning with uncertainty and explanation of the line of reasoning.

Knowledge is almost always incomplete and uncertain. To deal with uncertain knowledge, a rule may have associated with it a confidence factor or a weight. The set of methods for using uncertain knowledge in combination with uncertain data in the reasoning process is called reasoning with uncertainty. An important subclass of methods for reasoning with uncertainty is called "fuzzy logic," and the systems that use them are known as "fuzzy systems."
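
One widely cited scheme, used here purely as an illustration and not as the only option, is the MYCIN-style certainty factor, where two rules supporting the same conclusion are combined as sketched below; the numbers are invented:

def combine_cf(cf1, cf2):
    # Combine two certainty factors in [-1, 1] for the same conclusion (MYCIN-style).
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

# Two rules each lend partial support (0.6 and 0.4) to the same hypothesis.
print(combine_cf(0.6, 0.4))   # 0.76 -- stronger than either rule alone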

Because an expert system uses uncertain or heuristic knowledge (as we humans do) its credibility is often in question (as is the case with humans). When an answer to a problem is questionable, we tend to want to know the rationale. If the rationale seems plausible, we tend to believe the answer. So it is with expert systems. Most expert systems have the ability to answer questions of the form: "Why is the answer X?" Explanations can be generated by tracing the line of reasoning used by the inference engine.

The most important ingredient in any expert system is knowledge. The power of expert systems resides in the specific, high-quality knowledge they contain about task domains. AI researchers will continue to explore and add to the current repertoire of knowledge representation and reasoning methods. But in knowledge resides the power. Because of the importance of knowledge in expert systems and because the current knowledge acquisition method is slow and tedious, much of the future of expert systems depends on breaking the knowledge acquisition bottleneck and on codifying and representing a large knowledge infrastructure.

Knowledge engineering is the art of designing and building expert systems, and knowledge engineers are its practitioners. Gerald M. Weinberg said of programming in The Psychology of Computer Programming: "'Programming,' -- like 'loving,' -- is a single word that encompasses an infinitude of activities." Knowledge engineering is the same, perhaps more so. We stated earlier that knowledge engineering is an applied part of the science of artificial intelligence, which, in turn, is a part of computer science. Theoretically, then, a knowledge engineer is a computer scientist who knows how to design and implement programs that incorporate artificial intelligence techniques. The nature of knowledge engineering is changing, however, and a new breed of knowledge engineers is emerging.

Today there are two ways to build an expert system. They can be built from scratch, or built using a piece of development software known as a "tool" or a "shell." Before we discuss these tools, let's briefly discuss what knowledge engineers do. Though different styles and methods of knowledge engineering exist, the basic approach is the same: a knowledge engineer interviews and observes a human expert or a group of experts and learns what the experts know, and how they reason with their knowledge. The engineer then translates the knowledge into a computer-usable language, and designs an inference engine, a reasoning structure, that uses the knowledge appropriately. He also determines how to integrate the use of uncertain knowledge in the reasoning process, and what kinds of explanation would be useful to the end user.
Next, the inference engine and the facilities for representing knowledge and for generating explanations are programmed, and the domain knowledge is entered into the program piece by piece. It may be that the inference engine is not quite right, that the form of knowledge representation is awkward for the kind of knowledge needed for the task, or that the expert decides some pieces of knowledge are wrong. All of these problems are discovered and corrected as the expert system gradually gains competence.

The discovery and accumulation of techniques of machine reasoning and knowledge representation is generally the work of artificial intelligence research. The discovery and accumulation of knowledge of a task domain is the province of domain experts. Domain knowledge consists of both formal, textbook knowledge and experiential knowledge -- the expertise of the experts.
Tools, Shells, and Skeletons
Compared to the wide variation in domain knowledge, only a small number of AI methods are known that are useful in expert systems. That is, currently there are only a handful of ways in which to represent knowledge, or to make inferences, or to generate explanations. Thus, systems can be built that contain these useful methods without any domain-specific knowledge. Such systems are known as skeletal systems, shells, or simply AI tools.
Building expert systems by using shells offers significant advantages. A system can be built to perform a unique task by entering into a shell all the necessary knowledge about a task domain. The inference engine that applies the knowledge to the task at hand is built into the shell. If the program is not very complicated and if an expert has had some training in the use of a shell, the expert can enter the knowledge himself.

Although shells simplify programming, in general they don't help with knowledge acquisition. Knowledge acquisition refers to the task of endowing expert systems with knowledge, a task currently performed by knowledge engineers. The choice of reasoning method, or a shell, is important, but it isn't as important as the accumulation of high-quality knowledge. The power of an expert system lies in its store of knowledge about the task domain -- the more knowledge a system is given, the more competent it becomes.
Bricks and Mortar
The fundamental working hypothesis of AI is that intelligent behavior can be precisely described as symbol manipulation and can be modeled with the symbol processing capabilities of the computer.
In the late 1950s, special programming languages were invented that facilitate symbol manipulation. The most prominent is called LISP (LISt Processing). Because of its simple elegance and flexibility, most AI research programs are written in LISP, but commercial applications have moved away from LISP.
In the early 1970s another AI programming language was invented in France. It is called PROLOG (PROgramming in LOGic). LISP has its roots in one area of mathematics (lambda calculus), PROLOG in another (first-order predicate calculus).
PROLOG consists of English-like statements which are facts (assertions), rules (of inference), and questions. Here is an inference rule: "If object-x is part-of object-y then a component-of object-y is object-x."
Programs written in PROLOG have behavior similar to rule-based systems written in LISP. PROLOG, however, did not immediately become a language of choice for AI programmers. In the early 1980s it was given impetus with the announcement by the Japanese that they would use a logic programming language for the Fifth Generation Computing Systems (FGCS) Project. A variety of logic-based programming languages have since arisen, and the term prolog has become generic.
Designing an Expert System
In this section we will go into more detail on how expert systems are designed and written. First the basic architecture of an expert system will be reviewed, and then we will discuss the issues involved in designing expert systems.

Expert System Architecture
The figure shows the most important modules that make up a rule-based expert system. The user interacts with the system through a user interface, which may use menus, natural language or any other style of interaction. An inference engine is then used to reason with both the expert knowledge (extracted from our friendly expert) and data specific to the particular problem being solved. The expert knowledge will typically be in the form of a set of IF-THEN rules. The case-specific data includes both data provided by the user and partial conclusions (along with certainty measures) based on this data. In a simple forward-chaining rule-based system the case-specific data will be the elements in working memory.

Almost all expert systems also have an explanation subsystem, which allows the program to explain its reasoning to the user. Some systems also have a knowledge base editor, which helps the expert or knowledge engineer to easily update and check the knowledge base.
         
One important feature of expert systems is the way they (usually) separate domain-specific knowledge from more general-purpose reasoning and representation techniques. The general-purpose part (in the dotted box in the figure) is referred to as an expert system shell. As we see in the figure, the shell provides the inference engine (and knowledge representation scheme), a user interface, an explanation system and sometimes a knowledge base editor. Given a new kind of problem to solve (say, car design), we can usually find a shell that provides the right sort of support for that problem, so all we need to do is provide the expert knowledge. There are numerous commercial expert system shells, each one appropriate for a slightly different range of problems. (Work on expert systems in industry includes both writing expert system shells and writing expert systems using shells.) Using shells to write expert systems generally greatly reduces the cost and time of development compared with writing the expert system from scratch.



Issues in the design of expert systems
The typical architecture of an expert system is as shown in the figure. This applies particularly to the early systems, and to the many commercial systems that have copied them.
The inference engine and knowledge base are separated because:
·        the reasoning mechanism needs to be as stable as possible;
·        the knowledge base must be able to grow and change, as knowledge is added;
·        this arrangement enables the system to be built from, or converted into, a shell.



An expert system shell is simply an expert system stripped of its knowledge base, so that a different knowledge base, probably concerned with a different knowledge domain, can replace it. Example: EMYCIN, made from MYCIN.
Choosing a Problem
Writing an expert system generally involves a great deal of time and money. To avoid costly and embarrassing failures, people have developed a set of guidelines to determine whether a problem is suitable for an expert system solution:
1.      The need for a solution must justify the costs involved in development. There must be a realistic assessment of the costs and benefits involved.
2.      Human expertise is not available in all situations where it is needed. If the "expert" knowledge is widely available, it is unlikely to be worth developing an expert system. However, in areas like oil exploration and medicine there may be rare, specialised knowledge which an expert system could provide cheaply, as and when required, without having to fly in your friendly (but very highly paid) expert.
3.      The problem may be solved using symbolic reasoning techniques. It shouldn't require manual dexterity or physical skill.
4.      The problem is well structured and does not require (much) common sense knowledge. Common sense knowledge is notoriously hard to capture and represent. It turns out that highly technical fields are easier to deal with, and tend to involve relatively small amounts of well formalised knowledge.
5.      The problem cannot be easily solved using more traditional computing methods. If there's a good algorithmic solution to a problem, you don't want to use an expert system.
6.      Cooperative and articulate experts exist. For an expert system project to be successful it is essential that the experts are willing to help, and don't feel that their job is threatened! You also need any management and potential users to be involved and have positive attitudes to the whole thing.
7.      The problem is of proper size and scope. Typically you need problems that require highly specialized expertise, but would only take a human expert a short time to solve (say an hour, max).
It should be clear that only a small range of problems are appropriate for expert system technology. However, given a suitable problem, expert systems can bring enormous benefits. Systems have been developed, for example, to help analyse samples collected in oil exploration, and to help configure computer systems. Both these systems are (or were) in active use, saving large amounts of money.

Problems with using several experts to build a knowledge base in a particular domain.
·        Different experts may use different discriminations to arrive at the same conclusion.
·        Therefore, they are likely to produce different rules (or objects), and these are liable to conflict.
One way round this problem: get one expert to provide the knowledge in the prototype, and get others to refine it.
A Simple Example
This is much better explained through a simple example. (You may want to look back at the notes on rule-based systems if anything here is unclear.) Suppose that we have the following rules:
1.      IF engine_getting_petrol
AND engine_turns_over
THEN problem_with_spark_plugs
2.      IF NOT engine_turns_over
AND NOT lights_come_on
THEN problem_with_battery
3.      IF NOT engine_turns_over
AND lights_come_on
THEN problem_with_starter
4.      IF petrol_in_fuel_tank
THEN engine_getting_petrol

Our problem is to work out what's wrong with our car given some observable symptoms. There are three possible problems with the car: problem_with_spark_plugs, problem_with_battery, problem_with_starter. We'll assume that we have been provided with no initial facts about the observable symptoms.
In the simplest goal-directed system we would try to prove each hypothesised problem (with the car) in turn. First the system would try to prove "problem_with_spark_plugs". Rule 1 is potentially useful, so the system would set the new goals of proving "engine_getting_petrol" and "engine_turns_over". Trying to prove the first of these, rule 4 can be used, with the new goal of proving "petrol_in_fuel_tank". There are no rules which conclude this (and the system doesn't already know the answer), so the system will ask the user:
Is it true that there's petrol in the fuel tank?
Let's say that the answer is yes. This answer would be recorded, so that the user doesn't get asked the same question again. Anyway, the system has now proved that the engine is getting petrol, so it now wants to find out if the engine turns over. As the system doesn't yet know whether this is the case, and as there are no rules which conclude this, the user will be asked:
Is it true that the engine turns over?
Let's say this time the answer is no. There are no other rules which can be used to prove "problem_with_spark_plugs", so the system will conclude that this is not the solution to the problem, and will consider the next hypothesis: problem_with_battery. It is true that the engine does not turn over (the user has just said that), so all it has to prove is that the lights don't come on. It will ask the user:
Is it true that the lights come on?
Suppose the answer is no. It has now proved that the problem is with the battery. Some systems might stop there, but usually there might be more than one solution (e.g., more than one fault with the car), or it will be uncertain which of various solutions is the right one, so usually all hypotheses are considered. The system will try to prove "problem_with_starter", but given the existing data (the lights do not come on) the proof will fail, so it will conclude that the problem is with the battery. A complete interaction with our very simple system might be:
System: Is it true that there's petrol in the fuel tank?
User: Yes.
System: Is it true that the engine turns over?
User: No.
System: Is it true that the lights come on?
User: No.
System: I conclude that there is a problem with battery.
Note that, in general, solving problems using backward chaining involves searching through all the possible ways of proving the hypothesis, systematically checking each of them. A common way of doing this search is the same as in Prolog: depth-first search with backtracking.
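
To make the trace above concrete, here is a minimal backward chainer written in Python (the notes above present the rules only in abstract IF-THEN form; the code below is a sketch, not the actual system being described). It encodes the same four rules and asks the user about any proposition that no rule can conclude:

# Each rule maps a conclusion to a list of (subgoal, required truth value) pairs.
RULES = [
    ("problem_with_spark_plugs", [("engine_getting_petrol", True), ("engine_turns_over", True)]),
    ("problem_with_battery",     [("engine_turns_over", False), ("lights_come_on", False)]),
    ("problem_with_starter",     [("engine_turns_over", False), ("lights_come_on", True)]),
    ("engine_getting_petrol",    [("petrol_in_fuel_tank", True)]),
]

answers = {}   # remembered user answers, so nobody is asked the same question twice

def ask(goal):
    if goal not in answers:
        reply = input("Is it true that " + goal.replace("_", " ") + "? (yes/no) ")
        answers[goal] = reply.strip().lower().startswith("y")
    return answers[goal]

def prove(goal):
    # Try every rule that concludes the goal; if no rule concludes it, ask the user.
    relevant = [conds for head, conds in RULES if head == goal]
    if not relevant:
        return ask(goal)
    return any(all(prove(sub) == wanted for sub, wanted in conds) for conds in relevant)

if __name__ == "__main__":
    for hypothesis in ["problem_with_spark_plugs", "problem_with_battery", "problem_with_starter"]:
        if prove(hypothesis):
            print("I conclude that there is a " + hypothesis.replace("_", " ") + ".")

Run against the dialogue above (yes, no, no), this sketch asks the same three questions and reaches the same conclusion; a full system would add certainty measures and an explanation facility on top of this skeleton.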

 THE APPLICATIONS OF EXPERT SYSTEMS
The spectrum of applications of expert systems technology to industrial and commercial problems is so wide as to defy easy characterization. The applications find their way into most areas of knowledge work. They range from helping salespeople sell modular factory-built homes to helping NASA plan the maintenance of a space shuttle in preparation for its next flight.
Applications tend to cluster into seven major classes.
Diagnosis and Troubleshooting of Devices and Systems of All Kinds
This class comprises systems that deduce faults and suggest corrective actions for a malfunctioning device or process. Medical diagnosis was one of the first knowledge areas to which ES technology was applied (for example, see Shortliffe 1976), but diagnosis of engineered systems quickly surpassed medical diagnosis. There are probably more diagnostic applications of ES than any other type. The diagnostic problem can be stated in the abstract as: given the evidence presenting itself, what is the underlying problem/reason/cause?
Planning and Scheduling
Systems that fall into this class analyze a set of one or more potentially complex and interacting goals in order to determine a set of actions to achieve those goals, and/or provide a detailed temporal ordering of those actions, taking into account personnel, materiel, and other constraints. This class has great commercial potential, which has been recognized. Examples include airline scheduling of flights, personnel, and gates; manufacturing job-shop scheduling; and manufacturing process planning.
Configuration of Manufactured Objects from Subassemblies
Configuration, whereby a solution to a problem is synthesized from a given set of elements related by a set of constraints, is historically one of the most important of expert system applications. Configuration applications were pioneered by computer companies as a means of facilitating the manufacture of semi-custom minicomputers (McDermott 1981). The technique has found its way into use in many different industries, for example, modular home building, manufacturing, and other problems involving complex engineering design and manufacturing.
Financial Decision Making
The financial services industry has been a vigorous user of expert system techniques. Advisory programs have been created to assist bankers in determining whether to make loans to businesses and individuals. Insurance companies have used expert systems to assess the risk presented by the customer and to determine a price for the insurance. A typical application in the financial markets is in foreign exchange trading.
Knowledge Publishing
This is a relatively new, but also potentially explosive area. The primary function of the expert system is to deliver knowledge that is relevant to the user's problem, in the context of the user's problem. The two most widely distributed expert systems in the world are in this category. The first is an advisor which counsels a user on appropriate grammatical usage in a text. The second is a tax advisor that accompanies a tax preparation program and advises the user on tax strategy, tactics, and individual tax policy.
Process Monitoring and Control
Systems falling in this class analyze real-time data from physical devices with the goal of noticing anomalies, predicting trends, and controlling for both optimality and failure correction. Examples of real-time systems that actively monitor processes can be found in the steel making and oil refining industries.
Design and Manufacturing
These systems assist in the design of physical devices and processes, ranging from high-level conceptual design of abstract entities all the way to factory floor configuration of manufacturing processes.

Conclusion
In the 1980s, expert systems were seen as a very important new technology, and the governments of industrialised nations responded by investing heavily in expert systems research. While the confident predictions made at this time have not been entirely fulfilled, there has nevertheless been a flowering of the field, with the number of successful commercial applications increasing rapidly, year by year. Expert systems are easily the most commercially important area of applied artificial intelligence technology.

Expert system technology is now widespread and deeply embedded.

Early in the field's history, commercial expert system tools were written primarily in LISP and PROLOG, but more recently the trend has been towards conventional languages such as C.

Expert systems are now present in almost all areas of industry and government, solving part, or perhaps all, of the practical, significant problems that once required scarce human expertise.
