Data Mining and its environmental applications
Data mining: the practice of examining large pre-existing databases in order to generate new information; it involves sorting through data to identify patterns and establish relationships. Data mining techniques are used in many research areas, including mathematics, cybernetics, genetics and marketing. Data mining, the central activity in the process of knowledge discovery in databases, is concerned with finding patterns in data. Most commonly, the input to a data mining algorithm is a single table comprising a number of attributes and records. When data from several tables in a database needs to be taken into account, it is left to the user to manipulate the relevant tables. In most cases, this results in a single table, which is then used as input to a data mining algorithm. The output of a data mining algorithm is a pattern or a set of patterns that are valid in the given data. A pattern is defined as a statement in a given language that describes the facts in a subset of the given data and is simpler than the enumeration of all facts in the subset. Different classes of pattern languages are considered in data mining; they depend on the data mining task at hand. Typical representatives are equations; classification and regression trees; and association, classification, and regression rules.
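The idea of combining several tables into one input table, and of a pattern as a compact statement about the records, can be sketched in a few lines of Python. This is only an illustration: the two tables, their column names (site_id, altitude_m, ph, species_present) and the threshold in the rule are invented for the example and do not come from any real data set.

```python
# Hypothetical tables: every name and value here is invented for illustration.
sites = [
    {"site_id": 1, "altitude_m": 320},
    {"site_id": 2, "altitude_m": 870},
    {"site_id": 3, "altitude_m": 1450},
]
samples = [
    {"site_id": 1, "ph": 7.1, "species_present": True},
    {"site_id": 2, "ph": 6.4, "species_present": True},
    {"site_id": 3, "ph": 5.2, "species_present": False},
    {"site_id": 3, "ph": 5.6, "species_present": False},
]


def join_tables(sites, samples):
    """Join samples to sites on site_id, giving one flat record per sample."""
    by_id = {row["site_id"]: row for row in sites}
    return [{**by_id[s["site_id"]], **s} for s in samples]


def rule(record):
    """A pattern in a simple rule language:
    IF ph >= 6.0 THEN species present ELSE species absent."""
    return record["ph"] >= 6.0


# The single table that would be handed to a data mining algorithm.
table = join_tables(sites, samples)

# Records for which the rule agrees with the observed value.
covered = [r for r in table if rule(r) == r["species_present"]]
print(f"{len(covered)} of {len(table)} records are described by the rule")
```

The rule is a single statement, yet it describes every record in this small table, which is what makes it simpler than listing the facts one by one.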
The term Knowledge Discovery in Databases, or KDD for short, refers to the broad process of finding knowledge in data, and emphasizes the "high-level" application of particular data mining methods. KDD has been defined as "the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data" (Fayyad, et al., 1996, p. 6). It is of interest to researchers in machine learning, pattern recognition, databases, statistics, artificial intelligence, knowledge acquisition for expert systems, and data visualization. Data mining (DM) is a step in the KDD process concerned with applying computational techniques (i.e., data mining algorithms implemented as computer programs) to actually find patterns in the data. In a sense, data mining is the central step in the KDD process. The other steps in the KDD process are concerned with preparing data for data mining, as well as evaluating the discovered patterns. The definition above contains very imprecise notions, such as knowledge and pattern. To make these more precise, some explanations are necessary concerning data, patterns and knowledge, as well as validity, usefulness, and understandability. For example, the discovered patterns should be valid on new data with some degree of certainty. The patterns should potentially lead to some actions that are useful. Patterns can be treated as knowledge: according to Frawley, "a pattern that is interesting and certain enough is called knowledge."
Data is facts and statistics collected together for reference or analysis. A database is a collection of information that is organized so that it can easily be accessed, managed, and updated. The output of a data mining algorithm is a pattern or a set of patterns that are valid in the given data. A data mining algorithm is a set of heuristics and calculations that creates a data mining model from data.
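The requirement that a discovered pattern be "valid on new data with some degree of certainty" can be made concrete with a small sketch. The code below is not a method from the cited literature; it is a minimal threshold-rule learner written for this illustration. The attribute name ph, the target species_present and all values are hypothetical. It finds a rule on one set of records (the data mining step) and then measures how often the rule holds on records it has not seen (the evaluation step).

```python
def accuracy(records, attribute, target, threshold):
    """Fraction of records where 'attribute >= threshold' matches the target."""
    hits = sum((r[attribute] >= threshold) == r[target] for r in records)
    return hits / len(records)


def fit_threshold_rule(records, attribute, target):
    """Search rules of the form 'attribute >= t implies target is True'
    and return the threshold t with the highest accuracy on the records."""
    best_t, best_acc = None, -1.0
    for t in sorted({r[attribute] for r in records}):
        acc = accuracy(records, attribute, target, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


# Hypothetical records used to discover the pattern.
training = [
    {"ph": 5.0, "species_present": False},
    {"ph": 5.5, "species_present": False},
    {"ph": 5.9, "species_present": False},
    {"ph": 6.2, "species_present": True},
    {"ph": 6.8, "species_present": True},
    {"ph": 7.3, "species_present": True},
]

# Hypothetical new records, not used during the search.
new_data = [
    {"ph": 5.4, "species_present": False},
    {"ph": 6.0, "species_present": False},
    {"ph": 6.1, "species_present": True},
    {"ph": 6.5, "species_present": True},
]

threshold = fit_threshold_rule(training, "ph", "species_present")
validity = accuracy(new_data, "ph", "species_present", threshold)
print(f"rule: ph >= {threshold}; accuracy on new data: {validity:.2f}")
```

On these invented records the rule fits the training data perfectly but misclassifies one of the four new records, which is the sense in which validity holds only "with some degree of certainty".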
To create a model, the algorithm first analyzes the data you provide, looking for specific types of patterns or trends. A given data mining algorithm will typically have a built-in class of patterns that it considers; the particular language of patterns considered will depend on the given data. Many data mining algorithms come from the fields of machine learning and statistics.
Environmental science is an interdisciplinary academic field that integrates physical, biological and information sciences in the study of the environment and the solution of environmental problems. Ecology, a typical representative of the environmental sciences, studies the relationships among members of living communities and between those communities and their non-living environment.
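As an illustration of an algorithm with a built-in class of patterns applied to an ecological question, the sketch below restricts the pattern language to linear equations of the form y = a*x + b and fits one by ordinary least squares. The variables (nutrient concentration and algal biomass) and all measurements are assumptions made up for the example; least squares is used here simply as a familiar statistical technique, not as the method of any particular study.

```python
def fit_line(xs, ys):
    """Ordinary least squares for the pattern class y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measurements: nutrient concentration and algal biomass.
nutrient = [1.0, 2.0, 3.0, 4.0, 5.0]
biomass = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(nutrient, biomass)
print(f"biomass = {slope:.2f} * nutrient + {intercept:.2f}")
```

The resulting equation is the model: a single statement that summarizes the five records, chosen from the one class of patterns this algorithm is able to express.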
Data mining parameters include
This section defines the main data mining tasks and parameters that are addressed for the different types of data typically considered by data mining algorithms.