Saeed Tamar

I want a happy future for everyone, in an orderly way

This is my cluttered and disorganized notebook. A personal notebook.
There may be a few useful links here for you.
I will soon announce the launch of my official site here.
tamar.ir
---------------
Blessed are those who devote the first forty years of their lives to themselves and dedicate the rest to the people
and
woe to those who do it the other way round


Data Mining: Input and Output

The definition given in the previous item implies that data mining needs inputs and outputs.

Inputs could be:

  1. concept, the thing that is to be learned;
  2. instance, an individual and independent example of the concept to be learned (classified, associated or clustered). Each instance is characterized by the values of a set of predetermined attributes;
  3. attribute, a feature of an instance which takes a value chosen from a predetermined set.

A row of a database represents an instance, while a column represents an attribute. If no instance has a null attribute value, the database is called dense; if at least one instance has at least one null attribute value, the database is called sparse.
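The dense/sparse distinction above can be checked mechanically. The following is a minimal sketch, assuming instances are stored as Python dictionaries and that `None` stands for a null value; the `weather` data and the function name `is_dense` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional

# One instance = one row; each key is an attribute (a column).
# None stands for a null attribute value (an assumption of this sketch).
Instance = Dict[str, Optional[str]]

# Hypothetical toy data, not taken from the article.
weather: List[Instance] = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": None, "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]


def is_dense(instances: List[Instance]) -> bool:
    """Return True when no instance has a null attribute value."""
    return all(
        value is not None
        for instance in instances
        for value in instance.values()
    )


# The second instance has a null "windy" value, so the data is sparse.
print("dense" if is_dense(weather) else "sparse")
```

Dropping the second instance, or filling in its missing value, would make the same check report the data as dense.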

The most common ways of representing outputs are listed below.

  1. Decision table. It is a schema of rows and columns, just like the input, but reduced to the instances and attributes of interest. The figure below shows an example of a decision table.

  2. Decision tree. This structure derives from graph theory. It has nodes, which contain information, and edges, which link pairs of nodes. A directed edge, drawn as an arrow, points from a parent node to a child node. The root is the node at the top of the tree, with no incoming edge, while a leaf is a node with no outgoing edges. The figure below shows a decision tree in which "A" is the root and "C", "D", "D" are the leaves.

  3. Rule. A rule has an antecedent, or precondition, which is a logical expression, and a consequent, or conclusion, which is the predicate that holds when the antecedent is true.

    There are two kinds of rules: classification rules and association rules.

    In classification rules, the consequent gives the class or classes that apply to the instances covered by the rule, and all the tests in the antecedent must succeed for the rule to fire. Rules can be grouped into a set and fired in order as a decision list. Each rule seems to represent an independent nugget of knowledge, so that a new rule can be added to an existing rule set without disturbing the ones already there, but it is important to consider how the set of rules is executed, because in a decision list a rule applies only when none of the rules before it has fired. The figure below illustrates two examples of classification rules.

    Unlike classification rules, association rules can predict any attribute, not just the class, and can therefore predict combinations of attributes too.

    Because so many different association rules can be derived from even a tiny database, interest is restricted to those that apply to a reasonably large number of instances and have a high accuracy.

    Two measures matter most. Support, or coverage, is the number of instances in which the rule holds, that is, in which both the antecedent and the consequent are true. Confidence, or accuracy, is that number expressed as a proportion of the instances in which the antecedent is true.

    Typically an association rule must reach a minimum value of support and confidence.

    The figure below illustrates two examples of association rules.


  4. Cluster. It groups similar instances according to one or more characteristics. The output takes the form of a diagram that shows how the instances fall into the clusters. Some algorithms allow one instance to belong to more than one cluster. The figure below illustrates how 5 instances have been clustered into 2 different clusters.
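The support and confidence measures described under association rules can be computed by simple counting. The following is a minimal sketch, assuming a rule is given as two dictionaries mapping attributes to required values; the `weather` data, the example rule and the helper names (`matches`, `support`, `confidence`) are hypothetical and not taken from any particular library.

```python
from typing import Dict, List

Instance = Dict[str, str]
Condition = Dict[str, str]  # attribute -> required value

# Hypothetical toy data, chosen only for illustration.
weather: List[Instance] = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]


def matches(instance: Instance, condition: Condition) -> bool:
    """True when every attribute test in the condition succeeds."""
    return all(instance.get(attr) == value for attr, value in condition.items())


def support(instances: List[Instance], antecedent: Condition,
            consequent: Condition) -> int:
    """Number of instances where antecedent and consequent both hold."""
    return sum(
        1 for inst in instances
        if matches(inst, antecedent) and matches(inst, consequent)
    )


def confidence(instances: List[Instance], antecedent: Condition,
               consequent: Condition) -> float:
    """Support divided by the number of instances matching the antecedent."""
    covered = sum(1 for inst in instances if matches(inst, antecedent))
    if covered == 0:
        return 0.0  # the rule never applies; a convention of this sketch
    return support(instances, antecedent, consequent) / covered


# Example rule: if outlook = sunny then play = no
rule_if: Condition = {"outlook": "sunny"}
rule_then: Condition = {"play": "no"}

print(support(weather, rule_if, rule_then))
print(confidence(weather, rule_if, rule_then))
```

In this toy data the antecedent covers three instances and the whole rule holds in two of them, so the rule has support 2 and confidence 2/3. A minimum-support or minimum-confidence filter then amounts to comparing these two values against chosen thresholds.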


Data Mining and Machine Learning

The amount of data in the world keeps increasing. Computers allow us to save it. Decisions, needs, activities and so on can be reduced to a record in a database. As the amount of data increases, the proportion of it that people understand decreases. Lying hidden in all this data is potentially useful information that is rarely made explicit.

According to Piatetsky-Shapiro, "Knowledge Discovery is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data". Data mining is the automatic, or semiautomatic, process of discovering patterns in data. People have been seeking patterns since human life began. Hunters seek patterns in animal migration behavior; resellers seek patterns in buyer behavior.

It has been estimated that the amount of data stored in databases doubles every 20 months, so data mining is the only hope for elucidating the patterns hidden in it. Furthermore, data mining can lead to new insights and, in commercial settings, to competitive advantages.

Patterns have to:
  1. be meaningful in that they lead to some advantage, usually an economic advantage;
  2. allow non-trivial forecasting;
  3. infer previously unknown information.

There are two extremes for the expression of a pattern: black box and white box. The innards of a black box pattern are effectively incomprehensible, while the construction of a white box reveals the structure of the pattern so that it can be examined and reasoned about.

Most of the techniques for searching for and reasoning about patterns belong to machine learning. The main learning techniques are:
  1. classification, the learning scheme is presented with a set of classified examples from which it is expected to learn a way of classifying unseen examples;
  2. association, relations between instances are sought;
  3. clustering, instances belonging to the same category are grouped in a cluster;
  4. numeric prediction, the outcome of prediction is a numeric quantity.

The terms machine learning and data mining are commonly confused, as they often employ the same methods. Machine learning focuses on prediction based on known properties learned from the training data, while data mining focuses on the discovery of unknown properties in the data. The two areas overlap in many ways: data mining uses many machine learning methods, but often with a slightly different goal in mind. On the other hand, machine learning also employs data mining methods as "unsupervised learning" or as a preprocessing step to improve learner accuracy.
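To make the idea of classification concrete (learning from classified examples a way of classifying unseen ones), here is a minimal sketch. It assumes each example has a single attribute and predicts the class seen most often with each attribute value; this is a deliberately simple stand-in, not a method the article prescribes, and the `training` data and the names `learn_table` and `classify` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical classified examples: (attribute value, class label).
training: List[Tuple[str, str]] = [
    ("sunny", "no"),
    ("sunny", "no"),
    ("sunny", "yes"),
    ("overcast", "yes"),
    ("rainy", "yes"),
    ("rainy", "yes"),
    ("rainy", "no"),
]


def learn_table(examples: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map each attribute value to the class seen most often with it."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for value, label in examples:
        counts[value][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def classify(table: Dict[str, str], value: str, default: str) -> str:
    """Classify an unseen example; fall back to default for unknown values."""
    return table.get(value, default)


table = learn_table(training)
print(classify(table, "sunny", default="yes"))
print(classify(table, "foggy", default="yes"))
```

The learned `table` is itself a white box in the sense used earlier: it can be read directly as a small decision table with one row per attribute value.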