A decision tree builds classification or regression models in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. The topmost decision node in the tree, which corresponds to the best predictor, is called the root node.

The core algorithm for building decision trees is called **ID3**. It employs a top-down, greedy search through the space of possible branches with no backtracking, and it uses **Entropy** and **Information Gain** to construct the tree.

**Entropy:** The ID3 algorithm uses entropy to measure the homogeneity of a sample. If the sample is completely homogeneous, the entropy is zero; if the sample is equally divided between two classes, it has an entropy of one.
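
As a concrete illustration, here is a minimal sketch of the entropy calculation in Python; the function name and the toy label lists are ours, not part of ID3 itself:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits:
    Entropy(S) = -sum(p_i * log2(p_i)) over the class proportions p_i."""
    total = len(labels)
    return sum(-(count / total) * math.log2(count / total)
               for count in Counter(labels).values())

print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0 -- completely homogeneous
print(entropy(["yes", "yes", "no", "no"]))    # 1.0 -- equally divided
```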

**Information Gain:** Information gain is the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).
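
A minimal sketch of the gain computation, reusing the `entropy` helper above; the toy weather-style rows are purely illustrative:

```python
from collections import defaultdict

def information_gain(rows, labels, attr_index):
    """Parent entropy minus the weighted entropy of the subsets
    produced by splitting on the attribute at attr_index."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr_index]].append(label)
    total = len(labels)
    children = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - children

# Toy example: splitting on "outlook" separates the classes perfectly.
rows   = [["sunny"], ["sunny"], ["overcast"], ["rain"]]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, 0))  # 1.0 -- maximal gain
```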

In effect, the algorithm converts the dataset into a series of query statements, one per decision node, and then draws the tree.
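
To make the whole procedure concrete, here is an illustrative ID3 sketch that reuses the two helpers above. It returns either a leaf label or a dict whose keys are (attribute index, value) queries; real implementations add refinements such as pruning and handling of numeric attributes:

```python
from collections import Counter

def id3(rows, labels, attr_indices):
    # Leaf: all examples share one label.
    if len(set(labels)) == 1:
        return labels[0]
    # Leaf: no attributes left, so fall back to the majority class.
    if not attr_indices:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: split on the attribute with the highest information gain.
    best = max(attr_indices, key=lambda i: information_gain(rows, labels, i))
    remaining = [i for i in attr_indices if i != best]
    tree = {}
    for value in {row[best] for row in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in pairs]
        sub_labels = [l for _, l in pairs]
        tree[(best, value)] = id3(sub_rows, sub_labels, remaining)
    return tree
```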

### Advantages:

- It does not require any domain knowledge.
- It is easy for humans to interpret.
- The learning and classification steps of a decision tree are simple and fast.
- Decision trees can handle both categorical and numerical data.
- Performs well on large datasets.
- Relatively robust to noisy data.

### Disadvantages:

- Such algorithms cannot guarantee to return the globally optimal decision tree, since the greedy search never backtracks.
- Decision-tree learners can create over-complex trees that do not generalize well beyond the training data (overfitting).
- Information gain in decision trees is biased in favor of attributes with more levels (more distinct values).
- They have trouble when there are many missing values in the dataset.