Data
Collection of raw facts and figures
Basically everything we can consume and process on.
Attribute
An attribute is a property or characteristic of an object
Dimensions or Columns which define a data instance
Example
- Color
- Model
- Gender
Nominal
- Categorical
- ID numbers, eye colour, zip codes
Ordinal
- Categorical
- Rankings (e.g., taste of potato chips on a scale from 1-10), grades, height as {tall, medium, short}
Interval
-
Numerical
-
Has an arbitrary zero point
- This zero point does not represent the absence of the attribute but it represents some actual meaning
-
Differences between values are meaningful but not necessarily their ratios
-
Calendar dates, temperatures in Celsius or Fahrenheit.
Thus, Fahrenheit and Celsius temperature scales differ in terms of where their zero value is and the size of a unit (degree). This is the reason their conversions involves addition / subtraction of 32.
Ratio
-
Numerical
-
Has a true zero point
- This zero point refers to the absence of the attribute
-
Difference between values as well as ratio is meaningful
-
Temperature in Kelvin, length, counts, elapsed time (e.g., time to run a race)
Measured in units such that each unit can be converted to another just by dividing or multiplying
Take for example, length can be measured in meters or feet and thus conversions can easily be done done my dividing or multiplying
The key differences between interval and ratio scales can be summarized as follows:
- Zero Point: Interval scales have an arbitrary zero point, whereas ratio scales have a meaningful zero point.
- Negative Values: Interval scales can include negative values, while ratio scales typically do not have negative values because you cannot have less than nothing of the measured attribute.
- Mathematical Operations: Ratio scales allow for a wider range of mathematical operations, including the calculation of ratios.
Learn to classify these in: Attribute Classification
Data Object
The collection of attributes is called an object
Can also be referred to as An Instance
DataSet
Collection of data objects and their attributes
Induction
The process of training the model
Deduction
The process of mapping the input variable to target variable using the pre-trained model