2.2 Non-spatial Attribute Data
Attributes are the non-spatial characteristics that describe spatial objects. Attributes are commonly arranged in tables where a row is equivalent to one entity and a column is equivalent to one attribute, or descriptor, of that entity. Typically, each row relates to a single object in a geospatial data model, and each object has multiple attributes that describe it, usually stored in what is called an attribute table.
Let’s say we have a spatial data model that stores the location of fire hydrants. For each fire hydrant, to represent the object, we would store the position. In addition to the positional information, we will also store attributes that describe those fire hydrants. In this example, we are storing color, service state, and flow as three attributes that describe this particular fire hydrant at this particular position on earth. The position, color, service state, and flow will be stored as one row in an attribute table that will contain four columns, because there are four descriptors for this fire hydrant.
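To make the row-and-column idea concrete, here is a minimal Python sketch of the fire hydrant record as one row of an attribute table. The field names, the coordinates, the flow unit (gallons per minute), and all values are hypothetical; they are chosen only for illustration and do not come from a real data set.

```python
from dataclasses import dataclass, fields


@dataclass
class HydrantRow:
    """One row of a hypothetical fire hydrant attribute table."""
    position: tuple      # (longitude, latitude); illustrative values only
    color: str
    service_state: str
    flow_gpm: float      # flow; gallons per minute is an assumed unit


# The attribute table is a list of rows, one row per hydrant.
attribute_table = [
    HydrantRow(position=(-105.0, 40.0), color="red",
               service_state="in service", flow_gpm=1000.0),
]

# The column names are the fields shared by every row.
column_names = [f.name for f in fields(HydrantRow)]
print(column_names)
```

Each additional hydrant would add a row to `attribute_table`, while the four columns stay the same.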
Attributes can store many kinds of descriptive and statistical information, which can be broken down into four categories: nominal, ordinal, interval, and ratio.
Nominal attribute data provide descriptive information about an object, such as its color, its name (for instance, a city name), or its type. What’s important here is that this descriptive information does not imply any order, size, or any other quantitative information. That means that you cannot state that one attribute is greater than or less than another, and you cannot multiply attributes together; for instance, it does not make sense to multiply the color blue by the color red. The only comparison you can make with nominal attributes is to check whether two attributes are equal or not equal.
In addition to text descriptions, the nominal attribute category includes descriptive information such as images, movies, and sounds.
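The equality-only rule for nominal attributes can be sketched in a few lines of Python. The color values below are hypothetical and stand in for any nominal attribute.

```python
from collections import Counter

# Hypothetical nominal values: hydrant colors.
color_a = "red"
color_b = "blue"
color_c = "red"

# Equality and inequality are the only meaningful comparisons.
same_color = color_a == color_c        # True
different_color = color_a != color_b   # True

# Counting how often each category occurs is also meaningful,
# because it relies only on equality tests.
color_counts = Counter([color_a, color_b, color_c])

# Arithmetic on categories is undefined: multiplying two strings raises TypeError.
try:
    color_a * color_b
    multiplication_allowed = True
except TypeError:
    multiplication_allowed = False

print(same_color, different_color, color_counts["red"], multiplication_allowed)
```

Note that Python will compare two strings alphabetically if asked (`"blue" < "red"`), but that ordering is a property of the text, not of the attribute, so it carries no meaning for nominal data.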
The next attribute category is ordinal attribute data, which imply a ranking or order based on their values. These values can be descriptive text or numbers. For example, we can describe an object as having a high/medium/low ranking, or a ranking of 100/50/1. In either case, ordinal attributes allow us to specify rank only, and not scale. So for instance, we can state that high is ordered higher than medium, and medium is ordered higher than low, but we cannot say that high is twice as high as medium, or that medium is twice as high as low. The same holds when the ordinal values are numbers: we can say that 50 is ordered higher than 25, and 25 is ordered higher than 12.5, but we cannot say that 50 is twice as high as 25, or that 25 is twice as high as 12.5. Even though we are using numbers to describe a rank, do not let that confuse you into thinking that a scale is implied.
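One way to sketch ordinal data in Python is to keep the allowed values in a list whose order defines the ranking. The names `RANK_ORDER`, `rank`, and `is_higher` are made up for this example, and the ratings are hypothetical.

```python
# Hypothetical ranking: position in this list defines the order, nothing more.
RANK_ORDER = ["low", "medium", "high"]


def rank(value: str) -> int:
    """Return the position of an ordinal value in the assumed ranking."""
    return RANK_ORDER.index(value)


def is_higher(a: str, b: str) -> bool:
    """Return True if a is ranked above b."""
    return rank(a) > rank(b)


ratings = ["medium", "high", "low", "medium"]
sorted_ratings = sorted(ratings, key=rank)  # lowest rank first

print(is_higher("high", "medium"))
print(sorted_ratings)
```

The integers returned by `rank` are only positions. Comparing them is meaningful, but dividing or multiplying them is not, which mirrors the rank-without-scale rule described above.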
The third attribute category is interval attribute data. Interval attributes imply a rank order and a magnitude, or scale. Interval attributes use numbers; however, those numbers do not have a natural zero, and use an arbitrary zero point instead. For instance, if we look at temperature on the Fahrenheit scale, 0°F is not a natural zero point for temperature; it is a human-defined zero point. Therefore, while we can say that 50°F is 10°F more than 40°F, we cannot say that 50°F is twice as hot as 25°F, again because 0°F is a human-created zero and not a natural phenomenon. With an interval attribute, addition and subtraction make sense, but multiplication and division do not, since values are relative to that arbitrary zero.
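A short Python sketch can show why ratios are not meaningful for interval data. It converts the two Fahrenheit temperatures from the paragraph above to Celsius, another scale with an arbitrary zero, using the standard conversion formula. The function and variable names are made up for this example.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a Fahrenheit temperature to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


warm_f, cool_f = 50.0, 25.0
warm_c = fahrenheit_to_celsius(warm_f)
cool_c = fahrenheit_to_celsius(cool_f)

# The apparent "ratio" depends on which arbitrary zero the scale uses.
ratio_f = warm_f / cool_f   # 2.0 on the Fahrenheit scale
ratio_c = warm_c / cool_c   # a different, here negative, number on the Celsius scale

# Differences stay meaningful: the same gap, expressed in each scale's degrees.
diff_f = warm_f - cool_f    # 25 Fahrenheit degrees
diff_c = warm_c - cool_c    # the same gap in Celsius degrees (25 * 5 / 9)

print(ratio_f, ratio_c, diff_f, diff_c)
```

Because the ratio changes when only the zero point and unit change, "twice as hot" is not a property of the temperatures themselves, while the difference between them is.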
The fourth and final category is ratio attribute data. A ratio attribute implies both rank order and magnitude about a natural zero. Because there is an absolute, natural zero, ratio data, unlike interval data, support addition, subtraction, multiplication, and division. So for example, if we are measuring speed in miles per hour, then a car not moving at all is moving at zero miles per hour, and a car moving at 60 miles per hour is moving twice as fast as one moving at 30 miles per hour. In terms of temperature, the Kelvin scale uses a natural zero, which is absolute zero, the point at which molecular motion is at its theoretical minimum.
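For contrast with interval data, the following Python sketch checks that a ratio of two speeds survives a change of units. The speeds are hypothetical; the conversion factor is the standard definition of the international mile in kilometers.

```python
import math

KM_PER_MILE = 1.609344  # definition of the international mile in kilometers


def mph_to_kmh(speed_mph: float) -> float:
    """Convert miles per hour to kilometers per hour."""
    return speed_mph * KM_PER_MILE


fast_mph, slow_mph = 60.0, 30.0

ratio_mph = fast_mph / slow_mph
ratio_kmh = mph_to_kmh(fast_mph) / mph_to_kmh(slow_mph)

# Zero means "not moving" in either unit, which is what makes it a natural zero.
zero_kmh = mph_to_kmh(0.0)

print(ratio_mph, ratio_kmh, zero_kmh)
print(math.isclose(ratio_mph, ratio_kmh))
```

Since both units share the same natural zero, "twice as fast" holds in miles per hour and in kilometers per hour alike.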
Now that you know the four attribute categories, let’s take a look at an example data set and its related attribute table, and try to identify each column as holding nominal, ordinal, interval, or ratio data. The data set we are looking at contains four objects, and each one of those objects represents a tree. Each object has four attributes showing in the attribute table: ID, height in feet, type, and class.
Let’s finish by talking about attribute data types. Computers fundamentally “think” differently than humans. While humans see numbers, letters, pictures, and sounds, a computer only sees zeros and ones, or binary data. Therefore, we need a way to translate the numbers, sounds, and videos, as humans know them, into a form that a computer can understand and store. Computer scientists have created structures, called data types, that we can use to translate information into a format the computer can store in its memory. There are four typical data types that we use in GIS: integer, float/real, text/string, and date. It is important that we specify which data type we are going to use to store information, so that we use the computer’s memory efficiently and so that the computer knows which operations are allowed on each value stored with that data type.
The first data type is the integer, which is a whole number, such as 1, 2458, or -54. Integers can be used for mathematical calculations; however, if the result is stored as an integer, any fractional part of that result will be rounded or truncated.
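The difference between truncating and rounding can be seen in a small Python sketch. Whether a particular GIS package truncates or rounds when it stores a result in an integer field depends on that software; the code below shows only Python's own behavior, with made-up values.

```python
total = 7
parts = 4

true_quotient = total / parts      # 1.75, a float
floor_quotient = total // parts    # 1, integer division drops the fraction
truncated = int(true_quotient)     # 1, int() truncates toward zero
rounded = round(true_quotient)     # 2, rounded to the nearest whole number

print(true_quotient, floor_quotient, truncated, rounded)
```

The same calculation therefore yields 1 or 2 depending on which rule is applied, which is why it matters to know how a given tool converts results to integers.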
The float, or real, data type holds a decimal number such as 1.452, 254,783.1, or -845.157. Like the integer data type, the float or real data type can be used for mathematical calculations. Unlike integers, float or real numbers keep the fractional part of a result, but only up to the precision, or number of significant digits, that you have specified; digits beyond that precision are rounded.
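The effect of limited precision can be sketched in Python with the standard `struct` module. Packing a value into four bytes simulates storing it in a single-precision field; treating that as what a given GIS float field does is an assumption, since field definitions vary by software and format.

```python
import struct

value = 254783.1

# Python's float is double precision (roughly 15 to 17 significant decimal digits).
# Packing into 4 bytes and unpacking simulates a single-precision field
# (roughly 7 significant decimal digits).
as_single = struct.unpack("<f", struct.pack("<f", value))[0]

# The difference between what was entered and what was stored.
storage_error = abs(as_single - value)

# Even in double precision, some decimal fractions are not stored exactly.
sum_is_exact = (0.1 + 0.2) == 0.3

print(as_single, storage_error, sum_is_exact)
```

The stored value is close to, but not exactly, the value entered, which is the practical meaning of "depending on the number of significant digits."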
The text, or string, data type contains characters, such as the character “A”, the characters “GIS”, the characters “125 Main St.”, or the character “9”. Even though text may contain digits, it is important to note that text values cannot be used directly for mathematical calculations. However, strings can be manipulated, for example to find substrings or to cut strings at specified locations.
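The following Python sketch illustrates these points using the example strings from the paragraph above. The variable names are made up for this example.

```python
address = "125 Main St."
digit_text = "9"

# "Adding" text concatenates characters instead of doing arithmetic.
concatenated = digit_text + digit_text            # "99"

# To calculate with the value, it must first be converted to a numeric type.
numeric_sum = int(digit_text) + int(digit_text)   # 18

# Finding a substring: the position of "Main" within the address.
main_position = address.find("Main")              # 4

# Cutting a string at a location: split at the first space.
house_number, _, street = address.partition(" ")

print(concatenated, numeric_sum, main_position, house_number, street)
```

Here the house number comes back as the text "125", not the number 125, so it would still need converting before any arithmetic.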
The last common data type is date. The date data type holds time and date information, such as 12/10/2010, 10/12/10, or December 10, 2010. The date data type cannot be used for general mathematical calculations; however, it can be used to calculate the length of time between two different dates or times. Additionally, the computer stores the date information in its own internal data structure, but it can format the date for output in many different ways, as shown in these examples.
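As a sketch of these ideas, Python's standard `datetime` module can parse the example dates above, format one stored value in several ways, and compute the time between two dates. Reading "10/12/10" as day/month/two-digit year is an assumption, since that text is ambiguous on its own, and the second date used for the elapsed-time calculation is hypothetical. Parsing the month name assumes an English locale.

```python
from datetime import date, datetime

# Parse the same calendar date from the three text formats mentioned above.
d1 = datetime.strptime("12/10/2010", "%m/%d/%Y").date()
d2 = datetime.strptime("December 10, 2010", "%B %d, %Y").date()
d3 = datetime.strptime("10/12/10", "%d/%m/%y").date()  # assumed day/month/year

same_date = d1 == d2 == d3

# One stored value can be formatted for output in several ways.
iso_text = d1.isoformat()           # "2010-12-10"
us_text = d1.strftime("%m/%d/%Y")   # "12/10/2010"

# Length of time between two dates (the later date is hypothetical).
elapsed = date(2010, 12, 25) - d1
elapsed_days = elapsed.days

print(same_date, iso_text, us_text, elapsed_days)
```

All three inputs resolve to the same internal value, which is why the stored date and its displayed format can be treated separately.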