Dissertation Topic

THE MODIFIABLE AREA UNIT PROBLEM:
EMPIRICAL ANALYSIS BY STATISTICAL SIMULATION

By Harold David Reynolds
Doctor of Philosophy
Graduate Department of Geography
University of Toronto
1998


Abstract

    The Modifiable Area Unit Problem (MAUP) has been discussed in the spatial analysis literature since the 1930s, but it is the recent surge in the availability of desktop computing power and Geographical Information Systems software that has caused both a resurgence of interest in the problem and a greater need to learn more about it.  Many spatial datasets are collected at a fine resolution (i.e. a large number of small spatial units) but, for the sake of privacy and/or size concerns, are released only after being spatially aggregated to a coarser resolution (i.e. a smaller number of larger spatial units).  The chief example of this process is census data, which are collected from every household but released only at the Enumeration Area or Census Tract level of spatial resolution.  When values are averaged during aggregation, variability in the dataset is lost and values of statistics computed at the different resolutions will be different; this change is called the scale effect.  One also gets different values of statistics depending on how the spatial aggregation occurs; this variability is called the zoning effect.  The purpose of studying the MAUP is to try to estimate the true values of the statistics at the original level of spatial resolution.  Knowing these would allow researchers to attempt to make estimates of the data values, either with synthetic spatial data generators like the one described in this thesis or by other techniques.

    Many studies of the MAUP have been made using specific datasets and examining various statistics, such as correlations.  Although interesting properties have been documented, this approach is ultimately unsatisfactory because researchers have had no control over the various properties of the datasets, all of which could potentially affect the MAUP.  This research has focused on the creation of a synthetic spatial dataset generator that can systematically vary means, variances, correlations, spatial autocorrelations and spatial connectivity matrices of variables in order to study their effects on univariate, bivariate, and multivariate statistics.

    Even though the MAUP has traditionally been written off as an intractable problem, results from the various experiments described in this thesis indicate that there is a degree of regularity in the behaviour of aggregated statistics that depends on the spatial autocorrelation and configuration of the variable values.  If the MAUP can be solved, however, it is clear that the solution will likely be a complex procedure.


The above is the abstract of my thesis. I have converted the files into PDF format for ease of reading and printing, should you be interested enough to want to read the whole thing. If you do not happen to have the Adobe Acrobat reader, follow this link.
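
The scale and zoning effects described in the abstract can be seen in a few lines of Python. The following is only a minimal sketch on made-up data, not the thesis's generator: the ring of 120 units, the two variables `x` and `y`, and the helper `aggregate` are all assumptions introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fine-resolution data: 120 small units arranged on a ring,
# with two variables that share a smooth spatial component plus noise.
n_units = 120
position = np.arange(n_units)
shared = np.sin(2 * np.pi * position / n_units)
x = shared + rng.normal(scale=1.0, size=n_units)
y = shared + rng.normal(scale=1.0, size=n_units)


def aggregate(values, zone_size, offset=0):
    """Average consecutive units into equal-sized zones.

    `offset` shifts the zone boundaries around the ring, which changes
    the zoning while keeping the number of zones (the scale) fixed.
    """
    shifted = np.roll(values, -offset)
    return shifted.reshape(-1, zone_size).mean(axis=1)


def correlation(a, b):
    """Pearson correlation between two equally long vectors."""
    return float(np.corrcoef(a, b)[0, 1])


# Scale effect: the same data averaged into progressively larger zones.
scale_results = {}
for zone_size in (1, 2, 4, 8):
    ax, ay = aggregate(x, zone_size), aggregate(y, zone_size)
    scale_results[zone_size] = (float(ax.var()), correlation(ax, ay))

# Zoning effect: the same zone size, but different boundary placements.
zoning_results = [
    correlation(aggregate(x, 8, offset), aggregate(y, 8, offset))
    for offset in range(8)
]

for zone_size, (variance, r) in scale_results.items():
    print(f"zone size {zone_size}: var(x) = {variance:.3f}, r = {r:.3f}")
print("r for the 8 zonings at zone size 8:", np.round(zoning_results, 3))
```

The first lines of output show the variance of `x` and the correlation between `x` and `y` at each zone size; the last line shows the correlations obtained from the eight possible placements of the zone boundaries at zone size 8. With nested, equal-sized zones the variance cannot increase as the zones grow, while the correlations generally differ from one zoning to the next.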

I hope that this is useful to anyone interested in the topic. Please let me know!

Added to my site July 23, 2001. Far too late, as far as I'm concerned...

NOTE: The diagram files may take some time to come up in your browser, especially if your computer is a bit slow. You may even have to select the link in the location bar and hit Enter again to make any of these load up, depending on your system. (Mine is a bit cranky about PDFs for some reason...)


Errata

In the course of rewriting the dataset creation code in VBA for Excel, I discovered that the original worked example in my thesis had a couple of errors, and also needed some clarifications. I should have had more zones in the sample area, or used fewer variables, because the use of the zero eigenvalue's eigenvector was impossible (all the same values), and one has to use all different eigenvectors to construct the vectors vi. In addition, the last MC of 0.13 was a bit too small - the correct required value of the last MC was less than the smallest eigenvalue. I added two more zones and modified others, and boosted the MC of the last variable to 0.131. There are now enough eigenvectors to allow construction of the variables. Normally, this is not a problem, since the number of regions is usually far greater than the number of variables to create. I sincerely apologize if this has caused anybody any confusion!
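
The point about the zero eigenvalue can be checked numerically. The sketch below is not the corrected worked example from the thesis. It assumes that MC stands for the Moran Coefficient and that the eigensystem in question is that of the centred connectivity matrix MCM, with M = I - 11'/n, which is a common construction that this page does not spell out. The 5-zone map and the helper `moran_coefficient` are hypothetical.

```python
import numpy as np

# Hypothetical 5-zone map, given as a list of neighbouring zone pairs.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 3)]
n = 5

# Binary, symmetric connectivity matrix C.
C = np.zeros((n, n))
for i, j in edges:
    C[i, j] = C[j, i] = 1.0

# Centring projector M = I - 11'/n and the eigensystem of M C M.
M = np.eye(n) - np.ones((n, n)) / n
eigenvalues, eigenvectors = np.linalg.eigh(M @ C @ M)


def moran_coefficient(x, C):
    """Moran Coefficient of x for a binary connectivity matrix C."""
    d = x - x.mean()
    return len(x) / C.sum() * (d @ C @ d) / (d @ d)


scale = n / C.sum()  # n divided by the sum of all connectivity weights
for lam, vec in zip(eigenvalues, eigenvectors.T):
    if abs(lam) < 1e-8:
        # A constant vector has no variation, so its MC is undefined.
        spread = vec.max() - vec.min()
        print(f"eigenvalue {lam:+.4f}: spread of eigenvector values = {spread:.1e}")
    else:
        mc = moran_coefficient(vec, C)
        print(f"eigenvalue {lam:+.4f}: MC = {mc:+.4f}, scaled eigenvalue = {scale * lam:+.4f}")
```

Under these assumptions, the eigenvector belonging to the zero eigenvalue has the same value in every zone and so cannot serve as a variable, while every other eigenvector has an MC equal to its eigenvalue multiplied by n and divided by the sum of the connectivity weights. An n-zone map therefore offers at most n - 1 usable eigenvectors.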

Harold Reynolds, August 27, 2004.

The Spatial Dataset Generator

I do not know why I did not include the Excel workbook that I mentioned above on the website! Only just now (September 24, 2009), in the course of looking to help another student with the topic, did I remember that I had this program. The workbook is here: DatasetGenerator.xls. The first worksheet contains detailed instructions, and the code behind it is (I hope) reasonably clearly written and commented. There's no real interactive component (i.e. one where you could generate the eigensystem in one step and then experiment with it), but I'm sure an enterprising programmer could make one (or I could at some point, if I can find the time...). It would be interesting to see the effects of combining 3 or 4 or more eigenvectors, instead of just 2... but of course you need a GIS program to properly display the results.
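
For anyone who wants to experiment with combining eigenvectors outside Excel, here is a minimal Python sketch. It is not a port of DatasetGenerator.xls and may not match how the workbook combines eigenvectors. It assumes the eigenvectors come from the centred connectivity matrix MCM, and the 5-zone map, the target MC of 0.1, and the helpers `mix` and `moran_coefficient` are hypothetical. The idea is that orthonormal eigenvectors combined with coefficients sqrt(w), where the weights w are non-negative and sum to 1, give a variable whose Moran Coefficient is the weighted average of the eigenvectors' own MCs.

```python
import numpy as np

# Hypothetical 5-zone map (an assumption, for illustration only).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 3)]
n = 5
C = np.zeros((n, n))
for i, j in edges:
    C[i, j] = C[j, i] = 1.0

M = np.eye(n) - np.ones((n, n)) / n
eigenvalues, eigenvectors = np.linalg.eigh(M @ C @ M)  # ascending order
mc = n / C.sum() * eigenvalues  # MC of each non-constant eigenvector


def moran_coefficient(x, C):
    """Moran Coefficient of x for a binary connectivity matrix C."""
    d = x - x.mean()
    return len(x) / C.sum() * (d @ C @ d) / (d @ d)


def mix(indices, weights):
    """Combine eigenvectors using sqrt(weight) as each coefficient."""
    weights = np.asarray(weights, dtype=float)
    return eigenvectors[:, indices] @ np.sqrt(weights)


# Skip the constant eigenvector that belongs to the zero eigenvalue.
usable = [k for k in range(n) if abs(eigenvalues[k]) > 1e-8]
lo, mid, hi = usable[0], usable[-2], usable[-1]
target = 0.1

# Two eigenvectors: exactly one weight split hits the target MC.
w_hi = (target - mc[lo]) / (mc[hi] - mc[lo])
x2 = mix([lo, hi], [1 - w_hi, w_hi])

# Three eigenvectors: fix one weight freely, then solve for the other two.
w_mid = 0.3
w_hi3 = (target - w_mid * mc[mid] - (1 - w_mid) * mc[lo]) / (mc[hi] - mc[lo])
x3 = mix([lo, mid, hi], [1 - w_mid - w_hi3, w_mid, w_hi3])

print("MC from two eigenvectors:  ", round(moran_coefficient(x2, C), 6))
print("MC from three eigenvectors:", round(moran_coefficient(x3, C), 6))
```

With three or more eigenvectors there is a free choice of weights, so many different spatial patterns share the same MC. The target must lie between the smallest and largest MC of the eigenvectors being mixed, otherwise one of the weights comes out negative and its square root is undefined.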

Legal Stuff: The workbook and code are provided as is. I'm pretty sure everything works as it's supposed to, but you should always check for yourself!


File Name                   Size
Chapters 1, 2 and 3         122 K
Chapter 4                   60 K
Chapter 4 diagrams          476 K
Chapter 5                   54 K
Chapter 5 diagrams          548 K
Chapter 6                   42 K
Chapter 6 diagrams          316 K
Chapter 7                   4 K
Errata, Section 3.1.4       20 K
