Word Count: 1901
Rapid urbanization is a central characteristic of today's globalization and human development. The world's population now exceeds 7.8 billion, and adding the next billion is expected to take only about twelve years. What is more astonishing is that half of the world's population lives in cities that cover only about three per cent of the earth's land surface. It is indisputable that this booming population has brought enormous economic prosperity and social change; however, the skyrocketing number of people living in cities has also caused many problems that degrade the urban environment and lower residents' quality of life, including but not limited to noise, traffic congestion, pollution, homelessness, and littering. Obviously, there are no simple solutions for these stubborn problems, as we continue to witness them every single day in New York, London, Tokyo, Shanghai, and almost every other metropolis of the world. At the center of the problem lies the complicated nature of the city: because a city is a complex system in which millions of people behave and act independently every day, producing massive amounts of information every second, it is extremely difficult for scientists and policy makers to extract the right type of data from the population to serve as a reference in policy making, which is necessary to tackle urbanization problems and optimize urban living environments.
So far, we have identified the underlying problem with cities, and it is crystal clear why we should make every effort to solve it: to improve people's well-being and happiness, and to increase the efficiency of cities as a complex system. This also corresponds to the Sustainable Cities and Communities goal, one of the 17 goals adopted in the United Nations' 2030 Agenda for Sustainable Development.
We have answered what and why; now we need to address the most important question, how. In the following paragraphs, I will examine some of the existing geospatial technologies considered applicable in the urbanization context, identify their pros and cons, and discuss future improvements, in order to address the broad research question of how data science methods and technologies can be practically applied to the field of urbanization study and help us understand population trends and behaviors in the urban complex adaptive system. Finally, the research proposal will investigate the problem further, proposing a finer approach that not only develops a more comprehensive data science model but also introduces an inclusive research plan outlining the steps needed to realize the proposal.
The first existing data science research method that fits the research question well is machine learning. By definition, machine learning uses computer algorithms that improve automatically through experience with data. It constructs a model based on training data and is capable of making predictions or decisions without being explicitly programmed for those tasks. In one specific application that utilizes geospatial datasets, including VIIRS Nighttime Light and MODIS Daytime NDVI data, machine learning is able to map the extent of urban areas with accuracy above 95% (Liu et al., 2019). Because this approach requires the researchers to build different models and classify the data by taking a weighted vote of the individual classifier predictions, the authors train a random forest (RF), a gradient boosting machine (GBM), a neural network (NN), and their ensemble (ESB) on the same nighttime and daytime datasets, exploring and comparing their effectiveness. In the study's workflow, the machine learning components work as follows: the three trained models each give their own prediction of the degree of urban extent; a weighted vote (ESB) then combines them into a modified result; finally, that result is compared against reference samples to determine the accuracy of machine learning in this study. A minimal sketch of this ensemble setup appears below.
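To make the ensemble idea concrete, here is a minimal Python sketch using scikit-learn. The features are synthetic placeholders standing in for per-pixel VIIRS nighttime light and MODIS NDVI values, and the equal voting weights and model settings are assumptions for illustration; the actual configuration and weighting in Liu et al. (2019) differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder features: column 0 stands in for VIIRS nighttime light,
# column 1 for MODIS daytime NDVI; labels are 1 = urban, 0 = non-urban.
rng = np.random.default_rng(0)
X = rng.random((1000, 2))
y = (X[:, 0] > 0.6).astype(int)  # toy rule: bright pixels count as "urban"

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
gbm = GradientBoostingClassifier(random_state=0)
nn = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0)

# Ensemble (ESB): a weighted soft vote over the three classifiers' predicted probabilities.
esb = VotingClassifier(
    estimators=[("rf", rf), ("gbm", gbm), ("nn", nn)],
    voting="soft",
    weights=[1, 1, 1],  # equal weights assumed here; in practice they are tuned on validation samples
)
esb.fit(X_train, y_train)

# Compare the ensemble prediction with the held-out samples to estimate accuracy.
print("ESB accuracy on held-out samples:", accuracy_score(y_test, esb.predict(X_test)))
```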
Now that we have understood machine learning's role in urban data extraction, another important research method that is very popular in urban-related studies is Agent-Based Modeling (ABM). An agent-based model is a class of computational models that simulates the actions and behaviors of autonomous individuals in order to assess their effects on the system as a whole. In other words, it models the behaviors of citizens and observes the consequent effects of their daily actions on the city itself. One study utilizes ABM to simulate human exposure to urban environmental stresses (Yang et al., 2018). Although it does not completely align with our original research question of urban data extraction for city improvement, we can still gain valuable insight into the dynamics, mechanisms, and unique characteristics of Agent-Based Modeling. The study first constructs a framework with three overlapping layers: "spatial data of the concerned urban environment, concentrations of environmental stress sources, and human activities." Within this framework, environmental stress sources that vary by time of day are treated as factors that influence the exposure of individual agents, who "dynamically follow their daily life according to predetermined rules that are set according to empirical studies and specific surveys." During the simulation, the model collects and summarizes "both individual and collective exposure and inform relevant exposure reduction strategies," so that pollution exposure to human beings can be measured and analyzed. Because the urban population is extremely diverse and people behave in their own unique ways, the researchers "group people with similar attributes and behaviors" based on personal characteristics including "age, gender, work, income, education, living and working location, and access to cars or public transport, as well as the environmental conditions." As an example, the researchers develop the daily routine of a 35-year-old employed woman that seeks to mimic the schedule of that specific group of citizens, in which p is the priority of a given action compared with other actions; a simplified sketch of how such a schedule might be encoded follows.
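The following Python sketch illustrates one way a prioritized daily schedule of this kind could be encoded: a simplified agent whose activities carry a priority p and who accumulates exposure from hypothetical location-based stress levels. The attribute names, schedule, and stress values are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Activity:
    name: str
    start_hour: int   # hour of day the activity begins
    end_hour: int     # hour of day the activity ends
    location: str     # e.g. "home", "office", "commute"
    priority: float   # p: priority relative to competing activities

@dataclass
class Agent:
    age: int
    gender: str
    employed: bool
    schedule: list = field(default_factory=list)
    exposure: float = 0.0

    def activity_at(self, hour):
        # Among activities covering this hour, follow the highest-priority one.
        candidates = [a for a in self.schedule if a.start_hour <= hour < a.end_hour]
        return max(candidates, key=lambda a: a.priority) if candidates else None

# Hypothetical hourly stress concentration (e.g., air pollution) per location.
stress_by_location = {"home": 0.2, "commute": 0.8, "office": 0.4}

# Illustrative routine for the 35-year-old employed woman described above.
agent = Agent(age=35, gender="female", employed=True, schedule=[
    Activity("sleep",    0,  7, "home",    priority=1.0),
    Activity("commute",  7,  9, "commute", priority=0.9),
    Activity("work",     9, 17, "office",  priority=0.8),
    Activity("commute", 17, 18, "commute", priority=0.9),
    Activity("leisure", 18, 24, "home",    priority=0.5),
])

# Accumulate exposure over one simulated day.
for hour in range(24):
    activity = agent.activity_at(hour)
    if activity:
        agent.exposure += stress_by_location[activity.location]

print(f"Daily cumulative exposure: {agent.exposure:.1f}")
```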
So far, we have recognized and examined existing data science research methods as well as their mechanisms. But what might be some gaps or shortcomings of the current methodologies? One possible limitation is the inability to fully verify the accuracy of the computed outcome. For example, the ABM model was successfully deployed to measure citizens' exposure to environmental stressors in the city of Hamburg, but it only reached a conclusion under an idealized situation. Similarly, although the accuracy rate in the machine learning study is high, it is calculated by inputting decades-old data and comparing the outcome with the current extent of city development; the accuracy improved significantly compared with the roughly 20-year-old GRUMP dataset, but the new dataset's ability to predict the future remains unknown.
Based on the identified gaps, as well as the characteristics of the central problem and the different methodologies, I would like to propose a new model that dynamically combines the two data science research methods, machine learning and Agent-Based Modeling. Through this innovation, scientists and city planners will be able to create a digital copy of a target city, simulating the exact environment of the city, the actions of its citizens, and outside factors such as weather and pollution. It combines the best features of both methods: the realistic simulations of agent-based modeling plus the superior predictive power of machine learning. In the most idealized situation, this model will be able to mirror the effects of an external factor, such as a policy increasing parking fines, the construction of a new bridge, or a sandstorm, on the city itself as well as the consequent changes in citizens' behavior. Researchers and planners will be able to observe these effects firsthand; additionally, machine learning, trained on sample data, will be able to make predictions that reflect the consequences of such actions in the future. In other words, the model can calculate the resulting influence of the factor on the city over a time range of one day, several weeks, or even a month and beyond.
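The following conceptual Python sketch illustrates how such a coupling might work under strong simplifying assumptions: a toy agent-based simulator (a hypothetical stand-in, reduced here to a single function) generates outcomes for many policy scenarios, and a machine learning model is then trained on those simulated outcomes so that the effect of a new, untested policy can be estimated without rerunning the full simulation. All function names, parameters, and numbers are assumptions for illustration, not part of the cited studies.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def simulate_city(policy_strength, days=1, rng=None):
    """Hypothetical stand-in for an agent-based simulation of the city:
    given an external factor (e.g., the size of a parking-fine increase),
    it returns an aggregate indicator such as average daily congestion."""
    rng = rng if rng is not None else np.random.default_rng(0)
    base_congestion = 1.0 - 0.4 * policy_strength   # assumed effect: stronger policy, less congestion
    noise = rng.normal(0, 0.05, size=days)          # day-to-day behavioral variation
    return float(np.mean(base_congestion + noise))

rng = np.random.default_rng(42)

# Step 1: run many short ABM scenarios to build a training set.
policies = rng.uniform(0, 1, size=200)
outcomes = np.array([simulate_city(p, days=7, rng=rng) for p in policies])

# Step 2: train a machine learning model on the simulated outcomes so that
# longer-horizon effects of a policy can be predicted without rerunning the ABM.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(policies.reshape(-1, 1), outcomes)

# Step 3: predict the effect of an untested policy setting.
new_policy = np.array([[0.75]])
print("Predicted congestion under new policy:", model.predict(new_policy)[0])
```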
Essentially, it is an experimental version of the city; however, thanks to digitalization and the power of data science, unlike a real city it allows an infinite number of mistakes. The most outstanding benefit and significance of this research is the minimization of the costs of deficient policy implementations; people no longer need to worry about the time and money wasted on potentially ineffective policies, as every policy can now be tested and analyzed virtually, drastically decreasing the cost.
As stated in the paragraph above, this is a research project that necessitates investment now but returns major savings once the model is realized, by minimizing testing costs on real cities. It retains the strengths of both data science methods while excluding the weaknesses of each. Furthermore, it will not be an expensive project, as most of the development process is conducted digitally, including system integration, data input, programming, and maintenance. To sum up, it is a research project worth investing in: it not only makes economic sense but is also a valuable innovation for data science, as it explores the integration of different research methods.
However, it is also important to acknowledge the possible setbacks of this project. First of all, because it is the first of its kind, it is unknown precisely how much time will be needed to seamlessly incorporate the two distinct methodologies. Furthermore, it is highly likely that people will grow more reliant on these types of technologies because of their convenience, which in turn could sharply reduce the incentive for further innovation and research, since the current tools may seem useful enough.
This argument is similar to an opinion voiced by Chris Anderson, who in his article The End of Theory: The Data Deluge Makes the Scientific Method Obsolete argues that the rapid development of big data will be able to replace traditional experimental methods completely. Instead of focusing solely on the outcome, we should weigh the outcome and the exploration process equally in order to encourage new research and discoveries.
Last but not least, this proposal introduces a strategic research plan with a budget of $100,000 and an exploratory phase of one year. Of the $100,000, the majority, around $60,000, will be devoted to the software development and research of the model itself. Another important part is hardware support, along with the necessary purchases of equipment and collections of datasets and databases; an estimated $25,000 will be spent in this category. After the initial development, a field test is necessary to verify the actual functionality of the model: the researchers will select a suitable city, and the planning, mapping, and testing will cost the project around $7,000. The use of the remaining $8,000 will be determined as the project proceeds, to supplement any parts that require additional funding.
This is my research proposal to answer the central question: how can data science methods and technologies be practically applied to the field of urbanization study and help us understand population trends and behaviors in the urban complex adaptive system? It provides an innovative and dynamic approach to the persistent problems of urbanization. My research project will be a meaningful one from whose outcome everyone will benefit in the long run. We are learning from cities, and we are using what we have learned to optimize them.
Anderson, C. (2008, June). The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired. https://www.wired.com/2008/06/pb-theory/
Liu, X., de Sherbinin, A., & Zhan, Y. (2019). Mapping Urban Extent at Large Spatial Scales Using Machine Learning Methods with VIIRS Nighttime Light and MODIS Daytime NDVI Data. Remote Sensing, 11(10), 1247. doi:10.3390/rs11101247
United Nations. (n.d.). The 17 Goals | Sustainable Development. United Nations. https://sdgs.un.org/goals
Yang, L., Hoffmann, P., Scheffran, J., Rühe, S., Fischereit, J., & Gasser, I. (2018). An Agent-Based Modeling Framework for Simulating Human Exposure to Environmental Stresses in Urban Areas. Urban Science, 2(2), 36. doi:10.3390/urbansci2020036