Project 1 - Geospatial Data for Social Distancing

CIS 422 Software Methodologies - A. Hornof - April 8, 2020

Addendums

Please see the Project 1 Addendums for new information on the project.

Deadlines

Initial Project-Plan/SRS/SDS documents are due at 9AM on April 16, 2020.

The final project is due at 10PM on April 26, 2020. Each student should then use their group's system to collect data for one week, for seven twenty-four hour periods, from April 27 to May 3. Each group's data should then be aggregated and submitted by 10PM on May 4.

The course-related links at the end of this document provide guidance on what exactly to submit for each deadline.

Overview

In Project 1, you will build systems that collect geospatial data from cell phones. Group members will then use these systems to collect geospatial data of their own movements for a week. The class will then pool (aggregate) the data to create a real-world dataset that could be used in Project 2 for a social distancing (or "spatial distancing") application.

Looking to the future, possible Project 2s could include, for example, systems to assist with the "load balancing" of people visiting grocery stores and other essential businesses, or exercising using jogging paths, hiking trails, and parks. Project 2s could reveal the best times to visit stores to avoid lines, and perhaps recommend frequency of visits to grocery stores (such as one visit every two weeks) and time spent in the store, to collectively reduce wait times outside of grocery stores.

Figure 1 shows people standing in line outside of a grocery store in Spain on March 16, 2020. Figure 2 shows how similar lines have more recently formed outside of stores in the U.S. (including here in Eugene, Oregon).


Figure 1. A "social distancing" line outside of a grocery store in Spain on March 16, 2020. (Photo from LATimes.com)


Figure 2. A similar line outside of a hardware store in Shelbyville, Kentucky, on April 4, 2020. (Photo from www.wdrb.com)

Learn About (a) the Context-of-Use of the Target System and (b) Existing Technologies for Social Distancing

Your first task in completing this assignment is to familiarize yourself with the human problem, and with existing technology solutions to the problem. The two are somewhat wrapped together. Note that doing this reading is part of conducting a requirements analysis.

Read these two short articles on social distancing during the COVID-19 pandemic:

     Coronavirus tips: The dos and don’ts of social distancing (PDF)
     Target, Lowe's, Home Depot to limit the number of customers inside to combat spread of COVID-19 (PDF)

Read about systems that were built prior to COVID-19 to assist with social distancing.

Here is one. Judging from the comments on the Apple App Store and Google Play, this app has not been widely used, perhaps because it requires users to actively enter their location data.

     TrailCheck is an app allowing users to crowdsource their location information (PDF)
     TrailCheck website

Here is another social-distancing system, also built prior to COVID-19, and one that you are probably familiar with: the Google Maps 'traffic view'. It is a sort of "social distancing" app because, ultimately, your goal is to avoid people who happen to be stuck in traffic. Note, in this article, how the system works by collecting cell phone data:

     The Google Maps traffic layer uses the geospatial data from the cell phones of people using Google Maps

Consider that you do not have access to the Google Maps data sets. But note that other people have gained access to similar data sets. This piece from December 2019 discusses how the NYTimes acquired cellphone data detailed enough to include what is quite possibly the geospatial data of a Secret Service Agent protecting the President of the United States. Companies do sell this data, such as to bounty hunters:

     Twelve Million Phones, One Dataset, Zero Privacy
     The cellphone movements purportedly of a Secret Service Agent with the President

Here is perhaps a more positive use of cellphone geospatial data, a map that reveals social distancing patterns, ultimately with a goal of encouraging safe behavior.

     Where America Didn’t Stay Home Even as the Virus Spread (PDF)

These data sets are, arguably, a more positive use of cell-phone geospatial data because (a) the data are (at least purportedly) published to assist medical workers, scientists, and governments in monitoring whether people are complying with the need for social distancing and (b) the data are published in an aggregate summary form that does not reveal any individual's geospatial data.

Google is currently making such data available worldwide at various levels of aggregation, from entire countries down to individual counties in the U.S. The entire data set is here:

     Google COVID-19 Mobility Reports

The data set for Oregon from March 29, 2020, including the data specific to Lane County (the location of the University of Oregon), is archived here:

     Oregon Mobility Report from March 29, 2020 (PDF) (Lane County is on Page 12.)

Note how there was a drop in movements to workplaces around March 15 (the tick mark on the x-axis just after the tick mark for March 8), around the time the UO began reducing its campus activity. (The UO Provost announced on March 11 that there would be no in-person exams.) Note that there is another drop in workplace activity around March 22 (the next tick mark on the x-axis). (The Oregon Governor's "Stay Home, Save Lives" Executive Order was issued on March 23.) This mobility data was collected using cell phones.

After reading all of the above articles, note the relationship between the (a) human need to social distance and yet engage in essential activities such as buying groceries and (b) the technology solutions that can assist in both reporting and enhancing people's ability to do the social distancing, and to do it without waiting in long lines for essential products. For example, the data needed to reduce the time spent waiting outside of grocery stores can be provided by everyone else going to that grocery store. What we need, effectively, is the traffic feature in Google Maps, but for pedestrians, not cars. Google Maps needs a new layer. It could be called a pedestrian, shopping, waiting-in-line, or social-clustering layer.

This Project 1 Aims at Collecting the Data

With the ultimate goal of using the data to support social distancing in parallel with essential activities, we first need to gain access to some sample real-world data. This is typically done by (a) purchasing the data from a company (such as a cellphone company, or a reseller of that company's data) or (b) running a large-scale experiment in which volunteers provide that data to you, typically for some kind of service in return.

This Project 1 will provide us with a small set of sample data. It is, effectively, a small-scale version of an experiment with volunteers. Students can certainly pursue other means of gaining such data in parallel with developing a project that meets these specifications, but all groups must build a system. Commercial data could probably be purchased, for example, from Carto, Tectonix, Xnspy, or Xmode. I note that the Carto website permits people to "Apply for grant", possibly to provide access to some geospatial data at low cost, but I have not looked into this.

System Requirements

This section discusses a minimum of system requirements. Your projects should generate many more than just these.

1. It must be possible to use the system to record geospatial data for seven straight days without interruption. To accomplish this, it might be necessary to have a means of leaving the phone in an always-on mode, and it might require users to leave the phone plugged in and charging for seven straight days, except when there are trips outside of the home such as to go grocery shopping or hiking.

2. The geospatial data that is collected in real-time needs to be stored somewhere. It would certainly make sense for the data to be collected and stored in real-time, and perhaps even backed up in real time, so that any sort of crash or service outage will not cause the loss of data that were already collected.

3. The system should support intermediary backups such that if any component of the system goes down (a process gets stopped, a hard drive fails, etc.), no more than one day of data gets lost.

4. The delivered system should provide a means of viewing the geospatial data that was collected. This could be done with a COTS (commercial off-the-shelf) system, but the delivered system must explain how to gain access to such a system, and how to use it.

5. Because the bulk of each individual's data will likely be recorded at their home, it would seem reasonable (and perhaps desirable) to permit users to shift this particular subset of location data by a fixed offset (of say a quarter mile or so) to preserve privacy in the collective data set.

6. The system should produce tab-delimited text files with the following fields. The first line in each data file should be the field names, exactly as follows: User I.D.\tDate\tTime\tLatitude\tLongitude\tTime at Location\n
The fields should be populated as follows:

7. The TSI (the temporal sampling interval, or how frequently each geospatial location is recorded) for all data collected should be 5 minutes. The Time-at-Location field, which is in minutes, should thus be a multiple of 5. (Five minutes is selected as the TSI based on Zhao et al. (2019) DOI: 10.1080/13658816.2019.1584805.)
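
Requirements 5 through 7 can be sketched together in Python using only the standard library. This is a minimal illustration, not a reference implementation, and it rests on several assumptions that this document does not confirm: that Time at Location means the minutes accumulated over consecutive samples at the same rounded coordinates, that dates and times are written as YYYY-MM-DD and HH:MM, and that coordinates are rounded to four decimal places. The function names (shift_point, collapse, to_tsv) are hypothetical.

```python
import csv
import io
import math

# Header row exactly as specified in requirement 6.
HEADER = ["User I.D.", "Date", "Time", "Latitude", "Longitude", "Time at Location"]
TSI_MINUTES = 5          # temporal sampling interval from requirement 7
MILES_PER_DEG_LAT = 69.0  # approximate length of one degree of latitude


def shift_point(lat, lon, north_miles, east_miles):
    """Shift a point by a fixed offset (requirement 5), using a local
    flat-earth approximation that is adequate for sub-mile offsets."""
    dlat = north_miles / MILES_PER_DEG_LAT
    dlon = east_miles / (MILES_PER_DEG_LAT * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon


def collapse(samples):
    """Merge consecutive samples that share the same rounded coordinates.

    samples: iterable of (datetime, lat, lon), one per TSI.
    Returns rows of [first_datetime, lat, lon, minutes], where minutes is
    always a multiple of TSI_MINUTES. (Assumed meaning of Time at Location.)
    """
    rows = []
    for when, lat, lon in samples:
        key = [round(lat, 4), round(lon, 4)]
        if rows and rows[-1][1:3] == key:
            rows[-1][3] += TSI_MINUTES   # still at the same place
        else:
            rows.append([when, key[0], key[1], TSI_MINUTES])
    return rows


def to_tsv(user_id, samples):
    """Return the tab-delimited file contents for one user as a string."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for when, lat, lon, minutes in collapse(samples):
        writer.writerow([
            user_id,
            when.strftime("%Y-%m-%d"),   # assumed date format
            when.strftime("%H:%M"),      # assumed time format
            f"{lat:.4f}",
            f"{lon:.4f}",
            minutes,
        ])
    return out.getvalue()
```

To apply the privacy offset, a group could pass each at-home sample through shift_point before calling to_tsv, using the same offset for every home sample so that the shifted points still collapse into a single row.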

Technical Requirements

This section presents a minimum set of technical requirements. Your projects should generate many more than just these.

1. The delivered system should be complete. For example, if the system will use a server to collect data, the system must include full instructions on how to set up such a server. The steps involved, and the instructions, must be at least as simple as this: MySQL - Using mysqlctl

2. Systems should use standard libraries to the extent possible. For example, if written in Python, the systems should primarily use the Python Standard Library. Written permission must be obtained to use any libraries or packages beyond the standard libraries for a language. Permission to use the Google Maps API is granted to all.

3. Installing and running the system should require little or no software to be installed. To this end, no virtual environments, and no gaming engines such as Unity, may be used.

4. Instructions must be provided to compile, run, and install all of the code necessary to use the system.

Some Technical Suggestions

It is possible that this project could be accomplished using Google Maps and the Google Maps API, such as using the Geolocation Javascript code that is available in the Google Maps Platform Javascript API Geolocation documentation.

Here is an example of the Google Maps API showing your current location. The code was copied from the Google Maps Platform Javascript API Geolocation documentation, and slightly modified, such as to include my Google Maps API Key. Providing public visibility to this personal API Key may violate API Key Best Practices. Consistent with best practices, however, the API credentials have been secured. It might also be possible to use the Google Geolocation API with languages other than Javascript.
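
As a rough illustration of calling the Google Geolocation API from a language other than Javascript, here is a minimal Python sketch that uses only the standard library. It assumes the API's HTTPS endpoint accepts a JSON POST and returns a "location" object with "lat" and "lng" plus an "accuracy" value; check the current Google documentation before relying on this. The function names are hypothetical. Note also that a request sent with only "considerIp" yields an IP-based estimate, so a phone-based system would likely need to supply WiFi or cell tower details, or use the browser's geolocation instead.

```python
import json
import urllib.request

# Endpoint of the Google Geolocation API (verify against current docs).
GEOLOCATE_URL = "https://www.googleapis.com/geolocation/v1/geolocate"


def build_request(api_key):
    """Build the POST request; the API key travels in the query string."""
    body = json.dumps({"considerIp": True}).encode("utf-8")
    return urllib.request.Request(
        f"{GEOLOCATE_URL}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_location(response_text):
    """Extract (latitude, longitude, accuracy) from a JSON response body.
    accuracy is None when the response does not include it."""
    payload = json.loads(response_text)
    location = payload["location"]
    return location["lat"], location["lng"], payload.get("accuracy")


def current_location(api_key):
    """Send the request and return the parsed location (needs network access)."""
    with urllib.request.urlopen(build_request(api_key), timeout=10) as resp:
        return parse_location(resp.read().decode("utf-8"))
```

Keeping the request construction and the response parsing in separate functions allows both to be checked without network access or a real key.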

If you use Google Maps for this project: (a) You will need to obtain a Google API Key. It appears that you will need to submit a credit card number but also that your credit card will not actually be charged. (b) You should secure your API credentials.

Note this comment in Google's source code: "This example requires that you consent to location sharing when prompted by your browser. If you see the error 'The Geolocation service failed.', it means you probably did not give permission for the browser to locate you."

Students not in Eugene

Group members who are not in Eugene (six or so students) should still run the system on their phones for the week. Their groups should also recruit someone local to run the group's system for the week, collect that person's geospatial data, and submit it in place of the remote group member's data.

Relevant Documents from the Course Website

Evaluation Criteria
How to Present
How to Submit
Initial Group Meetings
Instructor Meetings
Document Templates

NRL Dual Task SRS
System Documentation
UML Notation (Kieras) (PDF)
UML Notation (Fowler) (PDF)

Reflections on this Document

This document provides an initial requirements analysis that is based on an understanding of people's needs and the technology available. The understanding was gained by studying articles from reputable online newspapers, peer-reviewed journal and conference papers, and technical documentation.