During this coronavirus lockdown, IGIS has set out to revamp its data infrastructure to address our growing needs for big-data storage and management. In particular, over the past few years we have accumulated more than a dozen terabytes of drone data and associated mapping products, constituting tens of thousands of project files, and this volume is only expected to keep growing. Typically this drone data has been processed on a number of local desktop computers and then backed up onto RAID hard drives or the cloud for cold storage; however, this is far from ideal in terms of consistent organization, versioning, and ease of distribution.
As a solution, IGIS purchased a web server equipped with multiple virtual machines (for processing, analysis, and web services) along with a 30TB RAID data store/repository. The repository was networked to our various IGIS computers and RAID storage devices so that all of our drone data could be transferred over to it. After much consideration, we settled on a standardized file structure that could accommodate datasets from both past and future drone projects, with room for growth as needed. A Python script was written to automatically generate this file structure, with some metadata inputs for each project. Our previous projects' data were then moved into their appropriate slots in the new structure, jettisoning unwanted intermediary processing files and freeing up a great deal of storage space. As you might imagine, moving all this data was quite time consuming. Moving forward, however, it will be easy to set up each new project's file structure automatically right from its inception: the Python script can be run in ArcGIS Pro's Jupyter Notebook utility in the field, and the data eventually delivered to the server repository down the pipeline in a nicely organized package (similar to what we would provide to our non-IGIS project collaborators).
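To give a flavor of the approach, here is a minimal sketch of what such a folder-generation script could look like. The subfolder names and metadata fields below are hypothetical illustrations, not IGIS's actual schema:

```python
import os

# Hypothetical top-level subfolders for a drone project;
# the real IGIS schema may differ.
SUBFOLDERS = ["raw_imagery", "processed", "gis_layers", "reports", "metadata"]

def create_project(base_dir, project_name, date, location):
    """Create a standardized project folder tree plus a small metadata file."""
    project_dir = os.path.join(base_dir, f"{date}_{project_name}")
    for sub in SUBFOLDERS:
        os.makedirs(os.path.join(project_dir, sub), exist_ok=True)
    # Record basic project metadata alongside the data itself
    with open(os.path.join(project_dir, "metadata", "project_info.txt"), "w") as f:
        f.write(f"project: {project_name}\ndate: {date}\nlocation: {location}\n")
    return project_dir
```

Running a script like this at project inception guarantees that every project lands in the repository with exactly the same layout.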
That alone is a big step in the right direction, but it gets better. Because all of this data is now in a standardized file structure, with standardized folder naming conventions, scripting our ArcGIS portal to automatically connect with the data via the image server was only a small step away. With this complete, any IGIS team member can now access our entire post-processed, GIS-ready inventory of drone data layers via ArcGIS Online or ArcGIS Pro.
Ultimately this has been a big leap forward for IGIS's informatics infrastructure, complementing our significantly evolved pipeline for drone data collection and processing, depicted below.
The Problem: An Outdated and Unwieldy Site
The Informatics and GIS (IGIS) Statewide Program was established in 2012 to meet UCANR's growing need for geospatial research and technical training. We created our website using ANR's Site Builder content management system, which was state of the art at the time. Like many programs, our website grew organically over the years, until we woke up one day and discovered we had over 70 subpages!
The size of our site made it not only unwieldy to navigate but also difficult to maintain and keep relevant. You know the drill: multiple subpages you had forgotten about, pages no one was updating, and a general sprawl of information that only a few knew how to navigate. With feedback from UCANR Strategic Communications, we came to realize that our site was highly focused on our own needs and those of an internal (UCANR) audience. Clearly, we needed to redesign the website with a more external focus and make it easier to navigate and find information. We did a collective “Sprint” during the COVID-19 shutdown and revamped the site.
Goals for the New Site
Rather than merely do a superficial makeover, we decided to reorganize our site from the ground up. We started out thinking about our communication goals, and the needs and interests of our clientele. Some people come looking for a specific resource, such as a software license or Tech Note. Others come looking for info about our upcoming training programs, or to learn more about our GIS and Drone Services. Still others just want to see what we do, and how we fit into ANR's overarching program umbrella.
Our new site is organized around four themes: Research, Services, Training, and Resources. To convey the breadth of what we do, we decided to develop “cards”, or nuggets of information about our work. A common thread in everything we do is connecting-the-dots, so we decided to make heavy use of tags and hyperlinks that connect our work to ANR's Strategic Goals, Public Value Statements, and our collaborators around the state.
Solution: Customization via Site Builder
Results: Dynamic and Focused Content
Custom Section Dividers
To help visitors navigate the home page and find the content they're looking for quickly, we developed attractive page dividers that split the landing page into sections. Under the hood these are simple DIV tags with custom background and color attributes, and are super-easy to make.
Another visual enhancement that makes a web page look less generic is using custom icons as buttons. We modified some standard clipart images to use as links on the ‘Client Services' toolbar, which take users to different parts of our site. For a little branding flourish, we used colors that match the ANR palette, and created an ‘inverse' version of each image that appears when you hover over it. This is all very standard HTML and easy to implement.
Integration with Google Sheets
Over the past couple of years, we've migrated the bulk of our program planning and tracking into Google Sheets. Nearly everything we do - workshop offerings, drone flights, service projects, publications, surveys - is recorded in a collection of easy-to-update Google Sheets. For this project, we created a new sheet to hold content specifically for certain parts of our website - including the project cards, metrics about our training and GIS services, and even quotes from our clientele.
Keeping Things Fresh through Randomization
To avoid stale content, we built in some randomization. Every time you come back to the home page, or refresh it, the video loop, project cards, and other content changes. As we add more content on the backend, the selection will be even more varied, making every visit seem new and fresh.
We've been looking for a way to show off some of our drone video for a long time, but the standard embedded YouTube player just wasn't cutting it. After creating a custom space for a video loop on the homepage, we used a command-line utility called ffmpeg to clip, resize, fade, and encode some of our favorite drone video clips at low bandwidth (see if you can guess which RECs they come from!). Adding the videos to the page was super easy using the standard HTML5 video tag, and getting them to auto-play and loop simply involved a couple of extra attributes. The video files live on our server, and the file names are randomized, so every time you refresh the page a new video starts playing.
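As a rough sketch of how that random selection could work server-side (the file names are invented and our actual implementation may differ), a page generator might pick one clip per load and emit the video tag for it:

```python
import random

# Hypothetical pool of pre-encoded clips living on the server
VIDEO_CLIPS = ["clip_coastal.mp4", "clip_vineyard.mp4", "clip_forest.mp4"]

def random_video_tag(clips=VIDEO_CLIPS):
    """Build an HTML5 video tag for a randomly chosen clip.
    Most browsers require 'muted' for autoplay; 'loop' restarts the clip."""
    src = random.choice(clips)
    return f'<video src="{src}" autoplay muted loop playsinline></video>'
```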
Encoding video for webpages is standard practice these days, so to make life easier for others (and ourselves!) we typed up this workflow in a new Tech Note entitled Encoding Drone Video for the Web. In it, you'll find a link to a Google Sheet that reduces the pain of using ffmpeg by generating the command line for you. Simply substitute your own video file name, start and stop times, and crop parameters, and the command is ready to copy-paste into a command window.
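The same command-generation idea can be sketched in a few lines of Python. The encoding parameters below are illustrative defaults, not necessarily the exact settings used in the Tech Note:

```python
def ffmpeg_command(infile, outfile, start, duration, width=640):
    """Build an ffmpeg command that clips, resizes, and encodes a video
    for low-bandwidth web playback. Parameters are illustrative:
    -ss/-t select the clip, scale resizes (keeping an even height),
    -an drops audio, -crf 28 trades quality for a smaller file, and
    -movflags +faststart makes the file stream-friendly."""
    return (
        f"ffmpeg -ss {start} -t {duration} -i {infile} "
        f"-vf scale={width}:-2 -an -c:v libx264 -crf 28 "
        f"-movflags +faststart {outfile}"
    )
```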
Modularizing Content for Flexible Placement
Good site design starts with thinking about your visitors, your content, and your communication goals. Transitioning to the new Site Builder template was a good excuse for us to jettison eight years and 70 pages of accumulated web content, and think about what really matters. We really like the new Site Builder template, and managed to get the look and feel we were hoping for with a modest amount of HTML customization and Google Sheets integration. Web pages are never ‘done', but future development will be a lot easier now that we have the building blocks in place. And we'd love feedback! Check out our new site if you're interested, and contact us if you have thoughts or would like to learn more.
Interested in knowing how people are using their geospatial skills in the era of COVID-19?
Last week, Harvard hosted a 10-panelist webinar (Center for Geographic Analysis Virtual Forum: Responding to the COVID-19 Pandemic with Geospatial Research and Applications) in which experts explored real-time datasets, displayed transmission models, and discussed data ethics of the current pandemic.
Innovative uses of datasets (such as mobile phone data and social media Tweets) exemplified how we can instantaneously map the spread of this disease at very high temporal and spatial scales. Yet, as many geospatial fanatics and drone pilots know, with high resolution comes high risk (of privacy concerns, in this case).
Data privacy emerged as a recurring theme throughout the webinar. Several panelists discussed confidentiality issues in pandemic mapping, which hinge on the spatial resolution at which data are analyzed: at the individual level, or aggregated to a larger and more anonymized level, a scale at which houses, faces, and identities cannot be recognized in detail.
Dr. Caroline Buckee (Harvard), who has been building a research network in collaboration with data companies to aggregate, anonymize, and analyze COVID-19 cellular phone data, explained why the data she analyzes are disseminated at the county level, instead of at the house or neighborhood level. One reason for this is that we would not want punitive action to take place against specific neighborhoods or households if they are not following mandates such as the shelter-in-place policy, because we don't know if these individuals are attending work or are performing essential tasks. It is important that the data we share does not result in discrimination.
Dr. Doug Richardson (Harvard) echoed these sentiments and provided information on a platform he is developing to promote data security and confidentiality: Geospatial Virtual Data Enclave (GVDE). Ethical and security standards embedded into this portal can help ameliorate issues of data confidentiality.
Another theme brought to light in this webinar was how narratives can become lost in COVID-19 geospatial data. Dr. Mei-Po Kwan (Chinese University of Hong Kong) mentioned how every dot on the map has a story – and these stories are steeped in inequity and inequality. Dr. Buckee reminded us that each transmission has important geographic context, and that considering different risk factors (such as age and socioeconomic status) and covariates (such as population density) is key to interpreting these data. Finally, Dr. Este Geraghty (Esri) introduced several resources that Esri provides to incorporate and honor the stories of those who have battled against the virus (https://coronavirus-resources.esri.com/pages/resources).
Below are notes on some of the methodologies and resources discussed in the webinar.
- Twitter COVID-19 Hot Spots (April 7-14, 2020), created using the Twitter geo-search API (spatial query, no semantic query) and the Getis-Ord Gi* function (hot spot analysis) on the attribute #relevantTweets/#allTweets (per cell). The methodology (semantic machine learning) is described in this publication: https://www.tandfonline.com/doi/full/10.1080/15230406.2017.1356242.
- A location/allocation model in Esri's ArcGIS Pro for finding optimal testing sites, treatment sites, and food distribution sites during the pandemic in San Bernardino County. This model determines population demand by creating a risk surface (including transmission, personal susceptibility, exposure, and socioeconomic factors), and then calculates optimal locations (layers: road network data, risk surfaces, and supply chain constraints for staffing facilities and administering tests).
- Esri COVID-19 resources: https://coronavirus-resources.esri.com/pages/resources
- ArcGIS implementation of the University of Pennsylvania's COVID-19 Hospital Impact Model for Epidemics (CHIME): https://www.arcgis.com/home/item.html?id=37ad6eb0d1034cd58844314a9b305de2
- COVID-19 Spatiotemporal Rapid Response Gateway: https://covid-19.stcenter.net/
- COVID-19-related big-data analytics demo by Todd Mostak of OmniSci Technologies portraying interactive exploration of 16 billion rows of location data from mobile phones, from their cell phone partner X-Mode: https://youtu.be/Oeg3jF5xs6o?t=147
In summary, this was a very informative seminar that acknowledged several critical topics in geospatial data analysis, highlighting strengths of data sources and methodologies, along with concerns and shortcomings in the current state of pandemic mapping.
Hello everyone, we hope you are all healthy, safe, sane, and being as productive as possible.
Here we provide a summary of some of the mapping technology that has been used in the past few weeks to understand the COVID-19 pandemic. This is not exhaustive! We pick three areas: map-based data dashboards, disease projections, and social distancing scorecards. We look at where the data comes from and how the sites are built. More will come on the use of remote sensing and earth observation data in support of COVID-19 monitoring, response or recovery, and some of the cool genome evolution and pandemic spread mapping work going on.
Before we begin, here is a nice primer from the cartonerd himself, Kenneth Field of ESRI, on how to map the coronavirus responsibly.
COVID-19 map-based data dashboards. You have seen these: data dashboards displaying interactive maps, charts, and graphs that are updated daily. They tell an important story well. They usually have multiple panels, each visualizing a dataset in map, graph, or tabular form. There are many, many data dashboards out there. Two favorites are the Johns Hopkins site, and the NYTimes coronavirus outbreak hub.
Where do these sites get their data?
Most of these sites are using data from similar sources. They use data on the number of cases, deaths, and recoveries per day. Most sites credit the WHO, US CDC (Centers for Disease Control and Prevention), ECDC (European Centre for Disease Prevention and Control), Chinese Center for Disease Control and Prevention (CCDC), and other sources. Finding the data is not always straightforward. An interesting article came out in the NYTimes about their mapping efforts in California, and why the state is such a challenging case. They describe how “each county reports data a little differently. Some sites offer detailed data dashboards, such as Santa Clara and Sonoma counties. Other county health departments, like Kern County, put those data in images or PDF pages, which can be harder to extract data from, and some counties publish data in tabular form”. Alameda County reports positive cases and deaths each day, but it excludes the city of Berkeley (where I live) because Berkeley has its own city health department, so the NYTimes team has to scrape the county and city reports and then combine the data. Remember that many people think these reports actually underestimate the number of cases, partly due to insufficient ability to test for the disease.
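The combination step itself is simple bookkeeping once the scraping is done. A hypothetical sketch, with invented numbers, of merging a county series (excluding Berkeley) with the city's own series:

```python
# Hypothetical daily cumulative case counts; the scraping is the hard part
alameda_excl_berkeley = {"2020-04-01": 510, "2020-04-02": 539}
berkeley = {"2020-04-01": 32, "2020-04-02": 35}

def combine_counts(county, city):
    """Sum county and city counts by date to produce a whole-county series.
    Dates missing from one source are treated as zero for that source."""
    dates = sorted(set(county) | set(city))
    return {d: county.get(d, 0) + city.get(d, 0) for d in dates}
```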
Some of the sites turn around and release their curated data to us to use. JH does this (GitHub), as does NYTimes (article, GitHub). This is pretty important. Both of these data sources (JH & NYTimes) have led to dozens more innovative uses. See the Social Distancing Scorecard discussed below, and these follow-ons from the NYTimes data: https://chartingcovid.com/, and https://covid19usmap.com/.
However, all of these dashboards start with simple data: number of patients, number of deaths, and sometimes number recovered. Some dashboards use these initial numbers to calculate additional figures, such as new cases, growth factor, and doubling time. All of these data are summarized by some spatial aggregation to make them non-identifiable and more easily visualized; in the US, the aggregation is usually by county.
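Those derived figures follow directly from a cumulative case series. A minimal sketch (the function and variable names are our own, not any dashboard's actual code):

```python
import math

def derived_metrics(cumulative):
    """From a list of cumulative case counts, compute daily new cases,
    day-over-day growth factor of new cases, and the current doubling time."""
    new = [cumulative[i] - cumulative[i - 1] for i in range(1, len(cumulative))]
    # Growth factor: today's new cases divided by yesterday's new cases
    growth = [new[i] / new[i - 1] for i in range(1, len(new)) if new[i - 1] > 0]
    # Doubling time (days) implied by the latest day-over-day growth in cumulative cases
    daily_rate = cumulative[-1] / cumulative[-2]
    doubling_time = math.log(2) / math.log(daily_rate) if daily_rate > 1 else float("inf")
    return new, growth, doubling_time
```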
How do these sites create data dashboards?
The data, summarized by county or country, can be visualized in mapped form on a website via web services. These services allow users to display data from different sources in mapped form without having to download, host, or process them. In short, any data with a geographic location can be linked to an existing web basemap and published to a website; charts and tables are done the same way. The technology has undergone a revolution in the last five years, making this very doable. Many of the dashboards out there use ESRI technology to do this. They use ArcGIS Online, a powerful web stack that quite easily creates mapping and charting dashboards. The Johns Hopkins site uses ArcGIS Online, as does the WHO. There are over 250 sites in the US alone that use ArcGIS Online for mapping data related to COVID-19. IGIS uses this platform to create most of our map-based websites and dashboards (for more info on that, see the bottom of the post). Other sites use open source or other software to do the same thing. The NYTimes uses an open source mapping platform called MapBox to create their custom maps. Tools like MapBox allow you to pull data from different sources, add those data by location to an online map, and customize the design to make it beautiful and informative. The NYTimes cartography is really lovely and clean, for example.
An open access peer reviewed paper just came out that describes some of these sites, and the methods behind them. Kamel Boulos and Geraghty, 2020.
COVID-19 disease projections. There are also sites that provide projections of peak cases and capacity for things like hospital beds. These are really important as they can help hospitals and health systems prepare for the surge of COVID-19 patients over the coming weeks. Here is my favorite one (I found this via Bob Watcher, @Bob_Wachter, Chair of the UCSF Dept of Medicine):
Institute for Health Metrics and Evaluation (IHME) provides a very good visualization of their statistical model forecasting COVID-19 patients and hospital utilization against capacity by state for the US over the next 4 months. The model looks at the timing of new COVID-19 patients in comparison to local hospital capacity (regular beds, ICU beds, ventilators). The model helps us to see whether we are “flattening the curve” and how far off we are from the peak in cases. I've found this very informative and somewhat reassuring, at least for California. According to the site, we are doing a good job in California of flattening the curve, and our peak (projected for April 14) should still be small enough that we have enough beds and ventilators. Still, some are saying this model is overly optimistic. And of course keep washing those hands and staying home.
Where do these sites get their data?
The IHME team state that their data come from local and national governments, hospital networks like the University of Washington, the American Hospital Association, the World Health Organization, and a range of other sources.
How do the models work?
The IHME team used a statistical model that works directly with the existing death rate data. The model uses the empirically observed COVID-19 population to forecast death rates (with uncertainty) and health service resource needs, and compares these to available resources in the US. Their pre-print explaining the method is here.
On a related note, ESRI posted a nice webinar with Lauren Bennet (spatial stats guru and all-around-amazing person) showing how the COVID-19 Hospital Impact Model for Epidemics (CHIME) model has been integrated into ArcGIS Pro. The CHIME model is from Penn Medicine's Predictive Healthcare Team and it takes a different approach than the IHME model above. CHIME is a SIR (susceptible-infected-recovery) model. A SIR model is an epidemiological model that estimates the probability of an individual moving from a susceptible state to an infected state, and from an infected state to a recovered state or death within a closed population. Specifically, the CHIME model provides estimates of how many people will need to be hospitalized, and of that number how many will need ICU beds and ventilators. It also factors social distancing policies and how they might impact disease spread. The incorporation of this within ArcGIS Pro looks very useful, as you can examine results in mapped form, and change how variables (such as social distancing) might change outcomes. Lauren's blog post about this and her webinar are useful resources.
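To make the SIR mechanics concrete, here is a bare-bones discrete-time SIR simulation. The parameter values are arbitrary illustrations, not CHIME's calibrated inputs:

```python
def sir(S, I, R, beta, gamma, days):
    """Discrete-time SIR model for a closed population of size N = S + I + R.
    Each day, beta*S*I/N people move from susceptible to infected,
    and gamma*I people move from infected to recovered (or deceased)."""
    N = S + I + R
    history = [(S, I, R)]
    for _ in range(days):
        new_infections = beta * S * I / N
        new_recoveries = gamma * I
        S -= new_infections
        I += new_infections - new_recoveries
        R += new_recoveries
        history.append((S, I, R))
    return history

# Illustrative run: 10 initial cases in a population of 10,000
trajectory = sir(S=9990, I=10, R=0, beta=0.3, gamma=0.1, days=30)
```

Social distancing enters a model like this by reducing beta, the contact/transmission rate, which is exactly the kind of what-if variable the ArcGIS Pro integration lets you adjust and map.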
Social distancing scorecards. This site from Unacast got a lot of press recently when it published a scorecard for how well we are social distancing under pandemic rules. It garnered attention because it tells an important story well, but also because it uses our mobile phone data (more on that later). In their initial model, social distancing = decrease in distance traveled; that is, if you are still moving around as much as you were before the pandemic, then you are not social distancing. There are some problems with this assumption, of course. As I look out on my street now, I see people walking, most with masks, and no one within 10 feet of another. Social distancing in action. These issues were considered, and they updated their scorecard method. Now, in addition to reduction in distance traveled, they also include a second metric in their social distancing scoring: reduction in visits to non-essential venues. Since I last blogged about this site nearly two weeks ago, California's score went from an A- to a C. Alameda County, where I live, went from an A to a B-. They do point out that drops in scores might be a result of their new method, so pay attention to both the score and the graph. And stay tuned! Their next metric is going to be the rate of change in the number of person-to-person encounters for a given area. Wow.
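A toy version of a distance-based scoring scheme might look like the following. The grade thresholds are invented for illustration and are not the company's actual methodology:

```python
def distancing_grade(baseline_travel, current_travel):
    """Grade the reduction in average distance traveled versus a
    pre-pandemic baseline. Thresholds are invented for illustration."""
    reduction = 1 - current_travel / baseline_travel
    if reduction >= 0.70:
        return "A"
    elif reduction >= 0.55:
        return "B"
    elif reduction >= 0.40:
        return "C"
    elif reduction >= 0.25:
        return "D"
    return "F"
```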
Where does this site get data?
The data on reported cases of COVID-19 is sourced from the Corona Data Scraper (for county-level data prior to March 22) and the Johns Hopkins Github Repository (for county-level data beginning March 22 and all state-level data).
How does Unacast create the dashboard?
They do something similar to the dashboard sites discussed above. They pull all the location data together from a range of sites, develop their specific metrics on movement (the scoring), aggregate by county, and then visualize on the web using custom web design. They use their own custom basemaps and design, keeping their cartography clean. I haven't dug into the methods in depth yet, but stay tuned.
If you want to talk more about these mapping tools, please reach out to IGIS. We have expanded office hours and training to teach you how to make data-driven dashboards, and we can also build them for you. Also, please let us know about other mapping resources out there.
Stay safe and healthy. Wash those hands, stay home as much as possible, and be compassionate with your community.
Got the Coronavirus blues? Stuck at home staring at that stubborn dataset that's been on the back-burner for a year or more?
The IGIS Team is pleased to announce an exciting expansion to our Online Office Hours. In addition to quadrupling our time slots, we are also introducing dedicated Office Hour appointments for consultations in the statistical programming language R. We are also lengthening the time slots from 20 minutes to 30 minutes.
IGIS's Online Office Hours are one of ANR's best-kept secrets. Since 2018, the IGIS team has offered free online consultations to all ANR employees on a range of technical and data analysis topics, including:
- using GIS for research, needs assessment, communication, planning, evaluation, etc.
- where to find spatial data
- how to go from a question to a GIS analysis
- software recommendations
- ArcGIS questions and workflows
- WebGIS and Story Maps
- tools for mobile data collection
- how to store & share spatial data
- working with climate data from Cal-Adapt
- spatial stats
- data analysis with R
- R Shiny
- drone equipment, regulations and best practices
- analyzing drone data with Pix4D
- MS Access
- Google Apps
- proposal consultations
Who does what, you might ask? Sean Hogan is a guru in all things related to drones and remote sensing. Shane Feirer and Robert Johnson are professional GIS developers and programmers; they have deep expertise in the ESRI suite of GIS tools and work with Python and Django on a daily basis. Andy Lyons is highly experienced in R and has a background in GIS and remote sensing. Director Maggi Kelly has supervised hundreds of student projects and has a commanding view of the entire field of geospatial science. Lyons and Feirer also have strong backgrounds in MS Access, data modeling, and advanced Google Apps, and are working together on a project to make climate data from Cal-Adapt easier to access and work with. While we certainly can't answer every question, we enjoy discussing all things related to GIS and informatics, and can probably point you in the right direction.
General Office Hours (all topics) are offered Mondays and Tuesdays from 3-4. R consultation Office Hours are available Mondays and Tuesdays from 4-5. To sign up for a 30-minute slot, please go to the sign-up page. The Zoom link will be sent to you in the confirmation email.
Hope to see you in Office Hours!