Last week we held another bootcamp on Spatial Data Science. We had three packed days learning about the concepts, tools and workflow associated with spatial databases, analysis and visualizations. Our goal was not to teach a specific suite of tools but rather to teach participants how to develop and refine repeatable and testable workflows for spatial data using common standard programming practices.
On Day 1 we focused on setting up a collaborative virtual data environment using virtual machines and spatial databases (PostgreSQL/PostGIS) with multi-user editing and versioning (GeoGig). We also talked about open data and open standards, and modern data formats and tools (GeoJSON, GDAL). On Day 2 we turned to open analytical tools for spatial data, mainly Python (e.g. PySAL, NumPy, PyCharm, IPython Notebook) and R. Day 3 was dedicated to the web stack, and visualization via ESRI Online, CartoDB, and Leaflet. Web mapping is great, and as OpenGeo.org says: “Internet maps appear magical: portals into infinitely large, infinitely deep pools of data. But they aren't magical, they are built of a few standard pieces of technology, and the pieces can be re-arranged and sourced from different places. … Anyone can build an internet map.”
All in all, it was a great time spent with a collection of very interesting mapping professionals from around the country. Thanks to everyone!
In response to Gov. Jerry Brown's announcement yesterday, calling on all California residents to reduce water use by 25%, the folks at the New York Times put together a nice interactive map. The map shows residential water use in California in gallons per day.
Take a look here!
Here is Erez Cohen's excellent talk from the BIDS feed: http://bids.berkeley.edu/resources/videos/big-data-mapping-modern-tools-geographic-analysis-and-visualization
Title: Big Data Mapping: Modern Tools for Geographic Analysis and Visualization
Speaker: Erez Cohen, Co-Founder and CEO of Mapsense
We'll discuss how smart spatial indexes can be used for performant search and filtering for generating interactive and dynamic maps in the browser over massive datasets. We'll go over vector maps, quadtree indices, geographic simplification, density sampling, and real-time ingestion. We'll use example datasets featuring real-time maps of tweets, California condors, and crimes in San Francisco.
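To make the quadtree idea from the abstract a little more concrete, here is a minimal sketch in Python of a point quadtree with a bounding-box query. It is not Mapsense's implementation: the class name `QuadTree`, its `capacity` and `max_depth` parameters, and the sample coordinates are all assumptions made for illustration.

```python
class QuadTree:
    """Illustrative point quadtree over a box (xmin, ymin, xmax, ymax)."""

    def __init__(self, bounds, capacity=4, depth=0, max_depth=12):
        self.bounds = bounds
        self.capacity = capacity      # points a leaf holds before it splits
        self.depth = depth
        self.max_depth = max_depth    # guards against endless splitting on duplicates
        self.points = []
        self.children = None

    def _contains(self, x, y):
        xmin, ymin, xmax, ymax = self.bounds
        return xmin <= x <= xmax and ymin <= y <= ymax

    def insert(self, x, y):
        if not self._contains(x, y):
            return False
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append((x, y))
                return True
            self._split()
        # Each point is stored in exactly one child: the first that accepts it.
        return any(child.insert(x, y) for child in self.children)

    def _split(self):
        xmin, ymin, xmax, ymax = self.bounds
        xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
        quadrants = [
            (xmin, ymin, xmid, ymid), (xmid, ymin, xmax, ymid),
            (xmin, ymid, xmid, ymax), (xmid, ymid, xmax, ymax),
        ]
        self.children = [
            QuadTree(q, self.capacity, self.depth + 1, self.max_depth)
            for q in quadrants
        ]
        # Push the points held by this node down into the new children.
        old_points, self.points = self.points, []
        for px, py in old_points:
            any(child.insert(px, py) for child in self.children)

    def query(self, box):
        """Return all stored points inside box, skipping quadrants that miss it."""
        qxmin, qymin, qxmax, qymax = box
        xmin, ymin, xmax, ymax = self.bounds
        if qxmax < xmin or qxmin > xmax or qymax < ymin or qymin > ymax:
            return []
        found = [
            (x, y) for x, y in self.points
            if qxmin <= x <= qxmax and qymin <= y <= qymax
        ]
        for child in self.children or []:
            found.extend(child.query(box))
        return found


# Toy usage: index a 10 x 10 grid of points, then fetch one small window.
tree = QuadTree((0, 0, 10, 10))
for gx in range(10):
    for gy in range(10):
        tree.insert(gx, gy)
window = tree.query((2, 2, 4, 4))
```

The query only descends into quadrants that overlap the requested window, which is the property that keeps search and filtering fast as the dataset grows.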
The BIDS Data Science Lecture Series is co-hosted by BIDS and the Data, Science, and Inference Seminar.
About the Speaker
Erez is co-founder and CEO at Mapsense, which builds software for the analysis and visualization of massive spatial datasets. Previously Erez was an engineer at Palantir Technologies, where he worked with credit derivatives and mortgage portfolio datasets. Erez holds a BS/MS from UC Berkeley's Industrial Engineering and Operations Research Department. He was a PhD candidate in the same department at Columbia University.
FOSS4G NA 2015 is going on this week in the Bay Area, and so far, it has been a great conference.
Monday had a great line-up of tutorials (including mine on PySAL and Rasterio), and yesterday was full of inspiring talks. Highlights of my day: PostGIS Feature Frenzy; a new geoprocessing Python package called PyGeoprocessing, just released last Thursday(!) by our colleagues down at Stanford who work on the Natural Capital Project; and a very interesting talk about AppGeo's history and future of integrating open source geospatial solutions into their business applications.
The talk by Michael Terner from AppGeo echoed my own ideas about tool development (ideas shared by many others, including ESRI): open source, closed source and commercial ventures are not mutually exclusive, and can often be combined in one project to maximize the benefits that each brings.
In fact, at the end of my talk yesterday on Spatial Data Analysis in Python, someone had a great comment related to this: "Every time I start a project, I always wonder if this is going to be the one where I stay in Python all the way through..." He encouraged me to be honest about that reality and also about how Python is not always the easiest or best option.
Similarly, in his talk about the history and future of PostGIS features, Paul Ramsey from CartoDB reflected on how PostGIS is really great for geoprocessing because it leverages the benefits of database functionality (SQL, spatial querying, indexing), but is not so strong at spatial data analysis that requires mathematical operations like interpolation, spatial autocorrelation, etc. He ended by saying that he is interested in expanding those capabilities, but the reality is that there are so many other tools that already do that. PostGIS may never be as good at mathematical functions as those other options, and why should we expect one tool to be great at everything? I completely agree.
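As a small illustration of the kind of mathematical operation mentioned above, here is a minimal pure-Python sketch of global Moran's I, a common measure of spatial autocorrelation. It is not code from PostGIS or PySAL: the function name `morans_i`, the binary (neighbor / not neighbor) weights, and the toy four-location chain are assumptions made for this example.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary weights.

    values: list of numbers, one per location.
    neighbors: dict mapping each location index to a list of neighbor indices
               (assumed symmetric: if j is in neighbors[i], i is in neighbors[j]).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of all weights; each neighbor link counts as weight 1.
    total_weight = sum(len(js) for js in neighbors.values())
    # Cross-products of deviations over every neighboring pair.
    cross = sum(dev[i] * dev[j] for i, js in neighbors.items() for j in js)
    squared = sum(d * d for d in dev)
    return (n / total_weight) * (cross / squared)


# Four locations in a row; each is a neighbor of the ones next to it.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
smooth = morans_i([1, 2, 3, 4], chain)        # similar values sit together: positive
alternating = morans_i([1, 0, 1, 0], chain)   # neighbors differ: negative
```

In practice a library such as PySAL handles the weights construction and significance testing that this sketch leaves out, which is very much the point about picking the right tool for each step.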