First of all, Pearl Street Mall is just as lovely as I remember, but OMG it is so crowded, with so many new stores and chains. Still, good food, good views, hot weather, lovely walk.
Welcome to Day 2! http://neondataskills.org/data-institute-17/day2/
Our morning session focused on reproducibility and workflows with the great Naupaka Zimmerman. Remember the characteristics of reproducibility - organization, automation, documentation, and dissemination. We focused on organization, and spent an enjoyable hour sorting through an example messy directory of misc data files and code. The directory looked a bit like many of my directories. Lesson learned. We then moved to working with new data and git to reinforce yesterday's lessons. Git was super confusing to me 2 weeks ago, but now I think I love it. We also went back and forth between Jupyter and standalone Python scripts, and abstracted variables, and lo and behold I got my script to run.
The afternoon focused on Lidar (yay!) and prior to coding we talked about discrete and waveform data and collection, and the OpenTopography (http://www.opentopography.org/) project with Benjamin Gross. The OpenTopography talk was really interesting. They are not just a data distributor any more, they also provide an HPC framework (mostly TauDEM for now) on their servers at SDSC (http://www.sdsc.edu/). They are going to roll out a user-initiated HPC functionality soon, so stay tuned for their new "pluggable assets" program. This is well worth checking into. We also spent some time live coding with Python with Bridget Hass working with a CHM from the SERC site in California, and had a nerve-wracking code challenge to wrap up the day.
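For anyone new to the term: a canopy height model (CHM) is commonly derived by subtracting a digital terrain model (bare ground) from a digital surface model (top of canopy). Here is a minimal sketch of that idea in plain Python. The function name, the tiny grids, and the choice to clamp negative heights to zero are my own assumptions for illustration, not the code we ran in class or how NEON builds its CHM product.

```python
def canopy_height_model(dsm, dtm):
    """Return DSM minus DTM cell by cell, clamping negative heights to 0."""
    same_shape = len(dsm) == len(dtm) and all(
        len(a) == len(b) for a, b in zip(dsm, dtm)
    )
    if not same_shape:
        raise ValueError("DSM and DTM must have the same shape")
    return [
        [max(surface - ground, 0.0) for surface, ground in zip(dsm_row, dtm_row)]
        for dsm_row, dtm_row in zip(dsm, dtm)
    ]


# Made-up elevations in metres (hypothetical values, not NEON data).
dsm = [[105.0, 112.5], [101.0, 100.0]]
dtm = [[100.0, 100.5], [101.0, 100.5]]

chm = canopy_height_model(dsm, dtm)
print(chm)
```

With real rasters you would more likely do the subtraction on arrays read from GeoTIFF files, but the arithmetic is the same.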
Fun additional take-home messages/resources:
- ISO International standard for dates = YYYY-MM-DD
- Missing values in R = NA; in Python = NaN (or None), with -9999 a common no-data sentinel in raster files
- For cleaning messy data, check out OpenRefine, a free and open source tool: http://openrefine.org/
- Excel is cray-cray, best practices for spreadsheets: http://www.datacarpentry.org/spreadsheet-ecology-lesson/
- Morpho (from DataOne) to enter metadata: https://www.dataone.org/software-tools/morpho
- Pay attention to file size with your git repositories - check out: https://git-lfs.github.com/. Git is good for things you do with your hands (like code), not for large data.
- Funny how many food metaphors are used in tech teaching: APIs as a menu in a restaurant; git add vs git commit as a grocery cart before and after purchase; finding GIS data is sometimes like shopping for ingredients in a specialty grocery store (that one is mine)...
- Markdown renderer: http://dillinger.io/
- MIT License, like Creative Commons for code: https://opensource.org/licenses/MIT
- "Jupyter" means it runs with Julia, Python & R, who knew?
- There is a new project called "Feather" that allows compatibility between Python and R: https://blog.rstudio.org/2016/03/29/feather/
- All the NEON airborne data can be found here: http://www.neonscience.org/data/airborne-data
- Information on the TIFF specification and TIFF tags here: http://awaresystems.be/; however, their TIFF Tag Viewer is only for Windows.
Thanks to everyone today! Megan Jones (our fearless leader), Naupaka Zimmerman (Reproducibility), Tristan Goulden (Discrete Lidar), Keith Krause (Waveform Lidar), Benjamin Gross (OpenTopography), Bridget Hass (coding lidar products).
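Two of the take-homes above (ISO dates and missing values) are easy to try in plain Python. This is only a rough sketch under my own assumptions: the helper names are made up, the date is an arbitrary example, and -9999 is treated as the no-data sentinel, which you should confirm against each dataset's metadata.

```python
import math
from datetime import datetime

NODATA = -9999  # assumed sentinel; check the metadata of your own files


def to_iso(text, fmt):
    """Parse a date string in a known format and return it as YYYY-MM-DD."""
    return datetime.strptime(text, fmt).date().isoformat()


def mask_nodata(values, nodata=NODATA):
    """Replace the no-data sentinel with NaN so it cannot skew statistics."""
    return [math.nan if v == nodata else float(v) for v in values]


def mean_ignoring_nan(values):
    """Average only the valid (non-NaN) values."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else math.nan


iso_day = to_iso("06/20/2017", "%m/%d/%Y")   # arbitrary example date
heights = mask_nodata([12.0, -9999, 8.0])    # made-up values
print(iso_day, mean_ignoring_nan(heights))
```

Leaving the -9999 in place would drag the mean far below zero, which is the whole reason to mask it first.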

Our home for the week
I left Boulder 20 years ago on a wing and a prayer with a PhD in hand, overwhelmed with bittersweet emotions. I was sad to leave such a beautiful city, nervous about what was to come, but excited to start something new in North Carolina. My future was uncertain, and as I took off from DIA that final time I basically had Tom Petty's Free Fallin' and Learning to Fly on repeat on my Walkman. Now I am back, and summer in Boulder is just as breathtaking as I remember it: clear blue skies, the stunning Flatirons making a play at outshining the snow-dusted Rockies behind them, and crisp fragrant mountain breezes acting as my Madeleine. I'm back to visit the National Ecological Observatory Network (NEON) headquarters and attend their 2017 Data Institute, and re-invest in my skillset for open reproducible workflows in remote sensing.
Day 1 Wrap Up from the NEON Data Institute 2017
What a day! http://neondataskills.org/data-institute-17/day1/
Attendees (about 30) included graduate students, old dogs (new tricks!) like me, and research scientists interested in building reproducible workflows into their work. We are a mix of ages and genders. The morning session focused on learning about the NEON program (http://www.neonscience.org/): its purpose, sites, sensors, data, and protocols. NEON, funded by NSF and managed by Battelle, was conceived in 2004 and will go online for a 30-year mission providing free and open data on the drivers of and responses to ecological change starting in Jan 2018. NEON data comes from IS (instrumented systems), OS (observation systems), and RS (remote sensing). We focused on the Airborne Observation Platform (AOP), which uses 2, soon to be 3, aircraft, each with a payload of a hyperspectral sensor (from JPL; 426 bands, each 5 nm wide, covering 380-2510 nm; 1 mRad IFOV; 1 m res at 1000 m AGL), lidar sensors (Optech and soon to be Riegl, discrete and waveform), and an RGB camera (PhaseOne D8900). These sensors produce co-registered raw data, which are processed at NEON headquarters into various levels of data products. Flights are planned to cover each NEON site once, timed to capture 90% or higher peak greenness, which is pretty complicated when distance and weather are taken into account. Pilots and techs are on the road and in the air from March through October collecting these data.
In the afternoon session, we got through a fairly immersive dunk into Jupyter notebooks for exploring hyperspectral imagery in HDF5 format. We did exploration, band stacking, widgets, and vegetation indices. We closed with a fast discussion about TGF (The Git Flow): the way to store, share, control versions of your data and code to ensure reproducibility. We forked, cloned, committed, pushed, and pulled. Not much more to write about, but the whole day was awesome!
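As a reminder of what a vegetation index boils down to, here is a minimal NDVI sketch in plain Python: NDVI = (NIR - Red) / (NIR + Red), computed per pixel. The reflectance values are made up, the function name is hypothetical, and which hyperspectral bands stand in for "red" and "NIR" is something you would have to choose from the sensor's wavelength table; none of this is the notebook code from the session.

```python
import math


def ndvi(nir, red):
    """Normalized Difference Vegetation Index for paired reflectance values."""
    result = []
    for n, r in zip(nir, red):
        total = n + r
        # Both bands at zero would divide by zero, so report NaN instead.
        result.append((n - r) / total if total != 0 else math.nan)
    return result


# Made-up reflectances: a vegetated pixel, a bare pixel, an empty pixel.
nir_band = [0.5, 0.3, 0.0]
red_band = [0.1, 0.3, 0.0]

index = ndvi(nir_band, red_band)
print(index)
```

Healthy vegetation reflects strongly in the near infrared and absorbs red light, so the first pixel scores well above zero while the bare pixel sits at zero.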
Fun additional take-home messages:
- NEON is amazing. I should build some class labs around NEON data, and NEON classroom training materials are available: http://www.neonscience.org/resources/data-tutorials
- Making participants do organized homework is necessary for complicated workshop content: http://neondataskills.org/workshop-event/NEON-Data-Insitute-2017
- HDF5 as a possible alternative data format for Lidar - holding both discrete and waveform
- NEON imagery data is FedExed daily to headquarters after it is collected
- I am a crap Python coder
- #whofallsbehindstaysbehind
- Tabs are my friend
Thanks to everyone today, including: Megan Jones (Main leader), Nathan Leisso (AOP), Bill Gallery (RGB camera), Ted Haberman (HDF5 format), David Hulslander (AOP), Claire Lunch (Data), Cove Sturtevant (Towers), Tristan Goulden (Hyperspectral), Bridget Hass (HDF5), Paul Gader, Naupaka Zimmerman (GitHub flow).

Awesome new (ish?) R package from the gang over at rOpenSci
Tired of searching biodiversity occurrence data through individual platforms? The "spocc" package comes to your rescue and allows for a streamlined workflow in the collection and mapping of species occurrence data from a range of sources including: GBIF, iNaturalist, Ecoengine, AntWeb, eBird, and USGS's BISON.
There is a caveat, however: since the sources draw on a lot of the same repositories, the authors of the package caution you to check for duplicates. Regardless, what a great way to simplify your workflow!
Find the package from CRAN: install.packages("spocc") and read more about it here!
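The duplicate check the authors warn about can be sketched in a few lines. I am using plain Python here rather than R, and everything in it is an assumption for illustration: the field names are hypothetical (not spocc's actual output columns), the records are made up, and treating "same name, same rounded coordinates, same date" as a duplicate is just one possible rule.

```python
def drop_duplicates(records, ndigits=4):
    """Keep the first record for each (name, rounded lat/lon, date) key."""
    seen = set()
    unique = []
    for rec in records:
        key = (
            rec["name"].strip().lower(),
            round(rec["latitude"], ndigits),
            round(rec["longitude"], ndigits),
            rec["date"],
        )
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Made-up occurrence records pulled from three hypothetical sources.
records = [
    {"name": "Accipiter striatus", "latitude": 37.87150,
     "longitude": -122.2730, "date": "2014-05-01", "source": "gbif"},
    {"name": "accipiter striatus ", "latitude": 37.87151,
     "longitude": -122.2730, "date": "2014-05-01", "source": "inat"},
    {"name": "Accipiter striatus", "latitude": 38.5000,
     "longitude": -121.7000, "date": "2014-06-12", "source": "ebird"},
]

unique_records = drop_duplicates(records)
print([rec["source"] for rec in unique_records])
```

How aggressively to round coordinates is a judgment call: too coarse and you merge genuinely separate sightings, too fine and the same sighting reported by two platforms slips through.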
Former student and GIS expert Chippie Kislik alerted me to this video. She is working with others at NASA Ames on a Sierra Nevada DSS Ecological Forecasting Project. A video about the project is here.
The Sierra Nevada contains vital ecosystems that are experiencing changes in hydrologic regimes, such as decreases in snowmelt and peak runoff, which affect forest health and water resources. Currently, the U.S. Forest Service Region 5 office is undergoing Forest Plan revisions to integrate climate-change impacts into mitigation and adaptation strategies. However, there are few tools in place to conduct quantitative assessments of forest and surface conditions in relation to mountain hydrology, while easily and effectively delivering that information to forest managers. To assist the Forest Service, this research team created a Decision Support System (DSS) featuring data integration, data viewing, reporting, and forecasting of ecological conditions within all Sierra Nevada intersecting watersheds.
Clark Labs was awarded a million-dollar grant from Esri to create a cloud-based version of their Land Change Modeler for ArcGIS. Land Change Modeler is a suite of tools to assess and predict land change and evaluate the impacts of change, and includes REDD (Reducing Emissions from Deforestation and Forest Degradation) tools for modeling the impact of land cover change on carbon emissions. Currently Land Change Modeler is only available in IDRISI and as a software extension for ArcGIS (the latest version is compatible with v10.2). The cloud-based version will make this tool more easily accessible to the wider public and scientific community.
From Clark Labs press release:
"Clark Labs was recently awarded a million dollar grant from Esri to create a cloud-based version of their Land Change Modeler for ArcGIS. Currently, Clark Labs’ extension is for the ArcGIS desktop.
Land Change Modeler for ArcGIS, first released in 2007 with Version 2 released this past month, is a software extension for ArcGIS users, offering a suite of tools to assess and predict land change and evaluate the impacts of such change. Clark Labs recent release includes many significant enhancements. The new version is compatible with ArcGIS Version 10.2
The Land Change Modeler offers an extensive suite of tools for land change research in a simple and automated workflow. It provides a variety of tools for land change analysis and prediction, as well as the impacts of those changes.
The new version release of this fall provides significant enhancements, particularly for its utility for REDD (Reducing Emissions from Deforestation and Forest Degradation). Land Change Modeler now includes functionality for modeling the impact of land cover change on carbon emissions. “Our world is changing rapidly, and technology to efficiently model and predict future land change is vital to addressing global challenges,” said Jack Dangermond, Esri President. “We’re pleased to award this grant to Clark Labs to jumpstart their effort to utilize and provide rich content through ArcGIS Online.”
The new version also provides more capability for estimating land change impacts on habitat and biodiversity. With the grant from Esri, Clark Labs will be creating a cloud-based implementation of Land Change Modeler for their platform.
Clark Labs and Esri have been business partners for nearly ten years, working collaboratively on GIS research."
For the full news release see here.