Day 4 http://neondataskills.org/data-institute-17/day4/
This is it! Final day of LUV-DATA. Today we focused on hyperspectral data and vegetation. Paul Gader from the University of Florida kicked off the day with a survey of some of his projects in hyperspectral data, explorations in NEON data, and big data algorithmic challenges.
Katie Jones talked about the terrestrial observational plot protocol at the NEON sites. Plots are either tower plots (in the tower air-shed) or distributed plots (throughout the site). She focused on the vegetation sampling protocols (individual, diversity, phenology, biomass, productivity, biogeochemistry). Data are to be released in the fall.
Samantha Weintraub talked to us about foliar chemistry data (e.g. C, N, lignin, chlorophyll, trace elements) and linking them with remote sensing. Since we are still learning about fundamental controls on canopy traits within and between ecosystems, and we have a poor understanding of their response to global change, this kind of NEON work is very important. All these foliar chemistry data will be released in the fall. She also mentioned the extensive soil biogeochemical and microbial measurements in soil plots (30 cm depth), again in tower and distributed plots (during peak greenness and two seasonal transitions).
The coding work focused on classifying spectra (Classification of Hyperspectral Data with Ordinary Least Squares in Python, Classification of Hyperspectral Data with Principal Components Analysis in Python, and Using SciKit for Support Vector Machine (SVM) Classification with Python), using our new best friend, Jupyter Notebooks. We spent most of the time talking about statistical learning, machine learning, and the hazards of using these methods without an understanding of the target system.
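As a sketch of the PCA-then-SVM pattern from those lessons, here is a minimal scikit-learn pipeline run on made-up "spectra" (the class means, band count, and noise levels below are invented for illustration, not NEON data):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic "spectra": 200 samples x 426 bands, two classes with shifted means
n_bands = 426
class0 = rng.normal(0.2, 0.05, size=(100, n_bands))
class1 = rng.normal(0.4, 0.05, size=(100, n_bands))
X = np.vstack([class0, class1])
y = np.array([0] * 100 + [1] * 100)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Reduce 426 correlated bands to a few principal components, then classify
model = make_pipeline(PCA(n_components=5), SVC(kernel="rbf"))
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
```

On real hyperspectral data the hard part is exactly what we discussed: knowing whether the components and the decision boundary mean anything physically.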
Fun additional take-home messages/resources:
- NEON data seems like a tremendous resource for research and teaching. Increasing amounts of data are going to be added to their data portal. Stay tuned: http://data.neonscience.org/home
- NRCS has collaborated with NEON to do some spatially extensive soil characterization across the sites. These data will also be available as a NEON product.
- For more on when data roll out, sign up for the NEON eNews here: http://www.neonscience.org/
Thanks to everyone today! Megan Jones (ran a flawless workshop), Paul Gader (remote sensing use cases/classification), Katie Jones (NEON terrestrial vegetation sampling), Samantha Weintraub (foliar chemistry data).
And thanks to NEON for putting on this excellent workshop. I learned a ton, met great people, got re-energized about reproducible workflows (have some ideas about incorporating these concepts into everyday work), and got to spend some nostalgic time walking around my former haunts in Boulder.
Day 3 http://neondataskills.org/data-institute-17/day3/
Today we focused on uncertainty. Yay!
Tristan Goulden gave a talk on sources of uncertainty in the discrete-return lidar data. Uncertainty comes from two main sources: geolocation, both horizontal and vertical (e.g. distance from the base station, distribution and number of satellites, and accuracy of the IMU), and processing (e.g. classification of the point cloud, interpolation method). The NEON remote sensing team has developed tests for each of these error sources. With all their lidar data, NEON provides a simulated point-cloud error product, with horizontal and vertical error per point in LAS format (cool!). These products show the error is largest at the edges of scans, obvi.
- The take-homes: fly within 20 km of a base station; test your lidar sensor annually; check your boresight; dense canopy makes ground points sparser, so the DTM is problematic; and initial point-cloud misclassification can lead to large errors in downstream products. So much more in my notes.
We then coded an example from the PRIN NEON site, where NEON captured lidar data twice within two days, so we could explore how different the data were. Again, we used Jupyter Notebooks and explored the relative differences in DSM and DTM values between the two lidar captures. The differences are random but non-negligible, at least for the DSM. For the DTM, the range = 0.0-20 cm; but for the DSM, the range = 0.0-1.5 m. The mean DSM is 6.34 m, so the difference can be ~20%. The take-home is that despite a 15 cm vertical accuracy spec from vendors, you can get very different measures on different flights, and those differences can be considerable, especially with vegetation. In fact, NEON meets its 15 cm accuracy requirement only in non-vegetated areas. Note, when you download NEON data, you can get line-to-line differences in the NEON lidar metadata, to kind of explore this. And if you are in heavily vegetated areas, expect more than 15 cm of error.
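The flight-to-flight comparison boils down to differencing two rasters and summarizing. A minimal numpy sketch on invented DSM grids (the heights and noise level are made up, not the PRIN data):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical DSM rasters (m) from two flights over the same area:
# flight 2 differs from flight 1 by random vertical noise
dsm_flight1 = rng.uniform(2.0, 12.0, size=(100, 100))
dsm_flight2 = dsm_flight1 + rng.normal(0.0, 0.4, size=(100, 100))

# Absolute per-pixel difference between the two captures
diff = np.abs(dsm_flight2 - dsm_flight1)

# Summary statistics like the ones we computed in the notebook
mean_diff = diff.mean()
max_diff = diff.max()
pct_of_mean_height = 100 * mean_diff / dsm_flight1.mean()
```

The same pattern works on the real DSM/DTM GeoTIFFs once they are read into arrays.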
After lunch we launched into the NEON Imaging Spectrometer data and their uncertainty with Nathan Leisso. This is something I had not really thought about before this workshop.
We talked about orthorectification and geolocation, focal plane characterization, spectral calibration, and radiometric calibration, and all the possible sources of error that can creep into the data, like blurring and ghosting of light. NEON calibrates their data across these areas and provides information on each. I don't think there are many standards for reporting these kinds of spectral uncertainties.
The first live coding exercise (Hyperspectral Variation Uncertainty Analysis in Python) looked at the NEON site F07A, at which NEON acquired 18 individual flights (for BRDF work) over an hour on one day. We used these data and plotted the different spectral reflectance curves for several pixels. For a vegetated pixel, the NIR can vary tremendously! (e.g. 20% reflectance compared to 50% reflectance, depending on time of day, solar angle, etc.) Wow! I should note that related indices like NDVI, which are ratios, will not be as affected. Also, you can normalize the output using some nifty methods like the Standard Normal Variate (SNV) algorithm, if you have large areas over which you can gather multiple samples.
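SNV itself is simple: each spectrum is centered and scaled by its own mean and standard deviation, which removes a multiplicative brightness change like the illumination differences between flights. A small sketch with an invented reflectance curve (not NEON data):

```python
import numpy as np

def snv(spectra):
    """Standard Normal Variate: center and scale each spectrum (row)
    by its own mean and standard deviation."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

# Two "acquisitions" of the same pixel under different illumination:
# a multiplicative brightness shift, which SNV removes exactly
base = np.array([0.05, 0.10, 0.30, 0.45, 0.50])  # made-up reflectance curve
bright = 1.6 * base                              # same shape, 60% brighter
normalized = snv(np.vstack([base, bright]))      # rows become identical
```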
The second live coding exercise (Assessing Spectrometer Accuracy using Validation Tarps with Python) focused on a calibration experiment NEON conducted at CHEQ for the NIS instrument. They laid out two reflectance tarps, 3% (black) and 48% (white), measured reflectance with an ASD spectrometer, and flew over with the NIS. We compared the data across wavelengths. Results summary: small differences between the ASD and NIS across wavelengths; water absorption bands play a role; percent differences can be quite high, up to 50% for the black tarp, mostly from stray light from neighboring areas. NEON has a calibration method for this (they call it their "de-blurring correction").
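The reason the black tarp looks so much worse in percent terms is just arithmetic: a fixed absolute stray-light error is a far bigger fraction of 3% reflectance than of 48%. A toy calculation (the stray-light value is invented for illustration):

```python
# Hypothetical absolute reflectance error added by stray light from
# neighboring pixels (same absolute amount lands on both tarps)
stray_light = 0.015

asd_white, asd_black = 0.48, 0.03       # tarp reflectances from the ASD
nis_white = asd_white + stray_light     # what the airborne NIS "sees"
nis_black = asd_black + stray_light

# Percent difference relative to the ground-truth ASD measurement
pct_white = 100 * abs(nis_white - asd_white) / asd_white  # ~3%
pct_black = 100 * abs(nis_black - asd_black) / asd_black  # ~50%
```

The same absolute error is roughly 3% of the white tarp's signal but 50% of the black tarp's, matching the pattern we saw in the exercise.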
Fun additional take-home messages/resources:
- All NEON point cloud classifications are done with LASTools. Go LASTools! https://rapidlasso.com/lastools/
- Check out pdal - like gdal for point clouds. It can be used from bash. Learned from my workshop neighbor Sergio Marconi https://www.pdal.io/
- Reflectance Tarps are made by GroupVIII http://www.group8tech.com/
- ATCOR http://www.rese.ch/products/atcor/ says we should be able to rely on 3-5% error on reflectance when atmospheric correction is done correctly (say that 10 times fast) with a well-calibrated instrument.
- NEON hyperspectral data is stored in HDF5 format. HDFView is a great tool for interrogating the metadata, among other things.
Thanks to everyone today! Megan Jones (our fearless leader), Tristan Goulden (Discrete Lidar Uncertainty and all the coding), Nathan Leisso (spectral data uncertainty), and Amanda Roberts (NEON intern - spectral uncertainty).
First of all, Pearl Street Mall is just as lovely as I remember, but OMG it is so crowded, with so many new stores and chains. Still, good food, good views, hot weather, lovely walk.
Welcome to Day 2! http://neondataskills.org/data-institute-17/day2/
Our morning session focused on reproducibility and workflows with the great Naupaka Zimmerman. Remember the characteristics of reproducibility: organization, automation, documentation, and dissemination. We focused on organization, and spent an enjoyable hour sorting through an example messy directory of miscellaneous data files and code. The directory looked a bit like many of my directories. Lesson learned. We then moved to working with new data and git to reinforce yesterday's lessons. Git was super confusing to me two weeks ago, but now I think I love it. We also went back and forth between Jupyter and stand-alone Python scripts, and abstracted variables, and lo and behold I got my script to run.
The afternoon focused on lidar (yay!), and prior to coding we talked about discrete and waveform data and collection, and the OpenTopography project (http://www.opentopography.org/) with Benjamin Gross. The OpenTopography talk was really interesting. They are not just a data distributor anymore; they also provide an HPC framework (mostly TauDEM for now) on their servers at SDSC (http://www.sdsc.edu/). They are going to roll out user-initiated HPC functionality soon, so stay tuned for their new "pluggable assets" program. This is well worth checking into. We also spent some time live coding in Python with Bridget Hass, working with a CHM from the SERC site in Maryland, and had a nerve-wracking code challenge to wrap up the day.
Fun additional take-home messages/resources:
- ISO International standard for dates = YYYY-MM-DD
- Missing values in R = NA; in NEON data products = -9999
- For cleaning messy data - check out OpenRefine - a FOS tool for cleaning messy data http://openrefine.org/
- Excel is cray-cray, best practices for spreadsheets: http://www.datacarpentry.org/spreadsheet-ecology-lesson/
- Morpho (from DataOne) to enter metadata: https://www.dataone.org/software-tools/morpho
- Pay attention to file size with your git repositories - check out: https://git-lfs.github.com/. Git is good for things you do with your hands (like code), not for large data.
- Funny how many food metaphors are used in tech teaching: APIs as a menu in a restaurant; git add vs git commit as a grocery cart before and after purchase; finding GIS data is sometimes like shopping for ingredients in a specialty grocery store (that one is mine)...
- Markdown renderer: http://dillinger.io/
- MIT License, like Creative Commons for code: https://opensource.org/licenses/MIT
- "Jupyter" means it runs with Julia, Python & R, who knew?
- There is a new project called "Feather" that allows compatibility between python and R: https://blog.rstudio.org/2016/03/29/feather/
- All the NEON airborne data can be found here: http://www.neonscience.org/data/airborne-data
- Information on the TIFF specification and TIFF tags is here: http://awaresystems.be/; however, their TIFF Tag Viewer is Windows-only.
Thanks to everyone today! Megan Jones (our fearless leader), Naupaka Zimmerman (Reproducibility), Tristan Goulden (Discrete Lidar), Keith Krause (Waveform Lidar), Benjamin Gross (OpenTopography), Bridget Hass (coding lidar products).
Our home for the week
I left Boulder 20 years ago on a wing and a prayer with a PhD in hand, overwhelmed with bittersweet emotions. I was sad to leave such a beautiful city, nervous about what was to come, but excited to start something new in North Carolina. My future was uncertain, and as I took off from DIA that final time I basically had Tom Petty's Free Fallin' and Learning to Fly on repeat on my Walkman. Now I am back, and summer in Boulder is just as breathtaking as I remember it: clear blue skies, the stunning Flatirons making a play at outshining the snow-dusted Rockies behind them, and crisp fragrant mountain breezes acting as my madeleine. I'm back to visit the National Ecological Observatory Network (NEON) headquarters and attend their 2017 Data Institute, and re-invest in my skillset for open reproducible workflows in remote sensing.
Day 1 Wrap Up from the NEON Data Institute 2017
What a day! http://neondataskills.org/data-institute-17/day1/
Attendees (about 30) included graduate students, old dogs (new tricks!) like me, and research scientists interested in developing reproducible workflows in their work. We are a mix of ages and genders. The morning session focused on learning about the NEON program (http://www.neonscience.org/): its purpose, sites, sensors, data, and protocols. NEON, funded by NSF and managed by Battelle, was conceived in 2004 and will go online in Jan 2018 for a 30-year mission providing free and open data on the drivers of and responses to ecological change. NEON data come from IS (instrumented systems), OS (observation systems), and RS (remote sensing). We focused on the Airborne Observation Platform (AOP), which uses two (soon to be three) aircraft, each with a payload of a hyperspectral sensor (from JPL: 426 bands, 5 nm wide, spanning 380-2510 nm; 1 mRad IFOV; 1 m resolution at 1000 m AGL), lidar sensors (Optech, soon to be Riegl; discrete and waveform), and an RGB camera (PhaseOne D8900). These sensors produce co-registered raw data, which are processed at NEON headquarters into various levels of data products. Flights are planned to cover each NEON site once, timed to capture 90% or higher peak greenness, which is pretty complicated when distance and weather are taken into account. Pilots and techs are on the road and in the air from March through October collecting these data.
In the afternoon session, we got a fairly immersive dunk into Jupyter Notebooks for exploring hyperspectral imagery in HDF5 format. We did exploration, band stacking, widgets, and vegetation indices. We closed with a fast discussion about TGF (The Git Flow): the way to store, share, and version your data and code to ensure reproducibility. We forked, cloned, committed, pushed, and pulled. Not much more to write about, but the whole day was awesome!
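One of the vegetation indices we computed, NDVI, is just a normalized band ratio of NIR and red reflectance. A minimal sketch with made-up reflectance values (not from the NEON HDF5 files):

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - red) / (NIR + red),
    ranging from -1 to 1, with higher values for greener vegetation."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)

# Made-up reflectance values: healthy vegetation reflects strongly in NIR
# and absorbs red; bare soil reflects both bands similarly
veg = ndvi(nir=0.50, red=0.08)   # high NDVI
soil = ndvi(nir=0.30, red=0.25)  # low NDVI
```

On the real data the same function is applied per pixel to the NIR and red band slices pulled out of the reflectance array.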
Fun additional take-home messages:
- NEON is amazing. I should build some class labs around NEON data, and NEON classroom training materials are available: http://www.neonscience.org/resources/data-tutorials
- Making participants do organized homework is necessary for complicated workshop content: http://neondataskills.org/workshop-event/NEON-Data-Insitute-2017
- HDF5 as a possible alternative data format for lidar, holding both discrete and waveform data
- NEON imagery data are FedExed daily to headquarters after collection
- I am a crap python coder
- Tabs are my friend
Thanks to everyone today, including: Megan Jones (Main leader), Nathan Leisso (AOP), Bill Gallery (RGB camera), Ted Haberman (HDF5 format), David Hulslander (AOP), Claire Lunch (Data), Cove Sturtevant (Towers), Tristan Goulden (Hyperspectral), Bridget Hass (HDF5), Paul Gader, Naupaka Zimmerman (GitHub flow).