Meet

Instructor

Manil Maskey

Biography

Manil Maskey is a Senior Research Scientist with the National Aeronautics and Space Administration (NASA). He also leads the Advanced Concepts team within the Interagency Implementation and Advanced Concepts Team (IMPACT) at the Marshall Space Flight Center, as well as the Science Mission Directorate’s Artificial Intelligence initiative at NASA HQ. His research interests include computer vision, visualization, knowledge discovery, cloud computing, and data analytics. Dr. Maskey's career spans over 21 years in academia, industry, and government. Dr. Maskey is an adjunct faculty member in the UAH Atmospheric Science department, a senior member of the Institute of Electrical and Electronics Engineers (IEEE), chair of the IEEE Geoscience and Remote Sensing Society (GRSS) Earth Science Informatics Technical Committee, a member of the American Geophysical Union (AGU) and the AGU Fall Meeting Planning Committee, a member of the European Geosciences Union (EGU), and a member of the Association for the Advancement of Artificial Intelligence (AAAI).

Brian Freitag

Biography

Dr. Brian Freitag is a Research Scientist with the Interagency Implementation and Advanced Concepts (IMPACT) Team, where he leads the Harmonized Landsat/Sentinel-2 (HLS) production project. His research focused on the impacts of urbanization in regions of complex terrain using a combination of mesoscale numerical weather prediction and ground-based and satellite-based observations. During his time with the IMPACT project, Brian has supported multiple efforts including the Analysis and Review of the Common Metadata Repository (ARC), machine learning for Earth science, the Satellite Needs Working Group (SNWG) biennial assessment, and the Commercial Smallsat Data Acquisition Program.

Sean Harkins

Biography

Sean is an engineer at Development Seed. He builds infrastructure, applications, and pipelines to make massive geospatial datasets more accessible for decision-makers. Sean is passionate about open source solutions and helping to produce an ecosystem of tools that are available to everyone. He is a strong believer in the power of data and visualizations to help educate people about issues in the larger world.  Sean is the technical lead for the NASA Harmonized Landsat/Sentinel-2 data production.

Muthukumaran Ramasubramanian

Biography

Muthukumaran Ramasubramanian received the M.S. degree in computer science from the University of Alabama in Huntsville (UAH), where he is currently pursuing a doctoral degree in computer science. He is also a Computer Science Researcher and leads the Machine Learning Team for the NASA Interagency Implementation and Advanced Concepts Team, UAH. His work focuses on using deep-NLP techniques to surface novel relationships from large corpora of text and on deploying deep learning solutions to detect Earth science phenomena on a global scale. His research interests include machine learning, big data, computer vision, and scalable cloud services.

Iksha Gurung

Biography

Iksha Gurung is a Computer Scientist working with the University of Alabama in Huntsville, supporting the National Aeronautics and Space Administration Interagency Implementation and Advanced Concepts Team (NASA-IMPACT). He leads the development and machine learning team in NASA-IMPACT. His projects include applying machine learning to Earth science phenomena studies and scaling the solutions to production.

Linsong Chu

Biography

Linsong is a Senior Technical Staff Member (STSM) in IBM Research, focusing on geospatial analysis and foundation models.

Paolo Fraccaro

Biography

Dr. Paolo Fraccaro is a Research Scientist with IBM Research Europe, where he leads the Geospatial Foundational Model fine-tuning and inference stack. Paolo has extensive research experience in Data Science and AI across a range of academic and business domains, including remote sensing, healthcare, climate and sustainability, and agritech. This experience allowed him to achieve the Distinguished Data Scientist certification from The Open Group.

Johannes Jakubik

Biography

Johannes is a third-year Ph.D. student in Machine Learning (ML) and Information Systems (IS) at Karlsruhe Institute of Technology (KIT) and a visiting researcher at IBM Research. He has a background in applied machine learning, having worked on a variety of topics at KIT, ETH Zurich, and during internships in industry. At IBM Research, Johannes is responsible for fine-tuning geospatial foundation models for downstream applications in collaboration with NASA IMPACT.

Blair Edwards

Biography

Dr. Blair Edwards is a Senior Technical Staff Member (STSM) in IBM Research. His research focuses on geospatial data and modelling workflows, developing novel cloud-native tools that accelerate the use of geospatial data for various applications. This includes ways to use data, AI, and simulation modelling at scale in a federated, composable, user-friendly manner. His recent focus has centred on Climate & Sustainability, initially on climate impacts such as flooding and wildfire, but the tools and techniques apply broadly to a wide range of challenges around workflows, cloud, and data. He has been based at the Hartree Centre (Daresbury, UK) since joining IBM Research in 2016, leading data science efforts in the collaboration with STFC, including client projects across a diverse range of industry sectors. Before joining IBM, Blair received an MPhys in Physics with Satellite Technology from the University of Surrey, followed by a PhD from Imperial College London in the field of experimental searches for WIMP dark matter. He continued this work at the Rutherford Appleton Laboratory and Yale University as a Postdoctoral Associate.

Lecture content

Data Science at Scale: Harmonized Landsat Sentinel-2 (HLS) Case Study

Most people associate data science with the process of extracting knowledge and insights from data. While that is partly true, data science is a broader concept involving data collection, storage, integration, analysis, inference, communication, and ethics. Gaining a good grasp of these concepts is essential for anyone working in a data-rich field such as Earth science and remote sensing.

This session will explain the complexity of the data life cycle, the supporting data and analytical systems, and the research life cycle. Participants will get a "behind the curtain" view of science data production at scale, using the Harmonized Landsat Sentinel-2 (HLS) data as a case study. The lectures will explain the challenges of designing and implementing large-scale processing pipelines on the cloud, and will cover a supporting cloud-native analytical platform that enables interactive analysis and visualization. The session will also introduce foundation models, pre-training, and fine-tuning. Participants will get hands-on experience fine-tuning a foundation model for specific use-cases using HLS data and deploying the trained model to an endpoint for inference.

Learning outcomes

  • Thorough understanding of data science
  • Understand the basics of data production, management, and governance
  • Understand what foundation models are and their capabilities
  • Hands-on experience in fine-tuning an HLS foundation model for specific use-cases

Participant Requirements

  • Basic understanding of cloud computing and the Python programming language
  • Interest in data science and in managing and analyzing Earth science data at scale

Agenda

  • Data Science Overview (Manil Maskey) [9:00 - 9:30 GMT]
  • Data System: Processing and Analysis (Brian Freitag and Sean Harkins)
      - Large Scale Data Harmonization
          - Management and Governance - Lecture [9:30 - 10:00 GMT]
          - Processing - Lecture/Demo [10:00 - 10:30 GMT]
      - Break [10:30 - 11:00 GMT]
      - Analysis and Exploration: Analytics platform, science analysis - Hands-on exercise (Brian Freitag) [11:00 - 12:00 GMT]
          - NASA FIRMS (HLS applications and dynamic tiling capabilities)
          - Interactive HLS notebook for analysis and visualization
  • Lunch [12:00 - 13:30 GMT]
  • Application: Geospatial Foundation Model
      - Overview - Lecture (Linsong Chu) [13:30 - 14:00 GMT]
      - Fine-tune HLS foundation model for specific use-cases [14:00 - 15:30 GMT]
          - SMCE environment access (Blair Edwards, Alex Corvin)
          - Clone resources from GitHub (Paolo Fraccaro, Johannes Jakubik)
          - Interactive HLS foundation model fine-tuning notebook (Paolo Fraccaro, Johannes Jakubik)
              - Flood detection
              - Fire burn scar detection
  • Break [15:30 - 16:00 GMT]
  • Inference with fine-tuned model (Muthukumaran Ramasubramanian) [16:00 - 16:30 GMT]
      - Interactive notebook using the fine-tuned model
          - Inference on pre-listed images
          - Insights from inference - visualization
  • Conclusion (Manil Maskey) [16:30 - 17:00 GMT]