Extracting Climate Impacts from Text with LLMs


Taís M. Nunes Carvalho | Helmholtz Centre for Environmental Research (UFZ)

Overview

This presentation covers the use of Natural Language Processing (NLP) and Large Language Models (LLMs) to extract structured, machine-readable information about climate hazards and their impacts from unstructured text (news articles, operational reports, scientific papers). It is organised around three areas:

  • The extraction pipeline, including keyword-based and classifier-based filtering of climate-related articles, text cleaning and deduplication, and the use of zero-shot LLM prompting with a constrained output schema to extract hazard codes, dates, locations, and quantitative and qualitative impact records as JSON

  • Evaluating LLM outputs, including the construction of gold standard annotations, field-level metrics, and the responsible use of LLMs in research

  • Applications and extensions, including multilingual processing of humanitarian reports, mapping of cascading impacts across sectors, and existing datasets built with NLP approaches
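The extraction step described above — zero-shot prompting with a constrained output schema — can be sketched as follows. This is an illustrative outline, not the tutorial's actual code: the field names in `SCHEMA`, the hazard code in the canned response, and the prompt wording are all assumptions.

```python
import json

# Hypothetical output schema (field names are illustrative, not the
# tutorial's exact schema).
SCHEMA = {
    "hazard_code": "string hazard code",
    "date": "string (ISO 8601) or null",
    "location": "string or null",
    "impacts": "list of {sector, quantity, unit, description}",
}

def build_prompt(article_text: str) -> str:
    """Zero-shot prompt: instructions plus schema, no worked examples."""
    return (
        "Extract climate-hazard impact information from the article below.\n"
        "Respond with ONLY a JSON object containing these fields:\n"
        f"{json.dumps(SCHEMA, indent=2)}\n"
        "Use null for any field not stated in the text.\n\n"
        f"Article:\n{article_text}"
    )

def parse_response(raw: str) -> dict:
    """Parse the model's reply and check it matches the schema."""
    record = json.loads(raw)
    missing = set(SCHEMA) - set(record)
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return record

# Canned model response, so the sketch runs without any API call:
raw = (
    '{"hazard_code": "flood", "date": "2021-07-14", '
    '"location": "Ahr valley, Germany", '
    '"impacts": [{"sector": "infrastructure", "quantity": 62, '
    '"unit": "bridges", "description": "bridges destroyed"}]}'
)
record = parse_response(raw)
```

Constraining the output to a fixed JSON schema is what makes the records machine-readable: downstream code can validate each reply and reject malformed extractions instead of parsing free text.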

No API key is required to follow the tutorial: pre-computed LLM outputs are included, so you can work through the full evaluation workflow immediately. An optional section shows how to run live extractions using OpenAI, Groq, Hugging Face, or Anthropic.
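The evaluation workflow compares extracted records against gold-standard annotations field by field. A minimal sketch of such field-level metrics is below; the field names, the index pairing of records, and the exact-match criterion are assumptions — the tutorial may use fuzzier matching for free-text fields.

```python
# Field-level precision/recall/F1 against gold-standard annotations.
# Records are paired by position; a null value means "not annotated".

def field_scores(gold: list[dict], pred: list[dict], field: str) -> dict:
    """Exact-match scores for one field across paired records."""
    tp = sum(1 for g, p in zip(gold, pred)
             if g[field] is not None and g[field] == p[field])
    pred_n = sum(1 for p in pred if p[field] is not None)  # predicted values
    gold_n = sum(1 for g in gold if g[field] is not None)  # annotated values
    precision = tp / pred_n if pred_n else 0.0
    recall = tp / gold_n if gold_n else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Toy data: hazard labels agree everywhere; dates do not.
gold = [{"hazard": "flood", "date": "2021-07-14"},
        {"hazard": "drought", "date": None}]
pred = [{"hazard": "flood", "date": "2021-07-15"},
        {"hazard": "drought", "date": "2020-01-01"}]

hazard = field_scores(gold, pred, "hazard")  # all 1.0 on this toy data
date = field_scores(gold, pred, "date")      # all 0.0: no date matches
```

Reporting metrics per field, rather than one aggregate score, shows which parts of the schema the model extracts reliably (e.g. hazard type) and which remain error-prone (e.g. dates and quantities).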

Setup

You need Python 3.10+ (pip is included with Python).

  1. Download the requirements file: requirements.txt

  2. Install the dependencies:

    pip install -r requirements.txt
  3. Launch Jupyter and open the notebook file:

    jupyter notebook

Notebook

View the rendered Jupyter notebook

Download the Jupyter notebook