Extracting Climate Impacts from Text with LLMs
Overview
This presentation covers the use of Natural Language Processing (NLP) and Large Language Models (LLMs) to extract structured, machine-readable information about climate hazards and their impacts from unstructured text (news articles, operational reports, scientific papers). It is organised around three areas:
The extraction pipeline, including keyword-based and classifier-based filtering of climate-related articles, text cleaning and deduplication, and the use of zero-shot LLM prompting with a constrained output schema to extract hazard codes, dates, locations, and quantitative and qualitative impact records as JSON
Evaluating LLM outputs, including the construction of gold standard annotations, field-level metrics, and the responsible use of LLMs in research
Applications and extensions, including multilingual processing of humanitarian reports, mapping of cascading impacts across sectors, and existing datasets built with NLP approaches
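To make the extraction step concrete, here is a minimal sketch of zero-shot prompting with a constrained output schema. The field names, hazard codes, and prompt wording are illustrative assumptions, not the tutorial's actual schema; the point is that the model is instructed to emit only JSON, which is then parsed and validated before entering the dataset.

```python
import json

# Hypothetical output schema -- field names and hazard codes are
# illustrative, not the schema used in the tutorial.
SCHEMA_FIELDS = {"hazard_code", "date", "location", "impacts"}
HAZARD_CODES = {"FL", "DR", "TC", "HW"}  # flood, drought, tropical cyclone, heat wave

PROMPT_TEMPLATE = (
    "Extract climate-hazard impacts from the article below. "
    "Respond ONLY with a JSON object containing the keys "
    "hazard_code, date, location, impacts.\n\nArticle:\n{article}"
)

def parse_extraction(raw: str) -> dict:
    """Parse an LLM response and validate it against the constrained schema."""
    record = json.loads(raw)  # raises ValueError if the model strayed from JSON
    missing = SCHEMA_FIELDS - record.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if record["hazard_code"] not in HAZARD_CODES:
        raise ValueError(f"unknown hazard code: {record['hazard_code']}")
    return record

# Example: validating a (simulated) model response for a flood article.
raw = ('{"hazard_code": "FL", "date": "2021-07-15", '
       '"location": "Ahr valley", '
       '"impacts": [{"type": "deaths", "value": 180}]}')
record = parse_extraction(raw)
```

Validating against a fixed schema like this is what turns free-text model output into machine-readable impact records; responses that fail validation can be logged and retried or discarded.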
No API key is required to follow the tutorial: pre-computed LLM outputs are included, so you can run the full evaluation workflow immediately. An optional section shows how to run live extractions using OpenAI, Groq, HuggingFace, or Anthropic.
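The evaluation workflow above compares extracted records against gold standard annotations field by field. The sketch below shows one simplified way to score a single field with micro-averaged precision, recall, and F1; matching records by index and requiring exact string equality are assumptions for illustration, not necessarily the tutorial's matching rules.

```python
def field_prf(gold: list[dict], pred: list[dict], field: str) -> tuple[float, float, float]:
    """Micro precision/recall/F1 for one extraction field.

    Records are matched by position and a field counts as correct only on
    exact match -- a deliberate simplification for illustration.
    """
    tp = sum(1 for g, p in zip(gold, pred)
             if p.get(field) is not None and p[field] == g.get(field))
    n_pred = sum(1 for p in pred if p.get(field) is not None)  # fields the model filled
    n_gold = sum(1 for g in gold if g.get(field) is not None)  # fields in the gold data
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: 3 gold records, one correct prediction, one wrong, one missing.
gold = [{"hazard_code": "FL"}, {"hazard_code": "DR"}, {"hazard_code": "TC"}]
pred = [{"hazard_code": "FL"}, {"hazard_code": "HW"}, {"hazard_code": None}]
p, r, f = field_prf(gold, pred, "hazard_code")  # p=0.5, r=0.333..., f=0.4
```

Reporting metrics per field (hazard code, date, location, impact values) rather than per document makes it clear which parts of the schema the model extracts reliably.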
Setup
You need Python 3.10+ (pip is included with Python).
Download the environment file: requirements.txt
Install the dependencies:
pip install -r requirements.txt
Launch Jupyter and open the notebook file:
jupyter notebook
