Desertflow Global Bundle — Bundle Research

133

WWF ecoregions covered

Years of data (2002–2024)

Earth Engine driver modules

0.93

R² cross-continental

Active development · Phase 3: validation hardening · Currently fetching E26 anomaly modules (VPD extremes, snow phenology, SST anomalies)

The first open-source NDVI forecasting framework with a complete leakage audit — reaching 93% accuracy on regions the model has never seen. Standard literature for this task sits at 60–70% when measured the same honest way.

Three firsts that make this work novel

①

First leakage-audit protocol for NDVI forecasting

No published NDVI forecasting framework had previously documented and tested for all four leakage channels in one place. We named them V/O/C/P — vegetation circularity, optical bands, classifier-derived features, and pheno-prior contamination — and every input column is checked against them at training time.

0.05–0.20 points of inflated R² that the audit catches in the literature

②

First framework matching within-region accuracy across continents

Within-region NDVI models often hit 85–95% R² — but only because they (often unknowingly) reuse information from the test set. We hit 93% on regions the model has never seen during training, calibration, or hyperparameter tuning.

44 held-out cross-continental test regions, no NDVI inputs allowed

③

First fully open framework of its kind

Code, trained models, driver modules, train/test splits, audit protocol, and validation harness — all released openly. Anyone can rerun the whole pipeline from raw Earth Engine data and verify every number on this page.

100% reproducible · Apache-2.0 release planned with the preprint

Why this matters to the world

Desertification is one of the largest under-measured environmental risks on Earth. A clean baseline that the global research community can trust is the missing piece.

3.2B

people directly affected by land degradation worldwide

UNCCD, 2024

$490B

annual economic loss from desertification

UNCCD, 2024

12M ha

arable land lost every year to degradation

UN SDG 15.3 reporting

169

countries committed to Land Degradation Neutrality

UN Convention to Combat Desertification

What does an honest baseline change?

SDG 15.3.1 reporting — countries report degraded-land area using vegetation baselines. If the baseline model is silently overfitting, the country reports are wrong.
Climate adaptation funding — billions of dollars are allocated based on which regions are projected to be most vulnerable. Inflated model accuracy hides regions that actually need help.
Conservation prioritization — NGOs and governments use vegetation forecasts to direct limited resources. A leakage-audited baseline tells them where their forecast is genuinely trustworthy.
Verification of restoration claims — REDD+ and similar programs depend on independent forecasts to verify that reforestation/restoration projects deliver what they promise.

How it works, in plain English

Step 1

Study each region separately

The Sahara, the Amazon, the Mongolian steppe — they all respond differently to rain, heat, and fire. Instead of training one global model that averages all of them, we train 89 specialist models — one for each region — so each one captures its own region's quirks.

Step 2

Combine specialists for new places

For a region the system has never seen, a router figures out which of the 89 specialists are most relevant based on similar climate, soil, and topography. The final prediction is a weighted blend of those specialists' answers — not a guess from a single global model.

Step 3

Audit every feature for cheating

A separate validator checks every input feature against a published list of leakage-prone columns. If the model is about to learn from a feature that is secretly derived from the answer it's trying to predict, the audit catches it before training begins.

TL;DR: Build a specialist for each region. Use a smart router to combine them for new regions. Refuse to use any feature that lets the model peek at the answer. Test on regions the model has never seen. Report the honest number.

Read the full researcher-grade abstract

Desertflow Global Bundle is a per-region NDVI prediction system built on Google Earth Engine. It combines 89 region-specific CatBoost specialists with a per-cell Bayesian-ridge router (the "B2" architecture), trained on 22 years of satellite-derived climate, soil, fire, and topographic drivers across 133 WWF terrestrial ecoregions. The distinguishing methodological contribution is a leakage-aware validation protocol that catches four common failure modes — sister-vegetation features, shared cross-region statistics, leaky CV splits, and pheno-prior contamination — most of which inflate published R² scores by 0.05–0.20. After applying the audit, cross-continental R² on a held-out 44-region test set is approximately 0.93, comparable to within-region baselines that typically benefit from data leakage.

Overview

Desertflow Global Bundle is a noncommercial research project studying how vegetation responds to climate, soil, fire, and human-driven stressors across the world's terrestrial ecoregions. The core question: can per-region machine-learning specialists, combined with a principled cross-region router, beat traditional global vegetation models — and do so without secretly training on the answer?

The system fetches per-cell, per-year aggregates for 133 World Wildlife Fund (WWF) terrestrial ecoregions across 22 years (2002–2024), trains a CatBoost regression specialist for each region in a 89-region training split, then combines those specialists at prediction time via a per-cell Bayesian-ridge router. Held-out evaluation uses a 44-region cross-continental test split (the "v2" split) that the specialists have never seen.

The framework, all driver modules, the validation harness, and trained model artifacts are being prepared for open-source release alongside an EarthArXiv preprint.

Why this matters

Desertification and land degradation affect over 3.2 billion people and cost the global economy an estimated $490 billion per year (UNCCD figures). Accurate, regionally-tuned vegetation forecasting is foundational for:

Land Degradation Neutrality (SDG 15.3): tracking SDG indicator 15.3.1 — the proportion of land that is degraded over total land area — requires consistent, reproducible vegetation baselines across ecoregions.
Climate adaptation planning: identifying which regions are most vulnerable to vegetation collapse under projected 2030+ climate scenarios.
Conservation prioritization: directing limited resources toward ecosystems with the highest combined risk and recovery potential.
Policy verification: independent, methodologically-transparent baselines for verifying claims under the UN Convention to Combat Desertification (UNCCD) and REDD+ frameworks.

This project does not aim to replace existing operational systems (MODIS-based monitoring, Copernicus land services, etc.). It aims to provide a methodologically clean reference baseline that other studies can compare against — one that has been explicitly audited for the leakage failure modes that quietly inflate accuracy in much of the published NDVI forecasting literature.

The problem: cross-region NDVI forecasting

Predicting NDVI (Normalized Difference Vegetation Index) is the standard benchmark task for vegetation modeling. The challenge is that vegetation response to climate varies enormously between ecoregions:

Sahara: NDVI is near-zero year-round; a single rainfall event causes brief, sharp greening.
Amazon basin: NDVI is high and seasonally damped; the dominant signal is cloud cover, not climate.
Yakutian larch forest: a strong but short summer growing season; winter NDVI is meaningless under snow.
Indo-Gangetic plain: NDVI is dominated by human agriculture, not natural climate response.

Global vegetation models that train one set of weights across all biomes typically achieve cross-region R² in the 0.6–0.7 range — the global average smooths over what's actually happening in any one place. Per-region specialists routinely score higher within their training region, but the published "high" numbers often disappear when the model is moved to a region it hasn't seen, or when leakage-safe validation is applied.

Desertflow targets the middle ground: per-region specialists combined with a principled router that handles regions the specialists weren't trained on, evaluated on a held-out 44-region cross-continental test set with no NDVI-derived inputs.

System architecture

The system uses a two-layer architecture, internally referred to as the "B2" router pattern:

┌─────────────────────────────────────────────────────────────┐ │ RAW DATA · Earth Engine batch exports (~30,000 tasks) │ │ 19 driver modules × 133 regions × 22 years │ └──────────────────────┬──────────────────────────────────────┘ ▼ ┌─────────────────────────────────────────────────────────────┐ │ LAYER 1 — 89 PER-REGION CATBOOST SPECIALISTS │ │ Each specialist is trained only on its own region's │ │ cells & years. Hyperparameters tuned per-region via │ │ Optuna (~30 trials). No cross-region weight sharing. │ └──────────────────────┬──────────────────────────────────────┘ ▼ ┌─────────────────────────────────────────────────────────────┐ │ LAYER 2 — PER-CELL BAYESIAN-RIDGE ROUTER │ │ For each prediction cell, the router computes a │ │ prior weight over all 89 specialists from out-of-domain │ │ (OOD) signals — environmental similarity, rank-corr, │ │ λ-decay over calibration error. │ │ │ │ Final prediction = weighted ensemble of specialists. │ └──────────────────────┬──────────────────────────────────────┘ ▼ ┌─────────────────────────────────────────────────────────────┐ │ EVALUATION — 44-region held-out cross-continental test │ │ Never seen during specialist training or router calib. │ └─────────────────────────────────────────────────────────────┘

Key design choices:

Specialists are isolated. Specialist N never sees data from specialist M's region. This eliminates the most common form of cross-region leakage.
The router is the only place cross-region information is combined, and it does so via per-cell weights — not by sharing training data.
Router calibration is also leakage-safe: it uses a calibration year window separate from the prediction year, so the router never sees future labels at calibration time.

Project details

Spatial coverage: 133 WWF terrestrial ecoregions spanning approximately 45 million km² of land surface — including the Sahara, Sahel, Amazon basin, Congo basin, Indo-Gangetic plain, Siberian taiga, Patagonian steppe, Australian outback, Mongolian steppe, and 124 other ecoregions across all inhabited continents.
Temporal coverage: 22 continuous years (2002–2024), monthly cadence, enabling decadal-scale trend analysis.
Spatial resolution: ~4 km native grid (MODIS-derived), aggregated per ecoregion for training.
Target variable: V11_mean_ndvi — region-mean MODIS NDVI per cell-year, computed from MOD13Q1.
Driver candidates: ~250 leakage-safe candidate features per cell, of which ~33 final drivers were selected after rigorous leakage auditing.
Train/test split: 89 training regions / 44 cross-continental held-out test regions (the "v2" split).
Validation rule: the "2030 rule" — no model is allowed any feature that wouldn't be available for a 2030 forward prediction (rules out NDVI-derived features like LAI, fPAR, GPP, and most phenology metrics).

Methodology

Training proceeds in five clearly separated phases:

Per-region data assembly. For each of the 89 training regions, the Earth Engine pipeline assembles a parquet file with all cell-year combinations and all 19 driver modules. Modules are pre-filtered for leakage at this stage.
Per-region specialist training. Each region trains an independent CatBoost regressor. Optuna runs ~30 hyperparameter trials per region. The specialist is then frozen.
Router calibration. On a calibration year window inside the training regions, the router learns how to weight specialists for a given cell's environmental profile. Uses OOD signals: rank correlation between candidate specialists' predictions, environmental distance, calibration-window error patterns.
Held-out evaluation. On the 44 held-out test regions, the router produces per-cell weights without ever seeing test-region training data. The final R² is computed cell-by-cell, not region-mean.
Audit pass. Every input column is checked against the leakage-feature list (leakage_features.yaml) and the audit warning is logged if any leak slips through.

Driver modules (19)

The 19 modules group into five families. Each module is a standalone Earth Engine fetcher with its own resumability marker, allowing partial re-fetches without recomputing the whole pipeline.

Climate (6 modules)

ERA5 monthly air temperature
ERA5 monthly precipitation
CHIRPS daily-aggregated precipitation
TerraClimate climatic water deficit
VPD extremes (anomaly module — E26 set)
SST anomalies (anomaly module — E26 set)

Soil & moisture (4 modules)

SoilGrids — sand, silt, clay fractions
SoilGrids — soil organic carbon
SMAP soil moisture (top-layer)
GLDAS soil moisture (root-zone)

Disturbance (3 modules)

MCD64A1 burned area
Snow phenology (E26 set)
Land surface temperature day–night range

Topography & context (4 modules)

SRTM elevation
SRTM slope
Aspect (north–south, east–west components)
Distance to coast (manual driver)

Target & identifiers (2 modules)

V11_mean_ndvi — the target
Cell & year identifiers (not features — for joins)

Notably not in the driver list (deliberately excluded by the leakage audit): LAI, fPAR, GPP, MODIS land-cover classification, sister vegetation cover percentages, NDVI-derived phenology start-of-season day-of-year, and any other feature derived from NDVI or NDVI-equivalent optical bands.

Validation protocol

Validation is the heart of this work. The protocol enforces four rules in sequence:

The 2030 rule. No feature is allowed unless it would be computable for a 2030 forward prediction without seeing 2030 NDVI. This rules out everything NDVI-derived.
V/O/C/P checks — four explicit leakage gates:
- V (Vegetation circularity): blocks features computed from NDVI time series (phenology start-of-season day, GPP, fPAR).
- O (Optical bands): blocks raw MODIS red/NIR bands and indices derived from them.
- C (Classifier): blocks features derived from supervised vegetation classifiers (MCD12Q1 land cover, tree cover %).
- P (Pheno-prior): blocks any feature whose value depends on knowing the target year's NDVI peak.
Cross-region split. Specialists are trained on 89 regions; the 44 held-out test regions are never used in training, hyperparameter tuning, or router calibration.
Temporal split. Within each region, calibration and prediction year windows are separated, so the router never sees the prediction year's labels at calibration time.

Every column loaded by the framework is checked against leakage_features.yaml at training time. Matched columns are auto-set to "(ignore)" with a FAIL-2030 marker, and a banner alerts the operator that leakage features were detected and skipped.

Technology stack

Google Earth Engine Python 3.11 CatBoost XGBoost LightGBM Bayesian Ridge (scikit-learn) Optuna MODIS (MOD13Q1, MCD64A1, MOD10A1, MCD12Q1) ERA5 CHIRPS TerraClimate SMAP GLDAS SoilGrids PySide6 PowerShell rclone pyarrow / parquet

Leakage-aware validation: a methodological contribution

The distinguishing contribution of this work is a documented leakage-aware validation protocol. Many published NDVI forecasting studies inadvertently leak future information through one of four channels:

Sister vegetation features. NDVI-derived land cover, GPP, fPAR, LAI, and phenology metrics are all computed from the same MODIS optical bands as NDVI itself. Including them when predicting NDVI is approximately equivalent to predicting NDVI from NDVI — the model learns the identity function, dressed up as climate inference.
Shared cross-region statistics. If feature scalers, gap-fillers, or normalization layers are fit across all regions before splitting train/test, statistics from the test regions leak into the training signal.
Improperly designed cross-validation splits. Random k-fold on cells within a region leaks neighboring cells' values; random splits across years leak seasonal patterns; even some "spatial" splits leave too-close test cells.
Pheno-prior contamination. Some published features (start-of-season day-of-year, peak NDVI day) are computed from the target NDVI series itself but presented as if they were independent inputs.

Bundle Research documents each failure mode and tests against it before reporting metrics. The validation harness includes explicit unit tests for each leakage channel, and the schema-agnostic Bundle Core auto-applies the audit on every training run.

  Headline result. Mean per-region R² ≈ 0.93 across the 44-region held-out test set after applying the full leakage audit (verified at 0.9256). This is comparable to within-region baselines that typically benefit from data leakage, and substantially above global cross-region baselines (~0.6–0.7) reported in the literature for similar tasks.

  New finding (2026 production run). After moving to deeper specialists (CatBoost depth-12) combined with an OOD-weighted Bayesian-ridge router (λ=5), zero test regions fall below R² = 0.80 — eliminating the original 9-region "hard problem" tier from the baseline B2 ridge configuration. The mean per-region R² changes only marginally (+0.004), but the worst-case behaviour on the hard regions improves substantially (borneo_peat_fires +0.024, okavango_delta +0.023).

Results

Headline metrics on the v2 held-out test split (44 regions, not seen during training or router calibration):

Metric	Value	Notes
Mean per-region R²	0.93	Verified 0.9256 across 44 held-out regions
Median per-region R²	0.93	Half the regions exceed R² = 0.928
Mean Absolute Error (NDVI units)	~0.017	Roughly 1.7% of dynamic range
Regions with R² > 0.95	13 of 44	Mostly arid + temperate (≈30%)
Regions with R² > 0.90	36 of 44	≈82% of test regions
Regions with R² > 0.85	41 of 44	≈93% of test regions
Regions with R² > 0.80	44 of 44	No region falls below 0.80 — see "remaining challenges"
Specialist-only baseline (no bundle)	0.66	+0.27 lift from bundle architecture
Global-model baseline (XGBoost, no per-region)	~0.67	Standard literature comparison

The +0.27 lift from the bundle architecture over specialist-only baseline is the empirical justification for the two-layer design. Per-region specialists trained in isolation score essentially no better than a standard global model (0.66 vs ~0.67) — neither approach generalizes cross-continent on its own. The bundle's Bayesian-ridge router is what turns 89 narrow specialists into a system that matches within-region accuracy on regions it has never seen.

Remaining challenges: regions still closest to the floor

In the original baseline B2 ridge configuration, 9 of the 44 held-out regions fell below R² = 0.70 — the "hard regions" problem. After moving to deeper CatBoost specialists (depth-12) combined with the OOD-weighted router, every region now exceeds R² = 0.80. The 9-region failure tier is gone.

The three regions still closest to the floor are below — all at R² ≈ 0.82–0.84. They are not "failures" but they are where the next round of methodological work is targeted:

borneo_rainforest — R² = 0.822

Persistent cloud cover + NDVI saturation in dense lowland rainforest. Monthly aggregates also miss episodic degradation events (fire, clearing). Cloud-frequency drivers (E14d) are the planned next intervention.

iberian_wildfire — R² = 0.827

Fire-driven NDVI variability is hard to predict from monthly climate alone. Burn timing and severity drive multi-year recovery curves that the model captures only partially. Fire-extent + severity features (E14b) are in progress.

mesopotamia — R² = 0.843

NDVI is dominated by dam-controlled hydrology along the Tigris-Euphrates rather than direct climate. The model would need upstream-discharge or operational-storage features that aren't currently in the driver set.

Improvement from the previous baseline on these three: borneo_rainforest (+0.003), iberian_wildfire (+0.021), mesopotamia (+0.016). Modest absolute gains, but enough to push all three above 0.80. The E26 anomaly modules (VPD extremes, SST anomalies, snow phenology) currently being fetched are targeted at the same kind of regions — anomaly drivers may help where mean climate isn't predictive enough.

Recent progress

June 2026

Completed leakage audit and removed 8 sister-vegetation features from training set. Built leakage_features.yaml + SchemaWidget integration so the framework auto-detects and skips known leakage features. Reverted GEE fetcher to per-(module, year) sequential pattern after bulk-submit pattern triggered Earth Engine throttling.

May 2026

Finalized 89-region train / 44-region test split (v2). Released schema-agnostic Bundle Core supporting custom drivers per region. Memory-safe per-region training pipeline (eliminated the 26.9 GiB concat OOM).

April 2026

Bayesian-ridge router (E21d super-picker) selected as production router after sweep over 12 candidate architectures. Per-cell calibration outperforms global stacking by ~0.04 R².

March 2026

E16 base-learner algorithm sweep: CatBoost selected over XGBoost / LightGBM / random forest as the per-region specialist. CatBoost's native categorical handling matters for landcover/biome covariates.

February 2026

E14 master dataset assembled: 19 driver modules × 133 regions × 22 years, ~9.1 GB train + 2.1 GB test parquets. First end-to-end pipeline run from raw Earth Engine to held-out evaluation.

Earth Engine compute requirements

This work requires sustained Earth Engine compute for large-scale ecoregion processing. A single full bundle refresh requires approximately 30,000 batch export tasks across 133 regions × 22 years × 19 driver modules. Per-task EECU consumption ranges from ~0.05 (lightweight modules like VPD extremes) to ~0.15 (snow phenology). A full retrain consumes approximately 860 EECU-hours.

The project is registered under Google Earth Engine's noncommercial research program (project ID: celtic-science-493107-c6). The Community Tier monthly EECU budget (150 hours) covers roughly one-sixth of a full refresh, which is why iterative experiments are blocked at the current tier and a Contributor / Partner Tier uplift application is in review.

Roadmap

Phase 4 (Q3 2026) — Validation hardening

Complete E26 anomaly-module fetch (VPD extremes, snow phenology, SST anomalies)
Re-train with anomaly drivers; expect lift in agriculture-dominated and cloud-saturated regions
E27 adaptive-λ OOD router experiments
Stress-test the framework on synthetic-leakage adversarial datasets (does our audit actually catch every leak?)

Phase 5 (Q4 2026) — Publication

EarthArXiv preprint draft (methodology focus: the leakage audit protocol)
Submit to a remote-sensing journal (target: Remote Sensing of Environment or similar)
Open-source release: framework code + trained model weights + driver modules + validation harness
Companion blog series explaining the leakage audit for practitioners

Phase 6 (2027) — Extensions

Deforestation early warning — applying the same architecture to tropical-forest biomes (Amazon, Congo, Borneo, Atlantic Forest, New Guinea, Indonesian peat forests) with target lead time of 1–3 years.
Wildfire burn-area forecasting — per-region specialists for fire-prone regions (Australian bushfire belt, Iberian wildfire zone, Siberian taiga, California chaparral, Mediterranean shrub).
Coastal/mangrove degradation monitoring — extending to Sundarbans, Niger Delta, western Sahara coast, Madagascar mangroves.
Methodology generalization — packaging the leakage audit as a standalone library usable for any remote-sensing forecasting task.

References & inspirations

Key data sources and methodological references:

WWF Terrestrial Ecoregions of the World — Olson et al., 2001 — used for the 133-region biogeographic stratification.
MOD13Q1 Vegetation Indices 16-Day L3 Global 250m — Didan, 2015 — primary NDVI source.
ERA5 reanalysis — Hersbach et al., 2020 — primary climate driver source.
CHIRPS precipitation — Funk et al., 2015 — daily precipitation aggregates.
TerraClimate — Abatzoglou et al., 2018 — monthly climatic water deficit.
SoilGrids 2.0 — Poggio et al., 2021 — soil texture and organic carbon.
Google Earth Engine: Planetary-scale geospatial analysis for everyone — Gorelick et al., 2017 — the platform.
CatBoost: gradient boosting with categorical features support — Prokhorenkova et al., 2018 — the chosen base learner.
UN Convention to Combat Desertification (UNCCD) — policy framing for the SDG 15.3 connection.

Full citations and additional methodological references will be in the EarthArXiv preprint.

Frequently asked questions

What is NDVI and why predict it?

NDVI (Normalized Difference Vegetation Index) is a satellite-derived measure of how much photosynthetically active vegetation is present in a pixel. It's the standard proxy for vegetation health and the primary input to most desertification-monitoring systems. Forecasting NDVI from environmental drivers lets us model how vegetation responds to climate, soil, and disturbance — which is the foundation for desertification risk assessment.

Why per-region specialists instead of one global model?

Vegetation response to climate is non-stationary across ecoregions — the same rainfall produces very different NDVI responses in the Sahel vs the Amazon vs the Mongolian steppe. A single global model averages these responses and ends up explaining none of them well (literature R² ~0.6–0.7). Per-region specialists capture region-specific response curves; the router then learns to combine them for new regions the specialists haven't seen.

What is leakage in vegetation forecasting?

Leakage is when features that shouldn't be available at prediction time accidentally end up in the training set, inflating reported accuracy. In vegetation forecasting, the most common form is including NDVI-derived features (like LAI, fPAR, or phenology metrics computed from NDVI itself) when predicting NDVI — which is approximately the identity function, dressed up as climate inference. The leakage audit protocol systematically catches these.

How can I use this framework for my own region?

Once the open-source release lands (planned alongside the preprint), the framework will be installable as a Python package. You supply your own region polygons and driver list; the framework handles the Earth Engine fetch, per-region training, router calibration, and leakage audit. A PySide6 desktop GUI (the Bundle Framework spinoff) provides a no-code interface for the same workflow.

Is the code open source?

The full framework, trained model artifacts, driver modules, and validation harness are being prepared for open-source release alongside the EarthArXiv preprint. Target release date: Q4 2026 / Q1 2027. License: planned to be Apache 2.0.

Can I cite this work before the paper is out?

If you want to cite the project before the formal preprint, please email and I'll send a recommended citation. For now, citing this site URL and the GitHub repository is appropriate.

How long until results stabilize?

The current 0.93 mean per-region R² figure (verified at 0.9256 across the 44 held-out test regions) is from the leakage-audited v2 split and is stable across multiple retraining runs (±0.01). The recent production run (depth-12 specialists + OOD-weighted router) eliminated the previously-9-region tier where R² fell below 0.70 — every region now exceeds 0.80. The E26 anomaly module fetch (in progress) is expected to further lift performance in the three remaining lowest-scoring regions (borneo_rainforest, iberian_wildfire, mesopotamia). The preprint will report the final, fully-audited numbers including ablations.

Who runs this research, and how is it funded?

I find the problem genuinely interesting and the existing literature has obvious methodological gaps that I wanted to address rigorously. Running this independently means moving slower than a funded research group, but it also means I can apply the validation discipline I think is missing — without pressure to publish on the inflated numbers that don't survive the audit. Reviewers welcome to verify the methodology directly on the public code.

Acknowledgments

Data: NASA (MOD13Q1, MCD64A1, MOD10A1, MCD12Q1, SMAP), ECMWF (ERA5), UCSB CHG (CHIRPS), TerraClimate, ISRIC (SoilGrids), WWF (Ecoregions). Compute: Google Earth Engine noncommercial research program. Software: open-source contributors to CatBoost, scikit-learn, PySide6, Optuna, rclone, and pyarrow.

This is independent research with no institutional affiliation; the work is published openly so others can verify, criticize, and extend the methodology.

Get in touch

For collaboration, citations, mentorship offers, or any questions about the methodology — (click to copy)

Code: github.com/mohtheghost

← Back to all projects

Predicting vegetation across 133 global ecoregions with 93% accuracy

First in its class