AUTHORS: Sean A. Woznicki, Jeremy Baynes, Stephanie Panlasigui, Megan Mehaffey, Anne Neale – U.S. EPA Office of Research and Development/National Exposure Research Laboratory/Systems Exposure Division/Ecological and Human Community Analysis Branch
ABSTRACT: Floodplains perform several important ecosystem services, including storing water during precipitation events and reducing peak flows, thereby reducing flooding of adjacent communities. Understanding the relationship between flood inundation and floodplains is critical for ecosystems’ and communities’ health and well-being, as well as for targeting floodplain and riparian restoration. Many communities in the United States, particularly those in rural areas, lack flood inundation maps due to the high cost of flood modeling. Only 60% of the conterminous United States (CONUS) has been mapped through the Federal Emergency Management Agency (FEMA) Flood Insurance Rate Maps (FIRM) program. Therefore, we developed a 30-meter resolution flood inundation map of the CONUS using random forests with existing FIRM 100-year floodplains as training data. Random forests are an ensemble machine learning method for classification, and have previously been used for applications such as land cover classification and disaster identification. Input datasets included digital elevation model (DEM)-derived variables, flood-related soil characteristics, and land cover. Models were trained and tested at the hydrologic unit code level two (HUC-2) scale, and each 30-m pixel in the CONUS was classified as floodplain or not-floodplain. The most important variables were typically vertical distance to channel and overland flow distance (both DEM derivatives) and the soils’ dominant flood frequency class (e.g., rare, occasional, frequent), although their relative importance varied by HUC. Classification accuracy was assessed using the F1 statistic, which balances the precision and recall of the model when compared to the FIRMs. The models performed well in the eastern and midwestern CONUS, but were less robust in the arid Southwest, likely due to greater topographic complexity, coarser soils data, and a lack of quality model training data.
However, the overall performance of the random forest models in this context demonstrates the method’s ability to map the floodplains that remain unmapped in the CONUS.
Keywords: floodplains, machine learning, random forest, ecosystem services
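The F1 statistic referred to in the abstract is the harmonic mean of precision and recall. The sketch below is a minimal illustration of that calculation for a binary floodplain / not-floodplain classification. It is not the authors' code: the function name `f1_score` and the toy pixel values are hypothetical, and real inputs would be flattened 30-m rasters of model predictions and FIRM reference labels.

```python
def f1_score(predicted, reference):
    """Return F1 for two equal-length sequences of 0/1 labels.

    1 = floodplain, 0 = not-floodplain. `reference` stands in for the
    FIRM-derived labels and `predicted` for the model output.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)  # true positives
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)  # false positives
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)  # false negatives

    if tp == 0:
        # No correctly predicted floodplain pixels: F1 is defined as 0 here.
        return 0.0

    precision = tp / (tp + fp)  # share of predicted floodplain that is correct
    recall = tp / (tp + fn)     # share of reference floodplain that was found
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy example: eight pixels, not real data.
reference = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
print(f1_score(predicted, reference))
```

Because F1 ignores true negatives, it is not inflated by the large share of not-floodplain pixels that a national map typically contains, which is one reason it suits imbalanced classifications of this kind.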