Pensacola Bay FL - Detailed step-by-step

Standardize, clean and wrangle Water Quality Portal data in Pensacola and Perdido Bays into more analytic-ready formats using the harmonize_wq package

US EPA’s Water Quality Portal (WQP) aggregates water quality, biological, and physical data provided by many organizations and has become an essential resource, with tools to query and retrieve data using Python or R. Given the variety of data and data originators, using the data in an analysis often requires data cleaning to ensure it meets the required quality standards, and data wrangling to get it into a more analytic-ready format. Recognizing that the definition of analysis-ready varies depending on the analysis, the harmonize_wq package is intended to be a flexible, water-quality-specific framework to help:

  • Identify differences in data units (including speciation and basis)

  • Identify differences in sampling or analytic methods

  • Resolve data errors using transparent assumptions

  • Reduce data to the columns that are most commonly needed

  • Transform data from long to wide format
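As a rough illustration of the last point, the sketch below pivots a tiny, made-up long-format table (one row per result, similar in shape to WQP's narrow result profile) into a wide table with one column per characteristic, using plain pandas. The column names mimic WQP fields, but the data and the choice of ActivityIdentifier as the row key are assumptions; this is not how harmonize_wq implements its own long-to-wide step.

```python
import pandas as pd

# Made-up long-format results: one row per measurement (illustrative only)
long_df = pd.DataFrame({
    "ActivityIdentifier": ["A1", "A1", "A2", "A2"],
    "CharacteristicName": ["Salinity", "pH", "Salinity", "pH"],
    "ResultMeasureValue": [30.1, 8.0, 28.4, 7.9],
})

# Pivot to wide format: one row per activity, one column per characteristic.
# aggfunc="mean" collapses any duplicate activity/characteristic pairs.
wide_df = long_df.pivot_table(
    index="ActivityIdentifier",
    columns="CharacteristicName",
    values="ResultMeasureValue",
    aggfunc="mean",
).reset_index()

print(wide_df)
```

The choice of row key matters: pivoting on the sampling activity keeps simultaneous measurements together, while pivoting on the station alone would mix results from different dates.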

Domain experts must decide what data meets their quality standards for data comparability and any thresholds for acceptance or rejection.

Detailed step-by-step workflow

This example workflow takes a deeper dive into some of the expanded functionality to examine results for different water quality parameters in Pensacola and Perdido Bays.

Install and import the required libraries

[1]:
import sys
#!python -m pip uninstall harmonize-wq --yes
#!python -m pip install harmonize-wq
# Use pip to install the package from pypi or the latest from github
#!{sys.executable} -m pip install harmonize-wq
# For latest dev version
#!{sys.executable} -m pip install git+https://github.com/USEPA/harmonize-wq.git@new_release_0-3-8
[2]:
import dataretrieval.wqp as wqp
from harmonize_wq import wrangle
from harmonize_wq import location
from harmonize_wq import harmonize
from harmonize_wq import visualize
from harmonize_wq import clean

Download location data using dataretrieval

[3]:
# Read geometry for Area of Interest from geojson file url and plot
aoi_url = r'https://raw.githubusercontent.com/USEPA/harmonize-wq/main/harmonize_wq/tests/data/PPBays_NCCA.geojson'
aoi_gdf = wrangle.as_gdf(aoi_url).to_crs(epsg=4326)  # already standard 4326
aoi_gdf.plot()
[3]:
[Figure: map of the area-of-interest polygons]
[4]:
# Note there are actually two polygons (one for each Bay)
aoi_gdf
# Spatial query parameters can be updated to run just one
bBox = wrangle.get_bounding_box(aoi_gdf)
# For only one bay, e.g., first is Pensacola Bay:
#bBox = wrangle.get_bounding_box(aoi_gdf, 0)
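The query needs the extent as a comma-separated string of west, south, east, north coordinates (the Water Quality Portal's bBox convention). A minimal plain-Python sketch of building such a string from polygon vertices is below; the coordinates and the `bbox_string` helper are made up for illustration, and wrangle.get_bounding_box may format its output differently.

```python
def bbox_string(polygons):
    """Return 'west,south,east,north' for polygons given as lists of (lon, lat) vertices."""
    lons = [lon for poly in polygons for lon, _ in poly]
    lats = [lat for poly in polygons for _, lat in poly]
    return ",".join(str(v) for v in (min(lons), min(lats), max(lons), max(lats)))

# Two made-up polygons standing in for the two bays (EPSG:4326 lon/lat)
bay_a = [(-87.3, 30.3), (-87.0, 30.3), (-87.0, 30.6), (-87.3, 30.6)]
bay_b = [(-87.6, 30.2), (-87.4, 30.2), (-87.4, 30.4), (-87.6, 30.4)]

print(bbox_string([bay_a, bay_b]))  # -87.6,30.2,-87.0,30.6  (extent of both bays)
print(bbox_string([bay_a]))         # -87.3,30.3,-87.0,30.6  (first bay only)
```

Note that a single box around two separate bays also covers the land and water between them, which is why the stations are clipped to the actual polygons later on.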
[5]:
# Build query with characteristicNames and the AOI extent
query = {'characteristicName': ['Phosphorus',
                                'Temperature, water',
                                'Depth, Secchi disk depth',
                                'Dissolved oxygen (DO)',
                                'Salinity',
                                'pH',
                                'Nitrogen',
                                'Conductivity',
                                'Organic carbon',
                                'Chlorophyll a',
                                'Turbidity',
                                'Sediment',
                                'Fecal Coliform',
                                'Escherichia coli']}
query['bBox'] = bBox
[6]:
# Query stations (can be slow)
stations, site_md = wqp.what_sites(**query)
[7]:
# Rows and columns for results
stations.shape
[7]:
(2874, 37)
[8]:
# First 5 rows
stations.head()
[8]:
OrganizationIdentifier OrganizationFormalName MonitoringLocationIdentifier MonitoringLocationName MonitoringLocationTypeName MonitoringLocationDescriptionText HUCEightDigitCode DrainageAreaMeasure/MeasureValue DrainageAreaMeasure/MeasureUnitCode ContributingDrainageAreaMeasure/MeasureValue ... AquiferName LocalAqfrName FormationTypeText AquiferTypeName ConstructionDateText WellDepthMeasure/MeasureValue WellDepthMeasure/MeasureUnitCode WellHoleDepthMeasure/MeasureValue WellHoleDepthMeasure/MeasureUnitCode ProviderName
0 USGS-AL USGS Alabama Water Science Center USGS-02376115 ELEVENMILE CREEK NR WEST PENSACOLA, FL Stream NaN 3140107.0 27.8 sq mi 27.8 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NWIS
1 USGS-AL USGS Alabama Water Science Center USGS-02377570 STYX RIVER NEAR ELSANOR, AL. Stream NaN 3140106.0 192.0 sq mi 192.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NWIS
2 USGS-AL USGS Alabama Water Science Center USGS-02377920 BLACKWATER RIVER AT US HWY 90 NR ROBERTSDALE, AL. Stream NaN 3140106.0 23.1 sq mi 23.1 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NWIS
3 USGS-AL USGS Alabama Water Science Center USGS-02377960 BLACKWATER RIVER AT CO RD 87 NEAR ELSANOR, AL. Stream NaN 3140106.0 56.6 sq mi 56.6 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NWIS
4 USGS-AL USGS Alabama Water Science Center USGS-02377975 BLACKWATER RIVER ABOVE SEMINOLE AL Stream NaN 3140106.0 40.2 sq mi NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NWIS

5 rows × 37 columns

[9]:
# Columns used for an example row
stations.iloc[0][['HorizontalCoordinateReferenceSystemDatumName', 'LatitudeMeasure', 'LongitudeMeasure']]
[9]:
HorizontalCoordinateReferenceSystemDatumName        NAD83
LatitudeMeasure                                 30.498252
LongitudeMeasure                               -87.335809
Name: 0, dtype: object
[10]:
# Harmonize location datums to 4326 (Note we keep intermediate columns using intermediate_columns=True)
stations_gdf = location.harmonize_locations(stations, out_EPSG=4326, intermediate_columns=True)
[11]:
location.harmonize_locations?
[12]:
# Rows and columns for results after running the function (5 new columns, only 2 new if intermediate_columns=False)
stations_gdf.shape
[12]:
(2874, 42)
[13]:
# Example results for the new columns
stations_gdf.iloc[0][['geom_orig', 'EPSG', 'QA_flag', 'geom', 'geometry']]
[13]:
geom_orig         (-87.3358086, 30.49825159)
EPSG                                  4269.0
QA_flag                                  NaN
geom         POINT (-87.3358086 30.49825159)
geometry     POINT (-87.3358086 30.49825159)
Name: 0, dtype: object
[14]:
# geom and geometry look the same but geometry is a special datatype
stations_gdf['geometry'].dtype
[14]:
<geopandas.array.GeometryDtype at 0x7f8f1352abb0>
[15]:
# Look at the different QA_flag flags that have been assigned,
# e.g., for bad datums or limited decimal precision
set(stations_gdf.loc[stations_gdf['QA_flag'].notna()]['QA_flag'])
[15]:
{'HorizontalCoordinateReferenceSystemDatumName: Bad datum OTHER, EPSG:4326 assumed',
 'HorizontalCoordinateReferenceSystemDatumName: Bad datum UNKWN, EPSG:4326 assumed',
 'LatitudeMeasure: Imprecise: lessthan3decimaldigits',
 'LatitudeMeasure: Imprecise: lessthan3decimaldigits; HorizontalCoordinateReferenceSystemDatumName: Bad datum UNKWN, EPSG:4326 assumed',
 'LatitudeMeasure: Imprecise: lessthan3decimaldigits; LongitudeMeasure: Imprecise: lessthan3decimaldigits',
 'LongitudeMeasure: Imprecise: lessthan3decimaldigits',
 'LongitudeMeasure: Imprecise: lessthan3decimaldigits; HorizontalCoordinateReferenceSystemDatumName: Bad datum UNKWN, EPSG:4326 assumed'}
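The flag text above points to two kinds of checks: a datum lookup (unrecognized datums such as OTHER or UNKWN fall back to EPSG:4326) and a precision check on each coordinate, with multiple flags joined by "; ". The sketch below mimics that behavior in plain Python so the logic is easy to follow. The datum-to-EPSG table is a small assumed subset, the string-based digit count is an assumption, and the flag wording is copied from the output above; this is not harmonize_wq's actual code.

```python
# Assumed subset of datum names -> EPSG codes (NAD27, NAD83, WGS84)
DATUM_EPSG = {"NAD27": 4267, "NAD83": 4269, "WGS84": 4326}

def decimal_digits(coord: str) -> int:
    """Number of digits after the decimal point in a coordinate given as text."""
    return len(coord.split(".")[1]) if "." in coord else 0

def location_flags(lat: str, lon: str, datum: str):
    """Return (epsg, qa_flag) using the flag wording seen in the output above."""
    flags = []
    for col, value in (("LatitudeMeasure", lat), ("LongitudeMeasure", lon)):
        if decimal_digits(value) < 3:
            flags.append(f"{col}: Imprecise: lessthan3decimaldigits")
    epsg = DATUM_EPSG.get(datum)
    if epsg is None:
        epsg = 4326  # unrecognized datum: assume WGS84 and record the assumption
        flags.append(
            f"HorizontalCoordinateReferenceSystemDatumName: Bad datum {datum}, EPSG:4326 assumed"
        )
    return epsg, "; ".join(flags) or None

print(location_flags("30.498252", "-87.335809", "NAD83"))  # (4269, None)
print(location_flags("30.5", "-87.335809", "UNKWN"))
```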
[16]:
# Map it
stations_gdf.plot()
[16]:
[Figure: map of all stations returned by the query]
[17]:
# Clip to area of interest
stations_clipped = wrangle.clip_stations(stations_gdf, aoi_gdf)
[18]:
# Map it
stations_clipped.plot()
[18]:
[Figure: map of stations clipped to the area of interest]
[19]:
# How many stations now?
len(stations_clipped)
[19]:
1425
[20]:
# To save the results to a shapefile
#import os
#path = ''  #specify the path (folder/directory) to save it to
#stations_clipped.to_file(os.path.join(path, 'PPBEP_stations.shp'))

Retrieve Characteristic Data

[21]:
# Now query for results
query['dataProfile'] = 'narrowResult'
res_narrow, md_narrow = wqp.get_results(**query)
[22]:
df = res_narrow
df
[22]:
OrganizationIdentifier OrganizationFormalName ActivityIdentifier ActivityStartDate ActivityStartTime/Time ActivityStartTime/TimeZoneCode MonitoringLocationIdentifier ResultIdentifier DataLoggerLine ResultDetectionConditionText ... AnalysisEndTime/TimeZoneCode ResultLaboratoryCommentCode ResultLaboratoryCommentText ResultDetectionQuantitationLimitUrl LaboratoryAccreditationIndicator LaboratoryAccreditationAuthorityName TaxonomistAccreditationIndicator TaxonomistAccreditationAuthorityName LabSamplePreparationUrl ProviderName
0 AWW_WQX Alabama Water Watch AWW_WQX-aww_0321:20131111121500:SR:WSO 2013-11-11 12:15:00 CST AWW_WQX-aww_0321 STORET-1079479903 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN STORET
1 21FLSEAS_WQX Florida Department of Environmental Protection 21FLSEAS_WQX-027950424132 2013-04-24 09:01:00 EST 21FLSEAS_WQX-02SEAS795 STORET-310551339 NaN NaN ... NaN NaN NaN https://www.waterqualitydata.us/data/providers... NaN NaN NaN NaN NaN STORET
2 21FLSEAS_WQX Florida Department of Environmental Protection 21FLSEAS_WQX-027400613134 2013-06-13 10:01:00 EST 21FLSEAS_WQX-02SEAS740 STORET-310489836 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN STORET
3 21FLPNS_WQX FL Dept. of Environmental Protection, Northwes... 21FLPNS_WQX-1536988F1 2013-09-17 11:01:00 EST 21FLPNS_WQX-33030019 STORET-308146602 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN STORET
4 AWW_WQX Alabama Water Watch AWW_WQX-aww_0330:20130112134500:SR:WSO 2013-01-12 13:45:00 CST AWW_WQX-aww_0330 STORET-1079461086 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN STORET
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
450314 USGS-AL USGS Alabama Water Science Center nwisal.01.99900500 1999-03-02 14:20:00 CST USGS-02376115 NWIS-104002666 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NWIS
450315 USGS-AL USGS Alabama Water Science Center nwisal.01.00201479 2001-11-28 12:05:00 CST USGS-02377570 NWIS-53918846 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NWIS
450316 USGS-AL USGS Alabama Water Science Center nwisal.01.00202076 2001-10-03 16:40:00 CDT USGS-02376115 NWIS-104000948 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NWIS
450317 USGS-AL USGS Alabama Water Science Center nwisal.01.00202072 2001-11-28 13:45:00 CST USGS-02376115 NWIS-104000936 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NWIS
450318 USGS-AL USGS Alabama Water Science Center nwisal.01.00201474 2001-10-03 14:15:00 CDT USGS-02377570 NWIS-53918826 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NWIS

450319 rows × 78 columns

[23]:
# Map number of usable results at each station
gdf_count = visualize.map_counts(df, stations_clipped)
legend_kwds = {"fmt": "{:.0f}", 'bbox_to_anchor':(1, 0.75)}
gdf_count.plot(column='cnt', cmap='Blues', legend=True, scheme='quantiles', legend_kwds=legend_kwds)
[23]:
[Figure: map of stations shaded by number of results (quantile classes)]
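Conceptually, a count map like this needs two steps: count results per MonitoringLocationIdentifier, then join those counts onto the station table that carries the geometry. A rough pandas sketch with made-up rows is below. It leaves out the geometry, and treating "usable" as "non-missing value" is an assumption; it illustrates the idea, not how visualize.map_counts is implemented.

```python
import pandas as pd

# Made-up results (long format) and stations; real station rows would also carry geometry
results = pd.DataFrame({
    "MonitoringLocationIdentifier": ["S1", "S1", "S2", "S3", "S3", "S3"],
    "ResultMeasureValue": [1.0, 2.0, 3.0, 4.0, None, 6.0],
})
stations = pd.DataFrame({"MonitoringLocationIdentifier": ["S1", "S2", "S3", "S4"]})

# Count non-missing results per station
counts = (
    results.dropna(subset=["ResultMeasureValue"])
    .groupby("MonitoringLocationIdentifier")
    .size()
    .rename("cnt")
    .reset_index()
)

# Inner join keeps only stations that have at least one counted result
station_counts = stations.merge(counts, on="MonitoringLocationIdentifier", how="inner")
print(station_counts)
```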

Harmonize Characteristic Results

There are two functions for harmonizing characteristics: harmonize_all() and harmonize(). harmonize_all() runs the harmonization functions on all characteristics and lets you specify how to handle errors. harmonize() runs only on the characteristic specified with char_val and also lets you choose output units, keep intermediate columns, and print a quick report summarizing changes.

[24]:
# See Documentation
#harmonize.harmonize_all?
#harmonize.harmonize?
Secchi disk depth
[25]:
# Each harmonize function has parameters, e.g., char_val is the CharacteristicName column value to harmonize, so the entire df can be passed in.
# Example params: units='m', char_val='Depth, Secchi disk depth', out_col='Secchi', report=False

# We start by demonstrating on Secchi disk depth (units default to m, keep intermediate fields, see report)
df = harmonize.harmonize(df, 'Depth, Secchi disk depth', intermediate_columns=True, report=True)
-Usable results-
count    18762.000000
mean         1.153023
std          2.054083
min          0.000000
25%          0.600000
50%          1.000000
75%          1.400000
max        260.000000
dtype: float64
Unusable results: 79
Usable results with inferred units: 0
Results outside threshold (0.0 to 13.477523634821384): 1
[Figure: histogram of usable Secchi disk depth results]

The threshold is based on standard deviations and is currently only used in the histogram.
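The report does not state the formula, but the printed bounds are consistent with an upper limit of the mean plus six standard deviations: 1.153023 + 6 × 2.054083 ≈ 13.4775 for Secchi depth, and the temperature and pH reports further down fit the same pattern. The lower bound is printed as 0.0 in all three reports. The sketch below reproduces the upper bound from the rounded summary statistics and counts values outside the range; the factor of six is inferred from these numbers, not taken from the package documentation.

```python
import pandas as pd

def sd_upper_bound(mean: float, std: float, n_std: float = 6.0) -> float:
    """Inferred upper threshold: mean + n_std * standard deviation (n_std=6 is an assumption)."""
    return mean + n_std * std

def count_outside(values: pd.Series, low: float, high: float) -> int:
    """Count values below low or above high."""
    return int(((values < low) | (values > high)).sum())

# Rounded summary statistics copied from the Secchi report above
upper = sd_upper_bound(mean=1.153023, std=2.054083)
print(round(upper, 4))  # 13.4775

# Toy data: only the 260 m value (the reported Secchi max) falls outside 0.0 to upper
toy = pd.Series([0.6, 1.0, 1.4, 260.0])
print(count_outside(toy, 0.0, upper))  # 1
```

A 260 m Secchi depth is physically implausible for these bays, which is the kind of value the histogram threshold is meant to make visible.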

[26]:
# Look at a table of just Secchi results and focus on subset of columns
cols = ['MonitoringLocationIdentifier', 'ResultMeasureValue', 'ResultMeasure/MeasureUnitCode', 'QA_flag', 'Units']
sechi_results = df.loc[df['CharacteristicName']=='Depth, Secchi disk depth', cols + ['Secchi']]
sechi_results
[26]:
MonitoringLocationIdentifier ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag Units Secchi
4 AWW_WQX-aww_0330 2 m NaN m 2.0 meter
96 21AWIC-7290 .94 m NaN m 0.94 meter
108 21FLPNS_WQX-33020JF1 0.60 m NaN m 0.6 meter
128 21FLGW_WQX-3565 .3 m NaN m 0.3 meter
139 21FLBFA_WQX-33010016 1.5 m NaN m 1.5 meter
... ... ... ... ... ... ...
448464 21AWIC-1122 1.65 m NaN m 1.65 meter
448472 21AWIC-7290 .9 m NaN m 0.9 meter
448473 21AWIC-1208 1 m NaN m 1.0 meter
448477 21AWIC-1207 .79 m NaN m 0.79 meter
448488 21AWIC-3016 1.28 m NaN m 1.28 meter

18841 rows × 6 columns

[27]:
# Look at unusable (NaN) results
sechi_results.loc[df['Secchi'].isna()]
[27]:
MonitoringLocationIdentifier ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag Units Secchi
123035 21FLKWAT_WQX-OKA-CBA-GAP-3-1 Not Reported NaN ResultMeasureValue: "Not Reported" result cann... m NaN
123049 21FLCBA_WQX-OKA-CB-BASS-1 Not Reported NaN ResultMeasureValue: "Not Reported" result cann... m NaN
123401 21FLCBA_WQX-OKA-CBA-GAP-3-1 Not Reported NaN ResultMeasureValue: "Not Reported" result cann... m NaN
123611 21FLCBA_WQX-OKA-CB-BASS-1 Not Reported NaN ResultMeasureValue: "Not Reported" result cann... m NaN
124762 21FLKWAT_WQX-OKA-CB-BASS-1 Not Reported NaN ResultMeasureValue: "Not Reported" result cann... m NaN
... ... ... ... ... ... ...
274733 21FLCBA_WQX-OKA-CB-BASS-2 Not Reported NaN ResultMeasureValue: "Not Reported" result cann... m NaN
274812 21FLCBA_WQX-OKA-CBA-GAP-3-1 Not Reported NaN ResultMeasureValue: "Not Reported" result cann... m NaN
276406 21FLKWAT_WQX-SAN-SKI WATCH-5 Not Reported NaN ResultMeasureValue: "Not Reported" result cann... m NaN
277110 21FLCBA_WQX-OKA-CB-BASS-2 Not Reported NaN ResultMeasureValue: "Not Reported" result cann... m NaN
348933 21FLPNS_WQX-3302M13G *Not Reported m ResultMeasureValue: "*Not Reported" result can... m NaN

79 rows × 6 columns

[28]:
# look at the QA flag for first row from above
list(sechi_results.loc[df['Secchi'].isna()]['QA_flag'])[0]
[28]:
'ResultMeasureValue: "Not Reported" result cannot be used; ResultMeasure/MeasureUnitCode: MISSING UNITS, m assumed'
[29]:
# All cases where there was a QA flag
sechi_results.loc[df['QA_flag'].notna()]
[29]:
MonitoringLocationIdentifier ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag Units Secchi
123035 21FLKWAT_WQX-OKA-CBA-GAP-3-1 Not Reported NaN ResultMeasureValue: "Not Reported" result cann... m NaN
123049 21FLCBA_WQX-OKA-CB-BASS-1 Not Reported NaN ResultMeasureValue: "Not Reported" result cann... m NaN
123401 21FLCBA_WQX-OKA-CBA-GAP-3-1 Not Reported NaN ResultMeasureValue: "Not Reported" result cann... m NaN
123611 21FLCBA_WQX-OKA-CB-BASS-1 Not Reported NaN ResultMeasureValue: "Not Reported" result cann... m NaN
124762 21FLKWAT_WQX-OKA-CB-BASS-1 Not Reported NaN ResultMeasureValue: "Not Reported" result cann... m NaN
... ... ... ... ... ... ...
274733 21FLCBA_WQX-OKA-CB-BASS-2 Not Reported NaN ResultMeasureValue: "Not Reported" result cann... m NaN
274812 21FLCBA_WQX-OKA-CBA-GAP-3-1 Not Reported NaN ResultMeasureValue: "Not Reported" result cann... m NaN
276406 21FLKWAT_WQX-SAN-SKI WATCH-5 Not Reported NaN ResultMeasureValue: "Not Reported" result cann... m NaN
277110 21FLCBA_WQX-OKA-CB-BASS-2 Not Reported NaN ResultMeasureValue: "Not Reported" result cann... m NaN
348933 21FLPNS_WQX-3302M13G *Not Reported m ResultMeasureValue: "*Not Reported" result can... m NaN

79 rows × 6 columns

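If the value is missing or cannot be interpreted (e.g., "Not Reported"), nothing can be done and the harmonized result is NaN. A result with missing (NaN) units is assumed to be in the default units, and a QA_flag is added to record that assumption.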
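The sketch below mirrors that logic for a single result in plain Python: text that cannot be parsed as a number becomes NaN, and a missing unit falls back to a default, with each assumption appended to a QA flag. The `parse_result` and `_missing` helpers are hypothetical; only the flag wording is copied from the QA_flag values shown in this notebook.

```python
import math

def _missing(x) -> bool:
    """True for None or a float NaN (how pandas represents blanks)."""
    return x is None or (isinstance(x, float) and math.isnan(x))

def parse_result(value, unit, default_unit: str = "m"):
    """Return (number, unit, qa_flag); flag wording mirrors the QA_flag text above."""
    flags = []
    if _missing(value):
        number = math.nan
        flags.append("ResultMeasureValue: missing (NaN) result")
    else:
        try:
            number = float(value)  # e.g. '.94' -> 0.94
        except ValueError:
            number = math.nan  # text such as 'Not Reported' cannot be used
            flags.append(f'ResultMeasureValue: "{value}" result cannot be used')
    if _missing(unit):
        unit = default_unit  # assume the default unit and record the assumption
        flags.append(f"ResultMeasure/MeasureUnitCode: MISSING UNITS, {default_unit} assumed")
    return number, unit, "; ".join(flags) or None

print(parse_result(".94", "m"))             # (0.94, 'm', None)
print(parse_result("Not Reported", None))
```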

[30]:
# Aggregate Secchi data by station
visualize.station_summary(sechi_results, 'Secchi')
[30]:
MonitoringLocationIdentifier cnt mean
0 11NPSWRD_WQX-GUIS_CMP_PKT01 12 2.333333
1 11NPSWRD_WQX-GUIS_CMP_PKT02 17 2.411765
2 11NPSWRD_WQX-GUIS_CMP_PKT03 3 2.333333
3 21AWIC-1063 124 0.775726
4 21AWIC-1122 64 2.981156
... ... ... ...
1115 NARS_WQX-NCCA10-1432 1 1.075000
1116 NARS_WQX-NCCA10-1433 1 1.423333
1117 NARS_WQX-NCCA10-1434 1 2.400000
1118 NARS_WQX-NCCA10-1488 1 0.736667
1119 NARS_WQX-NCCA10-2432 1 1.600000

1120 rows × 3 columns
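A per-station summary like this amounts to a group-by on MonitoringLocationIdentifier with a count of usable values and their mean. The pandas sketch below assumes the harmonized magnitudes are already plain floats; in the notebook the Secchi column holds unit-aware quantities such as "0.94 meter", so a real implementation would need to extract magnitudes first. The data are made up, and this is not visualize.station_summary's actual code.

```python
import pandas as pd

# Made-up harmonized results with magnitudes already extracted as floats
results = pd.DataFrame({
    "MonitoringLocationIdentifier": ["S1", "S1", "S1", "S2", "S2"],
    "Secchi": [2.0, 2.5, None, 1.0, 1.5],
})

# One row per station: number of usable (non-NaN) results and their mean
summary = (
    results.groupby("MonitoringLocationIdentifier")["Secchi"]
    .agg(cnt="count", mean="mean")
    .reset_index()
)
print(summary)
```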

[31]:
# Map number of usable results at each station
gdf_count = visualize.map_counts(sechi_results, stations_clipped)
gdf_count.plot(column='cnt', cmap='Blues', legend=True, scheme='quantiles', legend_kwds=legend_kwds)
[31]:
[Figure: map of stations shaded by number of Secchi results]
[32]:
# Map average Secchi depth results at each station
gdf_avg = visualize.map_measure(sechi_results, stations_clipped, 'Secchi')
gdf_avg.plot(column='mean', cmap='OrRd', legend=True)
[32]:
[Figure: map of average Secchi depth at each station]
Temperature

The default, errors=’raise’, raises an error when there is a dimensionality error (i.e., when units can’t be converted). Here we would get the error: DimensionalityError: Cannot convert from ‘count’ (dimensionless) to ‘degree_Celsius’ ([temperature])
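The difference between the two modes can be sketched in plain Python. Below, `DimensionalityError` is a hypothetical stand-in for pint's exception of the same name, and the small unit-to-dimension table is an assumption made for illustration; only the behavior (raise versus return NaN) mirrors what the text describes.

```python
import math

class DimensionalityError(ValueError):
    """Hypothetical stand-in for pint's DimensionalityError."""

# Assumed unit -> dimension table (illustrative subset)
DIMENSION = {"degC": "temperature", "degF": "temperature", "count": "dimensionless"}

def to_celsius(value: float, unit: str, errors: str = "raise") -> float:
    """Convert a temperature to degC; errors='raise' raises on bad units, 'ignore' returns NaN."""
    if DIMENSION.get(unit) != "temperature":
        if errors == "raise":
            raise DimensionalityError(f"Cannot convert from '{unit}' to 'degree_Celsius'")
        return math.nan  # errors='ignore': the invalid conversion becomes NaN
    return (value - 32) * 5 / 9 if unit == "degF" else value

print(to_celsius(23.0, "degC"))                 # 23.0
print(to_celsius(5, "count", errors="ignore"))  # nan
```

With errors='ignore' the bad rows survive as NaN, so they still show up in the "Unusable results" count instead of stopping the whole run.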

[33]:
#'Temperature, water'
# With errors='ignore', invalid dimension conversions return NaN.
df = harmonize.harmonize(df, 'Temperature, water', intermediate_columns=True, report=True, errors='ignore')
-Usable results-
count    90932.000000
mean        22.019360
std          9.996736
min        -12.944444
25%         17.080000
50%         22.270000
75%         27.200000
max       1876.000000
dtype: float64
Unusable results: 2
Usable results with inferred units: 10
Results outside threshold (0.0 to 81.99977867287922): 9
[Figure: histogram of usable water temperature results]
[34]:
# Look at what was changed
cols = ['MonitoringLocationIdentifier', 'ResultMeasureValue', 'ResultMeasure/MeasureUnitCode', 'QA_flag', 'Temperature', 'Units']
temperature_results = df.loc[df['CharacteristicName']=='Temperature, water', cols]
temperature_results
[34]:
MonitoringLocationIdentifier ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag Temperature Units
6 21FLCBA_WQX-BAS02 78.9 deg F NaN 26.0555555555556 degree_Celsius degF
8 21FLPNS_WQX-33020J10 12.35 deg C NaN 12.35 degree_Celsius degC
19 AWW_WQX-aww_0318 23 deg C NaN 23.0 degree_Celsius degC
26 21FLSEAS_WQX-02SEAS810 23 deg C NaN 23.0 degree_Celsius degC
27 21FLPNS_WQX-33010G10 11.23 deg C NaN 11.23 degree_Celsius degC
... ... ... ... ... ... ...
450314 USGS-02376115 23.0 deg C NaN 23.0 degree_Celsius degC
450315 USGS-02377570 20.0 deg C NaN 20.0 degree_Celsius degC
450316 USGS-02376115 25.0 deg C NaN 25.0 degree_Celsius degC
450317 USGS-02376115 24.0 deg C NaN 24.0 degree_Celsius degC
450318 USGS-02377570 20.5 deg C NaN 20.5 degree_Celsius degC

90934 rows × 6 columns

Above, we can see examples where results were reported in deg F and have been converted to degree_Celsius in the Temperature column.
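As a check on the first row of the table, the Fahrenheit-to-Celsius conversion of 78.9 deg F works out as follows, matching the 26.0555555555556 degree_Celsius shown:

```latex
T_{\mathrm{C}} = \left(T_{\mathrm{F}} - 32\right)\times\frac{5}{9}
              = \left(78.9 - 32\right)\times\frac{5}{9}
              = \frac{46.9 \times 5}{9}
              = \frac{234.5}{9}
              \approx 26.0556\ ^{\circ}\mathrm{C}
```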

[35]:
# Examine missing units
temperature_results.loc[df['ResultMeasure/MeasureUnitCode'].isna()]
[35]:
MonitoringLocationIdentifier ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag Temperature Units
185466 NARS_WQX-OWW04440-0401 NaN NaN ResultMeasureValue: missing (NaN) result; Resu... NaN degC
251720 21FLCBA-RIV02 74.2 NaN ResultMeasure/MeasureUnitCode: MISSING UNITS, ... 74.2 degree_Celsius degC
251725 21FLCBA-RIV02 74.2 NaN ResultMeasure/MeasureUnitCode: MISSING UNITS, ... 74.2 degree_Celsius degC
252307 21FLCBA-FWB02 82.1 NaN ResultMeasure/MeasureUnitCode: MISSING UNITS, ... 82.1 degree_Celsius degC
252308 21FLCBA-FWB02 82.6 NaN ResultMeasure/MeasureUnitCode: MISSING UNITS, ... 82.6 degree_Celsius degC
252309 21FLCBA-FWB02 71.8 NaN ResultMeasure/MeasureUnitCode: MISSING UNITS, ... 71.8 degree_Celsius degC
252310 21FLCBA-FWB02 79.4 NaN ResultMeasure/MeasureUnitCode: MISSING UNITS, ... 79.4 degree_Celsius degC
253908 21FLCBA-FWB01 83.3 NaN ResultMeasure/MeasureUnitCode: MISSING UNITS, ... 83.3 degree_Celsius degC
254733 21FLCBA-FWB05 79.8 NaN ResultMeasure/MeasureUnitCode: MISSING UNITS, ... 79.8 degree_Celsius degC
255832 21FLCBA-FWB01 71.2 NaN ResultMeasure/MeasureUnitCode: MISSING UNITS, ... 71.2 degree_Celsius degC
255837 21FLCBA-FWB05 81.7 NaN ResultMeasure/MeasureUnitCode: MISSING UNITS, ... 81.7 degree_Celsius degC

Where units were missing, the results were assumed to already be in degree_Celsius.
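Several of these values (71.2 to 83.3) would be implausibly warm for bay water in degrees Celsius but typical in degrees Fahrenheit, so the "degC assumed" rule may not hold for them. Whether to reinterpret them is a judgment for a domain expert. As one possible check, the sketch below flags missing-unit results above a hypothetical 40 °C cutoff and shows what they would be if they were actually in °F. The cutoff and the `if_degF_in_degC` column are assumptions for illustration; the three values are copied from the table above.

```python
import pandas as pd

# Values copied from the missing-unit rows above (units were blank in the source data)
temps = pd.DataFrame({
    "MonitoringLocationIdentifier": ["21FLCBA-RIV02", "21FLCBA-FWB02", "21FLCBA-FWB01"],
    "ResultMeasureValue": [74.2, 82.1, 71.2],
    "ResultMeasure/MeasureUnitCode": [None, None, None],
})

CUTOFF_C = 40.0  # hypothetical plausibility limit for water temperature in degC

missing_units = temps["ResultMeasure/MeasureUnitCode"].isna()
suspect = missing_units & (temps["ResultMeasureValue"] > CUTOFF_C)

# What the suspect values would be if they were really in degF
temps.loc[suspect, "if_degF_in_degC"] = (temps.loc[suspect, "ResultMeasureValue"] - 32) * 5 / 9
print(temps.loc[suspect, ["MonitoringLocationIdentifier", "ResultMeasureValue", "if_degF_in_degC"]].round(2))
```

Flagging rather than silently converting keeps the assumption transparent and leaves the final call to whoever uses the data.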

[36]:
# This is also noted in the QA_flag field
list(temperature_results.loc[df['ResultMeasure/MeasureUnitCode'].isna(), 'QA_flag'])[0]
[36]:
'ResultMeasureValue: missing (NaN) result; ResultMeasure/MeasureUnitCode: MISSING UNITS, degC assumed'
[37]:
# Look for any without usable results
temperature_results.loc[df['Temperature'].isna()]
[37]:
MonitoringLocationIdentifier ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag Temperature Units
57927 11NPSWRD_WQX-GUIS_NALO NaN deg C ResultMeasureValue: missing (NaN) result NaN degC
185466 NARS_WQX-OWW04440-0401 NaN NaN ResultMeasureValue: missing (NaN) result; Resu... NaN degC
[38]:
# Aggregate temperature data by station
visualize.station_summary(temperature_results, 'Temperature')
[38]:
MonitoringLocationIdentifier cnt mean
0 11NPSWRD_WQX-GUIS_ADEM_ALPT 30 24.986667
1 11NPSWRD_WQX-GUIS_BCCA 1 36.800000
2 11NPSWRD_WQX-GUIS_BISA 32 22.696250
3 11NPSWRD_WQX-GUIS_BOPI 1 32.000000
4 11NPSWRD_WQX-GUIS_CMP_PKT01 20 25.125000
... ... ... ...
2480 UWFCEDB_WQX-SRC-AI31-22 15 22.126667
2481 UWFCEDB_WQX-SRC-AI36-22 18 22.133333
2482 UWFCEDB_WQX-SRC-AI42-22 16 22.362500
2483 UWFCEDB_WQX-SRC-AI44-22 18 21.972222
2484 UWFCEDB_WQX-SRC-AK41-22 13 21.107692

2485 rows × 3 columns

[39]:
# Map number of usable results at each station
gdf_count = visualize.map_counts(temperature_results, stations_clipped)
gdf_count.plot(column='cnt', cmap='Blues', legend=True, scheme='quantiles', legend_kwds=legend_kwds)
[39]:
[Figure: map of stations shaded by number of temperature results]
[40]:
# Map average temperature results at each station
gdf_temperature = visualize.map_measure(temperature_results, stations_clipped, 'Temperature')
gdf_temperature.plot(column='mean', cmap='OrRd', legend=True)
[40]:
[Figure: map of average water temperature at each station]

Dissolved oxygen

[41]:
# look at Dissolved oxygen (DO), but this time without intermediate fields
df = harmonize.harmonize(df, 'Dissolved oxygen (DO)')

Note: when a harmonization function is run without intermediate_columns=True, any intermediate fields kept from earlier runs are deleted immediately.

[42]:
# Look at what was changed
cols = ['MonitoringLocationIdentifier', 'ResultMeasureValue', 'ResultMeasure/MeasureUnitCode', 'QA_flag', 'DO']
do_res = df.loc[df['CharacteristicName']=='Dissolved oxygen (DO)', cols]
do_res
[42]:
MonitoringLocationIdentifier ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag DO
2 21FLSEAS_WQX-02SEAS740 6.3 mg/L NaN 6.3 milligram / liter
7 21FLCMP_WQX-3201BM21 4.5 mg/L NaN 4.5 milligram / liter
15 21FLPNS_WQX-33030D71 6.64 mg/L NaN 6.64 milligram / liter
21 21FLBFA_WQX-33020057 1.17 mg/L NaN 1.17 milligram / liter
31 21FLNUTT_WQX-PB02 8.11 mg/L NaN 8.11 milligram / liter
... ... ... ... ... ...
450213 21AWIC-1152 8.6 mg/L NaN 8.6 milligram / liter
450215 21AWIC-1152 7.8 mg/L NaN 7.8 milligram / liter
450219 21AWIC-1152 8.6 mg/L NaN 8.6 milligram / liter
450237 21AWIC-1152 7.8 mg/L NaN 7.8 milligram / liter
450244 21AWIC-1152 7.8 mg/L NaN 7.8 milligram / liter

73360 rows × 5 columns

[43]:
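# Note: this comparison is case-sensitive, so 'mg/L' rows are still returned; .str.lower() != 'mg/l' would exclude both spellings
do_res.loc[do_res['ResultMeasure/MeasureUnitCode']!='mg/l']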
[43]:
MonitoringLocationIdentifier ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag DO
2 21FLSEAS_WQX-02SEAS740 6.3 mg/L NaN 6.3 milligram / liter
7 21FLCMP_WQX-3201BM21 4.5 mg/L NaN 4.5 milligram / liter
15 21FLPNS_WQX-33030D71 6.64 mg/L NaN 6.64 milligram / liter
21 21FLBFA_WQX-33020057 1.17 mg/L NaN 1.17 milligram / liter
31 21FLNUTT_WQX-PB02 8.11 mg/L NaN 8.11 milligram / liter
... ... ... ... ... ...
450213 21AWIC-1152 8.6 mg/L NaN 8.6 milligram / liter
450215 21AWIC-1152 7.8 mg/L NaN 7.8 milligram / liter
450219 21AWIC-1152 8.6 mg/L NaN 8.6 milligram / liter
450237 21AWIC-1152 7.8 mg/L NaN 7.8 milligram / liter
450244 21AWIC-1152 7.8 mg/L NaN 7.8 milligram / liter

50282 rows × 5 columns

Although there were no results in %, note that converting percent saturation (%) to mg/l is a special case because it depends on temperature and pressure. The equation currently assumes standard temperature and pressure (STP) and is being improved to integrate temperature and pressure instead (see DO_saturation()).
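Percent saturation is the measured concentration relative to the saturation concentration at the sample's conditions, so converting back to mg/l means scaling that saturation concentration: mg/l = (% / 100) × DO_sat. Because oxygen solubility falls as temperature rises and rises with pressure, the same % reading maps to different mg/l values. In the minimal sketch below the saturation concentration is passed in explicitly and the values used are purely illustrative; this is not the package's DO_saturation() implementation.

```python
def percent_sat_to_mg_l(percent: float, saturation_mg_l: float) -> float:
    """Convert DO percent saturation to mg/l, given the saturation concentration (mg/l)
    at the sample's temperature, pressure and salinity."""
    return percent / 100.0 * saturation_mg_l

# Illustrative saturation concentrations (mg/l); real values would come from a
# solubility table or equation evaluated at the sample's conditions.
for sat in (8.0, 10.0):
    print(sat, percent_sat_to_mg_l(90.0, sat))
```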

[44]:
# Aggregate DO data by station
visualize.station_summary(do_res, 'DO')
[44]:
MonitoringLocationIdentifier cnt mean
0 11NPSWRD_WQX-GUIS_ADEM_ALPT 30 6.698000
1 11NPSWRD_WQX-GUIS_BCCA 1 0.270000
2 11NPSWRD_WQX-GUIS_BISA 32 7.194375
3 11NPSWRD_WQX-GUIS_BOPI 1 7.540000
4 11NPSWRD_WQX-GUIS_FPPO 1 9.950000
... ... ... ...
2092 UWFCEDB_WQX-SRC-AI31-22 30 3.711982
2093 UWFCEDB_WQX-SRC-AI36-22 36 3.470604
2094 UWFCEDB_WQX-SRC-AI42-22 32 3.706811
2095 UWFCEDB_WQX-SRC-AI44-22 36 3.702338
2096 UWFCEDB_WQX-SRC-AK41-22 26 2.625130

2097 rows × 3 columns

[45]:
# Map number of usable results at each station
gdf_count = visualize.map_counts(do_res, stations_clipped)
gdf_count.plot(column='cnt', cmap='Blues', legend=True, scheme='quantiles', legend_kwds=legend_kwds)
[45]:
[Figure: map of stations shaded by number of DO results]
[46]:
# Map Averages at each station
gdf_avg = visualize.map_measure(do_res, stations_clipped, 'DO')
gdf_avg.plot(column='mean', cmap='OrRd', legend=True)
[46]:
[Figure: map of average dissolved oxygen at each station]

pH

[47]:
# pH, this time looking at a report
df = harmonize.harmonize(df, 'pH', report=True)
-Usable results-
count    58044.000000
mean         7.343410
std          0.901763
min          0.500000
25%          6.870000
50%          7.680000
75%          8.000000
max         16.200000
dtype: float64
Unusable results: 51
Usable results with inferred units: 57023
Results outside threshold (0.0 to 12.753988274241003): 1
../_images/notebooks_Harmonize_Pensacola_Detailed_67_2.png

Note the warnings that occur when a unit is not recognized by the package. These warnings are emitted even when report=False. Future versions could define these units for pH, but doing so here would not alter the results.
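In the output above, unrecognized-unit messages arrive as UserWarning while the pandas deprecation notices arrive as FutureWarning. One way to keep only the former is to record warnings with the standard library. A minimal sketch; the helper name is hypothetical:

```python
import warnings


def call_collecting_unit_warnings(func, *args, **kwargs):
    """Run func and return (result, UserWarning messages); other warnings are dropped."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')  # record repeated warnings too
        result = func(*args, **kwargs)
    messages = [str(w.message) for w in caught
                if issubclass(w.category, UserWarning)]
    return result, messages
```

Usage would look like `df, unit_msgs = call_collecting_unit_warnings(harmonize.harmonize, df, 'pH')`, after which `unit_msgs` can be reviewed or logged.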

[48]:
df.loc[df['CharacteristicName']=='pH', ['ResultMeasureValue', 'ResultMeasure/MeasureUnitCode', 'QA_flag', 'pH']]
[48]:
ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag pH
3 7.29 NaN ResultMeasure/MeasureUnitCode: MISSING UNITS, ... 7.29 dimensionless
24 8.09 NaN ResultMeasure/MeasureUnitCode: MISSING UNITS, ... 8.09 dimensionless
29 7.45 NaN ResultMeasure/MeasureUnitCode: MISSING UNITS, ... 7.45 dimensionless
33 6.57 NaN ResultMeasure/MeasureUnitCode: MISSING UNITS, ... 6.57 dimensionless
35 6.57 NaN ResultMeasure/MeasureUnitCode: MISSING UNITS, ... 6.57 dimensionless
... ... ... ... ...
450283 7.25 NaN ResultMeasure/MeasureUnitCode: MISSING UNITS, ... 7.25 dimensionless
450287 7 NaN ResultMeasure/MeasureUnitCode: MISSING UNITS, ... 7.0 dimensionless
450289 8.27 NaN ResultMeasure/MeasureUnitCode: MISSING UNITS, ... 8.27 dimensionless
450290 8.47 NaN ResultMeasure/MeasureUnitCode: MISSING UNITS, ... 8.47 dimensionless
450293 8.48 NaN ResultMeasure/MeasureUnitCode: MISSING UNITS, ... 8.48 dimensionless

58095 rows × 4 columns

A unit of ‘None’ cannot be interpreted, so it is replaced with NaN, which is then replaced with ‘dimensionless’ because pH is unitless.
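The two-step replacement described above can be sketched in pandas as follows. This is an illustration, not the package's code; in this sketch any other unit string passes through unchanged, and the function name is hypothetical.

```python
import pandas as pd


def fill_ph_units(unit_col):
    """Treat the literal string 'None' as missing, then default to dimensionless."""
    units = unit_col.mask(unit_col == 'None')  # 'None' string -> NaN
    return units.fillna('dimensionless')       # missing -> dimensionless


print(fill_ph_units(pd.Series(['None', None, 'std units'])).tolist())
```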

Salinity

[49]:
# Salinity
df = harmonize.harmonize(df, 'Salinity', report=True)
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wq_data.py:158: FutureWarning: unique with argument that is not not a Series, Index, ExtensionArray, or np.ndarray is deprecated and will raise in a future version.
  for bad_meas in pandas.unique(bad_measures):
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wq_data.py:663: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value '[<Quantity(1.012, 'Practical_Salinity_Units')>
 <Quantity(18.9, 'Practical_Salinity_Units')>
 <Quantity(25.0, 'Practical_Salinity_Units')> ...
 <Quantity(2.11, 'Practical_Salinity_Units')>
 <Quantity(1.89, 'Practical_Salinity_Units')>
 <Quantity(2.12, 'Practical_Salinity_Units')>]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  df_out.loc[m_mask, self.out_col] = convert_unit_series(**params)
-Usable results-
count    77267.000000
mean        15.807838
std        146.977757
min          0.000000
25%          6.000000
50%         16.000000
75%         23.100000
max      37782.000000
dtype: float64
Unusable results: 417
Usable results with inferred units: 10
Results outside threshold (0.0 to 897.6743827867203): 4
../_images/notebooks_Harmonize_Pensacola_Detailed_72_2.png
[50]:
cols = ['ResultMeasureValue', 'ResultMeasure/MeasureUnitCode', 'QA_flag', 'Salinity']
df.loc[df['CharacteristicName']=='Salinity', cols]
[50]:
ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag Salinity
0 1.012 ppt NaN 1.012 Practical_Salinity_Units
11 18.9 ppth NaN 18.9 Practical_Salinity_Units
12 25 ppt NaN 25.0 Practical_Salinity_Units
14 11.82 ppth NaN 11.82 Practical_Salinity_Units
22 .03 ppt NaN 0.03 Practical_Salinity_Units
... ... ... ... ...
450282 2.16 ppth NaN 2.16 Practical_Salinity_Units
450284 2.07 ppth NaN 2.07 Practical_Salinity_Units
450285 2.11 ppth NaN 2.11 Practical_Salinity_Units
450288 1.89 ppth NaN 1.89 Practical_Salinity_Units
450292 2.12 ppth NaN 2.12 Practical_Salinity_Units

77684 rows × 4 columns

Nitrogen

[51]:
# Nitrogen
df = harmonize.harmonize(df, 'Nitrogen', report=True)
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wq_data.py:158: FutureWarning: unique with argument that is not not a Series, Index, ExtensionArray, or np.ndarray is deprecated and will raise in a future version.
  for bad_meas in pandas.unique(bad_measures):
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/basis.py:343: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value 'as N' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  df_out.loc[mask, basis_col] = basis
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wq_data.py:484: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value '['as N' 'as N' 'as N' 'as N' nan 'as N' 'as N' nan nan nan nan 'as N' nan
 nan 'as N' nan 'as N' nan nan nan nan 'as N' 'as N' 'as N' 'as N' 'as N'
 'as N' 'as N' 'as N' 'as N' 'as N' 'as N' 'as N' 'as N' 'as N' 'as N'
 'as N' 'as N' 'as N' nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  self.df[c_mask] = basis.basis_from_method_spec(self.df[c_mask])
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wq_data.py:395: UserWarning: WARNING: 'cm3/g' UNDEFINED UNIT for Nitrogen
  warn("WARNING: " + problem)
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wq_data.py:663: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value '[<Quantity(0.3, 'milligram / liter')>
 <Quantity(0.36, 'milligram / liter')>
 <Quantity(0.33875, 'milligram / liter')>
 <Quantity(0.53125, 'milligram / liter')>
 <Quantity(135.0, 'milligram / liter')>
 <Quantity(0.4075, 'milligram / liter')>
 <Quantity(0.35375, 'milligram / liter')>
 <Quantity(27.5, 'milligram / liter')>
 <Quantity(82.4, 'milligram / liter')>
 <Quantity(51.9, 'milligram / liter')>
 <Quantity(11.8, 'milligram / liter')>
 <Quantity(0.495, 'milligram / liter')>
 <Quantity(131.0, 'milligram / liter')>
 <Quantity(1630.0, 'milligram / liter')>
 <Quantity(0.4475, 'milligram / liter')>
 <Quantity(23.5, 'milligram / liter')>
 <Quantity(0.36125, 'milligram / liter')>
 <Quantity(49.8, 'milligram / liter')>
 <Quantity(83.6, 'milligram / liter')>
 <Quantity(197.0, 'milligram / liter')>
 <Quantity(314.0, 'milligram / liter')>
 <Quantity(1.5, 'milligram / liter')>
 <Quantity(0.44, 'milligram / liter')>
 <Quantity(0.68, 'milligram / liter')>
 <Quantity(0.93, 'milligram / liter')>
 <Quantity(0.26, 'milligram / liter')>
 <Quantity(0.68, 'milligram / liter')>
 <Quantity(0.26, 'milligram / liter')>
 <Quantity(0.64, 'milligram / liter')>
 <Quantity(1.1, 'milligram / liter')>
 <Quantity(0.31, 'milligram / liter')>
 <Quantity(1.0, 'milligram / liter')>
 <Quantity(0.38, 'milligram / liter')>
 <Quantity(1.7, 'milligram / liter')>
 <Quantity(0.65, 'milligram / liter')>
 <Quantity(0.636, 'milligram / liter')>
 <Quantity(0.27, 'milligram / liter')>
 <Quantity(0.86, 'milligram / liter')>
 <Quantity(1.5, 'milligram / liter')>
 <Quantity(0.87, 'milligram / liter')>
 <Quantity(0.76, 'milligram / liter')>
 <Quantity(1.12, 'milligram / liter')>
 <Quantity(0.33, 'milligram / liter')>
 <Quantity(1.3, 'milligram / liter')>
 <Quantity(0.222, 'milligram / liter')>
 <Quantity(0.37, 'milligram / liter')>
 <Quantity(0.31724, 'milligram / liter')>
 <Quantity(0.45668, 'milligram / liter')>
 <Quantity(0.909, 'milligram / liter')>
 <Quantity(0.67, 'milligram / liter')>
 <Quantity(0.67, 'milligram / liter')>
 <Quantity(1.13, 'milligram / liter')>
 <Quantity(0.45906, 'milligram / liter')>
 <Quantity(1.376, 'milligram / liter')>
 <Quantity(0.3675, 'milligram / liter')>
 <Quantity(1.2, 'milligram / liter')>
 <Quantity(0.30226, 'milligram / liter')>
 <Quantity(0.4263, 'milligram / liter')>
 <Quantity(0.32, 'milligram / liter')>
 <Quantity(0.531, 'milligram / liter')>
 <Quantity(0.68, 'milligram / liter')>
 <Quantity(0.61, 'milligram / liter')>
 <Quantity(0.16, 'milligram / liter')>
 <Quantity(0.55, 'milligram / liter')>
 <Quantity(0.652, 'milligram / liter')>
 <Quantity(0.629, 'milligram / liter')>
 <Quantity(0.622, 'milligram / liter')>
 <Quantity(0.62, 'milligram / liter')>
 <Quantity(0.69, 'milligram / liter')>
 <Quantity(0.62, 'milligram / liter')>
 <Quantity(0.6, 'milligram / liter')>
 <Quantity(0.57, 'milligram / liter')>
 <Quantity(0.48986, 'milligram / liter')>
 <Quantity(0.60326, 'milligram / liter')>
 <Quantity(0.60368, 'milligram / liter')>
 <Quantity(0.6, 'milligram / liter')>
 <Quantity(0.77, 'milligram / liter')>
 <Quantity(0.81, 'milligram / liter')>
 <Quantity(0.57, 'milligram / liter')>
 <Quantity(0.84, 'milligram / liter')>
 <Quantity(0.86, 'milligram / liter')>
 <Quantity(0.34846, 'milligram / liter')>
 <Quantity(0.67, 'milligram / liter')>
 <Quantity(0.96, 'milligram / liter')>
 <Quantity(0.47642, 'milligram / liter')>
 <Quantity(0.6, 'milligram / liter')>
 <Quantity(0.48678, 'milligram / liter')>
 <Quantity(0.5, 'milligram / liter')>
 <Quantity(0.72, 'milligram / liter')>
 <Quantity(0.41, 'milligram / liter')>
 <Quantity(1.1, 'milligram / liter')>
 <Quantity(0.65548, 'milligram / liter')>
 <Quantity(0.3031, 'milligram / liter')>
 <Quantity(0.52738, 'milligram / liter')>
 <Quantity(0.27552, 'milligram / liter')>
 <Quantity(0.28634, 'milligram / liter')>
 <Quantity(0.5697, 'milligram / liter')>
 <Quantity(0.0007, 'milligram / liter')>
 <Quantity(0.0146, 'milligram / liter')>
 <Quantity(0.0008, 'milligram / liter')>
 <Quantity(0.0158, 'milligram / liter')>
 <Quantity(16.46, 'milligram / liter')>
 <Quantity(18.82, 'milligram / liter')>
 <Quantity(17.76, 'milligram / liter')>
 <Quantity(18.69, 'milligram / liter')>
 <Quantity(16.18, 'milligram / liter')>
 <Quantity(18.99, 'milligram / liter')>
 <Quantity(18.72, 'milligram / liter')>
 <Quantity(17.61, 'milligram / liter')>]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  df_out.loc[m_mask, self.out_col] = convert_unit_series(**params)
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/domains.py:277: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  sub_df[cols[2]] = sub_df[cols[2]].fillna(sub_df[cols[1]])  # new_fract
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/domains.py:277: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  sub_df[cols[2]] = sub_df[cols[2]].fillna(sub_df[cols[1]])  # new_fract
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/domains.py:277: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  sub_df[cols[2]] = sub_df[cols[2]].fillna(sub_df[cols[1]])  # new_fract
-Usable results-
count     109.000000
mean       26.920174
std       160.257726
min         0.000700
25%         0.410000
50%         0.629000
75%         1.120000
max      1630.000000
dtype: float64
Unusable results: 4
Usable results with inferred units: 0
Results outside threshold (0.0 to 988.4665321860789): 1
../_images/notebooks_Harmonize_Pensacola_Detailed_75_2.png
[52]:
cols = ['ResultMeasureValue', 'ResultMeasure/MeasureUnitCode', 'QA_flag', 'Nitrogen']
df.loc[df['CharacteristicName']=='Nitrogen', cols]
[52]:
ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag Nitrogen
56133 0.3 mg/L NaN 0.3 milligram / liter
56419 0.36 mg/L NaN 0.36 milligram / liter
56527 0.33875 mg/L NaN 0.33875 milligram / liter
56620 0.53125 mg/L NaN 0.53125 milligram / liter
57278 135 mg/kg NaN 135.00000000000003 milligram / liter
... ... ... ... ...
449577 18.69 mg/l NaN 18.69 milligram / liter
449584 16.18 mg/l NaN 16.18 milligram / liter
449585 18.99 mg/l NaN 18.99 milligram / liter
449588 18.72 mg/l NaN 18.72 milligram / liter
449590 17.61 mg/l NaN 17.61 milligram / liter

113 rows × 4 columns

Conductivity

[53]:
# Conductivity
df = harmonize.harmonize(df, 'Conductivity', report=True)
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wq_data.py:158: FutureWarning: unique with argument that is not not a Series, Index, ExtensionArray, or np.ndarray is deprecated and will raise in a future version.
  for bad_meas in pandas.unique(bad_measures):
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wq_data.py:663: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value '[<Quantity(19204.2, 'microsiemens / centimeter')>
 <Quantity(222.3, 'microsiemens / centimeter')>
 <Quantity(102.8, 'microsiemens / centimeter')> ...
 <Quantity(130.0, 'microsiemens / centimeter')>
 <Quantity(1000.0, 'microsiemens / centimeter')>
 <Quantity(650.0, 'microsiemens / centimeter')>]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  df_out.loc[m_mask, self.out_col] = convert_unit_series(**params)
-Usable results-
count     1818.000000
mean     17085.221414
std      16116.889030
min          0.040000
25%        130.000000
50%      16994.750000
75%      30306.650000
max      54886.200000
dtype: float64
Unusable results: 8
Usable results with inferred units: 0
Results outside threshold (0.0 to 113786.55559242623): 0
../_images/notebooks_Harmonize_Pensacola_Detailed_78_2.png
[54]:
cols = ['ResultMeasureValue', 'ResultMeasure/MeasureUnitCode', 'QA_flag', 'Conductivity']
df.loc[df['CharacteristicName']=='Conductivity', cols]
[54]:
ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag Conductivity
16 19204.2 umho/cm NaN 19204.2 microsiemens / centimeter
105 222.3 umho/cm NaN 222.3 microsiemens / centimeter
215 102.8 umho/cm NaN 102.8 microsiemens / centimeter
421 11017.5 umho/cm NaN 11017.5 microsiemens / centimeter
867 32 umho/cm NaN 32.0 microsiemens / centimeter
... ... ... ... ...
450148 110 umho/cm NaN 110.0 microsiemens / centimeter
450163 1200 umho/cm NaN 1200.0 microsiemens / centimeter
450167 130 umho/cm NaN 130.0 microsiemens / centimeter
450169 1000 umho/cm NaN 1000.0 microsiemens / centimeter
450175 650 umho/cm NaN 650.0 microsiemens / centimeter

1826 rows × 4 columns

Chlorophyll a

[55]:
# Chlorophyll a
df = harmonize.harmonize(df, 'Chlorophyll a', report=True)
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wq_data.py:158: FutureWarning: unique with argument that is not not a Series, Index, ExtensionArray, or np.ndarray is deprecated and will raise in a future version.
  for bad_meas in pandas.unique(bad_measures):
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wq_data.py:395: UserWarning: WARNING: 'ug/cm2' UNDEFINED UNIT for Chlorophyll
  warn("WARNING: " + problem)
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wq_data.py:663: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value '[<Quantity(0.0023, 'milligram / liter')>
 <Quantity(0.0029, 'milligram / liter')>
 <Quantity(0.0041, 'milligram / liter')> ...
 <Quantity(0.02, 'milligram / liter')>
 <Quantity(0.011, 'milligram / liter')>
 <Quantity(0.0013, 'milligram / liter')>]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  df_out.loc[m_mask, self.out_col] = convert_unit_series(**params)
-Usable results-
count    9408.000000
mean        1.152038
std         1.199480
min        -0.840000
25%         0.007699
50%         0.950000
75%         1.820000
max         9.990000
dtype: float64
Unusable results: 628
Usable results with inferred units: 6175
Results outside threshold (0.0 to 8.34891706714834): 8
../_images/notebooks_Harmonize_Pensacola_Detailed_81_2.png
[56]:
cols = ['ResultMeasureValue', 'ResultMeasure/MeasureUnitCode', 'QA_flag', 'Chlorophyll']
df.loc[df['CharacteristicName']=='Chlorophyll a', cols]
[56]:
ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag Chlorophyll
274 NaN NaN ResultMeasureValue: missing (NaN) result; Resu... NaN
608 NaN NaN ResultMeasureValue: missing (NaN) result; Resu... NaN
720 2.3 mg/m3 NaN 0.0023000000000000004 milligram / liter
1329 NaN NaN ResultMeasureValue: missing (NaN) result; Resu... NaN
1490 2.9 mg/m3 NaN 0.0029000000000000007 milligram / liter
... ... ... ... ...
449731 NaN NaN ResultMeasureValue: missing (NaN) result; Resu... NaN
449746 NaN NaN ResultMeasureValue: missing (NaN) result; Resu... NaN
449756 NaN NaN ResultMeasureValue: missing (NaN) result; Resu... NaN
450212 NaN NaN ResultMeasureValue: missing (NaN) result; Resu... NaN
450225 1.3 mg/m3 NaN 0.0013000000000000004 milligram / liter

10036 rows × 4 columns

Organic Carbon

[57]:
# Organic carbon (%)
df = harmonize.harmonize(df, 'Organic carbon', report=True)
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wq_data.py:158: FutureWarning: unique with argument that is not not a Series, Index, ExtensionArray, or np.ndarray is deprecated and will raise in a future version.
  for bad_meas in pandas.unique(bad_measures):
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wq_data.py:663: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value '[<Quantity(5.4, 'milligram / liter')> <Quantity(2.6, 'milligram / liter')>
 <Quantity(3.9, 'milligram / liter')> ...
 <Quantity(3.264, 'milligram / liter')>
 <Quantity(6.206, 'milligram / liter')>
 <Quantity(8.609, 'milligram / liter')>]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  df_out.loc[m_mask, self.out_col] = convert_unit_series(**params)
-Usable results-
count      4869.000000
mean       1128.337669
std       11539.141214
min           0.000000
25%           2.700000
50%           4.300000
75%           8.200000
max      410000.000000
dtype: float64
Unusable results: 165
Usable results with inferred units: 0
Results outside threshold (0.0 to 70363.18495043386): 22
../_images/notebooks_Harmonize_Pensacola_Detailed_84_2.png
[58]:
cols = ['ResultMeasureValue', 'ResultMeasure/MeasureUnitCode', 'QA_flag', 'Carbon']
df.loc[df['CharacteristicName']=='Organic carbon', cols]
[58]:
ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag Carbon
136 5.4 mg/L NaN 5.4 milligram / liter
138 2.6 mg/L NaN 2.6 milligram / liter
175 3.9 mg/L NaN 3.9 milligram / liter
233 5.2 mg/L NaN 5.2 milligram / liter
292 6.0 mg/L NaN 6.0 milligram / liter
... ... ... ... ...
449730 3.393 mg/L NaN 3.393 milligram / liter
449737 4.769 mg/L NaN 4.769 milligram / liter
449773 3.264 mg/L NaN 3.264 milligram / liter
450234 6.206 mg/L NaN 6.206 milligram / liter
450241 8.609 mg/L NaN 8.609 milligram / liter

5034 rows × 4 columns

Turbidity

[59]:
# Turbidity (NTU)
df = harmonize.harmonize(df, 'Turbidity', report=True)
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wq_data.py:158: FutureWarning: unique with argument that is not not a Series, Index, ExtensionArray, or np.ndarray is deprecated and will raise in a future version.
  for bad_meas in pandas.unique(bad_measures):
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wq_data.py:663: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value '[<Quantity(380.4523, 'Nephelometric_Turbidity_Units')>
 <Quantity(0.0, 'Nephelometric_Turbidity_Units')>
 <Quantity(190.2023, 'Nephelometric_Turbidity_Units')> ...
 <Quantity(2.4, 'Nephelometric_Turbidity_Units')>
 <Quantity(11.7, 'Nephelometric_Turbidity_Units')>
 <Quantity(7.1, 'Nephelometric_Turbidity_Units')>]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  df_out.loc[m_mask, self.out_col] = convert_unit_series(**params)
-Usable results-
count    46901.000000
mean        30.847085
std        207.701610
min         -0.840000
25%          1.600000
50%          3.000000
75%          7.800000
max      32342.452300
dtype: float64
Unusable results: 596
Usable results with inferred units: 10
Results outside threshold (0.0 to 1277.0567463601408): 65
../_images/notebooks_Harmonize_Pensacola_Detailed_87_2.png
[60]:
cols = ['ResultMeasureValue', 'ResultMeasure/MeasureUnitCode', 'QA_flag', 'Turbidity']
df.loc[df['CharacteristicName']=='Turbidity', cols]
[60]:
ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag Turbidity
20 20 JTU NaN 380.4523 Nephelometric_Turbidity_Units
32 0 NTU NaN 0.0 Nephelometric_Turbidity_Units
44 10 JTU NaN 190.2023 Nephelometric_Turbidity_Units
54 1.4 NTU NaN 1.4 Nephelometric_Turbidity_Units
60 4.7 NTU NaN 4.7 Nephelometric_Turbidity_Units
... ... ... ... ...
450154 1 NTU NaN 1.0 Nephelometric_Turbidity_Units
450160 2.5 NTU NaN 2.5 Nephelometric_Turbidity_Units
450177 2.4 NTU NaN 2.4 Nephelometric_Turbidity_Units
450229 11.7 NTU NaN 11.7 Nephelometric_Turbidity_Units
450243 7.1 NTU NaN 7.1 Nephelometric_Turbidity_Units

47497 rows × 4 columns

Sediment

[61]:
# Sediment
df = harmonize.harmonize(df, 'Sediment', report=False)
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wq_data.py:158: FutureWarning: unique with argument that is not not a Series, Index, ExtensionArray, or np.ndarray is deprecated and will raise in a future version.
  for bad_meas in pandas.unique(bad_measures):
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wq_data.py:663: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value '[]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  df_out.loc[m_mask, self.out_col] = convert_unit_series(**params)
[62]:
cols = ['ResultMeasureValue', 'ResultMeasure/MeasureUnitCode', 'QA_flag', 'Sediment']
df.loc[df['CharacteristicName']=='Sediment', cols]
[62]:
ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag Sediment

Phosphorus

Note: Phosphorus results must be merged with activities (if they are not already merged, the package queries activities by site).
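If you prefer to merge results with activities yourself before harmonizing, a left join on ActivityIdentifier is one option. This is a sketch under assumptions: that both tables share an ActivityIdentifier column and that each activity appears once in the activities table. Whether the package expects exactly this merged layout is not confirmed here, and the helper name is hypothetical.

```python
import pandas as pd


def merge_activities(results, activities, key='ActivityIdentifier'):
    """Left-join activity columns onto results; return (merged, n_unmatched)."""
    # Drop shared columns (other than the key) to avoid _x/_y suffixes
    shared = [c for c in activities.columns if c in results.columns and c != key]
    acts = activities.drop(columns=shared)
    merged = results.merge(acts, on=key, how='left',
                           validate='many_to_one', indicator=True)
    n_unmatched = int((merged['_merge'] == 'left_only').sum())
    return merged.drop(columns='_merge'), n_unmatched
```

The `validate='many_to_one'` check raises if an activity is duplicated, which would otherwise silently multiply result rows.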

[63]:
# Phosphorus
df = harmonize.harmonize(df, 'Phosphorus')
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wq_data.py:158: FutureWarning: unique with argument that is not not a Series, Index, ExtensionArray, or np.ndarray is deprecated and will raise in a future version.
  for bad_meas in pandas.unique(bad_measures):
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wq_data.py:663: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value '[<Quantity(0.061, 'milligram / liter')>
 <Quantity(0.03, 'milligram / liter')>
 <Quantity(0.13, 'milligram / liter')> ...
 <Quantity(0.42, 'milligram / liter')>
 <Quantity(0.18, 'milligram / liter')>
 <Quantity(0.041, 'milligram / liter')>]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  df_out.loc[m_mask, self.out_col] = convert_unit_series(**params)
2 Phosphorus sample fractions not in frac_dict
2 Phosphorus sample fractions not in frac_dict found in expected domains, mapped to "Other_Phosphorus"

Note the warnings for unexpected characteristic sample fractions. Each fraction is separated out into its own result column.
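The idea of one result column per sample fraction can be sketched in pandas. The mapping from ResultSampleFractionText values to TP_Phosphorus and TDP_Phosphorus below is a hypothetical stand-in for the package's frac_dict, which may differ. Unmapped fractions fall into Other_Phosphorus, mirroring the message above.

```python
import pandas as pd

# Hypothetical fraction-to-column mapping; the package's frac_dict may differ
FRACTION_COLUMNS = {
    'Total': 'TP_Phosphorus',
    'Dissolved': 'TDP_Phosphorus',
}


def split_by_fraction(df, value_col='Phosphorus',
                      frac_col='ResultSampleFractionText',
                      other_col='Other_Phosphorus'):
    """Copy each result into the column for its sample fraction; NaN elsewhere."""
    out = df.copy()
    target = out[frac_col].map(FRACTION_COLUMNS).fillna(other_col)
    for col in list(FRACTION_COLUMNS.values()) + [other_col]:
        out[col] = out[value_col].where(target == col)
    return out
```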

[64]:
# All Phosphorus
cols = ['ResultMeasureValue', 'ResultMeasure/MeasureUnitCode', 'QA_flag', 'TDP_Phosphorus']
df.loc[df['Phosphorus'].notna(), cols]
[64]:
ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag TDP_Phosphorus
43 .061 mg/L NaN NaN
76 0.03 mg/L NaN NaN
170 .13 mg/L NaN NaN
199 0.003 mg/L NaN NaN
350 0.002 mg/L NaN NaN
... ... ... ... ...
450165 .2 mg/L NaN NaN
450166 .3 mg/L NaN NaN
450168 .42 mg/L NaN NaN
450173 .18 mg/L NaN NaN
450222 .041 mg/L NaN NaN

6989 rows × 4 columns

[65]:
# Total phosphorus
df.loc[df['TP_Phosphorus'].notna(), cols]
[65]:
ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag TDP_Phosphorus
43 .061 mg/L NaN NaN
76 0.03 mg/L NaN NaN
170 .13 mg/L NaN NaN
199 0.003 mg/L NaN NaN
350 0.002 mg/L NaN NaN
... ... ... ... ...
449527 0.08 mg/l as P NaN NaN
449534 0.05 mg/l as P NaN NaN
449614 0.110 mg/l as P NaN NaN
449733 .053 mg/L NaN NaN
450222 .041 mg/L NaN NaN

6202 rows × 4 columns

[66]:
# Total dissolved phosphorus
df.loc[df['TDP_Phosphorus'].notna(), cols]
[66]:
ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag TDP_Phosphorus
4119 0.019 mg/L NaN 0.019 milligram / liter
9205 0.002 mg/L NaN 0.002 milligram / liter
15465 0.003 mg/L NaN 0.003 milligram / liter
18739 0.019 mg/L NaN 0.019 milligram / liter
49989 0.002 mg/L NaN 0.002 milligram / liter
52054 0.017 mg/L NaN 0.017 milligram / liter
66828 0.021 mg/L NaN 0.021 milligram / liter
69856 0.003 mg/L NaN 0.003 milligram / liter
77218 0.020 mg/L NaN 0.02 milligram / liter
83521 0.002 mg/L NaN 0.002 milligram / liter
185036 0.00806 mg/L NaN 0.00806 milligram / liter
189003 0.000031 mg/L NaN 3.1e-05 milligram / liter
190404 0.002542 mg/L NaN 0.002542 milligram / liter
191085 0.00341 mg/L NaN 0.00341 milligram / liter
234593 0.00372 mg/L NaN 0.00372 milligram / liter
236981 0.00961 mg/L NaN 0.00961 milligram / liter
238039 0.00124 mg/L NaN 0.00124 milligram / liter
239031 0.01271 mg/L NaN 0.01271 milligram / liter
448876 0.030 mg/l as P NaN 0.03 milligram / liter
448885 0.033 mg/l as P NaN 0.033 milligram / liter
448888 0.024 mg/l as P NaN 0.024 milligram / liter
448894 0.028 mg/l as P NaN 0.028 milligram / liter
448903 0.021 mg/l as P NaN 0.021 milligram / liter
448908 0.023 mg/l as P NaN 0.023 milligram / liter
448920 0.037 mg/l as P NaN 0.037 milligram / liter
448981 0.023 mg/l as P NaN 0.023 milligram / liter
448989 0.02 mg/l as P NaN 0.02 milligram / liter
449005 0.04 mg/l as P NaN 0.04 milligram / liter
449020 0.03 mg/l as P NaN 0.03 milligram / liter
449029 0.025 mg/l as P NaN 0.025 milligram / liter
449056 0.05 mg/l as P NaN 0.05 milligram / liter
449065 0.15 mg/l as P NaN 0.15 milligram / liter
449085 0.03 mg/l as P NaN 0.03 milligram / liter
449191 0.02 mg/l as P NaN 0.02 milligram / liter
449213 0.07 mg/l as P NaN 0.07 milligram / liter
449221 0.08 mg/l as P NaN 0.08 milligram / liter
449236 0.02 mg/l as P NaN 0.02 milligram / liter
449256 0.02 mg/l as P NaN 0.02 milligram / liter
449268 0.04 mg/l as P NaN 0.04 milligram / liter
449288 0.02 mg/l as P NaN 0.02 milligram / liter
449301 0.05 mg/l as P NaN 0.05 milligram / liter
449528 0.03 mg/l as P NaN 0.03 milligram / liter
449535 0.05 mg/l as P NaN 0.05 milligram / liter
[67]:
# All other phosphorus sample fractions
df.loc[df['Other_Phosphorus'].notna(), cols]
[67]:
ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag TDP_Phosphorus
27378 .5 mg/L NaN NaN
27463 .036 mg/L NaN NaN
28951 .089 mg/L NaN NaN
29993 .017 mg/L NaN NaN
31237 .035 mg/L NaN NaN
... ... ... ... ...
450147 .32 mg/L NaN NaN
450165 .2 mg/L NaN NaN
450166 .3 mg/L NaN NaN
450168 .42 mg/L NaN NaN
450173 .18 mg/L NaN NaN

744 rows × 4 columns

Bacteria

Some equivalence assumptions are built in: bacteria count units that are not strictly equivalent are treated as equivalent, because there is no standard way to convert from one to another.
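A minimal sketch of such an equivalence assumption: MPN and CFU per 100 mL are mapped 1:1 onto one canonical unit, leaving the value unchanged. The alias table and function name are hypothetical; the exact set of equivalences the package uses is not shown here.

```python
# Hypothetical alias table: count units treated as 1:1 equivalent
EQUIVALENT_COUNT_UNITS = {
    'cfu/100ml': 'CFU/100mL',
    'mpn/100ml': 'CFU/100mL',  # assumption: MPN treated as equal to CFU
}


def to_cfu_per_100ml(value, unit):
    """Return (value, canonical unit), or None if the unit is not recognized."""
    key = unit.strip().lower().replace(' ', '')
    canonical = EQUIVALENT_COUNT_UNITS.get(key)
    if canonical is None:
        return None
    return float(value), canonical  # no numeric conversion, only relabeling
```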

Fecal Coliform

[68]:
# Known unit with bad dimensionality ('Colony_Forming_Units * milliliter')
df = harmonize.harmonize(df, 'Fecal Coliform', report=True, errors='ignore')
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wq_data.py:158: FutureWarning: unique with argument that is not not a Series, Index, ExtensionArray, or np.ndarray is deprecated and will raise in a future version.
  for bad_meas in pandas.unique(bad_measures):
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/convert.py:128: UserWarning: WARNING: 'MPN/100mL' converted to NaN
  warn(f"WARNING: '{unit}' converted to NaN")
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/convert.py:128: UserWarning: WARNING: 'CFU/100mL' converted to NaN
  warn(f"WARNING: '{unit}' converted to NaN")
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/convert.py:128: UserWarning: WARNING: 'cfu/100mL' converted to NaN
  warn(f"WARNING: '{unit}' converted to NaN")
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wq_data.py:663: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value '[nan nan nan ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  df_out.loc[m_mask, self.out_col] = convert_unit_series(**params)
-Usable results-
count    10035.000000
mean        45.537618
std        448.839329
min          0.000000
25%          4.000000
50%          8.000000
75%         33.000000
max      33000.000000
dtype: float64
Unusable results: 40571
Usable results with inferred units: 0
Results outside threshold (0.0 to 2738.5735941387825): 6
../_images/notebooks_Harmonize_Pensacola_Detailed_103_2.png
[69]:
cols = ['ResultMeasureValue', 'ResultMeasure/MeasureUnitCode', 'QA_flag', 'Fecal_Coliform']
df.loc[df['CharacteristicName']=='Fecal Coliform', cols]
[69]:
ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag Fecal_Coliform
1 *Non-detect NaN ResultMeasureValue: "*Non-detect" result canno... NaN
5 80 cfu/100mL NaN NaN
9 *Non-detect NaN ResultMeasureValue: "*Non-detect" result canno... NaN
10 2 MPN/100mL NaN NaN
13 *Non-detect NaN ResultMeasureValue: "*Non-detect" result canno... NaN
... ... ... ... ...
450083 194 cfu/100mL NaN NaN
450089 226 cfu/100mL NaN NaN
450102 145 cfu/100mL NaN NaN
450128 317 cfu/100mL NaN NaN
450150 60 cfu/100mL NaN NaN

50606 rows × 4 columns

Escherichia coli

[70]:
# Known unit with bad dimensionality ('Colony_Forming_Units * milliliter')
df = harmonize.harmonize(df, 'Escherichia coli', report=True, errors='ignore')
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wq_data.py:158: FutureWarning: unique with argument that is not not a Series, Index, ExtensionArray, or np.ndarray is deprecated and will raise in a future version.
  for bad_meas in pandas.unique(bad_measures):
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/convert.py:128: UserWarning: WARNING: 'MPN/100mL' converted to NaN
  warn(f"WARNING: '{unit}' converted to NaN")
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/convert.py:128: UserWarning: WARNING: 'CFU/100mL' converted to NaN
  warn(f"WARNING: '{unit}' converted to NaN")
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/convert.py:128: UserWarning: WARNING: 'cfu/100mL' converted to NaN
  warn(f"WARNING: '{unit}' converted to NaN")
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wq_data.py:663: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value '[nan nan nan ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  df_out.loc[m_mask, self.out_col] = convert_unit_series(**params)
-Usable results-
count      22.000000
mean      501.863636
std       610.053260
min         4.000000
25%         9.500000
50%        77.500000
75%      1000.000000
max      1700.000000
dtype: float64
Unusable results: 8967
Usable results with inferred units: 0
Results outside threshold (0.0 to 4162.183198738116): 0
[Figure: report plot of the usable Escherichia coli results]
[71]:
cols = ['ResultMeasureValue', 'ResultMeasure/MeasureUnitCode', 'QA_flag', 'E_coli']
df.loc[df['CharacteristicName']=='Escherichia coli', cols]
[71]:
ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag E_coli
25 0 cfu/100mL NaN NaN
73 1000 cfu/100mL NaN NaN
79 33.3333333333333 cfu/100mL NaN NaN
94 0 cfu/100mL NaN NaN
155 0 cfu/100mL NaN NaN
... ... ... ... ...
449750 100 MPN/100mL NaN NaN
449753 580 MPN/100mL NaN NaN
449775 82 MPN/100mL NaN NaN
450217 15 MPN/100mL NaN NaN
450235 82 MPN/100mL NaN NaN

8989 rows × 4 columns
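The report shows 8967 unusable E. coli results, and the warnings show that 'MPN/100mL', 'CFU/100mL' and 'cfu/100mL' were each converted to NaN. Tallying the raw unit codes for one characteristic shows which units account for the unusable rows. Below is a minimal pandas sketch. `unit_counts` is a hypothetical helper, not part of harmonize_wq, and the toy frame stands in for `df`. Whether MPN and CFU counts are comparable enough to combine is a decision for domain experts.

```python
import pandas as pd


def unit_counts(df, characteristic, unit_col="ResultMeasure/MeasureUnitCode"):
    """Count raw unit codes for one CharacteristicName (hypothetical helper)."""
    units = df.loc[df["CharacteristicName"] == characteristic, unit_col]
    # dropna=False keeps rows with no unit code visible in the tally
    return units.value_counts(dropna=False)


# Toy stand-in for the WQP results table
toy = pd.DataFrame({
    "CharacteristicName": ["Escherichia coli"] * 4 + ["Fecal Coliform"],
    "ResultMeasure/MeasureUnitCode": [
        "cfu/100mL", "MPN/100mL", "cfu/100mL", "CFU/100mL", "cfu/100mL",
    ],
})
print(unit_counts(toy, "Escherichia coli"))
```

On the notebook data this would be called as `unit_counts(df, 'Escherichia coli')`.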

Combining Salinity and Conductivity

The convert module has functions for converting from one unit or characteristic to another. Some are used within a single characteristic during harmonization (e.g., DO saturation to concentration), while others model one characteristic as an indicator of another (e.g., estimating salinity from conductivity).

Note: do this only after both characteristic fields have been harmonized. Inspect results before and after conversion, apply outlier thresholds, and consider adding a QA_flag for modeled data.
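One way to add a QA_flag for modeled rows is to append a message to any flag already present. This is a minimal sketch. `append_qa_flag` is a hypothetical helper, and the '; ' separator is an assumption, not a documented harmonize_wq convention. In the notebook, the mask would be `df['Conductivity'].notna()`, taken before Salinity is overwritten.

```python
import pandas as pd


def append_qa_flag(df, mask, message, flag_col="QA_flag"):
    """Append `message` to flag_col on rows selected by the boolean `mask`.

    Hypothetical helper: the '; ' separator is an assumed convention.
    """
    out = df.copy()
    # object dtype so strings can be written into an all-NaN column
    out[flag_col] = out[flag_col].astype(object)
    existing = out.loc[mask, flag_col]
    combined = existing.astype(str) + "; " + message
    # keep existing flags and append; use the bare message where there was none
    out.loc[mask, flag_col] = combined.where(existing.notna(), message)
    return out


toy = pd.DataFrame({
    "Conductivity": [100.0, None, 200.0],
    "QA_flag": [None, None, "unit inferred"],
})
flagged = append_qa_flag(toy, toy["Conductivity"].notna(),
                         "Salinity estimated from Conductivity")
print(flagged)
```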

Explore Salinity results:

[72]:
from harmonize_wq import convert
[73]:
# Salinity summary statistics
lst = [x.magnitude for x in list(df['Salinity'].dropna())]
q_sum = sum(lst)
print('Range: {} to {}'.format(min(lst), max(lst)))
print('Results: {} \nMean: {} PSU'.format(len(lst), q_sum/len(lst)))
Range: 0.0 to 37782.0
Results: 77267
Mean: 15.807838285426346 PSU
[74]:
# Identify extreme outliers
[x for x in lst if x >3200]
[74]:
[15030.0, 37782.0]
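The mean above includes both extreme values. Recomputing the summary with and without values above a cutoff shows how much they move the statistics. This is a stdlib sketch. `summarize` is a hypothetical helper, `values` would be `lst` in the notebook, and the cutoff is the analyst's choice (3200 simply mirrors the cell above).

```python
from statistics import mean


def summarize(values, upper=None):
    """Return (count, min, max, mean), optionally dropping values above `upper`."""
    kept = [v for v in values if upper is None or v <= upper]
    return len(kept), min(kept), max(kept), mean(kept)


values = [0.0, 10.0, 20.0, 15030.0, 37782.0]
print(summarize(values))               # all values
print(summarize(values, upper=3200))   # extreme outliers excluded
```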

Other fields, like units and QA_flag, may help explain what caused the high values and which results might need to be dropped from consideration.

[75]:
# Columns to focus on
cols = ['ResultMeasureValue', 'ResultMeasure/MeasureUnitCode', 'QA_flag', 'Salinity']
[76]:
# Look at important fields for the 5 largest values
salinity_series = df['Salinity'][df['Salinity'].notna()]
salinity_series.sort_values(ascending=False, inplace=True)
df[cols][df['Salinity'].isin(salinity_series[0:5])]
[76]:
ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag Salinity
13952 15030 ppt NaN 15030.0 Practical_Salinity_Units
23980 322 ppth NaN 322.0 Practical_Salinity_Units
55600 2150 ppth NaN 2150.0 Practical_Salinity_Units
117644 37782 ppth NaN 37782.0 Practical_Salinity_Units
151116 2190 ppt NaN 2190.0 Practical_Salinity_Units

Detection limits may help explain what caused low values and which results might need to be dropped or updated.

[77]:
from harmonize_wq import wrangle
[78]:
df = wrangle.add_detection(df, 'Salinity')
cols+=['ResultDetectionConditionText',
       'DetectionQuantitationLimitTypeName',
       'DetectionQuantitationLimitMeasure/MeasureValue',
       'DetectionQuantitationLimitMeasure/MeasureUnitCode']
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/harmonize_wq/wrangle.py:501: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  detection_df = pandas.concat(detection_list).drop_duplicates()
[79]:
# Look at important fields for the 5 smallest values (isin also returns every row tied at 0.0)
df[cols][df['Salinity'].isin(salinity_series[-5:])]
[79]:
ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag Salinity ResultDetectionConditionText DetectionQuantitationLimitTypeName DetectionQuantitationLimitMeasure/MeasureValue DetectionQuantitationLimitMeasure/MeasureUnitCode
1303 0 ppt NaN 0.0 Practical_Salinity_Units NaN NaN NaN NaN
2416 0.00 ppth NaN 0.0 Practical_Salinity_Units NaN NaN NaN NaN
3902 0 ppt NaN 0.0 Practical_Salinity_Units NaN NaN NaN NaN
4257 0.00 ppth NaN 0.0 Practical_Salinity_Units NaN NaN NaN NaN
4475 0 ppt NaN 0.0 Practical_Salinity_Units NaN NaN NaN NaN
... ... ... ... ... ... ... ... ...
447419 0 PSS NaN 0.0 Practical_Salinity_Units NaN NaN NaN NaN
447430 0 PSS NaN 0.0 Practical_Salinity_Units NaN NaN NaN NaN
447568 0 PSS NaN 0.0 Practical_Salinity_Units NaN NaN NaN NaN
447601 0 PSS NaN 0.0 Practical_Salinity_Units NaN NaN NaN NaN
450224 0 ppt NaN 0.0 Practical_Salinity_Units NaN NaN NaN NaN

3050 rows × 8 columns
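Filtering with `isin` on the five smallest values returns every row tied at 0.0, which is why the table above has 3050 rows. Selecting by index label instead returns exactly n rows. This is a minimal sketch. It assumes the magnitudes were first pulled out of the pint quantities (in the notebook, e.g. `df['Salinity'].dropna().apply(lambda q: q.magnitude)`). `extreme_rows` is a hypothetical helper.

```python
import pandas as pd


def extreme_rows(df, magnitudes, n=5, largest=True):
    """Return the rows of `df` holding the n largest (or smallest) magnitudes.

    `magnitudes` is a float Series sharing df's index. Ties beyond n are cut
    off (keep='first'), unlike filtering with Series.isin.
    """
    picked = magnitudes.nlargest(n) if largest else magnitudes.nsmallest(n)
    return df.loc[picked.index]


toy = pd.DataFrame({"Salinity": [0.0] * 6 + [36.0, 15030.0]})
mags = toy["Salinity"]
print(extreme_rows(toy, mags, n=5, largest=False))          # 5 rows, not 6
print(len(toy[mags.isin(mags.nsmallest(5))]))               # isin keeps all ties
```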

Explore Conductivity results:

[80]:
# Create series and inspect Conductivity values
cond_series = df['Conductivity'].dropna()
cond_series
[80]:
16        19204.2 microsiemens / centimeter
105         222.3 microsiemens / centimeter
215         102.8 microsiemens / centimeter
421       11017.5 microsiemens / centimeter
867          32.0 microsiemens / centimeter
                        ...
450148      110.0 microsiemens / centimeter
450163     1200.0 microsiemens / centimeter
450167      130.0 microsiemens / centimeter
450169     1000.0 microsiemens / centimeter
450175      650.0 microsiemens / centimeter
Name: Conductivity, Length: 1818, dtype: object

Conductivity thresholds from Freshwater Explorer are 10 < x < 5000 µS/cm; use a higher upper threshold for coastal waters.
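A range check like this can be written as a boolean mask with both bounds exclusive. This is a minimal sketch that assumes magnitudes in µS/cm have already been extracted (e.g. `cond_series.apply(lambda q: q.magnitude)`). The 60000 µS/cm coastal upper bound is a hypothetical value chosen only to sit above the seawater readings in this dataset, not a published threshold.

```python
import pandas as pd


def within_range(values, lower=10.0, upper=5000.0):
    """Boolean mask for lower < value < upper (both bounds exclusive)."""
    return values.between(lower, upper, inclusive="neither")


cond = pd.Series([0.04, 6.8, 222.3, 11017.5, 54886.2])
print(within_range(cond).tolist())                 # freshwater bounds
print(within_range(cond, upper=60000.0).tolist())  # hypothetical coastal bound
```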

[81]:
# Sort and check other relevant columns before converting (e.g. Salinity)
cols = ['ResultMeasureValue', 'ResultMeasure/MeasureUnitCode', 'QA_flag', 'Salinity', 'Conductivity']
df.sort_values(by=['Conductivity'], ascending=False, inplace=True)
df.loc[df['Conductivity'].notna(), cols]
[81]:
ResultMeasureValue ResultMeasure/MeasureUnitCode QA_flag Salinity Conductivity
136312 54886.2 umho/cm NaN NaN 54886.2 microsiemens / centimeter
139958 54871.3 umho/cm NaN NaN 54871.3 microsiemens / centimeter
132608 54860.6 umho/cm NaN NaN 54860.6 microsiemens / centimeter
138586 54859.3 umho/cm NaN NaN 54859.3 microsiemens / centimeter
131921 54850.8 umho/cm NaN NaN 54850.8 microsiemens / centimeter
... ... ... ... ... ...
92115 6.8 umho/cm NaN NaN 6.8 microsiemens / centimeter
65711 2 umho/cm NaN NaN 2.0 microsiemens / centimeter
151249 2 umho/cm NaN NaN 2.0 microsiemens / centimeter
40338 1 umho/cm NaN NaN 1.0 microsiemens / centimeter
152604 .04 umho/cm NaN NaN 0.04 microsiemens / centimeter

1818 rows × 5 columns

[83]:
# Convert values to PSU and write to Salinity
cond_series = cond_series.apply(str)  # conductivity_to_PSU expects a string (value and unit); it returns a dimensionless (PSU) quantity
df.loc[df['Conductivity'].notna(), 'Salinity'] = cond_series.apply(convert.conductivity_to_PSU)
df.loc[df['Conductivity'].notna(), 'Salinity']
[83]:
136312    36.356 dimensionless
139958    36.345 dimensionless
132608    36.338 dimensionless
138586    36.336 dimensionless
131921     36.33 dimensionless
                  ...
92115      0.013 dimensionless
65711      0.012 dimensionless
151249     0.012 dimensionless
40338      0.012 dimensionless
152604     0.012 dimensionless
Name: Salinity, Length: 1818, dtype: object
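The converted values above are consistent with the standard Practical Salinity Scale 1978 (PSS-78) equations evaluated at 25 °C and zero pressure. For example, 54886.2 µS/cm gives about 36.356. The sketch below implements those published equations for reference. It is not the harmonize_wq source, and the 25 °C default is an assumption inferred from matching the outputs. PSS-78 is defined for salinities of 2 to 42, so very low estimates such as 0.012 fall outside its nominal range.

```python
import math

# PSS-78 coefficients (UNESCO 1983)
A = (0.0080, -0.1692, 25.3851, 14.0941, -7.0261, 2.7081)
B = (0.0005, -0.0056, -0.0066, -0.0375, 0.0636, -0.0144)
C = (0.6766097, 2.00564e-2, 1.104259e-4, -6.9698e-7, 1.0031e-9)
K = 0.0162
C_STANDARD = 42914.0  # uS/cm: conductivity of S=35 seawater at 15 degC, 0 dbar


def pss78_salinity(cond_us_cm, temp_c=25.0):
    """Practical salinity from conductivity (uS/cm) at zero pressure."""
    r = cond_us_cm / C_STANDARD
    r_temp = sum(c * temp_c ** i for i, c in enumerate(C))
    rt = r / r_temp  # pressure correction Rp = 1 at zero pressure
    root = math.sqrt(rt)
    # S = sum a_i * Rt^(i/2) plus the temperature correction term
    s = sum(a * root ** i for i, a in enumerate(A))
    dt = temp_c - 15.0
    ds = dt / (1 + K * dt) * sum(b * root ** i for i, b in enumerate(B))
    return s + ds


print(round(pss78_salinity(54886.2), 3))
```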

Datetime

clean.datetime() formats the ActivityStart date, time, and time zone columns using dataretrieval.

[84]:
# First inspect the existing unformatted fields
cols = ['ActivityStartDate', 'ActivityStartTime/Time', 'ActivityStartTime/TimeZoneCode']
df[cols]
[84]:
ActivityStartDate ActivityStartTime/Time ActivityStartTime/TimeZoneCode
136312 2007-08-09 12:15:00 CST
139958 2007-08-09 12:15:00 CST
132608 2007-08-09 12:15:00 CST
138586 2007-08-09 12:15:00 CST
131921 2007-08-09 12:15:00 CST
... ... ... ...
450314 1999-03-02 14:20:00 CST
450315 2001-11-28 12:05:00 CST
450316 2001-10-03 16:40:00 CDT
450317 2001-11-28 13:45:00 CST
450318 2001-10-03 14:15:00 CDT

450319 rows × 3 columns

[85]:
from harmonize_wq import clean
# 'ActivityStartDate' preserves the date where 'Activity_datetime' is NaT because the time zone is missing
df = clean.datetime(df)
df[['ActivityStartDate', 'Activity_datetime']]
[85]:
ActivityStartDate Activity_datetime
136312 2007-08-09 2007-08-09 18:15:00+00:00
139958 2007-08-09 2007-08-09 18:15:00+00:00
132608 2007-08-09 2007-08-09 18:15:00+00:00
138586 2007-08-09 2007-08-09 18:15:00+00:00
131921 2007-08-09 2007-08-09 18:15:00+00:00
... ... ...
450314 1999-03-02 1999-03-02 20:20:00+00:00
450315 2001-11-28 2001-11-28 18:05:00+00:00
450316 2001-10-03 2001-10-03 21:40:00+00:00
450317 2001-11-28 2001-11-28 19:45:00+00:00
450318 2001-10-03 2001-10-03 19:15:00+00:00

450319 rows × 2 columns

Activity_datetime combines all three time component columns into a single UTC timestamp. If the time (or its time zone) is missing, Activity_datetime is NaT, so the ActivityStartDate column is kept to preserve the date.
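The conversion can be sketched with the standard library: parse the local date and time, attach a fixed offset for the zone code, and convert to UTC. This reproduces the rows above (12:15 CST gives 18:15 UTC; 16:40 CDT gives 21:40 UTC). It is not dataretrieval's implementation. `TZ_OFFSETS` is a hypothetical table covering only a few US zone codes, and `None` plays the role of NaT.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical offset table (hours relative to UTC) for a few US zone codes
TZ_OFFSETS = {"EST": -5, "EDT": -4, "CST": -6, "CDT": -5,
              "MST": -7, "MDT": -6, "PST": -8, "PDT": -7, "UTC": 0}


def to_utc(date_str, time_str, tz_code):
    """Combine date, time and zone code into a UTC datetime.

    Returns None (the analogue of NaT) when the time or zone is missing or unknown.
    """
    if not time_str or tz_code not in TZ_OFFSETS:
        return None
    local = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S")
    tz = timezone(timedelta(hours=TZ_OFFSETS[tz_code]))
    return local.replace(tzinfo=tz).astimezone(timezone.utc)


print(to_utc("2007-08-09", "12:15:00", "CST"))
print(to_utc("2001-10-03", "16:40:00", None))
```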

Depth

Note: data often lack sample depth metadata.

[86]:
# Depth of sample (default units='meter')
df = clean.harmonize_depth(df)
# To view the harmonized values instead: df.loc[df['ResultDepthHeightMeasure/MeasureValue'].notna(), 'Depth']
df['ResultDepthHeightMeasure/MeasureValue'].dropna()
[86]:
1752       7.0
4102       7.0
66739      0.1
66823      2.2
66959      2.0
          ...
76868      2.2
93979      1.0
94553     16.0
95021     16.0
150617    35.0
Name: ResultDepthHeightMeasure/MeasureValue, Length: 179, dtype: float64
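Only 179 of the 450319 rows report a depth value. Harmonizing depth amounts to converting each value from its reported unit to the default meters. The sketch below illustrates that idea in plain Python. It is not how harmonize_depth is implemented, and `TO_METERS` is a hypothetical table of unit codes.

```python
# Hypothetical conversion table: unit code -> meters per unit
TO_METERS = {"m": 1.0, "cm": 0.01, "ft": 0.3048, "in": 0.0254}


def depth_in_meters(value, unit):
    """Convert a depth to meters; None if the value or unit is unusable."""
    if value is None or unit not in TO_METERS:
        return None
    return value * TO_METERS[unit]


print(depth_in_meters(16.0, "ft"))
print(depth_in_meters(2.0, "fathoms"))  # unknown unit -> None
```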

Characteristic to Column (long to wide format)

[87]:
# Split the single QA_flag column into one QA column per characteristic (assign to df2 so df keeps the original QA_flag)
df2 = wrangle.split_col(df)
df2
[87]:
OrganizationIdentifier OrganizationFormalName ActivityIdentifier ActivityStartDate ActivityStartTime/Time ActivityStartTime/TimeZoneCode MonitoringLocationIdentifier ResultIdentifier DataLoggerLine ResultDetectionConditionText ... QA_E_coli QA_Conductivity QA_Temperature QA_Carbon QA_Salinity QA_TP_Phosphorus QA_TDP_Phosphorus QA_Other_Phosphorus QA_Nitrogen QA_Secchi
136312 21AWIC ALABAMA DEPT. OF ENVIRONMENTAL MANAGEMENT - WA... 21AWIC-51908_230231_173 2007-08-09 12:15:00 CST 21AWIC-1122 STORET-170383613 230231.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
139958 21AWIC ALABAMA DEPT. OF ENVIRONMENTAL MANAGEMENT - WA... 21AWIC-51908_230230_173 2007-08-09 12:15:00 CST 21AWIC-1122 STORET-170383607 230230.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
132608 21AWIC ALABAMA DEPT. OF ENVIRONMENTAL MANAGEMENT - WA... 21AWIC-51908_230228_173 2007-08-09 12:15:00 CST 21AWIC-1122 STORET-170383595 230228.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
138586 21AWIC ALABAMA DEPT. OF ENVIRONMENTAL MANAGEMENT - WA... 21AWIC-51908_230229_173 2007-08-09 12:15:00 CST 21AWIC-1122 STORET-170383601 230229.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
131921 21AWIC ALABAMA DEPT. OF ENVIRONMENTAL MANAGEMENT - WA... 21AWIC-51908_230227_173 2007-08-09 12:15:00 CST 21AWIC-1122 STORET-170383589 230227.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
450314 USGS-AL USGS Alabama Water Science Center nwisal.01.99900500 1999-03-02 14:20:00 CST USGS-02376115 NWIS-104002666 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
450315 USGS-AL USGS Alabama Water Science Center nwisal.01.00201479 2001-11-28 12:05:00 CST USGS-02377570 NWIS-53918846 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
450316 USGS-AL USGS Alabama Water Science Center nwisal.01.00202076 2001-10-03 16:40:00 CDT USGS-02376115 NWIS-104000948 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
450317 USGS-AL USGS Alabama Water Science Center nwisal.01.00202072 2001-11-28 13:45:00 CST USGS-02376115 NWIS-104000936 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
450318 USGS-AL USGS Alabama Water Science Center nwisal.01.00201474 2001-10-03 14:15:00 CDT USGS-02377570 NWIS-53918826 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

398510 rows × 117 columns

[88]:
# This expands the single QA_flag column into new columns based on the unique CharacteristicName values and speciation
print('{} new columns'.format(len(df2.columns) - len(df.columns)))
14 new columns
[89]:
# Note: there are fewer rows because rows with NaN results are also dropped in this step
print('{} fewer rows'.format(len(df)-len(df2)))
51809 fewer rows
[90]:
# Examine Carbon flags from earlier in the notebook (empty now because rows with NaN results were dropped)
cols = ['ResultMeasureValue', 'ResultMeasure/MeasureUnitCode', 'Carbon', 'QA_Carbon']
df2.loc[df2['QA_Carbon'].notna(), cols]
[90]:
ResultMeasureValue ResultMeasure/MeasureUnitCode Carbon QA_Carbon
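The effect of this step can be sketched in pandas. Each row's flag moves to a QA_<column> for its characteristic, and rows whose results are all missing are dropped. Their flags go with them, which is why the Carbon flags above are empty. This is a simplified illustration, not wrangle.split_col's implementation. `char_to_col` is a hypothetical CharacteristicName-to-column mapping.

```python
import pandas as pd


def split_qa_column(df, char_to_col, flag_col="QA_flag"):
    """Spread one QA column into QA_<col> columns, one per result column.

    Rows whose result columns are all missing are dropped, flags included.
    """
    out = df.copy()
    target = out["CharacteristicName"].map(char_to_col)
    result_cols = sorted(set(char_to_col.values()))
    for col in result_cols:
        # keep the flag only on rows belonging to this characteristic
        out["QA_" + col] = out[flag_col].where(target == col)
    out = out.drop(columns=flag_col)
    return out.dropna(subset=result_cols, how="all")


toy = pd.DataFrame({
    "CharacteristicName": ["Salinity", "Salinity", "Escherichia coli"],
    "Salinity": [36.0, None, None],
    "E_coli": [None, None, 5.0],
    "QA_flag": [None, "unit error", "estimated"],
})
wide = split_qa_column(toy, {"Salinity": "Salinity", "Escherichia coli": "E_coli"})
print(wide)
```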

Next, the table is split into the columns of interest (main_df) and characteristic-specific metadata (chars_df).

[91]:
# split table into main and characteristics tables
main_df, chars_df = wrangle.split_table(df2)
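Conceptually this is a partition of the columns that keeps the shared row index, so the two tables can be rejoined later. The sketch below shows only that idea. `split_columns` and `main_cols` are hypothetical, and which columns wrangle.split_table treats as main is defined by the package, not by this sketch.

```python
import pandas as pd


def split_columns(df, main_cols):
    """Split df into (main columns, remaining columns), keeping the index."""
    return df[main_cols], df.drop(columns=main_cols)


toy = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
main, chars = split_columns(toy, ["a"])
print(main.join(chars))  # shared index lets the pieces be rejoined
```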
[92]:
# Columns still in main table
main_df.columns
[92]:
Index(['OrganizationIdentifier', 'OrganizationFormalName',
       'ActivityIdentifier', 'MonitoringLocationIdentifier', 'ProviderName',
       'Secchi', 'Temperature', 'DO', 'pH', 'Salinity', 'Nitrogen',
       'Speciation', 'TOTAL NITROGEN_ MIXED FORMS', 'Conductivity',
       'Chlorophyll', 'Carbon', 'Turbidity', 'Sediment', 'Phosphorus',
       'TP_Phosphorus', 'TDP_Phosphorus', 'Other_Phosphorus', 'Fecal_Coliform',
       'E_coli', 'DetectionQuantitationLimitTypeName',
       'DetectionQuantitationLimitMeasure/MeasureValue',
       'DetectionQuantitationLimitMeasure/MeasureUnitCode',
       'Activity_datetime', 'Depth', 'QA_pH', 'QA_Chlorophyll',
       'QA_Fecal_Coliform', 'QA_Turbidity', 'QA_DO', 'QA_E_coli',
       'QA_Conductivity', 'QA_Temperature', 'QA_Carbon', 'QA_Salinity',
       'QA_TP_Phosphorus', 'QA_TDP_Phosphorus', 'QA_Other_Phosphorus',
       'QA_Nitrogen', 'QA_Secchi'],
      dtype='object')
[93]:
# look at main table results (first 5)
main_df.head()
[93]:
OrganizationIdentifier OrganizationFormalName ActivityIdentifier MonitoringLocationIdentifier ProviderName Secchi Temperature DO pH Salinity ... QA_E_coli QA_Conductivity QA_Temperature QA_Carbon QA_Salinity QA_TP_Phosphorus QA_TDP_Phosphorus QA_Other_Phosphorus QA_Nitrogen QA_Secchi
136312 21AWIC ALABAMA DEPT. OF ENVIRONMENTAL MANAGEMENT - WA... 21AWIC-51908_230231_173 21AWIC-1122 STORET NaN NaN NaN NaN 36.356 dimensionless ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
139958 21AWIC ALABAMA DEPT. OF ENVIRONMENTAL MANAGEMENT - WA... 21AWIC-51908_230230_173 21AWIC-1122 STORET NaN NaN NaN NaN 36.345 dimensionless ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
132608 21AWIC ALABAMA DEPT. OF ENVIRONMENTAL MANAGEMENT - WA... 21AWIC-51908_230228_173 21AWIC-1122 STORET NaN NaN NaN NaN 36.338 dimensionless ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
138586 21AWIC ALABAMA DEPT. OF ENVIRONMENTAL MANAGEMENT - WA... 21AWIC-51908_230229_173 21AWIC-1122 STORET NaN NaN NaN NaN 36.336 dimensionless ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
131921 21AWIC ALABAMA DEPT. OF ENVIRONMENTAL MANAGEMENT - WA... 21AWIC-51908_230227_173 21AWIC-1122 STORET NaN NaN NaN NaN 36.33 dimensionless ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 44 columns

[94]:
# Empty columns that could be dropped (Mostly QA columns)
cols = list(main_df.columns)
x = main_df.dropna(axis=1, how='all')
[col for col in cols if col not in x.columns]
[94]:
['Sediment',
 'QA_Fecal_Coliform',
 'QA_E_coli',
 'QA_Conductivity',
 'QA_Carbon',
 'QA_TP_Phosphorus',
 'QA_TDP_Phosphorus',
 'QA_Other_Phosphorus',
 'QA_Secchi']
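The same list can be obtained directly from a column-wise `isna().all()`. Dropping those columns is equivalent to `main_df.dropna(axis=1, how='all')`. `all_empty_columns` is a hypothetical helper.

```python
import pandas as pd


def all_empty_columns(df):
    """List the columns in which every value is missing."""
    empty = df.isna().all()
    return empty[empty].index.tolist()


toy = pd.DataFrame({"a": [1.0, None], "b": [None, None], "QA_x": [None, None]})
print(all_empty_columns(toy))
```

Usage on the notebook table: `main_df.drop(columns=all_empty_columns(main_df))`.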
[95]:
# Map average results at each station
gdf_avg = visualize.map_measure(main_df, stations_clipped, 'Temperature')
gdf_avg.plot(column='mean', cmap='OrRd', legend=True)
[95]:
<Axes: >
[Figure: map of mean Temperature at each station]
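The plot colors stations by the 'mean' column returned by visualize.map_measure. The per-station aggregation step can be sketched with a pandas groupby. This sketch assumes Temperature magnitudes were first extracted from the pint quantities (e.g. `.apply(lambda q: q.magnitude)`). It omits the join to the station geometries in `stations_clipped`, and it is not the package's implementation.

```python
import pandas as pd


def station_summary(df, value_col, station_col="MonitoringLocationIdentifier"):
    """Count and mean of a numeric column per station, ignoring missing values."""
    values = df[[station_col, value_col]].dropna()
    return values.groupby(station_col)[value_col].agg(["count", "mean"])


toy = pd.DataFrame({
    "MonitoringLocationIdentifier": ["A", "A", "B", "B"],
    "Temperature": [20.0, 22.0, None, 25.0],
})
print(station_summary(toy, "Temperature"))
```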