Data Analysis Methodology

The provisions of sections 305(b) and 303(d) of the Clean Waters Act (CWA) require the TCEQ to identify water bodies in the state that do not satisfy the uses or meet the numerical criteria (parameter concentrations) defined in the Texas Surface Water Quality Standards (TSWQS). The TCEQ provides a report to the EPA (the Texas Integrated Report of Surface Water Quality, hereinafter referred to as the IR), every two years. The report includes identification of impaired waters (the 303(d) List), a list of water bodies evaluated, identification of water bodies either newly listed or removed from the 303(d) List, and other supporting information. References to impairments and concerns in the Basin Summary Report are based on information contained in the 2014 IR.

TCEQ has divided the water bodies of the state into distinct segments that generally represent natural watersheds. Each water body is given a number that indicates the river basin and segment. If Appendix A of the TSWQS (found in the Texas Administrative Code [TAC], Title 30, §307) specifies use and numerical criteria for the segment, it is considered a classified segment. Tributaries that flow into a classified segment are termed unclassified water bodies, and are generally assessed against the standards that apply to the classified segment. An exception would be water bodies that are listed in Appendix D of the TSWQS.

For rivers and streams, each segment is subdivided into hydrologically-distinct units referred to as an assessment unit (AU). The endpoints of an AU are typically at hydrologically distinct areas along a segment. Each segment may be divided into more than one AU, and each AU may contain more than one monitoring station. Ideally, an AU should be no more than 25 miles in length, and the monitoring station should be located in an area that is hydrologically representative of the AU. Larger bodies of water like reservoirs, bays, and estuaries, will generally have at least one AU that is representative of the central area of the water body, and one AU for each tributary arm feeding into it. The determination of standards attainment or non-attainment is based on water quality conditions in the AU. Impairments and concerns usually apply to the AU rather than the entire segment. Chloride, sulfate, and total dissolved solids standards attainment are based upon the average concentration in the entire segment, and impairments apply to the entire segment.

Numerical criteria assigned to a segment are tied to TSWQS defined uses, which include aquatic life use (ALU), recreational use (contact or noncontact), domestic water supply use, general use, and fish consumption use. Fish consumption uses and impairments are derived from the Department of State Health Services (DSHS) fish advisories based on DSHS fish surveys.

Each classified segment is assigned an ALU based on physical, chemical and biological characteristics. There are four ALU categories assessed: limited, intermediate, high, and exceptional. A number of methods are used to assess support of the ALU, including dissolved oxygen criteria, toxic substances in water criteria, ambient water and sediment toxicity test results, and indices for habitat, benthic macroinvertebrate and fish communities. This biological assessment involves an evaluation of the available instream habitat and the diversity and distribution of fish and macrobenthic invertebrates in a water body, which can vary as a result of changes in the aquatic environment caused by natural and/or non-natural events. Non-natural agents of change include anthropogenic pollution, instream habitat alteration, and disruption of the instream flow conditions to which the biological community is adapted. Species abundance, distribution, and diversity data are used to calculate a numerical indicator known as the index of biotic integrity (IBI). Separate IBIs are calculated for benthic macroinvertebrates and nekton (fish). A “habitat score” is also calculated. When the IBI or habitat score falls outside a statistically-derived range, a concern or impairment is identified. Biological assessment involves collecting, counting and identifying the organisms that constitute the biological community, and using these findings to calculate the IBI. Calculation of the IBI is complex and is discussed in detail in the TCEQ Surface Water Quality Monitoring Procedures, Volume 2: Methods for Collecting and Analyzing Biological Assemblage and Habitat Data.

TCEQ uses data collected during the most recent seven-year period (December 1 through November 30 of the preceding year) for the IR. An attempt is made to compile a spatially and temporally representative dataset. Samples must be collected over at least two years, no more than two-thirds of the data can represent a single year, and no more than one-third of the samples should be from the same season. Samples obtained during low flow events (as defined in the TCEQ guidance document) may be excluded. At least 10 samples (20 samples for bacteria) are required to determine standard attainment or nonattainment, but four or more samples can be used to establish a concern. If there is insufficient data to determine standards attainment and/or support of the designated use, that information is noted in the IR.

Nonsupport for most designated uses is established by evaluating the number of times a water quality standard was exceeded. Statistical methods are applied to the calculation of exceedance thresholds to ensure that the chance of claiming that an AU is impaired when in reality it is not (a Type I error), remains fixed. These exceedance thresholds (the number or exceedances required to consider an AU impaired) vary with sample size and parameter. Calculation of the threshold for delisting an AU follows the same logic, but the number of exceedances required to change the status of the AU is not the same in each case. Standards may also be stated in terms of the median concentration or geometric mean (the average of the logarithms of sample results, converted back to a natural number). For indicator bacteria (E. coli, enterococci), the evaluation of standards attainment involves comparison of the seven-year geometric mean to the WQS. For new bacteria listings, the value of the lower boundary of an 80 percent confidence interval must exceed the WQS, but an AU will only be delisted for bacteria if the actual geometric mean falls below the WQS. The current WQS includes a single-sample limit for bacteria in addition to the geometric mean limit, but the single-sample criterion has not been used for assessing CWA compliance since the 2010 assessment.

Each impairment in an AU is assigned to one of two categories; 4 or 5. Only those AUs with category 5 impairments are included on the 303(d) list of impaired waters. Category 5 includes:

  • Impairments of water quality parameters for which a total maximum daily load (TMDL) project is under way, scheduled or will be scheduled (5a)
  • Cases where review of water quality standards for that AU is appropriate (5b)
  • Cases where additional data must be collected before a TMDL is scheduled (5c)

In this report, H-GAC staff addresses the impairment and concern status in the 2014 IR for each classified segment and unclassified water body within the H-GAC service region. A total of 54 segment watersheds are discussed in detail in Watershed Summary documents that can be accessed through the Interactive Map or the Frog Chart.

The identification of long- and short-term trends is important to many of our stakeholders, and these trends are important components of H-GAC’s work, particularly in relation to the evaluation and revision of our regional monitoring efforts and priorities. Several methods of analyses were used by H-GAC to characterize surface water quality in the H-GAC region. Trend analysis can identify cases where the value of a water quality parameter is changing over time. Statistical tests are performed to distinguish “statistically significant” trends from random and seasonal variation. While it might seem reasonable to use all the data available for these analyses, as the amount of data increases the likelihood of finding a “statistically significant” but unimportant trend also increases. To minimize this, H-GAC performed trend analysis on the most recent fifteen years (June 2000-May 2015) of TCEQ-validated data to focus on recent trends in water quality in the H-GAC service area. The amount of data collected in this area more than doubled between 1997 and 2000, as samples were collected at more locations and in many cases more frequently, and additional parameters were added to routine monitoring and analytical protocols.

All data management and statistical analysis was performed using Statistical Analysis System (SAS) version 9.3. Complete details of data selection, preparation, and analysis can be found in the SAS code used to select, format, and analyze data for this report, which is included in the appendix.

Water quality data used for analyses in this report were extracted from a dataset downloaded from the Surface Water Quality Monitoring Information System (SWQMIS) on October 16, 2015. SWQMIS is a database that serves as the repository for surface water quality data for the state of Texas. All data used for these analyses were collected under a TCEQ-approved Quality Assurance Project Plan (QAPP). Qualified data (data added to SWQMIS with “qualifier” codes that identify quality, sampling, or other problems that may render the data unsuitable) were excluded from the download. All data for all stations in the H-GAC CRP region (in general, basins 9, 10, 11, 13, and 24), collected from June 1, 2000 through May 31, 2015 were combined. Flow data from 82 USGS gauging stations in TCEQ Region 12 were downloaded from the USGS website on September 16, 2015.

Variables in each data set were transformed as appropriate, and new variables were created to facilitate analysis and the graphical display of results. In some cases, data from two or more STORET (method) codes were combined because the results obtained from each method can be considered equivalent. Any data that were collected at a depth greater than 0.3 meters, or that were not collected under a routine ambient monitoring program, were deleted. If a parameter (or parameters) was measured multiple times per month in the same year at the same station, only one result for the same month, year, and parameter for the station was retained. This is because multiple results clustered in one temporal region of the time series have the potential to bias the interpretation of trend analysis and the representative station selection procedure (described below), if included.

Censored data (data reported as “< [parameter limit of quantitation (LOQ)]” were transformed to a value of one-half the parameter LOQ associated with the data, with some important exceptions. Because nutrient quantitation limits have been lowered over time, the presence of data censored at many different LOQs in the same dataset poses several problems. If the data for a given parameter are censored at values well above a later, lower LOQ value, trend analysis could suggest a trend where no real water quality trend is present. There is no ideal solution to this problem. Editing the censored data alone would limit but not eliminate false trends. In cases where some of the data reflected use of a lower LOQ than the current H-GAC CRP LOQ, values were transformed to one-half of the H-GAC CRP LOQ in an attempt to minimize the identification of trends caused by changing analytical methods. H-GAC does not believe the impact from this transformation is significant. For example, the ammonia mean value of all of the data prior to transformation was 0.052; the value for each after transformation was 0.05. In the case of nitrate-nitrogen and total phosphorus, if the result was reported at or below the highest commonly used LOQ, a value of one-half of that LOQ (0.1 and 0.03 mg/L, respectively) was used for trend analysis. The impact of this analysis would be most pronounced for parameter trends typically found at concentrations at or near the quantitation limit in that specific water body. In the case of nitrate-nitrogen trends, when the data at or below the LOQ were treated as one-half of the highest common LOQ and used in the trend analysis, 61 trends were identified. If a value of one-half of the associated censoring value (the censored value as reported in the raw data) was used in each case, an additional 17 trends appear (primarily in unclassified water bodies with limited data). Overall, 3.4 percent of the data were transformed (for the purpose of trend analysis) to minimize the impact of inconsistent quantitation limits in the original dataset.

The following parameters were selected for analysis:

  • Instantaneous flow (00061)
  • Specific conductance (00094)
  • Temperature (00010)
  • Dissolved Oxygen (00300)
  • Secchi transparency (00078)
  • pH (00400)
  • E. coli (31699)
  • Enterococci (31701)
  • Chlorophyll a (32211)
  • Total phosphorus (00665)
  • Ammonia-nitrogen (00610)
  • Nitrate + nitrite (00630) and nitrate (00620)
    • Only one of these parameters was used in our analyses. Nitrate+nitrite was selected when available, but some labs have reported nitrate rather than nitrate+nitrite. These two parameters were considered equivalent for the purpose of analysis.
  • Total Kjehldahl nitrogen (00625)
  • Total suspended solids (00530)
  • Days since last significant rainfall (72053)

Data from selected stations believed to be most representative of segment water quality were compiled for segment-level trend analysis. At each stage of the selection process, rankings were evaluated and were subject to revision on the basis of best professional judgment. A table of all data for stations that were monitored in each of the past three years (2013-2015) was compiled. The mean, median, and 95th percentile were calculated and added to the table for each station/parameter combination as well as for station/parameter combinations for each segment as a whole. The differences between station/parameter and segment/parameter metrics (both absolute and as a percentage of the segment mean) were calculated as were the number of observations (n) for each category. The number of USGS and ADP-obtained flow measurements made at that station was also determined. The stations with a mean value for a given parameter that was numerically closest to the mean value for the segment as a whole were ranked as “1” using SAS procedures (PROC SORT followed by PROC RANK). The station with a ranking of “1” on the most number of parameters was tentatively identified as the representative station for that segment. If two stations had the same number of highest-ranked parameters, the length of the time series (number of distinct years with data) was considered, and the station with data in the greatest number of years represented was selected. The results of this process were then reviewed for reasonableness and consistency. If doubt remained about which station to select, the station with the most flow measurements (preferably from USGS stations) was selected. The location of the selected stations were then mapped and evaluated in relation to segment boundaries as a further check on the process. In almost all cases, the station selected on the basis of numeric criteria was located near the downstream boundary of the segment. If the selected station was located far from the boundary, further evaluation was performed and another station was selected. Three stations that were selected using the process described above were replaced by nearby stations that were associated with a USGS gauging station. Refer to the flow chart below for a simplified explanation of the representative station selection process.

Process used to select representative stations for each segment

Process used to select representative stations for each segment

If there were no data for a parameter in a segment during one year in the 15-year time period, additional data were added from the geographically closest station in the segment (for that year and parameter only). This process continued until a time series with at least four data points for each of 15 years (up to180 data points) was produced, if possible. If there were fewer than 30 data points or a time series range of less than seven years for a parameter in a segment, it was deleted from the trend data subset and not included in the trend analysis.

For station-level trend analysis, a data set containing data for all stations in the 2015 Coordinated Monitoring Schedule (CMS) was compiled. The station-level trends supplemented the evaluation of segment-level trends. If the segment-level trend identified from analysis of data collected at the representative station(s) selected differed significantly from a trend (or absence of a trend) suggested by data collected at other stations within the segment, the discrepancy was noted in the water quality trend narratives or the dataset was re-evaluated. In addition, the station-level dataset was transposed to allow analysis of inter-parameter relationships, correlations with flow measurements and rain event reports, and other analyses as deemed appropriate. These analyses supplemented interpretation of observed trends, and in some cases suggested relationships that might not be evident from trend analysis alone.

A table of descriptive statistics for each parameter was produced for every monitoring station and segment (see Appendix). In addition to basic summary statistics, water quality standard and screening level exceedance statistics were calculated.

The first stage of trend analysis at both the segment- and the station-level was nonparametric correlation analysis (Kendall’s tau-b) of the parameter value with the sample collection date to identify correlations that were significant at p <0.054. These potential trends were then evaluated with up to four other methods. Simple linear regression of the natural log of the parameter value on the time variable was performed for all data in the subset selected by H-GAC for trend analysis. LOESS (locally-weighted least squares) regression and correlation of flow-adjusted residuals was applied to stations associated flow data. If there were no temporal gaps in the time-series (missing years, consistently missing seasons), seasonal Kendall/Sen Slope estimation/Theil regression was run. If more than 15 percent of the data were censored at the analytical limit of quantitation, survival analysis (Tobit analysis in SAS PROC LIFEREG) was performed.

Plots of selected statistically-significant trends are included in each Watershed Summary document along with a detailed qualitative discussion explaining whether the water quality parameter is increasing, decreasing, or stable over time, and why that may be the case. If the selected station is near a USGS gauging station, flow data is plotted as well. If the trend is described as “Increasing” or “Decreasing” it means the calculated p-value is below the threshold of 0.054 selected by H-GAC. Trends identified as “Stable” have a calculated p-value greater than 0.054. When evaluating the results of several trend analyses of a given parameter, H-GAC placed the most weight on the Kendall correlation because nonparametric methods are insensitive to outliers in the time series. If no flow data were available, the flow-adjusted trend appears as “Not calculated” (indicating no flow data is available) or “Insufficient Data” (indicating only one flow value exists and a correlation could not be calculated). If the seasonal Kendall/Sen Slope trend was not calculated due to gaps (missing seasons) in the time series, the seasonal Kendall trend appears as “Not Calculated.” Survival analysis was only applied in those cases where the amount of censored data could bias the results of the other methods; H-GAC set the threshold at 15 percent or more censored data. If fewer than 15 percent of the data were censored, survival analysis was not performed, and the trend appears as “Not Applicable.”

In addition, LOESS plots of the parameter value against time were made for every segment/parameter and station/parameter combination, whether a statistically significant trend was present or not. These graphs can be found in the appendices of this report.

A conservative trend analysis was performed using seven years of recent data (June 1, 2008 through May 31, 2015) at up to three representative monitoring stations on the 2015 CMS in the classified portion of each watershed to detect trends at the watershed level for the H-GAC Frog Chart. The representative stations used were the same as those selected for the trend analysis that appeared in the 2015 Basin Highlights Report Frog Chart, and were chosen by comparing parameter statistics for each station to the statistics for the segment as a whole. Trends were identified by nonparametric correlation analysis and simple linear regression. Because nonparametric methods are less sensitive to extreme values in the data than parametric techniques like linear regression, trends that were suggested by linear regression analysis alone were not included in the chart.

Trends (for the “Frog Chart” analysis) were considered statistically significant if the p-value was below 0.0545.

In addition to trend analysis, H-GAC created plots of seven-year geometric means for indicator bacteria for each segment. These are a type of moving- or rolling-average plot, and they are constructed by calculating the geometric mean of all data collected up to seven years before a given sample was collected and plotting it (on the y-axis) against the collection date (on the x axis) of the last sample in the series. A smoothed line (penalized B-spline) is fitted to the time series. One can assess the change over time in bacterial density from this sort of plot more easily than from a simple plot of density versus time. These plots are more meaningful for segments with a lot of historical bacteria data than it is for segments recently added to monitoring schedules (typically unclassified segments).