Sufficiency Criteria
Last updated: OpenDSM 1.2
Most sufficiency criteria originate in the CalTRACK specifications. Old CalTRACK reference numbers are no longer valid; use the new reference numbers when discussing OpenDSM. One remnant of the CalTRACK specifications is that two types of checks are performed: disqualifications and warnings. A disqualification is a hard line: the meter should not be used for measurement. A warning only prompts an expert to take a deeper look at the data and possibly disqualify the meter. Only explicit disqualifications are defined herein.
Many sufficiency criteria are duplicated between the various models, but for the sake of completeness they will be included in definitions for all models.
Nomenclature
- Valid Data: Data which is not NULL, NaN, or otherwise empty
- Joint Data: The combination of all inputs
1. Data Sufficiency
1.1 User Responsibilities
Some checks cannot be performed within the confines of the data or model classes but are critical for valid measurements. These checks are left to the user.
1.1.1 Period Definition
1.1.1.1 Blackout Period
The blackout period should be known, or at least estimated, and excluded from the data.
1.1.1.2 Baseline Period
The baseline period should be the one year immediately prior to the blackout period.
1.1.1.3 Reporting Period
The reporting period should be the one year immediately following the blackout period.
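The period definitions above can be sketched as date arithmetic around the blackout. This is a minimal illustration, not part of OpenDSM's API: `period_windows` is a hypothetical helper, "one year" is assumed to mean 365 days, and ranges are assumed to be half-open (`start` inclusive, `end` exclusive).

```python
from datetime import date, timedelta


def period_windows(blackout_start: date, blackout_end: date, days: int = 365):
    """Return (baseline, reporting) as half-open [start, end) date ranges.

    Assumptions (not from the specification): blackout_end is the first day
    after the blackout, and one year is taken as `days` = 365 days.
    """
    if blackout_end < blackout_start:
        raise ValueError("blackout_end must not precede blackout_start")
    # Baseline ends where the blackout begins.
    baseline = (blackout_start - timedelta(days=days), blackout_start)
    # Reporting begins where the blackout ends.
    reporting = (blackout_end, blackout_end + timedelta(days=days))
    return baseline, reporting


baseline, reporting = period_windows(date(2023, 6, 1), date(2023, 7, 1))
print(baseline)
print(reporting)
```

Because the blackout dates are excluded from both windows by construction, neither period can contain blackout data.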
1.1.2 Units
Units can be critically important to model performance. Convert your units accordingly.
1.1.2.1 Temperature
Temperature data should be in degrees Fahrenheit (°F).
1.1.2.2 Consumption/Usage
Consumption data is expected to be in a unit of energy.
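If the source temperature data is in degrees Celsius, it can be converted before being passed to a model. The helper below is a hypothetical illustration, not an OpenDSM function.

```python
def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9.0 / 5.0 + 32.0


print(celsius_to_fahrenheit(20.0))
```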
1.1.3 Model Results
1.1.3.1 Predicted Energy Aggregation
Predicted energy can be aggregated through simple summation.
1.1.3.2 Predicted Energy Uncertainty Aggregation
Predicted energy uncertainty should be aggregated by summing in quadrature.
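The two aggregation rules can be sketched as follows. Summing in quadrature means taking the square root of the sum of squared uncertainties, which is the usual rule when the individual errors are treated as independent. `aggregate_predictions` is a hypothetical helper for illustration and the input values are made up.

```python
import math


def aggregate_predictions(values, uncertainties):
    """Aggregate predicted energy and its uncertainty.

    Energy is summed directly; uncertainties are summed in quadrature,
    i.e. sqrt(u_1**2 + u_2**2 + ...).
    """
    total = sum(values)
    total_uncertainty = math.sqrt(sum(u * u for u in uncertainties))
    return total, total_uncertainty


total, uncertainty = aggregate_predictions([100.0, 250.0, 50.0], [3.0, 4.0, 12.0])
print(total, uncertainty)
```

Note that the aggregated uncertainty is smaller than the plain sum of the individual uncertainties, which is why the two quantities must not be aggregated the same way.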
1.1.4 Location
There are two options for location data, but Hourly DQ 1.1.4.1 is greatly preferred.
1.1.4.1 Latitude and Longitude
Latitude and longitude should be known to three decimal places.
1.1.4.2 ZIP Code Tabulation Area (ZCTA)
If absolutely necessary, the centroid of the ZCTA may be used in place of latitude and longitude.
1.1.5 Non-Routine Events
Identifying and addressing non-routine events (NREs) is a best practice for making measurements, but is not required by OpenDSM.
1.1.5.1 Net Metering Status Change
If a meter’s net metering status changes during a period, the meter should be disqualified as an NRE. Negative meter data indicates net metering, but a meter with an undersized system may remain positive at all datetimes.
1.1.5.2 Electric Vehicle Status Change
If a meter’s electric vehicle charging status changes during a period, the meter should be disqualified as an NRE.
1.1.5.3 Heuristic-Based Identification
Observed values which fall outside of \([Q_1 - 3\times IQR, Q_3 + 3\times IQR]\), a modification of the general 1.5 IQR Rule, can be investigated for disqualification as an NRE.
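The heuristic can be sketched with the standard library. The quartile method is an assumption here: the text does not say how \(Q_1\) and \(Q_3\) are computed, so this example uses `statistics.quantiles` with `method="inclusive"` (linear interpolation over the observed range). `iqr_fence` and `flag_outliers` are hypothetical names, and the sample values are made up.

```python
import statistics


def iqr_fence(values, k=3.0):
    """Return (lower, upper) = (Q1 - k*IQR, Q3 + k*IQR)."""
    # Assumption: quartiles via linear interpolation ("inclusive" method).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=3.0):
    """Return the values that fall outside the k*IQR fence."""
    lower, upper = iqr_fence(values, k)
    return [v for v in values if v < lower or v > upper]


observed = [10, 11, 12, 13, 14, 15, 16, 17, 18, 100]
print(iqr_fence(observed))
print(flag_outliers(observed))
```

Flagged values are candidates for investigation only; whether they represent an NRE remains a judgment call.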
1.2 Common
The common data sufficiency criteria are prerequisites to both the baseline and reporting data sufficiency checks.
1.2.1: Blackout Exclusion
Blackout period data should not be included in either the baseline or reporting periods.
1.2.2: Data Exists
Input is not an empty dataset.
1.2.3: Datetime Time Zone-Aware
Datetimes must include time zone information, and all data must have the same time zone information.
1.2.4: Duplicate Data
No duplicated datetimes are allowed.
1.2.5: High-Frequency Data
At least 50% of high-frequency data must be valid. Missing data must be imputed for aggregations.
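Criteria 1.2.2 through 1.2.4 can be sketched as simple checks on a list of datetimes. This is an illustrative sketch under stated assumptions, not OpenDSM's implementation: `check_datetimes` and the failure labels are hypothetical, and "same time zone" is assumed to mean that every datetime carries an equal `tzinfo` object.

```python
from datetime import datetime, timedelta, timezone


def check_datetimes(datetimes):
    """Return the names of failed checks; an empty list means all passed."""
    if not datetimes:
        return ["data_exists"]
    failures = []
    # A datetime is aware only if tzinfo is set and yields a UTC offset.
    aware = all(
        dt.tzinfo is not None and dt.utcoffset() is not None for dt in datetimes
    )
    if not aware:
        failures.append("time_zone_aware")
    elif len({dt.tzinfo for dt in datetimes}) > 1:
        failures.append("same_time_zone")
    # Aware datetimes compare equal when they denote the same instant.
    if len(set(datetimes)) != len(datetimes):
        failures.append("duplicates")
    return failures


utc = timezone.utc
a = datetime(2024, 1, 1, 0, tzinfo=utc)
b = datetime(2024, 1, 1, 1, tzinfo=utc)
print(check_datetimes([a, b]))
print(check_datetimes([a, a]))
```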
1.2.6: Billing Period Length
1.2.6.1: Minimum Billing Period
All billing periods must be greater than or equal to 25 days.
1.2.6.2: Maximum Billing Period
All billing periods must be less than or equal to 35 days (if monthly cadence) or 70 days (if bimonthly cadence).
1.2.6.3: Combining Estimated Periods
Estimated periods should be combined with the next period up to a 70-day limit. Estimated periods are treated as missing data for the purpose of determining data sufficiency.
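The minimum and maximum length limits can be checked together. The sketch below assumes each billing period is given as a half-open `(start, end)` pair of dates and that the cadence is already known; `billing_period_lengths_ok` and the constant names are hypothetical.

```python
from datetime import date

MIN_BILLING_DAYS = 25
MAX_BILLING_DAYS = {"monthly": 35, "bimonthly": 70}


def billing_period_lengths_ok(periods, cadence="monthly"):
    """Return True if every (start, end) period is within the length limits.

    Assumption: `end` is exclusive, so the length is (end - start).days.
    """
    max_days = MAX_BILLING_DAYS[cadence]
    return all(
        MIN_BILLING_DAYS <= (end - start).days <= max_days
        for start, end in periods
    )


periods = [
    (date(2024, 1, 1), date(2024, 2, 1)),  # 31 days
    (date(2024, 2, 1), date(2024, 3, 1)),  # 29 days
]
print(billing_period_lengths_ok(periods))
```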
1.2.7: Missing Temperature
Missing temperature data results in the entire datetime being considered missing.
1.2.8: Minimum Daily Temperature Coverage
The percentage of valid days (days with greater than 90% valid temperature data coverage) must be greater than 90%.
1.2.9: Minimum Daily Joint Coverage
The percentage of valid days (days with greater than 90% valid joint data coverage) must be greater than 90%.
1.2.10: Minimum Monthly Temperature Coverage
In each month of the period, at least 90% of datetimes must have valid temperature data.
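The daily coverage criteria apply the 90% threshold twice: first within each day, then across days. The sketch below is one possible reading, with assumed data shapes: each day maps to a list of per-interval validity flags, and for joint coverage an interval is valid only when every input is valid. All function names are hypothetical.

```python
import math


def is_valid(x):
    """Valid data is not None and not NaN (see Nomenclature)."""
    return x is not None and not (isinstance(x, float) and math.isnan(x))


def joint_flags(*inputs):
    """Per-interval validity across all inputs (e.g. temperature, observed)."""
    return [all(is_valid(v) for v in slot) for slot in zip(*inputs)]


def daily_coverage_ok(flags_by_day, threshold=0.9):
    """flags_by_day maps each day to a list of booleans, one per interval."""
    if not flags_by_day:
        return False
    valid_days = sum(
        1
        for flags in flags_by_day.values()
        if flags and sum(flags) / len(flags) > threshold
    )
    return valid_days / len(flags_by_day) > threshold


# 20 days of hourly flags; one day is missing 3 of 24 hours (87.5% coverage).
days = {d: [True] * 24 for d in range(20)}
days[0] = [False] * 3 + [True] * 21
print(daily_coverage_ok(days))
```

Both comparisons are strict, following the "greater than 90%" wording, so a day with exactly 90% coverage does not count as valid.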
1.3 Baseline Period
The baseline period must meet both the Common and the baseline period sufficiency criteria.
1.3.1: Baseline Length
The baseline must be of an appropriate length.
1.3.1.1: Maximum Baseline Length
The baseline length must be less than 366 days. This is 1 day longer than a standard year to account for leap years.
1.3.1.2: Minimum Baseline Length
The baseline length must be at least the floor of 90% of the maximum baseline length as defined in Billing DQ 1.3.1.1: floor(366 * 0.9) = 329 days.
1.3.1.3: Full Datetime Range
A full year of datetimes should be provided.
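The minimum length follows directly from the arithmetic above. The sketch below reproduces it and applies it to a date range; how days are counted is an assumption (half-open range, `end` exclusive), and `meets_minimum_baseline_length` is a hypothetical helper.

```python
import math
from datetime import date

MAX_BASELINE_DAYS = 366
# floor(366 * 0.9) = floor(329.4) = 329
MIN_BASELINE_DAYS = math.floor(MAX_BASELINE_DAYS * 0.9)


def meets_minimum_baseline_length(start: date, end: date) -> bool:
    """Return True if the baseline spans at least MIN_BASELINE_DAYS days.

    Assumption: `end` is exclusive, so the length is (end - start).days.
    """
    return (end - start).days >= MIN_BASELINE_DAYS


print(MIN_BASELINE_DAYS)
print(meets_minimum_baseline_length(date(2023, 1, 1), date(2023, 11, 26)))
```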
1.3.2: Negative Gas Data
For gas data, observed values may not be less than 0.
1.3.3: Minimum Daily Observed Coverage
The percentage of valid days (days with greater than 90% valid observed data coverage) must be greater than 90%.
1.4 Reporting Period
The reporting period must meet the Common sufficiency criteria.
2. Model Sufficiency
A fitted billing model must meet either the CVRMSE or the PNRMSE criteria to be qualified for measurement.
2.1 CVRMSE
2.1.1: Maximum CVRMSE
The adjusted CVRMSE must be less than or equal to 1.0.
2.1.2: Minimum CVRMSE
The adjusted CVRMSE must be greater than or equal to 0.0.
2.2 PNRMSE
2.2.1: Maximum PNRMSE
The adjusted PNRMSE must be less than or equal to 1.6.
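The either/or rule can be expressed as a small predicate. This sketch takes the adjusted CVRMSE and adjusted PNRMSE as already-computed inputs, since their formulas are not defined in this section; `model_is_sufficient` and the constant names are hypothetical and not OpenDSM's API.

```python
MIN_CVRMSE = 0.0
MAX_CVRMSE = 1.0
MAX_PNRMSE = 1.6


def model_is_sufficient(cvrmse_adj: float, pnrmse_adj: float) -> bool:
    """Return True if the model meets the CVRMSE or the PNRMSE criteria.

    NaN inputs fail both checks because comparisons with NaN are False.
    """
    cvrmse_ok = MIN_CVRMSE <= cvrmse_adj <= MAX_CVRMSE
    pnrmse_ok = pnrmse_adj <= MAX_PNRMSE
    return cvrmse_ok or pnrmse_ok


print(model_is_sufficient(0.5, 3.0))  # passes on CVRMSE
print(model_is_sufficient(1.4, 1.2))  # passes on PNRMSE
print(model_is_sufficient(1.4, 2.0))  # fails both
```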