
Sufficiency Criteria
Last updated: OpenDSM 1.2

Most sufficiency criteria originate from the CalTRACK specifications. The old CalTRACK reference numbers are no longer valid; use the new reference numbers when discussing OpenDSM. A remnant of the old CalTRACK specifications is that two types of checks are performed: disqualifications and warnings. A disqualification is a hard line: the meter should not be used for measurement. A warning only flags the data so that experts can take a closer look and possibly disqualify the meter. Only explicit disqualifications are defined herein.

Many sufficiency criteria are duplicated across the various models, but for the sake of completeness they are included in the definitions for every model.


Nomenclature

  • Valid Data: Data which is not NULL, NaN, or otherwise empty
  • Joint Data: The combination of all inputs

1. Data Sufficiency

1.1 User Responsibilities

Some checks are critical for valid measurements but cannot be performed within the confines of the data or model classes. These checks are the user's responsibility.

1.1.1 Period Definition

1.1.1.1 Blackout Period

The blackout period should be known, or at least estimated, and excluded from the data.

1.1.1.2 Baseline Period

The baseline period should be the one year immediately prior to the blackout period.

1.1.1.3 Reporting Period

The reporting period should be the one year immediately following the blackout period.

1.1.2 Units

Units can be critically important to model performance. Convert your units accordingly.

1.1.2.1 Temperature

Temperature data should be in Fahrenheit.

1.1.2.2 Consumption/Usage

Consumption data is expected to be in units of energy (e.g., kWh for electricity or therms for gas).

1.1.3 Model Results

1.1.3.1 Predicted Energy Aggregation

Predicted energy can be aggregated through simple summation.

1.1.3.2 Predicted Energy Uncertainty Aggregation

Predicted energy uncertainty should be aggregated by summing in quadrature, i.e., taking the square root of the sum of the squared uncertainties.
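The two aggregation rules above can be sketched in a few lines. This is a minimal illustration, not OpenDSM's own API: `aggregate_predictions` is a hypothetical helper, and summing in quadrature treats the per-period uncertainties as independent.

```python
import math


def aggregate_predictions(predicted, uncertainty):
    """Aggregate per-period predicted energy and its uncertainty.

    predicted: per-period predicted energy values
    uncertainty: per-period uncertainties in the same energy units
    (hypothetical helper for illustration only)
    """
    total = sum(predicted)  # 1.1.3.1: simple summation
    # 1.1.3.2: sum in quadrature (square root of the sum of squares)
    total_uncertainty = math.sqrt(sum(u ** 2 for u in uncertainty))
    return total, total_uncertainty


print(aggregate_predictions([10.0, 12.0, 8.0], [3.0, 4.0, 0.0]))  # (30.0, 5.0)
```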

1.1.4 Location

There are two options for location data, but 1.1.4.1 (latitude and longitude) is greatly preferred.

1.1.4.1 Latitude and Longitude

Latitude and longitude should be known to three decimal places.

1.1.4.2 ZIP Code Tabulation Area (ZCTA)

If absolutely necessary, the centroid of the ZCTA may be used in place of latitude and longitude.

1.1.5 Non-Routine Events

Identifying and addressing non-routine events (NRE) is a best practice for making measurements, but is not required by OpenDSM.

1.1.5.1 Net Metering Status Change

If a meter’s net metering status changes during a period, the meter should be disqualified as an NRE. Negative meter data is indicative of net metering, but a meter may have an undersized system and remain positive at all datetimes.

1.1.5.2 Electric Vehicle Status Change

If a meter's electric vehicle charging status changes during a period, the meter should be disqualified as an NRE.

1.1.5.3 Heuristic-Based Identification

Observed values that fall outside of \([Q_1 - 3\times IQR, Q_3 + 3\times IQR]\), a widening of the common \(1.5\times IQR\) rule, can be investigated for disqualification as an NRE.
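The heuristic above can be sketched with the standard library. This is an assumption-laden illustration: `flag_possible_nres` is a hypothetical name, and the quartile convention (`statistics.quantiles` with `method="inclusive"`) is our choice; other conventions move the fences slightly. Flagged values are candidates for investigation, not automatic disqualifications.

```python
import statistics


def flag_possible_nres(values, k=3.0):
    """Return the indices of values outside [Q1 - k*IQR, Q3 + k*IQR].

    k=3.0 matches the widened rule in 1.1.5.3 (the general rule uses k=1.5).
    Quartile convention (method="inclusive") is an assumption.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


daily_usage = [10, 11, 12, 13, 14, 100]
print(flag_possible_nres(daily_usage))  # [5]: 100 is above the upper fence of 21.25
```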


1.2 Common

Common data sufficiency criteria are prerequisites for both the baseline and reporting data sufficiency checks.

1.2.1: Blackout Exclusion

Blackout period data should not be included in either the baseline or reporting periods.

1.2.2: Data Exists

The input must not be an empty dataset.

1.2.3: Datetime Time Zone-Aware

Datetimes must include time zone information, and all data must share the same time zone.

1.2.4: Duplicate Data

No duplicated datetimes are allowed.
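Checks 1.2.2 through 1.2.4 can be sketched against a plain list of `datetime` objects. This is a hypothetical helper, not part of OpenDSM; in particular, "same time zone" is interpreted here as every datetime carrying an equal `tzinfo` object, which is an assumption.

```python
from datetime import datetime, timedelta, timezone


def common_index_checks(timestamps):
    """Return disqualification messages for 1.2.2-1.2.4 (empty list if all pass).

    timestamps: list of datetime.datetime objects, one per observation.
    (hypothetical helper for illustration only)
    """
    if not timestamps:  # 1.2.2: input must not be empty
        return ["1.2.2: input is an empty dataset"]

    problems = []
    # 1.2.3: every datetime must be time zone-aware...
    if any(ts.utcoffset() is None for ts in timestamps):
        problems.append("1.2.3: datetimes must be time zone-aware")
    # ...and all must share one tzinfo (assumption: compared by equality)
    elif len({ts.tzinfo for ts in timestamps}) > 1:
        problems.append("1.2.3: datetimes use more than one time zone")

    # 1.2.4: no duplicated datetimes
    if len(set(timestamps)) != len(timestamps):
        problems.append("1.2.4: duplicated datetimes")
    return problems


t0 = datetime(2023, 1, 1, tzinfo=timezone.utc)
print(common_index_checks([t0, t0 + timedelta(hours=1)]))  # []
print(common_index_checks([t0, t0]))  # ['1.2.4: duplicated datetimes']
```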

1.2.5: High-Frequency Data

At least 50% of high-frequency data must be valid. Missing values must be imputed when the data are aggregated.
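One way to read 1.2.5 is sketched below, with two loudly stated assumptions: the 50% threshold is applied per aggregation interval (for example, four 15-minute readings summed into one hour), and missing readings are imputed with the mean of the interval's valid readings. Neither choice is prescribed by the criterion, and `aggregate_interval` is a hypothetical name.

```python
import math


def is_valid(x):
    """Valid data: not None and not NaN (see Nomenclature)."""
    return x is not None and not (isinstance(x, float) and math.isnan(x))


def aggregate_interval(readings, min_valid_fraction=0.5):
    """Sum the high-frequency readings that fall in one aggregation interval.

    Returns None (interval treated as missing) if less than half of the
    readings are valid. Otherwise missing readings are imputed with the mean
    of the valid readings before summing (imputation method is an assumption).
    """
    if not readings:
        return None
    valid = [r for r in readings if is_valid(r)]
    if not valid or len(valid) / len(readings) < min_valid_fraction:
        return None
    mean = sum(valid) / len(valid)
    return sum(valid) + mean * (len(readings) - len(valid))


# Four 15-minute readings in one hour, two missing: each imputed with the mean (2.0)
print(aggregate_interval([1.0, None, 3.0, float("nan")]))  # 8.0
```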

1.2.6: Missing Temperature

Missing temperature data causes the entire datetime to be considered missing.
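Rule 1.2.6, together with the "Joint Data" definition, can be expressed as a per-datetime validity mask. This sketch assumes temperature and observed values arrive as aligned Python lists, and it only treats `None` and NaN as invalid; other "otherwise empty" encodings would need their own handling. `joint_valid_mask` is a hypothetical name.

```python
import math


def is_valid(x):
    """Valid data: not None and not NaN (see Nomenclature)."""
    return x is not None and not (isinstance(x, float) and math.isnan(x))


def joint_valid_mask(temperature, observed):
    """Per-datetime validity of the joint data.

    A datetime counts as valid only if its temperature is valid (1.2.6)
    and its observed value is valid as well.
    """
    if len(temperature) != len(observed):
        raise ValueError("temperature and observed must be aligned")
    return [is_valid(t) and is_valid(o) for t, o in zip(temperature, observed)]


print(joint_valid_mask([50.0, None, 60.0], [1.0, 2.0, float("nan")]))
# [True, False, False]
```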

1.2.7: Minimum Daily Temperature Coverage

The percentage of valid days (days with greater than 90% valid temperature data coverage) must be greater than 90%.

1.2.8: Minimum Daily Joint Coverage

The percentage of valid days (days with greater than 90% valid joint data coverage) must be greater than 90%.
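The daily coverage criteria (1.2.7, 1.2.8, and later 1.3.3) share one shape: a day is valid if more than 90% of its datetimes are valid, and more than 90% of days must be valid. The sketch below is hypothetical and assumes missing datetimes are present as rows flagged `False`; if missing rows are simply absent, the denominators would be wrong.

```python
from collections import defaultdict
from datetime import date, timedelta


def daily_coverage_passes(dates, valid_flags, day_threshold=0.9, period_threshold=0.9):
    """Check a daily coverage criterion such as 1.2.7, 1.2.8, or 1.3.3.

    dates: calendar date of each datetime in the period
    valid_flags: matching booleans (True where the relevant data are valid)
    Assumes missing datetimes are present as rows flagged False.
    """
    if len(dates) != len(valid_flags):
        raise ValueError("dates and valid_flags must be aligned")
    counts = defaultdict(lambda: [0, 0])  # date -> [valid count, total count]
    for d, ok in zip(dates, valid_flags):
        counts[d][0] += int(ok)
        counts[d][1] += 1
    if not counts:
        return False
    valid_days = sum(1 for n_valid, n_total in counts.values()
                     if n_valid / n_total > day_threshold)
    return valid_days / len(counts) > period_threshold


# Ten days of hourly flags; day 0 is missing 3 of 24 hours (87.5%, not > 90%).
days = [date(2023, 1, 1) + timedelta(days=i) for i in range(10) for _ in range(24)]
flags = [not (i == 0 and h < 3) for i in range(10) for h in range(24)]
print(daily_coverage_passes(days, flags))  # False: 9/10 valid days is not > 90%
```

Both thresholds are strict ("greater than"), which is why exactly 90% valid days fails in the example.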

1.2.9: Minimum Monthly Temperature Coverage

Each month in the period must have valid temperature data for at least 90% of its datetimes.
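A sketch of 1.2.9 follows. It assumes months are taken from each timestamp's own year and month (that is, in whatever time zone the data carry), and that missing datetimes are present as rows flagged `False`. `monthly_temperature_coverage_passes` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def monthly_temperature_coverage_passes(timestamps, temperature_valid, threshold=0.9):
    """1.2.9: every calendar month needs >= 90% valid temperature datetimes.

    Months are taken from each timestamp's own year and month (assumption).
    """
    if len(timestamps) != len(temperature_valid):
        raise ValueError("timestamps and temperature_valid must be aligned")
    counts = defaultdict(lambda: [0, 0])  # (year, month) -> [valid, total]
    for ts, ok in zip(timestamps, temperature_valid):
        counts[(ts.year, ts.month)][0] += int(ok)
        counts[(ts.year, ts.month)][1] += 1
    return bool(counts) and all(v / n >= threshold for v, n in counts.values())


# January 2023, hourly (744 datetimes), first 74 hours missing: 670/744 is about 90.05%
jan = [datetime(2023, 1, 1, tzinfo=timezone.utc) + timedelta(hours=h) for h in range(744)]
print(monthly_temperature_coverage_passes(jan, [h >= 74 for h in range(744)]))  # True
```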


1.3 Baseline Period

The baseline period must meet both the Common and the baseline period sufficiency criteria.

1.3.1: Baseline Length

The baseline must be of an appropriate length, as defined by the following criteria.

1.3.1.1: Maximum Baseline Length

The baseline length must be less than 366 days. This limit is one day longer than a standard year to account for leap years.

1.3.1.2: Minimum Baseline Length

The baseline length must be at least the floor of 90% of the maximum baseline length defined in Daily DQ 1.3.1.1: \(\lfloor 366 \times 0.9 \rfloor = 329\) days.
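The two length bounds can be sketched as below. The sketch assumes the baseline spans `[start, end)` and that its length is simply `end - start`; how OpenDSM measures the length internally may differ. The minimum is computed with integer arithmetic so that floating-point rounding of `366 * 0.9` cannot affect the floor.

```python
from datetime import datetime, timedelta

MAX_BASELINE_DAYS = 366  # 1.3.1.1: one day longer than a standard year
MIN_BASELINE_DAYS = MAX_BASELINE_DAYS * 9 // 10  # 1.3.1.2: floor(366 * 0.9) = 329


def baseline_length_ok(start, end):
    """Assumes the baseline spans [start, end) and its length is end - start."""
    length_days = (end - start) / timedelta(days=1)
    return MIN_BASELINE_DAYS <= length_days < MAX_BASELINE_DAYS


print(baseline_length_ok(datetime(2022, 1, 1), datetime(2023, 1, 1)))   # True (365 days)
print(baseline_length_ok(datetime(2022, 1, 1), datetime(2022, 11, 1)))  # False (304 days)
```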

1.3.1.3: Full Datetime Range

A full year of datetimes should be provided.

1.3.2: Negative Gas Data

For gas data, observed values must not be negative.

1.3.3: Minimum Daily Observed Coverage

The percentage of valid days (days with greater than 90% valid observed data coverage) must be greater than 90%.


1.4 Reporting Period

The reporting period must meet the Common sufficiency criteria.


2. Model Sufficiency

A fitted daily model must meet either the CVRMSE or the PNRMSE criteria to qualify for measurement.

2.1 CVRMSE

2.1.1: Maximum CVRMSE

The adjusted CVRMSE must be less than or equal to 1.0.

2.1.2: Minimum CVRMSE

The adjusted CVRMSE must be greater than or equal to 0.0.


2.2 PNRMSE

2.2.1: Maximum PNRMSE

The adjusted PNRMSE must be less than or equal to 1.6.
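The model sufficiency decision can be sketched as an "either/or" over the thresholds listed in this section. The adjusted CVRMSE and PNRMSE are taken as precomputed inputs because their formulas are not given here, and `model_is_sufficient` is a hypothetical name, not an OpenDSM function.

```python
def model_is_sufficient(cvrmse_adj, pnrmse_adj):
    """Section 2: pass if either the CVRMSE or the PNRMSE criteria are met.

    cvrmse_adj, pnrmse_adj: adjusted metrics computed elsewhere (assumed inputs).
    Only the thresholds listed in 2.1 and 2.2 are checked.
    """
    passes_cvrmse = 0.0 <= cvrmse_adj <= 1.0  # 2.1.1 and 2.1.2
    passes_pnrmse = pnrmse_adj <= 1.6  # 2.2.1
    return passes_cvrmse or passes_pnrmse


print(model_is_sufficient(1.2, 1.5))  # True: fails CVRMSE but meets PNRMSE
print(model_is_sufficient(1.2, 1.7))  # False: fails both
```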