EPW export - "Move first item to end position" explanation

jerome · April 2, 2021, 3:57pm

Hi.

I’m trying to understand the value shift/permutation in EPW export:

ladybug-tools/ladybug/blob/1e4493f5acc6a458a8744f33e6c924f225b9cfc2/ladybug/epw.py#L1538-L1543


# if the first value is at 1AM, move first item to end position
for field in xrange(0, self._num_of_fields):
    point_in_time = self._data[field].header.data_type.point_in_time
    if point_in_time:
        first_hour = self._data[field]._values.pop(0)
        self._data[field]._values.append(first_hour)

I assumed it is due to data for the first hour being marked at 0:00 in Ladybug and 1:00 in E+. Searching the git repo, I stumbled upon

https://github.com/ladybug-tools/ladybug/pull/55 and 99 (sorry, can’t put more than 2 links in a post)

which explain that things are a bit more complicated, with E+ interpreting time differently depending on the value type (radiation and illuminance being exceptions to the rule).

I couldn’t find a single source explaining how Ladybug and E+ expect the data to be timestamped (beginning of hour, half hour, hour).

Also I find it strange to move the first data to the end of the file. Assuming the first temperature of the year is 10°C and the last is -10°C, this will introduce a 20°C temperature step in 1 hour which is physically wrong. Wouldn’t we be better off just duplicating the last value if we really need a full year?

I’m trying to write .epw from real-life data, and I think it should be possible to produce a file with each sample at the right time. However, AFAIU, even if I have more than one year of data, when going through Ladybug I’m cropping the data to one year, not more, then moving the first sample to the end.

I’ve been tempted to remove this part from the code but as I wrote above I don’t really know what I’m doing and I’d probably end up with a 1 hour shift in the end.

chris · April 3, 2021, 1:18am

You found the correct culprit, @jerome . It all goes back to the fact that the EPW format does not use “real” date/times and instead uses times like 24:00, which do not exist on any clock that I know of or in any datetime module of any major computer language. I think the number of thumbs-up on this GitHub issue to change E+ shows that I’m definitely not alone in expressing the headache that these datetimes have caused.

We struggled with a few different ways to work around this issue. In Legacy Ladybug, we tried following the EPW structure and made 1:00 the first time of the year (instead of 0:00) and we made the start hour-of-the-year (HOY) equal to 1 instead of 0. But we now see this as a mistake since it ultimately created several cases of mismatched datetimes across the plugin. So we knew that the start datetime definitely needed to be 0:00.

We could have just treated the EPW value at 1:00 as if it were at 0:00 but that would almost certainly result in mismatches between the imported EPW data and other timeseries data that might not originate from an EPW.

We looked at what E+ assumes and, given that every annual E+ run starts off with the simulation “warming up” by running through the first day of the year over and over until it converges, the value at 0:00 on Jan 1 is technically the value at 24:00 on Jan 1 (or I guess 0:00 on Jan 2?). We considered doing something like what E+ does but, then, that would result in a net loss of data because the value at Dec 31 24:00 of an imported EPW would effectively get deleted from the imported data collections upon import. So, when you went to write the imported EPW object back to a file, you would have replaced the real value that existed at Dec 31 24:00 (aka. 0:00 on Jan 1) with a duplicated value of whatever was at 24:00 on Jan 1 (aka. 0:00 on Jan 2).

So, ultimately, we just decided that “ladybug world” is separate from “epw file world”. In ladybug world, we use real datetimes and, whenever we import an EPW file, we perform the operation of moving the end value to the start to make things align with the datetimes. When we export the EPW object back to an .epw file, we put the start value back to the end. I know it can sometimes result in a jump between values for the first hour in the data collection but it’s really the only way that we could think of to “keep the worlds separate” and keep each world consistent with its own logic.
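The import/export round trip described above can be sketched with plain lists. This is only an illustration of the idea, not Ladybug's actual code: the function names are hypothetical and the five-value "year" stands in for the hourly values of one point-in-time field.

```python
def epw_to_ladybug_order(values):
    """Move the last value (EPW 24:00 on Dec 31) to the start (0:00 on Jan 1)."""
    values = list(values)
    return values[-1:] + values[:-1]


def ladybug_to_epw_order(values):
    """Move the first value (0:00 on Jan 1) back to the end (24:00 on Dec 31)."""
    values = list(values)
    return values[1:] + values[:1]


# Toy "year" of 5 samples in EPW file order (labelled 1:00 ... 24:00).
epw_file_values = [10, 11, 12, 13, -10]

ladybug_values = epw_to_ladybug_order(epw_file_values)
round_trip = ladybug_to_epw_order(ladybug_values)

print(ladybug_values)  # [-10, 10, 11, 12, 13]
print(round_trip)      # [10, 11, 12, 13, -10]
```

The round trip returns the file values unchanged, which is the point of the design: nothing is lost or duplicated, at the cost of a possible jump between the first two values in the imported collection.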

Could you describe further how this decision isn’t working out well for what you are trying to accomplish? Are you just questioning whether you should build an annual EPW by going from:

0:00 Jan 1, 2020 — 23:00 Dec 31, 2020
vs.
1:00 Jan 1, 2020 — 0:00 Jan 1, 2021

I’m sorry to say that, for reasons that were decided over 25 years ago when epw files were first made, it’s the latter. Welcome to our collective struggle with EPW datetimes!
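As a rough sketch of the second option, the snippet below builds hourly timestamps from 1:00 Jan 1, 2020 to 0:00 Jan 1, 2021 and relabels them with EPW-style hours 1 to 24. The helper name `to_epw_label` is hypothetical, and it assumes every timestamp falls exactly on the hour.

```python
from datetime import datetime, timedelta


def to_epw_label(moment):
    """Return (year, month, day, hour) using EPW-style hours 1..24.

    Midnight is written as hour 24 of the previous day.
    """
    if moment.hour == 0:
        previous_day = moment - timedelta(days=1)
        return (previous_day.year, previous_day.month, previous_day.day, 24)
    return (moment.year, moment.month, moment.day, moment.hour)


# Hourly timestamps from 1:00 Jan 1, 2020 to 0:00 Jan 1, 2021 inclusive.
# 2020 is a leap year, hence 366 * 24 = 8784 samples rather than 8760.
start = datetime(2020, 1, 1, 1)
stamps = [start + timedelta(hours=i) for i in range(8784)]
labels = [to_epw_label(s) for s in stamps]

print(labels[0])   # (2020, 1, 1, 1)
print(labels[-1])  # (2020, 12, 31, 24)
```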

jerome · April 9, 2021, 12:25pm

Hi @chris.

First of all, thank you for this thorough answer. Things are much clearer to me now and I agree time representation in EPW is a pain (and I added my thumb up to that GH issue).

I tend to forget the first intent of Ladybug here was to import/modify/export EPW files, not to create files from scratch, and I missed the fact that the last item is moved to first position on import (because I don’t use the import).

I’ve been thinking this over and over for several hours now. I had trouble understanding because I was thinking in terms of time intervals (first step being [0:00 Jan 1, 2020 - 1:00 Jan 1, 2020]) rather than points in time and I thought the first interval was the same everywhere except EPW would express it with the end time while everyone would express it with the start time. I realize I was wrong and your message above makes much more sense to me now.

It is not only a time representation issue. It is not about replacing 24:00 with 00:00. In fact, we don’t care much about the serialization of the date because AFAIU, the timestamp on each line is ignored and only the analysis period matters. It is about having the analysis period bounds correct.

The issue with the current solution is that the representation in the Ladybug world is kinda wrong, with the first value of the list being the last of the simulation. So we have an internal representation format that is twisted to account for a specificity of the external format.

I think this is what you address in your second paragraph:

In Legacy Ladybug, we tried following the EPW structure and made 1:00 the first time of the year (instead of 0:00) and we made the start hour-of-the-year (HOY) equal to 1 instead of 0. But we now see this as a mistake since it ultimately created several cases of mismatched datetimes across the plugin. So we knew that the start datetime definitely needed to be 0:00.

Defining the analysis period starting at 1 would allow keeping values in order, with datetimes being consistently expressed. I can’t imagine the issues “across the plugin” because I’m only using a small subpart of it, so I’ll trust you on that one. Well, at least if the data is meant to be used in another simulation tool (I saw Radiance mentioned) and that tool starts at 0, I can see trouble coming.

Anyway, your explanation clearly presents the implementation choice as a compromise and my partial understanding hardly allows me to challenge it.

I still have a concern.

The docstring for from_missing_values reads

from ladybug.epw import EPW
from ladybug.location import Location
epw = EPW.from_missing_values()
epw.location = Location('Denver Golden', 'CO', 'USA', 39.74, -105.18, -7.0, 1829.0)
epw.dry_bulb_temperature.values = [20] * 8760

IIUC, when doing so, the dry_bulb_temperature values are wrong because the last value should be moved to first position, both to avoid discrepancies with other values and because it will be moved back on export. I didn’t see any setter in the code doing this automatically. Obviously, it doesn’t matter in this example since the value is constant but you get the idea.

In fact, again IIUC, this makes for a terrible API because the user is allowed to manipulate the values but, without explicit knowledge of the internals, he can’t imagine the first value will be sent to the end. This can only work if the array that is passed comes from an EPW import.

My personal use case being the creation of files from scratch, I can live with this and:

- query my weather database starting at 1:00 and ending at 0:00 inclusive
- modify the export function to not move the first value to the end

But I’m interested in your feedback because either I’m still misunderstanding, or the internal representation is problematic and in fact should probably be hidden from the user.

The second and unrelated aspect of the problem when merging data from different sources, which I initially overlooked, is knowing which time period is covered by a timestamp.

In EPW, all values are point in time except illuminance/radiation which are aggregations over the last time step, which is equivalent to half-hour (from #55 and EnergyPlus Weather File (EPW) Data Dictionary: Auxiliary Programs — EnergyPlus 8.3).

My data source is Oikolab. It is unclear to me what exact time interval is represented by 0:00 Jan 1, 2020. From their docs (https://docs.oikolab.com), it looks like they have all values representing a point in time except radiation/illuminance (and precipitation but we don’t need those) being values aggregated from the last hour. This seems to match with EPW. It might not be a coincidence. Either this is common practice, or they did it to match EPW. EPW export is quoted as a use case in their FAQ and they recommend LadyBug for the job. I sent them an email for confirmation/clarification.

Hopefully, all values match already and I’ll get away with it. Otherwise, I might have to shift some values. Or maybe there’ll be only a 30-minute shift (if the value is point in time in Oikolab and half-hour in EPW) in which case I may interpolate or perhaps just let it go.
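If a 30-minute shift does turn out to be needed, one simple option is linear interpolation between neighbouring hourly samples. The sketch below is only an assumption about how that could look: the function name is hypothetical, it assumes the source values are point-in-time samples on the hour and that the target is the preceding half-hour mark, and the right direction of the shift depends on what the data provider confirms.

```python
def shift_half_hour_back(values):
    """Estimate values 30 minutes before each sample by averaging neighbours.

    values[i] is assumed to be a point-in-time sample at hour i, and
    result[i] estimates the value at hour i - 0.5. The first sample has
    no earlier neighbour, so it is kept unchanged.
    """
    if not values:
        return []
    shifted = [values[0]]
    for previous, current in zip(values, values[1:]):
        shifted.append((previous + current) / 2)
    return shifted


hourly = [0.0, 100.0, 300.0, 200.0]
print(shift_half_hour_back(hourly))  # [0.0, 50.0, 200.0, 250.0]
```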

I can’t believe how much time I spent trying to understand this. I won’t blame it on Ladybug, rather EPW and myself. It’s not a total loss as I understand much more clearly what I’m doing now.

chris · April 9, 2021, 2:55pm

Thanks @jerome .

All good points.

We can definitely add a method to the HourlyContinuousCollection class that automatically moves the end value to the start so that you can just take your data from Jan 1 1:00 to Jan 1 0:00 of the following year and assign it to the EPW data collection using that method. I opened an issue for it and I’ll try to add it soon.

This raises a really important point, which is that all of the EPW data collections move the start value to the end before export EXCEPT the radiation and illuminance data fields. These ones are totally unaltered between import and export for the reasons that you specified. You can use the point_in_time property on the data type class to verify whether the data type is one that gets shifted to align with datetimes or not:

https://www.ladybug.tools/ladybug/docs/ladybug.datatype.base.html#ladybug.datatype.base.DataTypeBase.point_in_time
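To make the distinction concrete, here is a minimal sketch of an export step that shifts only the point-in-time fields. It does not use the Ladybug API: the `fields` dictionary and the `to_epw_order` helper are hypothetical, the boolean flags stand in for the `point_in_time` property, and the field names and values are purely illustrative.

```python
def to_epw_order(fields):
    """Reorder values for export.

    fields maps a field name to (point_in_time flag, values in Ladybug order).
    Point-in-time fields get their first value moved to the end; other
    fields (radiation, illuminance) are copied unchanged.
    """
    exported = {}
    for name, (point_in_time, values) in fields.items():
        if point_in_time:
            exported[name] = values[1:] + values[:1]
        else:
            exported[name] = list(values)
    return exported


fields = {
    "dry_bulb_temperature": (True, [-10, 10, 11, 12]),
    "global_horizontal_radiation": (False, [0, 5, 50, 0]),
}
exported = to_epw_order(fields)

print(exported["dry_bulb_temperature"])         # [10, 11, 12, -10]
print(exported["global_horizontal_radiation"])  # [0, 5, 50, 0]
```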

jerome · April 30, 2021, 3:10pm

Hi @chris.

My gut feeling about this is that the internal representation should reflect the truth rather than try to hide it. This implies having the analysis period start at 1. Unfortunately, you said this brought another pile of troubles. But I’m afraid #490 might just bring another layer of issues.

Our use case was initially to store weather data as .epw, which implied reads and writes. This has changed and now we just write it. Considering the issues above and in Create EPW files for arbitrary dates potentially spanning multiple years - #5 by jerome, I realized it would be easier for us to just write the file without Ladybug: we’re not even using 5% of its features, and I have to override half of the little we do use to address said issues.

Anyway, time spent investigating this was not a loss and things are much clearer to me now.

Thanks again for your time.

Hopefully, these explanations can also help other users.

Bye.

Jérôme