LB Deconstruct Data: output wrapping

savage · May 8, 2024, 10:44pm

There is a rather simple way to speed up the LB Deconstruct Data component by wrapping the output values in GH_Number objects:

from Grasshopper.Kernel.Types import GH_Number
...
    values = [GH_Number(v) for v in _data.values]

Here is an example of the speedup for extracting hourly data from 19 zones in an energy simulation (top is original component, bottom is with the above changes):

The reason for the speedup is that Grasshopper does a lot of checking to determine what type of object is being passed, on an item-by-item basis, so it can wrap them in the appropriate GH_ objects. By explicitly doing the wrapping ourselves, Grasshopper does not need to do the slow checking step. The wrapping is mostly helpful for components passing around thousands of objects, which is often the case with the LB Deconstruct Data component.

Any chance this change can get added to the Ladybug component?

Full Component Code

# Ladybug: A Plugin for Environmental Analysis (GPL)
# This file is part of Ladybug.
#
# Copyright (c) 2024, Ladybug Tools.
# You should have received a copy of the GNU Affero General Public License
# along with Ladybug; If not, see <http://www.gnu.org/licenses/>.
# 
# @license AGPL-3.0-or-later <https://spdx.org/licenses/AGPL-3.0-or-later>

"""
Deconstruct a Ladybug DataCollection into a header and values.
-

    Args:
        _data: A Ladybug DataCollection object.
    Returns:
        header: The header of the DataCollection (containing metadata).
        values: The numerical values of the DataCollection.
"""

ghenv.Component.Name = "LB Deconstruct Data"
ghenv.Component.NickName = 'XData'
ghenv.Component.Message = '1.8.0'
ghenv.Component.Category = 'Ladybug'
ghenv.Component.SubCategory = '1 :: Analyze Data'
ghenv.Component.AdditionalHelpFromDocStrings = '1'

from Grasshopper.Kernel.Types import GH_Number

try:
    from ladybug.datacollection import BaseCollection
except ImportError as e:
    raise ImportError('\nFailed to import ladybug:\n\t{}'.format(e))

try:
    from ladybug_rhino.grasshopper import all_required_inputs
except ImportError as e:
    raise ImportError('\nFailed to import ladybug_rhino:\n\t{}'.format(e))


if all_required_inputs(ghenv.Component):
    assert isinstance(_data, BaseCollection), \
        '_data must be a Data Collection. Got {}.'.format(type(_data))
    header = _data.header
    values = [GH_Number(v) for v in _data.values]

mostapha · May 10, 2024, 6:48pm

Hi @savage,

Let’s wait for @chris and see what he thinks but unless the gain is substantial I would suggest not using Grasshopper-specific modules in the components. If we decide to do so, they should probably live in the ladybug-rhino module so other packages like ladybug-blender can overwrite them in the respective module.

savage · May 13, 2024, 11:02pm

Hi @mostapha,

The speed improvement probably doesn’t matter for simple models, but if working directly with 8760 data outside of LBT components on more complicated models, the extra time can add up. This change shaved 5-10 seconds off of the post-processing for the building I am currently working on, and I am running dozens of simulations.

Do the grasshopper and blender components share code? I was having a little trouble finding what Github repo the python embedded in the Grasshopper components is coming from. I assumed it was purely in the ladybug-grasshopper component, but your response is making me second-guess that. Do you have some sort of build process for the ghuser objects, and is it documented somewhere?

chris · May 30, 2024, 7:13pm

Hi @savage ,

Sorry for the late response here and this is a really good suggestion. I am familiar with wrapping objects to be dumped into the Grasshopper UI and we actually already have a function that does this in our core libraries:

github.com

ladybug-tools/ladybug-rhino/blob/master/ladybug_rhino/grasshopper.py#L251-L268


      
          def wrap_output(output):
              """Wrap Python objects as Grasshopper generic objects.
          
              Passing output lists of Python objects through this function can greatly reduce
              the time needed to run the component since Grasshopper can spend a long time
              figuring out the object type is if it is not recognized.  However, if the number
              of output objects is usually < 100, running this method won't make a significant
              difference and so there's no need to use it.
          
              Args:
                  output: A list of values to be wrapped as a generic Grasshopper Object (GOO).
              """
              if not output:
                  return output
              try:
                  return (Goo(i) for i in output)
              except Exception as e:
                  raise ValueError('Failed to wrap {}:\n{}.'.format(output, e))

In that case, you see that we just wrap it to a Goo object and this is really useful for speeding up cases where Grasshopper doesn’t have a native object type for the thing we are dumping (causing it to waste a lot of time checking the object type). For example, you can see that we use it here to wrap the datetime objects that the Analysis Period component produces:

github.com

ladybug-tools/ladybug-grasshopper/blob/b9095a6d8ff1c379e970715efe8785f501b912de/ladybug_grasshopper/src/LB Analysis Period.py#L56


      
              raise ImportError('\nFailed to import ladybug_rhino:\n\t{}'.format(e))
          turn_off_old_tag(ghenv.Component)
          
          
          anp = ap.AnalysisPeriod(
              _start_month_, _start_day_, _start_hour_,
              _end_month_, _end_day_, _end_hour_, _timestep_)
          
          if anp:
              period = anp
              dates = wrap_output(anp.datetimes)
              hoys = anp.hoys

I haven’t really considered this wrapping to a GH_Number so far because we are not as type-strict in Python world and, if we wrap something that’s not a number, we’ll get an exception. But you’d be right to point to point out that Data Collections are always supposed to use numbers. I’ll admit that there are a couple of places under the hood where you have some data collections with strings (for special EPW fields) but these are not exposed in Grasshopper.

Plus, you’ve clearly shown that there’s a significant performance benefit, especially if you have lots of data collections coming from a large energy simulation.

Give me a few hours and I’ll add everything that’s needed to do this number-wrapping and implement it on the Deconstruct Data component.

chris · May 30, 2024, 8:04pm

I have added a new core library method that wraps outputs to GH_Number:

… and I have implemented that method in the LB Deconstruct Data component:

I also verified that it produces the desired performance enhancement and, on my nice desktop, a single hourly data collection can now be deconstructed in less than a millisecond:

This improvement is now available via the LB Versioner component.

I’m sure that people doing analysis of large data sets in LBT-Grasshopper are going to appreciate this. Thanks again for the suggestion, @savage !