queenBee multithreading, number of tasks

Hi all,

I was testing the sample files (annual_daylight) and decided to benchmark the sample file with a much smaller grid size to increase the load. CPU count was set to 5, as I have 6 physical cores (or 12 with HT, but that's not the topic).

With 44,770 points and -ab 2:

- sensor_count = 30 --> 55 min simulation (the default in the bundled .gh file)
- sensor_count = 200 --> 12 min simulation (the LBT default)
- sensor_count = 1000 --> 6 min simulation
- sensor_count = 8954 (points / CPU count) --> 6 min simulation

I'd suggest setting the default to somewhere around points / CPU count.
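Just to show what I mean, here is a minimal Python sketch of that rule (the variable names and values are only illustrative, not actual recipe code):

```python
# Split the total number of sensors evenly across the CPUs assigned to the run.
import math

total_points = 44770   # total sensors in the grid (from the test above)
cpu_count = 5          # CPUs assigned to the simulation

# one chunk per CPU, rounded up so no sensor is left out
sensor_count = math.ceil(total_points / cpu_count)
print(sensor_count)    # 8954, the value that gave the 6 min runtime above
```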



@Mathiassn ,

Yes, the overhead of subdividing the grids into smaller chunks can come to dominate the simulation runtime if the chunk size isn't set properly. I think your suggestion is a good one, but it's not the easiest to implement in a way that always works. Let us think about it; maybe I will implement a hack to do it for the time being, which we will have to replace with a "correct" way to do it later.

Thanks! This helped a lot!


Hi @Mathiassn, in case you only have one sensor grid, the logic is as straightforward as you mentioned. You basically want to distribute the sensors equally between the CPUs.

However, this can get complicated quickly if you have multiple sensor grids with different numbers of sensors. This is a common case in a full building with rooms of different sizes. That's why I think this is something that the user should set up instead of us trying to automate it.

You should also consider the post-processing step in the overall optimization. The most efficient option for multi-processing is to merge all the grids initially and then break them down equally between the CPUs, but that adds an extra step at the end to bring the results back together and align them with the original sensor grids. It's something that we did in Honeybee[+], with the idea of pushing the results to a database so we could quickly put them back together, but that didn't scale well using SQLite, and using other databases adds complexity to the installation and cost in cloud solutions. There are still options that we can try, but what I'm trying to say is that it's not as simple as dividing the number of sensors by the number of CPUs.
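To make the bookkeeping concrete, here is a rough sketch of that merge / split / re-align idea (the grid names and counts are made up, and this is not the Honeybee[+] implementation):

```python
# Merge several grids of different sizes, split the merged sensors equally
# across the CPUs, then map the results back to the original grids afterwards.
import math

# hypothetical grids with different sensor counts (e.g. rooms of a building)
grids = {'room_1': 1200, 'room_2': 350, 'room_3': 9000}
cpu_count = 4

# 1. merge: remember where each grid starts and ends in the merged sensor list
offsets, start = {}, 0
for name, count in grids.items():
    offsets[name] = (start, start + count)
    start += count
total = start

# 2. split the merged list into (nearly) equal chunks, one per CPU
chunk = math.ceil(total / cpu_count)
chunks = [(i, min(i + chunk, total)) for i in range(0, total, chunk)]

# 3. after the simulation, the per-chunk results have to be stitched back
#    together and sliced per original grid using the stored offsets
def realign(merged_results):
    return {name: merged_results[s:e] for name, (s, e) in offsets.items()}
```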


I sort of agree, but maybe it calls for different workflows depending on whether you need a quick study or a large-scale study.

The merge approach sounds OK for smaller workflows, imo (as was done in Legacy).

It just adds unnecessary complexity for the users (especially the novices who want to run small studies). I would aim for a default value and an override.

Hi guys,

I just wrote the following post in another topic:

but I find this discussion important and closely related.

@Mathiassn I think your conclusion above holds true when you are running low-accuracy calculations (-ab 2). If you were to increase -ab to 6, I think your results would be very different, as your simulation would be dominated by the most complex piece of the calculation, which would take far more time.

I've done the exact same test as you and ended up with very different results. In my example, a sensor_count of 200 reduced the speed by 50%. See here for more info:

Unfortunately, I'm unable to reproduce those tests now that the sensor_count input is gone, which leads me to my question: why is that?


Looks like it's still around; the input is just hidden, I guess.

I'm excited to see how the acceleration and parallelization features of E+ 10.x affect the speed!

In the latest version of the recipes, there's a min-sensor-count input that works under the hood, @Mathiassn . That's probably what you are seeing there, but it's different from the old sensor-count input, and changing it will likely just make the simulation less efficient (hence why it is not exposed).

You should find that the defaults of the latest recipes get very close to the "optimum" scenario that you described at the top of this post, @Mathiassn . The grid-splitting now aligns with the number of workers/CPUs specified. You can still adjust the number of CPUs that get used in the simulation using the _workers_ input if you really need to control the usage of your machine:

… but the default is to use one less than the total number of processors available on your machine. So just accepting the defaults should give you close to the best performance that’s possible on your system.
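As a rough illustration of that behavior (not the actual recipe source code, and the minimum value below is only a placeholder), the splitting rule amounts to something like:

```python
# Illustrative sketch: chunk size follows the number of workers but never
# drops below a minimum sensor count, so tiny chunks don't add overhead.
import math

def chunk_size(total_sensors, workers, min_sensor_count=200):
    """Sensors per chunk when splitting a grid across the given workers.

    The min_sensor_count default here is a placeholder, not the recipe's value.
    """
    per_worker = math.ceil(total_sensors / workers)
    return max(per_worker, min_sensor_count)

print(chunk_size(44770, 5))   # 8954 -> one chunk per worker, as in the benchmark above
print(chunk_size(600, 11))    # 200  -> the minimum prevents over-splitting tiny grids
```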

Hi Chris, this sounds great. Glad you got it optimized :slight_smile:
I still have a feeling, though, that running 10 smaller grids takes much longer than merging the points together into one grid.

In my current workflow I actually merge them and then divide the results back up per grid post-simulation.
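Roughly along these lines (a toy sketch with placeholder grid sizes and a made-up results array, not my actual script):

```python
# Merge points into one grid up front, then slice the flat results back
# out per original grid after the simulation.
import numpy as np

grid_sizes = {'grid_a': 500, 'grid_b': 1250, 'grid_c': 300}    # points per grid
merged_results = np.zeros((sum(grid_sizes.values()), 8760))    # e.g. hourly values

per_grid, start = {}, 0
for name, size in grid_sizes.items():
    per_grid[name] = merged_results[start:start + size]        # rows for this grid
    start += size
```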

My tests were on heavy geometry with low Radiance settings, just to test the overhead. I'll have to redo them and provide some numbers, I know ;-).