The work I did was not to validate EnergyPlus or Daysim; both have been validated for many years. What I did was to quantify the error introduced when a context element is used as a shading system instead of the predefined shading device component. The value of this is that the predefined shading device has limitations if you want a free-form facade, or if you want to generate complex shading through optimization without having to generate BSDFs beforehand.
@az.sanei, in building simulation there are two ways to validate results. The first is to measure experimental data and compare it to simulation results. If you read my paper, which @minggangyin was so kind to link, you will see that I did consider the thermal side of the tool as well. Experimental validation is expensive (you have to rent a test facility with many sensors), so you will mostly find this kind of validation work covering shorter simulation periods, as mine did. The other way to validate software, over longer time periods and more cases, is to use the BESTEST cases (here is an article about them: https://www.researchgate.net/publication/287369055_Twenty_years_on_Updating_the_iea_bestest_building_thermal_fabric_test_cases_for_ashrae_standard_140)
The procedure there is to run standardized inputs for several benchmark cases and compare the results across different simulation engines. This applies to energy simulations. As @AbrahamYezioro mentioned, Honeybee uses EnergyPlus as its engine, so there shouldn't be much of a difference unless you are not actually giving it the right input for the case you are checking. If this is what you are looking for, then you should start looking in that direction, or you can even do it yourself (we give this as an exercise in a master's degree course I teach; it's not very difficult). Edit: there is an example of it here https://en.sj.dk/media/2517/building-performance-simulation-in-arcitectural-design.pdf
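Whether you compare simulation results against measured data or against another engine, you will want to summarize the disagreement with standard error metrics. A minimal sketch of the two metrics commonly used for this (NMBE and CV(RMSE), as defined in ASHRAE Guideline 14) is below; the hourly load values are purely hypothetical placeholders, not data from my paper:

```python
import math

def nmbe(measured, simulated):
    """Normalized Mean Bias Error (%): overall bias of the model."""
    n = len(measured)
    mean_m = sum(measured) / n
    return 100.0 * sum(s - m for m, s in zip(measured, simulated)) / (n * mean_m)

def cv_rmse(measured, simulated):
    """Coefficient of Variation of the RMSE (%): hour-by-hour scatter."""
    n = len(measured)
    mean_m = sum(measured) / n
    rmse = math.sqrt(sum((s - m) ** 2 for m, s in zip(measured, simulated)) / n)
    return 100.0 * rmse / mean_m

# Hypothetical hourly heating loads (kWh): measured vs. simulated
measured = [1.2, 1.5, 1.1, 0.9, 1.4]
simulated = [1.3, 1.4, 1.2, 1.0, 1.5]

print(f"NMBE = {nmbe(measured, simulated):+.1f}%")      # → NMBE = +4.9%
print(f"CV(RMSE) = {cv_rmse(measured, simulated):.1f}%")  # → CV(RMSE) = 8.2%
```

Guideline 14 gives acceptance thresholds for these metrics on hourly data, which is why they show up in most validation papers, including work on shorter experimental periods like mine.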