Introduction

Hello again, Everyone! Now I had this problem, when I had to copy parts of an HDF5 file. But since the file structure is quite big, I did not want to hardcode all the paths, just discover the structure of the HDF5 and then copy the parts I need using a simple “if” in the code. It wasn’t that trivial, so I thought I share the snippet, that does exactly this.

The solution

Actually, I found more than one solution to discover the structure of an HDF5 file, see this link.

What I like the most is this one.

def get_dataset_keys(f):
    keys = []
    f.visit(lambda key : keys.append(key) if isinstance(f[key], h5py.Dataset) else None)
    return keys

Simple and elegant, isn’t it? :) Now I have all the dataset paths in my HDF5 file, I only need to select the ones I’d like to copy and copy them.

The copy can be a little tricky, so I was looking for some useful snippets again. I found this link on Stackoverflow. Using the info here, I compiled the following script.

def copy_hdf5_with_constraints(file_source, file_dest):
    # First getting all the dataset keys.
    keys = get_dataset_keys(file_source)

    fs = h5py.File(file_source, 'r')
    fd = h5py.File(file_dest, 'w')

    for key in keys:
        # Get the name of the parent for the group we want to copy
        print(key)
        if [your_condition_here!]:
            name = key.split("/")[-1]
            group_path = fs[key].parent.name

            # Check that this group exists in the destination file; if it doesn't, create it
            # This will create the parents too, if they don't exist
            group_id = fd.require_group(group_path)
            print(group_path, group_id, name)
            print("-"*50)

            fs.copy(key, group_id, name=name)

    fs.close()
    fd.close()

Aaaaaand…. that’s it! :)