Thursday, November 22, 2012

Reduce directory nesting

I once ran into a problem with image files that were named after the checksum of their content, with the first six 2-character chunks of the checksum used as nested directories, so a path looked like 2f/d4/e1/c6/7a/2d/28fced849ee1bb76e7391b93eb12.jpg.

With approx. 13M files stored in this structure, I ran short of inodes. After some quick research and calculations I realized that this nesting depth is excessive and can easily be reduced. So I needed to turn every existing file of the form 2f/d4/e1/c6/7a/2d/28fced849ee1bb76e7391b93eb12.jpg into 2f/d4/e1c67a2d28fced849ee1bb76e7391b93eb12.jpg.
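The calculation itself is not shown here, so the following is only a rough sketch of how it could be estimated. It assumes the checksums are uniformly distributed, that each 2-character hex chunk gives 256 possible directory names per level, and that every directory costs one inode. The function name expected_dirs is hypothetical.

```python
import math


def expected_dirs(num_files, depth, fanout=256):
    """Expected number of non-empty directories, assuming uniform hashes."""
    total = 0.0
    for level in range(1, depth + 1):
        slots = fanout ** level  # possible directories at this level
        # Probability that a given directory holds at least one file:
        # 1 - (1 - 1/slots) ** num_files, computed in a numerically stable way.
        occupied = -math.expm1(num_files * math.log1p(-1.0 / slots))
        total += slots * occupied
    return total


for depth in (6, 2):
    print(depth, int(expected_dirs(13000000, depth)))
```

Under these assumptions, depth 6 needs tens of millions of directories on top of the 13M files, because from the fourth level down almost every file sits in its own chain of directories. Depth 2 needs only 256 + 65536 directories.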

To do that, I wrote a script that renames the files and removes empty directories.

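import os

TARGET_DIR = '/full/path/to/dir/containing/dirs/to/reduce/'
# Offset of the relative part in the paths yielded by os.walk();
# works with or without a trailing slash on TARGET_DIR.
TARGET_DIR_LEN = len(TARGET_DIR.rstrip('/')) + 1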
CURRENT_DEPTH = 6
DESIRED_DEPTH = 2

for dirname, _, filenames in os.walk(TARGET_DIR):
    # Directory path relative to TARGET_DIR, split into its components.
    path = dirname[TARGET_DIR_LEN:].split('/')
    if filenames and len(path) >= CURRENT_DEPTH:
        for filename in filenames:
            parts = path + [filename]
            prefix = ''
            if len(parts[0]) > 2:  # extra top-level directory, e.g. cache
                prefix = parts[0]
                parts = parts[1:]
            old = os.path.join(dirname, filename)
            # Keep the first DESIRED_DEPTH chunks as directories and
            # glue the remaining chunks onto the file name.
            new = os.path.join(TARGET_DIR, prefix,
                               '/'.join(parts[:DESIRED_DEPTH]),
                               ''.join(parts[DESIRED_DEPTH:]))
            os.rename(old, new)
        # Remove the emptied directories, deepest first, walking up
        # one level at a time until DESIRED_DEPTH is reached.
        d = dirname
        for _ in range(CURRENT_DEPTH - DESIRED_DEPTH):
            try:
                os.rmdir(d)
            except OSError:
                break  # not empty yet: sibling directories remain
            d = os.path.dirname(d)
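
Since os.rename is not easy to undo on 13M files, it may be worth checking the path mapping in isolation first. The sketch below reproduces only the mapping step as a pure function. The helper name flatten_path is hypothetical and not part of the script above, and it assumes the path is relative and has no extra leading directory.

```python
def flatten_path(rel_path, desired_depth=2):
    """Map a deeply nested relative path to its flattened form."""
    parts = rel_path.split('/')
    directories = parts[:desired_depth]
    # The remaining directory chunks are glued onto the file name.
    filename = ''.join(parts[desired_depth:])
    return '/'.join(directories + [filename])


print(flatten_path('2f/d4/e1/c6/7a/2d/28fced849ee1bb76e7391b93eb12.jpg'))
# 2f/d4/e1c67a2d28fced849ee1bb76e7391b93eb12.jpg
```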